9.13. Examples: Reading lines of a file that contain the searched value in a specific column

Print the line(s) of file3.csv and file4.txt that have value 11 in the third column.

Space-separated file Comma-separated file
$ awk '$3 == "11" {print $1,$2}' file4.txt
D11144030 Group3
C11137159 Group3
$ awk -F',' '$3 == "11" {print $1,$2}' file3.csv
C11137443 Group3

Print the first and second columns of those lines in file3.csv and file4.txt that have value 11 in the third column.

Space-separated file Comma-separated file
$ awk '$3 == "11" {print $1,$2}' file4.txt
D11144030 Group3
C11137159 Group3
$ awk -F',' '$3 == "11" {print $1,$2}' file3.csv
C11137443 Group3

Print the line(s) of file1.csv and file2.txt that have value "Group1" (including the colons "") in the second column.

Space-separated file Comma-separated file
$ awk '$2 == "\"Group1\""' file2.txt
"B11137244" "Group1" "450" "555"
"B11154534" "Group1" "456" "456"
"B11144100" "Group1" "450" "886"
"B11137244" "Group1" "450" "456"
"B11134987" "Group1" "900" "MissingData"
"B11144345" "Group1" "900" "776"
"B11110676" "Group1" "900" "10"
"B11154532" "Group1" "456" "886"
"B11131605" "Group1" "456" "MissingData"
"B11137784" "Group1" "900" "436"
"B11156098" "Group1" "500" "886"
"B11133232" "Group1" "500" "MissingData"
"B11152577" "Group1" "900" "756"
"B11156098" "Group1" "456" "886"
"B11133232" "Group1" "456" "MissingData"
$ awk -F',' '$2 == "\"Group1\""' file1.csv
"B33199522","Group1","0",""
"B11137879","Group1","0",""
"B11153927","Group1","0",""
"B11177806","Group1","MD",""
"B11152799","Group1","0",""
"B11154358","Group1","0",""
"B11110925","Group1","0",""
"B11137879","Group1","0",""
"B11110603","Group1","0",""
"B11110927","Group1","0",""
"B11147712","Group1","0",""
"B11157974","Group1","0",""
"B33199522","Group1","0",""

Print the first column of those lines in file1.csv and file2.txt that have value "Group1" (including the colons "") in the second column.

Space-separated file Comma-separated file
$ awk '$2 == "\"Group1\"" {print $1}' file2.txt
"B11137244"
"B11154534"
"B11144100"
"B11137244"
"B11134987"
"B11144345"
"B11110676"
"B11154532"
"B11131605"
"B11137784"
"B11156098"
"B11133232"
"B11152577"
"B11156098"
"B11133232"
$ awk -F',' '$2 == "\"Group1\"" {print $1}' file1.csv
"B33199522"
"B11137879"
"B11153927"
"B11177806"
"B11152799"
"B11154358"
"B11110925"
"B11137879"
"B11110603"
"B11110927"
"B11147712"
"B11157974"
"B33199522"

Print the line(s) of file1.csv and file2.txt that do not have value "Group1" (including the colons "") in the second column.

Space-separated file Comma-separated file
$ awk '$2 != "\"Group1\""' file2.txt
"AnonymizedID" "SubjectGroup" "TEST1" "TEST2"
"B11130912" "Group2b" "900" "MissingData"
"B12226566" "Group2b" "450" "MissingData"
"C11137159" "Group3" "MissingData" "MissingData"
"B11156453" "Group4" "456" "2"
"C11138929" "Group2b" "2" "MissingData"
"B11155267" "Group3" "900" "10"
"B11137120" "Group2b" "450" "456"
"B33191224" "Group2b" "450" "776"
"B11155267" "Group3" "900" "10"
"C11138999" "Group2b" "900" "MissingData"
"B11135292" "Group3" "MissingData" "MissingData"
"C11138912" "Group2b" "900" "MissingData"
"B11150911" "Group2b" "900" "117"
$ awk -F',' '$2 != "\"Group1\""' file1.csv
"Anonymized ID","Subject Group","HASCONDITION","CONDITION"
"B33199603","Group3","0",""
"B11144410","Group2 b","0",""
"B11110455","Group2 b","0",""
"B11135291","Group3","0",""
"B11177579","Group2 b","0",""
"B11157958","Group3","0",""
"B11110690","Group3","0",""
"B11135291","Group3","9","mTBI"
"B11135072","MISSING","0",""
"B33199603","Group3","0",""
"B33191224","Group2 b","0",""
"B11131290","Group2 b","0",""
"B33191224","Group2 b","0",""
"B11141503","Group3","0",""
"C11137159","Group3","9","mTBI"

Print the first column of those lines in file1.csv and file2.txt that do not have value "Group1" (including the colons "") in the second column.

Space-separated file Comma-separated file
$ awk '$2 != "\"Group1\"" {print $1}' file2.txt
"AnonymizedID"
"B11130912"
"B12226566"
"C11137159"
"B11156453"
"C11138929"
"B11155267"
"B11137120"
"B33191224"
"B11155267"
"C11138999"
"B11135292"
"C11138912"
"B11150911"
$ awk -F',' '$2 != "\"Group1\"" {print $1}' file1.csv
"Anonymized ID"
"B33199603"
"B11144410"
"B11110455"
"B11135291"
"B11177579"
"B11157958"
"B11110690"
"B11135291"
"B11135072"
"B33199603"
"B33191224"
"B11131290"
"B33191224"
"B11141503"
"C11137159"

Print the ID (first column) of those subjects in file3.csv and file4.txt that have age (third column) greater than 20.

Space-separated file Comma-separated file
$ awk '$3 > "20" {print $1}' file4.txt
AnonymizedID
B11108326
B11110893
B11119909
B11119903
C11131039
C11133100
C11135566
C11137439
C11137544
C11137123
C11138150
C11138797
C11138184
C11138122
C11138122
C11138192
B12226507
B12226546
$ awk -F',' '$3 > "20" {print $1}' file3.csv
Anonymized ID
C11138122
C11138192
B12226507
B12226546
C11138122
C11138184
C11138797
C11138152
C11138150
C11131039
C11135566
B11119903
C11137544
C11137123
C11137439
C11133100
B11108399
B11108326
B11119909
B11110893

Print the ID (first column) of those subjects in file3.csv and file4.txt that have age (third column) less than 20.

Space-separated file Comma-separated file
$ awk '$3 < "20" {print $1}' file4.txt
D11144030
D11144030
C11137159
C11137159
C11137167
C11137167
C11137439
C11137443
C11138152
$ awk -F',' '$3 < "20" {print $1}' file3.csv
C11137167
C11137159
C11137167
C11137159
C11137443
C11137439
D11144030

Print the line(s) of file3.csv that have value "Group1" or "Group3" in the second column. When there is more than one rule, the easiest and more organized way to run the command is to put all the rules in a text file and call that text file using the flag -f. In the following example, patterns3.txt contains the rules to filter the lines that are to be printed ($2 == "Group1" and $2 == "Group3").

$ cat patterns3.txt
$2 == "Group1"
$2 == "Group3"

$ awk -F',' -f patterns3.txt file3.csv
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137544,Group1,21
C11137443,Group3,11
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11110893,Group1,28

$ cat patterns4.txt
$2 == "Group1" || $2 == "Group3"

$ awk -F',' -f test.txt file3.csv
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137544,Group1,21
C11137443,Group3,11
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11110893,Group1,28

In this example, we obtained the same result using either patterns3.txt or patterns4.txt. When you want to select any line that contains any pattern in a list of patterns, you can either put each pattern in a different line of the text file or use the or (||) symbol to concatenate all the patterns or rules.

Print the value of one column for those lines that have specific values in other columns Print the ID (first column) of those subjects in file3.csv that belong to "Group1" (second column), and have age (third column) greater than 60, or that belong to "Group3" (second column) and have age (third column) less than 20.

In this example, we want to print any line that contains one of the following rules:

So, the content of our pattern file must be:
$ cat patterns5.txt
$2 == "Group1" && $3 > 60
$2 == "Group3" && $3 < 20

To print all the columns from the selected lines:
$ awk -F',' -f patterns5.txt file3.csv
B12226507,Group1,68
B12226546,Group1,67
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137443,Group3,11
C11137439,Group3,15
D11144030,Group3,13

To print the first column:
$ awk -F',' -f patterns5.txt file3.csv | awk -F',' '{print $1}'
B12226507
B12226546
C11137167
C11137159
C11137167
C11137159
C11137443
C11137439
D11144030

The following page contains a summary of other patterns that can be included in a pattern file: https://ss64.com/bash/awk.html