9.13. Examples: Reading lines of a file that contain the searched value in a specific column

Print the line(s) of file3.csv and file4.txt that have value 11 in the third column.

Space-separated file	Comma-separated file
$ awk '$3 == "11" {print $1,$2}' file4.txt D11144030 Group3 C11137159 Group3	$ awk -F',' '$3 == "11" {print $1,$2}' file3.csv C11137443 Group3

Print the first and second columns of those lines in file3.csv and file4.txt that have value 11 in the third column.

Space-separated file	Comma-separated file
$ awk '$3 == "11" {print $1,$2}' file4.txt D11144030 Group3 C11137159 Group3	$ awk -F',' '$3 == "11" {print $1,$2}' file3.csv C11137443 Group3

Print the line(s) of file1.csv and file2.txt that have value "Group1" (including the colons "") in the second column.

Space-separated file	Comma-separated file
$ awk '$2 == "\"Group1\""' file2.txt "B11137244" "Group1" "450" "555" "B11154534" "Group1" "456" "456" "B11144100" "Group1" "450" "886" "B11137244" "Group1" "450" "456" "B11134987" "Group1" "900" "MissingData" "B11144345" "Group1" "900" "776" "B11110676" "Group1" "900" "10" "B11154532" "Group1" "456" "886" "B11131605" "Group1" "456" "MissingData" "B11137784" "Group1" "900" "436" "B11156098" "Group1" "500" "886" "B11133232" "Group1" "500" "MissingData" "B11152577" "Group1" "900" "756" "B11156098" "Group1" "456" "886" "B11133232" "Group1" "456" "MissingData"	$ awk -F',' '$2 == "\"Group1\""' file1.csv "B33199522","Group1","0","" "B11137879","Group1","0","" "B11153927","Group1","0","" "B11177806","Group1","MD","" "B11152799","Group1","0","" "B11154358","Group1","0","" "B11110925","Group1","0","" "B11137879","Group1","0","" "B11110603","Group1","0","" "B11110927","Group1","0","" "B11147712","Group1","0","" "B11157974","Group1","0","" "B33199522","Group1","0",""

Space-separated file

Comma-separated file

$ awk '$2 == "\"Group1\""' file2.txt
"B11137244" "Group1" "450" "555"
"B11154534" "Group1" "456" "456"
"B11144100" "Group1" "450" "886"
"B11137244" "Group1" "450" "456"
"B11134987" "Group1" "900" "MissingData"
"B11144345" "Group1" "900" "776"
"B11110676" "Group1" "900" "10"
"B11154532" "Group1" "456" "886"
"B11131605" "Group1" "456" "MissingData"
"B11137784" "Group1" "900" "436"
"B11156098" "Group1" "500" "886"
"B11133232" "Group1" "500" "MissingData"
"B11152577" "Group1" "900" "756"
"B11156098" "Group1" "456" "886"
"B11133232" "Group1" "456" "MissingData"

$ awk -F',' '$2 == "\"Group1\""' file1.csv
"B33199522","Group1","0",""
"B11137879","Group1","0",""
"B11153927","Group1","0",""
"B11177806","Group1","MD",""
"B11152799","Group1","0",""
"B11154358","Group1","0",""
"B11110925","Group1","0",""
"B11137879","Group1","0",""
"B11110603","Group1","0",""
"B11110927","Group1","0",""
"B11147712","Group1","0",""
"B11157974","Group1","0",""
"B33199522","Group1","0",""

Print the first column of those lines in file1.csv and file2.txt that have value "Group1" (including the colons "") in the second column.

Space-separated file	Comma-separated file
$ awk '$2 == "\"Group1\"" {print $1}' file2.txt "B11137244" "B11154534" "B11144100" "B11137244" "B11134987" "B11144345" "B11110676" "B11154532" "B11131605" "B11137784" "B11156098" "B11133232" "B11152577" "B11156098" "B11133232"	$ awk -F',' '$2 == "\"Group1\"" {print $1}' file1.csv "B33199522" "B11137879" "B11153927" "B11177806" "B11152799" "B11154358" "B11110925" "B11137879" "B11110603" "B11110927" "B11147712" "B11157974" "B33199522"

Print the line(s) of file1.csv and file2.txt that do not have value "Group1" (including the colons "") in the second column.

Space-separated file	Comma-separated file
$ awk '$2 != "\"Group1\""' file2.txt "AnonymizedID" "SubjectGroup" "TEST1" "TEST2" "B11130912" "Group2b" "900" "MissingData" "B12226566" "Group2b" "450" "MissingData" "C11137159" "Group3" "MissingData" "MissingData" "B11156453" "Group4" "456" "2" "C11138929" "Group2b" "2" "MissingData" "B11155267" "Group3" "900" "10" "B11137120" "Group2b" "450" "456" "B33191224" "Group2b" "450" "776" "B11155267" "Group3" "900" "10" "C11138999" "Group2b" "900" "MissingData" "B11135292" "Group3" "MissingData" "MissingData" "C11138912" "Group2b" "900" "MissingData" "B11150911" "Group2b" "900" "117"	$ awk -F',' '$2 != "\"Group1\""' file1.csv "Anonymized ID","Subject Group","HASCONDITION","CONDITION" "B33199603","Group3","0","" "B11144410","Group2 b","0","" "B11110455","Group2 b","0","" "B11135291","Group3","0","" "B11177579","Group2 b","0","" "B11157958","Group3","0","" "B11110690","Group3","0","" "B11135291","Group3","9","mTBI" "B11135072","MISSING","0","" "B33199603","Group3","0","" "B33191224","Group2 b","0","" "B11131290","Group2 b","0","" "B33191224","Group2 b","0","" "B11141503","Group3","0","" "C11137159","Group3","9","mTBI"

Space-separated file

Comma-separated file

$ awk '$2 != "\"Group1\""' file2.txt
"AnonymizedID" "SubjectGroup" "TEST1" "TEST2"
"B11130912" "Group2b" "900" "MissingData"
"B12226566" "Group2b" "450" "MissingData"
"C11137159" "Group3" "MissingData" "MissingData"
"B11156453" "Group4" "456" "2"
"C11138929" "Group2b" "2" "MissingData"
"B11155267" "Group3" "900" "10"
"B11137120" "Group2b" "450" "456"
"B33191224" "Group2b" "450" "776"
"B11155267" "Group3" "900" "10"
"C11138999" "Group2b" "900" "MissingData"
"B11135292" "Group3" "MissingData" "MissingData"
"C11138912" "Group2b" "900" "MissingData"
"B11150911" "Group2b" "900" "117"

$ awk -F',' '$2 != "\"Group1\""' file1.csv
"Anonymized ID","Subject Group","HASCONDITION","CONDITION"
"B33199603","Group3","0",""
"B11144410","Group2 b","0",""
"B11110455","Group2 b","0",""
"B11135291","Group3","0",""
"B11177579","Group2 b","0",""
"B11157958","Group3","0",""
"B11110690","Group3","0",""
"B11135291","Group3","9","mTBI"
"B11135072","MISSING","0",""
"B33199603","Group3","0",""
"B33191224","Group2 b","0",""
"B11131290","Group2 b","0",""
"B33191224","Group2 b","0",""
"B11141503","Group3","0",""
"C11137159","Group3","9","mTBI"

Print the first column of those lines in file1.csv and file2.txt that do not have value "Group1" (including the colons "") in the second column.

Space-separated file	Comma-separated file
$ awk '$2 != "\"Group1\"" {print $1}' file2.txt "AnonymizedID" "B11130912" "B12226566" "C11137159" "B11156453" "C11138929" "B11155267" "B11137120" "B33191224" "B11155267" "C11138999" "B11135292" "C11138912" "B11150911"	$ awk -F',' '$2 != "\"Group1\"" {print $1}' file1.csv "Anonymized ID" "B33199603" "B11144410" "B11110455" "B11135291" "B11177579" "B11157958" "B11110690" "B11135291" "B11135072" "B33199603" "B33191224" "B11131290" "B33191224" "B11141503" "C11137159"

Print the ID (first column) of those subjects in file3.csv and file4.txt that have age (third column) greater than 20.

Space-separated file	Comma-separated file
$ awk '$3 > "20" {print $1}' file4.txt AnonymizedID B11108326 B11110893 B11119909 B11119903 C11131039 C11133100 C11135566 C11137439 C11137544 C11137123 C11138150 C11138797 C11138184 C11138122 C11138122 C11138192 B12226507 B12226546	$ awk -F',' '$3 > "20" {print $1}' file3.csv Anonymized ID C11138122 C11138192 B12226507 B12226546 C11138122 C11138184 C11138797 C11138152 C11138150 C11131039 C11135566 B11119903 C11137544 C11137123 C11137439 C11133100 B11108399 B11108326 B11119909 B11110893

Print the ID (first column) of those subjects in file3.csv and file4.txt that have age (third column) less than 20.

Space-separated file	Comma-separated file
$ awk '$3 < "20" {print $1}' file4.txt D11144030 D11144030 C11137159 C11137159 C11137167 C11137167 C11137439 C11137443 C11138152	$ awk -F',' '$3 < "20" {print $1}' file3.csv C11137167 C11137159 C11137167 C11137159 C11137443 C11137439 D11144030

Print the line(s) of file3.csv that have value "Group1" or "Group3" in the second column. When there is more than one rule, the easiest and more organized way to run the command is to put all the rules in a text file and call that text file using the flag -f. In the following example, patterns3.txt contains the rules to filter the lines that are to be printed ($2 == "Group1" and $2 == "Group3").

$ cat patterns3.txt
$2 == "Group1"
$2 == "Group3"

$ awk -F',' -f patterns3.txt file3.csv
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137544,Group1,21
C11137443,Group3,11
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11110893,Group1,28

$ cat patterns4.txt
$2 == "Group1" || $2 == "Group3"

$ awk -F',' -f test.txt file3.csv
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137544,Group1,21
C11137443,Group3,11
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11110893,Group1,28

In this example, we obtained the same result using either patterns3.txt or patterns4.txt. When you want to select any line that contains any pattern in a list of patterns, you can either put each pattern in a different line of the text file or use the or (||) symbol to concatenate all the patterns or rules.

Print the value of one column for those lines that have specific values in other columns Print the ID (first column) of those subjects in file3.csv that belong to "Group1" (second column), and have age (third column) greater than 60, or that belong to "Group3" (second column) and have age (third column) less than 20.

In this example, we want to print any line that contains one of the following rules:

Second column equals Group1 and third greater than 60: $2 == "Group1" && $3 > 60
Second column equals and Group3 third less than 20: $2 == "Group3" && $3 < 20

So, the content of our pattern file must be:
$ cat patterns5.txt
$2 == "Group1" && $3 > 60
$2 == "Group3" && $3 < 20

To print all the columns from the selected lines:
$ awk -F',' -f patterns5.txt file3.csv
B12226507,Group1,68
B12226546,Group1,67
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11137443,Group3,11
C11137439,Group3,15
D11144030,Group3,13

To print the first column:
$ awk -F',' -f patterns5.txt file3.csv | awk -F',' '{print $1}'
B12226507
B12226546
C11137167
C11137159
C11137167
C11137159
C11137443
C11137439
D11144030

The following page contains a summary of other patterns that can be included in a pattern file: https://ss64.com/bash/awk.html