9.12. Examples: Reading specific columns or lines that contain a searched value, in a file or a list of files

Print the line(s) of file3.csv that contain the string C11137439

Using awk:
$ awk '/C11137439/' file3.csv
C11137439,Group3,79
C11137439,Group3,15

Using grep:
$ grep C11137439 file3.csv
C11137439,Group3,79
C11137439,Group3,15

Print the line(s) of file3.csv that contain the string C11137439 (which is stored in a variable)

$ VAR=C11137439

Using awk:
$ awk -v var="$VAR" '$0~var' file3.csv
C11137439,Group3,79
C11137439,Group3,15

Using grep:
$ grep "$VAR" file3.csv
C11137439,Group3,79
C11137439,Group3,15
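
Both forms above treat the value of the variable as a regular expression, so a character such as a dot would match any character. The following is a minimal sketch of a literal (fixed-string) search, assuming a small hypothetical file sample.csv created only for this illustration; it uses grep -F and the awk function index().

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'A.1,Group1,10' 'AB1,Group2,20' > "$tmpdir/sample.csv"

VAR='A.1'

# Regular-expression match: the dot matches any character, so both lines match.
regex_hits=$(grep -- "$VAR" "$tmpdir/sample.csv")

# Fixed-string match with grep -F: only the literal text A.1 matches.
fixed_hits=$(grep -F -- "$VAR" "$tmpdir/sample.csv")

# Same idea in awk: index() returns 0 when the literal substring is absent.
awk_hits=$(awk -v var="$VAR" 'index($0, var)' "$tmpdir/sample.csv")

printf '%s\n' "$regex_hits" '---' "$fixed_hits" '---' "$awk_hits"
rm -r "$tmpdir"
```

For identifiers made only of letters and digits, such as C11137439, the regular-expression and fixed-string searches give the same result.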

Print the line(s) of file3.csv that contain the strings C11137439 or B11119909

To search for more than one expression, add the flag -e in front of each expression.

$ grep -e "C11137439" -e "B11119909" file3.csv
C11137439,Group3,79
C11137439,Group3,15
B11119909,Group2 b,61

Print the line(s) of file3.csv that contain the strings C11137439, B11119909 or B11110893

$ grep -e "C11137439" -e "B11119909" -e "B11110893" file3.csv
C11137439,Group3,79
C11137439,Group3,15
B11119909,Group2 b,61
B11110893,Group1,28
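
As an alternative to repeating -e, the expressions can be joined into one extended regular expression with the | operator. This is a minimal sketch, assuming a hypothetical four-line file sample.csv; it also shows that awk accepts the same alternation syntax.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'C11137439,Group3,79' \
  'B11119909,Group2 b,61' \
  'B11110893,Group1,28' \
  'D11144030,Group3,13' > "$tmpdir/sample.csv"

# One extended regular expression (-E) instead of several -e flags.
grep_hits=$(grep -E 'C11137439|B11119909|B11110893' "$tmpdir/sample.csv")

# awk regular expressions support the same alternation.
awk_hits=$(awk '/C11137439|B11119909|B11110893/' "$tmpdir/sample.csv")

printf '%s\n' "$grep_hits"
rm -r "$tmpdir"
```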

Print the line(s) of file3.csv that contain the strings C11137439 or B11119909 (which are stored in a file called patterns.txt)

$ cat patterns.txt
C11137439
B11119909

$ grep -f patterns.txt file3.csv
C11137439,Group3,79
C11137439,Group3,15
B11119909,Group2 b,61
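
The same lookup can be sketched with awk, which reads the pattern file first and then searches the data file. This is only an illustration under assumptions: the files are small hypothetical copies created in a temporary directory, and the patterns are treated as literal strings (grep -F on the grep side, index() on the awk side). Empty lines in the pattern file are skipped in awk, because an empty pattern would match every line.

```shell
# Hypothetical sample data and pattern file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'C11137439,Group3,79' \
  'B11119909,Group2 b,61' \
  'D11144030,Group3,13' > "$tmpdir/sample.csv"
printf '%s\n' 'C11137439' 'B11119909' > "$tmpdir/patterns.txt"

# -F treats every line of the pattern file as a literal string, not a regex.
grep_hits=$(grep -F -f "$tmpdir/patterns.txt" "$tmpdir/sample.csv")

# awk: while reading the first file (NR == FNR) store each non-empty pattern;
# for the second file, print a line as soon as one stored pattern occurs in it.
awk_hits=$(awk 'NR == FNR { if ($0 != "") pats[$0]; next }
                { for (p in pats) if (index($0, p)) { print; break } }' \
           "$tmpdir/patterns.txt" "$tmpdir/sample.csv")

printf '%s\n' "$awk_hits"
rm -r "$tmpdir"
```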

Print the line(s) of file3.csv that do not contain the string C11137439

$ grep -v "C11137439" file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
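
The inverted search can also be written with awk by negating the pattern with !. A minimal sketch, assuming a hypothetical three-line file sample.csv:

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Anonymized ID,Subject Group,AGE' \
  'C11137439,Group3,79' \
  'C11138122,MISSING,21' > "$tmpdir/sample.csv"

# grep -v prints the lines that do NOT match the pattern.
grep_hits=$(grep -v 'C11137439' "$tmpdir/sample.csv")

# In awk, ! in front of the pattern negates the match.
awk_hits=$(awk '!/C11137439/' "$tmpdir/sample.csv")

printf '%s\n' "$awk_hits"
rm -r "$tmpdir"
```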

Print the line(s) of any file in the current directory that contain the string C11137439

$ awk '/C11137439/' *
C11137439,Group3,79
C11137439,Group3,15
C11137439 Group3 15
C11137439 Group3 79
C11137439

$ grep C11137439 *
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
file4.txt:C11137439 Group3 15
file4.txt:C11137439 Group3 79
patterns.txt:C11137439

To omit the file names from the grep output, use the flag -h:
$ grep -h C11137439 *
C11137439,Group3,79
C11137439,Group3,15
C11137439 Group3 15
C11137439 Group3 79
C11137439

If you want to include the line number of each match, add the flag -n:
$ grep -n "C11137439" *
file3.csv:21:C11137439,Group3,79
file3.csv:22:C11137439,Group3,15
file4.txt:15:C11137439 Group3 15
file4.txt:16:C11137439 Group3 79
patterns.txt:1:C11137439

$ grep -h -n "C11137439" *
21:C11137439,Group3,79
22:C11137439,Group3,15
15:C11137439 Group3 15
16:C11137439 Group3 79
1:C11137439

The flag -m with the parameter 3 stops reading each file after three matching lines, so the limit applies to each file separately and not to the combined output. To show only the first three matches overall, pipe the output to head -n 3:
$ grep "C11137439" * | head -n 3
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
file4.txt:C11137439 Group3 15

$ grep -h "C11137439" * | head -n 3
C11137439,Group3,79
C11137439,Group3,15
C11137439 Group3 15

Print the number of lines in each file of the current directory that contain the string C11137439.

$ grep -c "C11137439" *
file1.csv:0
file1_reordered.csv:0
file2.txt:0
file3.csv:2
file4.txt:2
patterns.txt:1
patterns2.txt:0
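
Note that -c counts matching lines, not individual occurrences, and reports one count per file. The sketch below, which assumes two hypothetical files a.csv and b.txt, shows one way to get a single total of matching lines and a total of occurrences (using -o, which prints each match on its own line).

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'C11137439,Group3,79' \
  'C11137439,C11137439,15' \
  'B11119909,Group2 b,61' > "$tmpdir/a.csv"
printf '%s\n' 'C11137439 Group3 15' > "$tmpdir/b.txt"

# One count of matching lines per file.
per_file=$(cd "$tmpdir" && grep -c 'C11137439' a.csv b.txt)

# A single total of matching lines across both files.
total_lines=$(cat "$tmpdir/a.csv" "$tmpdir/b.txt" | grep -c 'C11137439')

# Total number of occurrences: -o prints every match on a separate line.
total_occurrences=$(cat "$tmpdir/a.csv" "$tmpdir/b.txt" | grep -o 'C11137439' | wc -l | tr -d ' ')

printf '%s\n' "$per_file" "lines: $total_lines" "occurrences: $total_occurrences"
rm -r "$tmpdir"
```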

Print only the names of the files in the current directory that contain the string C11137439.

$ grep -l "C11137439" *
file3.csv
file4.txt
patterns.txt

Print the line(s) of any file in the current directory that contain the string C11137439, each match followed by the next three lines of the corresponding file (or as many as exist, if fewer than three lines follow the match).

$ grep -A 3 "C11137439" *
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
file3.csv-C11133100,Group1,23
file3.csv-D11144030,Group3,13
file3.csv-B11108399,Group1,23
--
file4.txt:C11137439 Group3 15
file4.txt:C11137439 Group3 79
file4.txt-C11137443 Group3 15
file4.txt-C11137544 Group1 22
file4.txt-C11137123 Group2 68
--
patterns.txt:C11137439
patterns.txt-B11119909

Print the line(s) of any file in the current directory that contain the string C11137439, each match preceded by the previous three lines of the corresponding file (or as many as exist, if fewer than three lines precede the match).

$ grep -B 3 "C11137439" *
file3.csv-C11137544,Group1,21
file3.csv-C11137443,Group3,11
file3.csv-C11137123,Group2 b,69
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
--
file4.txt-C11137159 Group3 12
file4.txt-C11137167 Group3 14
file4.txt-C11137167 Group3 16
file4.txt:C11137439 Group3 15
file4.txt:C11137439 Group3 79
--
patterns.txt:C11137439

Print the line(s) of any file in the current directory that contain the string C11137439, each match preceded by the previous three lines and followed by the next three lines of the corresponding file (or as many as exist, if fewer than three lines precede or follow the match).

$ grep -C 3 "C11137439" *
file3.csv-C11137544,Group1,21
file3.csv-C11137443,Group3,11
file3.csv-C11137123,Group2 b,69
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
file3.csv-C11133100,Group1,23
file3.csv-D11144030,Group3,13
file3.csv-B11108399,Group1,23
--
file4.txt-C11137159 Group3 12
file4.txt-C11137167 Group3 14
file4.txt-C11137167 Group3 16
file4.txt:C11137439 Group3 15
file4.txt:C11137439 Group3 79
file4.txt-C11137443 Group3 15
file4.txt-C11137544 Group1 22
file4.txt-C11137123 Group2 68
--
patterns.txt:C11137439
patterns.txt-B11119909

The flag -C 3 is equivalent to combining -A 3 and -B 3:
$ grep -A 3 -B 3 "C11137439" *
file3.csv-C11137544,Group1,21
file3.csv-C11137443,Group3,11
file3.csv-C11137123,Group2 b,69
file3.csv:C11137439,Group3,79
file3.csv:C11137439,Group3,15
file3.csv-C11133100,Group1,23
file3.csv-D11144030,Group3,13
file3.csv-B11108399,Group1,23
--
file4.txt-C11137159 Group3 12
file4.txt-C11137167 Group3 14
file4.txt-C11137167 Group3 16
file4.txt:C11137439 Group3 15
file4.txt:C11137439 Group3 79
file4.txt-C11137443 Group3 15
file4.txt-C11137544 Group1 22
file4.txt-C11137123 Group2 68
--
patterns.txt:C11137439
patterns.txt-B11119909

Print the line(s) of any file in the current directory that contain the string "B11133232" (including the quotation marks ").

$ awk '/"B11133232"/' *
"B11133232" "Group1" "500" "MissingData"
"B11133232" "Group1" "456" "MissingData"

$ grep \"B11133232\" *
file2.txt:"B11133232" "Group1" "500" "MissingData"
file2.txt:"B11133232" "Group1" "456" "MissingData"

With the grep command, you must put a backslash (\) before each quotation mark (\") if you want to find only the lines that contain the string together with its quotation marks. Otherwise, the shell removes the unescaped quotation marks before grep sees the pattern, and grep searches for B11133232 and not "B11133232".
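
Wrapping the whole pattern in single quotes is another way to keep the quotation marks as part of the search value. The sketch below assumes a hypothetical two-line file sample.txt in which the same identifier appears once with and once without quotation marks, and compares the three ways of writing the pattern.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '"B11133232" "Group1" "500"' \
  'B11133232,Group1,24' > "$tmpdir/sample.txt"

# Backslash-escaped quotation marks are passed to grep as part of the pattern.
escaped=$(grep \"B11133232\" "$tmpdir/sample.txt")

# Single quotes around the pattern have the same effect.
single=$(grep '"B11133232"' "$tmpdir/sample.txt")

# Unescaped double quotes are removed by the shell: grep searches for B11133232.
unescaped=$(grep "B11133232" "$tmpdir/sample.txt")

printf '%s\n' "$escaped" '---' "$unescaped"
rm -r "$tmpdir"
```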

Print the line(s) of file3.csv that contain the values B11108399 or B11108326.

This search has the following rules: we are looking for words that start with the seven characters B111083, whose 8th character is a 9 or a 2, and whose last character is a 9 or a 6. So, in the grep command, we replace the 8th character with [92] to indicate that it can take either of those two values, and the last character with [96] to indicate that it can be 9 or 6. The pattern is quoted so that the shell does not try to expand the square brackets as a file name pattern.

$ grep "B111083[92][96]" file3.csv
B11108399,Group1,23
B11108326,Group1,59
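
A bracket expression accepts any combination of the listed characters, so [92][96] also matches the endings 96 and 29. The sketch below assumes a hypothetical file sample.csv that contains one such extra value (B11108396) and compares the bracket expression with an extended regular expression that lists only the two wanted endings.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'B11108399,Group1,23' \
  'B11108326,Group1,59' \
  'B11108396,Group1,40' \
  'B11108310,Group1,31' > "$tmpdir/sample.csv"

# Bracket expressions: matches the endings 99, 96, 29 and 26.
bracket_hits=$(grep 'B111083[92][96]' "$tmpdir/sample.csv")

# Alternation with -E: matches only the endings 99 and 26.
exact_hits=$(grep -E 'B111083(99|26)' "$tmpdir/sample.csv")

printf '%s\n' "$bracket_hits" '---' "$exact_hits"
rm -r "$tmpdir"
```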

Print the line(s) of file3.csv that contain the values Group1 or Group2.

$ grep "Group[12]" file3.csv
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137123,Group2 b,69
C11133100,Group1,23
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28

Print the first column of file2.txt and file3.csv for those lines that contain the values Group1 or Group2. Remember that you have to use the flag -F',' with the command awk when the columns of the file are separated by commas and not spaces.

Space-separated file:
$ grep "Group[12]" file2.txt | awk '{print $1}'
"B11130912"
"B11137244"
"B11154534"
"B11144100"
"B11137244"
"B12226566"
"B11134987"
"B11144345"
"B11110676"
"C11138929"
"B11154532"
"B11137120"
"B33191224"
"C11138999"
"B11131605"
"B11137784"
"B11156098"
"B11133232"
"C11138912"
"B11150911"
"B11152577"
"B11156098"
"B11133232"

Comma-separated file:
$ grep "Group[12]" file3.csv | awk -F',' '{print $1}'
C11138192
B12226507
B12226546
C11138122
C11138184
C11138797
C11138152
C11138150
C11131039
C11135566
B11119903
C11137544
C11137123
C11133100
B11108399
B11108326
B11119909
B11110893
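
Because awk can filter lines by itself, the grep step is optional. The following is a minimal sketch, assuming a hypothetical four-line file sample.csv; it shows the two-step pipeline, a single awk command, and a variant that tests only the second column instead of the whole line.

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Anonymized ID,Subject Group,AGE' \
  'C11138192,Group1,54' \
  'C11137439,Group3,79' \
  'C11131039,Group2 b,67' > "$tmpdir/sample.csv"

# Two steps: grep selects the lines, awk prints the first column.
two_step=$(grep "Group[12]" "$tmpdir/sample.csv" | awk -F',' '{print $1}')

# One step: the awk pattern selects the lines, the action prints the column.
one_step=$(awk -F',' '/Group[12]/ {print $1}' "$tmpdir/sample.csv")

# Restrict the match to the second column only.
by_column=$(awk -F',' '$2 ~ /Group[12]/ {print $1}' "$tmpdir/sample.csv")

printf '%s\n' "$one_step"
rm -r "$tmpdir"
```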

Print the first and second columns of file2.txt and file3.csv for those lines that contain the values Group1 or Group2.

Space-separated file:
$ grep "Group[12]" file2.txt | awk '{print $1,$2}'
"B11130912" "Group2b"
"B11137244" "Group1"
"B11154534" "Group1"
"B11144100" "Group1"
"B11137244" "Group1"
"B12226566" "Group2b"
"B11134987" "Group1"
"B11144345" "Group1"
"B11110676" "Group1"
"C11138929" "Group2b"
"B11154532" "Group1"
"B11137120" "Group2b"
"B33191224" "Group2b"
"C11138999" "Group2b"
"B11131605" "Group1"
"B11137784" "Group1"
"B11156098" "Group1"
"B11133232" "Group1"
"C11138912" "Group2b"
"B11150911" "Group2b"
"B11152577" "Group1"
"B11156098" "Group1"
"B11133232" "Group1"

Comma-separated file:
$ grep "Group[12]" file3.csv | awk -F',' '{print $1,$2}'
C11138192 Group1
B12226507 Group1
B12226546 Group1
C11138122 Group1
C11138184 Group1
C11138797 Group1
C11138152 Group1
C11138150 Group1
C11131039 Group2 b
C11135566 Group2 b
B11119903 Group2 b
C11137544 Group1
C11137123 Group2 b
C11133100 Group1
B11108399 Group1
B11108326 Group1
B11119909 Group2 b
B11110893 Group1
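
In the comma-separated output above, awk joins the printed columns with a space, which makes values such as Group2 b look like two columns. A minimal sketch, assuming a hypothetical file sample.csv, that sets the output field separator OFS to a comma so the result stays comma-separated:

```shell
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'C11138192,Group1,54' \
  'C11137439,Group3,79' \
  'C11131039,Group2 b,67' > "$tmpdir/sample.csv"

# -F',' sets the input separator; -v OFS=',' sets the separator used by print.
csv_out=$(awk -F',' -v OFS=',' '/Group[12]/ {print $1, $2}' "$tmpdir/sample.csv")

printf '%s\n' "$csv_out"
rm -r "$tmpdir"
```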