9.6. The grep command

Searches for a given pattern or text in a file or list of files. grep can find simple patterns and basic regular expressions, while egrep supports extended regular expressions. fgrep is faster than both tools but can only handle fixed strings. zgrep, zegrep, and zfgrep act like grep, egrep, and fgrep, respectively, but accept compressed files as input.
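
The difference between the three matching modes can be seen on a small file. This is a minimal sketch: the sample file and its contents are made up for illustration, and it uses grep -E and grep -F, the option forms that correspond to egrep and fgrep.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'Group1' 'Group2 b' 'Group3' 'a.c' 'abc' > "$sample"

# Basic regular expression: the dot matches any single character,
# so both "a.c" and "abc" are counted.
basic=$(grep -c 'a.c' "$sample")

# Extended regular expression (egrep behaviour): alternation with |.
extended=$(grep -E -c 'Group(1|3)' "$sample")

# Fixed string (fgrep behaviour): the dot is taken literally,
# so only "a.c" is counted.
fixed=$(grep -F -c 'a.c' "$sample")

echo "basic=$basic extended=$extended fixed=$fixed"
rm -f "$sample"
```

With this sample the counts come out as basic=2, extended=2 and fixed=1.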

Syntax: grep [flag] [pattern] [file(s)]

grep format meaning
grep string files Searches the list of files for lines that contain string. See examples here.
grep '^expression' files

Search for any lines that start with expression in the list of files. If expression contains a backslash (\), the special meaning of the next special character is turned off. If expression contains a dot (.) that is not preceded by a backslash, it will match a single character of any value in the position of the dot.

e.g. grep '^string' file.txt will search for any lines in file.txt that start with string. See more examples here.

grep 'expression$' files

Search for any lines that end with expression in the list of files. If expression contains a backslash (\), the special meaning of the next special character is turned off. If expression contains a dot (.) that is not preceded by a backslash, it will match a single character of any value in the position of the dot.

Examples:
grep 'string$' file matches any lines in file that end with string. See more examples here.

grep '^string$' files matches any lines in file that start and end with string (lines containing only string).
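
The anchors and the dot/backslash rule described above can be tried out as follows. This is only a sketch: the sample lines are invented for the illustration and are not taken from the example files.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'string' 'string theory' 'a long string' \
    'no match here' 'v1.0' 'v1x0' > "$sample"

# ^ anchors the match at the start of the line.
starts=$(grep -c '^string' "$sample")

# $ anchors the match at the end of the line.
ends=$(grep -c 'string$' "$sample")

# Both anchors together: the line must consist of "string" only.
whole=$(grep -c '^string$' "$sample")

# An unescaped dot matches any character ("v1.0" and "v1x0").
any_char=$(grep -c '^v1.0$' "$sample")

# A backslash turns the dot into a literal dot ("v1.0" only).
literal_dot=$(grep -c '^v1\.0$' "$sample")

echo "$starts $ends $whole $any_char $literal_dot"
rm -f "$sample"
```

For this sample the five counts are 2, 2, 1, 2 and 1.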

grep '[characters]' files

Search for any lines that contain any of the characters enclosed between the brackets. Use a hyphen for a range of values.

Examples:
grep '[abcde]' files matches any lines in file that contain a, b, c, d or e.

grep '[Ss]tring' file matches any lines in file that contain the words string or String.

grep 'B[ai][dt]' file matches any lines in file that contain the words Bad, Bat, Bid or Bit (the second character can be a or i and the third character d or t).

grep '[0-9][0-9]' file matches any lines in file that contain a pair of numeric digits.

grep '[a-zA-Z]' file matches any lines in file with at least one letter.

grep '^$' file matches any empty lines.
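
The bracket examples above can be checked against a few test lines. The following is a sketch with an invented sample file; the words in it were chosen only to show which lines each pattern accepts or rejects.

```shell
# Hypothetical sample file, created only for this illustration.
# The eighth entry is an empty line.
sample=$(mktemp)
printf '%s\n' 'Bad' 'Bit' 'Bot' 'String' 'string' \
    'room 42' '7' '' '---' > "$sample"

# Second character a or i, third character d or t: "Bot" is rejected.
words=$(grep -c 'B[ai][dt]' "$sample")

# Upper- or lower-case first letter.
either_case=$(grep -c '[Ss]tring' "$sample")

# Two consecutive digits: "room 42" matches, the single "7" does not.
two_digits=$(grep -c '[0-9][0-9]' "$sample")

# At least one ASCII letter anywhere on the line.
letters=$(grep -c '[a-zA-Z]' "$sample")

# Empty lines only.
empty=$(grep -c '^$' "$sample")

echo "$words $either_case $two_digits $letters $empty"
rm -f "$sample"
```

For this sample the counts are 2, 2, 1, 6 and 1.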

grep '[^characters]' files

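Search for any lines that contain at least one character that is not among the characters enclosed between the brackets. Use a hyphen for a range of values. A line that also contains some of the listed characters still matches, as long as one character falls outside the set.

e.g. grep '[^a-zA-Z0-9]' file matches any lines in file that contain at least one character that is neither a letter nor a digit (a space, a quote, or a punctuation mark, for instance). To select lines that contain only special characters, anchor the expression: grep '^[^a-zA-Z0-9]*$' file (this also matches empty lines).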
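
Because grep reports a line as soon as one character falls outside the bracketed set, a negated bracket expression is easy to misread. The sketch below, using an invented sample file, contrasts three related questions: lines with at least one non-alphanumeric character, lines made up only of such characters, and lines with none at all.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'abc123' 'abc-123' '!!!' 'B11108326' > "$sample"

# At least one character that is not a letter or digit:
# "abc-123" and "!!!".
has_other=$(grep -c '[^a-zA-Z0-9]' "$sample")

# Only non-alphanumeric characters from start to end: "!!!".
only_other=$(grep -c '^[^a-zA-Z0-9][^a-zA-Z0-9]*$' "$sample")

# No non-alphanumeric character at all (-v inverts the selection):
# "abc123" and "B11108326".
only_alnum=$(grep -v -c '[^a-zA-Z0-9]' "$sample")

echo "$has_other $only_other $only_alnum"
rm -f "$sample"
```

For this sample the counts are 2, 1 and 2.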

grep 'character*' files

The asterisk matches zero or more occurrences of the character that precedes it, so that character may be absent or repeated any number of times.

e.g. grep '"*smug"*' file matches any lines in file that contain smug or "smug" (with or without the quotes that precede the asterisks).
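
A short experiment shows how the asterisk behaves. This is a sketch on an invented sample file; the Group2 lines only mimic the two spellings ("Group2 b" and "Group2b") that occur in the example files.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'he is smug' 'he is "smug"' 'he is shy' \
    'Group2b' 'Group2 b' 'Group2' > "$sample"

# Quotes before and after "smug" may be absent or repeated.
quoted=$(grep -c '"*smug"*' "$sample")

# -o prints only the matched text, which shows what the asterisks consumed.
matched=$(grep -o '"*smug"*' "$sample")

# Zero or more spaces between "Group2" and "b": plain "Group2" is rejected
# because the trailing b is still required.
spaced=$(grep -c 'Group2 *b' "$sample")

echo "$quoted $spaced"
echo "$matched"
rm -f "$sample"
```

For this sample both counts are 2, and the -o output consists of the two lines smug and "smug".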

grep 'expression\{n\}' files

Match exactly n occurrences of expression.

e.g. grep '[0-9]\{3\}-[0-9]\{4\}' file matches any lines in file that contain three digits, followed by a hyphen and four digits.

grep 'expression\{n,\}' files

Matches n or more occurrences of expression.

e.g. grep '[0-9]\{3,\}' file matches any lines in file that contain three or more consecutive digits.
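
The two repetition forms can be compared on a handful of lines. The sketch below uses an invented sample file; the last command shows the same pattern in extended syntax with grep -E, where the braces are written without backslashes.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' '555-1234' 'call 555-1234 now' '55-1234' \
    'id 12' 'id 123' 'id 123456' > "$sample"

# Exactly three digits, a hyphen, then four digits.
# "55-1234" is rejected: only two digits precede the hyphen.
phone=$(grep -c '[0-9]\{3\}-[0-9]\{4\}' "$sample")

# A run of three or more digits anywhere on the line.
# Only "id 12" is rejected.
three_plus=$(grep -c '[0-9]\{3,\}' "$sample")

# Same as the first pattern, in extended regular expression syntax.
phone_ere=$(grep -E -c '[0-9]{3}-[0-9]{4}' "$sample")

echo "$phone $three_plus $phone_ere"
rm -f "$sample"
```

For this sample the counts are 2, 5 and 2. Note that the pattern is not anchored, so a longer run of digits on either side of the hyphen would also match.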

Flag Meaning
-A num Print num lines of trailing context after each match. See here for an example.
-B num Print num lines of leading context before each match. See here for an example.
-C num Print num lines of leading and trailing context surrounding each match. If num is not specified, num=2. See here for an example.
-c Print the number of matched lines per file instead of the actual lines.
--colour=when Mark up the matching text with the expression stored in the GREP_COLOR environment variable. The possible values of when are: never, always or auto.
-d action Specify the demanded action for directories. The possible values of action are: read (default), which means that the directories are read in the same manner as normal files; skip, to silently ignore the directories; and recurse, to read them recursively, which has the same effect as the -R and -r options.
-e pattern To search for more than one pattern/expression, add the flag -e in front of each expression.
--exclude If specified, it excludes files matching the given filename pattern from the search. Note that --exclude patterns take priority over --include patterns. Patterns are matched to the full path specified, not only to the filename component.
--exclude-dir filename_pattern If -R is specified, it excludes directories matching the given filename_pattern from the search.
-f file Read one or more newline separated patterns from file. Empty pattern lines match every input line. Newlines are not considered part of a pattern. If file is empty, nothing is matched.
-h Omit the filename headers with output lines.
--help Print a brief help message.
--include If specified, only files matching the given filename pattern are searched. Note that --exclude patterns take priority over --include patterns. Patterns are matched to the full path specified, not only to the filename component.
--include-dir filename_pattern If -R is specified, only directories matching the given filename_pattern are searched. Note that --exclude-dir patterns take priority over --include-dir patterns.
-L Only the names of files not containing selected lines are listed.
-l Only the names of files containing selected lines are listed.
-m num Stop reading the file after num matches.
-n Each output line is preceded by its relative line number in the file, starting at line 1. The line number counter is reset for each file processed. This option is ignored if -c, -L, -l, or -q is specified.
--null Prints a zero-byte after the file name.
-O If -R is specified, follow symbolic links only if they were explicitly listed on the command line. The default is not to follow symbolic links.
-o Prints only the matching part of the lines.
-q Suppress normal output.
-R or -r Recursively search subdirectories listed.
-S If -R is specified, all symbolic links are followed. The default is not to follow symbolic links.
-s Suppress error messages from nonexistent or unreadable files.
-V Display version information and exit.
-v Selected lines are those not matching any of the specified patterns.
-w The expression is searched for as a whole word.
-x Show only the cases where the whole line equals the expression.
-Z or -z Accepts compressed input files. This applies to some BSD versions of grep; in GNU grep, -Z and -z have different meanings.
--line-buffered Force output to be line buffered. By default, output is line buffered when standard output is a terminal and block buffered otherwise.
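
A few of the flags above can be combined into a quick tour. This is a sketch, not an exhaustive demonstration: the temporary file holds a five-line excerpt of file4.txt (listed later in this section), and only flags available in both GNU and BSD grep are used.

```shell
# Temporary file holding a short excerpt of file4.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
AnonymizedID SubjectGroup AGE
B11108326 Group1 59
B11119909 Group2 61
D11144030 Group3 11
C11138122 MISSING 25
EOF

# -c: count matching lines (the header contains "SubjectGroup").
count=$(grep -c 'Group' "$sample")

# -v: invert the selection, here the lines without "Group".
inverted=$(grep -v -c 'Group' "$sample")

# -n: prefix each matching line with its line number.
numbered=$(grep -n 'Group3' "$sample")

# -w: match whole words only; without it, "11" is found inside every ID.
loose=$(grep -c '11' "$sample")
whole_word=$(grep -w -c '11' "$sample")

# -o: print only the matching part of each line.
only=$(grep -o 'Group[0-9]' "$sample")

# -e: several patterns, a line matches if any of them does.
multi=$(grep -c -e 'Group1' -e 'MISSING' "$sample")

# -A 1: print one line of trailing context after each match.
context=$(grep -A 1 'Group2' "$sample")

# -m 1: stop after the first match.
first=$(grep -m 1 'Group[0-9]' "$sample")

echo "$count $inverted $loose $whole_word $multi"
echo "$numbered"
rm -f "$sample"
```

For this excerpt the first echo prints 4 1 4 1 2, and the numbered match is 4:D11144030 Group3 11.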

The following examples show how to read and manipulate files using different command line tools. Each example reads one or more of the following files. file1.csv and file3.csv use a comma as the separator between columns, whereas file2.txt and file4.txt use a space.

Content of file1.csv:
"Anonymized ID","Subject Group","HASCONDITION","CONDITION"
"B33199522","Group1","0",""
"B33199603","Group3","0",""
"B11137879","Group1","0",""
"B11144410","Group2 b","0",""
"B11110455","Group2 b","0",""
"B11135291","Group3","0",""
"B11153927","Group1","0",""
"B11177579","Group2 b","0",""
"B11177806","Group1","MD",""
"B11157958","Group3","0",""
"B11110690","Group3","0",""
"B11152799","Group1","0",""
"B11154358","Group1","0",""
"B11110925","Group1","0",""
"B11135291","Group3","9","mTBI"
"B11135072","MISSING","0",""
"B33199603","Group3","0",""
"B11137879","Group1","0",""
"B11110603","Group1","0",""
"B11110927","Group1","0",""
"B11147712","Group1","0",""
"B33191224","Group2 b","0",""
"B11131290","Group2 b","0",""
"B11157974","Group1","0",""
"B33191224","Group2 b","0",""
"B11141503","Group3","0",""
"C11137159","Group3","9","mTBI"
"B33199522","Group1","0",""

Content of file2.txt:
"AnonymizedID" "SubjectGroup" "TEST1" "TEST2"
"B11130912" "Group2b" "900" "MissingData"
"B11137244" "Group1" "450" "555"
"B11154534" "Group1" "456" "456"
"B11144100" "Group1" "450" "886"
"B11137244" "Group1" "450" "456"
"B12226566" "Group2b" "450" "MissingData"
"B11134987" "Group1" "900" "MissingData"
"B11144345" "Group1" "900" "776"
"C11137159" "Group3" "MissingData" "MissingData"
"B11156453" "Group4" "456" "2"
"B11110676" "Group1" "900" "10"
"C11138929" "Group2b" "2" "MissingData"
"B11154532" "Group1" "456" "886"
"B11155267" "Group3" "900" "10"
"B11137120" "Group2b" "450" "456"
"B33191224" "Group2b" "450" "776"
"B11155267" "Group3" "900" "10"
"C11138999" "Group2b" "900" "MissingData"
"B11131605" "Group1" "456" "MissingData"
"B11137784" "Group1" "900" "436"
"B11156098" "Group1" "500" "886"
"B11133232" "Group1" "500" "MissingData"
"B11135292" "Group3" "MissingData" "MissingData"
"C11138912" "Group2b" "900" "MissingData"
"B11150911" "Group2b" "900" "117"
"B11152577" "Group1" "900" "756"
"B11156098" "Group1" "456" "886"
"B11133232" "Group1" "456" "MissingData"

Content of file3.csv:
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28

Content of file4.txt:
AnonymizedID SubjectGroup AGE
B11108326 Group1 59
B11108399 Group1 23
B11110893 Group1 28
B11119909 Group2 61
D11144030 Group3 11
D11144030 Group3 13
B11119903 Group2 84
C11131039 Group2 67
C11133100 Group1 23
C11135566 Group2 72
C11137159 Group3 11
C11137159 Group3 12
C11137167 Group3 14
C11137167 Group3 16
C11137439 Group3 15
C11137439 Group3 79
C11137443 Group3 15
C11137544 Group1 22
C11137123 Group2 68
C11138150 Group1 44
C11138152 Group1 10
C11138797 Group1 24
C11138184 Group1 57
C11138122 Group1 23
C11138122 MISSING 25
C11138192 Group1 45
B12226507 Group1 26
B12226546 Group1 55