9.15. Find and replace text

Replace all occurrences of C11137159 in file3.csv with XXXXXXXXX and save the modified content in file3_mod.csv.

Command to execute the substitution:
$ sed 's/C11137159/XXXXXXXXX/' file3.csv > file3_mod.csv

Content of file3.csv before the substitution: Content of the file after the substitution:
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3_mod.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
XXXXXXXXX,Group3,13
C11137167,Group3,16
XXXXXXXXX,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28

In the following example, instead of replacing a fix string as we did in example 1, we will replace a group of characters (i.e. all upper-case characters in the file) by a single character or another group of characters (i.e. replace with lower-case characters). The groups of characters that can be used are listed in the following table:

Expression Group of characters
[:alnum:] Letters and digits
[:alpha:] Letters
[:blank:] Horizontal white space
[:cntrl:] Control characters
[:digit:] Digits
[:graph:] Printable characters, excluding space
[:lower:] Lower-case letters
[:print:] Printable characters, including space
[:punct:] Punctuation characters
[:space:] Horizontal or vertical white space
[:upper:] Upper-case letters
[:xdigit:] Hexadecimal digits

Replace all upper-case letters in file3.csv by lower-case.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:upper:]' '[:lower:]'
anonymized id,subject group,age
c11138122,missing,21
c11138192,group1,54
b12226507,group1,68
b12226546,group1,67
c11138122,group1,24
c11138184,group1,59
c11138797,group1,22
c11138152,group1,53
c11138150,group1,41
c11137167,group3,14
c11137159,group3,13
c11137167,group3,16
c11137159,group3,13
c11131039,group2 b,67
c11135566,group2 b,73
b11119903,group2 b,83
c11137544,group1,21
c11137443,group3,11
c11137123,group2 b,69
c11137439,group3,79
c11137439,group3,15
c11133100,group1,23
d11144030,group3,13
b11108399,group1,23
b11108326,group1,59
b11119909,group2 b,61
b11110893,group1,28

Replace all lower-case letters in file3.csv by upper-case.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:lower:]' '[:upper:]'
ANONYMIZED ID,SUBJECT GROUP,AGE
C11138122,MISSING,21
C11138192,GROUP1,54
B12226507,GROUP1,68
B12226546,GROUP1,67
C11138122,GROUP1,24
C11138184,GROUP1,59
C11138797,GROUP1,22
C11138152,GROUP1,53
C11138150,GROUP1,41
C11137167,GROUP3,14
C11137159,GROUP3,13
C11137167,GROUP3,16
C11137159,GROUP3,13
C11131039,GROUP2 B,67
C11135566,GROUP2 B,73
B11119903,GROUP2 B,83
C11137544,GROUP1,21
C11137443,GROUP3,11
C11137123,GROUP2 B,69
C11137439,GROUP3,79
C11137439,GROUP3,15
C11133100,GROUP1,23
D11144030,GROUP3,13
B11108399,GROUP1,23
B11108326,GROUP1,59
B11119909,GROUP2 B,61
B11110893,GROUP1,28

Replace all alphabetical characters in file3.csv by the number 0

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:alpha:]' 0
0000000000 00,0000000 00000,000
011138122,0000000,21
011138192,000001,54
012226507,000001,68
012226546,000001,67
011138122,000001,24
011138184,000001,59
011138797,000001,22
011138152,000001,53
011138150,000001,41
011137167,000003,14
011137159,000003,13
011137167,000003,16
011137159,000003,13
011131039,000002 0,67
011135566,000002 0,73
011119903,000002 0,83
011137544,000001,21
011137443,000003,11
011137123,000002 0,69
011137439,000003,79
011137439,000003,15
011133100,000001,23
011144030,000003,13
011108399,000001,23
011108326,000001,59
011119909,000002 0,61
011110893,000001,28

Replace all digits in file3.csv by the letter X.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:digit:]' X
Anonymized ID,Subject Group,AGE
CXXXXXXXX,MISSING,XX
CXXXXXXXX,GroupX,XX
BXXXXXXXX,GroupX,XX
BXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX b,XX
CXXXXXXXX,GroupX b,XX
BXXXXXXXX,GroupX b,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX b,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
CXXXXXXXX,GroupX,XX
DXXXXXXXX,GroupX,XX
BXXXXXXXX,GroupX,XX
BXXXXXXXX,GroupX,XX
BXXXXXXXX,GroupX b,XX
BXXXXXXXX,GroupX,XX

Replace all punctuation characters in file3.csv by a space (' ')

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:punct:]' ' '
Anonymized ID Subject Group AGE
C11138122 MISSING 21
C11138192 Group1 54
B12226507 Group1 68
B12226546 Group1 67
C11138122 Group1 24
C11138184 Group1 59
C11138797 Group1 22
C11138152 Group1 53
C11138150 Group1 41
C11137167 Group3 14
C11137159 Group3 13
C11137167 Group3 16
C11137159 Group3 13
C11131039 Group2 b 67
C11135566 Group2 b 73
B11119903 Group2 b 83
C11137544 Group1 21
C11137443 Group3 11
C11137123 Group2 b 69
C11137439 Group3 79
C11137439 Group3 15
C11133100 Group1 23
D11144030 Group3 13
B11108399 Group1 23
B11108326 Group1 59
B11119909 Group2 b 61
B11110893 Group1 28

Replace all white spaces in file3.csv by an underscore '_'.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr '[:blank:]' '_'
Anonymized_ID,Subject_Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2_b,67
C11135566,Group2_b,73
B11119903,Group2_b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2_b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2_b,61
B11110893,Group1,28

You can also replace a range of letters or numbers:

Replace any A, B or C (letters in the range A-C) in file3.csv by the letter D.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr 'A-C' 'D'
Dnonymized ID,Subject Group,DGE
D11138122,MISSING,21
D11138192,Group1,54
D12226507,Group1,68
D12226546,Group1,67
D11138122,Group1,24
D11138184,Group1,59
D11138797,Group1,22
D11138152,Group1,53
D11138150,Group1,41
D11137167,Group3,14
D11137159,Group3,13
D11137167,Group3,16
D11137159,Group3,13
D11131039,Group2 b,67
D11135566,Group2 b,73
D11119903,Group2 b,83
D11137544,Group1,21
D11137443,Group3,11
D11137123,Group2 b,69
D11137439,Group3,79
D11137439,Group3,15
D11133100,Group1,23
D11144030,Group3,13
D11108399,Group1,23
D11108326,Group1,59
D11119909,Group2 b,61
D11110893,Group1,28

Replace A by W, B by X, C by Y, and D by Z in file3.csv (replace letters in the range A-D with letters in the range W-Z)

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr 'A-D' 'W-Z'
Wnonymized IZ,Subject Group,WGE
Y11138122,MISSING,21
Y11138192,Group1,54
X12226507,Group1,68
X12226546,Group1,67
Y11138122,Group1,24
Y11138184,Group1,59
Y11138797,Group1,22
Y11138152,Group1,53
Y11138150,Group1,41
Y11137167,Group3,14
Y11137159,Group3,13
Y11137167,Group3,16
Y11137159,Group3,13
Y11131039,Group2 b,67
Y11135566,Group2 b,73
X11119903,Group2 b,83
Y11137544,Group1,21
Y11137443,Group3,11
Y11137123,Group2 b,69
Y11137439,Group3,79
Y11137439,Group3,15
Y11133100,Group1,23
Z11144030,Group3,13
X11108399,Group1,23
X11108326,Group1,59
X11119909,Group2 b,61
X11110893,Group1,28

The command tr doesn't allow you to replace by an empty space in order to delete a character or a set of characters, but you can use the -d flag for deletion.

Remove all spaces in file3.csv.

Original content of the file Content after replacements
$ cat file3.csv
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr -d '[:blank:]'
AnonymizedID,SubjectGroup,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2b,67
C11135566,Group2b,73
B11119903,Group2b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2b,61
B11110893,Group1,28

In order to delete any repeated (continuous) character or sequence use the -s flag.

Remove any repeated characters ([:alnum:]) in file3.csv.

Original content of the file Content after replacements
$ cat file3.csv
Anonymized ID,Subject Group,AGE
C11138122,MISSING,21
C11138192,Group1,54
B12226507,Group1,68
B12226546,Group1,67
C11138122,Group1,24
C11138184,Group1,59
C11138797,Group1,22
C11138152,Group1,53
C11138150,Group1,41
C11137167,Group3,14
C11137159,Group3,13
C11137167,Group3,16
C11137159,Group3,13
C11131039,Group2 b,67
C11135566,Group2 b,73
B11119903,Group2 b,83
C11137544,Group1,21
C11137443,Group3,11
C11137123,Group2 b,69
C11137439,Group3,79
C11137439,Group3,15
C11133100,Group1,23
D11144030,Group3,13
B11108399,Group1,23
B11108326,Group1,59
B11119909,Group2 b,61
B11110893,Group1,28
$ cat file3.csv | tr -s '[:alnum:]'
Anonymized ID,Subject Group,AGE
C13812,MISING,21
C138192,Group1,54
B126507,Group1,68
B126546,Group1,67
C13812,Group1,24
C138184,Group1,59
C138797,Group1,2
C138152,Group1,53
C138150,Group1,41
C137167,Group3,14
C137159,Group3,13
C137167,Group3,16
C137159,Group3,13
C131039,Group2 b,67
C1356,Group2 b,73
B1903,Group2 b,83
C13754,Group1,21
C13743,Group3,1
C137123,Group2 b,69
C137439,Group3,79
C137439,Group3,15
C1310,Group1,23
D14030,Group3,13
B10839,Group1,23
B108326,Group1,59
B1909,Group2 b,61
B10893,Group1,28