So far, we have learned that, using a for loop and the cat utility, you can read each line of a file and split it into fields using a separator. However, CSV files can be very difficult to split into fields when some fields contain a comma (the same character used as the separator), a space, or both.
Example: Obtain the last field of $line using the concepts learned before.
$ line="SUBJ20","Age 22-30","VISIT1","1","DIAGN: Major Depressive Disorder, Single Episode, In Full Remission"
$ IFS=',' read -a ARRAY <<< "$line"
$ echo "The last field of line is: ${ARRAY[4]}"
The last field of line is: DIAGN: Major Depressive Disorder
However, this is not the correct result. The last field of $line is "DIAGN: Major Depressive Disorder, Single Episode, In Full Remission", but because we are using a comma as the separator, bash splits this field into three separate array elements. To solve this problem, you can open the file on a file descriptor and use the read builtin to save each field in a separate variable.
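To see exactly how the comma split the last field, the following sketch prints every element of ARRAY. It uses a copy of the example line; note that in this assignment the shell removes the double quotes, so $line contains only commas between the fields. The loop over "${!ARRAY[@]}" (the array's indices) is an illustrative addition, not part of the original example.

```bash
#!/usr/bin/env bash
# Copy of the example line; the shell strips the double quotes in this assignment.
line="SUBJ20","Age 22-30","VISIT1","1","DIAGN: Major Depressive Disorder, Single Episode, In Full Remission"

# Split on every comma into the array ARRAY.
IFS=',' read -r -a ARRAY <<< "$line"

# Show how many elements were produced, then each element between brackets
# so that leading spaces are visible.
echo "Number of fields: ${#ARRAY[@]}"
for i in "${!ARRAY[@]}"; do
    echo "ARRAY[$i] = [${ARRAY[$i]}]"
done
```

The output reports 7 fields: the last field of the line ends up in ARRAY[4], ARRAY[5] and ARRAY[6], and the last two keep their leading space because IFS contains only a comma.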
The first step is to assign a file descriptor (which must be an integer) to the input file:
$ exec 3< "$INPUT_FILE"
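A minimal sketch of the full life cycle of a file descriptor: open the file on descriptor 3, read from it, and close it again with exec 3<&-. The two-line file created with mktemp and the variable names first and second are hypothetical, chosen only for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical input file created only for this sketch.
INPUT_FILE=$(mktemp)
printf 'first line\nsecond line\n' > "$INPUT_FILE"

# Open the file for reading on file descriptor 3.
exec 3< "$INPUT_FILE"

# Each read -u 3 consumes the next line from descriptor 3.
read -r -u 3 first
read -r -u 3 second

# Close descriptor 3 when finished, then remove the temporary file.
exec 3<&-
rm -f "$INPUT_FILE"

echo "$first"   # prints: first line
echo "$second"  # prints: second line
```

Closing the descriptor is not strictly required in a short script (it is closed when the script exits), but it keeps descriptor 3 free for later use in an interactive session.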
Then, use read to read a line of the file and save its fields in different variables:
$ read -u 3 a b
The previous command reads the next line from file descriptor 3 (the number following the -u flag), splits it at whitespace (the default value of IFS), saves the first field in variable a, and saves the rest of the line in variable b.
$ read -u 3 a b c
The previous command reads the next line from file descriptor 3, saves the first field in variable a, the second field in variable b, and the rest of the line in variable c (c is empty if the line has only two fields). To use a comma as the separator instead, set IFS for the read command:
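The behavior of the last variable can be checked with a small sketch. The two-line, space-separated file below is hypothetical: the first read shows b receiving the rest of the line, and the second shows c left empty when the line has fewer fields than variables.

```bash
#!/usr/bin/env bash
# Hypothetical two-line input file with space-separated fields.
INPUT_FILE=$(mktemp)
printf 'one two three four\nalpha beta\n' > "$INPUT_FILE"

exec 3< "$INPUT_FILE"

# Line 1: a gets the first field, b gets the rest of the line.
read -r -u 3 a b
echo "a=[$a] b=[$b]"            # a=[one] b=[two three four]

# Line 2: only two fields, so c is set to an empty string.
read -r -u 3 a b c
echo "a=[$a] b=[$b] c=[$c]"     # a=[alpha] b=[beta] c=[]

exec 3<&-
rm -f "$INPUT_FILE"
```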
$ IFS=',' read -u 3 a b c d
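Note that read does not interpret quotation marks: it splits the line at every comma, but the last variable receives the rest of the line, commas included. This is why the technique works when the field containing commas is the last one on the line. If the file contained the five-field line from the earlier example, you would use five variables (IFS=',' read -u 3 a b c d e), and e would contain the whole last field, quotation marks included: "DIAGN: Major Depressive Disorder, Single Episode, In Full Remission".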
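The following sketch writes the five-field line from the earlier example to a temporary file (with its quotation marks, thanks to the single quotes) and reads it with five variables. The quote-stripping step with ${e#\"} and ${e%\"} is an illustrative addition, not part of the original text; INPUT_FILE and the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical one-line CSV file; single quotes keep the double quotes in the data.
INPUT_FILE=$(mktemp)
printf '%s\n' '"SUBJ20","Age 22-30","VISIT1","1","DIAGN: Major Depressive Disorder, Single Episode, In Full Remission"' > "$INPUT_FILE"

exec 3< "$INPUT_FILE"
# Five variables: e receives everything after the fourth comma.
IFS=',' read -r -u 3 a b c d e
exec 3<&-
rm -f "$INPUT_FILE"

echo "$e"   # "DIAGN: Major Depressive Disorder, Single Episode, In Full Remission"

# Remove one leading and one trailing double quote with parameter expansion.
e=${e#\"}
e=${e%\"}
echo "$e"   # DIAGN: Major Depressive Disorder, Single Episode, In Full Remission
```

This only works because the comma-containing field is last; a quoted field with commas in the middle of the line would still be split across several variables.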
Each time you run the command IFS=',' read -u 3 a b c d, it reads the next line of the file.
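Instead of typing the command once per line, it can be placed in a while loop: read returns a non-zero status at the end of the file, which ends the loop. A minimal sketch, assuming a hypothetical two-line file and hypothetical variable names:

```bash
#!/usr/bin/env bash
# Hypothetical CSV file used only for this sketch.
INPUT_FILE=$(mktemp)
printf '%s\n' '"SUBJ1","Age 22-30","VISIT1","DIAGN: Bipolar, Schizophrenia"' \
              '"SUBJ2","Age 22-30","VISIT1","DIAGN: Autism"' > "$INPUT_FILE"

exec 3< "$INPUT_FILE"
count=0
# Each iteration reads the next line from descriptor 3; the loop stops at end of file.
while IFS=',' read -r -u 3 subject age visit diagnosis; do
    count=$((count + 1))
    echo "Line $count: $subject -> $diagnosis"
done
exec 3<&-
rm -f "$INPUT_FILE"
```

If the last line of a file has no trailing newline, read returns non-zero for that line even though it filled the variables, so the loop skips it; a common workaround is `while IFS=',' read -r -u 3 subject age visit diagnosis || [ -n "$subject" ]; do`.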
Read each line of a file and save the first and last fields into a new file
Given the file example.csv with the following content:
"SUBJ1","Age 22-30","VISIT1","DIAGN: Major Depressive Disorder, Single Episode"
"SUBJ2","Age 22-30","VISIT1","DIAGN: Bipolar, Schizophrenia"
"SUBJ3","Age 22-30","VISIT1","DIAGN: Major Depressive Disorder"
"SUBJ4","Age 22-30","VISIT1","DIAGN: Autism, Dyslexia, ADHD"
Read each line of the file and save the first and last fields into a new file called result.csv.
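One possible solution, sketched under the assumption that every line has exactly four fields and only the last one may contain commas. The script recreates example.csv with the content shown above (overwriting any existing file of that name), and the names OUTPUT_FILE, first, second, third and last are hypothetical.

```bash
#!/usr/bin/env bash
# Recreate example.csv with the content shown above.
cat > example.csv << 'EOF'
"SUBJ1","Age 22-30","VISIT1","DIAGN: Major Depressive Disorder, Single Episode"
"SUBJ2","Age 22-30","VISIT1","DIAGN: Bipolar, Schizophrenia"
"SUBJ3","Age 22-30","VISIT1","DIAGN: Major Depressive Disorder"
"SUBJ4","Age 22-30","VISIT1","DIAGN: Autism, Dyslexia, ADHD"
EOF

INPUT_FILE=example.csv
OUTPUT_FILE=result.csv

exec 3< "$INPUT_FILE"
# Four variables: "last" receives everything after the third comma,
# so the commas inside the diagnosis are kept.
while IFS=',' read -r -u 3 first second third last; do
    printf '%s,%s\n' "$first" "$last"
done > "$OUTPUT_FILE"
exec 3<&-

cat "$OUTPUT_FILE"
```

Because read keeps the quotation marks, result.csv is still valid CSV: each line looks like "SUBJ2","DIAGN: Bipolar, Schizophrenia".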