I posted about extracting lines from a text file ([1], [2]).
Enough about lines for now. Let's turn our attention to extracting columns and delimited fields in a text file. For instance, one task is to extract columns 5 to 7 in a file. Sometimes, the data you want reside in variable-length fields that are delimited by some character, say ",". A sample task is to extract the second field in a comma-delimited file.
As usual, there is more than one way to accomplish these tasks. The tools that we will use are cut, awk, and perl.
The text file is somefile.
$ cat > somefile
1234567890
1234567890
1234567890
1234567890
To extract fixed columns (say columns 5-7 of a file):
$ cut -c5-7 somefile
567
567
567
567
$ perl -pe '$_ = substr($_, 4, 3) . "\n"' somefile
567
567
567
567
The current line ($_) is replaced with substr($_, 4, 3), the 3-character substring starting at offset 4. Perl offsets are 0-based, so offset 4 is column 5.
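awk, the third tool on the list, can do the same fixed-column extraction. Here is a minimal sketch; it recreates the sample file with printf so that it runs on its own. Note that awk's substr is 1-based, unlike perl's, so the start position is 5 rather than 4.

```shell
# Recreate the sample file so this sketch is self-contained.
printf '1234567890\n1234567890\n1234567890\n1234567890\n' > somefile

# awk's substr is 1-based: start at column 5 and take 3 characters.
awk '{ print substr($0, 5, 3) }' somefile
```

This should print 567 once for each of the four lines, matching the cut and perl results.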
To illustrate extracting a particular field, let's use /etc/passwd, a colon-delimited file. Say we extract the 6th field (home directory of users).
$ cut -d: -f6 /etc/passwd
$ awk -F : '{print $6}' /etc/passwd
$ perl -p -e '$_ = (split(/[:\n]/))[5] . "\n"' /etc/passwd
Here, I used the split function to separate the fields delimited by a colon or a newline. The output of split is a list, and we assign the element at index 5 (the 6th field, since perl is 0-based) to the current line. The \n in the delimiter class [:\n] is necessary; otherwise, extracting the last field would leave an extra newline.
If you know of a simpler way to do this, please share it with us in the comments.
2 comments:
It's a bikeshed, but the shorter, slightly more readable perl expression is:
perl -F: -anle 'print $F[5]' /etc/passwd
and
perl -F'|' -anle 'print @F[4..6]'
Can you please tell me how I can save the extracted column in a variable?