As examples, I showed some simple cases of using sed to extract a single line and a block of lines in a file.
An anonymous reader asked how one would extract every nth line from a large file.
Suppose somefile contains the following lines:
$ cat > somefile
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
$
Below, I show two ways to extract every 4th line of somefile, that is, lines 4 and 8.
- sed
$ sed -n '0~4p' somefile
line 4
line 8
$
0~4 means select every 4th line, beginning at line 0.
Line 0 has nothing, so the first printed line is line 4.
-n means only explicitly printed lines are included in the output.
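The first~step address form is a GNU sed extension, so other sed implementations may reject it. As a rough sketch of a portable fallback, the same selection can be written with awk, where NR is the current line number. The helper name every_nth is made up for illustration and is not part of any tool.

```shell
# every_nth N FILE: print lines N, 2N, 3N, ... of FILE.
# (every_nth is a hypothetical helper name used only for this sketch.)
every_nth() {
    # A pattern with no action prints the line whenever the pattern is true.
    awk -v n="$1" 'NR % n == 0' "$2"
}

# Rebuild the 10-line sample file from above, then take every 4th line.
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > somefile
every_nth 4 somefile    # prints "line 4" and "line 8"
```

Passing the step with -v keeps the awk program fixed while the step changes.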
- perl
$ perl -ne 'print ((0 == $. % 4) ? $_ : "")' somefile
line 4
line 8
$
$. is the current input line number.
% is the remainder operator.
$_ is the current line.
The above perl statement prints out a line if its line number
can be evenly divided by 4 (remainder = 0).
Alternatively,
$ perl -ne 'print unless (0 != $. % 4)' somefile
line 4
line 8
$
Thank you. I don't use SED enough and this was a good reminder.
Note that your last perl example (already much more readable than the 1st) can be further simplified to
perl -ne 'print unless ($. % 4)' somefile
Since in perl, 0 is false in a boolean context, the "0 != " test is redundant.
I am sure the author knows that. Adding "0 != " adds clarity to the code and makes it readable, and it doesn't cost any extra machine cycles FYI!
I am currently using exactly what you suggest in your sed example. My problem is that my file is quite large, almost 5 million lines. I also need certain blocks of lines, e.g. every other set of, say, 10 lines. So I wrote a bash script for it, but it is taking a very long time. I wonder whether that is because, although -n suppresses the output of most of the lines, sed still traverses them all. I don't know if this is true.
In any case, would you be able to suggest a more efficient way of doing what I am trying to do?
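On the question above: any tool has to read every line of the file, so the traversal itself cannot be avoided. A single awk pass is usually much faster than a bash loop that handles lines one at a time. Here is a minimal sketch, assuming the goal is to keep lines 1 to 10, skip lines 11 to 20, keep lines 21 to 30, and so on. The helper name alternate_blocks and the file numbers.txt are hypothetical.

```shell
# alternate_blocks SIZE FILE: keep the first SIZE lines, skip the next SIZE,
# keep the next SIZE, and so on, in one pass over FILE.
# (alternate_blocks and numbers.txt are hypothetical names for this sketch.)
alternate_blocks() {
    # int((NR - 1) / size) is the zero-based block number of the current line;
    # even-numbered blocks are printed, odd-numbered blocks are skipped.
    awk -v size="$1" 'int((NR - 1) / size) % 2 == 0' "$2"
}

seq 1 50 > numbers.txt
alternate_blocks 10 numbers.txt   # prints 1-10, 21-30 and 41-50
```

To keep the other half of the blocks instead, the test can be changed to compare against 1 rather than 0.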
sed -n '3~3p'
The above command is not working. It's saying Unrecognized command:3~3P
Can you please help me with this?
Gopi, it is working for me.
[root@dachis-centos ~]# sed -n '3~3p' sample.sh
for i in {1..3}
done