As examples, I showed some simple cases of using sed to extract a single line and a block of lines in a file.
An anonymous reader asked how one would extract every nth line from a large file.
Suppose somefile contains the following lines:
$ cat > somefile
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
$
Below, I show 2 ways to extract every 4th line (lines 4 and 8) from somefile.
- sed
$ sed -n '0~4p' somefile
line 4
line 8
$
0~4 means select every 4th line, beginning at line 0. The first~step address form is a GNU sed extension, so other sed implementations may reject it.
Line 0 has nothing, so the first printed line is line 4.
-n means only explicitly printed lines are included in the output.
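As a small sketch of how the first~step address generalizes, the commands below change the starting line and the step. This assumes GNU sed; the temporary file and the variable names (f, from_one, from_two) are made up for illustration and are not part of the original example.

```shell
# Build a 10-line sample file in a throwaway temp file (hypothetical path).
f=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9 10; do echo "line $i"; done > "$f"

# 1~4: start at line 1, then every 4th line after it (lines 1, 5, 9).
from_one=$(sed -n '1~4p' "$f")
echo "$from_one"

# 2~3: start at line 2, then every 3rd line after it (lines 2, 5, 8).
from_two=$(sed -n '2~3p' "$f")
echo "$from_two"

rm -f "$f"
```

Changing the first number shifts which lines are picked; changing the second number changes how far apart they are.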
- perl
$ perl -ne 'print ((0 == $. % 4) ? $_ : "")' somefile
line 4
line 8
$
$. is the current input line number.
% is the remainder operator.
$_ is the current line.
The perl statement above prints a line if its line number is evenly divisible by 4 (remainder = 0).
Alternatively,
$ perl -ne 'print unless (0 != $. % 4)' somefile
line 4
line 8
$
6 comments:
Thank you. I don't use SED enough and this was a good reminder.
Note that your last perl example (already much more readable than the 1st) can be further simplified to
perl -ne 'print unless ($. % 4)' somefile
Since in perl, 0 is false in a boolean context, the "0 != " test is redundant.
I am sure the author knows that. Adding "0 != " adds clarity to the code and makes it readable, and it doesn't cost any extra machine cycles, FYI!
I am currently using exactly what you suggest in your sed example. My problem is that my file is quite large, almost 5 million lines. I also need certain blocks of lines, e.g. every other set of, say, 10 lines. So I wrote a bash script for it, but it is taking a very long time. I wonder if that is because, although -n suppresses the output of most of the lines, sed is still traversing them all. I don't know if this is true.
In any case - would you be able to suggest a more efficient way of doing what I am trying to do?
sed -n '3~3p'
The above command is not working. It's saying Unrecognized command:3~3P
Can you please help me with this?
Gopi, it is working for me.
[root@dachis-centos ~]# sed -n '3~3p' sample.sh
for i in {1..3}
done