Linux Commando: Using sed to extract lines in a text file

Saturday, March 22, 2008

Using sed to extract lines in a text file

If you write bash scripts a lot, you are bound to run into a situation where you want to extract some lines from a file. Yesterday, I needed to extract the first line of a file, say named somefile.txt.

$ cat somefile.txt
Line 1
Line 2
Line 3
Line 4

This specific task can be easily done with this:

$ head -1 somefile.txt
Line 1

For a more complicated task, such as extracting the second and third lines of a file, head alone is inadequate.

So, let's try extracting lines using sed: the stream editor.

My first attempt uses the p sed command (for print):

$ sed 1p somefile.txt
Line 1
Line 1
Line 2
Line 3
Line 4

Note that it prints the whole file, with the first line printed twice. Why? By default, sed prints every line of the input stream. The explicit 1p command just tells it to print the first line a second time.

To fix it, you need to suppress the default output (using -n), making explicit prints the only way to print to default output.

$ sed -n 1p somefile.txt
Line 1

Alternatively, you can tell sed to delete all but the first line.

$ sed '1!d' somefile.txt
Line 1

'1!d' means: if a line is not (!) the first line, delete it.

Note that the single quotes are necessary. Otherwise, bash history expansion replaces !d with the last command you executed that starts with the letter d.

To extract a range of lines, say lines 2 to 4, you can execute either of the following:

$ sed -n 2,4p somefile.txt
$ sed '2,4!d' somefile.txt

Note that the comma specifies a range (from the line before the comma to the line after). What if the lines you want to extract are not in sequence, say lines 1 to 2, and line 4?

$ sed -n -e 1,2p -e 4p somefile.txt
Line 1
Line 2
Line 4

If you know some different ways to extract lines in a file, please share with us by filling out a comment.

62 comments:

Anonymous, April 16, 2008 at 10:57 PM
So what if I had a huge file and I wanted to extract, say, every 4th line?

Peter Leung, April 17, 2008 at 9:27 PM
Thanks for the question.
Please see the blog entry
http://linuxcommando.blogspot.com/2008/04/use-sed-or-perl-to-extract-every-nth.html

Anonymous, April 22, 2008 at 4:17 PM
Another way to do it is with head and tail. Your way might be easier. But for example, if you wanted to read ONLY line 5 from a file, you could do
$ head -n 5 somefile.txt | tail -n 1

I'm not sure if it is more or less complicated ;)

Peter Leung, April 22, 2008 at 8:21 PM
Interesting use of head n tail together.

Anonymous, May 19, 2008 at 9:38 AM
...and what about if you want to extract a line ... let's say one starting with # (a comment)?

Peter Leung, May 19, 2008 at 9:52 AM
Try this:

sed -ne '/^#/p' somefile

mak, May 21, 2008 at 10:12 AM
The head/tail combination does not work if the file has fewer than n lines: tail then prints the last line of the file instead of nothing.

Anonymous, June 2, 2008 at 3:50 AM
I just want to edit the first line of a file by appending something to it, leaving the rest unchanged. How should I achieve this?

Peter Leung, June 2, 2008 at 12:58 PM
This is one way to add some text to the end of the first line of a file

sed -e '1s/$/new text/' yourfile.txt

Anonymous, June 7, 2008 at 2:57 AM
Hi,

how can I find a certain string in an output and then print the following lines?

Usage could be xrandr to find the resolution modes of a TV:

VGA disconnected (normal left inverted right x axis y axis)
LVDS connected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
1024x768 57.6*+ 85.0 75.0 70.1 60.0
832x624 74.6
800x600 85.1 72.2 75.0 60.3 56.2
640x480 85.0 72.8 75.0 59.9
720x400 85.0
640x400 85.1
640x350 85.1
TV connected (normal left inverted right x axis y axis)
1024x768 30.0
800x600 30.0
848x480 30.0
640x480 30.0

I would like to read all the lines after TV connected...

Thanks!

Peter Leung, June 7, 2008 at 11:08 AM
Try this...

xrandr | sed -n '/TV connect/,$p'

This tells sed to print only the lines in the range that starts at the first line matching "TV connect" and runs to the end of the input ($).
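
Note that a range like this includes the matching "TV connected" line itself. If only the lines after it are wanted, one option is to delete everything from line 1 through the first match instead. A minimal sketch, assuming a shortened copy of the xrandr output above saved to a temporary file (the sample content is trimmed for illustration):

```shell
# Sample input: a trimmed version of the xrandr output quoted above.
f=$(mktemp)
printf '%s\n' \
  'LVDS connected 1024x768+0+0' \
  '1024x768 57.6*+ 85.0' \
  'TV connected (normal left inverted right x axis y axis)' \
  '1024x768 30.0' \
  '800x600 30.0' > "$f"

# Delete lines 1 through the first line matching "TV connect";
# everything after the match is printed by default.
# Caveat: if line 1 itself matches, sed looks for the end of the
# range starting at line 2.
modes=$(sed '1,/TV connect/d' "$f")
echo "$modes"
rm -f "$f"
```

With live output, the same sed command can be placed after a pipe from xrandr.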

Anonymous, June 13, 2008 at 11:05 AM
Is there a way to delete line 5 if it contains some string?

Thanks.

Peter Leung, June 13, 2008 at 7:43 PM
A quick solution using awk:

awk ' NR!=5 || !/SomeRegExpr/ {print;} ' yourfile.txt

Basically, it says, print the line if it is NOT line 5 OR (if it is line 5), it does not match Some Regular Expression.

Hope that helps.

Unknown, July 8, 2008 at 8:05 AM
Hi Peter,

I'd like to extract values for several variables located in the file.txt on the same line:
"foo1=10 foo2=20 ... foo10=100"

in this way:

10 20 ... 100

or

10
20
...
100

For this purpose I was trying to use sed:
sed -n 's/.*foo.*=\([0-9]*\).*/\1/p' file.txt

but I get only one value: 100.

How can I get all the values?

Thank you.

-Val

Peter Leung, July 8, 2008 at 7:27 PM
Hi, Val

I believe the following will do

sed 's/foo[0-9]*=//g' file.txt

It simply removes all occurrences of the "foo1=", "foo2=", ...
"foo10=" strings.

The g flag means replace globally (by default, only the first occurrence on a line is replaced).

(I can't verify the correctness because I am away on vacation and do not have access to a Linux computer.. sigh)

Peter

Unknown, July 9, 2008 at 7:53 AM
Thank you, Peter!

This works!

Now I'm considering a more difficult case, where the sample line also contains other words or numbers, like this:

"foo1=10 foo2=20 some text foo3=30 1234 foo4=40 ... foo10=100"

But the same output should be:

10 20 30 40 ... 100

Thanks,
Val

Peter Leung, July 10, 2008 at 8:20 AM
Val,

Try this (it takes your original approach, and simplifies it somewhat).

sed 's/foo[0-9]*=\([0-9]* ?\)/\1/g' file.txt

Peter
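
A substitution that only strips the fooN= prefixes leaves the other words and numbers on the line. Another way to approach Val's second case is to pull out just the matches rather than delete around them. This is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), and the sample line is Val's example with the literal "..." left out.

```shell
# Sample line from the question above (the literal "..." is left out).
line='foo1=10 foo2=20 some text foo3=30 1234 foo4=40 foo10=100'

# grep -o prints each fooN=VALUE match on its own line;
# cut keeps the part after "="; paste joins the values with spaces.
values=$(printf '%s\n' "$line" | grep -o 'foo[0-9]*=[0-9]*' | cut -d= -f2 | paste -sd ' ' -)
echo "$values"
```

Dropping the final paste stage gives one value per line, the other output format Val asked for.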

Sundaram, September 4, 2008 at 2:43 AM
Hi Peter,

How to extract a random line assigned by a script variable, in sed?

Peter Leung, September 4, 2008 at 8:38 AM
sundaram

Assuming $myvar is the var

sed -n "${myvar}p" somefile.txt

Peter

Anonymous, November 9, 2008 at 10:05 AM
I've got one for you :)

I've got an html file that needs parsing. I'd like to use sed/awk/grep if possible.

I need to find a text string (xx:xx:xx) in an html file and extract the 3 lines under it and on those lines, I would like to strip off the html table tags (td).

Any ideas?
Thanks
Ryan

Anonymous, November 9, 2008 at 4:13 PM
No problem, with some time and google I got it working with a combination of things. :)

#EXTRACT THE LINES THAT WE NEED FROM THE HTML SOURCE FILE
grep -A 3 "what you are looking for" "$ARCHIVELOCATION/$CURRENTDATE.html" > "$SCRIPTTEMP/output"

#DELETE EXTRA CHARACTERS/SPACES/ETC
#(write to a temporary file first: redirecting onto the input file would truncate it before it is read)
tr -d " tabletag," < "$SCRIPTTEMP/output" > "$SCRIPTTEMP/output.tmp" && mv "$SCRIPTTEMP/output.tmp" "$SCRIPTTEMP/output"

Thanks!
Ryan

Peter Leung, November 9, 2008 at 5:48 PM
Ryan, grep -A is clever. Thanks

I came up with this awk one-liner (maybe a bit terse)

awk '/what you are looking for/ {for (i=0; i<3; i++) {getline;sub(/tabletag/,"");print}}' $ARCHIVELOCATION/$CURRENTDATE.html > $SCRIPTTEMP/output

Peter

Anonymous, December 1, 2008 at 6:25 AM
Hi, thanks for all the tips.

I have a file with indented lines similar to this:

line 1
    indent 1
    indent 2
line 2
line 3
    indent 1
    indent 2
line 8
    indent 1
line 55

I want to print lines 2 and 55 because they do not have indented comments under them. It would also be nice to assign 2 and 55 to a variable (array).

I've googled quite a lot, but no luck yet.
I would be interested to see how you guys would solve this.
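
One way this could be approached with awk is to hold each unindented line until the next line shows whether it has indented lines under it. The sketch below assumes that the "indent" lines begin with a space or a tab and that all other lines start in the first column; the sample file mirrors the example above.

```shell
f=$(mktemp)
printf '%s\n' \
  'line 1' '    indent 1' '    indent 2' \
  'line 2' \
  'line 3' '    indent 1' '    indent 2' \
  'line 8' '    indent 1' \
  'line 55' > "$f"

# An indented line means the held line has children, so drop it.
# An unindented line means the held line (if any) had none, so print it.
childless=$(awk '
  /^[ \t]/ { have = 0; next }
  { if (have) print pending; pending = $0; have = 1 }
  END { if (have) print pending }
' "$f")
echo "$childless"
rm -f "$f"
```

In bash 4 or later, the output could then be read into an array with the mapfile builtin.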

Unknown, December 4, 2008 at 7:22 PM
Peter, nice post. sed is always helpful for printing lines. Thanks.

I have some sed works done on my blog

http://unstableme.blogspot.com/search/label/Sed

// Jadu

Daya, December 5, 2008 at 7:11 AM
How can I delete some content using the sed or grep command without using line numbers?

Peter Leung, December 6, 2008 at 8:15 PM
Daya

Say you want to delete all lines containing this string daya, then run this

sed '/daya/d' somefile.txt

Anonymous, January 6, 2009 at 12:29 AM
Interesting...
I tried this on 6GB file.

[sangoh@*** transmit]$ date; perl -ne "print if($. == 551985 || $. == 552085)" multi_122708_trimmed_eur.tped >> /dev/null; date; sed -n 551985,552085p multi_122708_trimmed_eur.tped >> /dev/null; date; head -n552085 multi_122708_trimmed_eur.tped | tail -n100 >> /dev/null; date;
Tue Jan 6 00:18:54 PST 2009
Tue Jan 6 00:20:56 PST 2009
Tue Jan 6 00:22:58 PST 2009
Tue Jan 6 00:25:02 PST 2009
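
Part of the cost on a large file is that sed -n M,Np keeps reading to the end of the file after the last wanted line. Adding a q command at the end of the range lets sed stop early. A small sketch on a throwaway 8-line file (the file and the line numbers are illustrative only, and no timing is claimed here):

```shell
f=$(mktemp)
printf 'Line %s\n' 1 2 3 4 5 6 7 8 > "$f"

# Print lines 2-4, then quit at line 4 so the rest of the file is never read.
early=$(sed -n '2,4p;4q' "$f")
echo "$early"
rm -f "$f"
```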

Louis Casillas, September 22, 2009 at 6:47 PM
Thanks a lot! Wanted a quick solution and this was perfect!

Eri, October 24, 2009 at 7:24 AM
Hi!

How can I extract random lines from a file?

Thanks!

Anonymous, February 24, 2010 at 2:25 PM
Hi guys, I'm not sure if I'll get a reply from this but I wanted to give it a shot, my situation is similar to all the rest exposed before and at the same time particularly different so here it is and I hope someone could give me a hand since I simply can't figure this one out.

I have a file which contains certain blocks of text like for example the next ones
**********************************

---
---Current Info `aaaa`
---

more info
....
more info

...

more info with spaces and stuff

---
---Current Info `bbbb`
---

and again more info down here until finding again the same starting block of code

---
---Current Info `cccc`
---
***********************************

So basically what I need is a way for me to extract all the lines between the

---Current Info `aaaa`

...and

---Current Info `bbbb`

I need it to be inclusive on the top part (the aaaa line) but not necessarily on the bottom one but shouldn't matter either since I could remove X amount of lines from the bottom up easily once the real chunk of data is extracted.

I hope I was very clear and hopefully someone can give me a hand.

Thank you
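
A sed address range with a regular expression at each end is one possible fit for this: it prints from the first line matching the start pattern through the next line matching the end pattern. The sketch below uses a shortened stand-in for the file described above.

```shell
f=$(mktemp)
printf '%s\n' \
  '---' '---Current Info `aaaa`' '---' 'more info' \
  '---' '---Current Info `bbbb`' '---' 'other info' > "$f"

# Print from the `aaaa` marker through the `bbbb` marker (both inclusive).
# Single quotes keep the shell from treating the backticks as command substitution.
block=$(sed -n '/Current Info `aaaa`/,/Current Info `bbbb`/p' "$f")
echo "$block"
rm -f "$f"
```

The trailing marker lines can then be trimmed from the bottom, as the question suggests.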

Christof, July 14, 2010 at 1:34 AM
Thanks for that post, exactly what I was looking for.

And thanks in the name of all the other commenters for all your personalized problem solving for them, Peter! This has become some sort of one-man advice forum...

Christof, July 14, 2010 at 1:47 AM
As a reply to Eri: bash (and I don't know which other shells) can provide you with pseudo-random numbers from 0 - 32767 using $RANDOM. If you want a pseudo-random number from 0-9 say, you can use the mod operator % for example like this:

$ randnr=$((RANDOM%10))

"10" could be replaced by the number of lines in your text:

$ nrlines=`cat yourtext.txt | wc -l`

so

$ randlinenr=$((RANDOM%${nrlines}+1))

would pick a random line number of your text. Then use $randlinenr with sed

$ sed -n ${randlinenr}p yourtext.txt

Repeat until you got all you wanted.

Lastly, you give too little information, Eri, to take your question seriously. But I wanted to reply a) to help Peter and b) because I just recently had to use $RANDOM which had been totally unknown to me.

Flowerpower, March 10, 2011 at 10:23 PM
I was wondering if you could help me on this one. I would like to search for two strings per line then print the next X integers after each string. The integers may vary in length and there is always a space between the string and integers. It should search like this for each line.

Example:
line1-sometext Lat: 36.76565 Long: -119.09011 moretext
line2-sometext Lat: 36.47777 Long: -118.0411 moretext

Desired output:
36.76565 -119.09011
36.47777 -118.0411

I hope this doesn't stump you.
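
This can probably be handled with a sed substitution that captures the two numbers and discards the rest of the line. The sketch assumes that "Lat: " always comes before "Long: " on a line and that each is followed by exactly one space, as in the example.

```shell
f=$(mktemp)
printf '%s\n' \
  'line1-sometext Lat: 36.76565 Long: -119.09011 moretext' \
  'line2-sometext Lat: 36.47777 Long: -118.0411 moretext' > "$f"

# Capture the number after "Lat: " and the number after "Long: ";
# [-0-9.]* matches an optional sign, digits and a decimal point.
coords=$(sed -n 's/.*Lat: \([-0-9.]*\) Long: \([-0-9.]*\).*/\1 \2/p' "$f")
echo "$coords"
rm -f "$f"
```

Because of -n and the p flag, lines without both fields are skipped rather than printed unchanged.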

Anonymous, August 31, 2011 at 3:27 PM
Nice article as for me. It would be great to read something more about that topic. Thnx for giving this info. Linda

Anonymous, March 5, 2012 at 4:06 AM
Regarding extracting random lines, I like to use shuf command:
shuf file.txt | head

Andreas Papadopoulos, April 18, 2012 at 4:06 AM
Thank you man for sharing.
I had a huge file and sed start,endp saved me.

Srikanth G K, August 12, 2013 at 1:54 AM
Hello,
I wish to extract all lines from a text file, which contains a particular word.

like:
I want to print all the lines which contain "http"

How can I do this?

Peter Leung, August 12, 2013 at 6:23 PM
Srikanth

How about this?

sed -n -e '/http/ p' yourfile

Anonymous, September 17, 2013 at 1:51 PM
cat file.txt | head -$desiredline | tail -1

for line 12

cat file.txt | head -12 | tail -1

Unknown, November 8, 2013 at 11:55 AM
Hi Peter,

Thanks for the original article and for still answering questions 5.5 years later!!

You've come very close to answering my question but not quite. If I want to extract certain lines you suggest using
$ sed -n -e 1,2p -e 4p somefile.txt

Now, what if I have a very large file and need to extract a subset of the lines, the index of which is too long to type manually.

Can I somehow use an index file combined with the above?

Thanks in advance and I understand if I don't hear from you, you've already gone above and beyond.
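
One possible way to drive sed from an index file is to turn the index into a sed script and pass it with -f. The sketch assumes the index file holds one line number per line and nothing else; all file names here are temporary stand-ins.

```shell
big=$(mktemp); idx=$(mktemp); script=$(mktemp)
printf 'Line %s\n' 1 2 3 4 5 6 > "$big"
printf '%s\n' 2 5 6 > "$idx"       # wanted line numbers, one per line

# Turn each line number N into the sed command "Np", then run that script.
sed 's/$/p/' "$idx" > "$script"
picked=$(sed -n -f "$script" "$big")
echo "$picked"
rm -f "$big" "$idx" "$script"
```

The lines come out in file order, whatever the order of the numbers in the index.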

Unknown, December 11, 2013 at 2:10 PM
Hi, I want to extract lines from a file such that it extracts all the lines from line 1 to the line it encounters a string like "dump" in a new file. How can I do that ?

Thanks
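
For this, the q (quit) command may be enough: sed prints lines as usual and stops at the first line that matches. A minimal sketch with made-up sample content, writing the result to a new file:

```shell
f=$(mktemp); out=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'core dump here' 'gamma' > "$f"

# q prints the current line and then quits, so output stops at the
# first line containing "dump" (that line is included).
sed '/dump/q' "$f" > "$out"
result=$(cat "$out")
echo "$result"
rm -f "$f" "$out"
```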

Mas Ucheng 17++, October 28, 2014 at 7:57 PM
So what if I had a huge file and I wanted to extract, say, every 4th line?

Dana Cepat

Peter Leung, October 28, 2014 at 8:29 PM
Hi, Dana

See this post:
http://linuxcommando.blogspot.ca/2008/04/use-sed-or-perl-to-extract-every-nth.html

Peter

Anonymous, November 6, 2014 at 5:05 AM
I have a file of 9 lines say as below.
1
2
3
4
5
6
7
8
9

I want to join every 3 lines into a single line, as shown below
123
456
789

Can you help me with this

Nutrisi Wajib Untuk Ibu Hamil, November 17, 2014 at 1:14 AM
great article sir

Mark Ziemann, July 20, 2015 at 12:21 AM
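Dear Nutrisi Wajib Untuk Ibu Hamil,
You could try using the paste command. By default it joins the lines with tabs; -d '\0' asks for no delimiter, which gives 123, 456 and 789:
paste -d '\0' - - - < filename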

Anonymous, August 21, 2015 at 11:56 PM
Hello all,
I'm trying to run a sed command to recursively print out line 14 from all files in a directory, can someone advise how I can achieve this?

I've tried this command but its only showing the first result:
sed -n 14p *.conf

Christof, August 27, 2015 at 1:17 AM
There probably is a more elegant sed-only way to do this but I would solve it with a bash script like this:

for files in *.conf; do
sed -n 14p "$files"
done

Or as a one-liner:
for files in *.conf; do sed -n 14p "$files"; done
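
The reason sed -n 14p *.conf shows only one result is that sed treats all the files as a single continuous stream, so only one line in the whole stream is number 14. GNU sed has a -s (--separate) option that numbers each file on its own; a more portable alternative is awk, whose FNR counter restarts for every file. A sketch with two throwaway files, using line 2 in place of line 14 to keep the samples short:

```shell
dir=$(mktemp -d)
printf 'a%s\n' 1 2 3 > "$dir/one.conf"
printf 'b%s\n' 1 2 3 > "$dir/two.conf"

# awk resets FNR for every input file, so this prints line 2 of each file.
# (Line 2 stands in for line 14 to keep the sample files short.)
second_lines=$(awk 'FNR == 2' "$dir"/*.conf)
echo "$second_lines"
rm -rf "$dir"
```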

kevin, August 28, 2015 at 2:11 PM
what if I want to search for a text string, and then display the lines containing a second string that it falls between.

For example, if I have the file foo.txt:

a
b
c
----
d
e
f
---
g
h
i
j
---
k
l
---
m
---

say I want to grep for c and display all the lines between ---

The result should be:
---
a
b
c
---

How would i do that?

Unknown, March 4, 2017 at 9:58 PM
Eight years ago, Alan posted the question: "How to delete line 5 if it contains some string?"

The blog author Peter Leung gave answer in awk. Since the blog is about sed, a more consistent answer is to solve it using sed thus:

sed -n '5!bL0;/some string/d;:L0;p' file.txt

Explanation: Sed supports if-else conditional branching. Here, i created a label 'L0'. The solution means: If not line 5 then goto label L0 and just print the line. Else, check for presence of 'some string' and delete entire line 5 if that string is found.
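
The same effect can also be had without labels, by grouping commands under the line address with braces. A short sketch on a sample file whose fifth line contains the string:

```shell
f=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' 'has some string' 'six' > "$f"

# The 5{...} group applies only to line 5; inside it, d deletes the
# line if it matches. Every other line is printed by default.
kept=$(sed '5{/some string/d;}' "$f")
echo "$kept"
rm -f "$f"
```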

Unknown, March 8, 2017 at 8:36 PM
Hi, I have a question. Assume a file with 30 lines. I want to extract 12th character of 4th line. how can this be accomplished?
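
One way to do this is to let sed pick the line and cut pick the character. A sketch with a made-up four-line file:

```shell
f=$(mktemp)
printf '%s\n' 'first line' 'second line' 'third line' 'abcdefghijklmnop' > "$f"

# sed prints line 4 and quits right after; cut keeps character 12.
ch=$(sed -n '4{p;q;}' "$f" | cut -c12)
echo "$ch"
rm -f "$f"
```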

Nanoo Visotor, May 4, 2017 at 5:15 PM
Under Win10x64, ssed (http://sed.sourceforge.net/grabbag/ssed/) seems a lot faster than head or sed (from UnxUtils). Tested under TCC/LE 14 shell (JpSoft).

Brian, July 20, 2018 at 8:20 PM
Totally late coming, but if all you want to do is print the first so many lines you can just do

sed 11q somefile.txt

If you want the first N lines, replace "11" with N: q prints the current line before quitting, so sed 11q prints lines 1 through 11.

Matt, August 9, 2018 at 6:34 AM
Nice post. I had previously known about the `sed '1!d' file` command, but I had to update somebody else's .csh script. In .csh the `sed '1!d' file` command doesn't seem to work (it gives the error "d: Event not found.").

So thank you for providing the alternative of `sed -n 1p file`!

shyamb, July 19, 2019 at 6:54 AM
Hi,

I have file containing the following lines

000101.html:
000612.html:during ATPG, refer to the Solet article:
Difference Between the add cell const O
000612.html:regenerating a new set of vectors.
Mask Patterns Without Rerunning ATPG
000636.html: 000101

From the above excerpt I need to extract only the filename available towards the end of href tag - I want the following to be displayed:

009009.html
901030.html
901006.html
000101.html

If possible I also want to get rid of the .html towards the end and only display:

009009
901030
901006
000101

Can someone please help me with this requirement. Thanks in advance.

-Shyam

Garfield, May 15, 2020 at 4:55 AM
Hi Guys,
I am new to grepping etc.

I have a huge text file from which I want to output a given number of lines of text upon locating a line containing the { character

example

some text line 1
some text line 2
some text line 3
some text line 4
some text line 4

{some more text line 6}
--------
some text line 8
some text line 9
some text line 10
some text line 11
some text line 12
some text line 13
some text line 14
some text line 15

{some more text line 17}
------------------
some text line 19
some text line 20
some text line 21
some text line 22
some text line 23
some text line 24
some text line 25
some text line 26

output should be

{some more text line 6}
--------
some text line 8
some text line 9

{some more text line 17}
------------------
some text line 19
some text line 20
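
grep's -A option, mentioned earlier in this thread, looks like a fit here. It is an extension found in GNU and BSD grep rather than part of POSIX, so treat this as a sketch; the sample file is a shortened version of the one above.

```shell
f=$(mktemp)
printf '%s\n' \
  'some text line 4' '{some more text line 6}' '--------' \
  'some text line 8' 'some text line 9' 'some text line 10' > "$f"

# -A 3 prints each matching line plus the 3 lines after it.
# -F treats "{" as a plain string rather than a regular expression.
context=$(grep -F -A 3 '{' "$f")
echo "$context"
rm -f "$f"
```

When there are several non-adjacent matches, grep separates the groups with a line containing "--" rather than a blank line.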


Anonymous, August 12, 2020 at 3:48 AM
Hi,
My log file would be like mentioned below:
From that i have to refer only last "timestamp": and its timestamp, and pull the log between two timerange.
For example, if i want to collect log between 23:32 and 23:35. It should refer only last "timestamp": and pull the log between those time range.

somecontent"TransDateTime\":\"2020-07-01T09:15:01.000Z","receiveTimestamp":"2020-07-01T02:15:01.335142083Z","textPayload":"[7/1/20 23:05],","timestamp":"2020-07-01T23:32:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:02.000Z","receiveTimestamp":"2020-07-01T02:15:02.335142083Z","textPayload":"[7/1/20 23:06],","timestamp":"2020-07-01T23:32:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:03.000Z","receiveTimestamp":"2020-07-01T02:15:03.335142083Z","textPayload":"[7/1/20 23:07],","timestamp":"2020-07-01T23:34:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:04.000Z","receiveTimestamp":"2020-07-01T02:15:04.335142083Z","textPayload":"[7/1/20 23:08],","timestamp":"2020-07-01T23:34:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:05.000Z","receiveTimestamp":"2020-07-01T02:15:05.335142083Z","textPayload":"[7/1/20 23:09],","timestamp":"2020-07-01T23:35:35.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:06.000Z","receiveTimestamp":"2020-07-01T02:15:06.335142083Z","textPayload":"[7/1/20 23:10],","timestamp":"2020-07-01T23:35:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:07.000Z","receiveTimestamp":"2020-07-01T02:15:07.335142083Z","textPayload":"[7/1/20 23:11],","timestamp":"2020-07-01T23:36:36.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:08.000Z","receiveTimestamp":"2020-07-01T02:15:08.335142083Z","textPayload":"[7/1/20 23:11],","timestamp":"2020-07-01T23:36:37.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:09.000Z","receiveTimestamp":"2020-07-01T02:15:09.335142083Z","textPayload":"[7/1/20 23:12],","timestamp":"2020-07-01T23:37:10.8",somecontent
somecontent"TransDateTime\":\"2020-07-01T09:15:10.000Z","receiveTimestamp":"2020-07-01T02:15:10.335142083Z","textPayload":"[7/1/20 23:13],","timestamp":"2020-07-01T23:37:15.8",somecontent

I tried sed command,
cat "filename" | sed -r 's/"timestamp":([^ ]+).*/\1/'
But it only trims the word timestamp; it does not collect the log lines.

Your early response is really appreciated.
Thanks in advance.
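
One possible approach is to let awk split each line on the literal text "timestamp":" so that the last field begins with the final timestamp, then compare its HH:MM part as a string. This sketch assumes all entries are from the same day and that the timestamp always has the form YYYY-MM-DDTHH:MM:SS; the sample lines are shortened versions of the ones above.

```shell
f=$(mktemp)
printf '%s\n' \
  'x"receiveTimestamp":"2020-07-01T02:15:01.3Z","timestamp":"2020-07-01T23:32:35.8",y' \
  'x"receiveTimestamp":"2020-07-01T02:15:05.3Z","timestamp":"2020-07-01T23:35:35.8",y' \
  'x"receiveTimestamp":"2020-07-01T02:15:07.3Z","timestamp":"2020-07-01T23:36:36.8",y' > "$f"

# $NF is the text after the last "timestamp":" on the line;
# characters 12-16 of it are the HH:MM part of that timestamp.
# Lines whose HH:MM falls in the range are printed whole.
matched=$(awk -F'"timestamp":"' '{ t = substr($NF, 12, 5) } t >= "23:32" && t <= "23:35"' "$f")
echo "$matched"
rm -f "$f"
```

The separator does not match "receiveTimestamp" because of the capital T and the missing quote before it.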

Anonymous, November 6, 2020 at 10:37 AM
Thanks a lot for this info. I couldn't find a specific line, and sed 'n!d' filename really helped me!

Anonymous, June 14, 2021 at 1:47 AM
Hi guys!
I have a Jenkinsfile and I want to print and run a few commands from the Jenkinsfile in a script file.
Any idea how I can do that?