Linux Commando: Splitting up is easy for a PDF file

Saturday, February 23, 2013

Occasionally, I need to extract some pages from a multi-page pdf document. Suppose you have a 6-page pdf document named myoldfile.pdf, and you want to create a new pdf file, mynewfile.pdf, containing only pages 1, 2, 4, and 5 of myoldfile.pdf.

I did exactly that using pdftk, a command-line tool.

If pdftk is not already installed, install it like this on a Debian- or Ubuntu-based computer:

$ sudo apt-get update
$ sudo apt-get install pdftk
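
If you script any of this, it can help to check for the tool before relying on it. The following is only a sketch: need_tool is a made-up helper name, and the install hint assumes the package is named after the command, as it is for pdftk in the commands above.

```shell
# need_tool NAME: succeed if NAME is on the PATH; otherwise print a hint and fail.
# (need_tool is a hypothetical helper, not something pdftk provides.)
need_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found; try: sudo apt-get install $1" >&2
    return 1
}

# Only call pdftk when it is actually available.
if need_tool pdftk; then
    pdftk --version | head -n 3
fi
```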

Then, to make a new pdf with just pages 1, 2, 4, and 5 from the old pdf, do this:

$ pdftk myoldfile.pdf cat 1 2 4 5 output mynewfile.pdf

Note that cat and output are special pdftk keywords. cat specifies the operation to perform on the input file. output signals that what follows is the name of the output pdf file.

You can specify page ranges like this:

$ pdftk myoldfile.pdf cat 1-2 4-5 output mynewfile.pdf
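
The page list 1 2 4 5 and the ranges 1-2 4-5 select the same pages. To illustrate the relationship, here is a small sketch of a shell function that collapses an ascending page list into ranges. to_ranges is a hypothetical helper of my own, not a pdftk feature, and it assumes the pages are passed as ascending whole numbers.

```shell
# to_ranges PAGE...: collapse ascending page numbers into pdftk-style ranges.
# The body runs in a subshell, so its variables and the emit helper do not leak.
to_ranges() (
    out="" start="" prev=""
    # emit: append the finished run start..prev to out
    emit() {
        if [ "$start" -eq "$prev" ]; then
            out="$out $start"
        else
            out="$out $start-$prev"
        fi
    }
    for p in "$@"; do
        if [ -z "$start" ]; then
            start=$p
        elif [ "$p" -ne $((prev + 1)) ]; then
            # gap in the sequence: close the current run and start a new one
            emit
            start=$p
        fi
        prev=$p
    done
    if [ -n "$start" ]; then
        emit
    fi
    echo "${out# }"
)

to_ranges 1 2 4 5    # prints: 1-2 4-5

# Possible use (left unquoted on purpose so each range becomes its own argument):
# pdftk myoldfile.pdf cat $(to_ranges 1 2 4 5) output mynewfile.pdf
```

pdftk also understands the keyword end for the last page, so a range such as 4-end selects page 4 through the final page without your having to know the page count.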

pdftk has a few more tricks in its back pocket. For example, you can specify a burst operation to split each page in the input file into a separate output file.

$ pdftk myoldfile.pdf burst

By default, the output files are named pg_0001.pdf, pg_0002.pdf, etc. pdftk also writes a small report file, doc_data.txt, alongside them.
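
As far as I know, burst also accepts an output section containing a printf-style pattern, so you can pick your own names, for example: pdftk myoldfile.pdf burst output page_%02d.pdf. To see what names a pattern would give without creating any files, a sketch like the one below mimics the numbering with the shell's own printf. preview_burst_names is a hypothetical helper, and it assumes the pattern contains a single %d-style placeholder.

```shell
# preview_burst_names PATTERN COUNT: print the names that a printf-style
# PATTERN yields for pages 1..COUNT. No pdf files are read or written.
preview_burst_names() (
    pattern=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # the pattern is deliberately used as the printf format string
        printf "$pattern\n" "$i"
        i=$((i + 1))
    done
)

preview_burst_names 'pg_%04d.pdf' 2
# prints:
# pg_0001.pdf
# pg_0002.pdf
```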

pdftk is also capable of merging multiple pdf files into one pdf.

$ pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf cat output mynewfile.pdf

That would merge the files corresponding to the first, second, fourth and fifth pages into a single output pdf.
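
If you merge files often, a small wrapper can save typing. This is a sketch under my own assumptions: run and merge_pdfs are hypothetical helper names, and DRY_RUN is a made-up variable that makes the wrapper print the pdftk command instead of executing it, so you can check the argument order first.

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# merge_pdfs OUTPUT INPUT...: concatenate the INPUT files, in order, into OUTPUT.
merge_pdfs() (
    out=$1
    shift
    run pdftk "$@" cat output "$out"
)

# Dry run: shows the command that would be executed.
(DRY_RUN=1; merge_pdfs mynewfile.pdf pg_0001.pdf pg_0002.pdf)
# prints: pdftk pg_0001.pdf pg_0002.pdf cat output mynewfile.pdf

# pdftk can also take pages from several inputs in one step by giving each
# file a handle (first.pdf and second.pdf are placeholder names):
# pdftk A=first.pdf B=second.pdf cat A1-2 B4-5 output combined.pdf
```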

If you know of another easy way to split up pages from a pdf file, please tell us in a comment. Much appreciated.

Two updates (part 2, part 3) are available for this post.

22 comments:

Chillar Anand, May 7, 2014 at 4:47 AM
Oh man... great tutorial. Thank you. Keep posting!!

Tarik's Blog, June 1, 2014 at 8:18 AM
Thanks! Straight to the point. Viva Linux!

Anonymous, June 8, 2014 at 7:54 AM
I tried to get free pdf split and merge programs for Windows, but my antivirus raised warnings and aborted the installation.

Linux does it so neatly. Thanks for the excellent post!

Anonymous, July 2, 2014 at 4:54 PM
Great tip. Thanks.

Anonymous, October 11, 2014 at 6:09 AM
Thanks for this blog entry. It has proved very useful.

Richard Gravois, October 14, 2014 at 6:57 AM
I split bigfile into pages.
A big "Sample" watermark shows up on the pages in Safari and Chrome but not in other browsers (Mozilla, IE). The watermark is not in bigfile.
What switch adds the watermark?

Anonymous, December 2, 2014 at 8:49 AM
How is this different from printing to a pdf file and selecting only the desired pages?

ChucklingMcArseoff, April 9, 2015 at 10:23 AM
pdftk looks like a pretty neat tool indeed, but if all you're trying to accomplish is splitting a PDF into separate files per page, then you can just open the PDF in Evince (or your favorite PDF viewer capable of printing) and select File > Print... and tell the print dialog which pages you want then select "Print to file".

Nazim Aghabayov, May 28, 2015 at 6:01 AM
Thanks, dude! Your reference is really helpful. I wrote a small script to split a pdf every several pages:

======================
#!/bin/bash
# Requires bash (let is a bash builtin): run as "bash script.sh", not "sh script.sh".
# first argument is the file name
file="$1"

# second argument is pages per output file
ppd="$2"

pagecount=$(pdfinfo -- "$file" 2> /dev/null | awk '$1 == "Pages:" {print $2}')

echo "document $file has $pagecount pages"
echo "splitting per $ppd pages"
last=0  # last page already written to an output file
currentp=1
secn=1
while [ "$currentp" -le "$pagecount" ]; do

  let modl=$currentp%$ppd

  if [ 0 -eq "$modl" ]; then
    let pbeginning=$currentp-$ppd+1
    let pend=$currentp
    echo " $pbeginning $pend"
    pdftk "$file" cat "$pbeginning-$pend" output "${file}_${secn}.pdf"
    let last=$currentp
    let secn=$secn+1
  fi

  # last page: write any leftover pages that did not fill a whole chunk
  if [ "$currentp" -eq "$pagecount" ]; then
    if [ "$last" -ne "$currentp" ]; then
      # secn already points at the next unused section number
      let pbeginning=$last+1
      let pend=$currentp
      echo "last: $pbeginning $pend"
      pdftk "$file" cat "$pbeginning-$pend" output "${file}_${secn}.pdf"
    fi
  fi

  let currentp=$currentp+1

done

Anonymous, July 26, 2015 at 7:55 PM
Thank you a lot for sharing this. Besides, I found this PDF split resource; I'm not sure whether it supports Linux.

JRCP, October 11, 2015 at 4:33 AM
Hi! I'm Jose, from Spain.

I have tried Nazim Aghabayov's script, but it seems to have a bug...
I saved the script as cortar.sh, and this is what it shows:

cortar.sh: 18: cortar.sh: let: not found
cortar.sh: 20: [: -eq: argument expected
cortar.sh: 40: cortar.sh: let: not found

As far as I can tell, the message for line 18 is about
let modl=$currentp%$ppd
and the message for line 20 is indeed about $modl.

Can anybody see where the bug is, if any?

Thanks a lot, guys

Anonymous, January 11, 2016 at 8:38 AM
Very useful for breaking up pdf books, thanks!

monarch a sadist, June 9, 2016 at 5:31 AM
Thanks, man, this helped a lot... I owe you at least a thanks.

Unknown, July 5, 2016 at 11:29 PM
Here is a link on splitting pdf documents. I hope this RasterEdge page gives you a start on your pdf program: http://www.rasteredge.com/how-to/csharp-imaging/pdf-split/

Anonymous, July 28, 2016 at 8:17 PM
JUST realized that closing the left side pane containing the thumbnails of each page in the PDF allows for the file to scroll 98-99% smoothly.

Stumbled upon the solution as I was printing PDF files with regards to page ranges and chapters in order to split the book up into smaller file sizes, which was working very goooood too by the way. But simply closing the left side thumb-nails is a lot less work :)

Akom, January 19, 2017 at 12:59 PM
I had to write a script to split the original PDF into pages in order to allow tesseract and imagemagick to handle it without running out of memory, and to overcome the TIFF with alpha channel issues (spp not in set {1,3,4})

Script and write-up are here: http://tech.akom.net/archives/126-OCR-on-a-large-PDF-using-tesseract-and-pdftk.html

Thanks for the starting point!

Unknown, April 21, 2017 at 8:19 PM
I'm on Linux and not able to find my pdf. Do I have to put it in a specific place?

Robi, September 24, 2017 at 11:26 AM
Found one really easy way to split and merge pdfs here, worked for me

https://technovechno.com/how-to-split-merge-pdf-documents-using-pdftk-in-ubuntu/

Unknown, February 9, 2018 at 4:37 AM
Thank you - perfect solution!

Anonymous, December 10, 2018 at 3:27 AM
pdftk seems not to be free anymore.

pranjal, March 4, 2019 at 5:26 AM
So helpful thanks..👍

Anonymous, March 13, 2019 at 10:31 AM
Please use pdfseparate on Linux/Ubuntu, along with pdfinfo, pdfunite, and others.