Linux Commando: Splitting up is easy for a PDF file

Saturday, February 23, 2013

Splitting up is easy for a PDF file

Occasionally, I need to extract some pages from a multi-page pdf document. Suppose you have a 6-page pdf document named myoldfile.pdf, and you want a new pdf file, mynewfile.pdf, containing only pages 1, 2, 4, and 5 from myoldfile.pdf.

I did exactly that using pdftk, a command-line tool.

If pdftk is not already installed, install it like this on a Debian- or Ubuntu-based computer:

$ sudo apt-get update
$ sudo apt-get install pdftk

Then, to make a new pdf with just pages 1, 2, 4, and 5 from the old pdf, do this:

$ pdftk myoldfile.pdf cat 1 2 4 5 output mynewfile.pdf

Note that cat and output are special pdftk keywords. cat specifies the operation to perform on the input file. output signals that what follows is the name of the output pdf file.

You can specify page ranges like this:

$ pdftk myoldfile.pdf cat 1-2 4-5 output mynewfile.pdf
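pdftk also accepts the keyword end in place of the last page number, so a range such as 4-end works without knowing the page count. If you extract pages often, a small wrapper can catch typos in the page list before pdftk runs. The sketch below is only an illustration: valid_page_spec and extract_pages are hypothetical helper names, not part of pdftk, and the check covers only plain pages and ranges (pdftk itself accepts more, such as rotations).

```shell
# Hypothetical helper (not part of pdftk): accept only simple page specs
# such as 3, 1-2, or 4-end.
valid_page_spec() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]+|end)(-([0-9]+|end))?$'
}

# Hypothetical wrapper: extract_pages INPUT OUTPUT SPEC...
# Checks every page spec first, then hands them to pdftk's cat operation.
extract_pages() {
    in_pdf=$1
    out_pdf=$2
    shift 2
    for spec in "$@"; do
        if ! valid_page_spec "$spec"; then
            echo "bad page spec: $spec" >&2
            return 1
        fi
    done
    pdftk "$in_pdf" cat "$@" output "$out_pdf"
}

# Example: pages 1-2, then page 4 through the last page.
# extract_pages myoldfile.pdf mynewfile.pdf 1-2 4-end
```

The example call is left commented out so that pasting the functions into a shell does not touch any files.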

pdftk has a few more tricks in its back pocket. For example, you can specify a burst operation to split each page in the input file into a separate output file.

$ pdftk myoldfile.pdf burst

By default, the output files are named pg_0001.pdf, pg_0002.pdf, etc.
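If the default names do not suit you, burst takes a printf-style pattern after the output keyword. Here is a rough sketch of how that naming works. preview_name is a hypothetical helper I made up for illustration; it only prints the name a pattern would produce for a given page, and the real split runs only if pdftk and myoldfile.pdf are actually present.

```shell
# Hypothetical helper: show the name a printf-style pattern gives page N.
preview_name() {
    # $1 is deliberately used as the printf format string.
    printf "$1\n" "$2"
}

preview_name 'pg_%04d.pdf' 1     # prints pg_0001.pdf (the default naming)
preview_name 'page_%02d.pdf' 12  # prints page_12.pdf (a custom pattern)

# Run the real split only when pdftk and the example file are present.
if command -v pdftk >/dev/null 2>&1 && [ -f myoldfile.pdf ]; then
    pdftk myoldfile.pdf burst output page_%02d.pdf
fi
```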

pdftk is also capable of merging multiple pdf files into one pdf.

$ pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf output mynewfile.pdf

That would merge the files corresponding to the first, second, fourth and fifth pages into a single output pdf.
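Typing out the burst file names gets tedious for longer page lists. One possible shortcut, sketched below, is to generate the names from the page numbers. burst_names is a hypothetical helper, and it assumes the default pg_0001.pdf naming described above. The merge here spells out the cat operation explicitly.

```shell
# Hypothetical helper: print the default burst file name for each page number.
# Pass plain numbers (8, not 08): a leading zero makes printf read it as octal.
burst_names() {
    for page in "$@"; do
        printf 'pg_%04d.pdf\n' "$page"
    done
}

# Show the names for pages 1, 2, 4, and 5.
burst_names 1 2 4 5

# Merge those pages only when pdftk and the burst output are present.
if command -v pdftk >/dev/null 2>&1 && [ -f pg_0001.pdf ]; then
    # Unquoted on purpose: each generated name becomes its own argument.
    pdftk $(burst_names 1 2 4 5) cat output mynewfile.pdf
fi
```

To put every burst page back together, a shell glob such as pg_*.pdf should also do, since the zero-padded names sort in page order.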

If you know of another easy way to split up pages from a pdf file, please tell us in a comment. Much appreciated.

Two updates (part 2, part 3) are available for this post.

22 comments:

Chillar Anand said...: Oh man... great tutorial. Thank you. keep posting!!; May 7, 2014 at 4:47 AM
Tarik's Blog said...: Thanks! Straight to the point. Viva Linux!; June 1, 2014 at 8:18 AM
Anonymous said...: Tried to get free pdf split and merge programs for windows and got warnings from my antivirus that aborted installation.

Linux does it so neatly. Thanks for the excellent post!; June 8, 2014 at 7:54 AM
Anonymous said...: Great tip. Thanks.; July 2, 2014 at 4:54 PM
Anonymous said...: thanks for this blog entry. it has proved very useful.; October 11, 2014 at 6:09 AM
Richard Gravois said...: I split bigfile into pages.
It seems that a big watermark "Sample" shows up in Safari and chrome but not other browsers (mozilla, IE). The watermark is not in bigfile.
What switch adds the watermark?; October 14, 2014 at 6:57 AM
Anonymous said...: what's the difference with print into pdf file and selecting only the desired pages ?; December 2, 2014 at 8:49 AM
ChucklingMcArseoff said...: pdftk looks like a pretty neat tool indeed, but if all you're trying to accomplish is splitting a PDF into separate files per page, then you can just open the PDF in Evince (or your favorite PDF viewer capable of printing) and select File > Print... and tell the print dialog which pages you want then select "Print to file".; April 9, 2015 at 10:23 AM
Nazim Aghabayov said...: Thanks dude! Your reference is really helpful. I wrote a small script to split a pdf every several pages

======================
#!/bin/bash

#first arg is a file name
export file=$1

#second argument is pages per file
export ppd=$2

pagecount=$(pdfinfo -- "$file" 2> /dev/null | awk '$1 == "Pages:" {print $2}')

echo document $file has $pagecount pages
echo splitting per $ppd pages

currentp=1
secn=1
while [ "$currentp" -le "$pagecount" ]; do

let modl=$currentp%$ppd

if [ 0 -eq $modl ]; then
let pbeginning=$currentp-$ppd+1
let pend=$currentp
echo " $pbeginning $pend"
pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
let last=$currentp
let secn=$secn+1
fi

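#last page: write any leftover pages that did not fill a whole section
if [ $currentp -eq $pagecount ]; then
if [ "${last:-0}" -ne "$currentp" ]; then
# secn already holds the next unused section number, so do not bump it here
let pbeginning=${last:-0}+1
let pend=$currentp
echo "last: $pbeginning $pend"
pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
fi
fi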

let currentp=$currentp+1

done; May 28, 2015 at 6:01 AM
Anonymous said...: Thank you a lot for sharing this. Besides, I found this PDF split resource, I'm not sure whether it supports Linux?; July 26, 2015 at 7:55 PM
JRCP said...: hi! i'm Jose, from Spain

i have tried the Nazim Aghabayov script, but it's like there is a bug...
i saved the script as cortar.sh, and this is what is shown

cortar.sh: 18: cortar.sh: let: not found
cortar.sh: 20: [: -eq: argument expected
cortar.sh: 40: cortar.sh: let: not found

as far i can know, the message of line 18 is about
let modl=$currentp%$ppd
and the message of line 20 is indeed about $modl

can anybody see where the bug is, if any?

thanks a lot, guys; October 11, 2015 at 4:33 AM
Anonymous said...: very useful for breaking up pdf books, thanks!; January 11, 2016 at 8:38 AM
monarch a sadist said...: thanks man helped a lot... i owe u atleast a thaks; June 9, 2016 at 5:31 AM
Unknown said...: Here is the link for Split pdf document. Hope this gives you a start for you file pdf program on rasteredge page http://www.rasteredge.com/how-to/csharp-imaging/pdf-split/; July 5, 2016 at 11:29 PM
Anonymous said...: JUST realized that closing the left side pane containing the thumbnails of each page in the PDF allows for the file to scroll 98-99% smoothly.

Stumbled upon the solution as I was printing PDF files with regards to page ranges and chapters in order to split the book up into smaller file sizes, which was working very goooood too by the way. But simply closing the left side thumb-nails is a lot less work :); July 28, 2016 at 8:17 PM
Akom said...: I had to write a script to split the original PDF into pages in order to allow tesseract and imagemagick to handle it without running out of memory, and to overcome the TIFF with alpha channel issues (spp not in set {1,3,4})

Script and write-up are here: http://tech.akom.net/archives/126-OCR-on-a-large-PDF-using-tesseract-and-pdftk.html

Thanks for the starting point!; January 19, 2017 at 12:59 PM
Unknown said...: I'm on Linux and not able to find my pdf. Do I have to put it in a specific place?; April 21, 2017 at 8:19 PM
Robi said...: Found one really easy way to split and merge pdfs here, worked for me

https://technovechno.com/how-to-split-merge-pdf-documents-using-pdftk-in-ubuntu/; September 24, 2017 at 11:26 AM
Unknown said...: Thank you - perfect solution!; February 9, 2018 at 4:37 AM
Anonymous said...: pdftk seems not be free anymore; December 10, 2018 at 3:27 AM
pranjal said...: So helpful thanks..👍; March 4, 2019 at 5:26 AM
Anonymous said...: please use pdfseparate on linux/ubuntu
pdfinfo
pdfunite and others; March 13, 2019 at 10:31 AM
