Occasionally, I need to extract some pages from a multi-page pdf document. Suppose you have a 6-page pdf document named myoldfile.pdf, and you want a new pdf file, mynewfile.pdf, containing only pages 1, 2, 4, and 5 from myoldfile.pdf.
I did exactly that using pdftk, a command-line tool.
If pdftk is not already installed, install it like this on a Debian or Ubuntu-based computer.
$ sudo apt-get update
$ sudo apt-get install pdftk
Then, to make a new pdf with just pages 1, 2, 4, and 5 from the old pdf, do this:
$ pdftk myoldfile.pdf cat 1 2 4 5 output mynewfile.pdf
Note that cat and output are special pdftk keywords. cat specifies the operation to perform on the input file. output signals that what follows is the name of the output pdf file.
You can specify page ranges like this:
$ pdftk myoldfile.pdf cat 1-2 4-5 output mynewfile.pdf
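When the ranges follow a regular pattern, such as every two pages, they can be generated rather than typed by hand. The following is only a sketch: page_ranges is a hypothetical helper (not part of pdftk), the part_N.pdf output names are made up for illustration, and the loop prints the pdftk commands instead of running them.

```shell
#!/bin/sh
# page_ranges TOTAL CHUNK
# Hypothetical helper: prints space-separated page ranges that cover
# pages 1..TOTAL in groups of CHUNK pages, e.g. "1-2 3-4 5-6".
page_ranges() {
    total=$1
    chunk=$2
    start=1
    out=""
    while [ "$start" -le "$total" ]; do
        end=$((start + chunk - 1))
        # The final group may be shorter than CHUNK pages.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        # A group of one page is written as a single page number.
        if [ "$start" -eq "$end" ]; then
            piece=$start
        else
            piece="$start-$end"
        fi
        out="${out:+$out }$piece"
        start=$((end + 1))
    done
    printf '%s\n' "$out"
}

# Dry run: print one pdftk command per range for the 6-page example.
n=1
for r in $(page_ranges 6 2); do
    echo pdftk myoldfile.pdf cat "$r" output "part_$n.pdf"
    n=$((n + 1))
done
```

Removing the echo would run the commands for real. Check the printed commands first.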
pdftk has a few more tricks in its back pocket. For example, you can specify a burst operation to split each page in the input file into a separate output file.
$ pdftk myoldfile.pdf burst
By default, the output files are named pg_0001.pdf, pg_0002.pdf, etc.
pdftk is also capable of merging multiple pdf files into one pdf.
$ pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf output mynewfile.pdf
That would merge the files corresponding to the first, second, fourth and fifth pages into a single output pdf.
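Typing the burst file names by hand gets tedious for longer page lists. As a rough sketch, assuming the default pg_0001.pdf naming shown above, a small helper can build the names from plain page numbers. burst_names is a hypothetical function name, and the last line only prints the merge command rather than running it.

```shell
#!/bin/sh
# burst_names PAGE...
# Hypothetical helper: maps page numbers to the default burst file names,
# e.g. 1 -> pg_0001.pdf. Pass plain numbers without leading zeros.
burst_names() {
    names=""
    for page in "$@"; do
        names="${names:+$names }$(printf 'pg_%04d.pdf' "$page")"
    done
    printf '%s\n' "$names"
}

# Dry run: print the merge command for pages 1, 2, 4 and 5.
echo pdftk $(burst_names 1 2 4 5) output mynewfile.pdf
```

The printed command matches the merge command above, so dropping the echo would perform the same merge.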
If you know of another easy way to split up pages from a pdf file, please tell us in a comment. Much appreciated.
22 comments:
Oh man... great tutorial. Thank you. keep posting!!
Thanks! Straight to the point. Viva Linux!
Tried to get free pdf split and merge programs for windows and got warnings from my antivirus that aborted installation.
Linux does it so neatly. Thanks for the excellent post!
Great tip. Thanks.
thanks for this blog entry. it has proved very useful.
I split bigfile into pages.
It seems that a big watermark "Sample" shows up in Safari and chrome but not other browsers (mozilla, IE). The watermark is not in bigfile.
What switch adds the watermark?
what's the difference with print into pdf file and selecting only the desired pages ?
pdftk looks like a pretty neat tool indeed, but if all you're trying to accomplish is splitting a PDF into separate files per page, then you can just open the PDF in Evince (or your favorite PDF viewer capable of printing) and select File > Print... and tell the print dialog which pages you want then select "Print to file".
Thanks dude! Your reference is really helpful. I wrote a small script to split a pdf every several pages:
======================
#!/bin/bash
# Uses the bash builtin "let", so run it with bash rather than sh.
# first arg is a file name
export file=$1
# second argument is pages per file
export ppd=$2
pagecount=$(pdfinfo -- "$file" 2> /dev/null | awk '$1 == "Pages:" {print $2}')
echo "document $file has $pagecount pages"
echo "splitting per $ppd pages"
currentp=1
secn=1
# last page already written to an output file (0 = none yet)
last=0
while [ "$currentp" -le "$pagecount" ]; do
    let modl=$currentp%$ppd
    # a full group of $ppd pages ends on this page
    if [ 0 -eq $modl ]; then
        let pbeginning=$currentp-$ppd+1
        let pend=$currentp
        echo " $pbeginning $pend"
        pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
        let last=$currentp
        let secn=$secn+1
    fi
    # last page: write any leftover pages that did not fill a whole group
    if [ $currentp -eq $pagecount ]; then
        if [ $last -ne $currentp ]; then
            let pbeginning=$last+1
            let pend=$currentp
            echo "last: $pbeginning $pend"
            pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
        fi
    fi
    let currentp=$currentp+1
done
Thanks a lot for sharing this. Besides, I found this PDF split resource. I'm not sure whether it supports Linux?
hi! i'm Jose, from Spain
i have tried the Nazim Aghabayov script, but it's like there is a bug...
i saved the script as cortar.sh, and this is what is shown
cortar.sh: 18: cortar.sh: let: not found
cortar.sh: 20: [: -eq: argument expected
cortar.sh: 40: cortar.sh: let: not found
as far as i know, the message of line 18 is about
let modl=$currentp%$ppd
and the message of line 20 is indeed about $modl
can anybody see where the bug is, if any?
thanks a lot, guys
very useful for breaking up pdf books, thanks!
thanks man, helped a lot... i owe u at least a thanks
Here is the link for Split pdf document. Hope this gives you a start for you file pdf program on rasteredge page http://www.rasteredge.com/how-to/csharp-imaging/pdf-split/
JUST realized that closing the left side pane containing the thumbnails of each page in the PDF allows for the file to scroll 98-99% smoothly.
Stumbled upon the solution as I was printing PDF files with regards to page ranges and chapters in order to split the book up into smaller file sizes, which was working very goooood too by the way. But simply closing the left side thumb-nails is a lot less work :)
I had to write a script to split the original PDF into pages in order to allow tesseract and imagemagick to handle it without running out of memory, and to overcome the TIFF with alpha channel issues (spp not in set {1,3,4})
Script and write-up are here: http://tech.akom.net/archives/126-OCR-on-a-large-PDF-using-tesseract-and-pdftk.html
Thanks for the starting point!
I'm on Linux and not able to find my pdf. Do I have to paste it in at a specific place?
Found one really easy way to split and merge pdfs here, worked for me
https://technovechno.com/how-to-split-merge-pdf-documents-using-pdftk-in-ubuntu/
Thank you - perfect solution!
pdftk seems not to be free anymore
So helpful thanks..👍
please use pdfseparate on linux/ubuntu, along with pdfinfo, pdfunite and others