Saturday, February 23, 2013

Splitting up is easy for a PDF file

Occasionally, I need to extract some pages from a multi-page pdf document. Suppose you have a 6-page pdf document named myoldfile.pdf, and you want to create a new pdf file, mynewfile.pdf, containing only pages 1, 2, 4, and 5 of myoldfile.pdf.

I did exactly that using pdftk, a command-line tool.

If pdftk is not already installed, install it like this on a Debian- or Ubuntu-based computer:

$ sudo apt-get update
$ sudo apt-get install pdftk
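
If you are not sure whether pdftk is already there, a quick check can save you the install step. Here is a minimal sketch; have_cmd is just a helper name I made up, and it relies only on the shell's standard command -v.

```shell
# have_cmd: succeed if the named program can be found on PATH.
# (hypothetical helper name, not part of pdftk)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftk; then
    echo "pdftk is already installed"
else
    echo "pdftk not found; install it with your package manager"
fi
```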

Then, to make a new pdf with just pages 1, 2, 4, and 5 from the old pdf, do this:

$ pdftk myoldfile.pdf cat 1 2 4 5 output mynewfile.pdf

Note that cat and output are special pdftk keywords. cat specifies the operation to perform on the input file: it concatenates the listed pages, in the order given. output signals that what follows is the name of the output pdf file.

You can specify page ranges like this:

$ pdftk myoldfile.pdf cat 1-2 4-5 output mynewfile.pdf
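
If you start from a plain list of page numbers, you can let the shell collapse consecutive pages into ranges for you. The sketch below is one possible way to do it; compress_pages and fmt_range are hypothetical helper names, not part of pdftk, and the list is assumed to be sorted in ascending order.

```shell
# fmt_range: print "N" for a single page or "N-M" for a run of pages.
fmt_range() {
    if [ "$1" -eq "$2" ]; then echo "$1"; else echo "$1-$2"; fi
}

# compress_pages: turn an ascending list of page numbers into pdftk ranges.
# The function body runs in a subshell so its variables do not leak out.
compress_pages() (
    out=""
    start=""
    prev=""
    for p in "$@"; do
        if [ -z "$start" ]; then
            start=$p                      # first page opens the first run
        elif [ "$p" -ne $((prev + 1)) ]; then
            out="$out$(fmt_range "$start" "$prev") "
            start=$p                      # gap found: open a new run
        fi
        prev=$p
    done
    if [ -n "$start" ]; then
        out="$out$(fmt_range "$start" "$prev")"
    fi
    echo "$out"
)

compress_pages 1 2 4 5    # prints: 1-2 4-5
```

The result can be passed straight to pdftk, leaving the substitution unquoted so that each range becomes its own argument: pdftk myoldfile.pdf cat $(compress_pages 1 2 4 5) output mynewfile.pdf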

pdftk has a few more tricks in its back pocket. For example, you can specify a burst operation to split each page in the input file into a separate output file.

$ pdftk myoldfile.pdf burst

By default, the output files are named pg_0001.pdf, pg_0002.pdf, etc.
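
The default names follow the printf-style pattern pg_%04d.pdf, that is, the page number padded with zeros to four digits. A small sketch that reproduces the names in the shell (burst_name is a hypothetical helper, not a pdftk feature):

```shell
# burst_name: print the default burst output name for page number $1.
burst_name() {
    printf 'pg_%04d.pdf\n' "$1"
}

# List the names you would expect for a 3-page document.
for n in 1 2 3; do
    burst_name "$n"
done
```

According to the pdftk manual, burst also accepts your own printf-style pattern after output, for example: pdftk myoldfile.pdf burst output page_%02d.pdf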

pdftk is also capable of merging multiple pdf files into one pdf.

$ pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf output mynewfile.pdf

That would merge the files corresponding to the first, second, fourth and fifth pages into a single output pdf.
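
Typing the burst file names by hand gets tedious for longer documents. As a sketch, the helper below builds the merge command from a list of page numbers and only prints it, so you can inspect it before running it. merge_cmd is a hypothetical name, the pg_%04d.pdf names assume the default burst output, and I spell out the cat operation explicitly, which is the form the pdftk manual uses in its merge examples.

```shell
# merge_cmd: print (not run) a pdftk command that merges burst pages.
# $1 is the output file; the remaining arguments are page numbers.
# The function body runs in a subshell so its variables do not leak out.
merge_cmd() (
    out=$1
    shift
    files=""
    for n in "$@"; do
        files="$files $(printf 'pg_%04d.pdf' "$n")"
    done
    echo "pdftk$files cat output $out"
)

merge_cmd mynewfile.pdf 1 2 4 5
# prints: pdftk pg_0001.pdf pg_0002.pdf pg_0004.pdf pg_0005.pdf cat output mynewfile.pdf
```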

If you know of another easy way to split up pages from a pdf file, please tell us in a comment. Much appreciated.

Two updates (part 2, part 3) are available for this post.

22 comments:

  1. Oh man... great tutorial. Thank you. keep posting!!

  2. Thanks! Straight to the point. Viva Linux!

  3. Tried to get free pdf split and merge programs for windows and got warnings from my antivirus that aborted installation.

    Linux does it so neatly. Thanks for the excellent post!

  4. Great tip. Thanks.

  5. thanks for this blog entry. it has proved very useful.

  6. I split bigfile into pages.
    It seems that a big watermark "Sample" shows up in Safari and chrome but not other browsers (mozilla, IE). The watermark is not in bigfile.
    What switch adds the watermark?

  7. What's the difference from printing to a pdf file and selecting only the desired pages?

  8. ChucklingMcArseoff, April 9, 2015 at 10:23 AM

    pdftk looks like a pretty neat tool indeed, but if all you're trying to accomplish is splitting a PDF into separate files per page, then you can just open the PDF in Evince (or your favorite PDF viewer capable of printing) and select File > Print... and tell the print dialog which pages you want then select "Print to file".

  9. Thanks, dude! Your reference is really helpful. I wrote a small script to split a pdf every several pages:


    ======================
    #!/bin/bash
    # Requires bash: let is a bash builtin, so run this with bash, not plain sh.
    #first arg is a file name
    export file=$1

    #second argument is pages per file
    export ppd=$2

    pagecount=$(pdfinfo -- "$file" 2> /dev/null | awk '$1 == "Pages:" {print $2}')

    echo "document $file has $pagecount pages"
    echo "splitting per $ppd pages"
    last=0   # last page already written to an output file
    currentp=1
    secn=1
    while [ "$currentp" -le "$pagecount" ]; do

        let modl=$currentp%$ppd

        if [ 0 -eq "$modl" ]; then
            let pbeginning=$currentp-$ppd+1
            let pend=$currentp
            echo " $pbeginning $pend"
            pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
            let last=$currentp
            let secn=$secn+1
        fi

        #last page: write the leftover pages, if any
        if [ "$currentp" -eq "$pagecount" ]; then
            if [ "$last" -ne "$currentp" ]; then
                # secn already points at the next free section number
                let pbeginning=$last+1
                let pend=$currentp
                echo "last: $pbeginning $pend"
                pdftk "$file" cat $pbeginning-$pend output "$file"_"$secn".pdf
            fi
        fi

        let currentp=$currentp+1

    done

  10. Thanks a lot for sharing this. Besides, I found this PDF split resource, but I'm not sure whether it supports Linux.

  11. hi! i'm Jose, from Spain

    i have tried the Nazim Aghabayov script, but it's like there is a bug...
    i saved the script as cortar.sh, and this is what is shown

    cortar.sh: 18: cortar.sh: let: not found
    cortar.sh: 20: [: -eq: argument expected
    cortar.sh: 40: cortar.sh: let: not found

    as far as i know, the message for line 18 is about
    let modl=$currentp%$ppd
    and the message of line 20 is indeed about $modl

    can anybody see where the bug is, if any?

    thanks a lot, guys

  12. very useful for breaking up pdf books, thanks!

  13. thanks man helped a lot... i owe u at least a thanks

  14. Here is a link on splitting a pdf document. Hope this gives you a start for your pdf program, on the rasteredge page: http://www.rasteredge.com/how-to/csharp-imaging/pdf-split/

  15. JUST realized that closing the left side pane containing the thumbnails of each page in the PDF allows for the file to scroll 98-99% smoothly.

    Stumbled upon the solution as I was printing PDF files with regards to page ranges and chapters in order to split the book up into smaller file sizes, which was working very goooood too by the way. But simply closing the left side thumb-nails is a lot less work :)

  16. I had to write a script to split the original PDF into pages in order to allow tesseract and imagemagick to handle it without running out of memory, and to overcome the TIFF with alpha channel issues (spp not in set {1,3,4})

    Script and write-up are here: http://tech.akom.net/archives/126-OCR-on-a-large-PDF-using-tesseract-and-pdftk.html

    Thanks for the starting point!

  17. In Linux, I'm not able to find my pdf. Do I have to put it in a specific place?

  18. Found one really easy way to split and merge pdfs here, worked for me

    https://technovechno.com/how-to-split-merge-pdf-documents-using-pdftk-in-ubuntu/

  19. pdftk seems not to be free anymore

  20. please use pdfseparate on linux/ubuntu
    pdfinfo
    pdfunite and others
