Concatenating PDFs

I need to do this every once in a while, and then forget how to do it.

Instructions for Windows


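1. Install Ghostscript from http://www.ghostscript.com/. On my machine it was installed under C:\Program Files\gs\gs9.05; the command-line executable gswin64c.exe lives in its bin subfolder, so add that folder to your PATH (or run the command from there).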

2. Run:

gswin64c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=temp.pdf pdf1.pdf pdf2.pdf

See http://superuser.com/questions/54041/how-to-merge-pdfs-using-imagemagick-resolution-problem
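
If you call Ghostscript from a script instead of typing the command, it is safest to pass the arguments as a list so filenames with spaces need no quoting. A minimal sketch in Python; build_gs_merge_cmd and merge are hypothetical helper names, and the default executable name gswin64c is an assumption (use gs on Mac or Linux):

```python
import subprocess


def build_gs_merge_cmd(output, inputs, exe="gswin64c"):
    """Return the Ghostscript argument list that merges `inputs` into `output`.

    Hypothetical helper that mirrors the command line shown above.
    """
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [
        exe,
        "-dBATCH",            # exit when finished instead of waiting for input
        "-dNOPAUSE",          # do not pause between pages
        "-q",                 # quiet: suppress startup messages
        "-sDEVICE=pdfwrite",  # write a PDF
        f"-sOutputFile={output}",
        *inputs,              # inputs are concatenated in this order
    ]


def merge(output, inputs, exe="gswin64c"):
    # check=True raises CalledProcessError if Ghostscript exits with an error
    subprocess.run(build_gs_merge_cmd(output, inputs, exe), check=True)
```

For example, merge("temp.pdf", ["pdf1.pdf", "pdf2.pdf"]) runs the same command as above.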

To convert XPS to PDF:
1. Download GhostXPS from http://www.ghostscript.com/download/gxpsdnld.html. Download the zip file, unzip it, and copy the unzipped folder to C:\Program Files (x86). Rename the .exe file to gxps.exe.
2. Run:

gxps.exe -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=mypdffile.pdf myxpsfile.xps
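
To convert a whole folder of XPS files, the same command can be generated once per file. A minimal sketch, assuming gxps.exe is on your PATH and that each PDF should get the same base name as its XPS source; gxps_commands and convert_all are hypothetical names:

```python
import subprocess
from pathlib import Path


def gxps_commands(folder, exe="gxps.exe"):
    """Yield one GhostXPS command per .xps file in `folder`, sorted by name.

    Each output PDF is written next to its source with a .pdf extension.
    """
    for xps in sorted(Path(folder).glob("*.xps")):
        pdf = xps.with_suffix(".pdf")
        yield [exe, "-dNOPAUSE", "-sDEVICE=pdfwrite", f"-sOutputFile={pdf}", str(xps)]


def convert_all(folder, exe="gxps.exe"):
    # Stop at the first file GhostXPS fails on (check=True raises on error)
    for cmd in gxps_commands(folder, exe):
        subprocess.run(cmd, check=True)
```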

Instructions for a 32-bit machine:
1. Download Ghostscript 9.06 for Windows (32 bit) from http://www.ghostscript.com/download/gsdnld.html
2. It is installed under C:\Program Files\gs\gs9.06
3. Run gswin32c.exe …
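
If the same script has to run on 32-bit Windows, 64-bit Windows and Mac/Linux, one option is to look for whichever Ghostscript console executable is on the PATH. A minimal sketch, assuming the Ghostscript bin folder has been added to PATH; find_ghostscript is a hypothetical name:

```python
import shutil


def find_ghostscript(candidates=("gswin64c", "gswin32c", "gs")):
    """Return the full path of the first candidate found on PATH, or None.

    On Windows, shutil.which also tries the extensions listed in PATHEXT,
    so "gswin32c" matches gswin32c.exe.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```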

Windows batch file to concatenate all PDFs in the current directory:


@echo off
rem Build a list of quoted PDF names, skipping the output file given as the first argument
setlocal ENABLEDELAYEDEXPANSION
set params=
for %%i in (*.pdf) do if /i not "%%~nxi"=="%~nx1" set params=!params! "%%i"
echo on
gswin64c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="%~1" %params%

Save it as catpdf.bat and run it as catpdf finaldoc.pdf. This concatenates all PDFs in the current directory, in the order the for loop lists them (usually alphabetical), into a new file finaldoc.pdf.
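
The file-selection part of the batch file can be sketched in Python as well. This is an illustration, not a drop-in replacement: collect_pdfs is a hypothetical name, and skipping the output file is an assumption added so that a finaldoc.pdf left over from a previous run is not merged into itself:

```python
from pathlib import Path


def collect_pdfs(directory, output_name):
    """Return the names of the PDFs in `directory`, sorted, without the output file.

    Extensions and the output name are compared case-insensitively,
    as Windows filenames usually are.
    """
    out = output_name.lower()
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name.lower() != out
    )
```

The resulting list can then be appended to the Ghostscript command line shown earlier.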

Some useful settings: to decrease the size of the output PDF, try adding the following flag:

-dPDFSETTINGS=/ebook

This can reduce files to ~15% of their size (2.3M to 345K, in one case) with no obvious degradation of quality.

A complete example for a single file, here using /printer (on Windows, substitute gswin64c for gs):

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
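
To check how much a setting actually saves, compare the file sizes before and after. For the example above, 345K out of 2.3M (2300K) is 345 / 2300 = 0.15, i.e. 15%. A minimal sketch; percent_of_original is a hypothetical helper name:

```python
import os


def percent_of_original(original, compressed):
    """Return the compressed file's size as a percentage of the original's size."""
    return 100 * os.path.getsize(compressed) / os.path.getsize(original)
```

For example, percent_of_original("input.pdf", "output.pdf") returns 15.0 when the output is 345 bytes and the input 2300 bytes.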

Options for PDFSETTINGS:

  • /screen selects low-resolution output similar to the Acrobat Distiller “Screen Optimized” setting.
  • /ebook selects medium-resolution output similar to the Acrobat Distiller “eBook” setting.
  • /printer selects output similar to the Acrobat Distiller “Print Optimized” setting.
  • /prepress selects output similar to the Acrobat Distiller “Prepress Optimized” setting.
  • /default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file.

Source: http://ghostscript.com/doc/current/Ps2pdf.htm
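
When building the compression command in a script, it helps to reject a misspelled preset before Ghostscript runs. A minimal sketch that only accepts the five values listed above; PDF_PRESETS and build_gs_compress_cmd are hypothetical names, and gs as the default executable is an assumption:

```python
# The five PDFSETTINGS values listed above
PDF_PRESETS = {"/screen", "/ebook", "/printer", "/prepress", "/default"}


def build_gs_compress_cmd(input_pdf, output_pdf, preset="/ebook", exe="gs"):
    """Return the argument list for the single-file compression command above."""
    if preset not in PDF_PRESETS:
        raise ValueError(f"unknown PDFSETTINGS preset: {preset!r}")
    return [
        exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]
```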

Instructions for Mac or Linux

On Mac or Linux we can use the pdftk program. Install it if you don’t have it and then concatenate files by running:

$ pdftk chap1.pdf chap2.pdf chap3.pdf chap4.pdf chap5.pdf cat output edx.pdf

Or, to concatenate all PDFs in the current directory, run:

$ pdftk *.pdf cat output edx.pdf
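
Note that the shell usually expands *.pdf in plain alphabetical order, so chap10.pdf comes before chap2.pdf. If the files are numbered, a "natural" sort puts them in reading order. A minimal sketch; natural_key and pdftk_cat_cmd are hypothetical names:

```python
import re


def natural_key(name):
    """Split 'chap10.pdf' into ['chap', 10, '.pdf'] so digit runs compare as numbers."""
    # re.split with a capture group alternates text and digit runs,
    # so ints are only ever compared with ints
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pdftk_cat_cmd(inputs, output):
    """Return a pdftk command that concatenates `inputs` in natural order."""
    return ["pdftk", *sorted(inputs, key=natural_key), "cat", "output", output]
```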

To extract a page range

pdftk full-pdf.pdf cat 12-15 output outfile_p12-15.pdf
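
pdftk page numbers are 1-based and the range is inclusive, so 12-15 means four pages. PyMuPDF, used in the Python section below, numbers pages from 0 in Document.insert_pdf(from_page=..., to_page=...), which is also inclusive. A minimal sketch of the conversion; the helper names are hypothetical and extract_pages assumes PyMuPDF is installed:

```python
def _check_range(first, last):
    if not 1 <= first <= last:
        raise ValueError("need 1 <= first <= last")


def pdftk_range(first, last):
    """pdftk: 1-based, inclusive. Pages 12 to 15 -> '12-15'."""
    _check_range(first, last)
    return f"{first}-{last}"


def pymupdf_range(first, last):
    """PyMuPDF insert_pdf: 0-based, inclusive. Pages 12 to 15 -> (11, 14)."""
    _check_range(first, last)
    return first - 1, last - 1


def extract_pages(src, dst, first, last):
    import fitz  # PyMuPDF; imported here so the helpers above work without it
    lo, hi = pymupdf_range(first, last)
    with fitz.open(src) as doc, fitz.open() as out:
        out.insert_pdf(doc, from_page=lo, to_page=hi)
        out.save(dst)
```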

Embed fonts in a PDF (macOS)

gs \
  -sFONTPATH=/System/Library/Fonts:/Library/Fonts \
  -o pdf-with-embedded-fonts.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  input_file.pdf

On macOS, gs can be found under /usr/local/bin/gs.
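
The same font-embedding command as an argument list, for use from a script. A minimal sketch; build_embed_fonts_cmd is a hypothetical name, and joining the font folders with os.pathsep (":" on macOS and Linux, ";" on Windows) is an assumption about how you would adapt it to other systems:

```python
import os


def build_embed_fonts_cmd(input_pdf, output_pdf,
                          font_dirs=("/System/Library/Fonts", "/Library/Fonts"),
                          exe="gs"):
    """Return the argument list for the font-embedding command above."""
    return [
        exe,
        "-sFONTPATH=" + os.pathsep.join(font_dirs),  # where gs looks for fonts
        "-o", output_pdf,           # -o also implies -dBATCH and -dNOPAUSE
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",  # the prepress preset embeds fonts
        input_pdf,
    ]
```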

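Using Python

This script uses PyMuPDF (install it with pip install pymupdf) and concatenates two PDFs into a third: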

import sys
import fitz  # PyMuPDF

def concatenate_pdfs(input_pdf1, input_pdf2, output_pdf):
    pdf_writer = fitz.open()

    # Open the first PDF and add its pages to the writer
    with fitz.open(input_pdf1) as pdf_document1:
        pdf_writer.insert_pdf(pdf_document1)

    # Open the second PDF and add its pages to the writer
    with fitz.open(input_pdf2) as pdf_document2:
        pdf_writer.insert_pdf(pdf_document2)

    # Save the combined PDF
    pdf_writer.save(output_pdf)
    pdf_writer.close()

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python script.py <input_pdf1> <input_pdf2> <output_pdf>")
        sys.exit(1)

    input_pdf1 = sys.argv[1]
    input_pdf2 = sys.argv[2]
    output_pdf = sys.argv[3]

    concatenate_pdfs(input_pdf1, input_pdf2, output_pdf)
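
The script above takes exactly two inputs. A variant that accepts any number of them, still assuming PyMuPDF; parse_args and main are hypothetical names, and the output file stays the last argument as in the original:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Concatenate PDFs in the given order.")
    parser.add_argument("inputs", nargs="+", help="PDFs to concatenate, in order")
    parser.add_argument("output", help="path of the combined PDF")
    return parser.parse_args(argv)


def concatenate_pdfs(inputs, output):
    import fitz  # PyMuPDF, imported lazily so argument parsing works without it
    with fitz.open() as combined:  # new, empty document
        for path in inputs:
            with fitz.open(path) as doc:
                combined.insert_pdf(doc)  # append all pages of `doc`
        combined.save(output)


def main(argv=None):
    args = parse_args(argv)
    concatenate_pdfs(args.inputs, args.output)
```

Save it as a script with if __name__ == "__main__": main() at the bottom and run it like python script.py chap1.pdf chap2.pdf chap3.pdf edx.pdf.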