Python, watermark a PDF

This blog entry shows how to use Python and two third-party modules (pyPdf and ReportLab) to watermark a PDF.

#This sample uses two third-party modules for Python, 
#pyPdf & ReportLab, to create watermark text and 
#place it at an angle on an existing PDF file. 
#This example was produced with Python 2.7. 
#See http://pybrary.net/pyPdf for more information about pyPdf. 
#See http://www.reportlab.com for more information about ReportLab. 

#Import the needed external modules and functions from pyPdf and reportlab.
from pyPdf import PdfFileWriter, PdfFileReader 
from reportlab.pdfgen import canvas

#Use reportlab to create a PDF that will be used 
#as a watermark on another PDF.
c = canvas.Canvas("watermark.pdf") 
c.setFont("Courier", 60)
#This next setting will make the text of our 
#watermark gray, a nice touch for a watermark.
c.setFillGray(0.5,0.5)
#Set up our watermark document. Our watermark 
#will be rotated 45 degrees from the direction 
#of our underlying document.
c.saveState() 
c.translate(500,100) 
c.rotate(45) 
c.drawCentredString(0, 0, "A WATERMARK!") 
c.drawCentredString(0, 300, "A WATERMARK!") 
c.drawCentredString(0, 600, "A WATERMARK!") 
c.restoreState() 
c.save() 

#Read in the PDF that will have the watermark applied to it.
output = PdfFileWriter() 
input1 = PdfFileReader(file("original_pdf.pdf", "rb")) 

#Just to demo this function from pyPdf. 
#If the PDF has a title, this will print it out.
print "title = %s" % (input1.getDocumentInfo().title)

#Get the first page of the original PDF.
page1 = input1.getPage(0)

#Read in the file created above by ReportLab for our watermark.
watermark = PdfFileReader(file("watermark.pdf", "rb"))
#Apply the watermark by merging the two PDF files.
page1.mergePage(watermark.getPage(0))
#Send the resultant PDF to the output stream.
output.addPage(page1)

#Just to demo this function from pyPdf. 
#Print the number of pages in the original PDF. 
#(The watermarked output holds only the one page added above.)
print "original_pdf.pdf has %s pages." % input1.getNumPages()

#Write out our new, watermarked PDF.
outputStream = file("watermarked_pdf.pdf", "wb") 
output.write(outputStream) 
outputStream.close()
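
The listing above relies on Python 2 features (the print statement and the file() builtin) and on pyPdf, which is no longer maintained. As a rough sketch only, the same steps might look like this on Python 3, assuming the pypdf package (the maintained descendant of pyPdf) and ReportLab are installed. The function name watermark_first_page and its parameters are made up for illustration and are not part of either library.

```python
def watermark_first_page(src_path, dst_path, text="A WATERMARK!",
                         mark_path="watermark.pdf"):
    """Stamp `text` diagonally on the first page of src_path, write dst_path.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported inside the function so this sketch can be loaded even
    # where the two third-party packages are not installed.
    from pypdf import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    # Build the one-page watermark PDF, mirroring the original listing.
    c = canvas.Canvas(mark_path)
    c.setFont("Courier", 60)
    c.setFillGray(0.5, 0.5)  # medium gray; second argument is the alpha
    c.saveState()
    c.translate(500, 100)
    c.rotate(45)
    for y in (0, 300, 600):
        c.drawCentredString(0, y, text)
    c.restoreState()
    c.save()

    # Merge the watermark page onto the first page of the source PDF.
    source = PdfReader(src_path)
    mark = PdfReader(mark_path)
    page = source.pages[0]
    page.merge_page(mark.pages[0])

    # Write a new PDF that contains only the watermarked page.
    writer = PdfWriter()
    writer.add_page(page)
    with open(dst_path, "wb") as out:
        writer.write(out)
```

With both packages installed, a call such as watermark_first_page("original_pdf.pdf", "watermarked_pdf.pdf") should reproduce what the Python 2.7 listing does. Like that listing, it writes only the first page.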

4 Responses to “Python, watermark a PDF”

  • Very cool, Bill. Where can I find documentation on the imported module that creates the watermark? I’m interested in finding out more about the parameters that control the size and placement of the watermark. Also, it appears that the module may be of British origin, given the spelling of “centred”.

    73,
    ldb

  • Larry,
    In this case, the creation and placement of the “watermark” is a combination of techniques. What is happening here is that we are simply overlaying the contents of one PDF on top of another by merging the two. The most difficult part for me involved some experimentation to get the placement of the rotated material correct. Without the translate function, my watermark material would not show up in the output PDF because rotating it moved it out of the viewable area. The values I use in the given example are in the native coordinates of the PDF standard, with the bottom left corner as 0,0. It would probably have been better for me to have used a stylesheet or coordinates measured in inches. As it is, the values I used here may appear rather arbitrary. In setFillGray(0.5,0.5), the gray value is on a scale where 0.0 is black and 1.0 is white, so 0.5 gets me a medium gray. If you saw the output PDF, you would also see that the pure black of the main PDF document shows through the medium gray of the watermark, aiding in the “watermark” effect.

    The watermarking itself is achieved via the mergePage() method of the page objects that PdfFileReader returns (the PageObject class in the pyPdf module).
    The pyPdf module main page is found at:
    http://pybrary.net/pyPdf/ Documentation for it is here: http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html The pyPdf module is primarily used for the merging, splitting, and otherwise combining of PDF documents.

    The creation of the document that is used for the watermark is done via the ReportLab module. It is a very large suite of libraries that handles almost every aspect of the creation and formatting of PDF documents. The main page for it is found here:
    http://www.reportlab.com/software/opensource/
    There is a userguide:
    http://www.reportlab.com/docs/reportlab-userguide.pdf
    There is also an API reference for the main interfaces:
    http://www.reportlab.com/apis/reportlab/dev/

    I am still experimenting with this technique. I cannot see why I could not also use a graphic in the watermark source document rather than text, in order to achieve a graphic watermark on the output document. I think that imagination may be the only limit to this technique.

    Thanks for commenting,
    Bill – WA5PB

  • Thanks, Bill. Now I’ve got some reading to do.

    73,
    ldb
    K5WLF

  • Thanks, this is excellent. Ubuntu 11.x includes the Python modules in its repositories.
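
Bill's reply notes that the placement values are in PDF points measured from the bottom left corner, and that the rotated text disappeared until translate was added. The sketch below uses only the standard library to work out where the three text anchors from the listing end up on the page, with and without the translate step. It assumes an A4 page and the usual 72 points per inch. The helper names inches, to_page and inside_page are invented for this illustration and do not come from ReportLab.

```python
import math

POINTS_PER_INCH = 72.0                 # PDF user space: 1 point = 1/72 inch
A4_WIDTH, A4_HEIGHT = 595.28, 841.89   # assumed page size, in points


def inches(value):
    """Convert a length in inches to PDF points."""
    return value * POINTS_PER_INCH


def to_page(x, y, dx=0.0, dy=0.0, degrees=0.0):
    """Map a point drawn after translate(dx, dy) and then rotate(degrees)
    to page coordinates, with the origin at the bottom left corner."""
    t = math.radians(degrees)
    px = dx + x * math.cos(t) - y * math.sin(t)
    py = dy + x * math.sin(t) + y * math.cos(t)
    return round(px, 1), round(py, 1)


def inside_page(point):
    """True if the point lies strictly inside the assumed A4 page."""
    px, py = point
    return 0 < px < A4_WIDTH and 0 < py < A4_HEIGHT


# The three drawCentredString anchors used in the listing.
anchors = [(0, 0), (0, 300), (0, 600)]

with_translate = [to_page(x, y, 500, 100, 45) for x, y in anchors]
rotate_only = [to_page(x, y, degrees=45) for x, y in anchors]

print("translate + rotate:", with_translate)
print("rotate only:       ", rotate_only)
print("1 inch =", inches(1), "points")
```

With translate(500,100) the anchors land at roughly (500, 100), (288, 312) and (76, 524), all inside the page. With the rotation alone they sit on the corner or to the left of the page's left edge, which matches the missing watermark Bill describes. ReportLab also provides an inch constant in reportlab.lib.units for writing coordinates in inches.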








