June 05, 2008

Using XSL-FO to create PDF files

Libraries like iText and PDFBox are among the best available for PDF creation and manipulation, but they have their disadvantages. One of them is templating the PDFs you create. In an enterprise, reports (PDF files) are often generated automatically and then routed to workflows or stored in content management repositories. These reports follow enterprise-wide styles and layouts that are reused, change over time and need proper management. Hardcoding those styles and formats into your application causes problems later on. That is where XSL-FO and Java (or another language) come into the picture. In this article I will introduce XSL-FO and Apache FOP.

Let me introduce XSL-FO first. XSL-FO stands for Extensible Stylesheet Language Formatting Objects and is a language for formatting XML data. It is defined in the W3C XSL recommendation; the W3C split XSL into XSLT, for transforming XML documents, and XSL-FO, for formatting them. XSL-FO documents are XML files that describe both the layout and the content of the output, but they do not fix the output file format. The output can be any document type; all you need is a formatter that renders the XSL-FO into that format. In this article I will use Apache FOP, which is part of the Apache XML Graphics Project.

First look at XSL-FO:
Here is a sample XSL-FO file. Like any other XML file, it starts with an XML declaration:
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4">
      <fo:region-body margin="2cm" />
      <fo:region-before/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello World!</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
The <fo:root> element is the root element of an XSL-FO document. <fo:layout-master-set> contains one or more page templates (<fo:simple-page-master>), and the master-name attribute gives each template its name. <fo:page-sequence> elements describe the page contents and select a template through their master-reference attribute. The contents are not placed on the page directly; they go inside an <fo:flow>, which in turn holds <fo:block> elements. Each block holds the data that is displayed. For further details, please go through the tutorial.
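To make the master-name/master-reference link concrete, here is a small sketch that parses an FO document with the JDK's namespace-aware DOM parser and checks that the document element is fo:root and that every fo:page-sequence names a page master declared in the same document. The class and method names (FoStructureCheck, pageSequencesResolve) are hypothetical, and this is not a substitute for FOP's own validation; it only checks this one rule.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical checker, not part of FOP: verifies only the master-name/master-reference rule
public class FoStructureCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** True if the document is fo:root and every fo:page-sequence refers to a declared master. */
    public static boolean pageSequencesResolve(String foXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required so the fo: prefix resolves to FO_NS
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(foXml)));
        } catch (Exception e) {
            return false; // not well-formed XML
        }

        Element root = doc.getDocumentElement();
        if (!FO_NS.equals(root.getNamespaceURI()) || !"root".equals(root.getLocalName())) {
            return false; // the document element must be fo:root
        }

        // Template names declared in fo:layout-master-set
        Set<String> masters = new HashSet<>();
        collectNames(doc, "simple-page-master", masters);
        collectNames(doc, "page-sequence-master", masters);

        // Every fo:page-sequence must select one of those templates
        NodeList seqs = doc.getElementsByTagNameNS(FO_NS, "page-sequence");
        if (seqs.getLength() == 0) {
            return false;
        }
        for (int i = 0; i < seqs.getLength(); i++) {
            String ref = ((Element) seqs.item(i)).getAttribute("master-reference");
            if (!masters.contains(ref)) {
                return false;
            }
        }
        return true;
    }

    private static void collectNames(Document doc, String localName, Set<String> names) {
        NodeList nodes = doc.getElementsByTagNameNS(FO_NS, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("master-name"));
        }
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<fo:layout-master-set><fo:simple-page-master master-name='A4'>"
            + "<fo:region-body margin='2cm'/></fo:simple-page-master></fo:layout-master-set>"
            + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
            + "<fo:block>Hello World!</fo:block></fo:flow></fo:page-sequence></fo:root>";
        System.out.println(pageSequencesResolve(fo)); // prints true
        System.out.println(pageSequencesResolve(fo.replace("master-reference='A4'", "master-reference='Letter'"))); // prints false
    }
}
```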

Apache FOP:
Apache FOP (Formatting Objects Processor) claims to be the first processor that reads a formatting object (FO) tree and renders the resulting pages to a specified output. Its primary output format is PDF, but it also supports the following formats:
  • PDF (Portable Document Format)
  • PS (Adobe Postscript)
  • PCL (Printer Control Language)
  • AFP (MO:DCA)
  • SVG (Scalable Vector Graphics)
  • XML (area tree representation)
  • Print
  • AWT/Java2D
  • MIF
  • RTF (Rich Text Format)
  • TXT (text)
Using the FOP library, you can process any XSL-FO document and render it to PDF. The library is simple and straightforward. Since all the formatting details are in the XSL-FO file, all we need to do is call the transformation API to process it and generate the output, which is written as a file to your file system. FOP plugs into JAXP (the Java API for XML Processing): the FO document is fed through a JAXP Transformer into FOP's SAX handler. Here is a Java snippet that transforms a simple XSL-FO document into PDF:
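import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.*;

// FOP 0.9x/1.x API; FOP 2.x needs FopFactory.newInstance(new File(".").toURI()) instead
FopFactory ff = FopFactory.newInstance();
OutputStream outputPDF = new BufferedOutputStream(new FileOutputStream("hello.pdf"));
try {
    Fop fop = ff.newFop(MimeConstants.MIME_PDF, outputPDF);
    // Identity transformer: hello.fo is already XSL-FO, so no stylesheet is needed
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    Source src = new StreamSource(new File("hello.fo"));
    // FOP's default handler receives the SAX events and renders the PDF
    Result res = new SAXResult(fop.getDefaultHandler());
    transformer.transform(src, res);
} finally {
    outputPDF.close(); // flush the buffer so the PDF is complete on disk
}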
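The SAXResult line is where JAXP hands the document to FOP: fop.getDefaultHandler() returns a SAX handler (an org.xml.sax.helpers.DefaultHandler), and the identity Transformer simply replays the parsed FO file as SAX events into it. As a sketch of what that handler receives, the JDK-only program below swaps FOP's handler for a hypothetical BlockCounter that counts fo:block start events. The names SaxPipelineDemo, BlockCounter and countBlocks are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPipelineDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Hypothetical stand-in for fop.getDefaultHandler(): counts fo:block start events
    static class BlockCounter extends DefaultHandler {
        int blocks = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (FO_NS.equals(uri) && "block".equals(localName)) {
                blocks++;
            }
        }
    }

    public static int countBlocks(String foXml) {
        BlockCounter counter = new BlockCounter();
        try {
            // Identity transformer, as in the FOP snippet: the FO text is replayed as SAX events
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(foXml)), new SAXResult(counter));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not process FO document", e);
        }
        return counter.blocks;
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
            + "<fo:block>Hello World!</fo:block></fo:flow></fo:page-sequence></fo:root>";
        System.out.println(countBlocks(fo)); // prints 1
    }
}
```

In the real program FOP's handler builds the FO tree from these same events instead of counting them.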
For your reference, you may download the sample application. As mentioned earlier, the main advantage is that XSL is used to format and design the templates of the different reports, and your application simply uses the API to generate the complete report. This way, all reports stay consistent, and it is easy to modify a report's style when necessary.
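The templating idea can be sketched with the JDK alone: the layout lives in an XSLT stylesheet that emits XSL-FO, and the report data is plain XML. The stylesheet, the <report>/<title>/<line> data format and the ReportTemplate.toFo method below are hypothetical examples, not part of the downloadable sample.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReportTemplate {
    // Hypothetical report template: the page layout lives here, not in Java code
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/report'>"
        + "<fo:root><fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4'><fo:region-body margin='2cm'/>"
        + "</fo:simple-page-master></fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'><fo:flow flow-name='xsl-region-body'>"
        + "<fo:block font-size='16pt' font-weight='bold'><xsl:value-of select='title'/></fo:block>"
        + "<xsl:for-each select='line'><fo:block><xsl:value-of select='.'/></fo:block></xsl:for-each>"
        + "</fo:flow></fo:page-sequence></fo:root>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies the template to the report data and returns the resulting XSL-FO as a string. */
    public static String toFo(String reportXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(reportXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String data = "<report><title>Sales Q2</title>"
            + "<line>North: 120</line><line>South: 95</line></report>";
        System.out.println(toFo(data)); // one title block plus one block per line
    }
}
```

With FOP on the classpath, the same stylesheet could be passed to TransformerFactory.newTransformer(Source) in place of the identity transformer, with the data XML as the source and FOP's SAXResult as the result, so the PDF is produced in a single pass and a style change only touches the stylesheet.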

2 comments :

Anonymous said...

What's the status of FOP? Is Apache really going to start moving on it? I've been using it for years but stopped last year since it seemed dead. It's an awesome framework; I just wish it would start moving again.
