XML to PDF Converter
XML to PDF Converter — The Complete 2025 Guide
Converting XML data into a polished, printable PDF document is a common need across publishing, enterprise reporting, invoicing, and data archiving. XML stores structured data; PDF delivers a fixed-layout, widely shareable document. This guide explains the best ways to transform XML into PDF — both visually attractive and programmatically repeatable — including tools, step-by-step workflows, code examples, styling tips, automation patterns, and troubleshooting.
Why Convert XML to PDF?
There are many reasons organizations convert XML into PDF:
- Readable reports: Turn machine-friendly data into human-friendly documents for stakeholders.
- Printing and archiving: PDFs are ideal for legal records and long-term archives.
- Automated document generation: Generate invoices, receipts, contracts, certificates from XML sources.
- Distribution: PDFs preserve fonts, layout, and pagination across devices.
Common Approaches (Overview)
There are three main approaches to convert XML → PDF:
- XSL-FO pipeline: Use XSLT to transform XML into XSL-FO, then render to PDF with a processor like Apache FOP or Antenna House.
- XML → HTML → PDF: Transform XML into HTML/CSS and then render to PDF with engines such as PrinceXML, wkhtmltopdf, or WeasyPrint.
- Programmatic generation: Use libraries (ReportLab in Python, iText in Java/.NET) to build PDFs directly from parsed XML.
Each method has trade-offs: XSL-FO is highly controllable with powerful pagination features; HTML/CSS is easier to style for web designers; programmatic generation gives ultimate control at the code level.
Method 1 — XSL-FO with Apache FOP (Recommended for structured documents)
XSL-FO (Extensible Stylesheet Language — Formatting Objects) is the traditional XML-to-PDF route. Workflow:
- Write an XSLT stylesheet that transforms your XML into XSL-FO.
- Pass the resulting .fo file to an FO processor such as Apache FOP to produce PDF.
Simple XSL-FO example (invoice fragment)
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4" page-height="29.7cm" page-width="21cm" margin="2cm">
      <fo:region-body />
      <fo:region-after extent="1.5cm" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="14pt" font-weight="bold">Invoice #12345</fo:block>
      <fo:block>Date: 2025-10-25</fo:block>
      <fo:table table-layout="fixed" width="100%" margin-top="10pt">
        <fo:table-column column-width="60%"/>
        <fo:table-column column-width="20%"/>
        <fo:table-column column-width="20%"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell><fo:block>Product A</fo:block></fo:table-cell>
            <fo:table-cell><fo:block text-align="center">2</fo:block></fo:table-cell>
            <fo:table-cell><fo:block text-align="right">$50.00</fo:block></fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Run Apache FOP
fop -fo input.fo -pdf output.pdf
Pros: precise control over pagination, page-numbering, footnotes, table handling. Cons: XSL-FO learning curve; styling is more XML-like than CSS.
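The transform step in this method maps fields from the source XML onto formatting objects. The sketch below performs that mapping with Python's standard library instead of XSLT, purely to make the mapping visible; it is not a replacement for a real stylesheet. The <invoice> structure (a number attribute and item/name/qty/price children) and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the conventional fo: prefix


def fo(tag: str) -> str:
    """Return the namespace-qualified name ElementTree expects for an FO tag."""
    return f"{{{FO_NS}}}{tag}"


def invoice_to_fo(invoice_xml: str) -> str:
    """Map a hypothetical <invoice> document to a minimal XSL-FO tree."""
    invoice = ET.fromstring(invoice_xml)

    # Page geometry: one A4 page master with a body region.
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {"master-name": "A4", "page-height": "29.7cm",
         "page-width": "21cm", "margin": "2cm"},
    )
    ET.SubElement(page, fo("region-body"))

    # Content: a page sequence whose flow targets the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    title = ET.SubElement(flow, fo("block"), {"font-size": "14pt", "font-weight": "bold"})
    title.text = f"Invoice #{invoice.get('number')}"

    # One block per line item; a real stylesheet would emit an fo:table here.
    for item in invoice.findall("item"):
        line = ET.SubElement(flow, fo("block"))
        line.text = (
            f"{item.findtext('name')} x {item.findtext('qty')}: "
            f"{item.findtext('price')}"
        )

    return ET.tostring(root, encoding="unicode")


SAMPLE = """<invoice number="12345">
  <item><name>Product A</name><qty>2</qty><price>$50.00</price></item>
</invoice>"""

fo_document = invoice_to_fo(SAMPLE)
```

Writing fo_document to a .fo file gives an input that an FO processor can render. For anything beyond a toy mapping, XSLT remains the more natural tool because the template mirrors the output structure.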
Method 2 — Transform XML to HTML/CSS then render to PDF
If designers prefer CSS, convert XML → HTML (via XSLT) and style with CSS. Then use a renderer:
- PrinceXML — professional, powerful CSS Paged Media support (page-breaks, running headers/footers).
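- wkhtmltopdf: built on an older Qt WebKit engine; adequate for simple HTML layouts, but modern CSS support is limited and the project is no longer actively maintained.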
- WeasyPrint — Python-based, modern CSS support.
Workflow
- Create XSLT that transforms XML into semantic HTML (use classes/IDs for styling).
- Apply CSS for print media (use @page rules for margins and headers).
- Render HTML → PDF using Prince/wkhtmltopdf/WeasyPrint.
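The first two workflow steps can be sketched in a few lines. The example below uses Python's standard library in place of XSLT to turn a hypothetical <invoice> document into semantic HTML with an embedded print stylesheet. The element names, class names, and CSS values are assumptions chosen for illustration, not a recommended template.

```python
import xml.etree.ElementTree as ET
from html import escape

# Print-oriented CSS: page size and margins come from the @page rule.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
table { width: 100%; border-collapse: collapse; }
td.qty { text-align: center; }
td.price { text-align: right; }
tr { page-break-inside: avoid; }
"""


def invoice_to_html(invoice_xml: str) -> str:
    """Render a hypothetical <invoice> as HTML ready for a PDF renderer."""
    invoice = ET.fromstring(invoice_xml)
    rows = []
    for item in invoice.findall("item"):
        # Escape every value so characters such as & and < stay valid HTML.
        name = escape(item.findtext("name", default=""))
        qty = escape(item.findtext("qty", default=""))
        price = escape(item.findtext("price", default=""))
        rows.append(
            f'<tr><td class="name">{name}</td>'
            f'<td class="qty">{qty}</td>'
            f'<td class="price">{price}</td></tr>'
        )
    number = escape(invoice.get("number", ""))
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>Invoice {number}</title>"
        f"<style>{PRINT_CSS}</style></head>\n"
        f'<body><h1 class="invoice-title">Invoice #{number}</h1>\n'
        f"<table>{''.join(rows)}</table></body></html>"
    )


SAMPLE = """<invoice number="12345">
  <item><name>Nuts &amp; Bolts</name><qty>2</qty><price>$50.00</price></item>
</invoice>"""

html_document = invoice_to_html(SAMPLE)
```

Saving html_document to a file produces the kind of input that Prince, wkhtmltopdf, or WeasyPrint expects. How much of the @page rule is honored varies by renderer.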
Example (command using wkhtmltopdf)
wkhtmltopdf --enable-local-file-access input.html output.pdf
Pros: CSS styling that is familiar to web developers; layouts are quick to iterate on. Cons: some print features (for example, precise footnote placement) are harder to get right than in XSL-FO, and output differs between rendering engines.
Method 3 — Programmatic PDF Generation (ReportLab, iText)
When you need absolute control or want to embed business logic into PDF generation, use libraries:
- Python: ReportLab (low-level canvas API plus the higher-level Platypus layout engine); xhtml2pdf (formerly Pisa) or WeasyPrint for HTML to PDF
- Java: iText (commercial licensing for newer versions), Apache PDFBox (PDF manipulation)
- .NET: iTextSharp, PdfSharp
Typical flow: parse XML with a DOM/SAX parser, map fields to PDF elements, draw text/images/tables programmatically.
# Python (high-level sketch)
from lxml import etree
from reportlab.platypus import SimpleDocTemplate, Paragraph, Table
# parse XML, build story, then doc.build(story)
Pros: total control; integrates business logic. Cons: more development effort for complex layouts (pagination, flowing tables).
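A slightly fuller version of the typical flow might look like the sketch below. It separates the parsing and business logic (computing line totals), which needs only the standard library, from the layout step, which assumes ReportLab is installed. The <invoice> structure and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def invoice_rows(invoice_xml: str) -> list[list[str]]:
    """Parse a hypothetical <invoice>; return table rows, header first, total last."""
    invoice = ET.fromstring(invoice_xml)
    rows = [["Item", "Qty", "Line total"]]
    total = Decimal("0")
    for item in invoice.findall("item"):
        qty = int(item.findtext("qty"))
        unit_price = Decimal(item.findtext("price"))  # Decimal avoids float rounding
        line_total = qty * unit_price
        total += line_total
        rows.append([item.findtext("name"), str(qty), f"{line_total:.2f}"])
    rows.append(["Total", "", f"{total:.2f}"])
    return rows


def build_pdf(rows: list[list[str]], title: str, path: str) -> None:
    """Lay the rows out with ReportLab Platypus (requires the reportlab package)."""
    # Imported lazily so the parsing half works without ReportLab installed.
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate, Table

    styles = getSampleStyleSheet()
    story = [Paragraph(title, styles["Title"]), Table(rows)]
    SimpleDocTemplate(path).build(story)


SAMPLE = """<invoice number="12345">
  <item><name>Product A</name><qty>2</qty><price>25.00</price></item>
  <item><name>Product B</name><qty>1</qty><price>9.99</price></item>
</invoice>"""

rows = invoice_rows(SAMPLE)
# build_pdf(rows, "Invoice #12345", "invoice.pdf")  # uncomment once reportlab is installed
```

Keeping the data mapping separate from the drawing code makes the business logic testable without rendering a PDF each time.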
Step-by-Step Example: XML → XSLT → Apache FOP (Automated)
- XML data: your source (invoice.xml).
- XSLT: transform invoice.xml → invoice.fo (XSL-FO).
- FO Processor: run Apache FOP to produce invoice.pdf.
Shell automation
xsltproc transform.xslt invoice.xml > invoice.fo
fop invoice.fo invoice.pdf
Or call these steps from a script or CI pipeline. Many enterprises wrap the pipeline inside microservices or serverless functions for on-demand PDF generation.
Styling & Layout Tips
- Fonts: embed fonts in FO or CSS to ensure consistent rendering (Apache FOP supports embedded fonts).
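- Pagination: plan content flow; in XSL-FO, put headers/footers in fo:static-content bound to region-before/region-after (page-sequence-master lets you vary the layout between first and later pages); in CSS, use @page rules.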
- Tables: avoid very wide tables; specify table-layout and column widths; allow row-breaks across pages where needed.
- Images: ensure high-resolution images for print (300 DPI) and provide absolute/relative paths accessible to renderer.
- Localization: handle numeric/date formats and RTL languages in your transform.
Programmatic Example — XML to PDF using Python + XSLT + FOP
#!/usr/bin/env python3
import subprocess
from lxml import etree

# Transform XML -> FO using XSLT
xml = etree.parse('invoice.xml')
xslt = etree.parse('invoice.xslt')
transform = etree.XSLT(xslt)
fo = transform(xml)
with open('invoice.fo', 'wb') as f:
    f.write(etree.tostring(fo, pretty_print=True, encoding='utf-8'))

# Run Apache FOP (must be installed and on PATH)
subprocess.run(['fop', 'invoice.fo', 'invoice.pdf'], check=True)
This gives a repeatable, scriptable pipeline suitable for automation.
Batch Processing & Automation
For high volume, implement:
- Message queues (RabbitMQ, SQS) to accept XML jobs.
- Worker processes that run XSLT + FO or HTML → PDF conversions.
- Monitoring, retries, and manual review of failed or suspect renders.
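The worker-and-retry pattern can be sketched in-process before committing to a broker. The example below uses a thread pool as a stand-in for RabbitMQ or SQS consumers; the convert callable is a hypothetical placeholder for whatever runs the XSLT + FO or HTML → PDF step, and the retry policy (a fixed number of immediate attempts) is an assumption that a production system would likely replace with backoff.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def convert_with_retry(
    job: str, convert: Callable[[str], str], max_attempts: int = 3
) -> tuple[str, Optional[str], Optional[Exception]]:
    """Run one conversion, retrying on failure; return (job, result, error)."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return job, convert(job), None
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = exc
    return job, None, last_error


def run_batch(jobs, convert, workers: int = 4, max_attempts: int = 3):
    """Convert all jobs concurrently; return (done, failed) dictionaries."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda job: convert_with_retry(job, convert, max_attempts), jobs)
        )
    done = {job: out for job, out, err in results if err is None}
    failed = {job: err for job, out, err in results if err is not None}
    return done, failed


# Demo with a fake converter: every job fails once, and bad.xml always fails.
attempts: dict[str, int] = {}


def fake_convert(job: str) -> str:
    attempts[job] = attempts.get(job, 0) + 1
    if job == "bad.xml":
        raise ValueError("malformed XML")
    if attempts[job] == 1:
        raise RuntimeError("transient renderer error")
    return job.replace(".xml", ".pdf")


done, failed = run_batch(["a.xml", "b.xml", "bad.xml"], fake_convert, workers=2)
```

Jobs that end up in failed are the natural candidates for the manual review step; everything in done can be delivered without intervention.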
Tooling Summary
- Apache FOP: free, XSL-FO to PDF, production-proven.
- PrinceXML: commercial, best CSS Paged Media support.
- wkhtmltopdf: open-source, WebKit-based HTML to PDF.
- WeasyPrint: Python-based HTML/CSS to PDF.
- ReportLab: programmatic PDFs in Python.
- iText / iTextSharp: Java/.NET programmatic PDF library (license considerations).
Common Problems & Troubleshooting
- Poor pagination: use explicit page masters (XSL-FO) or CSS page-break rules.
- Missing fonts or garbled characters: embed fonts and ensure UTF-8 encoding.
- Images not found: use absolute paths or ensure renderer has access to image files.
- Performance: precompile XSLTs where possible and scale worker nodes for concurrency.
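Missing images are easier to catch before rendering than after. The sketch below scans an FO document for fo:external-graphic elements and reports any src that does not resolve to a local file. The function name is hypothetical, the sample FO is stripped down to the elements being checked, and remote URLs are deliberately skipped.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

FO_NS = "http://www.w3.org/1999/XSL/Format"


def missing_images(fo_text: str, base_dir: Path) -> list[str]:
    """Return src values of fo:external-graphic elements with no local file."""
    missing = []
    for graphic in ET.fromstring(fo_text).iter(f"{{{FO_NS}}}external-graphic"):
        src = graphic.get("src", "")
        # Both src="logo.png" and src="url(logo.png)" appear in practice.
        match = re.fullmatch(r"url\(\s*['\"]?(.*?)['\"]?\s*\)", src)
        target = match.group(1) if match else src
        if "://" in target:
            continue  # remote or file:// URLs are outside the scope of this check
        if not (base_dir / target).is_file():
            missing.append(src)
    return missing


SAMPLE_FO = f"""<fo:root xmlns:fo="{FO_NS}">
  <fo:external-graphic src="url(logo.png)"/>
  <fo:external-graphic src="missing.png"/>
</fo:root>"""

# Demo: only logo.png exists in the base directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "logo.png").write_bytes(b"")
    report = missing_images(SAMPLE_FO, base)
```

Running a check like this in the worker before invoking the renderer turns a silent blank box in the PDF into an explicit, loggable error.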
Privacy & Security
When generating PDFs from sensitive XML (invoices, medical records, PII):
- Prefer on-premise or private cloud rendering (avoid public online converters).
- Encrypt transport (HTTPS) and store final PDFs securely.
- Limit retention of intermediate artifacts (delete FO files after generation if not needed).
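One way to limit retention of intermediate files is to keep them in a private temporary directory that is removed as soon as rendering finishes. The sketch below shows the idea; render is a hypothetical callable standing in for the actual FO processor invocation, and the function name is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable


def render_without_leftovers(
    fo_text: str, pdf_path: Path, render: Callable[[Path, Path], None]
) -> Path:
    """Write the FO to a private temp dir, render it, then let the dir be deleted.

    Returns the (now removed) intermediate path so callers can log or verify it.
    """
    # TemporaryDirectory creates a directory readable only by the current user
    # and deletes it, with its contents, when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="xml2pdf-") as workdir:
        fo_path = Path(workdir) / "document.fo"
        fo_path.write_text(fo_text, encoding="utf-8")
        render(fo_path, pdf_path)
    return fo_path
```

In a real pipeline, render might wrap a subprocess call to fop; the cleanup happens even if that call raises, because the context manager runs on the way out of the block either way.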
FAQs
Q: Which approach should I choose — XSL-FO or HTML?
A: If you need precise pagination, footnotes, or print-quality control, XSL-FO (Apache FOP) is usually the better fit. If your team is already strong with HTML/CSS and wants rapid styling, use HTML with Prince, WeasyPrint, or wkhtmltopdf.
Q: Can I convert XML to PDF for dynamic templates (invoices, certificates)?
A: Yes — parameterize your XSLT or HTML templates and generate per-document PDFs in an automated pipeline.
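As a minimal sketch of a parameterized template, the example below fills an HTML certificate from per-recipient records using Python's string.Template. The template text, field names, and record values are made up for illustration; each resulting HTML string would then go through whichever HTML → PDF renderer you use.

```python
from html import escape
from string import Template

# $name, $course, and $date are placeholders filled per document.
CERTIFICATE = Template(
    "<html><body><h1>Certificate of Completion</h1>"
    "<p>Awarded to <strong>$name</strong> for $course on $date.</p>"
    "</body></html>"
)


def render_certificates(records: list[dict]) -> dict[str, str]:
    """Return one HTML document per record, keyed by the record's id."""
    return {
        rec["id"]: CERTIFICATE.substitute(
            {key: escape(str(value)) for key, value in rec.items()}
        )
        for rec in records
    }


pages = render_certificates(
    [
        {"id": "c1", "name": "Ada <L>", "course": "XSLT Basics", "date": "2025-10-25"},
        {"id": "c2", "name": "Grace", "course": "XSL-FO Layout", "date": "2025-10-26"},
    ]
)
```

substitute raises KeyError when a record lacks a field, which is usually preferable to silently emitting a certificate with a blank name.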
Q: Do I need to learn XSL-FO?
A: Not always — if your team prefers CSS, transform XML to HTML + CSS. But XSL-FO gives very fine control for print workflows.
Final Thoughts
Converting XML to PDF is a solved but nuanced problem: pick the right tool for your needs. Use XSL-FO and Apache FOP for print-grade documents and complicated pagination. Use HTML/CSS pipelines (Prince, wkhtmltopdf, WeasyPrint) for designer-friendly output. Choose programmatic libraries for highly customized or logic-heavy generation. Automate the pipeline for scale, secure it for privacy, and validate output against your business requirements.