mule-pdfbox-module

Add native PDF manipulation to your MuleSoft flows with Apache PDFBox. This connector provides high-performance PDF operations, with PDFBox as its only external dependency.

PDF Manipulation

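  • Q: Can I delete blank PDF pages before submitting a document to MuleSoft IDP, to make the response more manageable and cheaper?
  • A: Yes. Use the Apache PDFBox module: the filterPages operation removes blank pages, and splitPages breaks the document into single pages if you need to handle them individually.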

Value Adds

Key Features

  • Metadata Extraction – Get author, title, number of pages, and more.
  • Text Extraction – Pull text from a specific range of pages.
  • Blank Page Removal – Clean your documents before delivery.
  • Page Rotation – Rotate document pages as needed.
  • PDF Splitting – Break large PDFs into separate single-page files.
  • PDF Merging – Combine multiple PDFs into a single cohesive document.

Built For Developers

  • Lightweight, single-dependency module
  • Built with the MuleSoft Java SDK
  • Input/output via standard Java streams

Under the Hood

  • Built using Apache PDFBox
  • Fully compatible with Mule 4.x
  • Supports page ranges and parses PDFs robustly

Implemented Operations:

1. extractPdfInfo

  • Purpose: Extracts document metadata such as number of pages, author, title, subject, and version.
  • Input: InputStream of the PDF.
  • Output: POJO with document properties.
  • Under the Hood: PDDocumentInformation
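
As a rough illustration of the "POJO with document properties" described above, here is a minimal sketch of what such a result object could look like. The class name `PdfInfo` and its field names are hypothetical and not taken from the module. The sketch also assumes that missing metadata entries arrive as `null` (which is how PDFBox reports unset author, title, or subject values) and normalizes them to empty strings so that downstream DataWeave or Java code does not need null checks.

```java
// Hypothetical result type: the module's real class and field names may differ.
final class PdfInfo {
    private final int pageCount;
    private final String author;   // empty when the PDF has no author entry
    private final String title;    // empty when the PDF has no title entry
    private final String subject;  // empty when the PDF has no subject entry
    private final float version;   // PDF specification version, e.g. 1.7f

    PdfInfo(int pageCount, String author, String title, String subject, float version) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must be >= 0");
        }
        this.pageCount = pageCount;
        this.author = orEmpty(author);
        this.title = orEmpty(title);
        this.subject = orEmpty(subject);
        this.version = version;
    }

    // Assumption: unset metadata is passed in as null and exposed as "".
    private static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    int getPageCount() { return pageCount; }
    String getAuthor() { return author; }
    String getTitle() { return title; }
    String getSubject() { return subject; }
    float getVersion() { return version; }

    @Override
    public String toString() {
        return "PdfInfo{pages=" + pageCount + ", author='" + author + "', title='" + title
                + "', subject='" + subject + "', version=" + version + "}";
    }
}
```

An immutable object like this is safe to pass between flow components, since nothing can modify it after extraction.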

2. extractTextByPageRange

  • Purpose: Extracts plain text from a given page range.
  • Input: PDF stream + optional startPage / endPage.
  • Output: Extracted text as a string.
  • Under the Hood: PDFTextStripper
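
Because startPage and endPage are optional, the operation has to turn them into a concrete range before extracting text. The sketch below shows one way that could be done. The class `PageRange`, the defaults (first page and last page), and the choice to clamp an oversized endPage are all assumptions made for illustration; the module's actual behavior is not documented here. Pages are treated as 1-based and inclusive, which matches the convention PDFTextStripper uses for its start and end page settings.

```java
// Hypothetical helper: resolves optional page bounds to a concrete 1-based, inclusive range.
final class PageRange {
    final int start;
    final int end;

    private PageRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    static PageRange resolve(Integer startPage, Integer endPage, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("document has no pages");
        }
        // Assumed defaults: a missing start means page 1, a missing end means the last page.
        int start = (startPage == null) ? 1 : startPage;
        // Assumed policy: an end beyond the document is clamped rather than rejected.
        int end = (endPage == null) ? pageCount : Math.min(endPage, pageCount);
        if (start < 1 || start > end) {
            throw new IllegalArgumentException(
                    "invalid page range " + start + ".." + end + " for " + pageCount + " pages");
        }
        return new PageRange(start, end);
    }
}
```

For a 10-page document, `PageRange.resolve(3, null, 10)` yields pages 3 through 10, while a start page greater than the end page is rejected with an exception.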

3. filterPages

  • Purpose: Removes blank pages and/or filters based on a page range.
  • Mechanism: Detects blankness using text visibility, annotations, and embedded images.
  • Parameters: Page range, remove blank pages flag.
  • Output: Filtered PDF stream.
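
The blank-page rule described above (no visible text, no annotations, no embedded images) can be sketched independently of PDFBox. In the example below, `PageSummary` is a hypothetical per-page summary that a caller would populate from the parsed document, and `pagesToKeep` combines the page range with the remove-blank-pages flag. The names and the exact rule are assumptions for illustration, not the module's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class BlankPageFilter {

    // Hypothetical per-page summary, filled in from whatever the PDF parser reports.
    static final class PageSummary {
        final String text;
        final int annotationCount;
        final int imageCount;

        PageSummary(String text, int annotationCount, int imageCount) {
            this.text = (text == null) ? "" : text;
            this.annotationCount = annotationCount;
            this.imageCount = imageCount;
        }

        // Assumed rule: a page is blank only if all three signals are empty.
        boolean isBlank() {
            return text.trim().isEmpty() && annotationCount == 0 && imageCount == 0;
        }
    }

    // Returns the 1-based numbers of pages inside [start, end] that survive the filter.
    static List<Integer> pagesToKeep(List<PageSummary> pages, int start, int end,
                                     boolean removeBlank) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            int pageNumber = i + 1;
            if (pageNumber < start || pageNumber > end) {
                continue; // outside the requested range
            }
            if (removeBlank && pages.get(i).isBlank()) {
                continue; // blank page dropped
            }
            kept.add(pageNumber);
        }
        return kept;
    }
}
```

Checking images and annotations as well as text matters for scanned documents, where a page can carry content without containing any extractable text.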

4. rotatePages

  • Purpose: Rotates pages within a specified range clockwise or counterclockwise.
  • Parameters: Page range, rotation direction.
  • Output: Modified PDF stream.
  • Under the Hood: PDPage.setRotation
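
A PDF page's rotation is stored as a clockwise angle in degrees that must be a multiple of 90, so a rotation direction has to be translated into a new absolute angle before it is applied. The sketch below shows that arithmetic. The `Direction` enum and the assumption that each call rotates by a single 90-degree step are hypothetical; the module's actual parameter type and step size may differ.

```java
final class PageRotation {

    // Hypothetical direction parameter.
    enum Direction { CLOCKWISE, COUNTERCLOCKWISE }

    // Returns the new absolute rotation in degrees, normalized to 0, 90, 180, or 270.
    // Assumption: one call rotates by a single 90-degree step.
    static int rotate(int currentDegrees, Direction direction) {
        int delta = (direction == Direction.CLOCKWISE) ? 90 : -90;
        // floorMod keeps the result non-negative, so -90 becomes 270.
        return Math.floorMod(currentDegrees + delta, 360);
    }
}
```

Adding to the page's current rotation, rather than overwriting it, keeps pages that are already rotated consistent with the requested direction.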

5. splitPages

  • Purpose: Splits a PDF into individual pages.
  • Output: A list of InputStreams, each containing a single-page PDF.
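
Since the result is a list of streams, Java code that consumes it needs to read and close each one. The following sketch shows one way a caller might buffer every single-page PDF into memory. `SplitResultReader` is a hypothetical helper, not part of the module, and buffering everything is only reasonable for documents of modest size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer-side helper for a list of single-page PDF streams.
final class SplitResultReader {

    // Reads each stream fully, closes it, and returns the page bytes in order.
    static List<byte[]> readAll(List<InputStream> pages) {
        List<byte[]> result = new ArrayList<>();
        for (InputStream page : pages) {
            try (InputStream in = page) {
                result.add(in.readAllBytes()); // requires Java 9 or later
            } catch (IOException e) {
                throw new UncheckedIOException("failed to read split page", e);
            }
        }
        return result;
    }
}
```

Inside a Mule flow, the same list would more typically be iterated with a For Each scope, with each stream passed on as the payload.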

6. mergePdfs (new in 1.0.1)

  • Purpose: Combines two or more PDF documents into one.
  • Input: A list of PDF InputStreams.
  • Output: A single merged PDF stream with extracted metadata.
  • Under the Hood: PDFMergerUtility + RandomAccessReadBuffer

Released under the MIT License.