Apache PDFBox - Image to PDF
Operation Name
Apache PDFBox – Image to PDF (imageToPdf)
Description
Converts a single image (e.g., JPG, PNG, GIF, BMP) into a one-page PDF document. The resulting PDF has a single page sized to match the dimensions of the source image.
Perfect for embedding scanned receipts, screenshots, or images into a PDF workflow.
Inputs
- Image File [Binary] (InputStream): Binary content of the image (JPEG, PNG, etc.).
Output
- Payload: InputStream (binary stream). A one-page PDF containing the provided image.
- Attributes: PdfBoxFileAttributes. Metadata of the generated PDF: number of pages (always 1), file size, creation date, etc.
Notes
- The generated page is sized to the image's pixel dimensions, with one pixel mapped to one PDF point (1/72 inch); the image is drawn at that size without scaling.
- You can adjust scaling or margins later by combining this with other operations (e.g., rotatePages, filterPages).
- Currently supports formats loadable by PDImageXObject (PNG, JPEG, BMP, GIF).
- Always outputs a single-page PDF regardless of the image dimensions.
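Because the page takes its width and height directly from the image's pixel counts, and PDF page dimensions are measured in points (1/72 inch), a high-resolution scan produces a physically large page. The following is a small illustrative calculation, not part of the connector; the class and method names (PageSizeEstimate, inches, pointsAtDpi) are hypothetical.

```java
/** Illustrative arithmetic only: relates pixel dimensions to PDF page size. */
class PageSizeEstimate {

    /** PDF user space uses 72 points per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Physical length in inches of a page edge when one pixel is mapped to one point. */
    static double inches(int pixels) {
        return pixels / POINTS_PER_INCH;
    }

    /** Page edge in points that would preserve the given scan resolution (hypothetical helper). */
    static double pointsAtDpi(int pixels, double dpi) {
        return pixels * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        // Example input: an A4 sheet scanned at 300 DPI is 2480 x 3508 pixels.
        int width = 2480;
        int height = 3508;

        System.out.printf("Unscaled page: %.1f x %.1f in%n",
                inches(width), inches(height));
        System.out.printf("Page preserving 300 DPI: %.1f x %.1f pt%n",
                pointsAtDpi(width, 300), pointsAtDpi(height, 300));
    }
}
```

For the 2480 x 3508 pixel example, the unscaled page is roughly 34.4 by 48.7 inches, while a page of about 595 by 842 points (A4) would keep the original 300 DPI. This is worth checking before sending the output to a printer or viewer that assumes physical page sizes.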
Underlying Application Interface
Pseudo Code
Operation: imageToPdf
Input:
    imageStream: Binary content of the image (InputStream)
    streamingHelper: MuleSoft StreamingHelper (for context/utilities)
Output:
    Result containing:
        - A single InputStream representing the new one-page PDF document.
        - PdfBoxFileAttributes containing metadata of the generated PDF (page count, size, etc.).
Errors:
    PDF_PROCESSING_ERROR: If the input is not a valid image or embedding fails.
    PDF_METADATA_EXTRACTION_FAILED: If metadata cannot be retrieved from the generated PDF.
Steps:
    1. Convert the imageStream to a byte array.
    2. Create a new empty PDDocument.
    3. Try to create a PDImageXObject from the byte array inside the document.
       If this fails, throw a ModuleException with PDF_PROCESSING_ERROR.
    4. Get the width and height of the image from the PDImageXObject.
    5. Create a new PDRectangle using the image's width and height.
    6. Create a new PDPage with this rectangle and add it to the document.
    7. Create a PDPageContentStream bound to the new page.
    8. Draw the image on the page at coordinates (0,0) with full width and height.
    9. Close the content stream.
    10. Create a ByteArrayOutputStream.
    11. Save the PDF document to the ByteArrayOutputStream.
    12. Try to extract metadata from the PDF (using extractPdfMetadata).
        If extraction fails, throw a ModuleException with PDF_METADATA_EXTRACTION_FAILED.
    13. Convert the ByteArrayOutputStream to a ByteArrayInputStream.
    14. Create a new Result object containing:
        - The ByteArrayInputStream as output.
        - The PdfBoxFileAttributes as attributes.
    15. Return the Result object.
    16. In a finally block, ensure the PDDocument is closed to release resources.
Methods used from the Apache PDFBox library
📂 Classes & Methods from PDFBox
PDDocument
- new PDDocument() → creates a new empty PDF document.
- addPage(PDPage page) → adds a new page to the document.
- save(OutputStream) → saves the document to an output stream.
- close() → releases resources held by the document.
PDImageXObject
- PDImageXObject.createFromByteArray(PDDocument doc, byte[] imageData, String name) → creates a PDF image object from raw image bytes.
PDRectangle
- new PDRectangle(float width, float height) → defines a rectangle (page size) with the same dimensions as the image.
PDPage
- new PDPage(PDRectangle mediaBox) → creates a new page with the given dimensions.
PDPageContentStream
- new PDPageContentStream(PDDocument doc, PDPage page) → creates a content stream to draw onto a page.
- drawImage(PDImageXObject image, float x, float y, float width, float height) → draws the image onto the page at (x, y) scaled to width/height.
- close() → finalizes and closes the content stream.
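The pseudo code steps and the PDFBox calls listed above can be combined into a short Java sketch. This is a minimal illustration, not the connector's actual source. It makes several assumptions:
- The class name ImageToPdfSketch and the resource name "image" are hypothetical.
- A plain IOException stands in for the connector's ModuleException with PDF_PROCESSING_ERROR.
- The metadata step (extractPdfMetadata and PdfBoxFileAttributes) is omitted because it belongs to the connector.
- It needs PDFBox on the classpath and Java 9 or later for InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

/** Hypothetical sketch of the imageToPdf steps; not the connector's real class. */
class ImageToPdfSketch {

    /** Converts one image into a one-page PDF whose page matches the image size. */
    static InputStream imageToPdf(InputStream imageStream) throws IOException {
        // Read the whole image into memory (Java 9+).
        byte[] imageBytes = imageStream.readAllBytes();

        // try-with-resources closes the document, like the finally block in the steps.
        try (PDDocument document = new PDDocument()) {
            PDImageXObject image;
            try {
                image = PDImageXObject.createFromByteArray(document, imageBytes, "image");
            } catch (IOException | IllegalArgumentException e) {
                // Assumption: the connector would raise PDF_PROCESSING_ERROR here.
                throw new IOException("Input is not a supported image", e);
            }

            // Page size in points equals the image size in pixels.
            PDRectangle pageSize = new PDRectangle(image.getWidth(), image.getHeight());
            PDPage page = new PDPage(pageSize);
            document.addPage(page);

            // Draw the image so that it covers the whole page, starting at the origin.
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.drawImage(image, 0, 0, pageSize.getWidth(), pageSize.getHeight());
            }

            // Serialize the document and hand it back as a readable stream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return new ByteArrayInputStream(out.toByteArray());
        }
    }
}
```

The content stream is closed before the document is saved, matching the order of the steps. Buffering the whole PDF in a byte array keeps the sketch simple, but very large images may call for a file-backed stream instead.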
Supported Image Formats:
- JPEG / JPG
- PNG
- BMP
- GIF
