Why is PDF handling complicated in Mirth?

This post explains why working with PDFs is complicated in Mirth and how you can split a PDF in Mirth.

Splitting and parsing PDFs is complicated in Mirth because of the libraries these operations depend on.

In Java, if you want to split a PDF, parse its content, or manipulate its data, you would typically use one of two well-known libraries:

  1. PDFBox from Apache
  2. iText library (AGPL or commercially licensed from version 5 onward)

If you want a stable release from the older iText 5 line, 5.5.1 is a good choice, but note that it is also covered by the AGPL, so check the license terms before using it.

The problem with using a PDF library in Mirth:

Mirth already bundles both of these libraries for its own functionality.

For example, Mirth uses PDFBox v1.8.4 for the PDF viewer extension. If you add a newer version of the PDFBox library to the custom library folder or any other location, it won't work.

This happens because Mirth cannot determine which version of the library to load. You can see these two libraries in the location shown in the screenshot below:

How to use the PDFBox library:

The best library for most PDF operations is Apache PDFBox.

First, Mirth needs to read the fonts of the PDF it is supposed to manipulate. To do that, add another library, fontbox-1.8.4, to the C:\Program Files\Mirth Connect\extensions\doc\lib location.

Then add this library path to destination.xml in C:\Program Files\Mirth Connect\extensions\doc as <library type="SERVER" path="lib/fontbox-1.8.4.jar" />

Another approach:

If you do not need the PDF Viewer functionality in Mirth, you can disable that extension and provide a later version (v2.5) of the PDFBox library in the same way as described above.

Note: Using the PDFBox library without following one of the above approaches will not work. It will always throw an error.

Is splitting a PDF possible inside Mirth?

Yes, it certainly is.

A Java program that does this with the Apache PDFBox library already exists: https://gist.github.com/actsasflinn/4516ae1c322447bdc2634fab9240d70c

I took the same function and converted it to JavaScript so that it can run inside Mirth. Here is a sample of the converted code.

Code converted from Java to JavaScript: splitting a PDF

// Written against the PDFBox 1.8.x API that ships with Mirth
// (loadNonSeq, org.apache.pdfbox.util.PDFTextStripper, getAllPages).
// AnyValueOfYourChoice, DataNeedToBeCheckedFor and data are placeholders
// that you must define for your own channel.
var AnyValueOfYourChoice = '';

for (var q = 0; q < data.length; q++) {
    var inputDocument = org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(new java.io.File(globalMap.get('pdfReaderFilePath') + $('originalFilename')), null);
    var stripper = new org.apache.pdfbox.util.PDFTextStripper();
    var outputDocument = new org.apache.pdfbox.pdmodel.PDDocument();
    var uuid = UUIDGenerator.getUUID(); // not used below
    // Pattern.compile is a static method, so it is called without "new".
    var p = java.util.regex.Pattern.compile(DataNeedToBeCheckedFor);

    for (var page = 1; page <= inputDocument.getNumberOfPages(); ++page) {
        // Extract the text of the current page only.
        stripper.setStartPage(page);
        stripper.setEndPage(page);
        var text = stripper.getText(inputDocument);
        var m = p.matcher(text);
        if (m.find()) {
            // getAllPages() is zero-based; the stripper's page numbers start at 1.
            var pdPage = inputDocument.getDocumentCatalog().getAllPages().get(page - 1);
            outputDocument.importPage(pdPage);
        }
    }

    var output_file = new java.io.File(globalMap.get('newPdfReaderPath') + $('fileNameDocType') + '_' + AnyValueOfYourChoice + '.pdf');
    outputDocument.save(output_file);
    outputDocument.close();
    inputDocument.close();
}
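
The page-selection step in the loop above can be tried outside Mirth. Here is a minimal sketch in plain JavaScript, assuming the text of each page has already been extracted into an array. The function name selectMatchingPages and the sample page texts are hypothetical; they are not part of the original code or of PDFBox. The function returns the 1-based numbers of the pages whose text matches the pattern, which are the pages the Mirth code would import into the output document.

```javascript
// Hypothetical helper: returns the 1-based numbers of the pages whose
// extracted text matches the given pattern. PDFBox is not used here;
// pageTexts stands in for the per-page output of PDFTextStripper.
function selectMatchingPages(pageTexts, pattern) {
  var regex = new RegExp(pattern);
  var matches = [];
  for (var page = 1; page <= pageTexts.length; page++) {
    // Arrays are zero-based, page numbers start at 1.
    if (regex.test(pageTexts[page - 1])) {
      matches.push(page);
    }
  }
  return matches;
}

// Made-up sample data, for illustration only.
var samplePages = [
  'Discharge Summary for patient A',
  'Lab Report: CBC',
  'Discharge Summary for patient B'
];
var selectedPages = selectMatchingPages(samplePages, 'Discharge Summary');
console.log(selectedPages.join(',')); // prints "1,3"
```

Java's java.util.regex and JavaScript's RegExp differ in some syntax details, so it is worth re-checking any complex pattern inside Mirth itself.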
