Like many of you (yes, I know who you are), I’ve used Adobe’s Acrobat Portable Document Format - PDF - many times. How can you not? In the days when it was hard to share documents, PDF made things so much easier. PDF files are a great (free) way to distribute documents that describe layout, fonts, graphics and text in a flat form, but there are interactive versions too; in particular, so-called AcroForms, which allow users to enter form data and save it. That’s what this brief article is about - editable PDF files and, in particular, how to read them in C#.
If you do a quick Google search for the available libraries, you will come up with a few possibilities but in my opinion, it comes down to using iTextSharp. The two options are the freely usable version (a 4.x release) and the one you are meant to pay for (5.5.6), which comes with lots of support, has fixed lots of bugs and has no further potential licensing issues. Clearly then, we’ll go for version 4! :-)
Getting hold of version 4 of iTextSharp is easy if you use the NuGet Package Manager in Visual Studio. Go to the menu Tools -> NuGet Package Manager -> Manage NuGet Packages for Solution and fill the fields in as in the image below (see yellow highlighting). You can see I have looked online for itextsharp and picked the one with the title: “iTextSharp, a .NET PDF library”.
We’ve got the package DLLs as part of our project, but don’t forget to add using directives for the classes you will need; PdfReader, for example, lives in the iTextSharp.text.pdf namespace.
This example is interested strictly in form data and, for illustration purposes, I am not going to read it in any particular order or do anything useful with it.
var reader = new PdfReader(@"G:\OoPdfFormExampleFilled.pdf");
As you can see, we simply open up the PDF file and then iterate over each of the keys before extracting the field data for it using GetField(). Sadly, the version 4 PdfReader class doesn’t support System.IDisposable, so you must remember to close the file yourself and can’t use a using statement to envelop everything.
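Here is a minimal sketch of those steps: open the file, walk the keys, call GetField() on each and close the reader by hand. It assumes the version 4 API, where the form is reached through reader.AcroFields. The class name FormDump and the try/finally arrangement are only illustrative, so treat this as a sketch rather than the definitive way to do it.

```csharp
using System;
using iTextSharp.text.pdf;

class FormDump   // hypothetical class name, for illustration only
{
    static void Main()
    {
        var reader = new PdfReader(@"G:\OoPdfFormExampleFilled.pdf");
        try
        {
            // AcroFields exposes the form data held in an AcroForm PDF.
            AcroFields form = reader.AcroFields;

            // Fields is keyed by field name; the order is not guaranteed.
            foreach (string key in form.Fields.Keys)
            {
                string value = form.GetField(key);
                Console.WriteLine("Key: \"{0}\" Value: \"{1}\"", key, value);
            }
        }
        finally
        {
            // No using statement is available here, so close the file by hand,
            // even if reading the fields throws.
            reader.Close();
        }
    }
}
```

The try/finally block stands in for the using statement we can’t have: the reader gets closed whether or not the loop completes.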
Key: "Given Name Text Box " Value: "Stephen"
You will notice that checkboxes have values which are "Yes" (Groan: I know, I know…) and all others can be treated as text. Pretty simple and a testament to how well this library handles things for you.
So far, in my limited use, I haven’t had any real problems or encountered bugs, but of course, they are there. Use this with some caution but if it isn’t mission critical, I don’t think you can go far wrong.
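If you would rather deal with true/false than the "Yes" strings that checkboxes return, a small helper along these lines might do. It is only a sketch: CheckBoxHelper is a hypothetical name, form is assumed to be the AcroFields object taken from reader.AcroFields, and "Yes" is assumed to be the "on" value because that is what this particular file uses; other forms may well export something different.

```csharp
using System;
using iTextSharp.text.pdf;

static class CheckBoxHelper   // hypothetical helper, not part of iTextSharp
{
    // Assumption: this form's checkboxes use "Yes" as their "on" value.
    // Another form could use a different export value.
    public static bool IsChecked(AcroFields form, string key)
    {
        // Only treat the value as a tick if the field really is a checkbox.
        return form.GetFieldType(key) == AcroFields.FIELD_TYPE_CHECKBOX
            && form.GetField(key) == "Yes";
    }
}
```

Checking the field type first means a text box that happens to contain the word "Yes" is not mistaken for a ticked checkbox.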
Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com