Reading PDF Form Data in C#
Like many of you (yes, I know who you are), I’ve used Adobe’s Portable Document Format (PDF) many times. How can you not? Back when it was hard to share documents, PDF made things so much easier. PDFs are a great (free) way to distribute flat documents, describing their layout, fonts, graphics and text. There are interactive versions too, in particular so-called AcroForms, which allow users to enter form data and save it. That’s what this brief article is about: editable PDF files and, in particular, how to read their form data in C#.
If you quickly Google which libraries are available, you will come up with a few possibilities, but in my opinion it comes down to iTextSharp. There are two options: the freely usable version (4.0.3.0) and the one you are meant to pay for (5.5.6), which comes with lots of support, has fixed lots of bugs and has no further potential licensing issues. Clearly then, we’ll go for version 4! :-)
An Example Form
First things first: I need a form. I found one here (courtesy of Foersom Engineering Solutions, thank you) and filled it in.
data:image/s3,"s3://crabby-images/06fc7/06fc702cf31d895845d4f763b9de8943ec6073c0" alt="Filled in form"
Downloading and Installing
Getting hold of version 4.0.3.0 of iTextSharp is easy if you use the NuGet Package Manager in Visual Studio. Go to the menu Tools -> NuGet Package Manager -> Manage NuGet Packages for Solution and fill in the fields as in the image below (see the yellow highlighting). You can see I have searched online for itextsharp and picked the one titled “iTextSharp, a .NET PDF library”.
data:image/s3,"s3://crabby-images/90c89/90c897b0f36d921118780e15b38edfc5e07d8a98" alt="How to install the package using NuGet"
Next, click Install, then OK and Close.
The Using Statement
We’ve got the package DLLs as part of our project, but don’t forget to import the namespace containing the classes you will need, as below.
```csharp
using iTextSharp.text.pdf;
```
Traversing the Forms Data
This example is only interested in form data and, for illustration purposes, I am not going to read it in any particular order or do anything useful with it.
```csharp
var reader = new PdfReader(@"G:\OoPdfFormExampleFilled.pdf");
```
The approach is simple: open the PDF file, iterate over each of the field keys, and extract the data for each one using GetField(). Sadly, PdfReader in this version doesn’t implement System.IDisposable, so you can’t wrap everything in a using statement; you must remember to call Close() on the reader yourself.
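A minimal sketch of the traversal described above, assuming the iTextSharp 4.x API (PdfReader, its AcroFields property, AcroFields.Fields, GetField() and Close()). The FormDump class and its ReadFields/PrintFields methods are hypothetical names for illustration. Since a using statement isn’t available, a try/finally block is the closest substitute: it makes sure Close() runs even if reading a field throws.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class FormDump
{
    // Hypothetical helper: reads every AcroForm field in the PDF at 'path'
    // and returns a map of field name -> field value.
    public static Dictionary<string, string> ReadFields(string path)
    {
        var values = new Dictionary<string, string>();
        var reader = new PdfReader(path);
        try
        {
            AcroFields fields = reader.AcroFields;
            // "foreach (string key ...)" casts each key, which works whether
            // Fields is a non-generic Hashtable (assumed for 4.x) or a generic dictionary.
            foreach (string key in fields.Fields.Keys)
            {
                values[key] = fields.GetField(key);
            }
        }
        finally
        {
            // PdfReader is not IDisposable here, so close it explicitly.
            reader.Close();
        }
        return values;
    }

    // Hypothetical helper: prints each field in the same "Key: ... Value: ..." style.
    public static void PrintFields(string path)
    {
        foreach (KeyValuePair<string, string> pair in ReadFields(path))
        {
            Console.WriteLine("Key: \"{0}\" Value: \"{1}\"", pair.Key, pair.Value);
        }
    }
}
```

Calling FormDump.PrintFields(@"G:\OoPdfFormExampleFilled.pdf") would then list every field in the example form.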
The Output
```text
Key: "Given Name Text Box " Value: "Stephen"
```
You will notice that checkboxes have values of "Off" or "Yes" (Groan: I know, I know…), and all other fields can be treated as text. Pretty simple, and a testament to how well this library handles things for you.
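Because checkbox values come back as strings, a small helper can turn them into booleans. This is only a sketch, and the FormValueHelpers class and IsChecked method are hypothetical names. It relies on the PDF convention that an unchecked box’s state is named "Off", while the name of the checked state is chosen by the form’s author ("Yes" in this form, and in most others). For that reason it treats anything other than "Off" or an empty value as checked, rather than testing for "Yes".

```csharp
using System;

public static class FormValueHelpers
{
    // Returns true if a checkbox value read via GetField() means "checked".
    // "Off" (or no value at all) is treated as unchecked; any other
    // state name, such as "Yes", is treated as checked.
    public static bool IsChecked(string value)
    {
        return !string.IsNullOrEmpty(value)
            && !string.Equals(value, "Off", StringComparison.Ordinal);
    }
}
```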
Final Thoughts
So far, in my limited use, I haven’t had any real problems or encountered any bugs, but of course there will be some. Use this with some caution, but if it isn’t mission critical, I don’t think you can go far wrong.
Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com