Reading PDF Form Data in C#

Like many of you (yes, I know who you are), I’ve used Adobe’s Portable Document Format (PDF) many times. How can you not? Back when it was hard to share documents because opening Word files was a bit hit and miss, PDF made things so much easier.

PDFs are a great (free) way to distribute flat documents, describing their layout, fonts, graphics and text, but there are interactive versions too: in particular, so-called AcroForms, which allow users to enter form data and save it. That’s what this brief article is about: editable PDF files and, in particular, how to read them in C#.

If you do a quick Google search for the libraries available, you will come up with a few possibilities, but in my opinion it comes down to using iTextSharp. The two options are the freely usable version (4.0.3.0) and the one you are meant to pay for (5.5.6), which comes with lots of support, has had lots of bugs fixed and carries no further potential licensing issues.

Clearly then, we’ll go for version 4! 🙂

An Example Form

First things first: I need a form. I found one online (courtesy of Foersom Engineering Solutions, thank you!) and filled it in.

Image: the example PDF form with its entries filled in.

Downloading and Installing

Getting hold of version 4.0.3.0 of iTextSharp is easy if you use the NuGet Package Manager in Visual Studio. Go to the menu:

Tools -> NuGet Package Manager -> Manage NuGet Packages for Solution

and fill in the fields as shown in the image below (see the yellow highlighting). You can see I have searched online for itextsharp and picked the package titled “iTextSharp, a .NET PDF library”.

Image: installing the iTextSharp package with the NuGet Package Manager.

Next, click Install then OK and Close.

The Using Directives

The package DLLs are now part of our project, but don’t forget to add a using directive for the namespace whose classes you need: PdfReader and AcroFields both live in iTextSharp.text.pdf.

Traversing the Forms Data

This example is only interested in form data, and for illustration purposes I am not going to read it in any particular order or do anything useful with it.

The approach is simple: open the PDF file with a PdfReader, iterate over each of the field names (the keys of its AcroFields.Fields collection) and extract each field’s value using GetField().
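A minimal sketch of that loop, assuming iTextSharp’s PdfReader and AcroFields classes from the iTextSharp.text.pdf namespace. The FormFieldDumper class name and the form.pdf path are hypothetical, chosen only for illustration. The reader is closed in a finally block, because a using statement isn’t available for PdfReader in this version.

```csharp
using System;
using iTextSharp.text.pdf; // PdfReader and AcroFields live in this namespace

// Hypothetical class name: a sketch that prints every AcroForm field in a PDF.
public static class FormFieldDumper
{
    public static void PrintFields(string path)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            AcroFields form = reader.AcroFields;

            // The keys are the field names; don't rely on any particular order.
            foreach (string name in form.Fields.Keys)
            {
                // GetField returns the field's current value as a string.
                Console.WriteLine($"{name}: {form.GetField(name)}");
            }
        }
        finally
        {
            // No Dispose() to lean on here, so always close the reader explicitly.
            reader.Close();
        }
    }
}
```

Calling FormFieldDumper.PrintFields("form.pdf") writes one "name: value" line per field to the console.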

Sadly, the PdfReader class in this version doesn’t implement System.IDisposable, so you can’t wrap everything in a using statement; you must remember to call Close() on it yourself.
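If you would still like the safety of a using statement, one option is a tiny adapter whose Dispose() calls a close action you supply. This is a hypothetical helper, not part of iTextSharp; it is generic over any object that has to be closed, so wiring it up to PdfReader (as in the note after the code) is an assumption about how you might use it.

```csharp
using System;

// Hypothetical adapter: lets an object that has Close() but no IDisposable
// sit at the top of a using statement by running the close action on Dispose().
public sealed class CloseOnDispose<T> : IDisposable where T : class
{
    private readonly Action<T> _close;
    private bool _disposed;

    public CloseOnDispose(T value, Action<T> close)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (close == null) throw new ArgumentNullException(nameof(close));
        Value = value;
        _close = close;
    }

    // The wrapped object, e.g. a PdfReader.
    public T Value { get; }

    public void Dispose()
    {
        // Only close once, even if Dispose() is called repeatedly.
        if (_disposed) return;
        _disposed = true;
        _close(Value);
    }
}
```

For example, new CloseOnDispose<PdfReader>(new PdfReader("form.pdf"), r => r.Close()) can open a using block, and the reader is then closed even if reading a field throws.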

The Output

You will notice that checkboxes have the value "Off" or "Yes" (groan: I know, I know…), while all the other fields can be treated as text. Pretty simple, and a testament to how well this library handles things for you.
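Since GetField() hands checkbox states back as strings, a small helper can turn them into booleans. This is a hypothetical helper, not part of iTextSharp. It assumes that "Off" (the PDF’s standard name for the unchecked state) or an empty value means unchecked, and it treats any other value as checked, because the checked value is chosen by whoever built the form ("Yes" in this one).

```csharp
using System;

// Hypothetical helper for interpreting checkbox values returned by GetField().
public static class CheckboxValue
{
    // "Off" is the standard unchecked state; null or empty is also treated as
    // unchecked (an assumption for fields that were never set). Any other value,
    // such as "Yes", is the form's own "checked" value.
    public static bool IsChecked(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return false;
        }
        return !string.Equals(value, "Off", StringComparison.Ordinal);
    }
}
```

Passing a checkbox field’s value through CheckboxValue.IsChecked gives true for "Yes" and false for "Off".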

Final Thoughts

So far, in my limited use, I haven’t hit any real problems or bugs, but of course some will exist. Use it with some caution, but if your project isn’t mission critical, I don’t think you can go far wrong.


Written by Stephen Moon
email: stephen at logicalmoon.com
www: https://www.logicalmoon.com


This entry was posted in: c#.
