New Project – Specialized PDF Tool

OK, we’re going to have to put The Books on the back burner because something new has come up! Here’s the background.

I work at a law firm, and it’s a frequent task to work with exhibits, which are documents submitted to a court as evidence of something. These documents are all in the form of individual PDFs. It’s customary to add a sheet to each PDF that says Exhibit x, where x is an increasing number. The PDFs are then ordered by their exhibit number, usually by adding the exhibit number to the name, such that “Jan Invoices.pdf” becomes “01 Jan Invoices.pdf.”

We then use the pro version of Adobe Acrobat to Bates-stamp the entire set of PDFs. To Bates-stamp a set of documents means to number the pages consecutively across the entire set, rather than restarting at page one for every document. All of this is done quickly and easily, but then comes the tedious part: creating the index.

The index is a table with at least three columns: starting page, ending page, and document name. It’s tedious because you have to open every document to see the starting page number. There’s no column in File Explorer for the number of pages a PDF (or any kind of file) has. You can make this somewhat easier by using preview mode in File Explorer, but if you have fifty exhibits, this is not a fun job.

There must be a better way!

I visualized creating, at the very least, a listing of files with page counts. Of course, this would be on a form, using a DataGridView, with a toolbar with a button you could push to select the folder plus a textbox where you could type in the folder name. Something like that. So let’s go ahead, fire up Visual Studio, and create a project called PDFLegal. Well, that’s what I called it.

Now there are two questions. How are we going to access the PDFs to determine their page count? And how are we going to store this information?

To manipulate PDFs, a quick look through Google suggests that about the best choice out there is iText7. It’s open source, well-established, and was recently updated from 5 to 7. 6 was afraid to show up because 7 8 9. LOL. Anyway, we’ll need to use the NuGet package manager to install it.
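Just to show where this is headed, here’s a minimal sketch of how a page count could be read with iText7. The class and method names (PdfPageCounter, CountPages) are my own placeholders, not anything from the library, and I’m assuming the PDFs aren’t password-protected, since an encrypted file would likely throw when opened this way.

```csharp
using iText.Kernel.Pdf;

static class PdfPageCounter
{
    // Hypothetical helper: returns the number of pages in the PDF at pdfPath.
    public static int CountPages(string pdfPath)
    {
        // Open the file read-only. Disposing the PdfDocument also closes
        // the underlying PdfReader, so the file is not left locked.
        using (var document = new PdfDocument(new PdfReader(pdfPath)))
        {
            return document.GetNumberOfPages();
        }
    }
}
```

Calling PdfPageCounter.CountPages on each file in a folder would give us the raw numbers the index needs.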

While I was at it, I remembered how awful the standard folder picker was. It’s a tiny window that can’t be resized. It only has a tree view, and you can’t type in a path. Its usability is just terrible, so I was very happy to discover there’s a better option if you install WindowsAPICodePack-Shell. This offers a file and folder picker that is much improved.
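For reference, here’s roughly how the improved picker can be called. This is a sketch only: FolderPrompt and PickFolder are names I made up, and the dialog title is just an example.

```csharp
using Microsoft.WindowsAPICodePack.Dialogs;

static class FolderPrompt
{
    // Hypothetical helper: shows the resizable folder dialog and returns
    // the chosen path, or null if the user cancels.
    public static string PickFolder(string initialDirectory = null)
    {
        using (var dialog = new CommonOpenFileDialog())
        {
            dialog.IsFolderPicker = true;               // pick folders, not files
            dialog.Title = "Select the exhibit folder"; // example title only
            if (!string.IsNullOrEmpty(initialDirectory))
                dialog.InitialDirectory = initialDirectory;

            return dialog.ShowDialog() == CommonFileDialogResult.Ok
                ? dialog.FileName   // for a folder picker this is the folder path
                : null;
        }
    }
}
```

Unlike the old picker, this dialog has an address bar, so you can paste or type a path directly.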

Here’s what it should look like when you’re done installing the two NuGet packages. When you install itext7, you also get Portable.BouncyCastle and the two Common.Logging packages, which require Microsoft.CSharp. When you install the API Code Pack, you install the Shell package, which pulls in the Core package.

Now, to the second question: how are we going to store the information? We’re going to read in a folder, get the names of all the PDFs in the folder, store them somewhere, and then add the number of pages each PDF has to this somewhere, which should be suitable as a data source for our DataGridView. So what is this somewhere?

I was too busy doing research to document the process, but I assure you I looked into lists, dictionaries, structs, arrays, etc., and finally concluded the best way was to use PODs, Plain Old Datasets (sorry, don’t think POD is a thing, it’s not). But seriously, you can create a dataset without having to have a database or an object or anything. To do this, don’t use the Data Sources tab to add a new data source. Instead, use the project’s Add New Item feature, and select DataSet. This will create an empty dataset. Right-click on the canvas and select DataTable (not TableAdapter). You can see at the right the table I created.
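If you’d rather see the idea in code than in the designer, here’s a rough equivalent using a plain, untyped DataTable. To be clear, this is not the designer-generated dataset described above. The names (ExhibitIndex, StartPage, EndPage, Document) are my own assumptions, and I’m assuming the Bates numbering starts at 1.

```csharp
using System;
using System.Data;

static class ExhibitIndex
{
    // Hypothetical helper: builds an in-memory table with one row per PDF,
    // turning per-file page counts into running (Bates-style) page ranges.
    public static DataTable Build(string[] fileNames, int[] pageCounts)
    {
        if (fileNames.Length != pageCounts.Length)
            throw new ArgumentException("Each file name needs a page count.");

        var table = new DataTable("Exhibits");
        table.Columns.Add("StartPage", typeof(int));
        table.Columns.Add("EndPage", typeof(int));
        table.Columns.Add("Document", typeof(string));

        int nextPage = 1; // assumption: numbering starts at page 1
        for (int i = 0; i < fileNames.Length; i++)
        {
            int endPage = nextPage + pageCounts[i] - 1;
            table.Rows.Add(nextPage, endPage, fileNames[i]);
            nextPage = endPage + 1; // the next document continues the count
        }
        return table;
    }
}
```

With a 3-page file followed by a 5-page file, the rows come out as pages 1 to 3 and 4 to 8. The resulting table can be assigned straight to a DataGridView’s DataSource property.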

Next, we’ll create a form and start working on the code.
