A new library has been released.

lawrence@lemmy.world · 1 year ago

A new library has been released.

Hazzard@lemm.ee · 1 year ago

Biggest mutant like this I ever made was for a government requirement to export PDFs. Best way I could find to make PDFs from PHP was a command-line tool called wkhtmltopdf, which, as the name suggests, converts HTML to PDF.

Installed a library to let me call a local install of wkhtmltopdf on the command line of the host machine. Wrote a ridiculous HTML template, with all kinds of weird styling and jank to support the older version of WebKit that wkhtmltopdf used, and saved the rendered output as a file. Then I would run wkhtmltopdf with that file as an argument.
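For anyone who wants to picture that step, here is a minimal sketch in Python rather than the original PHP. It assumes a `wkhtmltopdf` binary on the PATH and uses only the basic `wkhtmltopdf input.html output.pdf` form; the function names, the `cache_dir` folder, and the `report` file name are made up for illustration and are not from the original project.

```python
import subprocess
from pathlib import Path


def wkhtmltopdf_command(html_path, pdf_path):
    """Build the argument list for a basic HTML-to-PDF conversion."""
    return ["wkhtmltopdf", str(html_path), str(pdf_path)]


def render_pdf(html, cache_dir, name="report"):
    """Write the rendered HTML to a scratch folder, then convert it.

    Hypothetical helper: requires wkhtmltopdf to be installed locally.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    html_path = cache / f"{name}.html"
    pdf_path = cache / f"{name}.pdf"
    html_path.write_text(html, encoding="utf-8")
    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
    return pdf_path
```

Passing the arguments as a list instead of one shell string avoids quoting problems with odd file names.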

Of course, I wasn’t done here. They required that I use their existing title page, appendices, etc. Only the data in the middle was to change. So I added a whole “PDF Data” table to the database, with storage locations for them to upload something like 10+ PDFs to append at the front and back of the PDF. Did I mention this whole thing supported two languages?

So then I brought in another command-line tool, called pdftk, or PDF toolkit. I used a crazy call to pdftk to append all of these to the front and back of the document, making the result look like what they wanted. Save to that same folder, send the file to the client through PHP, and use my “command line from PHP” wizardry to rm all the files I’d made in my “cache” folder, as I called it.
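A rough idea of what such an append call can look like, again sketched in Python. It relies on pdftk's `cat` operation, which joins the input files in the order given; the helper name and the file names are hypothetical, and the real call in the project was presumably more involved.

```python
def pdftk_concat_command(front, body, back, output):
    """Build a pdftk call that joins front matter, body, and appendices.

    `front` and `back` are lists of PDF paths; `body` is the generated PDF.
    """
    inputs = [str(p) for p in front] + [str(body)] + [str(p) for p in back]
    return ["pdftk", *inputs, "cat", "output", str(output)]


# Example: title page and foreword in front, one appendix at the back.
command = pdftk_concat_command(
    ["title.pdf", "foreword.pdf"], "data.pdf", ["appendix.pdf"], "final.pdf"
)
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where pdftk is installed.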

But of course… we’re not done. Turns out appending files like this horribly breaks the PDF table of contents, which was apparently just using page numbers, not any kind of actual linking. Enter pdftk again, and now I’m running it before generating my HTML, on each and every PDF I’m going to add, to get the page count, and saving that value.
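One way to get a page count out of pdftk is its `dump_data` operation, which prints a `NumberOfPages:` line among the metadata. The sketch below assumes that approach (the original post does not say which pdftk call was used), and both function names are invented for the example.

```python
import re
import subprocess


def parse_page_count(dump_data_text):
    """Extract the page count from the text printed by `pdftk <file> dump_data`."""
    match = re.search(r"^NumberOfPages:\s*(\d+)\s*$", dump_data_text, re.MULTILINE)
    if match is None:
        raise ValueError("no NumberOfPages line found in pdftk output")
    return int(match.group(1))


def page_count(pdf_path):
    """Run pdftk on one file and return its page count (needs pdftk installed)."""
    result = subprocess.run(
        ["pdftk", str(pdf_path), "dump_data"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_page_count(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the fragile part easy to test without a PDF on hand.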

I’d then pass this crazy dictionary into my template and add “fake pages” to the start and end, with headings, and a special margin that wkhtmltopdf interprets as a page break. This even works to add my “additional documents” to the table of contents. Now, my pdftk append commands also deliberately trim the PDF, so as to replace the fake pages, keeping the page count the same, so the links work.
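The trimming trick can be sketched with pdftk's file handles and page ranges (`A=... B=... cat A B4-18 C`). To keep it short, this example assumes the front and back documents have each already been merged into a single PDF, and that the generated body contains exactly as many fake pages as the real documents have; the function and parameter names are hypothetical.

```python
def trim_and_merge_command(front, body, back, front_pages, back_pages, body_pages, output):
    """Swap the body's placeholder pages for the real front and back PDFs.

    front_pages / back_pages: page counts of the real documents, which is also
    how many fake pages the body has at its start and end.
    body_pages: total pages in the generated body, fake pages included.
    """
    first = front_pages + 1          # first real body page
    last = body_pages - back_pages   # last real body page
    if first > last:
        raise ValueError("body has no pages left after trimming placeholders")
    return [
        "pdftk", f"A={front}", f"B={body}", f"C={back}",
        "cat", "A", f"B{first}-{last}", "C",
        "output", str(output),
    ]
```

Because the real pages replace the same number of fake ones, the total page count stays the same and page-number based ToC entries keep pointing at the right place.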

So close… but it turns out wkhtmltopdf doesn’t account for when the table of contents is so ridiculously long that it goes on for more than a whole page. Did I mention these PDFs are more than 300 pages long in many instances? Suddenly every link goes to the page after the one it’s supposed to, or even a couple after. Not good.

Yeah… this is the beginning of a nightmare where I add fake table of contents pages that I cut out later with pdftk. Which means I have to somehow know in advance how many pages the ToC will be… estimation time. That’s right, I run through all the data I generate the ToC with in advance, and count the number of entries I’m going to be adding to the ToC, and, by literally counting the entries on a full ToC and saving that as a magic number, guess how many pages there will be.

Oh, but what if a line is too long, and wraps to two lines in the ToC? Well, guess who counted the number of characters in a line to produce another guesstimate? No, neither of these heuristics was perfect, and they looked like a spaghetti mess, but with enough tinkering, I got numbers that worked on everything I tested.
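The two guesstimates combine into something like the following. This is only a sketch of the idea: `chars_per_line` and `lines_per_page` stand in for the hand-counted magic numbers, the values used in the example are invented, and a real proportional font would not wrap at a fixed character count.

```python
import math


def estimate_toc_pages(titles, chars_per_line, lines_per_page):
    """Guess how many pages a table of contents will occupy.

    Each entry takes one line, plus extra lines if its title is longer than
    the assumed number of characters that fit on a line.
    """
    total_lines = sum(
        max(1, math.ceil(len(title) / chars_per_line)) for title in titles
    )
    return math.ceil(total_lines / lines_per_page)


# Five short entries and one long one that wraps onto a second line.
titles = ["Section"] * 5 + ["x" * 90]
pages = estimate_toc_pages(titles, chars_per_line=80, lines_per_page=3)
```

Here the six entries need seven lines, so at three lines per page the estimate comes out to three pages.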

And there you have it, 300+ page PDFs generated from the database, with all the title pages and such that they manually uploaded, in two languages, with a working table of contents. During my time, we never even added a cache to this monstrosity; it did all this every time the user clicked “Download PDF”. Took around 30 seconds, and the UI just pretended it was a really big file.

What a wild project, probably the biggest spaghetti mess I’ve ever written. But hey, actually met all the requirements, no matter how ridiculous, and I’m proud of that monstrosity. Probably still in use today.

Takumidesh@lemmy.world · 1 year ago

I just want you to know, I just spent the last 2 sprints dealing with wkhtmltopdf, only in .NET, using Razor Pages and MVC views to generate the templates.

I feel your pain.

lawrence@lemmy.world · 1 year ago

I had a good experience with wkhtmltopdf, which was much better than the alternatives. Fortunately, I didn’t have to deal with the TOC, so I guess I was lucky.