Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

From: Ken Sharp
Subject: Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
Date: Tue, 19 Sep 2017 15:03:14 +0100

At 15:44 19/09/2017 +0200, David Kastrup wrote:

> Are there any example documents with thousands of pages and ten
> thousands of PDF inclusions one could look at?

I would suggest that the fact you want to 'include' tens of thousands of PDF files is the problem, really.

I appreciate you are trying to deal with an existing problem, but using Ghostscript to do something it wasn't intended for isn't really the best idea for solving the problem.

As I've said elsewhere, there is a genuine bug which can be exposed by doing what you want with Ghostscript, and it would not surprise me if in the long run it causes you another problem.

It would be possible to write a tool which could reliably detect identical fonts in a PDF file, remove the duplicates and alter the references so that the PDF continued to work. In all honesty, if the problem is as important as you say, this is probably a better solution. A tailored program, designed to solve one specific problem, is much more likely to work reliably than a general-purpose program designed for a different problem.

That said, it would be quite a big job, and I'm not actually offering to take it on.
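
To give a rough idea of the shape of such a tool, here is a minimal Python sketch of the "detect duplicates, keep one copy, alter the references" step. It is only an illustration under heavy assumptions: it does not parse PDF at all, the `fonts` and `pages` dictionaries are a hypothetical stand-in for the font objects and page resources a real tool would have to extract, and "identical" is taken to mean byte-identical font data. Fonts that are the same typeface but embedded as different subsets would not match under this test, and detecting those reliably is where most of the work would be.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical, simplified model (not a real PDF structure):
# object number -> raw bytes of the embedded font program.
FontTable = Dict[int, bytes]
# page number -> list of font object numbers that page references.
PageRefs = Dict[int, List[int]]


def dedupe_fonts(fonts: FontTable, pages: PageRefs) -> Tuple[FontTable, PageRefs]:
    """Keep one copy of each byte-identical font and redirect references to it."""
    canonical_by_digest: Dict[str, int] = {}
    remap: Dict[int, int] = {}

    # Visit objects in ascending order so the lowest object number is kept.
    for obj_num in sorted(fonts):
        digest = hashlib.sha256(fonts[obj_num]).hexdigest()
        keeper = canonical_by_digest.setdefault(digest, obj_num)
        remap[obj_num] = keeper

    # Drop every font that was remapped to a different (canonical) object.
    kept = {n: data for n, data in fonts.items() if remap[n] == n}
    # Rewrite page references; unknown references are left untouched.
    new_pages = {p: [remap.get(ref, ref) for ref in refs] for p, refs in pages.items()}
    return kept, new_pages


# Made-up example data: objects 10 and 12 carry the same font bytes.
fonts = {10: b"FONT-A", 11: b"FONT-B", 12: b"FONT-A"}
pages = {1: [10, 11], 2: [12]}
kept, new_pages = dedupe_fonts(fonts, pages)
```

In this example object 12 is dropped and page 2 is redirected to object 10. A real tool would also have to rewrite the file's cross-reference table and cope with fonts that differ only in subset or naming, none of which is attempted here.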

My suggestion, which may not be feasible, is to keep everything in an editable format until the last second.

This is extracted from an email I decided earlier not to send:

While I can tell you a lot about PostScript and PDF, I can't help you at all with TeX. In general, however, my experience of working with large documents is that the content should be maintained in the layout application's native format until the last moment. Broadly speaking this is similar to keeping bitmap data in something like TIFF and only converting to JPEG at the last moment, and for similar reasons.

When you create a PDF you are discarding all the 'metadata' that describes the layout to the typesetting or layout application. It's all but impossible to recover that information once it's been lost.

Your problem with multiple fonts pretty much exhibits that; once you've got the PDF file, a layout engine can't tell that all the fonts are the same. Ghostscript can't either, which is why it now doesn't strip the duplicates out. While I appreciate this is a problem for your particular use case, it is actually a considerable improvement for users in general.

Assuming that you are using TeX throughout for your documentation, then it seems to me that you should be creating your final document by appending the various TeX documents together and then producing a final PDF, instead of appending multiple PDF files.

Presumably you want to show some parts of Lilypond as well, so I would create EPS figures for those. It will of course increase the number of font inclusions again, but in the case of Lilypond I don't think that you can be merging the fonts anyway, because Lilypond always uses glyphshow, and pdfwrite will create a uniquely named font for each usage. So you aren't gaining any benefit from exploiting the Ghostscript bug with the Lilypond output.

So by maintaining the text and layout in TeX, inserting EPS figures as required, and only producing PDF as the last step in the process you would create a file which (as I understand it) would only contain a single instance of each font.

In short, I'm not really suggesting that you change anything except your working practices, and maintain your files as TeX files rather than as PDF. Because I don't have any knowledge of your workflow (or TeX), I cannot say whether this is reasonable; it may well not be.

