Re: Ghostscript/GhostPDL 9.22 Release Candidate 1

From: Ken Sharp
Subject: Re: Ghostscript/GhostPDL 9.22 Release Candidate 1
Date: Thu, 21 Sep 2017 14:58:08 +0100

At 14:43 21/09/2017 +0200, Knut Petersen wrote:

> The fonts in the PDFs are identical fonts constructed by Ghostscript on the fly; I think it was Ken Sharp who explained to me some years ago that the term "subset" is wrong ;-)

Well, sort of. They aren't identical, though; they are all different, but yes, they are constructed fonts. But if you set SubsetFonts=false, then I'd expect the full font to be embedded, regardless of which glyphs are used.

That's not quite the same as constructing a fully populated new font, but it may well explain why SubsetFonts=false isn't having the result I'd expect.


I thought that when you set bigpdfs, you used 'show' instead of 'glyphshow', and that takes you down a totally different code path, where Ghostscript/pdfwrite *doesn't* construct a font. It only does that if the PostScript uses glyphshow, because there is no glyphshow in PDF. Otherwise it just uses the font it has. If you don't subset the font, then it also doesn't re-encode it (which is important for your workflow).

So, if you aren't using glyphshow, then the logic is different, and the fonts really are subsets. Except that they shouldn't be, because -dSubsetFonts=false says to embed the entire font.

I haven't had time to check what's actually going on yet; I've had to go back to work. I'm reasonably certain you aren't using glyphshow, because if you were, pdfwrite would create fonts with different encodings, this hack wouldn't work, and you'd get the wrong output. So, in this case, it is correct to call the fonts subsets. The problem is, they shouldn't be subsets.

> One Emmentaler font + three encodings + one character (scaled to invisibility) of each encoding used prior to anything else in the PS leads Ghostscript to produce three different subsets ;-)) of the Emmentaler font in every PDF. But the set of 3 "subsets" is identical in any PDF that is produced this way, and so gs is (was) able to remove the duplicates. That's the --bigpdf trick.

That's not what I see, nor what I would expect. Unless you are using glyphshow; but if you were, then I believe the encodings would differ significantly and you would get collisions in the encodings, which would mean the bigpdf trick would produce garbled output.
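To make the glyphshow concern concrete, here is a toy model in Python. It assumes, purely for illustration, that a PDF writer hands out character codes to glyph names in order of first use; pdfwrite's real allocation strategy may well differ, and the glyph names are only examples. The point is that two files drawing the same glyphs in a different order end up with conflicting encodings.

```python
# Toy model, NOT pdfwrite's code. Assumption: when PostScript draws glyphs by
# name (glyphshow), the writer invents a character code for each new glyph
# name in order of first use, since PDF text operators only take codes.

def assign_codes(glyph_names_in_use_order, first_code=1):
    """Give each new glyph name the next free code, in order of first use."""
    encoding = {}  # code -> glyph name
    seen = set()
    next_code = first_code
    for name in glyph_names_in_use_order:
        if name not in seen:
            seen.add(name)
            encoding[next_code] = name
            next_code += 1
    return encoding


def collisions(enc_a, enc_b):
    """Codes present in both encodings that map to different glyph names."""
    return sorted(code for code in enc_a.keys() & enc_b.keys()
                  if enc_a[code] != enc_b[code])


# Two EPS files using the same font, but drawing glyphs in a different order.
file1 = assign_codes(["clefs.G", "noteheads.s2", "rests.2"])
file2 = assign_codes(["noteheads.s2", "clefs.G", "accidentals.sharp"])
print(collisions(file1, file2))  # [1, 2, 3]
```

Every code collides here, which is the situation in which merging same-named fonts would produce garbled output.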

The PDF files you supplied each contain one Emmentaler-20 font, and each one has a FontFile (the actual data) of a different size. So the fonts in each case are actually different. Again, I haven't checked (and it's probably not worth it), but the subsets certainly don't contain the full set of glyphs and probably only contain the descriptions of the glyphs that were used.

I don't disagree with the expectation, but what you expect isn't what's in the files.

That doesn't prevent the trick you are using from working, because all the fonts have the same name, so if you don't consider the filenames and font object numbers, then Ghostscript (falsely) considers them to be the same font. Provided the Encoding is the same (or at least compatible, and pdfwrite checks that) for each of the fonts, they can safely be treated as the same font.

We only gather the glyph descriptions as they are used because, in PostScript, it's possible to download a font incrementally, so the glyph description might not be available until it's used. So we can happily copy the used glyphs from instance 'A' of the font and instance 'B' of the font (at this point we think they are the same font, possibly with some glyphs added since we last looked at it), and combine them into one final destination font.

Now, as long as the two fonts never have different glyphs at the same character code, everything is fine. The problem arises if you have two fonts with the same name, but *different* glyphs at the same code point. Because we think they are the same font, when we see the second use of the code point, we *don't* copy the glyph. We see that we already have a glyph at that location, and it must be the same one, because this is the same font, right? So we use the existing one.
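The behaviour described in the last two paragraphs can be sketched as a small Python model. This is a simplified, hypothetical illustration, not Ghostscript's source: fonts are identified only by name, and a glyph is copied into the output font the first time its character code is used. The class and function names are invented for the example.

```python
# Hypothetical model of the merge heuristic; not Ghostscript's implementation.

class MergedFont:
    def __init__(self, name):
        self.name = name
        self.glyphs = {}  # character code -> glyph description

    def use(self, code, glyph):
        """Record a use of `code`; return the glyph the output will contain."""
        if code not in self.glyphs:
            # First use of this code: copy the glyph (incremental gathering).
            self.glyphs[code] = glyph
        # Otherwise assume it is the same glyph, since it is "the same font".
        return self.glyphs[code]


output_fonts = {}  # font name -> MergedFont


def show(font_name, code, glyph):
    """Draw `glyph` at `code` with a font identified only by its name."""
    if font_name not in output_fonts:
        output_fonts[font_name] = MergedFont(font_name)
    return output_fonts[font_name].use(code, glyph)


# Instances 'A' and 'B' share a name and agree on code 65: fine.
show("Emmentaler-20", 65, "noteheads.s2")
print(show("Emmentaler-20", 65, "noteheads.s2"))  # noteheads.s2

# A same-named font with a different glyph at code 66 silently gets the first one.
show("Emmentaler-20", 66, "clefs.G")
print(show("Emmentaler-20", 66, "clefs.F"))  # clefs.G  (wrong glyph)
```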

You get away with this because, in your workflow, there are no collisions in encoding with the various fonts. If you were using glyphshow I'm fairly certain this would not be the case.

However, what if you used the same font in TeX? I don't necessarily mean the Emmentaler font; I note that there's a font called something like TeXGyreSchola-Regular in the LilyPond files too, and it will be getting the same treatment as Emmentaler-20. If someone used that font in TeX itself, then there's potentially a problem. You could end up with the encodings colliding and get the wrong glyph when the PDF file is rendered.

Obviously I'm not sure this is a valid concern. I presume it isn't for your special case of creating documentation, but in the general case I think it would be.

> I agree that mutool clean can be a good starting point. If I read the documentation correctly, it does "clean" (remove) unused objects, but it is unable to subset fonts if not all glyphs of the fonts are used?

Not exactly (caveat: I am not a MuPDF developer, so I could be wrong). It never subsets fonts; it just removes unused and duplicate objects. The 'problem' is that it only considers objects to be identical, and therefore candidates for removal, if they are, well... identical.

The PDF files you have created from the LilyPond EPS files contain fonts which are not identical. They are, at least in some sense, subsets. As I said, I'm not entirely sure why at the moment. I'll have to walk through the code in a debugger to see what's going on there, and it's complicated, so it will take some time.

But that's why you get no benefit at all from running the final file through Mutool: each of the FontFile streams is different, so Mutool correctly decides they are not identical. Ghostscript really ought to do the same, and indeed it now does so by default.
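A minimal Python sketch of "remove duplicate objects" in the strict, byte-identical sense described above may help. It is not MuPDF's implementation. It assumes objects are compared by their serialized bytes, and it uses a SHA-256 digest as a shortcut for byte comparison. Two FontFile streams holding different subsets of the same font are never merged.

```python
import hashlib

# Hypothetical sketch, not MuPDF code: objects are merged only when their
# serialized bytes match exactly (a digest stands in for a byte compare).


def dedupe(objects):
    """Map each object number to the number of the first identical object."""
    first_seen = {}  # digest -> first object number with that content
    canonical = {}
    for num, data in objects.items():
        digest = hashlib.sha256(data).hexdigest()
        canonical[num] = first_seen.setdefault(digest, num)
    return canonical


fontfiles = {
    10: b"...FontFile stream for subset 1...",
    20: b"...FontFile stream for subset 1...",  # exact copy: merged into 10
    30: b"...FontFile stream for subset 2...",  # different subset: kept
}
print(dedupe(fontfiles))  # {10: 10, 20: 10, 30: 30}
```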

> LilyPond spawns Ghostscript. If our --bigpdf option is used, the command is e.g.:
>
>    gs -q -dSAFER -dEPSCrop -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -r1200 -dSubsetFonts=false -sDEVICE=pdfwrite -dAutoRotatePages=/None -sOutputFile=testa.pdf -c.setpdfwrite -ftesta.eps

Yep, Masamichi-san already mailed me that; good to have it confirmed, though. Broadly, it's the same as the command I used. -r1200 is possibly higher than I would use; it's only needed if there are gradients or transparency, and there won't be transparency, because this is PostScript.

I did mention this at the end of my email (I see you commented on it later as well): if I run the 9.21 release, I do get a single font out, but it's still not an entire font. If I run the 9.22 release, I get *3* Emmentaler fonts out, each of which is larger than the one in the 9.21 output, and none of which is a complete font.

So, as I said, this is an area which has changed; it may be that even if I put back the PDFDontUseObjectNums hack, you won't get the improvement you did before. Even if you do, it's some evidence to add to my warnings about this. Things change frequently in this area; it's inherently fragile because it's loaded with heuristics, and it probably isn't something I can realistically hope to preserve in the long term.

Hmm, actually, going back to the 9.21 release does produce at least similar behaviour, whereas the 9.22 release does not. In 9.22 I get three fonts output instead of 1. I've no idea why currently, and right at the moment I don't have time to look.

I'll try and remember to look at it when I am not drowning under support, but it looks like there have been changes in this area unrelated to the PDFDontUseObjectNum bug, and that in itself may mean that your process doesn't work any more, or works less well.

> Thanks for your patience!

I'm afraid it's going to be at least next week now, and that's likely to disappear in testing the next release candidate. Fixing a couple of the problems that turned up in RC1 caused differences in about 1/3 of our test suite. That means manually examining hundreds of pages of bitmaps :-(

Looking at the RC1 bitmaps took two of us three days or so, so by the time we finish fixing the regressions, build RC2, run the tests, gather the output and then examine the bitmaps, that's probably all of next week gone.

If I'm 'lucky' the final couple of problems won't get fixed for a few days, and I should get a little time to look at this before the testing starts again. Depends what happens with customer support and that's been just crazy this week.

Right at the moment, you're probably going to have to leave this with me. It might be useful for one of you to get hold of the RC1, patch and rebuild GS (or point it to a modified ghostpdl/Resource/Init directory with the -I switch) and test to see whether this behaviour even still works for you at all with the new release.

I tried it here myself and it did appear to work, but I'm not entirely sure I trust that. Also I was (obviously) using the very reduced set of files Knut sent me, so that may not be a sufficient test.

The commit with the change is here:;a=commitdiff;h=ca1ec9b486ddba3f921355fd1d775f27f4871356

Just remove one line (dup //null eq) and restore the six lines that were deleted. You can drop the comment lines, but it's probably easier just to copy the lot. I *think* that will work.

By the way, you could always make this change to your own Ghostscript installation anyway.
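For anyone testing a patched init file without rebuilding, here is a small Python sketch. It rebuilds the LilyPond --bigpdf command quoted above as an argument list and optionally prepends Ghostscript's -I switch, so that a modified Resource/Init directory is searched. The function name, the directory path and the file names are placeholders, not anything from LilyPond or Ghostscript.

```python
import subprocess

# Sketch only: `bigpdf_args`, the paths and the file names are hypothetical.
# The switches are copied from the LilyPond --bigpdf command quoted above.


def bigpdf_args(eps_file, pdf_file, init_dir=None, gs="gs"):
    """Build the gs argument list, optionally adding -I<init_dir>."""
    args = [gs]
    if init_dir:
        # -I adds a directory to Ghostscript's search path for init files.
        args.append("-I" + init_dir)
    args += [
        "-q", "-dSAFER", "-dEPSCrop", "-dCompatibilityLevel=1.4",
        "-dNOPAUSE", "-dBATCH", "-r1200", "-dSubsetFonts=false",
        "-sDEVICE=pdfwrite", "-dAutoRotatePages=/None",
        "-sOutputFile=" + pdf_file,
        "-c.setpdfwrite", "-f" + eps_file,
    ]
    return args


cmd = bigpdf_args("testa.eps", "testa.pdf",
                  init_dir="/path/to/ghostpdl/Resource/Init")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually run Ghostscript
```

Passing a list to `subprocess.run` avoids shell quoting issues with the `/None` value and the paths.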


PS: according to the font Knut sent, it's 160 KB, but the font stream in the EPS files is only 65 KB. I don't know if that means anything; possibly the TrueType portion is the other 100 KB. I'd have to decipher the OTF font, and it doesn't seem worth the effort, since that's not really the problem. 100 KB seems like a lot, though.
