

From: G. Branden Robinson
Subject: [bug #66323] [mom] rendering differences between PostScript and PDF output?
Date: Mon, 14 Oct 2024 13:31:07 -0400 (EDT)

Follow-up Comment #15, bug #66323 (group groff):

Hi Deri,

At 2024-10-14T12:00:32-0400, Deri James wrote:
> Follow-up Comment #14, bug #66323 (group groff):
> 
> First a comment on a mom difference between grops and gropdf post
> 1.23.0, when both used the same method, ascii-ing the pdfmark
> parameters by calling pdfmomclean (pdf.tmac had an equivalent pdfclean
> routine which did a similar job). In HEAD, mom no longer uses
> pdfmomclean for pdf output; the whole "dirty" string is passed to
> gropdf, whereas it is still used for grops.
> 
> What this means is that gropdf more intelligently "cleans" the string
> than is possible by using asciify, in particular the \[uXXXX]
> characters are converted to UTF-16, any glyphs (\C, \N, etc) are
> converted back to ascii, and spaces are handled.

Right.  I recall feeling (and perhaps saying) that this is a good model
for grops to emulate, at least insofar as \[uXXXX] goes...\C and \N
conversion seem like gravy to me.
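
For readers following along, here is a minimal Python sketch of the
\[uXXXX]-to-UTF-16 step described above.  It is not gropdf's actual
code; the function names are hypothetical, and it assumes only simple
escapes of four to six hexadecimal digits (composite forms such as
\[uXXXX_YYYY] are left untouched).

```python
import re

# Assumption: only plain \[uXXXX] escapes with 4 to 6 hex digits are handled.
_UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def expand_unicode_escapes(text: str) -> str:
    """Replace each \\[uXXXX] escape with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def pdf_text_string(text: str) -> bytes:
    """Encode text as UTF-16BE preceded by the FE FF byte order mark."""
    return b"\xfe\xff" + expand_unicode_escapes(text).encode("utf-16-be")
```

Given the string `Caf\[u00E9]`, the second function yields the BOM
followed by two bytes for each of the four characters.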

> [comment #13 comment #13:]
> > Maybe; Deri can speak better to that.  
> 
> If you look at -Tps -Z output for good.mom the Author/Title have been
> cleaned (asciify) so unicode has been dropped, but for pdf the dirty
> string is passed and pdfinfo shows:-
> 
> Title:          N????: stdarg.h wording...
> Author:         наб, seb, rCs,
> Creator:        groff version 1.23.0.2178-b25f0
> Producer:       gropdf version 1.23.0.2178-b25f0
> 
> So this is definitely not an issue, except for postscript pdfmark not
> supporting unicode.

Ah, good--thanks!

> > Personally, I'm resigned to living with the resurrected "can't
> > output node in transparent throughput" diagnostic until we get all
> > the machinery required to eliminate it sorted out.
> 
> No longer occurs with -Tpdf, since asciify is no longer used.

Right, but 'ps' is still the default output device--or the default
default--and so I expect people still to encounter it with some
frequency and to expect an explanation of why their beautiful, perfect
documents sometimes provoke diagnostics.

> > That involves not just #62264, but bug #63074 (still a 1.24 release
> > goal) and maybe bug #64484, which like many of our tickets has
> > sprawled in scope; at this point it seems essentially a duplicate of
> > #63074 in its central purpose, but Deri and I have perhaps not
> > reached a full meeting of the minds on some auxiliary issues.  (He
> > would like to see explicit horizontal motions discarded from device
> > extension command arguments automatically by the formatter; I would
> > not [because as input validation goes, I'm an irascible,
> > white-gloved barracks inspector]. 
> 
> I don't remember saying this, perhaps you can point me in the right
> direction.

Comment #26 in bug #64484.

https://savannah.gnu.org/bugs/?64484#comment26

> I do remember telling you I wanted 'x X' output untouched by the
> formatter,

You mean via `.output` and `\!` rather than `.device` and `\X`?

If so, yes, I remember that clearly and agree.  `.output` and `\!` are
formatter escape hatches--you get total power with total responsibility.

> just passed as a string (copy-in mode?) the same as it was in 1.23.0.

I've tried to migrate our terminology away from "copy-in mode" (because
"copy-out" mode, or "copy-" with any other preposition, either doesn't
exist or is never named thus).  I favor simply "copy mode", but yes.  In
"copy mode", the formatter is a fairly dumb robot that reads characters
from the input stream and copies them someplace else, without attempting
to convert them into lexical tokens, let alone nodes.  This is an
oversimplification because sometimes copy mode gets applied to things
that have _already_ been tokenized or converted to nodes, and some things
_have_ to be tokenized because it's difficult to manage certain
non-printable code points in C/C++, especially given GNU troff's
historical support for CCSID 1047 ("EBCDIC").

And yes, that robotic copying is what we want for "transparent output".
I wish I could come up with a better name.  The word "transparency" has
been ruined by people who can't competently design interfaces.  They
were too busy maximizing synergies to leverage economies of scale and
giving 110% at the weekly office pizza party/mandatory unpaid overtime
session.

> #63074 is not required for 1.24 since all the advantages this would
> bring, are already available in pdf output now, and you have known
> that even before you started your long (oft complained of) travails
> with this self imposed wish.

Touché.

It's there but it's not as clean as it should be.  We don't have parity
between `device` and `\X` in one respect.

There's a commented-out test in
"src/roff/groff/tests/device-control-special-character-handling.sh":


#echo "checking practical bookmarking with device request" >&2
#printf "%s\n" "$output" \
#  | grep -Fqx 'x X ps:exec 7:device [/Dest /pdf:bm1 /Title (Caf\[u00E9] Hyphen-Minus and \\[u2010]) /Level 1 /OUT pdfmark' \
#  || wail


When I uncomment it and "make check":


FAIL: src/roff/groff/tests/device-control-special-character-handling.sh


...and when we run the test manually to see what went awry (or look at
the log), the following lines leap out to the trained eye:


# A more practical case, suggested by Deri James.

input='.
.ds h Caf\['"'"'e] Hyphen-Minus and \[rs]\[u2010]
...
.device ps:exec 3:device [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
...
# Test the same thing, but with a composite special character escape
# sequence.

input='.
.ds h Caf\[e aa] Hyphen-Minus and \[rs]\[u2010]
.device ps:exec 7:device [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark


The output (in part)?


x X ps:exec 3:device [/Dest /pdf:bm1 /Title (Caf\[u00E9] Hyphen-Minus and \\[u2010]) /Level 1 /OUT pdfmark
troff:<standard input>:6: error: composite special character escape sequences not yet supported in device extension command arguments
x X ps:exec 7:device [/Dest /pdf:bm1 /Title (Caf\[e aa] Hyphen-Minus and \\[u2010]) /Level 1 /OUT pdfmark
checking practical bookmarking with device request
...FAILED


I want that fixed.  Non-orthogonality is the devil.
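
To illustrate the parity being asked for, the following Python sketch
shows how a composite such as \[e aa] could be reduced to the same
\[u00E9] form that the first test case already produces.  This is not
how GNU troff implements it; the function name and the two-entry accent
table are assumptions made for illustration only.

```python
import unicodedata

# Assumption: a tiny illustrative table mapping accent glyph names to
# combining characters; the real formatter's table is far larger.
COMBINING = {
    "aa": "\u0301",  # acute accent
    "ga": "\u0300",  # grave accent
}


def composite_to_escape(base: str, *accents: str) -> str:
    """Compose base plus accents, then spell the result as \\[uXXXX]."""
    combined = base + "".join(COMBINING[name] for name in accents)
    composed = unicodedata.normalize("NFC", combined)
    return r"\[u" + "_".join(f"{ord(ch):04X}" for ch in composed) + "]"
```

With this, `composite_to_escape("e", "aa")` gives the escape for U+00E9,
the form the commented-out grep pattern expects.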

> > But I acknowledge that without a string iterator (#62264 again),
> > most users will simply have to endure diagnostic messages about
> > them.  
> 
> If using -Tps.

Acknowledged.

> > I wouldn't wish the tedium of composing things like "an.tmac"'s
> > ellipsifiers on anyone, not even those calling for me to be dragged
> > before a war crimes tribunal in The Hague for removing an
> > undocumented macro without a deprecation cycle. ;-) )
> 
> And removing a line simply because you did not know what it did!!

I tested it.  But I don't think I A/B compared it to BSD ms behavior.
Maybe back then I didn't yet have them in my ever-growing archive of
historical exhibits.

> > > * "special characters aren't getting into 'grout' at all."  I take
> > > it this is a separate problem from the grops one that is now the
> > > focus of this ticket, since grops takes grout as input.
> > 
> > This isn't a defect if nothing has yet taken on the responsibility
> > of encoding special characters in PDF bookmarks in PostScript output
> > in the first place.  I had assumed that was the case, in part
> > because of the surprising spelling of the device extension tags
> > exercising the "pdfmark" command word ("ps: exec"), but if these are
> > "ps: exec"s that _grops_ will in fact never see in the first place,
> > then my observation is a potential feature request at best.
> 
> Correct. PDFMARK is an extension to the postscript language which
> allows distillers to include pdf features, I am not sure if it even
> allows UTF-16 strings, although pdf detects UTF-16 with an initial
> BOM, so if ghostscript treats pdfmark input as an array of 8-bit codes
> and plomps them straight into the pdf there is at least a chance it
> may work, given the correct input.

Aha!  With (many) repeated applications, this stuff begins to soak even
into _my_ brain.
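
The BOM detection Deri mentions can be sketched in a few lines of
Python.  This is an illustration, not Ghostscript's or any PDF reader's
code: the function names are hypothetical, Latin-1 stands in for
PDFDocEncoding as a simplifying assumption, and whether a given
distiller accepts such a string in a pdfmark remains untested here.

```python
UTF16_BE_BOM = b"\xfe\xff"


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string, using the leading BOM to pick the encoding."""
    if raw.startswith(UTF16_BE_BOM):
        return raw[len(UTF16_BE_BOM):].decode("utf-16-be")
    # Assumption: Latin-1 approximates PDFDocEncoding for this sketch.
    return raw.decode("latin-1")


def as_postscript_hex_string(raw: bytes) -> str:
    """Spell the bytes as a PostScript hexadecimal string literal."""
    return "<" + raw.hex().upper() + ">"
```

A hexadecimal string keeps every byte within plain ASCII in the
PostScript source, which sidesteps the question of how 8-bit codes
survive the trip through the interpreter.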

> > That moves it back to _mom_, so reassigning to Peter--but I'm
> > leaving the status as "Need Info" because it's not clear to me that
> > it is actually true that any discrepancy between _mom_'s PostScript
> > and PDF output remains; наб noted in comment 11 to bug #66322 that
> > the page size differences observed were due to a Debian patch to its
> > _groff_ package.
> 
> It is definitely not mom, this is a question of minute differences in
> the positioning of text on the page between grops/ghostscript and
> gropdf. The example shown in comment #10 (appears to be a diffpdf
> output)  is a little misleading since it is apparent the OP is testing
> his own created grops fonts (from /gsfonts using afmtodit) with the
> stock devpdf fonts (which are a copy of the original devps groff
> fonts). The gsfonts contain many more kern pairs, which affect the
> output, so it's comparing apples with pears.
> 
> However, I am still investigating, and when I compare apples, there
> are still differences. Initially it looks like a tiny bug in
> ghostscript which introduces .11 of a point difference in the vertical
> positioning of all text, and it specifies colours to six decimal
> places, with the 3 least significant containing garbage. I don't
> currently believe my findings, and I am currently documenting them in
> an email to the groff list in the hope I have made a mistake!!

Looking forward to reading and learning more about the stage of the
sausage factory closer to the loading dock than the section I walk.

> Since the grout produced on the same mom document by both -Tps and
> -Tpdf is the same in all respects regarding text positioning the issue
> is definitely not mom.

Okay.  Well, when you reach the point you want to seize this ticket for
gropdf, just do so, or ask me to.

> > Who else needs a beverage after all that?
> 
> A pint of "Old Peculiar", since you're offering - thanks.

It's a real thing!  Not quite as light-suckingly black as I feared.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66323>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
