Re: [Groff] pdfmom grep (was parallel text processing)


From: Deri James
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Sat, 09 Sep 2017 19:56:14 +0100
User-agent: KMail/4.14.10 (Linux/4.4.82-desktop-1.mga5; KDE/4.14.35; x86_64; ; )

On Sat 09 Sep 2017 09:51:27 Peter Schaffter wrote:
> On Sat, Sep 09, 2017, Ralph Corderoy wrote:
> > Hi Peter,
> >
> > 
> >
> > > The grep in pdfmom is returning a binary file hit when it encounters
> > > the diacritic in 
> > >
> > >   .ds pdf:look(pdf:bm1) L'étranger
> >
> > 
> >
> > What does locale(1) output for you where you run this pdfmom command?
> 
>   LANG=en_CA.UTF-8
>   LANGUAGE=en_CA:en
>   LC_CTYPE="en_CA.UTF-8"
>   LC_NUMERIC="en_CA.UTF-8"
>   LC_TIME="en_CA.UTF-8"
>   LC_COLLATE="en_CA.UTF-8"
>   LC_MONETARY="en_CA.UTF-8"
>   LC_MESSAGES="en_CA.UTF-8"
>   LC_PAPER="en_CA.UTF-8"
>   LC_NAME="en_CA.UTF-8"
>   LC_ADDRESS="en_CA.UTF-8"
>   LC_TELEPHONE="en_CA.UTF-8"
>   LC_MEASUREMENT="en_CA.UTF-8"
>   LC_IDENTIFICATION="en_CA.UTF-8"
>   LC_ALL=en_CA.UTF-8
>  
> 
> > > The solution is to pass the -a flag to grep.
> >
> > 
> >
> > How about 
> >
> > 
> >     groff ... 2>&1 | LC_ALL=C grep '^\.ds' | groff ...
> 
> Yes, that's the solution I thought of before suggesting the tidier
> but, as Steffen pointed out, not universal -a flag.
>  
> 
> > BTW, pdfmom has a bug shown by that strace command I suggested.
> >
> > 
> >     system("groff ... 2>&1 | grep '^\.ds' | groff ...");
> > 
> >
> > That's a double-quoted Perl string so `\.' is escaping the dot and grep
> > sees a plain dot for `any character'.  The backslash needs doubling.
> 
> Missed that.  Argh.  Why don't they make special glasses that let
> you see code as if for the first time whenever you put them on?
> 
> -- 
> Peter Schaffter

I can't actually recreate the problem, i.e. grep does not spit out the 
"binary" error. I've tried with an en_GB.UTF-8 and an en_GB environment; neither 
shows the message. The version of grep I'm using is:-

grep (GNU grep) 2.20
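
For anyone else trying to reproduce this, here is a rough sketch of a test. It assumes the diacritic reaches grep as the single Latin-1 byte 0xE9, which is not valid UTF-8; the thread does not say how the string is actually encoded, so that is only a guess. As far as I know, newer GNU grep releases treat input with bytes that are invalid in the current locale as binary, which may be why results differ between versions. The temporary file and the variable names are made up for illustration.

```shell
# Hypothetical sample: the "é" is written as the lone byte \351 (0xE9),
# an assumed Latin-1 encoding that is invalid in a UTF-8 locale.
sample=$(mktemp)
printf '.ds pdf:look(pdf:bm1) L\047\351tranger\n' > "$sample"

# Run the filter in the C locale, as Ralph suggested, then count how many
# real lines came through (a "Binary file ... matches" notice would not
# contain the text pdf:look, so it would not be counted).
kept=$(LC_ALL=C grep '^\.ds' "$sample" | LC_ALL=C grep -c 'pdf:look')
echo "lines passed through under LC_ALL=C: $kept"

rm -f "$sample"
```

Running the first grep without LC_ALL=C in a UTF-8 locale, on the same file, is the case to compare against; what it prints depends on the grep version installed.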

The double escaping of the "." in the grep pattern used to be there:-

grep \"^\\.ds\"

but got changed.
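
To illustrate what is lost when the backslash goes missing, here is a small sketch with made-up input lines. With a single backslash inside the double-quoted Perl string, grep ends up seeing the first pattern below, where the dot matches any character.

```shell
# Three hypothetical lines: a real .ds request, a line that merely has
# "ds" as its 2nd and 3rd characters, and an unrelated request.
input=$(printf '%s\n' '.ds pdf:look(pdf:bm1) Title' 'xds not a request' '.nr x 1')

# Pattern as grep sees it when the backslash is swallowed: "." is a wildcard.
loose=$(printf '%s\n' "$input" | grep -c '^.ds')

# Pattern with the dot escaped: only a literal leading dot matches.
strict=$(printf '%s\n' "$input" | grep -c '^\.ds')

echo "loose=$loose strict=$strict"   # prints: loose=2 strict=1
```

In practice groff's stderr rarely contains a line with "ds" in columns two and three, which is probably why the unescaped form went unnoticed.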

Cheers 

Deri



