Re: sed on binary files

From: Gary V. Vaughan
Subject: Re: sed on binary files
Date: Thu, 2 Oct 2008 12:09:35 +0800

Hi Eric,

On 2 Oct 2008, at 10:51, Eric Blake wrote:
> Is there any portable way to process files that contain NUL bytes?

None that I'm aware of. Many GNU utilities are reasonably well behaved with respect to '\0', and m4 is somewhat unusual in that we don't handle NUL bytes well ourselves.

> I'm working on making m4 1.6 transparently handle NUL,

Excellent! I made an attempt to do that myself on the 2.0 branch some years ago, but it didn't go well so I never committed...

> and want to post-process the output to normalize error messages while
> still verifying that NUL bytes appeared where expected on stderr.  But
> on Solaris, the native sed strips NUL bytes before processing the line
> (NUL bytes cannot appear in text files, and POSIX does not define
> behavior on non-text files, so this is not a bug, just a difference
> from GNU sed).  As a result, the m4 testsuite either fails (if I only
> postprocess the captured stderr and not the expected error) or can
> have false positives (if both stderr and expected error are
> normalized, then regressions involving added or missing NUL are not
> detected).  I don't want to require perl for just this one test; m4
> seems fundamental enough to keep the testsuite restricted to the GNU
> coding standards set of tools.

I'd be inclined to do that in C. A few lines should be sufficient for a minimal filter that writes '\' '0' or '^' '@' to its output whenever a NUL byte arrives.
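
Something along these lines is what I have in mind. It is only a rough sketch, not tested against the m4 testsuite: the name escape_nul is made up for illustration, and it uses the '\' '0' spelling rather than '^' '@'. Note that a literal backslash followed by '0' in the input is indistinguishable from an escaped NUL in the output, so this assumes that sequence does not otherwise appear in the expected messages.

```c
#include <stdio.h>

/* Copy IN to OUT, writing the two characters '\' '0' in place of
   each NUL byte and passing every other byte through unchanged.
   Return 0 on success, -1 on a read or write error.  */
static int
escape_nul (FILE *in, FILE *out)
{
  int c;

  while ((c = getc (in)) != EOF)
    {
      if (c == '\0')
        {
          /* Make the NUL visible as printable text.  */
          if (fputs ("\\0", out) == EOF)
            return -1;
        }
      else if (putc (c, out) == EOF)
        return -1;
    }

  /* getc returns EOF for both end of input and read errors.  */
  if (ferror (in) || fflush (out) == EOF)
    return -1;
  return 0;
}
```

A main that does nothing but call escape_nul (stdin, stdout) and turn the result into an exit status would complete the filter, and only needs a C compiler, which the m4 build requires anyway.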

> The Solaris man pages mention that /usr/xpg4/bin/tr can handle NUL
> bytes, but not /usr/bin/tr; maybe I could search for an adequate tr,
> and change all NUL to some other byte that does not otherwise appear
> in my expected output (with the added benefit that diff might not
> give up early with the complaint that the files are binary), but I
> don't know if that is portable

It's probably a safe bet that whatever vendor tool you rely on to postprocess will do the wrong thing on one machine or another :(

> Any suggestions?  Is this worth documenting in the autoconf manual?

Certainly, especially since many of the GNU tools *do* endeavour to handle '\0' input gracefully.

