Re: [Gnu-arch-users] Trying out the new escaping version...

From: Jan Hudec
Subject: Re: [Gnu-arch-users] Trying out the new escaping version...
Date: Thu, 18 Mar 2004 18:19:05 +0100
User-agent: Mutt/

On Thu, Mar 18, 2004 at 09:21:19 -0800, Tom Lord wrote:
>     > From: Jan Hudec <address@hidden>
>     > > A generic tool for applying a command to arguments, some of which are
>     > > escaped, is needed.   Modifying every command that takes filename
>     > > arguments to sometimes do unescaping is undesirable.
>     > It was quite predictable you'll say that ;-) I tend to agree with you
>     > that this it is a dirty hack.
>     > On the other hand I tend to disagree the quoting
>     > selected is reasonable. I also disagree, that it shouldn't support the
>     > --null|-0 (like find's -print0), because that would work with xargs
>     > right away and NUL bytes are not going in filenames as long as POSIX is
>     > around (syscalls use NUL terminated strings).
> The primary purpose of quoting in tla --- the thing that _must_ work
> 100% correctly and _must_ be based on a long term view --- is for data
> files that tla itself manages.  Changesets, log files, ,,index caches,
> etc. are written and read by tla.  They must have an unambiguous
> syntax.  The syntax must be a good choice so that it doesn't need to
> later be changed or, if it must later be changed, can be changed in an
> upward compatible way.

Yes. Fully agree on this.

Now, I wanted to write that the simpler and already used \uxxxx would
work, but it wouldn't, since 4 hex digits are just 16 bits and Unicode
already uses 21. The SGML/XML entity encoding would work (and I would
vote for it since it is already implemented, at least as a library,
everywhere). The Perl-style \x{<hex-number>} would work too, but that's
about as arbitrary and incompatible as \(U+<hex-number>).
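
To make the bit-count argument concrete, here is a minimal Python sketch. The helper names (bits_needed, fits_in_four_hex_digits, sgml_escape) are made up for illustration and are not part of tla; the only library call used is the standard html.unescape, which decodes numeric character references of the &#x...; form.

```python
import html

# Highest code point defined by Unicode.
MAX_CODE_POINT = 0x10FFFF


def bits_needed(value: int) -> int:
    """Number of bits required to represent a non-negative integer."""
    return value.bit_length()


def fits_in_four_hex_digits(code_point: int) -> bool:
    """True if the code point can be written as exactly \\uxxxx."""
    return code_point <= 0xFFFF


def sgml_escape(ch: str) -> str:
    """Encode one character as an SGML/XML numeric character reference."""
    return "&#x{:X};".format(ord(ch))


# A character outside the 16-bit range survives the entity round trip.
sample = "\U0001F600"
assert html.unescape(sgml_escape(sample)) == sample
```

The variable-length hex number is what matters here: any notation with an explicit terminator (the entity's ';', the ')' of \(U+...), or the '}' of \x{...}) can cover all 21 bits, while a fixed four-digit form cannot.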

> Null quoting (like find's print0) is completely inappropriate for
> those purposes.  Surely you don't seriously propose that log files,
> for example, start using null characters as syntax.  Even if you do,
> you're assuming that the _only_ data that will ever be quoted is unix
> filenames but why should we make that assumption?  If later, we need
> quoting for something else, does that mean tla will need _two_ quoting
> syntaxes?

Null separation is not appropriate whenever you need to store two (or
more) items per record, and that is needed in index files.

Yes, I DO suggest that tla support three output formats -- quoted,
unquoted with "\n" separation, and unquoted with "\0" separation.
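
The three formats could look roughly like the following Python sketch. It is an illustration only: quote() is a simplified stand-in that rewrites backslashes, whitespace and non-printable characters as \(U+<hex-number>), which is an assumption and not tla's actual escaping rules, and render() and its mode names are hypothetical.

```python
def quote(name: str) -> str:
    """Simplified stand-in for tla's escaping (an assumption, not the real
    rules): backslash, whitespace and non-printable characters are written
    as \\(U+HEX)."""
    out = []
    for ch in name:
        if ch == "\\" or ch.isspace() or not ch.isprintable():
            out.append("\\(U+{:X})".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def render(names, mode="quoted"):
    """Render a list of file names in one of three output formats."""
    if mode == "quoted":
        # One escaped name per line; unambiguous even for odd names.
        return "".join(quote(n) + "\n" for n in names)
    if mode == "newline":
        # Raw names, newline terminated; ambiguous if a name holds "\n".
        return "".join(n + "\n" for n in names)
    if mode == "null":
        # Raw names, NUL terminated; suitable for xargs -0.
        return "".join(n + "\0" for n in names)
    raise ValueError("unknown mode: " + mode)
```

With the names "a b" and "c\nd", the "newline" mode yields three lines for two names, which is exactly the ambiguity that the quoted and NUL-separated modes avoid.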

> People are now trying out the new support and --surprise-- discovering
> that once you get out of the universe of "whitespace separated fields" 
> a lot of pipeline tools, shells, and other programs fall down and go
> boom.    Well, no kidding.   That's a problem to work on fixing.
> It's one of reasons libhackerlab is being built and why some of chth's
> support for quoting goes there.   But the problems with these tools
> weren't created by tla and there's no magic tla can perform that will
> make them instantly go away.
> People are noting that Pika quoting interacts poorly with shell syntax
> in some cases.  Well, no kidding, but I'd put the emphasis the other
> way around.  Shell syntax interacts poorly with just about anything
> other than a narrow subset of strings.  Different quoting syntaxes
> will represent different choices about which particular aspects of
> shell bogosity to trigger and which to work-around, but there's no
> choice at all that will suddenly make shell syntax work well.

Shell syntax does behave a little bit (but no more than that) saner
with SGML-style quoting.

> The -print0 hack is a broken idea.   It was a cheap and dirty hack
> that happened to be easy to implement and allowed a few GNU programs
> like `find' and `tar' to handle a few cases they couldn't otherwise.
> The print0 hack deserves to die but, until then -- yes, sometimes it's
> handy.   

Yes, it is a cheap and dirty hack, but it comes in handy and it
shouldn't be expensive to support.

> Have you considered the idea of writing a 10 line program that reads
> newline separated Pika-quoted strings from stdin and writes
> 0-separated unquoted strings to stdout?

Yes, I am already working on a few tools in Perl (so they do not need
(much) installation). There shall be three of them:

    pxcall: unquote its arguments and call exec on the result
    pxnull: read newline-separated tla-escaped lines and output
            null-separated unquoted strings
    pxargs: shortcut for pxnull | xargs -0 -n1 "$@"

These will go in my tlacontrib branch once tested at least a little bit.
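
For readers who want to see the shape of such a filter, here is a minimal sketch of the pxnull idea in Python rather than Perl. It is not the tool from the tlacontrib branch. It assumes, purely for illustration, that every escape has the form \(U+<hex-number>); tla's real escaping may have further forms that this sketch does not handle, and the function names are hypothetical.

```python
import re
import sys

# Matches one escape of the assumed form \(U+<hex-number>).
_ESCAPE = re.compile(r"\\\(U\+([0-9A-Fa-f]+)\)")


def unquote(text: str) -> str:
    """Replace every \\(U+HEX) escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def pxnull(lines):
    """Turn newline-separated escaped lines into NUL-terminated raw strings."""
    return "".join(unquote(line.rstrip("\n")) + "\0" for line in lines)


def main():
    """Filter stdin to stdout; call this from a script entry point."""
    sys.stdout.write(pxnull(sys.stdin))
```

Saved as a script that calls main(), it would sit in a pipeline in front of xargs -0, which is what the pxargs shortcut describes.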

                                                 Jan 'Bulb' Hudec 

