Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
From: Jan Hudec
Subject: Re: [Gnu-arch-users] Feature suggestion: "tla inventory -0"
Date: Tue, 6 Jan 2004 12:11:30 +0100
User-agent: Mutt/1.5.4i
On Mon, Jan 05, 2004 at 22:09:45 -0700, Eric W. Biederman wrote:
> Tom Lord <address@hidden> writes:
>
> > > From: Charles Duffy <address@hidden>
> >
> > > ...resulting in null-delimited output, suitable for piping into
> > > xargs -0 or the like, and thus causing The Right Thing to happen
> > > in cases involving filenames with spaces.
> >
> > > Thoughts?
> >
> > Perhaps.
> >
> > What I would most like to avoid longer-term is a half-hearted
> > accumulation of features, each intended to make filenames-with-spaces
> > support closer, but in actuality not adding up to anything coherent.
> >
> > The null-character convention used by GNU xargs (and GNU tar as I
> > recall) is one strategy for dealing with such filenames -- but I think
> > it is a problematic one. For example, other textutils don't
> > understand that convention, it looks horrible in a text editor,
> > although fine for filenames it can't handle fields that contain the
> > null character, etc.
> >
> > We have other needs within arch for lists (in some cases multi-field
> > lists) which can include odd filenames. I'd find it easier to say yes
> > to incrementally adding features to arch if we first had an overall
> > strategy for fields that can contain non-graphical characters.
> >
> > So far as I know, the choices basically come down to:
> >
> > ~ use 0 specially
> >
> > losses: not terminal or editor friendly,
> > can't handle 0 in fields
> >
> > wins: GNU xargs and GNU tar support it
> >
> > ~ use a quotation syntax (which also then has to include escapes)
> > to delimit fields with some kind of quote mark
> >
> > losses: whitespace-based field separation fails,
> > tools need to translate fields for many operations
> >
> > wins: pick the string syntax of your favorite scripting language
> > terminal/editor-friendly
> >
> > ~ use an escape syntax without delimiters to map all strings into
> > strings of graphical characters
> >
> > losses: tools need to translate fields for many operations
> >
> > wins: whitespace-based field separation works,
> > terminal/editor-friendly
> >
> >
> > Of these, I think I'm mostly inclined towards the last one (but see
> > below).
>
> Then let me suggest the C convention for representing unicode characters.
> \u hex-quad
> \U hex-quad hex-quad
For ascii characters, however, the old octal syntax (\octal-triplet)
would be preferable, since most tools understand it.
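To make the octal-triplet idea concrete, here is a minimal sketch in Python. It is only an illustration, not anything tla implements: the names escape_octal and unescape_octal are hypothetical, and the choice to also escape the backslash itself, DEL, and every byte >= 128 is my assumption, made so that the output is pure printable ascii and decodes unambiguously.

```python
def escape_octal(name: bytes) -> str:
    """Map a raw filename to printable ascii using \\ooo octal triplets."""
    out = []
    for b in name:
        # Assumption: escape control chars and space (0-32), the backslash
        # itself (so decoding is unambiguous), DEL, and all bytes >= 128.
        if b <= 0x20 or b == 0x5C or b >= 0x7F:
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


def unescape_octal(text: str) -> bytes:
    """Inverse of escape_octal; rejects incomplete or out-of-range escapes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("incomplete octal escape at offset %d" % i)
            value = int(digits, 8)
            if value > 0xFF:
                raise ValueError("octal escape out of byte range at offset %d" % i)
            out.append(value)
            i += 4
        else:
            # Raises ValueError if the input is not plain ascii-range text.
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

With this, a name such as "my file.txt" becomes my\040file.txt, which survives whitespace-based field splitting. Because every escape is exactly a backslash plus three octal digits, checking that an escape is complete is as easy as with the \u form.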
> This is generally useful, it is clear that it is an escape sequence,
> and it is trivial to verify that it is a complete escape sequence.
>
> Given existing unix conventions it is probably worth implementing the
> rest of the standard escapes as well.
>
> The command line option -e could be used to go into escape processing
> mode, just like it is in echo. The only real problem I can see is if
> multiple tools in a chain attempted escape processing, but there is
> really no solution to that problem.
>
> > If you look at my full devo tree (as opposed to devo.tla) you can see
> > that there's a lonely directory there containing just `unfold.c'.
> >
> > One direction I think is worth exploring:
> >
> > ~ making a full plan for arch (changeset format, log file format,
> > cached inventory file format ....)
> >
> > ~ make a coding standards spec for tools in general to handle
> > the new conventions
> >
> > ~ incrementally add stuff to arch according to the plan.
> > also incrementally add utils to src/text-utils according
> > to the plan
> >
> > One difficulty is that it's probably worth thinking about Unicode
> > issues in the same plan.
>
> Generally things should be exchanged in utf8, but the above lets
> you stick to pure ascii which is a subset of most character sets.
I don't know of any tool that would have trouble accepting characters
128-255, and thus any properly utf8-encoded non-ascii unicode character
(though it probably won't be able to convert it to the current locale).
A much bigger problem is characters 0-32 (control chars + space).
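A rough sketch of that narrower policy, in Python: only bytes 0-32 are escaped, and bytes 128-255 (hence utf8 sequences) pass through untouched. The function name escape_controls is hypothetical, and escaping the backslash as well is my own assumption, needed so that a decoder can tell an escape from a literal backslash.

```python
def escape_controls(name: bytes) -> bytes:
    """Escape only bytes 0-32 (control chars + space) as \\ooo triplets."""
    out = bytearray()
    for b in name:
        # Assumption: the backslash (0x5C) is escaped too, so that a literal
        # backslash in a filename cannot be mistaken for an escape.
        if b <= 0x20 or b == 0x5C:
            out += b"\\%03o" % b
        else:
            # Printable ascii and bytes 128-255 (utf8 sequences) are kept.
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    name = "na\u00efve file.txt".encode("utf-8")
    print(escape_controls(name).decode("utf-8"))
```

The result contains no whitespace or control bytes, so whitespace-based field separation works, while non-ascii names stay readable in a utf8 terminal or editor.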
> > Another difficulty is that it's probably worth thinking about
> > alternative record syntaxes at the same time -- e.g., a generic syntax
> > for multi-line records.
>
> At least until there is a need I don't see the point.
Newline is perfectly encodable as \012. That should be sufficient.
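As an illustration of why no separate multi-line record syntax is needed, here is a small Python sketch of a one-record-per-line, space-separated format in which a newline inside a field is written as \012. All names (escape_field, unescape_field, dump_records, load_records) are hypothetical, and the format itself is an assumption for the example, not an existing arch file format.

```python
import re


def escape_field(field: str) -> str:
    """Write control chars, space and backslash as \\ooo octal triplets."""
    return "".join(
        "\\%03o" % ord(c) if ord(c) <= 0x20 or c == "\\" else c
        for c in field
    )


def unescape_field(text: str) -> str:
    """Turn every \\ooo triplet back into the character it stands for."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


def dump_records(records):
    """One record per line, fields separated by a single space."""
    return "".join(
        " ".join(escape_field(f) for f in rec) + "\n" for rec in records
    )


def load_records(text):
    """Inverse of dump_records; assumes no record is completely empty."""
    return [
        [unescape_field(f) for f in line.split(" ")]
        for line in text.split("\n")
        if line
    ]
```

A filename containing a newline then occupies a single line of output, so line-oriented tools still see exactly one record per line.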
-------------------------------------------------------------------------------
Jan 'Bulb' Hudec
<address@hidden>