gnu-arch-users




From: Parker, Ron
Subject: [Gnu-arch-users] How to support arch on systems with a small PATH_MAX [WAS: arch on windows?]
Date: Wed, 14 Jul 2004 17:06:14 -0500

Here comes the Stay Puft Marshmallow Man.

On Mon, 12 Jul 2004 00:52:13 -0700 (PDT), Tom Lord <address@hidden> wrote:

>        ~ write a good, tight post summing up the state of things

Sorry this isn't a "good, tight post" on the way things are.  
It is more of a supplemental post.


_How to support arch on systems with a small PATH_MAX_

Okay, my fire-retardant suit is firmly in place, I'm more than
willing to accept flames, but hopefully this leads to something
constructive.  I finally decided to take a ground up look at
this. 


* Terms

POSIX-minimal system: any OS or platform where PATH_MAX is
equal to or slightly larger than the minimum value of 255 defined
by POSIX.


* Assumptions

Arch will always require a PATH_MAX that at least meets the POSIX
minimum of 255.


* Goals

To provide a fully functional implementation that will allow
complete participation in the arch community.  This is defined
as:

1.  Being able to access and use all archives produced and hosted
on any system that has a more liberal PATH_MAX.

2.  Being able to operate with or without a revision library.

3.  Being able to host the working directory and archives on the
local file-system.

4.  Being able to distribute source archives of projects
developed with arch on a POSIX-minimal system in such a manner
that the arch data contained therein is readily usable on even
more liberal systems.

5.  Being able to host archives from the POSIX-minimal system,
using its standard protocols and tools.

To provide the above listed functionality in an upwardly
compatible manner.


* File Categories

When considering the files that are created and used by arch,
they may be divided into three categories.  The first category
consists of those files that may be considered strictly local to
the POSIX-minimal environment.  The second category is those
files that should be directly sharable between POSIX-minimal and
more optimal systems.  The third and final category is those
files that may need to be shared indirectly between POSIX-minimal
and more optimal systems.


** Strictly Local Files

Examples of the first category of files would be those in the
,,commit... directories left by an incomplete commit and
++saved... directories generated by doing a build-config in a
directory that already has the results of a previous
build-config.  

While the contents of these directories may contain changesets
that could possibly be used and distributed to others, there may
be room here to alter the way file and directory names are
generated such that the results can fit on a POSIX-minimal system
and if really needed the pertinent parts could still be passed
along.

As an example, the following is the longest FQFN I have in a
,,commit... directory:

        /home/rdparker/tla-1.3/src/hackerlab/\
        ,,address@hidden.1/\
        hackerlab--devo--1.0--patch-93.patches/\
        new-files-archive/{arch}/hackerlab/hackerlab--devo/\
        hackerlab--devo--1.0/address@hidden/patch-log/\
        patch-93

It is 264 characters long and would nearly fit on a POSIX-minimal
system.

The longest FQFN I have in a ++saved... directory is:

        /home/rdparker/tla-1.3/++saved.src.1089815347.15624.1/\
        {arch}/++pristine-trees/unlocked/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        address@hidden/\
        package-framework--devo--1.0--patch-5/{arch}/\
        package-framework/package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

It contains 358 characters and exceeds the POSIX minimum by ~40%.
Anecdotally in playing with arch, I believe I have seen paths in
excess of 400 characters but never greater than 500.
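
Measurements like the ones above could be reproduced with a short
script.  The following is only a rough sketch: the name
longest_path and the constant POSIX_MIN_PATH_MAX are invented here
for illustration and are not part of tla.

```python
import os

POSIX_MIN_PATH_MAX = 255  # the minimum assumed throughout this post


def longest_path(root):
    """Return the longest absolute path found at or below root."""
    longest = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Directories count too: a long directory name can break things
        # before any file inside it is ever touched.
        for name in dirnames + filenames:
            candidate = os.path.abspath(os.path.join(dirpath, name))
            if len(candidate) > len(longest):
                longest = candidate
    return longest
```

Comparing len(longest_path(".")) against POSIX_MIN_PATH_MAX in a
working directory would then show whether that tree fits on a
POSIX-minimal system.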


** Directly Sharable Files

Directly sharable files would basically consist of shared
archives and mirrors.  So a mechanism is needed to allow archives
from a POSIX-minimal system to be shared with others in an
upwardly compatible manner.  In terms of files that ultimately
reside in an archive the longest example I have is:

        /home/rdparker {mirrors}/\
        address@hidden/tla-bash-complete/\
        tla-bash-complete--rwa/tla-bash-complete--rwa--1.0/\
        patch-1/\
        tla-bash-complete--rwa--1.0--patch-1.patches.tar.gz

At only 160 characters, that at least works on a POSIX-minimal
system.  So this *may* be a non-issue.  Even on my HTTP mirror
the deepest it gets is:

        .../{archives}/address@hidden/\
        package-framework/package-framework--devo/\
        package-framework--devo--1.0/patch-1/\
        package-framework--devo--1.0--patch-1.patches.tar.gz

at a total length of 204 characters measured from root.  My
concern here is that a longer temporary pathname may be used
during a commit or archive-mirror.  There are plenty of examples
of long individual directory names, cf. ,,commit.hackerlab... 
above.  If something like this shows up in an archive, that would
be an issue.


** Indirectly Sharable Files

This would include tarball distributions of working directories,
like tla-1.2.tar.gz, that contain {arch} hierarchies.  Also the
results of mkpatch, delta, and get-changeset would qualify.  
(There may be others.)

The longest FQFN I have in a working directory, found when not
using a revlib, is 346 characters long:

        /home/rdparker/tla-rdparker/src/{arch}/++pristine-trees/\
        unlocked/package-framework/package-framework--devo/\
        package-framework--devo--1.0/address@hidden/\
        package-framework--devo--1.0--patch-1/{arch}/\
        package-framework/package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

The longest mkpatch FQFN I came up with was 176 characters:

        /home/rdparker/mkpatch-output/removed-files-archive/\
        {arch}/package-framework/package-framework--devo/\
        package-framework--devo--1.0/address@hidden/\
        patch-log/patch-1

A delta produced a path of 198 characters:

        /home/rdparker/tla-1.3/src/,,changes/new-files-archive/\
        {arch}/package-framework/\
        package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

And get-changeset turned out a 198-character path as well:

        /home/rdparker/\
        package-framework--devo--1.0--patch-5.patches/\
        new-files-archive/{arch}/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        address@hidden/patch-log/patch-5

Now, I realize these are nowhere near maximal, but they do show
some of what can be expected.

For strictly-local files, it may be permissible to deviate
significantly from what is done on an optimal system.

For the directly-sharable files, we need to maintain full upwards
compatibility.  This does not appear to be a severe problem as
the archives appear to stay below the POSIX minimum limit.
However, some consideration may need to be paid to temporary
directory naming conventions.  ISTR that there are some rather
long directory names created within an archive that are later
renamed.  I may be wrong.  But this may need to change if it can
be done upward-compatibly.  If it would break upwards
compatibility, consideration may need to be given to some sort of
intermediate mechanism or proxy that could handle the
translation, between "regular" systems and the POSIX-minimal
ones.

The indirectly-sharable files may be handled completely on the
client-side or optionally a tool could be provided to convert
files from a POSIX-minimal system into their normal form on a
more optimal system.


*  Current Proposals and (Partial) Solutions

Currently there are three things that are being used or have been
proposed.  The first, and most infamous, is replacing
c/c--b/c--b--v/archive-name/c--b--v--r with c/b/v/archive-name/r
or something similar.  The second is a Win32-specific solution of
using the 8.3 filename hack to map names that exceed the POSIX
minimum down to something that fits within the minimum.  The
third is the current =dirname path compression that Lode.Leroy
started.


**  Category-Branch-Version-Revision Reduction


***  Pros

This retains much of the semantic information within the file
names and paths.

It also has the potential to reduce the 358-character:

        /home/rdparker/tla-1.3/++saved.src.1089815347.15624.1/\
        {arch}/++pristine-trees/unlocked/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        address@hidden/package-framework--devo--1.0--patch-5/\
        {arch}/package-framework/\
        package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

to a 229-character:

        /home/rdparker/tla-1.3/++saved.src.1089815347.15624.1/\
        {arch}/++pristine-trees/unlocked/package-framework/devo/\
        1.0/address@hidden/patch-5/{arch}/package-framework/\
        patch-detection/1.0/address@hidden/patch-log/\
        patch-2
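
The reduction shown above could be sketched roughly as follows.
This is an illustration only, not code from tla: the function name
reduce_cbv is made up, and it assumes that a component can be
shortened whenever it begins with the most recent
category/branch/version name followed by "--".

```python
def reduce_cbv(path):
    """Strip the repeated c--b--v prefix from nested path components."""
    parts = path.split("/")
    out = []
    full = None  # longest category--branch--version name seen so far
    for i, comp in enumerate(parts):
        if full is not None and comp.startswith(full + "--"):
            # e.g. "pkg--devo" under "pkg" becomes just "devo"
            out.append(comp[len(full) + 2:])
            full = comp
            continue
        out.append(comp)
        # A component starts a new chain only if the next one extends it,
        # so an archive name in the middle does not reset the chain.
        nxt = parts[i + 1] if i + 1 < len(parts) else None
        if comp and nxt is not None and nxt.startswith(comp + "--"):
            full = comp
    return "/".join(out)
```

Applied to a path such as pkg/pkg--devo/pkg--devo--1.0/ARCHIVE/
pkg--devo--1.0--patch-5, this gives pkg/devo/1.0/ARCHIVE/patch-5.
Going the other way needs the parent names, which is part of why
the short form loses information when a directory is viewed in
isolation.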


***  Cons

It breaks upward compatibility if implemented universally.

Tom has historically resisted this as a solution for merging into
his devo branch.

It does lose semantic information when viewing a directory in
isolation.  (Think the vi or emacs directory viewers or shell
prompts with the last fragment of the current-working-directory
displayed.)  Granted IMO this is primarily an issue for arch
developers.  But I include myself marginally in that category.

It may still be possible with long category, branch or archive
names to reach the limit.  This could also be done using a
relatively deep base directory for the working directory,
"C:\Documents and Settings\username\My Documents" not being
unheard of on Windows.


**  8.3 Filename Hack


***  Pros

Relatively simple to implement on Windows.

It would reduce the 19 element deep path of:

        /home/rdparker/tla-1.3/++saved.src.1089815347.15624.1/\
        {arch}/++pristine-trees/unlocked/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        address@hidden/package-framework--devo--1.0--patch-5/\
        {arch}/package-framework/\
        package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

to an absolute maximum of 247-characters.
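
The 247 figure is simple arithmetic: each of the 19 elements costs
at most one separator plus an 8-character name, a dot and a
3-character extension.  A small sketch of that calculation, with
names invented here for illustration:

```python
POSIX_MIN_PATH_MAX = 255  # the minimum assumed throughout this post

# One 8.3 component: separator + 8-character name + dot + 3-character extension.
COMPONENT_83 = 1 + 8 + 1 + 3


def max_83_length(depth):
    """Upper bound on the length of a path made of `depth` 8.3 components."""
    return depth * COMPONENT_83


def max_83_depth(limit=POSIX_MIN_PATH_MAX):
    """Deepest path guaranteed to fit within `limit` characters in 8.3 form."""
    return limit // COMPONENT_83
```

By this count 19 elements give 19 * 13 = 247 characters, and a
20th element would give 260, which is past the 255 minimum.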

The upwardly compatible long file paths remain intact, the 8.3
forms essentially are alternative names for the same files.


***  Cons

This is strictly a Windows solution.  (That may not be all that
bad.)

These remaining long file paths will still crash native Windows
tools and exceed the limits of Cygwin application ports.  Windows
Explorer (konqueror or nautilus in file:// mode) will blow up if
the user drills too deeply.  Microsoft Internet Information
Server (think httpd or apache) cannot serve up these long paths
either.  According to second hand reports, standard antivirus and
backup products on Windows are likely to choke on these.  Even
the standard Cygwin ports of bash, vi, etc. have the
260-character limit and using the 8.3 forms of the paths is
awkward at best.

If one is looking at the 8.3 version it is hard to distinguish
files and directories that have the same initial characters and
it definitely loses semantic info.

Adding a couple directories to the hierarchy seen above could
still exceed a minimal PATH_MAX.


** Path-compression Via the =dirnames Map


*** Pros

It easily squeezes the long example given above down to fewer
than 100 characters.

The PATH_MAX limit is not likely to be exceeded by any reasonable
usage of tla.


*** Cons

It loses all semantic information in the various directories
where it is used, greatly complicating browsing of the
directories.

There is a performance issue with the current implementation.

It requires patching diffutils, tar and patch.

The modified utilities must be distributed with tla.


** Universal Cons of the Three Systems

They all require some tool external to tla to distribute source
archives of working directories.

None of them addresses being able to share an archive hosted on
Windows in an upwardly compatible manner, if archives do indeed
exceed the POSIX minimum.


*  Additional Goals Derived from this Analysis

To retain as much semantic information in the file and directory
names as possible.

To be able to use the standard Windows and Cygwin tools in
conjunction with tla.

To not have to patch and ship a modified tar, diffutils or patch.

To maximize performance without resorting to a single-file
archive format like db as has been suggested elsewhere.  That is
not necessarily a bad idea for improving performance on Windows,
but it is orthogonal to getting tla working fully on Windows.


*  An Alternative Solution

These are my preliminary thoughts on an alternative solution.
Much of it is nothing more than brainstorming.  I thought about
working through it all and coding it up first then dropping it on
the world, but I would prefer to work with everyone and reach
some kind of consensus (or be told I am completely off-base)
before wasting a lot of time coding and angering others by going 
around behind their backs.  (Lessons from the recent xemacs 
posts?  Possibly.)

Category-Branch-Version-Revision reduction could be used for
strictly-local files without affecting the external arch
community.  While this could still theoretically exceed the
POSIX-minimal PATH_MAX, it lessens the likelihood of it
happening.

There are also individual directory names that are fairly long
compared to the POSIX minimum.  For those things like the
,,commit directories:

        .../\
        ,,address@hidden.1/\
        hackerlab--devo--1.0--patch-93.patches/new-files-archive/\
        {arch}/hackerlab/hackerlab--devo/hackerlab--devo--1.0/\
        address@hidden/patch-log/patch-93/
        
a simplification could be made reducing that one directory name
to:

        .../,,commit.hackerlab--1089817914.15755.1/...

which retains the trailing subdirectories and avoids
completely losing any semantic meaning.  While having the extra
information in the ,,commit directory name is nice, it can be 
found within the commit directory itself.  I think that is a
reasonable compromise on a POSIX-challenged machine.  If the data 
needs to be explicit, it could go in a file just inside the
,,commit directory.

<bad-idea?>

A possible extension or alternative to CBV reduction that could
deal with the possibility of blowing PATH_MAX would be to do some
remapping to a higher directory for over-deep paths, placing a
=remap-to file in the nested directory that contains a single
line like "../../../=remapped.xyz" or "file-or-dirname
../../../=find-it-here".  These =remap* files and directories
would be ignored by most tools much like the current =dirnames
file in the path compression code.  (This thought definitely
needs more development.)  There are issues with maintaining the
proper overall nesting so that entire directory structures can be
treated as a unit via "rm -rf" and similar.

Given the beastly:

        /home/rdparker/tla-1.3/++saved.src.1089815347.15624.1/\
        {arch}/++pristine-trees/unlocked/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        address@hidden/package-framework--devo--1.0--patch-5/\
        {arch}/package-framework/\
        package-framework--patch-detection/\
        package-framework--patch-detection--1.0/\
        address@hidden/patch-log/patch-2

It would probably be bad to remap anything in the
package-framework--patch-detection section such that it was
actually contained above the {arch} directory two levels higher.
It would be better to move the {arch} directory itself to a higher
level.  The question is where would it be appropriate to remap
it?  ...ah never mind, I think there are too many issues with
this.  It would require a deeper understanding of arch than what
I currently have in order to get it right.

</bad-idea?>

A command or two could be added to tla to convert between the
filenames used on POSIX-minimal systems and the names used on
other machines.  Perhaps 'tla file-path' could have options to
display the POSIX-minimal form of a filename, the normal form of
the filename and the native form of the system where tla is
running.

This would only be needed on POSIX-minimal systems.  Instead of
patching and distributing modified versions of tar, diffutils and
patch, tla could implement filters to their input or output.  I
know it's ugly, but custom GNU utils might just be uglier.

For example the output from diffutils could be piped into a
routine in tla to rewrite the diff headers to use the normal form
of the path name.  The input to patch could be a filter that did
the opposite if necessary.  In the case where tla unpacks a
tar file during a get, it could pipe the tar.gz to gunzip or zlib
and assuming the tar file format is well known (IIRC it was 18
years ago when I last looked) tla could rewrite the header blocks
of each file to give the local file name, then pipe that input
along to 'tar x....'  
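
As a rough illustration of the diff side of this idea, here is a
sketch of a filter that rewrites only the "---" and "+++" file
headers of a unified diff.  Nothing here is taken from tla: the
function name and the to_normal callback (whatever maps a local
short path to its normal form) are assumptions made for the
example.

```python
import re

# Matches a unified-diff hunk header such as "@@ -1,2 +1,3 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def rewrite_diff_headers(lines, to_normal):
    """Yield diff lines with the paths in file headers passed through to_normal."""
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in lines:
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: count lines so that removed text which
            # happens to start with "-- " is never mistaken for a header.
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif not line.startswith("\\"):
                old_left -= 1
                new_left -= 1
            yield line
            continue
        hunk = HUNK_HEADER.match(line)
        if hunk:
            old_left = int(hunk.group(1) or 1)
            new_left = int(hunk.group(2) or 1)
            yield line
        elif line.startswith(("--- ", "+++ ")):
            marker, rest = line[:4], line[4:].rstrip("\n")
            path, tab, stamp = rest.partition("\t")
            yield f"{marker}{to_normal(path)}{tab}{stamp}\n"
        else:
            yield line
```

The filter for the input to patch would be the same routine with
the inverse mapping passed in.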

The opposite could be done for a make-src-distribution command
which could turn a working directory on a POSIX-minimal system
into a normalized .tar.gz for general distribution.  An
unpack-src-distribution might also be handy for the
POSIX-minimally impaired systems that need to unpack a source
distribution from a more optimal system.
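
The tar side of this, in either direction, might look something
like the sketch below.  It uses Python's tarfile module purely for
illustration rather than hand-editing header blocks, and the
function name and the rename callback are hypothetical.  Output is
written in GNU tar format so that long member names are stored the
way GNU tar expects.

```python
import tarfile


def rewrite_tar_names(src_fileobj, dst_fileobj, rename):
    """Copy a (possibly compressed) tar stream, renaming every member."""
    with tarfile.open(fileobj=src_fileobj, mode="r|*") as src, \
         tarfile.open(fileobj=dst_fileobj, mode="w|gz",
                      format=tarfile.GNU_FORMAT) as dst:
        for member in src:
            # Only regular files carry data; directories and links are
            # header-only entries.
            data = src.extractfile(member) if member.isfile() else None
            member.name = rename(member.name)
            dst.addfile(member, data)
```

A make-src-distribution would pass a rename that expands short
names to their normal form, and an unpack-src-distribution would
pass the inverse before handing the stream to the local tar.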

The {make,unpack}-src-distribution commands and similar could be
used for mangling^h^h^h^h^haging the passing around of
indirectly-shared files as the need arises (or is planned for).


*  Directly-Shared Files

None of the above solutions really addresses directly shared
files.  For example serving up your writable archive via a
read-only HTTP connection or for that matter serving up an
archive via HTTP at all on a POSIX-minimal system, when the URL
maps to something larger than the POSIX minimum on the server.

        *** Late breaking news, this just in. ***

A query graciously run on the supermirror by jblack returned:

        .../address@hidden/\
        libgraphics-colordeficiency-perl/\
        libgraphics-colordeficiency-perl--upstream/\
        libgraphics-colordeficiency-perl--upstream--0.0/\
        base-0/\
        libgraphics-colordeficiency-perl--upstream--0.0--base-0.src.tar.gz

as his longest, at a total of 251 characters.  Ouch, that is
getting close to the POSIX minimum limit, so there may be a need
for a solution.  The big problem is how to do this in an upwardly
compatible manner.  If you don't want to change the protocol, and
your server can't support the current protocol, what do you do?
How about a helper program?  A little proxy that could be run to
access POSIX-minimal archives in a normal manner?  Say a proxy
that would convert requests for

        http://www.inthefaith.net/rdp/{archives}/\
        address@hidden/package-framework/\
        package-framework--devo/package-framework--devo--1.0/\
        patch-1/\
        package-framework--devo--1.0--patch-1.patches.tar.gz

into:

        http://www.inthefaith.net/rdp/{archives}/\
        address@hidden/package-framework/devo/1.0/\
        patch-1/patches.tar.gz

The POSIX-minimal systems themselves could either use the proxy
as well, or access the archive directly by archive-register'ing
it as pm-http://www.inthefaith.net/rdp/{archives}.....
Optionally a POSIX-minimal server could be indicated by a
=meta-info/windows-blows, with the simple text "yes it does".  Of
course if the mere existence of =meta-info indicates a mirror, a
=short-paths file could be used with the simple text of "windows
blows".
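
The request translation such a proxy would perform could be
sketched as below.  This is only an illustration under several
assumptions: the function name is invented, the archive is assumed
to use exactly the archive/category/branch/version/revision/file
layout shown in the example request, and example.org and
user@example.org are placeholders rather than real locations.
Anything that does not match that layout is passed through
untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def shorten_archive_url(url, archive_root):
    """Map a normal archive URL to its short (POSIX-minimal) form."""
    parts = urlsplit(url)
    root = archive_root.rstrip("/") + "/"
    if not parts.path.startswith(root):
        return url
    segs = parts.path[len(root):].split("/")
    if len(segs) != 6:
        return url
    archive, category, branch, version, revision, filename = segs
    prefix = f"{version}--{revision}."
    if not (branch.startswith(category + "--")
            and version.startswith(branch + "--")
            and filename.startswith(prefix)):
        return url
    short = [
        archive,
        category,
        branch[len(category) + 2:],   # "pkg--devo" -> "devo"
        version[len(branch) + 2:],    # "pkg--devo--1.0" -> "1.0"
        revision,
        filename[len(prefix):],       # "...--patch-1.patches.tar.gz" -> "patches.tar.gz"
    ]
    return urlunsplit(parts._replace(path=root + "/".join(short)))
```

A proxy would fetch the shortened URL from the POSIX-minimal server
and return the response under the original name, so clients on
other systems would see the normal archive layout.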

Note:  The machine at www.inthefaith.net does not run Windows,
this is just an example.

Okay, I spent entirely too long working on this but I hope it
generates some discussion and presents some alternative solutions
that may not have been considered before.  I feel there are
things here that would play nicely with any of the existing ideas
out there.  I only focused on the CBV reduction because it
retains the most semantic information.  Do your worst and let me
know what you all think.



