bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bugs in dirname module


From: Eric Blake
Subject: Re: bugs in dirname module
Date: Tue, 08 Nov 2005 07:21:01 -0700
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 11/7/2005 2:02 PM:
>>Should I add a dependency on the verify module to ensure that drive
>>letter prefixes are only used on ASCII platforms?
> 
> I wouldn't bother, unless there's a serious possibility that we'll see
> drive letter prefixes on an EBCDIC platform.  I think that unlikely.
> Besides, if we ever port to OS/360, we'll need to rewrite all this
> code anyway -- their file name syntax is _weird_.

OK.  VMS also has weird file names, if anyone ever wants to port there.

> 
>>Should I still move FILE_SYSTEM_PREFIX_LEN out of config.h and into
>>dirname.h?
> 
> I'd rather move it out of config.h, yes.  I have some qualms with
> config.h containing ifdefs.

What about ISSLASH?  I did not touch that in dos.m4, but it is another
#ifdef that could reasonably be moved to dirname.h.

>>POSIX poses the rule as: if "$string" is a valid filename, then 'cd
>>"$(dirname "$string")"; ls "$(basename "$string")"' lists the same
>>file
> 
> Yes, that's a better way to formulate the constraint (though it's not
> a complete spec).
> 

A similar formulation also appears in the comments of filenamecat.c.

> 
>>(although POSIX fails to mention that this will fail if the
>>filename contains a trailing newline).
> 
> That problem doesn't occur for dir_name and base_name though, right?
> At least on Unix-like platforms.

The problem with trailing newlines is not in the library dirname(3) and
basename(3), but with the command line utilities dirname(1) and
basename(1) in the presence of `` or $() of /bin/sh.  This is because the
command line utilities must add a newline, then the command substitution
quotes strip all consecutive trailing newlines whether they were part of
the filename or added by the utility.

> 
>>My patch makes base_name("c:/") return "".
> 
> Is "" a valid file name?  If not, then we're violating the constraint.
> I don't know the DOS rules, but I suspect that the only suffix of
> "c:/" that identifies the same file is "c:/" itself; in that case,
> base_name("c:/") should return "c:/".

A quick test shows that DOS treats "" as ".", but cygwin rejects it as
invalid (as do all POSIX platforms).  I'm not sure about DJGPP or mingw.
Consistency argues that for roots (/, //, and c:/), we should either
always return the entire root or the empty string.  Then there is the
question that when returning the empty string, should it be done by
returning the beginning of the original string with base_len of 0, or by
returning the end of the original string.

> 
> 
>>For that matter, would it help to redefine our base_name to ALWAYS return
>>the empty string if the argument is a root directory?
> 
> That might be simpler, yes, but due to backward-compatibility concerns
> I might be inclined to give that function a different name --
> base_component perhaps.
> 
...
> 
> I'd like someone to review the uses of base_name, dir_name, etc. in
> coreutils and tar.  This will give us a better feeling for how these
> changes would affect real code.  Perhaps our backward-compatibility
> concerns are unwarranted....
> 

OK, here's my audit.

gnulib:
 lib/backupfile.c: check_extension and numbered_backup - both are static,
and only invoked by find_backup_file_name; which in turn is only used by
coreutils and tar on non-directories, so immune to whether
base_name("c:/") returns "" or "c:/".

 lib/dirname.c: dir_len - logic works whether base_name("c:/") returns ""
or "c:/", so immune.

 lib/filnamecat.c: file_name_concat - uses base_name on DIR argument;
works if base_name("c:/") returns "c:/" length 3, or returns &"c:/"[3]
length 0, but fails if it returns "c:/" length 0.  Also see the comment in
the test code below about the possibility of changing this file to warn if
ABASE is a root.

 lib/same.c: same_name - used twice to compare two filenames, so immune
since both calls have same semantics.

 lib/stripslash.c: strip_trailing_slashes - already has an issue with
"///" not being shortened to "/", so easy to fix to whatever semantics we
give base_name.

coreutils:
 src/basename.c: main - currently uses what base_name returns, but easy to
patch to print original string if base_name returns empty string.

 src/cp.c: ASSIGN_BASENAME_STRDUPA - uses alloca, walks entire string
twice (by calling strip_trailing_slashes before base_name); but only used
as an argument to file_name_concat which already had the comment about
passing a root as the ABASE argument.  target_directory_operand - followed
up by stat of full filename, so immune.

 src/dircolors.c: guess_shell_syntax - setting SHELL as a root would fail
anyways, so immune.

 src/install.c: target_directory_operand - this should be factored out and
shared with cp.c; followed up by stat of full filename, so immune.
install_file_in_dir - only used as an argument to file_name_concat which
already has the comment about passing a root as the ABASE argument.

 src/ln.c: target_directory_operand - again, this could be factored out;
followed up by stat of full filename, so immune.  main - only used as an
argument to file_name_concat which already has the comment about passing a
root as the ABASE argument.

 src/ls.c: basename_is_dot_or_dotdot - only used for string comparison to
. or .., so immune.  However, base_name does not strip trailing slashes,
and system.h DOT_OR_DOTDOT does not accomodate trailing slashes.

 src/mv.c: target_directory_operand - again, this could be factored out;
followed up by stat of full filename, so immune.  movefile - only used as
an argument to file_name_concat which already has the comment about
passing a root as the ABASE argument.

 src/remove.c: rm_1 - only used for string comparison to . or .., so immune.

 src/rm.c: usage - unlike all the other coreutils, this only uses
base_name(program_name) in the usage string.  Either they all should, or
this should be changed to print the full program_name.

 src/shred.c: wipename - only invoked on non-directories, so immune.

 src/split.c: next_file_name - only used in checking for filename
truncation of long names, but roots are short, so immune.

 src/su.c: log_su - based on program_name, so immune.  run_shell - setting
SHELL as a root would fail anyways, so immune.  Besides, there is talk of
removing su from coreutils.

paxutils:
 lib/rtapelib.c: rmt_open__ - setting SHELL as a root would fail anyways,
so immune.

 paxlib/rtape.c: rmt_open - setting SHELL as a root would fail anyways, so
immune.

tar:
 src/extract.c: create_placeholder_file - only used on filenames with
multiple components, so immune.

 src/xheader.c: xheader_formate_name - comments claim that %f is treated
as the results of the basename utility, but the code neglects to strip
trailing slashes (unless this was already done when building the
tar_stat_info).  I'm not sure if tar_stat_info can ever name a root.

findutils:
 find/find.c: at_top - only used after filtering out roots, so immune.

 find/pred.c: pref_fprintf - I'm not sure if pathname will ever be a root.
 pred_iname and pred_name - used as argument to fnmatch; I'm not sure if
pathname will ever be a root, or if it can have trailing slashes that
should have been stripped.

 locate/locate.c: visit_basename - I'm not sure if pathname will ever be a
root, or if it can have trailing slashes that should have been stripped.



In summary, if we can verify that tar and findutils' usage of base_name
will not be done on root directories, then it looks like changing the
semantics of base_name to return the address of the \0 in the original
string naming a root directory is probably a safe semantic change.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDcLRM84KuGfSFAYARAtd3AKCgKidtbkbJEdmNhMrjRXRC3708ngCg1MKr
BJ+h6FlqTvMAlI4CofkowZM=
=X4aw
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]