[bugs #10609] Inconsistent behaviour of printf-style format specifiers

From: James Youngman
Subject: [bugs #10609] Inconsistent behaviour of printf-style format specifiers
Date: Fri, 19 Nov 2004 18:40:30 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5

This mail is an automated notification from the bugs tracker
 of the project: findutils.

[bugs #10609] Latest Modifications:

Changes by: 
                James Youngman <address@hidden>
                Fri 11/19/04 at 23:25 (GMT)

            What     | Removed                   | Added
          Resolution | Postponed                 | Wont Fix
              Status | Open                      | Closed
       Fixed Release | None                      | 4.2.5

------------------ Additional Follow-up Comments ----------------------------
Going ahead with the plan to close the bug report as described in my earlier 
update.  Please open a new bug report specifically on the printf format 
specifier flags if you feel that the documentation change is not sufficient, 
and the actual functionality should in fact change, or (especially) if you can 
think of a good fix for the problem.   Thanks.

[bugs #10609] Full Item Snapshot:

URL: <http://savannah.gnu.org/bugs/?func=detailitem&item_id=10609>
Project: findutils
Submitted by: James Youngman
On: Thu 10/07/04 at 22:21

Category:  find
Severity:  1 - None
Item Group:  None
Resolution:  Wont Fix
Privacy:  Public
Assigned to:  jay
Originator Name:  Chris Chittleborough
Originator Email:  address@hidden
Status:  Closed
Release:  None
Fixed Release:  4.2.5

Summary:  Inconsistent behaviour of printf-style format specifiers

Original Submission:  Dan Jacobson recently pointed out a documentation problem 
with find's format strings.  Here is a fuller explanation of the situation as
of find 4.1.20.  (I sent this to bug-findutils back in August but got no
response; maybe sending it during the summer vacation period for you
Northern Hemisphere dwellers wasn't such a great move ...)

I have written a patch which adds a new format code (%M -> ls-style
10-character mode string) and a new time format (%T+ ->
yyyy-mm-dd+HH:MM:SS) to find.  Thanks to the really clean internal
structure of the source, coding this enhancement was quite easy.  But I
have run into some problems while documenting the new format features.
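
As a rough illustration of the %T+ layout mentioned above, here is a minimal
sketch that renders a timestamp as yyyy-mm-dd+HH:MM:SS with strftime().  The
name format_time_plus is hypothetical and is not taken from the patch; gmtime()
is used only so that the result does not depend on the local time zone.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not from the patch: write WHEN into BUF using the
   yyyy-mm-dd+HH:MM:SS layout.  Returns the number of characters written,
   or 0 on failure (strftime returns 0 when BUF is too small). */
static size_t format_time_plus(char *buf, size_t bufsize, time_t when)
{
    /* Assumption: UTC is used here purely for reproducible output. */
    const struct tm *tm = gmtime(&when);

    if (tm == NULL)
        return 0;
    return strftime(buf, bufsize, "%Y-%m-%d+%H:%M:%S", tm);
}
```

The "+" between date and time keeps the whole timestamp a single
whitespace-free field, so the result always has a fixed length of 19
characters.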

In find.texi, the "Print File Information" node says about "-printf" that
   Field widths and precisions can be specified as with the
   `printf' C function
which is true but quite incomplete.  On the other hand, the "Format
Directives" node says that
   [u]nlike the C `printf' function, ["-printf" and
   "-fprintf"] do not support field width specifiers
which is quite untrue.

Here's the full truth as of version 4.1.20. Find accepts the syntax
   '%' {'-' | '0' | '#' | '+' | ' ' } Number [ '.' Number ] Directive
for format conversions, but only honors the '0', '#', '+' and ' ' flags
with two directives, %d (depth) and %m (mode).  You might expect numeric
directives like %s, %n and %i to behave like %d, but they don't.  In a
closely-related development, with %d and %m a precision specification
(e.g., the ".4" in "%.4m") specifies the minimum number of digits to be
output, but with all other directives (even %s!) it specifies the
maximum output length.  (The reason for this is that %d and %m are
implemented using fprintf's numeric conversions -- %d (decimal) and %o
(octal) respectively -- while all the other directives use fprintf's %s
(string) conversion.)

There is another complication with %m.  You might expect that it would
always output 3 octal digits, or 4 iff the setuid, setgid or sticky bits
are set.  However, a file with mode 044 (unlikely, but I've done ... er,
that is, *seen* ... stranger things), causes "%m" to output "44", not
"044".
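
The two meanings of the precision can be reproduced with the C library alone.
The sketch below is only an illustration, not find's actual code; the helper
names show_mode and show_text are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only (not find's code). */

/* Numeric conversion: the precision is a MINIMUM number of digits.
   A negative precision is treated by printf as if it were omitted. */
static int show_mode(char *buf, size_t bufsize, unsigned int mode,
                     int precision)
{
    return snprintf(buf, bufsize, "%.*o", precision, mode);
}

/* String conversion: the precision is a MAXIMUM output length. */
static int show_text(char *buf, size_t bufsize, const char *text,
                     int precision)
{
    return snprintf(buf, bufsize, "%.*s", precision, text);
}
```

With these, show_mode(buf, n, 044, -1) produces "44" (the behaviour described
for a bare "%m"), show_mode(buf, n, 044, 3) produces "044", and
show_text(buf, n, "12345678", 4) produces "1234".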

All this is confusing and hard to document.  Perhaps %d and %m should be
brought into line with the other directives?  But this might break
existing scripts.  Does the fact that these details were not documented
make it acceptable to change them?  Perhaps more importantly, what does
POSIX say about this stuff?  (Having never even seen any POSIX
specification, I have no idea.)

If the people on this list can reach some sort of consensus, I'm willing
to try to produce a patch.

Follow-up Comments

Date: Fri 11/19/04 at 23:25         By: James Youngman <jay>
Going ahead with the plan to close the bug report as described in my earlier 
update.  Please open a new bug report specifically on the printf format 
specifier flags if you feel that the documentation change is not sufficient, 
and the actual functionality should in fact change, or (especially) if you can 
think of a good fix for the problem.   Thanks.

Date: Wed 11/10/04 at 23:15         By: James Youngman <jay>
Postpone further action until I hear from Chris.

Date: Wed 11/10/04 at 23:15         By: James Youngman <jay>
I have applied something very similar to your patch.  I have also documented 

As indicated in my earlier comment, I propose not to change the code now that 
the behaviour is documented for the %d and %m specifiers (versus everything 
else).  The rationale behind this is that it's very difficult to portably print 
uintmax_t values on systems that don't have support for that in their fprintf 
library function. 

Hence I plan to close the bug report as resolved.  Do you have any objections 
or other comments?

Date: Wed 11/10/04 at 22:42         By: James Youngman <jay>
Thanks for the detailed bug report.  I've examined the code and now understand 
why it does what it does.   In short, most of the numeric fields from struct 
stat are not defined by POSIX to fit into a "long".   That means that we can't 
rely on being able to print them with %ld.  POSIX requires that implementations 
provide an environment in which a subset fit into a long but there is no 
guarantee that we're running in it, and in any case some of the fields we are 
interested in are not among the fields POSIX requires to fit into a long.  

The findutils code uses human_readable() to print these values.  That uses a 
uintmax_t type, which might have been defined by config.h as "unsigned long 
long" on systems that have no native uintmax_t.  Hence once human_readable() 
has done its work, the resulting field is printed with %s.    Therefore the 
flags are passed to printf(), but of course printf() ignores them. 

To honour the flag characters we would either need to post-process the result 
of human_readable() or figure out a way of getting printf() to print a value of 
type uintmax_t even on a system which doesn't natively support such a type (for 
example because it is a GCC extension and we are using the system's C library).
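
To make the post-processing option concrete, here is a minimal sketch that
applies the "0" and "+" flags and a field width to a uintmax_t by hand, without
using any printf length modifier.  It is not findutils code and does not call
human_readable(); the names umax_to_decimal and pad_number are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: render V in decimal at the END of BUF and return a
   pointer to the first digit.  BUF must hold 3 * sizeof V + 1 bytes. */
static char *umax_to_decimal(uintmax_t v, char *buf, size_t bufsize)
{
    char *p = buf + bufsize;

    *--p = '\0';
    do {
        *--p = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v != 0);
    return p;
}

/* Hypothetical post-processing step: emulate printf's '+' and '0' flags
   and a minimum field width.  Returns 0, or -1 if OUT is too small. */
static int pad_number(char *out, size_t outsize, uintmax_t v,
                      size_t width, int zero_pad, int plus_sign)
{
    char digits[3 * sizeof(uintmax_t) + 1];
    const char *d = umax_to_decimal(v, digits, sizeof digits);
    size_t ndigits = strlen(d);
    size_t body = ndigits + (plus_sign ? 1 : 0);
    size_t npad = width > body ? width - body : 0;
    size_t i = 0;

    if (body + npad + 1 > outsize)
        return -1;
    if (!zero_pad) {                 /* space padding goes before the sign */
        memset(out, ' ', npad);
        i = npad;
    }
    if (plus_sign)
        out[i++] = '+';
    if (zero_pad) {                  /* zero padding goes after the sign */
        memset(out + i, '0', npad);
        i += npad;
    }
    memcpy(out + i, d, ndigits + 1); /* copies the terminating NUL too */
    return 0;
}
```

For example, pad_number(out, sizeof out, 42, 6, 1, 1) yields "+00042", which
is what "%+06d" gives for an int.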

For now, I have opted to document the current behaviour rather than change the 
code (though I have put comments in the code explaining what is happening).

Date: Sun 10/10/04 at 10:55         By: Chris(topher) Chittleborough 
FURTHER THOUGHTS: I've been thinking about how find's format codes *should* 
work in an ideal world, and came up with the following proposal.  The lines 
beginning with two plus signs describe changes to find's current behaviour.

   a    string  last access time in ctime() format
   A<K> string  last access time using strftime() code <K>
   b    integer size in 512-byte blocks, rounded up
   c    string  status change time in ctime() format
   C<K> string  status change time using strftime() code <K>
   d    integer depth
   f    string  file name (= basename(%p))
   F    string  filesystem type (a string)
   g    string  group name (or number)
   G    integer group number
   h    string  directory (= dirname(%p))
   H    string  command-line arg under which file was found
   i    integer inode number in decimal
   k    integer size in 1K blocks, rounded up
   l    string  target of symlink; "" for non-symlinks
   m    special modes (really permissions) in octal
++ M    string  mode in string format (e.g., "drwxr-xr-x")
   n    integer number of hard links
   p    string  file's path
   P    string  file's path with command-line arg (=%H) removed
   s    integer size in bytes
++ S    string  size in human-friendly notation (like df -h)
   t    string  last modification time in ctime() format
   T<K> string  last modification using strftime() code <K>
   u    string  user name (or number)
   U    integer user number

 String-style conversions are of the form
   "%" {<str-flag>} [<min-width>] ["."<max-length>] <code>
 where <str-flag> can be
        "-" for left-justification (iff <width> specified),
++   or "#" for Unix-style (backslashed) quoting.
 This could be implemented by using a "%[-][<width>][.<max-length>]s" format
 with fprintf().  (If "#" is specified, a backslash-escaped string is used
 instead of the original string.  Note that the -ls predicate performs
 backslash-escaping of pathnames.)
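
 A minimal sketch of the string-style conversion, assuming the flags and
 numbers have already been parsed.  The name format_string_field is
 hypothetical, and the proposed "#" quoting is not covered here.

```c
#include <stdio.h>

/* Hypothetical helper: format S as a string-style field.
   MIN_WIDTH of 0 means "no width"; MAX_LENGTH < 0 means "no maximum". */
static int format_string_field(char *buf, size_t bufsize, const char *s,
                               int left_justify, int min_width,
                               int max_length)
{
    /* printf takes a negative width argument as the '-' flag followed by
       a positive width, and treats a negative precision as if it had
       been omitted, so one format string covers every combination. */
    int width = left_justify ? -min_width : min_width;

    return snprintf(buf, bufsize, "%*.*s", width, max_length, s);
}
```

 For example, format_string_field(buf, n, "hello", 1, 8, 3) writes "hel"
 followed by five spaces.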

 Number-style conversions are of the form
   "%" {<num-flag>} [<min-width>] ["."<min-digits>] <code>
 where <num-flag> can be
        "-" for left-justification (iff <width> specified),
++      "0" to pad with leading zeros instead of spaces (ignored if "-" used),
++      "+" to output a plus sign before the digits,
++      " " to output a space before the digits (ignored if "+" used),
++   or "'" to group digits in a locale-specific way.
 This could be implemented by using a "%{<num-flag>}[<width>][.<min-digits>]u"
 format with fprintf(), modulo type modifiers.
++      * I suspect that most find users would expect the "0", "+" and " " flags
++        and the ".<min-digits>" specifier to be supported for numeric
++        conversions; certainly I was surprised to find they weren't.
++      * I think the "'" flag would be a useful addition to find.
++      * if "'" is used and the local fprintf() does not support "'" itself, we
++        have to insert the grouping separators (and pad the result) ourselves.
          (Version 2 of the Single UNIX Specification, which dates back to 1997,
          requires support for the "'" flag.)
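
 If the grouping separators had to be inserted by hand, it might look roughly
 like the sketch below.  It assumes groups of three digits and a single-byte
 separator passed in by the caller; a real implementation would take both from
 the locale.  The name group_digits is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the decimal string DIGITS into OUT, inserting
   SEP between groups of three digits counted from the right.
   Returns 0 on success, -1 if OUT is too small. */
static int group_digits(const char *digits, char sep, char *out,
                        size_t outsize)
{
    size_t n = strlen(digits);
    size_t nsep = n > 0 ? (n - 1) / 3 : 0;
    size_t i, j = 0;

    if (n + nsep + 1 > outsize)
        return -1;
    for (i = 0; i < n; i++) {
        /* Emit a separator whenever the digits still to be copied form
           a whole number of groups, except before the first digit. */
        if (i > 0 && (n - i) % 3 == 0)
            out[j++] = sep;
        out[j++] = digits[i];
    }
    out[j] = '\0';
    return 0;
}
```

 Padding to the field width would then be applied to the grouped string, since
 the separators change its length.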

 The %m format is of the form
   "%" {<m-flag>} [<min-width>] ["."<min-digits>] "m"
 where <m-flag> can be
        "-" for left-justification (iff <width> specified),
++   or "#" to force a leading zero.
 This can be implemented using a "%{<m-flag>}<min-width>.<min-digits>o" format
 with fprintf().
++      * We always specify both <min-width> and <min-digits>.
++      * <min-digits> defaults to 3, to match most people's expectations.
++      * <min-width> probably should default to 5.
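
 A sketch of the proposed %m handling under these defaults.  The name
 format_mode_field and the convention that a negative argument selects the
 default are assumptions made for the example.

```c
#include <stdio.h>

#define MODE_DEFAULT_WIDTH  5   /* proposed default for <min-width> */
#define MODE_DEFAULT_DIGITS 3   /* proposed default for <min-digits> */

/* Hypothetical helper: format MODE as proposed for %m.  Pass a negative
   MIN_WIDTH or MIN_DIGITS to use the defaults above (an assumption of
   this sketch, not part of the proposal). */
static int format_mode_field(char *buf, size_t bufsize, unsigned int mode,
                             int left_justify, int force_zero,
                             int min_width, int min_digits)
{
    int width;

    if (min_width < 0)
        min_width = MODE_DEFAULT_WIDTH;
    if (min_digits < 0)
        min_digits = MODE_DEFAULT_DIGITS;
    /* A negative width argument means left-justification to printf. */
    width = left_justify ? -min_width : min_width;

    if (force_zero)   /* '#' raises the precision until a '0' leads */
        return snprintf(buf, bufsize, "%#*.*o", width, min_digits, mode);
    return snprintf(buf, bufsize, "%*.*o", width, min_digits, mode);
}
```

 With the defaults, mode 044 comes out as "  044" and, with the "#" flag,
 mode 0644 comes out as " 0644".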

Even with the above changes, it is not possible to write a printf format which
is equivalent to -ls. With three more changes, it would be possible:
  * Let %s with a "#" flag behave differently with block and character
    devices: instead of printing a meaningless number, print the major and
    minor device numbers in "%3u, %3u" format.
  * Add a %L code, which generates "-> %l" for symbolic links and zero
    characters for everything else.
  * Add a time format code, possibly "-", which uses the strftime() formats
    "%a %b %d %H:%M" (for timestamps in the previous six months) or
    "%a %b %d  %y" (for all other timestamps).
Then -ls would be identical to
        -printf '%6i %4k %M %3n %0-8.8u %-8.8g %#8s %T- %p%L\n'
except that -ls uses %4b instead of %4k in POSIXLY_CORRECT mode.
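
The choice between the two strftime() formats for the proposed "-" time code
could be sketched as follows.  Treating six months as 180 days and giving
timestamps in the future the long-ago format are assumptions of this example;
the name ls_style_time_format is hypothetical.

```c
#include <time.h>

/* Assumption: "six months" is approximated as 180 days. */
#define SIX_MONTHS_SECONDS (180L * 24L * 60L * 60L)

/* Hypothetical helper: pick the strftime() format for WHEN as seen at
   time NOW.  Timestamps within the previous six months get the format
   with hours and minutes; everything else gets the format with a year. */
static const char *ls_style_time_format(time_t when, time_t now)
{
    double age = difftime(now, when);   /* seconds; negative if in future */

    if (age >= 0.0 && age < (double)SIX_MONTHS_SECONDS)
        return "%a %b %d %H:%M";
    return "%a %b %d  %y";
}
```

The returned string can be passed directly to strftime() together with the
broken-down time of the file's timestamp.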

File Attachments

Date: Wed 11/10/04 at 22:42  Name: format-documentation.diff  Size: 7.07KB   
By: jay


Date: Sun 10/10/04 at 10:55  Name: CodeOnly.patch  Size: 2.05KB   By: 
FYI, here's a code-only patch to add %M and the "+" timestamp option.
