coreutils

Re: ls feature request


From: Kaz Kylheku (Coreutils)
Subject: Re: ls feature request
Date: Fri, 21 Feb 2020 19:51:05 -0800
User-agent: Roundcube Webmail/0.9.2

On 2020-02-21 10:32, Riccardo Mazzarini wrote:
> Hi Kaz, this works almost perfectly but it fails with filenames that contain spaces.
> I tried using quotation marks, i.e.
>
> ls -dU "$(find .* * -maxdepth 0 -not -type d | sort ; find .* * -maxdepth 0 -type d | sort)"
>
> but that didn't work. Any ideas?

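I can answer that in three parts of increasing complexity. One caveat applies to all of them: every name is passed as a command-line argument, first to find (through the shell's .* * glob) and then to ls, so these solutions are subject to the kernel's limit on argument length. If the glob expands past that limit, find cannot even be started; and if the names do not fit in one ls command line, xargs runs ls several times, each run producing its own separately laid-out listing.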
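To see how much room there is on a given system, something like the following should work. It is a sketch assuming a GNU system; --show-limits is specific to GNU xargs, and its man page suggests feeding it /dev/null together with --no-run-if-empty so that it runs nothing.

```shell
#!/bin/bash
# Kernel limit on the combined size of exec arguments and environment, in bytes.
getconf ARG_MAX

# GNU xargs: print (on stderr) the limits it will actually use when building
# command lines; empty input plus --no-run-if-empty means no command is run.
xargs --no-run-if-empty --show-limits </dev/null
```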

Part 1:

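Solutions that capture the output of a program and interpolate it as arguments for ls will not work. Left unquoted, $(...) is split at every space, tab and newline, so a name containing a space becomes two arguments; quoted, as in your attempt, the entire output becomes one single argument, newlines and all. Making this approach work would require a clumsy escaping-and-eval job, so we switch to another method.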
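A small sketch of the two failure modes, assuming bash. count_args is a hypothetical helper that only reports how many arguments it received, and the two file names are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments it was given.
count_args() { echo "$#"; }

dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'a b' c                # two files, one with a space in its name

names=$(printf '%s\n' *)     # "a b" and "c", one per line

count_args $names            # unquoted: split at every blank and newline: 3
count_args "$names"          # quoted: one argument, "a b<newline>c": 1

cd / && rm -rf "$dir"
```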

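If the only problems with names are spaces and control characters other than newline, so that the output of "find" has exactly one name per line, then we can use GNU xargs with -d '\n'. The -d option matters: by default xargs splits its input at blanks as well as newlines, and treats quotes and backslashes specially, so a name like "my file" or "don't" would still be split or rejected.

(find .* * -maxdepth 0 -not -type d | sort ; find .* * -maxdepth 0 -type d | sort) | xargs -d '\n' ls -dU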

Note that xargs cannot use your shell alias for ls. If you want colors, you have to add --color=auto yourself.
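One way to check that names survive the Part 1 pipeline intact is to run it in a scratch directory with printf in place of ls, so that each argument xargs delivers is shown between < and >. This is a sketch assuming bash plus GNU find, sort and xargs; the directory contents are made up for the test.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch .hidden 'a file' "don't panic"   # .hidden ensures the .* glob matches something
mkdir 'my dir'

# Part 1 pipeline, with printf standing in for ls to expose argument boundaries.
out=$( (find .* * -maxdepth 0 -not -type d | sort ; find .* * -maxdepth 0 -type d | sort) |
       xargs -d '\n' printf '<%s>\n')
printf '%s\n' "$out"

cd / && rm -rf "$dir"
```

With plain "xargs" instead of "xargs -d '\n'", the name containing an apostrophe would make xargs stop with an unmatched-quote error.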

Part 2:

If the names can be completely arbitrary strings, including newlines, then we have "find -print0", which outputs names as null-terminated strings, and "xargs -0", which reads null-terminated strings.

GNU sort can in fact do null-terminated I/O: its -z (--zero-terminated) option makes it read and write null-terminated records. But GNU Awk is worth a look as well, since it can do more than sort.

GNU Awk can split its input into records at an arbitrary separator, which may even be a regular expression, and in GNU Awk the null byte can be written as \0.

Watch this. Here is a little test directory with some files:

  ~/test $ ls
  cert.pem  char.c  hello.c  Makefile      palin.tl  str.sh
  char      hello   lex.awk  notreached.c  pushl.s

We can pass these as null-terminated strings with "find -print0" to a gawk script which
handles them just fine and prints them as newline-terminated lines:

  $ find . -print0 | gawk -v 'RS=\0' 1
  .
  ./hello
  ./Makefile
  ./lex.awk
  ./palin.tl
  ./char
  ./char.c
  ./hello.c
  ./cert.pem
  ./notreached.c
  ./pushl.s
  ./str.sh

And with -v 'ORS=\0', it will output null terminated records too! But we won't be
making use of this.

With the above, we can implement a sort easily:

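# Null-terminated string sort using GNU Awk. asort() renumbers the sorted
# elements 1..n, so walk them by index: "for (l in line)" has no guaranteed order.
gawk -v 'RS=\0' '{ line[NR] = $0 } END { n = asort(line); for (i = 1; i <= n; i++) printf("%s\0", line[i]) }'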

It's quite a mouthful, so let's move the RS assignment into a BEGIN block and put the whole
awk script into a variable called sort0:

sort0='BEGIN { RS = "\0" } { line[NR] = $0 } END { n = asort(line); for (i = 1; i <= n; i++) printf("%s\0", line[i]) }'

With that variable, we can now have:

(find .* * -maxdepth 0 -not -type d -print0 | gawk "$sort0" ; find .* * -maxdepth 0 -type d -print0 | gawk "$sort0" ) | xargs -0 ls -dU
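For comparison, GNU sort's -z (--zero-terminated) option makes sort itself read and write null-terminated records, so the same pipeline can be sketched without gawk. As in a terminal-free check, printf stands in for ls so that each argument is visible, and the test names are made up. Note that sort orders names according to the current locale, so its ordering may not match gawk's asort() for every name.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch .hidden 'a file' $'foo\nbar'   # .hidden ensures the .* glob matches something
mkdir 'my dir'

# Non-directories first, then directories, each group sorted by "sort -z".
out=$( (find .* * -maxdepth 0 -not -type d -print0 | sort -z
        find .* * -maxdepth 0 -type d -print0 | sort -z) |
       xargs -0 printf '[%s]\n')
printf '%s\n' "$out"

cd / && rm -rf "$dir"
```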


Part 3:

Since we're using Gawk, we could run a single "find" job and let the Gawk script separate directories from non-directories. To tell the two apart, we can use GNU find's -printf instead of -print0, printing directory names with a "d" prefix and other entries with a "-" prefix.
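To see what the gawk script receives, here is a sketch that runs only the find half in a scratch directory and shows each record on its own line, assuming bash and GNU find; the names are made up.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 'a file'
mkdir 'my dir'

# Each record is a type prefix ("-" or "d") plus the name, ended by a null
# byte; tr turns the nulls into newlines only so the records can be read.
out=$(find * -maxdepth 0 \
           \( -not -type d -printf "-%p\0" \) -o \
           \( -type d -printf "d%p\0" \) | tr '\0' '\n')
printf '%s\n' "$out"

cd / && rm -rf "$dir"
```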

My attempt at this script looks like this:

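#!/bin/bash

(find .* * -maxdepth 0 \
           \( -not -type d -printf "-%p\0" \) -o \
           \( -type d -printf "d%p\0" \) ) | \
gawk 'BEGIN { RS = "\0"
              # create both arrays up front, so asort() is safe even if one stays empty
              split("", nondir); split("", dir) }
      /^-/ { nondir[NR] = substr($0, 2) }   # strip the "-" prefix
      /^d/ { dir[NR] = substr($0, 2) }      # strip the "d" prefix
      END { # asort() renumbers the elements 1..n; walk them by index,
            # because "for (l in ...)" visits elements in no guaranteed order
            n = asort(nondir)
            for (i = 1; i <= n; i++)
              printf("%s\0", nondir[i])
            n = asort(dir)
            for (i = 1; i <= n; i++)
              printf("%s\0", dir[i]) }' | \
xargs -0 ls -dU --color=auto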


As desired, the script handles a file created using:

  $ touch 'foo
  bar'

It is displayed as 'foo'$'\n''bar', which shows that the name was passed intact through the plumbing all the way to the final ls -dU.
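That display comes from GNU ls's shell-escape quoting style, which recent versions use by default when writing to a terminal. Requesting the style explicitly reproduces it when the output goes to a pipe; a sketch assuming GNU ls:

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch $'foo\nbar'

# Force the quoting style ls normally picks only for terminal output.
out=$(ls -d --quoting-style=shell-escape -- $'foo\nbar')
printf '%s\n' "$out"     # prints 'foo'$'\n''bar'

cd / && rm -rf "$dir"
```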



