bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bu


From: Pádraig Brady
Subject: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086
Date: Tue, 09 Oct 2012 13:25:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote:
>> On 07/26/2011 02:33 PM, marcel partap wrote:
>>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension
>>> matching.
>>> #regards/marcel.
>>
>> Your patch would make the new behavior unconditional.  But I like case
>> sensitivity, and think that case insensitivity should be an opt-in
>> process that I request, with coordination between dircolors to
>> generate a new string for LS_COLORS to be honored by ls.  Furthermore,
>> the patch is lacking in NEWS, documentation, and testsuite coverage.
>>
>> Additionally, you should be aware that strncasecmp() has undefined
>> behavior in non-C multibyte locales.  It would probably be better to
>> use c_strncasecmp(), so that you are guaranteed defined behavior
>> regardless of the current locale.
>>
>> Would you care to tackle those additional issues?  And are you set up
>> for copyright assignment, since the patch will probably be non-trivial
>> by that point in time?
>
> I saw this while going back through old bugs, so started poking around.
> How about this: if a suffix is not matched, convert it to lower case
> and check again.  That way, anyone who cares to highlight .TaR or .TAR
> differently from .tar can still easily do so, yet names ending with
> uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default.
>
> Does anyone think it's worth making this new fallback-to-case-insensitivity
> an option?

Not me anyway.

> While looking at that, I realized that comparing each name in an ls'd
> directory with each of nearly 100 suffixes should leave nontrivial room
> for improvement.  Sure enough, once I'd replaced that linear search
> with a hash-table lookup, I see a measurable improvement.

> and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement.

Very nice.

> Of course, I have deliberately used the case that shows the most improvement:
>    - a directory with very many files
>    - no suffix match, so the old code searches all suffixes
>    - using tmpfs: minimal stat overhead

> @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f, bool 
symlink_target)
>       }
>
>     /* Check the file's suffix only if still classified as C_FILE.  */
> -  ext = NULL;
> -  if (type == C_FILE)
> -    {
> -      /* Test if NAME has a recognized suffix.  */
> -
> -      len = strlen (name);
> -      name += len;                /* Pointer to final \0.  */
> -      for (ext = color_ext_list; ext != NULL; ext = ext->next)
> -        {
> -          if (ext->ext.len <= len
> -              && STREQ_LEN (name - ext->ext.len, ext->ext.string,
> -                            ext->ext.len))
> -            break;
> -        }
> -    }
> +  char *suffix;
> +  struct sufco *ext;      /* Color extension */
> +  ext = (type == C_FILE && (suffix = strrchr (name, '.'))
> +         ? suffix_color_lookup (suffix)
> +         : NULL);

So do we now only support suffixes delimited by '.' ?
Previously the delimiter was arbitrary or optional:

  touch star; LS_COLORS="*tar=01;31" /bin/ls --color *tar

cheers,
Pádraig.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]