bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date:	Fri, 03 Apr 2020 19:24:09 +0300

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 3 Apr 2020 16:18:43 +0200
> 
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate 
> copious amounts of memory, to the point that they often turn up in both 
> memory and cpu profiles. (This is on macOS; I haven't checked the situation 
> elsewhere.)

AFAIR, on macOS the situation is worse than elsewhere, because of the
normalization thing.

> For instance, a single call to file-relative-name, with ASCII-only arguments, 
> manages to allocate 140 KiB. There are several conversion steps each 
> involving creating temporary buffers as well as the compilation and execution 
> of very large "quick-check" regexps. Example:
> 
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))

Can you tell more about the conversion steps and the memory each one
allocates?

> Perhaps we can assume that file name codings are always ASCII-compatible

I don't think every encoding is ASCII-compatible, so I don't see how
we can assume that in general.  But checking whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?
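
To illustrate why the assumption cannot hold in general, here is a
minimal standalone C sketch.  It is not Emacs code, and the helper
names encode_utf16le and utf16le_is_identity_for are hypothetical.
It encodes an ASCII-only string as UTF-16LE and checks whether the
bytes came out unchanged; for a non-empty string they do not, so an
identity shortcut would be wrong for such a coding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: encode the ASCII-only
   string SRC of LEN bytes as UTF-16LE into DST, which must have room
   for 2 * LEN bytes.  Return the number of bytes written.  */
static size_t
encode_utf16le (const char *src, size_t len, unsigned char *dst)
{
  for (size_t i = 0; i < len; i++)
    {
      dst[2 * i] = (unsigned char) src[i];  /* low byte */
      dst[2 * i + 1] = 0;                   /* high byte is 0 for ASCII */
    }
  return 2 * len;
}

/* Return 1 if encoding SRC as UTF-16LE leaves its bytes unchanged,
   0 otherwise (including when SRC does not fit the scratch buffer).  */
static int
utf16le_is_identity_for (const char *src)
{
  unsigned char buf[64];
  size_t len = strlen (src);
  if (2 * len > sizeof buf)
    return 0;
  size_t out = encode_utf16le (src, len, buf);
  return out == len && memcmp (buf, src, len) == 0;
}
```

Only the empty string survives unchanged here, which is why the
coding's ASCII-compatibility has to be checked rather than assumed.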

> There is already a hack in encode_file_name that assumes that no unibyte 
> string ever needs encoding; if so, the shortcut could perhaps be extended to 
> decode_file_name and simplified.

I'm not sure I understand what you mean by extending the shortcut to
decode_file_name.  Please elaborate.

> -  if (BUFFERP (dst_object))
> +  if (EQ (dst_object, Qt))
> +    {
> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> +         act as identity.  */
> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> +          && (STRING_MULTIBYTE (string)
> +              ? (chars == bytes) : string_ascii_p (string)))
> +        return string;

I don't think we can return the same string when NOCOPY is false.
Callers might not expect that, and you might inadvertently cause the
original string to be modified behind the caller's back.

But if NOCOPY is true, I think this change is OK.  Just make sure the
test suite doesn't start failing; maybe there's something else we are
missing.
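
The NOCOPY concern can be sketched in standalone C as follows.  This
is only an illustration under stated assumptions, not the Emacs
implementation: bytes_ascii_p and convert_fast_path are hypothetical
names, plain char buffers stand in for Lisp strings, and NOCOPY is
assumed to follow the convention documented for encode-coding-string,
where a non-nil NOCOPY means the caller accepts getting the original
string back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the LEN bytes at S are all ASCII (high bit clear).  */
static bool
bytes_ascii_p (const char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) s[i] & 0x80)
      return false;
  return true;
}

/* Hypothetical stand-in for the identity fast path.  Return NULL when
   the shortcut does not apply and the caller must take the slow path
   (this sketch also returns NULL on allocation failure and does not
   distinguish the two).  With NOCOPY true, return S itself; otherwise
   return a fresh copy that the caller must free.  */
static char *
convert_fast_path (char *s, size_t len, bool ascii_compat, bool nocopy)
{
  if (!ascii_compat || !bytes_ascii_p (s, len))
    return NULL;
  if (nocopy)
    return s;                   /* caller has agreed to share S */
  char *copy = malloc (len + 1);
  if (!copy)
    return NULL;
  memcpy (copy, s, len);
  copy[len] = '\0';
  return copy;                  /* caller may modify this freely */
}
```

With nocopy false, the result never aliases the input, so a caller
that later modifies the returned string cannot change the original.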

Thanks.
