bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
From: Eli Zaretskii
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 03 Apr 2020 19:24:09 +0300
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 3 Apr 2020 16:18:43 +0200
>
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate
> copious amounts of memory, to the point that they often turn up in both
> memory and cpu profiles. (This is on macOS; I haven't checked the situation
> elsewhere.)
AFAIR, on macOS the situation is worse than elsewhere, because of the
Unicode normalization applied to file names there.
> For instance, a single call to file-relative-name, with ASCII-only arguments,
> manages to allocate 140 KiB. There are several conversion steps each
> involving creating temporary buffers as well as the compilation and execution
> of very large "quick-check" regexps. Example:
>
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))
Can you say more about the conversion steps and how much memory each
one allocates?
> Perhaps we can assume that file-name codings are always ASCII-compatible
I don't think every encoding is ASCII-compatible, so I don't see how
we can assume that in general. But the check whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?
> There is already a hack in encode_file_name that assumes that no unibyte
> string ever needs encoding; if so, the shortcut could perhaps be extended to
> decode_file_name and simplified.
I'm not sure I understand what you mean by extending the shortcut to
decode_file_name. Please elaborate.
> - if (BUFFERP (dst_object))
> + if (EQ (dst_object, Qt))
> + {
> + /* Fast path for ASCII-only input and an ASCII-compatible coding:
> + act as identity. */
> + Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> + if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> + && (STRING_MULTIBYTE (string)
> + ? (chars == bytes) : string_ascii_p (string)))
> + return string;
I don't think we can return the same string if NOCOPY is 'false'.
The callers might not expect that, and you might inadvertently cause
the original string to be modified behind the caller's back.
But if NOCOPY is 'true', I think this change is OK. Just make sure
the test suite doesn't start failing; maybe there's something else we
are missing.
Thanks.