bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script con

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script con

From:	Kai Tetzlaff
Subject:	bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters
Date:	Fri, 25 Feb 2022 10:04:47 +0100

The sieve-manage package uses the managesieve protocol (s.
https://datatracker.ietf.org/doc/html/draft-ietf-sieve-managesieve) to
communicate with a sieve server process.

When the sieve-manage client retrieves a script from the server it uses
the `sieve-manage-getscript' function to send command `GETSCRIPT
"<scriptname>"<CRLF>` to the server and tries to parse the response.

If the downloaded sieve script contains multibyte characters the attempt
to parse the response results in an infloop (in
`sieve-manage-parse-okno').

To reproduce, save the following code

sieve-manage-getscript-minimal-example.el
Description: minimal example

to a file and run: emacs -Q -l <file>


* Detailed analysis:

The example code sets up a response buffer for a successful managesieve
`response-getscript` defined as:

    response-getscript = (sieve-script CRLF response-ok)

Here's the buffer content:
```
1: {32}<CRLF>
2: if body :matches "ä" { stop; }<LF>
3: <CRLF>
4: OK "Getscript completed."<CRLF>
```

It comprises:

1. lines 1-2 (`sieve-script`): encoded as a managesieve `literal-s2c`
   string which:
   a. starts with a length in the form '{<OCTETS>}<CRLF>' (i.e. 32)
   b. followed by the string data (i.e. the actual script: 'if
      body :matches "ä" { stop;}<LF>') using UTF-8 encoding

2. line 3 (`CRLF`)

3. line 4 (`response-ok`): 'OK' SP "Getscript completed." (the latter is
   an optional `quoted` string which can be shown to the user)

The sieve-manage code is parsing the length into an integer and uses it
to skip over `sieve-script` to get to the start of line 3 (<CRLF> -
empty) which is then also skipped to get to line 4 in order to parse the
result ('OK').


Now the problem:

Since sieve-manage explicitly enables multibyte support in the response
buffer (by calling '(mm-enable-multibyte)' in
`sieve-manage-make-process-buffer`) and uses `goto-char' for the purpose
of skipping/jumping over `sieve-script`, each multibyte character in
`sieve-script` causes the jump to go 1 (2, 3) character(s) too far. In
the example above there's only a single 2 byte character (`ä`), so
instead of skipping to the beginning of line 3, we land in the middle of
<CRLF>: <CR(point)LF>.

This causes the following attempt to parse the result code (i.e. the 'OK
"Getscript completed."<CRLF>' line) to infloop in
`sieve-manage-parse-okno'.


* An attempt of a fix:

As far as I can tell, the attached patch fixes the issue for the
GETSCRIPT command.

diff --git a/lisp/net/sieve-manage.el b/lisp/net/sieve-manage.el
index 50342b9105..8020e6fdca 100644
--- a/lisp/net/sieve-manage.el
+++ b/lisp/net/sieve-manage.el
@@ -449,10 +449,19 @@ sieve-manage-deletescript
 (defun sieve-manage-getscript (name output-buffer &optional buffer)
   (with-current-buffer (or buffer (current-buffer))
     (sieve-manage-send (format "GETSCRIPT \"%s\"" name))
+    (set-buffer-multibyte nil)
     (let ((script (sieve-manage-parse-string)))
+      (set-buffer-multibyte t)
       (sieve-manage-parse-crlf)
       (with-current-buffer output-buffer
-       (insert script))
+        (insert (decode-coding-string
+                 script
+                 ;; not sure if using `buffer-file-coding-system' is
+                 ;; the right approach, it might be better to hardcode
+                 ;; it to utf-8-* (managesieve requires UTF-8
+                 ;; encoding) but in that case, which variant of
+                 ;; utf-8-unix/dos/...  is to be used?
+                 buffer-file-coding-system t)))
       (sieve-manage-parse-okno))))
 
 (defun sieve-manage-setactive (name &optional buffer)


* Additional remarks:

There might be more problems. E.g. `sieve-manage-putscript' contains the
following comment:

    ;; Here we assume that the coding-system will
    ;; replace each char with a single byte.
    ;; This is always the case if `content' is
    ;; a unibyte string.

which seems to indicate that it might also have an issue with multibyte
content (even though I have not experienced any uploading issues). I
will try do some more testing to check that.


In general, it is also not clear to me why the response (or process)
buffer needs to be multibyte enabled at all as it should only be used
for the line/byte oriented protocol data. But the commit message of
8e16fb987df9b which introduced the multibyte handling states:

    commit 8e16fb987df9b80b8328e9dbf80351a5f9d85bbb
    Author: Albert Krewinkel <krewinkel@moltkeplatz.de>
    Date:   2013-06-11 07:32:25 +0000
    ...
        * Enable Multibyte for SieveManage buffers: The parser won't properly
          handle umlauts and line endings unless multibyte is turned on in the
          process buffer.
    ...

so this was obviously done on purpose. I contacted Albert about this but
he couldn't remember the details (it's been nearly 10 years).



In GNU Emacs 29.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.31, cairo 
version 1.16.0)
 of 2022-02-18 built on moka
Repository revision: 51e51ce2df46fc0c6e17a97e74b00366bb9c09d8
Repository branch: master
System Description: Debian GNU/Linux bookworm/sid

Configured using:
 'configure --with-pgtk --with-native-compilation'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

[Prev in Thread]

Current Thread

[Next in Thread]

bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Kai Tetzlaff <=
- bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Lars Ingebrigtsen, 2022/02/25
  - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Lars Ingebrigtsen, 2022/02/25
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Kai Tetzlaff, 2022/02/25
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Lars Ingebrigtsen, 2022/02/26
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Kai Tetzlaff, 2022/02/28
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Lars Ingebrigtsen, 2022/02/28
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Lars Ingebrigtsen, 2022/02/28
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Kai Tetzlaff, 2022/02/28
    - bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters, Kai Tetzlaff, 2022/02/28

Prev by Date: bug#54025: 27.2; dired-do-find-regexp skips occurences
Next by Date: bug#54140: 27.2; desktop-save-mode fails to restore frame and window config
Previous by thread: bug#12291: [rev 109796] wrong UTF-8 handling
Next by thread: bug#54154: 29.0.50; [PATCH] `sieve-manage-getscript' fails if script contains multibyte characters
Index(es):
- Date
- Thread