URL library problem

From: Paul Pogonyshev
Date: Sun, 2 Oct 2005 21:48:06 +0300
User-agent: KMail/1.4.3


I believe I have found a serious problem in the URL library.  If you
look at the very end of function `url-http', you can see that the
result of `url-http-create-request' is sent to the connection as-is.
But the encoding of the connection is binary!  This means that multibyte
strings are sent in Emacs internal coding, which nothing but Emacs
understands.

Form data sent as `multipart/form-data' is usually sent in the
encoding of the page, e.g. UTF-8.  With the current state of URL, it
seems to be impossible to send non-ASCII `multipart/form-data'.

Here is a test:

(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; 
      (url-request-data (concat "-----\r\nContent-Disposition: form-data; 
                (lambda () (pop-to-buffer (current-buffer)))))

Save the buffer it pops up as an HTML file and open it in a browser.  It
should be a Wikipedia preview page with Russian word ``проверка''
(`test'), but it isn't.  Instead of UTF-8, the word got sent in Emacs
internal coding.

Note that explicit UTF-8 encoding does not help, because
`url-request-data' is later concatenated with other strings, which
turns it multibyte again:

(let ((url-request-method "POST")
      (url-request-extra-headers '(("Content-Type" . "multipart/form-data; 
      (url-request-data (encode-coding-string
                         (concat "-----\r\nContent-Disposition: form-data; 
                (lambda () (pop-to-buffer (current-buffer)))))
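
The multibyte effect can be checked in isolation.  The snippet below
is only a sketch: `my-demo-concat-multibyte' and the header text are
made up for illustration and are not taken from url-http.el, and the
header is forced to be multibyte with `string-to-multibyte' on the
assumption that some part of the real request is multibyte.  It only
shows that the concatenation is a multibyte string; how its bytes are
then written to the connection depends on the Emacs version.

```elisp
(defun my-demo-concat-multibyte ()
  "Return (BODY-MULTIBYTE-P REQUEST-MULTIBYTE-P) for a sample request.
Hypothetical helper, for illustration only."
  (let* (;; Encoding yields a unibyte string holding the UTF-8 bytes.
         (body (encode-coding-string "проверка" 'utf-8))
         ;; Assumption: some header text ends up multibyte.
         (headers (string-to-multibyte "Content-Type: text/plain\r\n\r\n"))
         ;; `concat' returns a multibyte string if any argument is one.
         (request (concat headers body)))
    (list (multibyte-string-p body)
          (multibyte-string-p request))))
```

Evaluating `(my-demo-concat-multibyte)' lets you compare the two flags:
the encoded body on its own is unibyte, while the joined request is
not.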

However, this trivial (and not-for-production) patch makes the first
test work, because it encodes the complete request, which is then sent
to the Wikipedia server unmodified:

--- /home/paul/emacs/lisp/url/url-http.el       2005-09-28 16:56:02.000000000 
+++ /tmp/buffer-content-2240ocC 2005-10-02 21:30:00.000000000 +0300
@@ -268,7 +268,7 @@ request.
           ;; Any data
     (url-http-debug "Request is: \n%s" request)
-    request))
+    (encode-coding-string request 'utf-8))
 ;; Parsing routines
 (defun url-http-clean-headers ()

Of course, unconditional encoding in UTF-8 is not the right thing to do.
Actually, encoding the complete request is not right either.  A proper
patch would simply avoid concatenating `url-request-data' with
anything and send it to the connection verbatim, assuming that the
user of the library has already properly encoded it.  The reason for
this is that `multipart/form-data' can have different parts in
different encodings (even if this is hardly ever used).
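
As a rough sketch of that idea (not the actual patch), the headers and
the body could be written to the connection in two separate calls, so
that the body is never concatenated with anything.  The function name
`my-url-send-request' and its arguments are hypothetical; only
`process-send-string' and `multibyte-string-p' are existing Emacs
functions.

```elisp
(defun my-url-send-request (connection headers body)
  "Send HEADERS, then BODY, to process CONNECTION without joining them.
Hypothetical sketch.  HEADERS is assumed to be ASCII text ending in a
blank line.  BODY is assumed to be a unibyte string that the caller
has already encoded, possibly with different encodings per part."
  ;; Refuse a body that still looks like unencoded text.
  (when (multibyte-string-p body)
    (error "BODY must be encoded by the caller before sending"))
  (process-send-string connection headers)
  ;; The body goes out as-is, in a separate call.
  (process-send-string connection body))
```

The check at the top is a design choice of this sketch: it makes a
forgotten `encode-coding-string' fail loudly instead of silently
sending the wrong bytes.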

Are you interested in a patch?


