emacs-devel

Re: url-retrieve-synchronously and coding


From: Lennart Borgman
Subject: Re: url-retrieve-synchronously and coding
Date: Mon, 24 Jan 2011 13:21:16 +0100

On Mon, Jan 24, 2011 at 4:37 AM, Stefan Monnier
<address@hidden> wrote:
>> If I do something like this
>>          (setq buffer (url-retrieve-synchronously url))
>
>> and the contents of the buffer begins with
>
>>   HTTP/1.1 200 OK
>>   Content-Type: text/xml
>>   X-Content-Type-Options: nosniff
>>   Connection: close
>
>>   <?xml version = "1.0" encoding="UTF-8" standalone="yes"?><!--
>
>>   Content-type: fix-mhtml
> -->
>
>> should not then the buffer file coding system be utf-8?
>
> I don't think so, because url-retrieve-synchronously handles the HTTP
> part of the protocol only.  Maybe you're thinking of url-insert-file-contents?


Ok, thanks. It is not easy to navigate among those functions. But I
guess we have said before that better documentation is needed.

Unfortunately url-insert-file-contents does not decode the file as
utf-8. mm-dissect-buffer looks for the charset, but only in the MIME
headers. In this case the charset is specified in the XML
declaration of the content instead.
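To make the point concrete, here is a minimal sketch of reading the encoding name out of the XML declaration. It relies on `xmltok-get-declared-encoding-position' from xmltok.el; the helper name `my-xml-declared-encoding' is hypothetical and only for illustration, it is not part of url.el or mm-decode.

```elisp
;; Sketch only: read the encoding name declared in an XML prolog.
(require 'xmltok)

(defun my-xml-declared-encoding (string)
  "Return the encoding name declared at the start of STRING, or nil.
Hypothetical helper for illustration, not part of url.el."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; The xmltok function returns a cons (START . END) only when an
    ;; encoding is actually declared; other return values mean there
    ;; is no encoding name to extract.
    (let ((pos (xmltok-get-declared-encoding-position)))
      (when (consp pos)
        (buffer-substring-no-properties (car pos) (cdr pos))))))

;; Example call with a declaration like the one in the response above.
(my-xml-declared-encoding
 "<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>")
```

The MIME headers are not consulted at all here, which is exactly the part mm-dissect-buffer already covers.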

I do not know how the retrieved content above should be handled.
However, it looks like web browsers handle this case and show the
XML content correctly.
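One possible way to handle it, sketched under the assumption that the declared name maps directly onto an Emacs coding system of the same (downcased) name, is to decode the raw bytes with that coding system. The helper `my-decode-with-declared-encoding' is hypothetical. Names such as "utf-16" need more care because of byte order, so this is not a complete solution.

```elisp
;; Sketch only: decode raw body bytes using a declared encoding name.
(defun my-decode-with-declared-encoding (bytes name)
  "Decode unibyte string BYTES using the coding system named NAME.
Return BYTES unchanged when NAME is nil or names no known coding system.
Hypothetical helper for illustration only."
  (let ((cs (and name (intern (downcase name)))))
    ;; `coding-system-p' also accepts nil, so check CS itself first.
    (if (and cs (coding-system-p cs))
        (decode-coding-string bytes cs)
      bytes)))

;; "\303\245" is the two-byte UTF-8 encoding of U+00E5.
(my-decode-with-declared-encoding "\303\245" "UTF-8")
```

If the name is unknown the bytes are passed through untouched, which matches what happens today when no charset is found in the headers.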

It seems natural in a case like this, where Content-Type is text/xml, to
look for the specified charset in the XML content. I think
`url-insert' should do this. Here is a suggestion for how to do it,
where I have just added a search for <?xml encoding=...>:


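;; xmltok and nxml-mode provide the encoding helpers used below.
(require 'xmltok)
(require 'nxml-mode)

(defun url-insert (buffer &optional beg end)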
  "Insert the body of a URL object.
BUFFER should be a complete URL buffer as returned by `url-retrieve'.
If the headers specify a coding-system, it is applied to the body
before it is inserted.  Otherwise, if the body is XML, the encoding
named in its XML declaration is used instead.
Returns a list of the form (SIZE CHARSET), where SIZE is the size in bytes
of the inserted text and CHARSET is the charset that was specified in
the header, or nil if none was found.
BEG and END can be used to only insert a subpart of the body.
They count bytes from the beginning of the body."
  (let* ((handle (with-current-buffer buffer (mm-dissect-buffer t)))
         (data (with-current-buffer (mm-handle-buffer handle)
                 (if beg
                     (buffer-substring (+ (point-min) beg)
                                       (if end (+ (point-min) end) (point-max)))
                   (buffer-string))))
         (charset (mail-content-type-get (mm-handle-type handle)
                                          'charset)))
    (mm-destroy-parts handle)
    (if charset
        (insert (mm-decode-string data (mm-charset-to-coding-system charset)))
      (if (not (string= "xml" (mm-handle-media-subtype handle)))
          (insert data)
        ;; Content is XML, use the specified encoding if any:
        (let ((coding-system
               (with-temp-buffer
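                 ;; Only the start of the body is needed; guard
                 ;; against bodies shorter than 100 characters.
                 (insert (substring data 0 (min 100 (length data))))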
                 (let* ((enc-pos (progn
                                   (goto-char (point-min))
                                   (xmltok-get-declared-encoding-position)))
                        (enc-name
                         (and (consp enc-pos)
                              (buffer-substring-no-properties (car enc-pos)
                                                              (cdr enc-pos)))))
                   (cond (enc-name
                          (if (string= (downcase enc-name) "utf-16")
                              (nxml-choose-utf-16-coding-system)
                            (nxml-mime-charset-coding-system enc-name)))
                         (enc-pos (nxml-choose-utf-coding-system)))))))
          (if coding-system
              (insert (mm-decode-string data coding-system))
            (insert data)))))
    (list (length data) charset)))

Is this the right thing to do?


Something more is needed to get things working in my case, but I want
to know if this part is ok first. Or is the coding perhaps handled too
late here?


