[Orgmode] Re: org-feed XML entities and character encoding

From: Michael Brand
Subject: [Orgmode] Re: org-feed XML entities and character encoding
Date: Fri, 13 Aug 2010 21:03:52 +0200
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv: Gecko/20100317 Thunderbird/3.0.4

Hi David

On 10-08-13 17:59 , David Maus wrote:
2. request for help about an issue with multibyte character encoding

There is an issue with multibyte characters that appear in the input
as raw, multibyte-encoded characters rather than as XML entities
(multibyte characters given as XML entities are substituted correctly).
I looked for an example with a character encoding specified in the
first line of the XML feed, like
<?xml version="1.0" encoding="utf-8"?>
and found one here:

The problem with this feed is that it contains raw multibyte characters
that must be decoded according to the feed's encoding (here utf-8)
before they can be properly inserted in the target buffer.

The attached patch does this by explicitly decoding new entries
according to their detected character encoding.
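
For readers without the patch at hand, the idea can be sketched roughly
as follows. This is only an illustration in Python, not the Emacs Lisp
code of the patch: the names detect_encoding and decode_feed are made up
for this example, and falling back to utf-8 when the feed declares no
encoding (or an unknown one) is an assumption of the sketch.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the
# start of the raw feed data, e.g. <?xml version="1.0" encoding="utf-8"?>
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding in lower case, or `default` (assumed)."""
    match = _XML_DECL.match(raw)
    if not match:
        return default
    name = match.group(1).decode("ascii").lower()
    try:
        codecs.lookup(name)  # reject names Python has no codec for
    except LookupError:
        return default
    return name


def decode_feed(raw: bytes) -> str:
    """Decode the raw bytes explicitly instead of inserting them as-is."""
    return raw.decode(detect_encoding(raw), errors="replace")


declared = '<?xml version="1.0" encoding="utf-8"?><t>Z\u00fcrich</t>'.encode("utf-8")
undeclared = "<rss><t>Z\u00fcrich</t></rss>".encode("utf-8")

print(decode_feed(declared))
print(decode_feed(undeclared))
```

Both sample inputs hold the same multibyte character as raw bytes, once
with and once without a declared encoding, and both come out as proper
text after the explicit decoding step.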

Btw., a helpful introduction to the topic is

The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)

by Joel Spolsky


Thank you very much for your patch; it resolves this issue with
org-feed.el as expected. I tested your patch with the two feeds
http://www.openscreencast.de/blog/rss.xml  (declared utf-8)
http://pod.drs.ch/world_music_special_mpx.xml  (not declared utf-8)
which I described in more detail earlier, and with a dozen other
feeds, all with character encoding utf-8.


