From: Jambunathan K
Subject: Re: [O] ODT Charset/Encoding issues (was question about ODT export behavior)
Date: Sun, 17 Jul 2011 01:43:28 +0530
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (windows-nt)


> I just want to add one point that I did not find in the org-manual.  I tested
> some of my org-files and exported them to the OpenOffice format. When I tried
> to open these documents in OpenOffice, they were corrupt and could not be
> opened. I soon found out why. If you want to export an org-mode file to .odt,
> you need to explicitly set the file encoding to UTF-8 (I usually use
> iso-8859-1 encoding for my files), like:
> #-*- mode: org; coding: utf-8; -*-
> After that OpenOffice could open the files without any problems.

I use English for communication and I have to admit that I have zero
understanding of things like character sets, encodings etc. 

Thanks for the above note. I can see that this is surely a bug, but my
poor understanding prevents me from characterizing it further.

Could you please send me a minimal iso-8859-1 test.org file and the
associated corrupted test.odt file? I will look into this issue.

1. Do you have any specific requirement on how the component XML files
   should be encoded? A cursory look at the ODT exporter suggests that
   it could actually be emitting XML files in iso-8859-1 while wrongly
   claiming UTF-8 encoding in the declaration, as below:

--8<---------------cut here---------------start------------->8---
<?xml version="1.0" encoding="UTF-8"?>
--8<---------------cut here---------------end--------------->8---
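The suspected mismatch can be reproduced outside Emacs. The following is a minimal sketch in Python (not the exporter's own Emacs Lisp), assuming the problem is simply that Latin-1 bytes are written under a UTF-8 declaration. The names `DECL`, `BODY` and `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Declaration taken from the exporter output quoted above.
DECL = '<?xml version="1.0" encoding="UTF-8"?>'
# Hypothetical body containing one non-ASCII character (e-acute).
BODY = "<p>caf\u00e9</p>"


def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the byte string."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


doc = DECL + BODY
mislabeled = doc.encode("iso-8859-1")  # bytes are Latin-1, header says UTF-8
consistent = doc.encode("utf-8")       # bytes agree with the header

print(parses(mislabeled))  # parser rejects the invalid UTF-8 byte sequence
print(parses(consistent))  # parser accepts the document
```

If this is what happens in the exporter, a pure-ASCII Org file would export cleanly and only files with accented characters would produce an unreadable .odt.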

2. Should the XML files always be emitted in UTF-8, irrespective of how
   the original Org file is encoded?
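For the second option, always writing UTF-8 amounts to decoding the text from the Org file's own encoding and re-encoding it on output. A minimal sketch in Python, assuming the source encoding is known; `to_utf8` is a hypothetical helper, not part of any exporter:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: "naive" with a diaeresis, stored as ISO-8859-1.
latin1_bytes = "na\u00efve".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")

print(len(latin1_bytes), len(utf8_bytes))
```

The non-ASCII character takes one byte in ISO-8859-1 and two in UTF-8, so the output is one byte longer than the input. In Emacs Lisp the analogous step would presumably be binding `coding-system-for-write` to `utf-8` around the file write; whether the exporter's write path allows that is an assumption, not something checked here.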

[Notes to Self]
[Notes from odbook]

Para 3 of http://books.evc-cit.info/odbook/apa.html#appc-11-fm2xml

--8<---------------cut here---------------start------------->8---
OpenDocument files are always encoded in UTF-8. 
--8<---------------cut here---------------end--------------->8---

Para 2 of

--8<---------------cut here---------------start------------->8---
XML 1.0 allows a document to be encoded in any character set registered
with the Internet Assigned Numbers Authority (IANA). European documents
are commonly encoded in one of the ISO Latin character sets, such as
ISO-8859-1. Japanese documents commonly use Shift-JIS, and Chinese
documents use GB2312 and Big 5.
--8<---------------cut here---------------end--------------->8---

Para 4 of

--8<---------------cut here---------------start------------->8---
XML processors are not required by the XML 1.0 specification to support
any more than UTF-8 and UTF-16, but most commonly support other
encodings, such as US-ASCII and ISO-8859-1.
--8<---------------cut here---------------end--------------->8---
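As a quick check of the last excerpt, a document whose declaration honestly names ISO-8859-1 is accepted by at least one common parser. This is a sketch using Python's standard library parser (expat), and it says nothing about what OpenOffice itself accepts; the sample content is hypothetical.

```python
import xml.etree.ElementTree as ET

# The declaration names the encoding that the bytes really use.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' "<p>caf\u00e9</p>"
).encode("iso-8859-1")

element = ET.fromstring(latin1_doc)
text = element.text  # decoded to a Unicode string by the parser

print(text == "caf\u00e9")
```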

[Notes from XMLmind XSL-FO Converter]

XFC can output content.xml and styles.xml in UTF-8 as well as in
ISO-8859-1.



,---- [see outputEncoding section]
| For OpenDocument output (.odt), this option specifies the encoding of
| XML content (files styles.xml and content.xml) in the output
| document. All encodings available in the current JVM are supported. The
| option value may be either the encoding name (e.g. ISO8859_1) or the
| charset name (e.g. ISO-8859-1). The default value is UTF8.
`----


