emacs-orgmode

Re: [O] Orgmode → ODT: Certain chars break export


From: Tory S. Anderson
Subject: Re: [O] Orgmode → ODT: Certain chars break export
Date: Fri, 13 Feb 2015 10:18:24 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

From a user perspective, just stripping the characters seems best to me,
but finding out what the characters are seems obnoxious. Neither a quick
search nor skimming the ODT doc specification[1][2] seems to give any
insight into a set of illegal characters. Does elisp have anything
similar to Java's "isWhitespace"[3] that could be used to check
character features?
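
For what it's worth, if the ODT content files are assumed to be XML 1.0
(my reading, not something confirmed in this thread), the illegal
characters would be whatever falls outside the XML 1.0 "Char"
production. Below is a minimal sketch of the stripping approach, in
Python rather than elisp purely for illustration; strip_xml_illegal is a
made-up name, not an Org function:

```python
import re

# Assumption: the exported content must satisfy the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches every character outside that set.
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text: str) -> str:
    """Return `text` with characters not allowed in XML 1.0 removed.

    Hypothetical helper for illustration; not part of Org or any ODT tool.
    """
    return _XML10_ILLEGAL.sub("", text)


if __name__ == "__main__":
    # A form feed (C-l, code point 12) between two words is dropped.
    print(repr(strip_xml_illegal("page one\x0cpage two")))
```

Tab, newline and carriage return survive, while the form feed and the
other C0 control characters are removed. The same character ranges could
presumably be used in an elisp regexp.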

Rasmus <address@hidden> writes:

> address@hidden (Tory S. Anderson) writes:
>
>> While we're on the topic of ODT export problems: I was in the process
>> of converting PDF to Text to Org to ODT/DocX and discovered that
>> certain characters seem to break exported odt documents, which fail
>> with a line and col number. So far the only one I know for sure is the
>> "^L" (Char: C-l (12, #o14, #xc)). Hopefully a single fix can handle
>> all such cases.
>>
>> You probably don't need it, but I verified with the following file:
>> http://toryanderson.com/files/breakorg.org
>
> The export is fine, but the produced XML is invalid since it contains an
> illegal character.  But how to resolve this?  Should ox strip illegal
> characters (if so, what are they)?  If so, could they be used for entities?
>
> —Rasmus

Footnotes: 
[1]  https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
[2]  http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415196_253892949
[3]  http://www.fileformat.info/info/unicode/char/000c/index.htm
