[PATCH] Fix of upstream parsing of CDATA
From: Linus Björnstam
Subject: [PATCH] Fix of upstream parsing of CDATA
Date: Thu, 16 Jan 2020 13:00:25 +0100
User-agent: Cyrus-JMAP/3.1.7-754-g09d1619-fmstable-20200113v1
Hello Guilers!
RhodiumToad found an error in sxml where it would not properly parse CDATA: &gt;
would be converted to > inside CDATA blocks. This is probably due to a
misreading of the XML spec:
"Within a CDATA section, only the CDEnd string is recognized as markup, so
that left angle brackets and ampersands may occur in their literal form; they
need not (and cannot) be escaped using ' &lt; ' and ' &amp; '."
Notice that it says only CDEnd is recognized as markup, but omits &gt; from the
enumeration of things that need-not-and-cannot be escaped.
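To make the rule in the quoted passage concrete, here is a minimal sketch in Python of reading a CDATA section so that only the closing "]]>" is treated as markup. The names read_cdata, CDATA_START and CDATA_END are hypothetical and chosen for illustration; this is not the SSAX code and not the patch itself.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"  # the CDEnd string, the only markup recognized inside CDATA


def read_cdata(text, pos=0):
    """Return (content, next_pos) for the CDATA section starting at pos.

    Hypothetical helper: everything up to the first "]]>" is returned
    verbatim, with no entity or character-reference expansion.
    """
    if not text.startswith(CDATA_START, pos):
        raise ValueError("no CDATA section at this position")
    begin = pos + len(CDATA_START)
    end = text.find(CDATA_END, begin)
    if end == -1:
        raise ValueError("unterminated CDATA section")
    return text[begin:end], end + len(CDATA_END)


sample = "<![CDATA[a &gt; b &amp; <c>]]>tail"
content, rest = read_cdata(sample)
print(content)        # a &gt; b &amp; <c>
print(sample[rest:])  # tail
```

Under this reading, &gt; inside the section is just four literal characters, the same as &lt; and &amp;.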
No other XML libraries behave this way. Take for example Python's ElementTree:
Python 2.7.17 (default, Dec 23 2019, 21:25:33)
>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring("<e><![CDATA[&gt;]]></e>")
>>> root.text
'&gt;'
The same thing with the un-patched (sxml ssax) (or rather (sxml simple)) looks
different:
(xml->sxml "<e><![CDATA[&gt;]]></e>")
;; => (*TOP* (e ">"))
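As a side-by-side check of the expected behaviour, the following sketch (assuming Python 3 and the standard xml.etree.ElementTree module) parses the same reference once inside a CDATA section and once in ordinary character data. Only the second one should be expanded.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section the characters "&gt;" are literal text.
inside = ET.fromstring("<e><![CDATA[&gt;]]></e>").text

# In ordinary character data the same reference is expanded to ">".
outside = ET.fromstring("<e>&gt;</e>").text

print(repr(inside))   # '&gt;'
print(repr(outside))  # '>'
```

The un-patched xml->sxml call above returns ">" for the first case as well, which is the behaviour the patch is meant to change.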
The question is whether this patch should be sent upstream. Since there has
been very little activity there, I suspect it is a lost cause.
Failing tests have been looked through, verified and fixed. No unexpected
errors were encountered. All SXML tests pass after this patch.
Best regards
Linus Björnstam
0001-module-sxml-upstream-SSAX.scm-Fix-improper-handling-.patch
Description: Binary data