bug#38269: SSAX incorrect handling of > in CDATA

From: Andrew Gierth
Subject: bug#38269: SSAX incorrect handling of > in CDATA
Date: Tue, 19 Nov 2019 13:41:54 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix)

The bug:

> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))

The expected result is (*TOP* (e "&gt;")).

In upstream/SSAX.scm:

; procedure+:   ssax:read-cdata-body PORT STR-HANDLER SEED
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
;       &gt; is treated as an embedded #\> character

This handling of &gt; is contrary to the XML specification: inside a
CDATA section nothing is special except line-end normalization and the
"]]>" terminator, so entity references such as &gt; are not expanded.
I have confirmed this by checking other XML parsers. The code seems to
be based on a misreading of another part of the specification
(presumably the rule that > must be escaped when "]]>" appears in
ordinary content outside a CDATA section), which does not apply here.
(And unfortunately, the W3C validation suite for XML happens not to
contain any instances of &gt; inside CDATA.)
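
For anyone who wants to repeat the cross-check against another parser,
here is a minimal sketch using Python's standard xml.etree.ElementTree
(expat underneath). It is only an illustration of how the comparison
can be done, not necessarily the parsers used for the report above; the
variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Same document as in the report: "&gt;" inside a CDATA section.
cdata_doc = "<e><![CDATA[&gt;]]></e>"
# Control case: "&gt;" in ordinary character data, where it IS an entity reference.
plain_doc = "<e>&gt;</e>"

cdata_text = ET.fromstring(cdata_doc).text
plain_text = ET.fromstring(plain_doc).text

# Inside CDATA the five characters are kept verbatim;
# outside CDATA the reference is expanded to a single ">".
print(repr(cdata_text))
print(repr(plain_text))
```

The control case shows that the parser does expand &gt; in ordinary
content, so the literal result for the CDATA document is not an
artifact of entity handling being switched off.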

I believe the fix should be as simple as removing the entire (#\&) case
from ssax:read-cdata-body (and fixing the test cases that expect the
current behaviour).

This bug seems to exist in all versions of SSAX.


