Re: Converting a string to valid XHTML id?

From: Lawrence Mitchell
Subject: Re: Converting a string to valid XHTML id?
Date: Thu, 02 Dec 2010 15:50:11 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (usg-unix-v)

Lennart Borgman wrote:
> On Thu, Dec 2, 2010 at 5:42 AM, PJ Weisberg <address@hidden> wrote:

>>> In the context where it is used it is for export of org-mode files to
>>> xhtml. Obviously if there are links to anchors within other files my
>>> approach will fail.

>>> So, hm, maybe I should reset this variable when starting a directory
>>> tree export or a single file export rather than making it buffer
>>> local. (But then I have to look into the export of directory trees in
>>> org-mode which I have not done yet.)

>> Just to be sure we're on the same page: the string MUST be unique
>> within the output, but it may NOT be unique within the input?
>> Therefore calling the function twice with the same argument must give
>> different results?

> No, I think they are already unique enough, so to speak, in org-mode.
> Otherwise the links within org-mode could not work.

> So calling the function with the same argument must give the same
> result all times. (AND that result must be unique, ie no other input
> string should give the same result.)

As suggested previously, just take a cryptographic hash of the id:

(defun org-newhtml-escape-id (id)
  "Return a valid XHTML id derived from the SHA-1 hash of ID."
  (format "ANON-%s" (sha1 id)))

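As long as you do this for /all/ ids in the buffer, that'll work:
the same id always gives the same result, and two different ids
can only collide if their SHA-1 hashes collide.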
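For illustration, a small sketch that checks the two properties
Lennart asked for.  `my-xhtml-id-valid-p' is a hypothetical helper,
not part of Org, and its rule (the HTML 4.01 ID-token rule: a letter,
then letters, digits, "-", "_", ":" or ".") is my assumption of a
conservative notion of "valid"; XHTML 1.0 itself is more permissive.

```elisp
;; `my-xhtml-id-valid-p' is a hypothetical helper (not part of Org).
;; It applies the conservative HTML 4.01 rule for ID tokens: a letter,
;; then any letters, digits, "-", "_", ":" or ".".
(defun my-xhtml-id-valid-p (id)
  "Return t if ID satisfies the conservative HTML 4.01 ID rule."
  (let ((case-fold-search nil))
    (and (string-match-p "\\`[A-Za-z][-A-Za-z0-9_:.]*\\'" id) t)))

;; The hashing escape: every id is hashed, whether valid or not.
(defun org-newhtml-escape-id (id)
  "Return a valid XHTML id derived from the SHA-1 hash of ID."
  (format "ANON-%s" (sha1 id)))

;; Same input, same output; "ANON-" plus 40 lowercase hex digits
;; always passes the check above.
;; (org-newhtml-escape-id "")
;;   => "ANON-da39a3ee5e6b4b0d3255bfef95601890afd80709"
;; (my-xhtml-id-valid-p (org-newhtml-escape-id "Section 1.2: Intro"))
;;   => t
```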

If you only do it to invalid ids, then there's the possibility
that a valid id already in the buffer has the form ANON-sha1sum,
and that a different, invalid id will be escaped to that same
ANON-sha1sum.
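To make that failure mode concrete, here is a sketch of the "only
escape invalid ids" variant.  `my-escape-id-if-invalid' is a
hypothetical name, and it reuses the conservative HTML 4.01 validity
rule as an assumption.

```elisp
;; Hypothetical variant: pass valid ids through untouched and hash
;; only the invalid ones.  Validity uses the HTML 4.01 ID-token rule
;; (an assumption, chosen to be conservative).
(defun my-escape-id-if-invalid (id)
  "Return ID unchanged if it is valid, else \"ANON-\" plus its SHA-1."
  (let ((case-fold-search nil))
    (if (string-match-p "\\`[A-Za-z][-A-Za-z0-9_:.]*\\'" id)
        id
      (format "ANON-%s" (sha1 id)))))

;; A perfectly valid id that happens to look like an escaped one is
;; left alone, and it then collides with the escaped invalid id.
(let* ((invalid "foo bar")
       (existing (format "ANON-%s" (sha1 invalid))))
  (equal (my-escape-id-if-invalid existing)
         (my-escape-id-if-invalid invalid)))
;; => t
```

Hashing every id avoids this, because then no id reaches the output
unhashed.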

Or use Davis' solution which works in a similar way, and as a
bonus you can map back to the original id easily.

Recall his solution:

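(defun org-newhtml-escape-id (str)
  "Return a valid xhtml id attribute string.
See URL `http://xhtml.com/en/xhtml/reference/attribute-data-types/#id'."
  ;; Replace each char outside [-.a-zA-Z0-9] (including "_") with
  ;; "_xx" for every byte xx of its encoding.  FIXEDCASE and LITERAL
  ;; are t so the replacement is inserted exactly as generated.
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]" (lambda (c)
                      (mapconcat (lambda (d) (format "_%02x" d))
                                 (string-as-unibyte c) ""))
   str t t))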

Notice that the output uses "_", which is a /valid/ char in an
xhtml id.  However, it is not treated as valid in the input, so
every "_" in an input id is itself escaped (to "_5f").

So (org-newhtml-escape-id "foo_5fbar") => foo_5f5fbar
But (org-newhtml-escape-id "foo_bar") => foo_5fbar

So notice that valid ids /without/ an underscore in them are left
as is, but any id containing an underscore is encoded under this
scheme.  An escaped id therefore never coincides with an untouched
one, so you can't generate a collision.
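The "map back to the original id" bonus can be sketched as follows,
assuming a well-formed escaped string (every "_" followed by exactly
two lowercase hex digits).  This restates the escaping function with
`encode-coding-string' and UTF-8 in place of the obsolete
`string-as-unibyte' (my substitution; for ordinary Unicode text the
bytes should match Emacs' internal representation), and
`my-unescape-id' is a hypothetical name, not part of Org.

```elisp
;; Escaping as in Davis' scheme, but with explicit UTF-8 encoding.
(defun org-newhtml-escape-id (str)
  "Escape STR: each char outside [-.a-zA-Z0-9] becomes _xx per UTF-8 byte."
  (replace-regexp-in-string
   "[^-.a-zA-Z0-9]"
   (lambda (c)
     (mapconcat (lambda (d) (format "_%02x" d))
                (encode-coding-string c 'utf-8) ""))
   str t t))

;; Hypothetical inverse.  Assumes ID is well formed, i.e. every "_"
;; is followed by exactly two hex digits.
(defun my-unescape-id (id)
  "Return the string that `org-newhtml-escape-id' turned into ID."
  (let ((bytes '())
        (i 0)
        (n (length id)))
    (while (< i n)
      (if (eq (aref id i) ?_)
          ;; "_xx" -> the single byte whose hex value is xx.
          (progn
            (push (string-to-number (substring id (1+ i) (+ i 3)) 16) bytes)
            (setq i (+ i 3)))
        ;; Any other char is one of [-.a-zA-Z0-9], a plain ASCII byte.
        (push (aref id i) bytes)
        (setq i (1+ i))))
    ;; Reassemble the bytes and decode them back into characters.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) 'utf-8)))

;; (my-unescape-id "foo_5f5fbar") => "foo_5fbar"
```

Because "_" only ever appears as the start of an escape, the decoder
never has to guess where an escape sequence begins.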


Lawrence Mitchell <address@hidden>
