Re: LYNX-DEV Spaces in <A HREF="#foo bar"> are deleted

From: Foteos Macrides
Subject: Re: LYNX-DEV Spaces in <A HREF="#foo bar"> are deleted
Date: Wed, 26 Mar 1997 17:50:35 -0500 (EST)

Manheim Township Crew <address@hidden> wrote:
>Jim Spath (Webmaster Jim) wrote:
>> On Wed, 26 Mar 1997, Duncan Hill wrote:
>> > On Wed, 26 Mar 1997, Andrew Haylett wrote:
>> > > If I have
>> > >     <A HREF="#foo bar">link</A>
>> > Netscape and Co, are working around the problem.  The space should be
>> > hex-encoded for it to work properly under Lynx (Is it in an RFC anywhere?)
>> > If I remember correctly, its %20 for
>> > <A HREF="#foo%20bar">blah</a>
>> They are ignoring standards, not "working around the problem."  Look
>> at RFC1808, which has (in part):
>> RFC 1808           Relative Uniform Resource Locators          June 1995
>>    URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]
>>    fragment    =  *( uchar | reserved )
>>    uchar       = unreserved | escape
>>    unreserved  = alpha | digit | safe | extra
>>    escape      = "%" hex hex
>>    hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
>>                          "a" | "b" | "c" | "d" | "e" | "f"
>>    safe        = "$" | "-" | "_" | "." | "+"
>>    extra       = "!" | "*" | "'" | "(" | ")" | ","
>> I don't see " " in there anywhere.
>But, in 2.3.3 "Excluded Characters", space is mentioned.

        In all existing HTML DTDs, the NAME and ID attributes can have
only letters, numbers, periods, and hyphens in their values.  All
other characters are invalid.   Thus, markup such as this:

        <A NAME="The End">...</A>
or:     <P ID="The End">...</P>

is invalid.  If you hex escaped the space (%20), it is still invalid
by virtue of the '%'.
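
The rule above can be sketched as a small check.  This is only an
illustration, assuming the permitted character set is exactly the one
just described (letters, numbers, periods, hyphens); the helper name
is_valid_name_value is hypothetical and is not taken from any DTD or
from Lynx.

```python
import re

# Assumption: only letters, digits, periods, and hyphens are permitted,
# as described in the text above. The helper name is hypothetical.
_NAME_VALUE = re.compile(r"[A-Za-z0-9.-]+")


def is_valid_name_value(value: str) -> bool:
    """Return True if value uses only letters, digits, '.' and '-'."""
    return _NAME_VALUE.fullmatch(value) is not None


# "The End" fails because of the space, "The%20End" because of the '%'.
for candidate in ("TheEnd", "The End", "The%20End"):
    print(candidate, is_valid_name_value(candidate))
```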

        Lynx does not do anything about invalid characters in NAME
and ID attribute values, because it's not clear what to do about
them, and thus will accept values of "The End" or "The%20End".
But they shouldn't be present in valid HTML markup.

        In the URL RFCs and drafts, fragments technically could be
specified for any scheme, and can have any characters, but reserved
and unsafe characters must be hex escaped.  Thus, the fragment:

        #The%20End

technically is valid.  However, at present, fragments have been
specified only for the http scheme (and by extension, https,
though there is no IETF RFC or draft for https).  Fragments
are "instructions" to the UA (user agent), and not actually part
of the URL, per se.  For the http scheme (and by extension the
https scheme), fragments are instructions for the UA to seek the
corresponding NAME or ID in the text/html specified by the actual
URL, and present the rendition at that point, rather than at the
top of the document.  Since NAME and ID values cannot have a '%'
in them, the above fragment, though permissible according to the
URL specifications, if appended to an http URL is an instruction
to seek something which is not valid in text/html and thus should
not be present in the document.
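
To make the URL side of this concrete, here is a rough sketch that
hex escapes a raw fragment and checks the result against the RFC 1808
grammar quoted earlier in the thread.  The function names are
hypothetical, and the "reserved" character set is taken from RFC 1808
itself, since the quoted excerpt does not list it.

```python
import re
from urllib.parse import quote

# fragment = *( uchar | reserved ), following the RFC 1808 grammar.
# Assumption: "reserved" is ; / ? : @ & = (from RFC 1808, not the excerpt).
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9$\-_.+!*'(),;/?:@&=]|%[0-9A-Fa-f]{2})*")


def is_rfc1808_fragment(fragment: str) -> bool:
    """Return True if fragment (without the leading '#') fits the grammar."""
    return _FRAGMENT.fullmatch(fragment) is not None


def escape_fragment(raw: str) -> str:
    """Hex escape everything except letters, digits, and '_', '.', '-', '~'."""
    return quote(raw, safe="")


escaped = escape_fragment("The End")
print(escaped, is_rfc1808_fragment(escaped))      # escaped form fits the grammar
print("The End", is_rfc1808_fragment("The End"))  # raw space does not
```

A fragment that passes this check can still point at nothing valid in
the document, because the '%' it contains is not permitted in a NAME
or ID value.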

        It so happens that if you use NAME="The%20End" for a
NAME-ed Anchor, and HREF="#The%20End" for a link, that will
"work" with Lynx, because it accepted the invalid '%' in the
NAME attribute value.  But Lynx users should not use Lynx
as a "validator" and treat its side effects or error recoveries
as "features".

        Lynx expects all HREF and SRC attribute values to be
valid URLs (except for SGML entities), with spaces hex escaped,
and assumes that any unescaped spaces are a consequence of
wrapping such that they should be eliminated.  So, if you use
NAME="The End" the space will be retained, and if you use
HREF="The End" that space will be eliminated, such that the
link won't "work".  If you use the raw space in a quoted URL
as the STARTFILE on the command line, it won't be eliminated
(because it couldn't be a wrapped HREF attribute value), so that
will "work".  But you shouldn't be using a raw space in either
attribute value, or in a quoted STARTFILE URL.
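
The asymmetry described above can be modelled roughly as follows.
This is not Lynx source code: the function names are hypothetical, and
the literal, character-for-character comparison of the fragment with
the NAME value is an assumption made for illustration.

```python
def href_as_seen(href: str) -> str:
    """Drop every unescaped space, treating it as line-wrapping debris."""
    return href.replace(" ", "")


def link_resolves(href: str, names: set) -> bool:
    """Check whether the fragment in href matches a NAME value literally.

    names holds NAME/ID values exactly as written (spaces retained).
    """
    fragment = href_as_seen(href).partition("#")[2]
    return fragment in names


# NAME="The End" keeps its space, HREF="#The End" loses it: no match.
print(link_resolves("#The End", {"The End"}))
# Both sides carry the same escaped text: the link "works".
print(link_resolves("#The%20End", {"The%20End"}))
```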

        Note that Lynx collapses, rather than eliminates, any
unescaped spaces for the lynxexec and lynxprog schemes.  It
technically should require that they be hex escaped, but most
users expect to be able to use any operating system command
for those two "Lynxisms" equivalently to how they would do
it on the command line.
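
As a sketch of that special case, assuming the scheme is simply
whatever precedes the first colon (the function name is hypothetical
and this is not how Lynx itself is written):

```python
import re


def normalize_spaces(url: str) -> str:
    """Collapse space runs for lynxexec/lynxprog, drop spaces otherwise."""
    scheme = url.partition(":")[0].lower()
    if scheme in ("lynxexec", "lynxprog"):
        # Keep one space between command words so the command still parses.
        return re.sub(r" +", " ", url)
    # Any other URL: unescaped spaces are assumed to come from wrapping.
    return url.replace(" ", "")


print(normalize_spaces("lynxexec:ls   -l  /tmp"))
print(normalize_spaces("http://example.com/a b"))
```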


 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
