lynx-dev

Re: LYNX-DEV Question about BASE implementation


From: Klaus Weide
Subject: Re: LYNX-DEV Question about BASE implementation
Date: Mon, 21 Oct 1996 22:20:06 -0500 (CDT)

On Mon, 21 Oct 1996, Foteos Macrides wrote:

> >>    Fragments are not part of the URL.  They're instructions
> >> to the client [...]
> >
> >Yes, I noticed that RFC 1808 regarded fragment identifiers as "not part
> >of the URL".  I decided to not mention that complication...
> >The difference doesn't seem very relevant (in the context of _resolving_
> >relative URLs) anyway:  whether strictly "part of the URL" or not, the
> >RFC goes on to describe how they are inherited (or not, as in all normal
> >cases).  Also, parameters (the stuff after a ';') _are_ regarded as part
>                 ^^^^^^^^^^                         ?????
> >of the URL, but are not inherited from the base in most cases. 
>                      ^^^^^^^^^^^^^                  ????
> 
>       But I also don't understand what you're saying about parameters.

Well, I sometimes have trouble understanding myself...

In section 2.1, the RFC says:

   Note that the fragment identifier (and the "#" that precedes it) is
   not considered part of the URL.  However, since it is commonly used
   within the same string context as a URL, a parser must be able to
   recognize the fragment when it is present and set it aside as part of
   the parsing process.

It doesn't say anything like that about parameters.  My conclusion:
they are considered as part of the URL.

Also, the line from the formal syntax in the next section,

   URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]

seems to indicate that everything that may come before the [ # fragment ]
is considered "part of" an absolute or relative URL. 

Anyway, whether something is "considered part of the URL" or not seems
to be totally academic here, a question of semantics, in the context
of resolving relative URLs.  It doesn't have any meaning for the resolution
process, since the rules for that are fully spelled out in the rest
of the document (including rules for the fragment part as well as all
other components).  I think the distinction is necessary in other
contexts where URLs are used: e.g. the HTTP request line or message
headers, where fragments may be forbidden.
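To make the HTTP case concrete, here is a minimal Python sketch (not taken from Lynx or libwww; `split_fragment` and `request_line` are hypothetical names). It sets the fragment aside the way RFC 1808 section 2.4.1 describes, as everything after the first "#", and builds an HTTP/1.0 request line from what remains. It assumes an absolute http URL and does no escaping or validation.

```python
from typing import Optional, Tuple


def split_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Set the fragment aside (RFC 1808, 2.4.1): everything after the
    first '#'.  Returns (url_without_fragment, fragment_or_None)."""
    head, sep, fragment = url.partition("#")
    return head, (fragment if sep else None)


def request_line(url: str) -> str:
    """Build an HTTP/1.0 GET line for an absolute http URL.
    The fragment stays with the client and is never sent."""
    resource, _fragment = split_fragment(url)
    prefix = "http://"
    if not resource.startswith(prefix):
        raise ValueError("this sketch only handles absolute http URLs")
    rest = resource[len(prefix):]
    # The net_loc ends at the first '/' or '?'; everything after it
    # (path, params, query) is what goes on the request line.
    ends = [i for i in (rest.find("/"), rest.find("?")) if i != -1]
    target = rest[min(ends):] if ends else ""
    if not target.startswith("/"):
        target = "/" + target
    return f"GET {target} HTTP/1.0"


print(request_line("http://a/b/c/d;p?q#f"))  # GET /b/c/d;p?q HTTP/1.0
```

The params and query travel to the server; only the "#f" is dropped.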

> RFC 1808 says:
> 
> 
>    Parameters, regardless of their purpose, do not form a part of the
>    ^^^^^^^^^^                               ^^^^^^^^^^^^^^^^^^^^^^^^^
>    URL path and thus do not affect the resolving of relative paths.  In
>    ^^^^^^^^

Now they (he? it?) are talking about being part of a _URL path_, not of
a URL as a whole.  The _URL path_ is only part of the URL, as in

      <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

from section 2.1.  So everything between the slash after the hostname
and the beginning of a possible param, query or fragment (or end-of-string)
is the URL path.
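The parsing order of RFC 1808 section 2.4 (fragment, then scheme, net_loc, query, params, and finally the path) can be sketched in Python as below. `parse_rfc1808` and `URLParts` are hypothetical names, and the sketch does no validation. Because the params are split off before the remainder is taken as the path, something like ";type=d" always lands in `params`, never in `path`; the slash before the path is recorded as a flag rather than kept, since the RFC does not count it as part of <path>.

```python
import re
from typing import NamedTuple, Optional


class URLParts(NamedTuple):
    scheme: Optional[str]
    net_loc: Optional[str]
    abs_path: bool            # was the path preceded by "/"? (not part of <path>)
    path: str
    params: Optional[str]
    query: Optional[str]
    fragment: Optional[str]


# Scheme characters per RFC 1808: alphanumerics, "+", "." and "-".
_SCHEME = re.compile(r"[A-Za-z0-9+.\-]+:")


def parse_rfc1808(url: str) -> URLParts:
    """Split url in the order of RFC 1808 section 2.4 (None = absent)."""
    scheme = net_loc = params = query = fragment = None
    # 2.4.1: the fragment is everything after the first '#'.
    s, sep, rest = url.partition("#")
    if sep:
        fragment = rest
    # 2.4.2: a scheme is a run of scheme characters ending in ':'.
    m = _SCHEME.match(s)
    if m:
        scheme, s = s[: m.end() - 1], s[m.end():]
    # 2.4.3: "//" starts the net_loc, which runs up to the next '/'.
    if s.startswith("//"):
        net_loc, slash, rest = s[2:].partition("/")
        s = slash + rest
    # 2.4.4: the query is everything after the first '?'.
    s, sep, rest = s.partition("?")
    if sep:
        query = rest
    # 2.4.5: the params are everything after the first ';'.
    s, sep, rest = s.partition(";")
    if sep:
        params = rest
    # 2.4.6: what is left is the path; a leading '/' is noted, not kept.
    abs_path = s.startswith("/")
    return URLParts(scheme, net_loc, abs_path, s[1:] if abs_path else s,
                    params, query, fragment)


print(parse_rfc1808("ftp://host/pub/dir;type=d"))
```

For that ftp URL the path comes out as "pub/dir" and the params as "type=d".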

>    particular, the presence or absence of the ";type=d" parameter on an
>    ftp URL does not affect the interpretation of paths relative to that
>    URL.  Fragment identifiers are only inherited from the base URL when
>    the entire embedded URL is empty.
> 
> 
> Can you think of any cases in which parameters *should* be inherited from
> the base? (I can't.)

This is from Step 5 of section 4:

           a) if the embedded URL's <params> is non-empty, we skip to
              step 7; otherwise, it inherits the <params> of the base
              URL (if any) and [...]

But to reach this part of Step 5 your relative URL has to be pretty
special, because all cases of relative URL which contain a scheme, 
a hostname, or a path have been weeded out by now (in the process of
following the example algorithm).  This seems to amount to exactly
two cases:  the HREF (relative URL) has to start with either "?" or "#".
Those are the only two cases from the examples in section 5.1 that have ";p"
on the right-hand side:

Relative to       Base: <URL:http://a/b/c/d;p?q#f>

      ?y         = <URL:http://a/b/c/d;p?y>
      #s         = <URL:http://a/b/c/d;p?q#s>

I don't find either of the two cases very surprising.
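A minimal Python sketch of just these inheritance rules, under some assumptions: it uses the standard library's `urllib.parse.urlparse` to split components, which cannot tell an empty component from an absent one (e.g. "?" from no query) and keeps the leading slash in `path`. `resolve_step5` is a hypothetical name. References with a scheme, net_loc or path are rejected rather than resolved, and a totally empty reference is returned as the whole base, per step 2a.

```python
from urllib.parse import urlparse, urlunparse


def resolve_step5(base: str, ref: str) -> str:
    """Resolve a reference that has no scheme, net_loc or path,
    following RFC 1808 section 4 (step 2a and step 5 only)."""
    if ref == "":
        # Step 2a: an empty reference is the entire base, fragment included.
        return base
    b = urlparse(base)
    r = urlparse(ref)
    if r.scheme or r.netloc or r.path:
        raise ValueError("reference does not reach step 5: " + ref)
    # Step 5: the base path is always inherited.
    params, query = r.params, r.query
    if not params:            # 5a: no <params> of its own, inherit the base's
        params = b.params
        if not query:         # 5b: no <query> of its own, inherit the base's
            query = b.query
    # The base fragment is not inherited; only the reference's own is kept.
    return urlunparse((b.scheme, b.netloc, b.path, params, query, r.fragment))


BASE = "http://a/b/c/d;p?q#f"
for ref in ("?y", "#s", ";x", ""):
    print(repr(ref), "=", resolve_step5(BASE, ref))
```

The ";x" case also reaches step 5, but its own params stop the inheritance at 5a, which is why it shows up in 5.1 as ";x" rather than ";p".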

>       Also, do you understand that last sentence about a condition for
> which a fragment is inherited from the base? (I don't.) 

HREF="" is equivalent to HREF="<the full Base URL, 
                                including a possible fragment>".

Does it ever make sense to use this, a totally empty relative URL
as an abbreviation for the full Base?  Maybe it does.  Or maybe some
deterministic behaviour just had to be specified for a pathological case.

I don't have any special knowledge, just reading the same RFC document
as you and trying to make sense of it; maybe it helps that my reading is
not tainted by any deep knowledge of implementation issues.[1] :-)

  Klaus

[1] For example, the HTParse() function.  It uses some terms differently
    from the RFC (e.g. "path").  On very cursory examination, it appears
    that Lynx's version of it doesn't have any knowledge about URL
    parameters built in.  The cursory examination consisted of searching
    for the string "';'" in the relevant file.  Libwww5's HTParse.c
    contains one occurrence, in HTSimplify().  I suspect neither one
    gets it exactly right for all cases according to the
    RFC, but I'd have to check.  Anyway, URL parameters are rarely used.
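The suspicion above can be illustrated with a small Python sketch of the path merge in RFC 1808 step 6: drop everything after the base path's rightmost slash and append the reference path. The "." and ".." cleanup of steps 6a-6d is omitted, and nothing here is taken from HTParse.c; `merge_paths` and `split_params` are hypothetical names. RFC 1808's grammar lets a param contain "/", so code that treats the params as part of the path can cut the base in the wrong place, while for the common ";type=d" case both approaches happen to agree.

```python
from typing import Tuple


def split_params(path_and_params: str) -> Tuple[str, str]:
    """Split at the first ';' (RFC 1808, 2.4.5): (path, params)."""
    path, _, params = path_and_params.partition(";")
    return path, params


def merge_paths(base_path: str, ref_path: str) -> str:
    """RFC 1808 step 6 merge: drop everything after the base path's
    rightmost '/', then append the reference path.  The "." and ".."
    cleanup of steps 6a-6d is deliberately left out of this sketch."""
    cut = base_path.rfind("/")          # -1 if there is no slash at all
    return base_path[: cut + 1] + ref_path


# Hypothetical base path whose params contain a '/', which RFC 1808's
# grammar permits inside a param.
base = "/pub/file;x=1/2"

naive = merge_paths(base, "g")                     # params treated as path
correct = merge_paths(split_params(base)[0], "g")  # params set aside first

print(naive)    # /pub/file;x=1/g
print(correct)  # /pub/g
```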
    



