Re: lynx-dev URL guessing for .CA domain suggestion

From: Bela Lubkin
Subject: Re: lynx-dev URL guessing for .CA domain suggestion
Date: Thu, 8 Oct 1998 16:49:37 -0700

Leonid Pauzner wrote:

> It would be really great to disable URL guessing for hosts
> ending with "dot + two letters", since that is most likely a country
> code like ".uk" or ".ru".
> This would be a very limited disadvantage for edu/com/org/net users,
> because second-level domains usually have longer names,
> but it is really important for country domains:
> a typo in a user-defined URL falls into the obviously stupid "URL guessing"
> process, trying hosts that definitely do not exist.
> The changes should be somewhere in LYExpandHostForURL().
> Can anybody fix it?

Here's the proper way to do what you want -- and this is definitely for
post-2.8.1 work.  Add a third table to specify domain name endings which
you do not want guessed.  You might have:


The third line is a list of suffixes which are to be considered terminal
-- no guesses should be appended to them.  Note that I've included .com
and so on in my sample entry; this prevents "can't find, trying" guesses
that append yet another suffix to such a name.  It would *seem*
sensible to automatically include URL_DOMAIN_SUFFIXES in
URL_DOMAIN_NOGUESS_SUFFIXES, but we retain more flexibility if we don't.
Then the user can choose to or not, by what he puts in the NOGUESS list.
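
The terminal-suffix test described above could be sketched roughly as
follows.  This is only an illustration, not existing Lynx code: the
function name host_has_noguess_suffix() and the comma-separated list
format are assumptions, and strncasecmp() is the POSIX call rather than
whatever comparison helper Lynx itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>    /* strncasecmp (POSIX) */

/*
 * Hypothetical helper, not existing Lynx code: return true if `host`
 * ends with any entry of the comma-separated list `noguess`, for
 * example ".com,.edu,.uk".  The comparison is case-insensitive and
 * empty list entries are skipped.
 */
static bool host_has_noguess_suffix(const char *host, const char *noguess)
{
    assert(host != NULL && noguess != NULL);

    size_t hostlen = strlen(host);
    const char *entry = noguess;

    while (*entry != '\0') {
        size_t len = strcspn(entry, ",");    /* length of this entry */

        if (len > 0 && len <= hostlen &&
            strncasecmp(host + hostlen - len, entry, len) == 0)
            return true;                     /* terminal suffix found */

        entry += len;
        if (*entry == ',')
            entry++;                         /* step past the separator */
    }
    return false;
}
```

A caller would run this check before appending any suffix guess, and
skip suffix guessing entirely when it returns true.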

The code already does this for a URL_DOMAIN_NOGUESS_PREFIXES list,
except that the list is embedded in the code.  The embedded list is
equivalent to:


If someone implements what I'm suggesting, I would recommend also
making URL_DOMAIN_NOGUESS_PREFIXES configurable.

Finally, I see that there is no way to specify "empty guess" in the
list.  That is, suppose I would like to have:


Then if I do `lynx zark`, I intend its guesses to include the empty-prefix
and empty-suffix cases:
  zark          <-- empty prefix, suffix
  www.zark      <-- empty suffix
  ftp.zark      <-- empty suffix

You cannot specify an empty prefix or suffix.  This should be fixed.
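
One way the expansion could treat an empty entry as legal is sketched
below.  Everything here is hypothetical: the names make_guess() and
expand_guesses(), the GUESS_LEN limit, and the ordering (each prefix in
turn, with suffixes varying fastest) are assumptions for illustration,
not a description of what LYExpandHostForURL() does.

```c
#include <assert.h>
#include <stdio.h>

#define GUESS_LEN 256   /* assumed maximum length of one guess */

/*
 * Hypothetical sketch: build the guess for one prefix/suffix pair.
 * An empty string is a legal prefix or suffix and simply contributes
 * nothing.  Returns 0 on success, -1 if the result does not fit.
 */
static int make_guess(char *out, size_t outlen, const char *prefix,
                      const char *host, const char *suffix)
{
    assert(out != NULL && prefix != NULL && host != NULL && suffix != NULL);

    int n = snprintf(out, outlen, "%s%s%s", prefix, host, suffix);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Fill `guesses` with every prefix/suffix combination, in list order
 * with suffixes varying fastest, and return how many were produced.
 * Stops early once `max` guesses have been written.
 */
static size_t expand_guesses(char guesses[][GUESS_LEN], size_t max,
                             const char *const *prefixes, size_t nprefixes,
                             const char *const *suffixes, size_t nsuffixes,
                             const char *host)
{
    size_t count = 0;

    for (size_t p = 0; p < nprefixes; p++) {
        for (size_t s = 0; s < nsuffixes; s++) {
            if (count == max)
                return count;
            if (make_guess(guesses[count], GUESS_LEN,
                           prefixes[p], host, suffixes[s]) == 0)
                count++;
        }
    }
    return count;
}
```

With prefixes "www.", "ftp." and "" and suffixes ".com" and "", the
host "zark" expands to six guesses, the last of which is plain "zark".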

Other stuff: if it guesses a prefix that corresponds to a known
protocol, shouldn't it guess the protocol as well?  That is, suppose the
above sequence of guesses succeeded at the ftp. host; shouldn't it then
have guessed ftp protocol, i.e., an ftp: URL, not an http: one?
Furthermore, shouldn't that be user-configurable somehow?  For instance,
some sites use a "web." prefix, so maybe I want Lynx to guess that, with
HTTP protocol:


"look for www.whatever.i.said, and if you find it, make it an http: URL;
then look for ftp.whatever and make it ftp:; finally look for
web.whatever and make it http:"
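
The prefix-to-protocol pairing above could be held in a small table,
roughly as sketched here.  The struct, the table, and scheme_for_guess()
are hypothetical names, the table is hard-coded only for illustration
(the suggestion is that it be user-configurable), and falling back to
"http" when no prefix matches is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>    /* strncasecmp (POSIX) */

/* Hypothetical pairing of a guessed host prefix with a URL scheme. */
struct prefix_scheme {
    const char *prefix;
    const char *scheme;
};

/* Entries are tried in order, mirroring the www/ftp/web example. */
static const struct prefix_scheme guess_table[] = {
    { "www.", "http" },
    { "ftp.", "ftp"  },
    { "web.", "http" },
};

/*
 * Return the scheme to use for a guessed host name, based on its
 * prefix.  Falls back to "http" (an assumed default) when no table
 * entry matches.
 */
static const char *scheme_for_guess(const char *host)
{
    assert(host != NULL);

    for (size_t i = 0; i < sizeof guess_table / sizeof guess_table[0]; i++) {
        const char *p = guess_table[i].prefix;

        if (strncasecmp(host, p, strlen(p)) == 0)
            return guess_table[i].scheme;
    }
    return "http";
}
```

A guess that resolves would then be turned into a URL with the returned
scheme, so a host found under the "ftp." prefix becomes an ftp: URL.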

Again, all of this is post-2.8.1 stuff.


