Re: lynx-dev Revised patch for HTFTP.c

From: Doug Kaufman
Subject: Re: lynx-dev Revised patch for HTFTP.c
Date: Sun, 20 Aug 2000 13:31:19 -0700 (PDT)

On Sun, 20 Aug 2000, Klaus Weide wrote:

> On Sat, 19 Aug 2000, Doug Kaufman wrote:
> > I still don't see where any servers are broken. 
> I don't see how you can continue to say that, after it has been shown
> that some servers send CRCRLF in ASCII mode in place of a line break.
> That's so obviously broken - what more do you want?
> Maybe you understand server in a different way than I.  What I mean
> with "server" is the entity with which the FTP client interacts, as
> it can be observed in its behavior during interaction.  It doesn't matter
> whether the server software daemon ("server" in some narrower sense) is
> broken, or the setup is misconfigured, or both - the observable behavior
> of the server is broken.

I think that I understand what you are saying; I just view it
differently. When the server is instructed to send the file in ASCII
mode and it is a unix server, it takes each LF and converts it to
CRLF. The fact that the file contains a CR immediately prior to the
LF is irrelevant, since the server isn't required to analyze the file
structure. I view the instruction to send in ASCII as broken, not the
server; the latter is following instructions as given by the client.
So I don't see a CRCRLF linebreak. I see a CR character in the file,
followed by a CRLF linebreak.
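
The conversion Doug describes can be sketched in a few lines of C. This is a hypothetical illustration of what a Unix server does in ASCII (TYPE A) mode, not lynx's or any server's actual code: every LF gets a CR inserted before it, with no look-behind, so a file that already contains CRLF line ends comes out with CR CR LF on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch of a Unix FTP server's ASCII-mode conversion.
 * The server expands every LF to CRLF without inspecting the byte
 * before it, so a DOS-style "line\r\n" becomes "line\r\r\n".
 * Caller must supply an output buffer at least 2*len bytes long. */
static size_t ascii_expand(const char *in, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n')
            out[n++] = '\r';   /* insert CR before every LF */
        out[n++] = in[i];
    }
    return n;
}
```

Run on a DOS-format line, this reproduces exactly the CR + CRLF sequence discussed above.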
> > I thought that we
> > had agreed (correct me if I am in error) that the responsibility for
> > deciding mode of transfer is held by the client. It seems that lynx
> > is specifically requesting transfer in ascii mode of files which are
> > not "text" files, if "text" is defined as being in the native text
> > format of the server. 
> "Text files" means "text files", textual in nature, or in MIME terms, 
> anything that would be correctly labelled as a "text/*" media type (after
> possibly some canonicalization).  It's a property of the content, not
> of a specific representation (which may be "native" or not on a given
> server).
> That's how I understand it...

I would view it more in functional terms. Text should only be defined
in terms of the process that needs a definition of "text". For lynx,
this may mean a file that can be rendered. For FTP, it may mean a
file that should be altered during transfer, as opposed to being
transferred in image form.
> > Any other files should be considered binary,
> > including files with DOS EOLs residing on a unix server. The problem
> > is that lynx is requesting them in ascii transfer mode, which is
> > inappropriate.
> All lynx has to go by for making the decision A vs. I is
> (1) suffix mappings
> (2) explicit ";type=[I|A]" parameter in URL
> (3) some information about the FTP server that comes to light during the
>     connection (i.e., does it identify itself as Unix?  DOS?  something
>     else?  nothing recognizable at all?)
> In the absence of an explicit directive of form (2), it makes sense to let
> the transfer mode be determined by suffix (file extension) rules.  File

This may be the crux of where we disagree. I would say that it makes
sense to retrieve a file exactly as it is on the server, unless we
have evidence that it needs to be changed in order to allow lynx to
process it. Making changes to a file without a good reason, then
saying that the server is broken because we no longer know how to
properly render the file, just doesn't make sense to me. Your argument
is based on an idea of the ideal "text" file and how it should be
treated. The problem is as you noted above; we don't have any way of
determining which files are "true text files". Thus I would argue
that we shouldn't put too much emphasis on our guesses as to which
files are "true text", but rather look at the processing of the files,
seeing where it is necessary to give special treatment to some files.
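
Klaus's three inputs for the A-vs-I decision can be sketched as follows. The function name, the precedence of the checks, and the suffix table here are all illustrative assumptions, not the actual logic in HTFTP.c: an explicit ";type=" parameter wins, a suffix mapping decides otherwise, and the default falls to binary as argued in this mail.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the transfer-mode decision.
 * url_type is the letter from a ";type=A" or ";type=I" URL
 * parameter, or NULL if absent; suffix is the file extension. */
typedef enum { MODE_ASCII = 'A', MODE_IMAGE = 'I' } XferMode;

static XferMode choose_mode(const char *url_type, const char *suffix)
{
    /* (2) an explicit ";type=" parameter in the URL takes precedence */
    if (url_type && (*url_type == 'A' || *url_type == 'I'))
        return (XferMode)*url_type;

    /* (1) suffix mappings -- a few illustrative entries only */
    static const char *text_suffixes[] = { ".txt", ".html", ".c", NULL };
    if (suffix) {
        for (int i = 0; text_suffixes[i] != NULL; i++)
            if (strcmp(suffix, text_suffixes[i]) == 0)
                return MODE_ASCII;
    }

    /* default: image (binary), per the position argued here */
    return MODE_IMAGE;
}
```

Input (3), the SYST-based server identification, would only enter such a scheme as a tiebreaker, which is Klaus's point that it should ideally play no role.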

> Now, given that the file suffix plays this role - and assuming an FTP
> server that doesn't try to intentionally confuse, but is configured to
> support existing conventions -
>   <> 
>      should mean TEXT.  No additional information about the server (its
>      native file format etc.) is needed.  Since this is a URL for a TEXT
>      document, it should of course be downloaded in ASCII mode.
> If OTOH the FTP server wants to offer files-that-are-really-TEXT but
> for some reason wants to offer them as binary - then it should not
> confuse clients by calling those files .txt.  They should be
>    <>
> instead - or some other suffix that does (by hopefully widely understood
> convention) NOT imply text/plain.

But it is not the server that determines the file name. The name is
chosen by a person, who places the file on the server. Trying to get
humans to follow a convention universally, especially when not clearly
described in a readily accessible document, is doomed to failure. What
should we do about files with a ".doc" extension? Traditionally this
would have been a text file, until Microsoft decided to make this a
binary extension. Clearly, extensions are only a (flawed) guide to the
nature of the file to which they refer. I still think that we need to
look at our process, requesting files in ASCII or binary according
to how we are going to process the data that we receive, rather than on
preconceived notions of "what should be".
> Maybe it's not so good that lynx treats files of completely unknown type
> (no suffix mappings) as text/plain by default.

This is the problem with the file I put on my server called "testfile".
> IMO, (3) should ideally play no role at all.  It just shouldn't be
> necessary.  It's all unreliable information, I don't think there is
> a reliable way to determine that a server is (for example) "Mac, DOS,
> or UNIX" or not.

If the server responds to the SYST command, that should be reliable.
When SYST gives a result of UNKNOWN, we then make some attempt to
guess. See the guessing code in HTFTP.c.
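
A minimal sketch of that first step, classifying a server from its SYST reply (RFC 959), might look like the following. The function and the set of recognized strings are assumptions for illustration; a typical reply is "215 UNIX Type: L8", real servers vary widely, and the actual guessing code in HTFTP.c does considerably more than this.

```c
#include <assert.h>
#include <string.h>

/* Illustrative classification of an FTP server from its SYST reply.
 * A successful reply starts with code 215 followed by the system
 * name; anything unrecognized falls back to "UNKNOWN". */
static const char *classify_syst(const char *reply)
{
    if (reply == NULL || strncmp(reply, "215 ", 4) != 0)
        return "UNKNOWN";
    reply += 4;
    if (strncmp(reply, "UNIX", 4) == 0)    return "UNIX";
    if (strncmp(reply, "Windows", 7) == 0) return "WINDOWS";
    if (strncmp(reply, "MACOS", 5) == 0)   return "MAC";
    return "UNKNOWN";
}
```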
> > Note that the current implementation does this ONLY
> > when the files are to be rendered on screen. "lynx -dump" of these
> > same files retrieves them in binary and send the rendered version to
> > standard output. 
> Indeed, that is inconsistent.  For consistency, -dump should probably not
> automatically imply binary mode, but should be consistent with the
> normal rendering logic.

I would choose, instead, to make the rendering logic consistent with
the current -dump logic.

It seems that there is, as yet, no consensus on this. We seem to be
having two discussions which intersect somewhat: one on the nature
of text files and how they should be handled by servers and clients
in general, and one on how lynx should render files as they actually
exist on servers now. Regardless of whether we consider the servers
broken or not, there is certainly broken behavior in the interaction
between some servers and the lynx client. I doubt that we can fix
the files on all the servers of the world. Whether we call it error
recovery or proper rendering, we still need to change lynx to be able
to render these files. I just don't see a need to make this another
lynx option, unless the change in the code has adverse effects in some
situations. You pointed out the theoretical case of a new server with
incompatible EOL conventions, which could be avoided by enumerating
the servers where binary default was acceptable. Are there any other
places where the change to binary default for most servers causes a
problem?
Doug Kaufman
Internet: address@hidden

