lynx-dev megapatch to dev.10 available
From: Klaus Weide
Subject: lynx-dev megapatch to dev.10 available
Date: Wed, 13 Oct 1999 12:31:11 -0500 (CDT)
It's in <http://enteract.com/~kweide/lynx/>.
I just saw there is dev.10 now. Tom, can you merge?
After getting everything (I hope) up to dev.10, I don't feel like
going through everything again...
Klaus
* Changes to UCAuto.c (already sent? - this is updated to dev.10)
* HTFTP.c: (already sent w/o description - this is updated to dev.10)
- interrupted_in_next_data_char was not being reset. That could
make all subsequent FTP directory listings fail (by showing
an empty directory) after receipt of one directory listing had
been interrupted.
- Be nice, send quit before closing at least in the normal (non-
interrupted and successful) case. Some servers (wu-ftpd at
least) otherwise complain with "You could at least say goodbye"
which in turn causes unnecessary RST packets. To minimize
round-trip delays, the QUIT is sent before we start reading
the returned data (but after the initial response to our
retrieval command).
- Always close data connection immediately after we are done reading
from it, also for directory requests. This was already the case
for file requests. Some servers (including recent wu-ftpd beta)
wait for indication that we closed before proceeding.
- Keep better track of closed sockets. Some more trace messages.
Some comments corrected.
* Tabular representation for simple tables. See included file
README.TRST.
[ It works quite nicely for me. I don't understand exactly how...
Internal code is still a mess. Interface to rest-of-lynx is ok though.
Mess is happily contained in separate file.
There is no flag or option to turn it off (everyone wants this, right? :),
and it does work nicely enough that I didn't find YA option necessary).
It would be easy to disable though, all it takes is removing the one
call of HText_startStblTABLE() in HTML.c.
For systems not using ./configure and makefile.in, you have to manually
add "TRSTable.o" (or whatever the equivalent name is for you) to your
system's equivalent of a makefile. Otherwise lynx won't compile. ]
* Made User-Agent warning more friendly, and more specific. Tell
the user what lynx expects in order to avoid the warning. On
the other hand, issue an equivalent warning when -useragent is
used. Change documentation accordingly.
[ The UA_COPYRIGHT_WARNING text was IMNSHO nonsense and has bugged me
for a long time. It's not Lynx's job to convey some company's bogus
claims to users. OTOH there are good reasons to encourage users to
identify lynx correctly. Just don't give them bogus reasons. For
consistency there needs to be a warning for -useragent if there is one
for an 'O'ptions Screen change. ]
* Don't send User-Agent header at all if it somehow would be blank.
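A minimal sketch of the "no blank User-Agent" rule, not the actual
lynx code. The helper names is_blank and format_user_agent are
hypothetical, and treating a whitespace-only value as blank is an
assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s is NULL, empty, or consists only of whitespace. */
static int is_blank(const char *s)
{
    if (s == NULL)
        return 1;
    while (*s != '\0') {
        if (!isspace((unsigned char)*s))
            return 0;
        s++;
    }
    return 1;
}

/*
 * Hypothetical helper: write a User-Agent header line into out, or
 * nothing at all when the value is blank.  Returns the number of
 * characters written (0 if the header was skipped or did not fit).
 */
size_t format_user_agent(char *out, size_t outsize, const char *ua)
{
    int n;

    assert(out != NULL && outsize > 0);
    out[0] = '\0';
    if (is_blank(ua))
        return 0;               /* omit the header entirely */
    n = snprintf(out, outsize, "User-Agent: %s\r\n", ua);
    if (n < 0 || (size_t)n >= outsize) {
        out[0] = '\0';          /* do not emit a truncated header */
        return 0;
    }
    return (size_t)n;
}
```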
* Indicate on forms 'O'ptions Screen which options are not saved
to .lynxrc.
* Disable the form fields in the 'O'ptions Screen if the screen is
generated when FORMS_OPTIONS code is compiled in but not actually
active.
[ The following invocation causes that situation:
lynx -forms_options=off LYNXOPTIONS:
It might actually be useful for initial reviewing of settings. ]
* LYPrint.c: In subject_translate8bit (see OUTGOING_MAIL_CHARSET
option), use higher level function to charset-translate mail Subject
line, rather than low-level UCTransCharStr.
[ It actually works better now, including for UTF-8.
OUTGOING_MAIL_CHARSET is awfully misnamed for what it does. ]
* UPPER8, UCForce8bitTOUPPER: was severely broken for UTF-8 display,
making WHEREIS search for strings containing non-ASCII characters
impossible (and probably with other bad effects). Now case mapping
may still be wrong, but at least identical strings compare as equal.
* LYHistory.c: Entification for saved statusline messages happened
twice by mistake.
* HTFWriter.c: Made code for automatic decompression of bzip2 files
conditional on BZIP2_PATH. Such files should be treated as normal
binary files on systems without bzip2. The configure seems to
always define BZIP2_PATH, but it could be undefined manually.
* HTFWriter.c: Use LYRemoveTemp instead of remove in some cases,
to avoid keeping those files in the temp file list after they are
long gone.
* HTTCP.c: Check whether port numbers in URLs are really numbers.
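A sketch of what such a check could look like. parse_port is a
hypothetical name, and rejecting port 0 and anything above 65535 is an
assumption beyond "really numbers".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse a decimal port number taken from a URL.
 * Returns the port, or -1 if the string is empty, contains anything
 * but digits, or is outside 1..65535.
 */
int parse_port(const char *s)
{
    long port = 0;
    const char *p;

    assert(s != NULL);
    if (*s == '\0')
        return -1;
    for (p = s; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return -1;          /* "80abc", "8o", "-1", ... */
        port = port * 10 + (*p - '0');
        if (port > 65535)
            return -1;          /* also guards against overflow */
    }
    return (port == 0) ? -1 : (int)port;
}
```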
* HTPlain.c:
- Deal with backspace formatting as used in formatted man pages.
(No highlighting, only avoid double output of characters)
- Pass on 0xAD (soft hyphen) character in more cases.
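For illustration, formatted man pages mark bold as "X\bX" and
underline as "_\bX". A sketch of dropping the highlighting while
emitting each visible character once follows; strip_overstrike is a
hypothetical name, and the sketch assumes single-byte characters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collapse backspace overstrikes in place: a backspace discards the
 * character written just before it, so the character that follows
 * takes its place.  Returns the new string length.
 */
size_t strip_overstrike(char *s)
{
    size_t in, out = 0;

    assert(s != NULL);
    for (in = 0; s[in] != '\0'; in++) {
        if (s[in] == '\b') {
            if (out > 0)
                out--;          /* next character replaces the previous one */
        } else {
            s[out++] = s[in];
        }
    }
    s[out] = '\0';
    return out;
}
```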
* HTNews.c: Prevent some ways that could trick lynx into treating
a regular "news:" or "nntp:" URL as something else, like snewspost.
Extra check in LYNews.c whether posting is allowed. Return with
an error message in some cases of URLs that are too long, instead
of silently truncating. Make HEAD work again on news articles.
Some memory leaks in error path removed. A message tweak.
* HTFormat.c: HTStreamStack: avoid some unnecessary work, add a trace
message to show what is returned.
* SGML.c: some cleanup of ugly ifdefs, and of unnecessary abuse of
global variables. (still a lot left!)
* More consistent and correct recognition of element names. The
characters "_-.:" don't end the name.
* Handle INCLUDE and CDATA marked sections: output the contents.
* SGML.c etc.:
* Parse various elements differently that had/have special requirements
or hacks. Extend meaning of Tgf_strict for literal-like content
modes. Use SGML_CDATA in some cases (and treat it similar to
SGML_LITTERAL), use SGML_PCDATA for literal-like parsing (but if
modified by Tgf_strict it's more like regular SGML_MIXED). A '<'
that would start a tag gets displayed (since no element content is
allowed that's just error recovery). Comments now work in TEXTAREA
instead of getting displayed as text (SortaSGML mode only).
[ Try comments within TEXTAREA, try also various invalid constructs
(including anchors) within TITLE, with or without SortaSGML, old
vs. new code. ]
* Minor tweak of sorta SGML handling for invalid end tag if start tag
could be validly omitted.
* More consistent and correct recognition of element names. The
characters "_-.:" don't end tag names.
[ This alone should prevent <http://www.quantum.com/> showing as empty.
Changes for OBJECT handling (see below) should prevent the same effect
for most pages that really _do_ have unclosed OBJECT tags, not just
something that is misinterpreted by lynx as that. ]
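A sketch of the name-scanning rule, assuming names start with a letter
and continue with letters, digits, or any of "_-.:". The function
element_name_length is hypothetical and is not the SGML.c code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return the length of the element name at the start of s (the text
 * just after '<' or "</"), or 0 if s does not start with a letter.
 */
size_t element_name_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (s[n] != '\0' &&
           (isalnum((unsigned char)s[n]) || strchr("_-.:", s[n]) != NULL))
        n++;
    return n;
}
```

With this rule "o:p" or "my-tag" is scanned as one name instead of
being cut off at the ':' or '-'.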
* Improved handling of '/' after element name in a tag:
"<foo/>" is treated as an empty element (as in XML). If we recognize
"foo" as an empty element, do nothing special; and if we recognize "foo"
as a non-empty element, convert to "<foo></foo>".
"<foo/bar/" is treated as a shortref construct, by converting to
"<foo>bar</foo>" (for non-empty and recognized "foo").
This is not as general as it would have to be for a real SGML parser,
in particular '/' is only treated this way if it directly follows the
element name, and it may not even be quite right. It is better than
the recovery lynx previously did in these cases though.
[ Try <http://www.nyct.net/~aray/junk/hide.html> as a test, you should
see "Hello World!". Also look at prettysrc rendering. ]
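The two conversions described above can be sketched as a string
rewrite. expand_slash_tag is a hypothetical helper, not how SGML.c
does it; ignoring the content for elements known to be empty is an
assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * name:     element name that preceded the '/'
 * content:  text between the slashes in "<foo/bar/", or NULL for "<foo/>"
 * is_empty: nonzero if the element is known to be an empty element
 * Writes the equivalent ordinary markup to out; returns 0 on success,
 * -1 if it does not fit.
 */
int expand_slash_tag(const char *name, const char *content, int is_empty,
                     char *out, size_t outsize)
{
    int n;

    assert(name != NULL && out != NULL && outsize > 0);
    if (is_empty)
        n = snprintf(out, outsize, "<%s>", name);   /* nothing special */
    else
        n = snprintf(out, outsize, "<%s>%s</%s>",
                     name, content ? content : "", name);
    if (n < 0 || (size_t)n >= outsize) {
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```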
* Changed handling of include buffer which is used to pass back data
from HTML.c to SGML.c. Passing data upstream now works without strange
reordering effects even when SGML_character was already parsing data from
a previous include buffer.
Character set translation would happen several times on data passed back
to SGML_character in the include buffer for re-parsing. This is now
avoided. Well at least in most cases, and for characters that *can* be
translated, there are likely combinations of input and output character
sets where the assumptions made are still wrong.
* The start_element and end_element methods of structured stream class now
return a status code. Currently only used for the OBJECT stuff below.
* mostly HTML.c, SGML.c: Changed handling of OBJECT and MAP.
- Avoid using the include buffer mechanism as much as possible.
This involves introducing some new special handling in SGML.c
to change parsing mode for element contents, and a way for
HTML_{start,end}_element to signal to SGML_character what it
should do. In most cases when the OBJECT element content should
be parsed and displayed, SGML_character now only needs one pass
through the data.
- Don't lose content when several OBJECTs are nested.
- In HTML 4, an OBJECT with USEMAP attribute can refer to a MAP
within the OBJECT's content, possibly within nested inner OBJECTs.
Lynx would fail to find the MAP in that case, now it doesn't.
- In HTML 4, MAP can contain arbitrary block elements in addition to
AREA. Lynx now shows such block content, even if it occurs
within (possibly nested) OBJECTs with USEMAP whose contents we
would otherwise skip. Sometimes we may show too much now, by
generating a LYNXIMGMAP link as well as showing block content
or by showing more of the OBJECT content than what is within a
MAP, but that is preferable to losing data.
- Treat an A tag with COORDS attribute as equivalent to an AREA
when it is within MAP, for the purpose of collecting links for
LYNXIMGMAP.
- As a fallback, internally redirect a LYNXIMGMAP request to
the position of the MAP element in the normally rendered text
of the document containing the MAP, if it is known that the MAP
element exists and just doesn't contain any AREA (or equivalent
A-with-COORDS) links. It is assumed that in such a case there
is some block content within the MAP that is rendered normally.
[ Try the example fragments in the HTML 4.0 (or 4.01) text from W3C,
especially those with OBJECT and USEMAP. Also try adding some more
content within OBJECT (maybe before, within, or after a nested
OBJECT) and see what happens. ]
* HTFile.c: new function LYGetFileInfo.
* HTAnchor.c: new function HTAnchor_findSimpleAddress.
* New function HTStartAnchor5.
* Modified the way link text is (re-)drawn by function highlight.
The bulk of processing now happens in new function LYMoveToLink.
The data of the containing line is now scanned from the beginning,
using the same logic as in display_line to make sure that lynx
and the display library have the same idea of where in the line
the link starts. In UTF-8 output mode, parts of the line preceding
the link are also repainted if this is necessary. Refreshing of
the physical line is forced if necessary in UTF-8 mode. For anchors
split across lines, the new approach is currently only used for the
first line.
This change is not in effect for lynx with color style. In that
case highlighting is already sometimes done in a similar, but
not quite the same, separate function.
* Modified WHEREIS target highlighting for hypertext links.
Now this is done in the same pass as drawing the normal link
text, in LYMoveToLink. This avoids problems in UTF-8 display
mode. It also avoids a lot of complicated and extremely hard
to understand older code in highlight(), but that code is still
there for use by lynx with color style and for other remaining
cases (non-hypertext anchors, second line highlighting).
* Modified WHEREIS target highlighting for general text.
Instead of first writing each line's characters in display_line,
then scanning again through the line's data for portions to
highlight and repainting those parts after in display_page,
this is now done in one pass within display_line. However,
this isn't (yet?) done for lynx with color style which still
uses the old code.
* These last three changes reduce problems that occur when using
UTF-8 display character set (in an appropriate terminal environment
that understands it, of course). Most of them don't apply with
color style lynx, so it continues to have more UTF-8 problems.
Pages with mostly ASCII characters should be more or less ok.
Problems that otherwise are not visible become apparent in
search highlighting, and after ^Z / fg.
[ As one example, visit <http://www.w3.org/>. (In a correctly set-up
UTF-8 environment with display charset set accordingly - otherwise
all this doesn't apply). Enter '/' (WHEREIS), enter search text "W3C".
Go down to "W3C Services", the first line after "CSS2 Package:" should
have a middle dot and a highlighted "W3C". Do ^Z and fg, see whether
the highlighted string is still in the right place. ]
* GridText.c: More changes to deal with problems caused by using
UTF-8 output with a display library that isn't aware of it.
Break line with UTF-8 before curses does it. This causes lines
that are too short, effectively the rightmost part of a line cannot
be used if there are UTF-8 encoded characters. The alternative,
letting curses wrap the line when it thinks it got too long, is
worse, so do it in lynx code instead.
* Avoid memory overrun for very long lines in UTF-8 mode.
Avoid splitting line in the middle of a multibyte UTF-8 character.
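A sketch of finding a safe split point: UTF-8 continuation bytes have
the bit pattern 10xxxxxx, so back up until the byte at the split
position is not one of them. utf8_safe_split is a hypothetical name,
not a GridText.c function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest position <= limit at which the buffer s[0..len)
 * can be split without cutting a UTF-8 multibyte sequence in two.
 * The first piece is s[0..pos), the second starts at s[pos].
 */
size_t utf8_safe_split(const char *s, size_t len, size_t limit)
{
    assert(s != NULL);
    if (limit >= len)
        return len;             /* nothing to split */
    while (limit > 0 && ((unsigned char)s[limit] & 0xC0) == 0x80)
        limit--;                /* s[limit] is a continuation byte */
    return limit;
}
```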
* Test for SHOW_WHEREIS_TARGETS instead of 'defined(FANCY_CURSES) ||
defined(USE_SLANG)'.
* Initialize new textarea lines created by insert_new_textarea_anchor
with current display character set for value_cs. (The "cloned"
value can be stale in some cases if the user changed the display
character set after the document was last loaded - normally that
should not happen). For a file inserted into a textarea with
INSERTFILE use new function LYGetFileInfo instead to determine
the file content's charset. Thus -assume_local_charset and
conventions based on file suffix should be honored.
[ This is untested. ]
* For Unix, added more specific error message if calling external
editor for textarea failed, based on the status returned by
system().
[ The interpretation of system()'s return code may be not quite
right or portable? ]
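For reference, a POSIX-only sketch of interpreting the value returned
by system(). The enum and function names are hypothetical, and
treating exit status 127 as "command could not be executed" follows
the usual shell convention rather than anything lynx-specific.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

enum edit_result {
    EDIT_OK,            /* command ran and exited with status 0 */
    EDIT_NOT_STARTED,   /* system() itself failed (returned -1) */
    EDIT_NOT_FOUND,     /* exit status 127: could not execute command */
    EDIT_FAILED,        /* command exited with a nonzero status */
    EDIT_SIGNALED,      /* command was terminated by a signal */
    EDIT_UNKNOWN
};

/* Classify a system() return value; *detail gets exit code or signal. */
enum edit_result classify_system_status(int status, int *detail)
{
    assert(detail != NULL);
    *detail = 0;
    if (status == -1)
        return EDIT_NOT_STARTED;
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return EDIT_SIGNALED;
    }
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        if (*detail == 0)
            return EDIT_OK;
        return (*detail == 127) ? EDIT_NOT_FOUND : EDIT_FAILED;
    }
    return EDIT_UNKNOWN;
}
```

A caller would pass the result of system(command) straight in and
pick the error message from the returned class.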
* It is possible to require an additional prompt before Enter in
an input field causes form submission: define TEXT_SUBMIT_CONFIRM_WANTED,
explained in userdefs.h.
[ Al Gilman brought this up some time ago, to avoid unexpected action
for the benefit of blind (and other?) users. I doubt that lynx's
behavior actually caused a problem in this respect,
and no blind users have spoken up, so it is only a compile-time option
for now. ]
* Some small changes to prevent overstepping string boundary
(HTParse.c,)
* Extended SUFFIX option, added SUFFIX_ORDER option, see documentation
in lynx.cfg. The long list of built-in file suffix rules in HTInit.c
can now be disabled, either at compile time - see userdefs.h - or at
run time. The equivalent functionality is now available in lynx.cfg
for those who want it. Added comments, see HTFileInit in HTInit.c.
[ A lot of those built-in rules were useless and/or outdated for most people,
and not being maintained. They are also start-up overhead that could
not be disabled. ]
* Various tweaks of built-in file suffix rules.
[ But if you _do_ use them, let me know if I broke something... ]
* Allow XLOADIMAGE_COMMAND to be empty (in lynx.cfg) or NULL (in userdefs.h),
just don't use a default X viewer for image types in that case.
* Moved UCGetUniFromUtf8String from LYCharUtils.c to UCAux.c.
* Renamed LYUCFullyTranslateString -> LYUCTranslateHTMLString, and
LYUCFullyTranslateString_1 -> LYUCFullyTranslateString.
* Tweaks for special chars in (what is now) LYUCFullyTranslateString,
in obscure cases (input fields of type password prefilled with
unusual content) lynx would pass text back to the server with special
characters (soft hyphen or non-break space) expressed as lynx-internal
code values.
* Added some replacement characters or strings to various chartrans
tables.
* Experimental command line option -convert_to, only compiled in if
new MISC_EXP symbol is defined. This takes a string in the form
of a MIME type, which can also be combined with an appended ";charset="
parameter. (This needs shell quoting of course). The charset
value can be used to set the display character set from the command
line. The MIME type can be one of the non-official types used
internally, for some interesting effects (crashing lynx not excluded).
Try www/download, www/source, www/dump, or some unrecognized string.
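A sketch of splitting such an argument into its MIME type and charset
parts. split_convert_to is a hypothetical helper, and the exact,
case-sensitive match on ";charset=" is an assumption, not a
description of how lynx parses the option.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split "type/subtype;charset=NAME" into type and charset.
 * charset is set to "" when no ";charset=" parameter is present.
 * Returns 0 on success, -1 if a part is empty or does not fit.
 */
int split_convert_to(const char *arg, char *type, size_t typesize,
                     char *charset, size_t cssize)
{
    static const char key[] = ";charset=";
    const char *p;
    size_t tlen;

    assert(arg != NULL && type != NULL && charset != NULL);
    assert(typesize > 0 && cssize > 0);
    p = strstr(arg, key);
    tlen = p ? (size_t)(p - arg) : strlen(arg);
    if (tlen == 0 || tlen >= typesize)
        return -1;
    memcpy(type, arg, tlen);
    type[tlen] = '\0';
    charset[0] = '\0';
    if (p != NULL) {
        p += sizeof key - 1;    /* skip past ";charset=" */
        if (strlen(p) >= cssize)
            return -1;
        strcpy(charset, p);
    }
    return 0;
}
```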
* Fixed HTMainText_Get_UCLYhndl, it was returning the wrong kind of
charset handle (a "UChndl", which is different from a "LYhndl" or
"UCLYhndl" etc. and shouldn't be directly accessed by arbitrary
bits of lynx code - it should be regarded as private to the chartrans
mechanism).
* Protect various printf-like calls against crashes from strings with
'%': LYSyslog, exit_immediately_with_error_message.
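The usual fix for this class of bug is to pass the message as data
through a fixed "%s" format instead of using it as the format string.
A sketch follows; log_message is a hypothetical stand-in, not the
LYSyslog code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy msg into out verbatim.  Because msg is an argument and not the
 * format, any '%' sequences in it are never interpreted.
 * Returns what snprintf returns: the length msg needs, or a negative
 * value on error.
 */
int log_message(char *out, size_t outsize, const char *msg)
{
    assert(out != NULL && outsize > 0 && msg != NULL);
    /* Unsafe variant this replaces: snprintf(out, outsize, msg); */
    return snprintf(out, outsize, "%s", msg);
}
```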
* LYDownload.c: made parsing of LYNXDOWNLOAD: URL slightly more robust.
* Disabled some broken pieces.
[ And some minor tweaks not mentioned specifically, the list is already long
enough... the end ]