classpath-inetlib


From: Robert Mitchell
Subject: Re: [Classpath-inetlib] Problems in gnu.inet.util.LineInputStream and gnu.inet.util.CRLFInputStream
Date: Wed, 06 Apr 2005 14:54:53 -0500

It looks to be working.  The test case I was using involved a very long email message (something over 70,000 bytes and probably over 1000 lines), which I needed to read using "getInputStream" as part of my processing.  Because of the way getInputStream works, it was reading the entire message and re-parsing the headers to find the body.  The inefficient CRLFInputStream was probably doing about 1000 * 70000 / 2 (35 million) byte copies just to read the first line, and close to that for each of the approximately 6 lines in the header.  The result was that it took close to a minute to parse the header.  The new code is much more efficient.
 
Thanks,
Bob Mitchell

>>> Chris Burdess <address@hidden> 4/6/2005 2:10 PM >>>
Robert Mitchell wrote:
> One way around the problems with mbox, etc. is to filter them to add
> the CR to the end of line sequence.

That's true, although our recent discussion shows some of the inherent
difficulties and inefficiencies of processing multi-character
delimiters. I therefore feel that normalisation to a single delimiter
prior to processing should yield better results.
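
To make "normalisation to a single delimiter" concrete, here is a
minimal sketch, assuming the data is already in memory as a byte array.
The class and method names are hypothetical and are not part of inetlib
or JavaMail. It replaces each CR LF pair with a single LF in one pass
and leaves a lone CR untouched.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper for illustration; not part of inetlib or JavaMail. */
final class LineEndingNormalizer {
    private LineEndingNormalizer() {}

    /** Returns a copy of data in which every CR LF pair becomes a single LF. */
    static byte[] crlfToLf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            // Drop a CR only when the very next byte is LF; a lone CR is kept.
            if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                continue;
            }
            out.write(data[i]);
        }
        return out.toByteArray();
    }
}
```

Each input byte is examined once, so the cost grows linearly with the
message size. A parser running on the result only has to look for one
delimiter byte.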

> All that aside, I do not think it would be worth changing the
> architecture unless the current implementation is considered
> incompatible with the JavaMail specification.  I think this is an area
> where the specification is incomplete, although you might argue that
> the references to the Internet mail RFC's requires CRLF endings for
> javax.mail.internet implementations at a minimum.

That should be the case for e.g. InternetHeaders parsing: if there are
other cases where CRLFs are not normalised, please let us know.

I have submitted a new version of CRLFInputStream now, and tested it
with the special case where the first CR is at the end of the buffer.
This doesn't seem to result in a significant performance change from
the previous version in my tests. If you have different results, I'd be
interested in seeing them.
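
For readers who want to see why a CR at the end of the buffer is a
special case, here is a rough sketch of a block-buffered CRLF-to-LF
stream. It is not the submitted CRLFInputStream source. The class name,
the blockSize parameter and the "held" lookahead field are assumptions
made for illustration. When a CR is the last byte of a block, the
matching LF (if any) only arrives with the next bulk read, so the
decision has to be deferred across the refill.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; not the gnu.inet.util.CRLFInputStream source. */
final class CrlfToLfInputStream extends InputStream {
    private static final int NONE = -2;  // marker: no byte is held back

    private final InputStream in;
    private final byte[] block;          // bytes fetched from the underlying stream
    private int pos;                     // index of the next unread byte in block
    private int limit;                   // number of valid bytes in block
    private int held = NONE;             // byte (or -1 for EOF) read while looking past a CR

    CrlfToLfInputStream(InputStream in, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.in = in;
        this.block = new byte[blockSize];
    }

    /** Returns the next raw byte, refilling the block with one bulk read when it is empty. */
    private int next() throws IOException {
        if (pos == limit) {
            int n = in.read(block, 0, block.length);
            pos = 0;
            limit = Math.max(n, 0);
            if (n <= 0) {
                return -1;
            }
        }
        return block[pos++] & 0xff;
    }

    @Override
    public int read() throws IOException {
        int b;
        if (held != NONE) {
            b = held;
            held = NONE;
        } else {
            b = next();
        }
        if (b == '\r') {
            // next() may cross a block boundary here: this is the
            // "CR at the end of the buffer" case.
            int c = next();
            if (c == '\n') {
                return '\n';             // CR LF collapses to LF
            }
            held = c;                    // lone CR: keep the following byte for later
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            // Once something has been delivered, stop rather than start a new refill.
            if (n > 0 && pos == limit && held == NONE) {
                break;
            }
            int b = read();
            if (b == -1) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Constructing it with a block size of 2 over the bytes "a\r\nb" puts
the CR in the last slot of the first block, which exercises the
deferred decision. The underlying stream is only ever read in bulk, and
each byte is copied a constant number of times.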
--
Chris Burdess

