csplit corrupts files with long lines
From: Tristan Miller
Subject: csplit corrupts files with long lines
Date: Sat, 10 Sep 2005 00:07:54 +0200
User-agent: KMail/1.8.2
Greetings.
I'm reporting a bug in csplit from coreutils 5.2.1, compiled from source on a
SuSE 9.3 system. This bug seems to have been reported over a year
ago (see
<http://lists.gnu.org/archive/html/bug-coreutils/2004-08/msg00112.html>),
but it was never squashed.
In short, csplit produces corrupt output when the input file contains very
long lines. An example file is at
<http://www.dfki.uni-kl.de/~miller/tmp/wikipedia.xml>, an XML file
containing three articles from Wikipedia. The second article was
vandalized by a spammer who inserted a ridiculously long line (42280
characters) full of links.
If I try to split this file with
$ csplit wikipedia.xml '/<page>/' '{*}'
then the file with the second article, xx02, is garbled at the beginning of
the long line. See <http://www.dfki.uni-kl.de/~miller/tmp/xx02>.
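For comparison, here is a minimal Python sketch of the behaviour one would expect. It assumes that `csplit FILE '/<page>/' '{*}'` starts a new piece at every line matching the pattern and puts everything before the first match into xx00. The function name `split_at_pattern` and the synthetic test document are hypothetical. The property that matters is that the pieces, concatenated in order, reproduce the input byte-for-byte however long any line is.

```python
import io
import re
from typing import List


def split_at_pattern(data: bytes, pattern: bytes) -> List[bytes]:
    """Hypothetical model of `csplit FILE '/PATTERN/' '{*}'`.

    Assumed semantics: a new piece starts at every line that matches
    PATTERN; piece 0 holds the lines before the first match (it may be
    empty). Lines are never shortened, merged, or re-wrapped, whatever
    their length.
    """
    regex = re.compile(pattern)
    pieces: List[List[bytes]] = [[]]
    # Iterating a binary stream yields lines split on b"\n" only,
    # with the terminator kept.
    for line in io.BytesIO(data):
        if regex.search(line.rstrip(b"\n")):
            pieces.append([])  # a matching line begins the next piece
        pieces[-1].append(line)
    return [b"".join(lines) for lines in pieces]


if __name__ == "__main__":
    # Synthetic stand-in for wikipedia.xml: the second <page> contains
    # one 42280-character line, as in the report.
    long_line = b"x" * 42280 + b"\n"
    doc = (b"<mediawiki>\n<page>A</page>\n<page>\n" + long_line
           + b"</page>\n<page>C</page>\n</mediawiki>\n")
    pieces = split_at_pattern(doc, rb"<page>")
    for i, piece in enumerate(pieces):
        print(f"xx{i:02d}: {len(piece)} bytes")
    # Invariant: the pieces reassemble to the input byte-for-byte.
    assert b"".join(pieces) == doc
```

The same invariant can be checked against real csplit output from the shell with `cat xx* | cmp - wikipedia.xml`. This works when there are fewer than 100 pieces, so that the glob sorts xx00, xx01, ... in order. With the corruption described above, cmp would be expected to report a first difference inside the part of the input that went to xx02.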
Regards,
Tristan
--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you