bug-coreutils

csplit corrupts files with long lines


From: Tristan Miller
Subject: csplit corrupts files with long lines
Date: Sat, 10 Sep 2005 00:07:54 +0200
User-agent: KMail/1.8.2

Greetings.

I'm reporting a bug in csplit from coreutils 5.2.1, compiled from source on a 
SuSE 9.3 system.  It seems this bug was previously reported over a year 
ago (see 
<http://lists.gnu.org/archive/html/bug-coreutils/2004-08/msg00112.html>) 
but it was never squashed.

In short, csplit produces corrupt output when the input file contains very 
long lines.  An example file is at 
<http://www.dfki.uni-kl.de/~miller/tmp/wikipedia.xml>, an XML file 
containing three articles from Wikipedia.  The second article was 
vandalized by a spammer who inserted a ridiculously long line (42280 
characters) full of links.

If I try to split this file with

$ csplit wikipedia.xml '/<page>/' '{*}'

then the file with the second article, xx02, is garbled at the beginning of 
the long line.  See <http://www.dfki.uni-kl.de/~miller/tmp/xx02>.
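
For anyone who cannot fetch the example file, the following is a rough reproduction sketch rather than the exact case above. It assumes a synthetic stand-in for wikipedia.xml (the name sample.xml, the "x" filler, and the article text are all made up for illustration) with one 42280-character line in the second <page>. It runs the same csplit patterns, then checks whether the pieces concatenate back to the input. Whether a given csplit build shows the corruption on this synthetic input is not guaranteed.

```shell
#!/bin/sh
# Reproduction sketch under stated assumptions: sample.xml is a synthetic
# stand-in for the wikipedia.xml file in the report, not the real file.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build one very long line (42280 characters, the length given in the report).
long_line=$(awk 'BEGIN { for (i = 0; i < 42280; i++) printf "x" }')

# Three <page> sections; the second one carries the long line.
{
  printf '<mediawiki>\n'
  printf '<page>\nfirst article\n</page>\n'
  printf '<page>\n%s\n</page>\n' "$long_line"
  printf '<page>\nthird article\n</page>\n'
  printf '</mediawiki>\n'
} > sample.xml

# Same patterns as in the report; -s only suppresses the byte counts.
csplit -s sample.xml '/<page>/' '{*}'

# A correct split must reassemble to the original input byte for byte.
if cat xx* | cmp -s - sample.xml; then
  result="pieces match input"
else
  result="pieces differ from input"
fi
echo "$result"

# Longest line found in the piece holding the second article.
awk '{ if (length($0) > max) max = length($0) } END { print max }' xx02

echo "output files left in $workdir"
```

If the script reports that the pieces differ, comparing xx02 against the second <page> section of sample.xml should show where the damage starts.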

Regards,
Tristan

-- 
   _
  _V.-o  Tristan Miller [en,(fr,de,ia)]  ><  Space is limited
 / |`-'  -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=  <>  In a haiku, so it's hard
(7_\\    http://www.nothingisreal.com/   ><  To finish what you


