Re: split: Allow splitting by line count (instead of byte size)
From: William Bader
Subject: Re: split: Allow splitting by line count (instead of byte size)
Date: Tue, 12 Jan 2021 03:17:30 +0000
I think that split has to be able to read from pipes, so if it read to the end
of the input to find the number of lines, it couldn't back up to do the split.
If you don't care about the order of the lines, you could use
"split --number=r/2", which deals the lines out round-robin.
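
A small sketch of that round-robin suggestion, assuming GNU split (the r/N form of --number is a GNU extension). The temporary directory and the "part." prefix are only illustrative names. Because lines are dealt out one at a time, this also works when the input is a pipe:

```shell
# Assumes GNU coreutils split; "workdir" and "part." are illustrative names.
workdir=$(mktemp -d)

# Feed five lines through a pipe and deal them out round-robin into 2 files.
printf '%s\n' one two three four five | split --number=r/2 - "$workdir/part."

# part.aa receives input lines 1, 3, 5; part.ab receives input lines 2, 4.
cat "$workdir/part.aa"
cat "$workdir/part.ab"
```

The line counts of the two files differ by at most one, but neighbouring input lines end up in different files, which is why this only helps when line order does not matter.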
________________________________
From: coreutils <coreutils-bounces+williambader=hotmail.com@gnu.org> on behalf
of John <john+gnu_org@daaave.org>
Sent: Monday, January 11, 2021 7:53 PM
To: GNU Coreutils <coreutils@gnu.org>
Subject: split: Allow splitting by line count (instead of byte size)
I would like to be able to split a file by line count, instead of by
(partial) file size. For context, I had a 50M file with one record per line,
to be processed by a script that's making one API call per line in the file. I
used split to break the file up into two files, and wound up with two 25M files
with vastly different line counts (one had about 6K and the other had about
11K).
Now, this wasn't split's "fault"; it operated exactly as designed. The cause
of the unexpected result was that the lines in different parts of the original
file were of vastly different lengths.
What I would like is to be able to split a file such that the resulting
chunks have an equal number of lines, regardless of how many bytes each line
contains. I checked the documentation and the coreutils/textutils list
archives, but this doesn't seem to a) be a current feature, or b) have been
brought up before. I also checked the rejected features just in case, but I
didn't see it there either (whew!).
I know I could just do the math myself (and that's what I did, for the next
iteration of the above job that I had to process) and pass the pre-divided line
count to split with the "-l" option, but...you could make that same argument
with byte counts and split handles that, so I figure it's worth asking. :)
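
For reference, the "do the math myself" workaround could look roughly like the sketch below. The function name split_evenly_by_lines and its arguments are hypothetical, not part of coreutils; it counts the lines first, so it needs a regular file rather than a pipe.

```shell
# Hypothetical helper, not a coreutils feature:
#   split_evenly_by_lines FILE CHUNKS [PREFIX]
split_evenly_by_lines() {
    sebl_file=$1
    sebl_chunks=$2
    sebl_prefix=${3:-x}

    [ "$sebl_chunks" -gt 0 ] || return 2

    # Count lines; awk also counts a final line that lacks a trailing newline.
    sebl_total=$(awk 'END { print NR }' "$sebl_file")
    [ "$sebl_total" -gt 0 ] || return 0   # nothing to split

    # Ceiling division, so no more than CHUNKS output files are produced.
    sebl_per_chunk=$(( (sebl_total + sebl_chunks - 1) / sebl_chunks ))

    split -l "$sebl_per_chunk" "$sebl_file" "$sebl_prefix"
}
```

For example, "split_evenly_by_lines records.txt 2 part." on a 7-line file would give a 4-line part.aa and a 3-line part.ab, whatever the line lengths are. Note that with ceiling division the last file absorbs the shortfall, and for some inputs fewer than CHUNKS files come out (5 lines into 4 chunks yields 3 files).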