coreutils
Re: [PATCH] wc: Add AVX2 optimization when counting only lines


From: Rasmus Borup Hansen
Subject: Re: [PATCH] wc: Add AVX2 optimization when counting only lines
Date: Mon, 29 Mar 2021 09:37:52 +0200

> On 28 Mar 2021, at 19.29, Kristoffer Brånemyr via GNU coreutils General 
> Discussion <coreutils@gnu.org> wrote:
> 
> Maybe this is a pointless optimization, I guess not many people run wc -l on 
> gigabytes of data, but maybe it could be useful for someone...

This happens in bioinformatics, e.g. if you want to count the number of 'reads' 
in a FASTQ file you count the number of lines and divide by 4. A FASTQ file is 
typically several gigabytes (sometimes even terabytes) of text with a block of 
4 lines for each 'read' (identifier, DNA string, a line with a '+', and a 
quality string). A 'read' is part of the output from a DNA sequencing machine.

I'm not the right person to review your patch, but the use case is there.
However, FASTQ files are traditionally compressed with gzip, so the
bottleneck may well be decompression rather than line counting.
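
To make the gzip caveat concrete, here is a hedged Python sketch of counting
reads in a compressed file. The function name `count_gzipped_fastq_reads` is
hypothetical, and the same four-lines-per-read assumption applies. Every byte
has to pass through the decompressor before its newlines can be counted, which
is where the time may end up being spent.

```python
import gzip


def count_gzipped_fastq_reads(path, chunk_size=1 << 20):
    """Estimate the number of reads in a gzip-compressed FASTQ file.

    Assumes exactly 4 newline-terminated lines per read. The data is
    decompressed on the fly, so no uncompressed copy is written to disk.
    """
    newlines = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # decompressed bytes
            if not chunk:
                break
            newlines += chunk.count(b"\n")
    return newlines // 4
```

Timing this against the uncompressed variant on the same data would show
whether a faster newline counter matters for a given workflow.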

Best,

Rasmus
