Re: How tail works on a large file?
From: Eric Blake
Subject: Re: How tail works on a large file?
Date: Sat, 22 Aug 2020 07:37:11 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0
On 8/22/20 5:45 AM, Peng Yu wrote:
> Hi,
> I tried to tail a large file (2.8GB) to get its last 10 lines. It runs very fast.
> How is this achieved? Does tail do it differently between a file
> (random disk access) and a pipe (sequential disk access)? Thanks.
Yes. Using 'strace' (on Linux) or a comparable program (on other
platforms) will show you the syscalls that tail performs. tail first
attempts lseek(fd,-bufsize,SEEK_END) so that it reads only the last
buffer of the file. That works on random-access files (tail only has to
search backward through one or more buffers until it finds the last few
lines), but it fails on pipes. Pipes aren't seekable, so tail has to
read the entire input and buffer data in memory, although the buffer
only has to be large enough to hold the number of lines it is looking
for.
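A minimal C sketch of the seekable-file strategy, assuming POSIX lseek() and pread(). The names tail_start_offset and copy_from and the 8192-byte CHUNK are hypothetical, and GNU tail's own code in tail.c differs in detail: instead of the lseek(fd,-bufsize,SEEK_END) call described above, this sketch finds the end with lseek(fd,0,SEEK_END) and then pread()s fixed-size chunks backward, counting newlines. It illustrates why the cost depends on the size of the last few lines, not on the 2.8GB file.

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 8192  /* illustrative read size (an assumption) */

/* Return the byte offset where the last n_lines lines of the seekable
   file fd begin, or -1 if fd cannot be seeked (on a pipe, lseek fails
   with ESPIPE) or a read fails. */
off_t tail_start_offset(int fd, size_t n_lines)
{
    char buf[CHUNK];
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    if (n_lines == 0)
        return end;

    off_t pos = end;
    size_t newlines = 0;
    int at_last_byte = 1;  /* a final '\n' ends the last line; don't count it */

    /* Walk backward from EOF one chunk at a time; only the chunks that
       hold the last n_lines lines are ever read. */
    while (pos > 0) {
        size_t len = pos >= CHUNK ? CHUNK : (size_t) pos;
        pos -= (off_t) len;
        if (pread(fd, buf, len, pos) != (ssize_t) len)
            return -1;
        for (size_t i = len; i-- > 0; ) {
            int skip = at_last_byte && buf[i] == '\n';
            at_last_byte = 0;
            if (skip || buf[i] != '\n')
                continue;
            if (++newlines == n_lines)
                return pos + (off_t) i + 1;  /* line starts after this '\n' */
        }
    }
    return 0;  /* n_lines or fewer lines: print the whole file */
}

/* Copy everything from offset off to the end of fd onto out. */
int copy_from(int fd, off_t off, FILE *out)
{
    char buf[CHUNK];
    ssize_t n;
    assert(off >= 0);  /* caller must check tail_start_offset() for -1 */
    while ((n = pread(fd, buf, sizeof buf, off)) > 0) {
        if (fwrite(buf, 1, (size_t) n, out) != (size_t) n)
            return -1;
        off += n;
    }
    return n < 0 ? -1 : 0;
}
```

A caller would compute off = tail_start_offset(fd, 10) and, if it is not -1, pass it to copy_from(fd, off, stdout); a -1 result on a pipe is the cue to fall back to reading the input sequentially.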
You could also read the source code for yourself. In particular, look
at all the spots where 'presume_input_pipe' is used to distinguish
between seekable files, where lseek works, and pipes, where it doesn't.
There is even an undocumented 'tail ---presume-input-pipe' option that
disables the lseek optimization, so you get the speed penalty of a
non-seekable input even when testing on a seekable file.
https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c#n216
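For the non-seekable case, a minimal C sketch of the sequential strategy: read front to back and retain only the most recent n_lines lines. tail_sequential is a hypothetical name, and the per-line ring buffer built on POSIX getline() is a simplification chosen for brevity; GNU tail's pipe code manages its buffers differently.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copy the last n_lines lines of in to out,
   reading strictly front to back as tail must for a pipe. At most
   n_lines lines are kept in memory. Returns 0 on success, -1 on error. */
int tail_sequential(FILE *in, FILE *out, size_t n_lines)
{
    struct slot { char *buf; size_t cap; ssize_t len; };
    struct slot *ring;
    char *line = NULL;
    size_t cap = 0, next = 0, count = 0;
    ssize_t len;
    int status = 0;

    assert(in != NULL && out != NULL);
    if (n_lines == 0)
        return 0;
    ring = calloc(n_lines, sizeof *ring);  /* all slots start empty */
    if (ring == NULL)
        return -1;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* Swap buffers: the oldest slot takes the new line, and its old
           storage becomes the scratch buffer for the next getline(). */
        struct slot *s = &ring[next];
        char *old_buf = s->buf;
        size_t old_cap = s->cap;
        s->buf = line;
        s->cap = cap;
        s->len = len;
        line = old_buf;
        cap = old_cap;
        next = (next + 1) % n_lines;
        if (count < n_lines)
            count++;
    }
    if (ferror(in))
        status = -1;

    /* The oldest retained line sits count slots behind next. */
    for (size_t i = 0; i < count; i++) {
        struct slot *s = &ring[(next + n_lines - count + i) % n_lines];
        if (fwrite(s->buf, 1, (size_t) s->len, out) != (size_t) s->len)
            status = -1;
    }

    for (size_t i = 0; i < n_lines; i++)
        free(ring[i].buf);
    free(ring);
    free(line);
    return status;
}
```

Because tail_sequential never seeks, it works the same on a pipe and on a regular file; running it on a seekable file is, in effect, what ---presume-input-pipe forces, and on a 2.8GB file it has to read every byte.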
On a higher level, it has appeared over the years that you have a
tendency to ask questions to make others do the research for you,
instead of diving into the code yourself. You would do well to remember
that such behavior tends to be viewed as anti-social.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org