poke-devel

[DISCUSSION] FILE* slowness


From: apache2
Subject: [DISCUSSION] FILE* slowness
Date: Mon, 11 Jul 2022 17:16:32 +0200
User-agent: Mutt/1.9.3 (2018-01-21)

I looked into the FILE* operations a bit more, and realized that my fseeko()
patch is unsound in the face of interleaved fread() and fwrite(), since the
FILE* functions require you to fseek() when alternating between fread() and
fwrite().

In practice this is fine for us, but only because we don't use fputc()/fgetc().
In theory we are violating the contract by not doing this.
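
To illustrate the contract in question, here is a minimal sketch in standard C
of a read followed by a write on the same stream. The helper name
read_then_write() is hypothetical and is not taken from the libpoke sources;
it only shows where the positioning calls have to go.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: read n bytes at the current
   position, then overwrite the m bytes that follow them. */
int read_then_write(FILE *fp, void *in, size_t n, const void *out, size_t m)
{
    assert(fp != NULL);

    if (fread(in, 1, n, fp) != n)
        return -1;

    /* Input followed by output: a positioning call is required here
       (unless the read hit end-of-file).  Seeking by 0 keeps the offset. */
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return -1;

    if (fwrite(out, 1, m, fp) != m)
        return -1;

    /* Output followed by input: fflush() or a positioning call is
       required before the next fread(). */
    if (fflush(fp) != 0)
        return -1;

    return 0;
}
```

Dropping either of the two intermediate calls may appear to work on a given
libc, but the behavior is then no longer guaranteed by the standard.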

POSIX shenanigans aside, what's worse is that our I/O is still mortally slow.

Here are some cases that we run into, and the system calls we end up issuing:
1) We are at the offset we want to read from, and the FILE* buffer contains
   all the data we want: 1 syscall (lseek from ftello).
2) We are at the offset we want to read from, but the FILE* buffer does not
   contain all the data we want:
   lseek(ftello), and at least one read(fread) to refill the buffer.
3) We are not at the offset we want to read from:
   3.1: lseek(ftello) to learn where we are and that it isn't where we want
        to be
   3.2: lseek(fseeko) to get to the offset we want to be at
   3.3: at least one read(fread) to fill the buffer
   This used to be the default case before my ftello patch.
   What's surprising (and slow) is that we also run into this case when the
   FILE* buffer already contains all of the data that we want to read.
   Example:
     3.a) We have buffered offsets [0..1024] of a file.
     3.b) We want to read 9 bytes from offsets 1..9 inclusive.
     3.c) ftello() tells us we are at offset 0, which is not 1.
     3.d) fseeko() causes another syscall to get us to offset 1. The FILE*
          implementation throws away the buffer because we seeked.
     3.e) We fill the buffer again (reading BUFSIZ ~= 8192 bytes) from the
          file, using read(2).
     3.f) We return the 9 bytes to the user.
     3.g) If we then skip one more byte and ask for 9 bytes at offsets
          11..19, we do this whole dance again, rebuffering and all.
   In contrast, it seems like the fgetc/fputc functions are usually smart
   enough to skip ahead in the buffer, but the FILE* API does not (portably)
   expose the information we need to determine if a seek is needed. Both
   glibc and the FreeBSD libc expose an '_offset' field that tracks the last
   lseek(fseeko) call (triggered by the user or by the buffer refilling), but
   the magic required to determine if *we* need to call fseek() or not is
   not exposed in a portable fashion, and neither is the information
   required to know if a given range is contained in the internal buffer.
   
If I messed any of this up, please speak up, but if not, the picture looks
kind of bleak performance-wise, and I don't see a good way to solve this with
the FILE* API.
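
For reference, the read pattern described in the cases above can be sketched
roughly as follows. This is not the actual libpoke code; read_at() is a
hypothetical name, and the comments only mark which case each call belongs to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for the stdio-based read path discussed above.
   Returns 0 on success, -1 on error or short read. */
int read_at(FILE *fp, off_t offset, void *buf, size_t count)
{
    assert(fp != NULL && buf != NULL);

    /* Cases 1-3: ask the stream where it currently is. */
    off_t cur = ftello(fp);
    if (cur == (off_t) -1)
        return -1;

    /* Case 3: we are somewhere else, so reposition.  The stdio
       implementation is free to discard its buffer at this point,
       even if the bytes we want were already in it. */
    if (cur != offset && fseeko(fp, offset, SEEK_SET) != 0)
        return -1;

    /* Served from the buffer if possible, otherwise refilled via read(2). */
    return fread(buf, 1, count, fp) == count ? 0 : -1;
}
```

With this shape, two reads separated by a single skipped byte always take the
seeking branch, which is the situation the example in case 3 walks through.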

4) I'd like to propose an experimental alternative for the file handling in
   libpoke/ios-dev-file.c, namely ripping out the FILE* code and substituting
   a "stupid" implementation that just uses pread/pwrite to read the values
   we want.
   My hypothesis is that this would be more efficient, since we'd be doing an
   average of one syscall per read instead of the current situation where
   1 syscall is our ideal best case. We'd still be able to leverage the
   kernel cache of the file contents, kernel readahead etc., and we would be
   doing fewer BUFSIZ-sized copies to get a minuscule amount of data.
4.1) An immediate optimization to try out would be to *do* the BUFSIZ reads
     into a BUFSIZ buffer, to avoid actually calling pread 100 times to get
     4k worth of memory in chunks. For read/write IODs this would mean we
     would also have to duplicate writes into this buffer (or at least
     invalidate it), but I think it's worth it.
4.2) And finally we could experiment with slightly smarter caches (LRU, or
     MRU/ARC as suggested by SAL9000).
     To avoid having to maintain complicated data structures for this I
     suggest having a look at gnulib (which we already depend heavily on);
     they have a bunch of implementations for AVL trees and the like that
     could be useful.
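
A minimal sketch of what 4) and 4.1) could look like together, assuming a
single BUFSIZ-sized block filled with pread(2) and writes that go straight to
the file with pwrite(2) and invalidate the block when they overlap it. All
names here (struct blk_cache, cache_init, cache_read, cache_write, and the
refills counter) are hypothetical and only serve the illustration; EINTR and
alignment of the refill offset are deliberately ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>      /* BUFSIZ */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* pread, pwrite */

/* Hypothetical single-block cache; not libpoke's data structure. */
struct blk_cache {
    int fd;
    off_t start;              /* file offset of buf[0] */
    size_t len;               /* valid bytes in buf; 0 means empty */
    unsigned long refills;    /* number of pread() refills, for inspection */
    unsigned char buf[BUFSIZ];
};

void cache_init(struct blk_cache *c, int fd)
{
    assert(c != NULL);
    c->fd = fd;
    c->start = 0;
    c->len = 0;
    c->refills = 0;
}

/* Read count bytes at offset.  Returns 0 on success, -1 on error or EOF. */
int cache_read(struct blk_cache *c, void *dst, size_t count, off_t offset)
{
    unsigned char *out = dst;

    while (count > 0) {
        int hit = c->len > 0 && offset >= c->start
                  && offset < c->start + (off_t) c->len;
        if (!hit) {
            /* Miss: refill the block starting at the requested offset. */
            ssize_t n = pread(c->fd, c->buf, sizeof c->buf, offset);
            if (n <= 0) {
                c->len = 0;
                return -1;
            }
            c->start = offset;
            c->len = (size_t) n;
            c->refills++;
        }
        size_t skip = (size_t) (offset - c->start);
        size_t avail = c->len - skip;
        size_t chunk = count < avail ? count : avail;
        memcpy(out, c->buf + skip, chunk);
        out += chunk;
        offset += (off_t) chunk;
        count -= chunk;
    }
    return 0;
}

/* Write through to the file, then drop the block if the ranges overlap. */
int cache_write(struct blk_cache *c, const void *src, size_t count, off_t offset)
{
    const unsigned char *p = src;
    size_t left = count;
    off_t off = offset;

    while (left > 0) {
        ssize_t n = pwrite(c->fd, p, left, off);
        if (n <= 0) {
            c->len = 0;
            return -1;
        }
        p += n;
        off += n;
        left -= (size_t) n;
    }
    if (c->len > 0 && offset < c->start + (off_t) c->len
        && offset + (off_t) count > c->start)
        c->len = 0;
    return 0;
}
```

Invalidating on write is the simplest option; patching the written bytes into
the block instead would save a refill at the cost of a little more
bookkeeping. Because the block is refilled at the requested offset, backward
reads always miss, which is one of the things a smarter cache as in 4.2)
would address.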

5) Finally, of course we should have benchmarks for all of this. :-)
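
As a starting point for such benchmarks, here is a rough sketch of a
micro-benchmark that times the access pattern from the example in case 3
(9-byte reads, each skipping one byte) through stdio and through pread(2).
The function names bench_stdio() and bench_pread() are hypothetical, and the
numbers it produces will depend heavily on the libc, the kernel and the page
cache state, so it should be read as a harness shape rather than a result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Seconds on the monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Time iters reads of 9 bytes at offsets 1, 11, 21, ... through stdio,
   seeking whenever ftello() disagrees with the wanted offset.
   Returns elapsed seconds, or -1.0 on error. */
double bench_stdio(FILE *fp, int iters)
{
    char buf[9];
    assert(fp != NULL && iters > 0);

    double t0 = now_sec();
    for (int i = 0; i < iters; i++) {
        off_t want = (off_t) i * 10 + 1;
        if (ftello(fp) != want && fseeko(fp, want, SEEK_SET) != 0)
            return -1.0;
        if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
            return -1.0;
    }
    return now_sec() - t0;
}

/* Same access pattern, one pread(2) per read.
   Returns elapsed seconds, or -1.0 on error. */
double bench_pread(int fd, int iters)
{
    char buf[9];
    assert(iters > 0);

    double t0 = now_sec();
    for (int i = 0; i < iters; i++) {
        off_t want = (off_t) i * 10 + 1;
        if (pread(fd, buf, sizeof buf, want) != (ssize_t) sizeof buf)
            return -1.0;
    }
    return now_sec() - t0;
}
```

Both functions expect a file of at least 10 * iters bytes. Running them under
strace -c alongside the timings would show the syscall counts the cases above
predict.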


