[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
gas new (2.20) rebuffer_line() excessive read() syscalls
From: |
Jim Tison |
Subject: |
gas new (2.20) rebuffer_line() excessive read() syscalls |
Date: |
Tue, 19 Mar 2013 06:28:15 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2 |
Hi,
I can't get a web browser connection to sourceware.org, so I can't open
a PR on this (I will as soon as I can unless you advise me not to) but I
wanted to warn you guys this is coming.
I've got a C language source code TU that is 1.3 MiB large. I didn't
write it (I know better), but I have to make it coexist in a rather
lengthy system build, and the leader of the system build team is
justifably upset with me. Using binutils-2.19, I see this in an strace
-c summary:
address@hidden ~ 13:24:40$ strace -c as -o offldr.o2.o
-alshd=offldr.o2.lst offldr.O2.s
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
51.75 0.033194 25 1333 write
19.96 0.012801 10 1337 read
15.86 0.010175 23 438 134 open
3.16 0.002026 7 300 munmap
1.78 0.001144 9 129 127 stat
1.74 0.001114 3 322 mmap
1.66 0.001067 4 304 close
1.41 0.000902 2 388 brk
1.34 0.000858 3 303 fstat
0.85 0.000546 2 312 lseek
0.27 0.000170 170 1 execve
0.17 0.000108 27 4 fadvise64
0.05 0.000034 6 6 mprotect
0.01 0.000005 5 1 1 access
0.00 0.000001 1 1 getrusage
0.00 0.000001 1 1 uname
------ ----------- ----------- --------- --------- ----------------
100.00 0.064146 5180 262 total
All is still well with as-2.19. But now I run the 'upgrade' (as-2.21),
and I see this:
address@hidden ~ 16:46:36$ time as-2.21 -o offldr.o2.o
-alshd=offldr.o2.lst offldr.o2.s
real 4m18.258s
user 4m12.802s
sys 0m3.251s
Note the total assembly time; see how the clear majority of time is
spent in userspace.For now, I've fallen back to as-2.19 (the assembler
only, I brought along a copy of libbfd and libopcodes that match) just
to get past this behavior.
When I build 2.20[1] without trying to fix it, I see this for the same
C-generated code at -O2:
address@hidden ~ 13:27:23$ strace -c as-2.20 -o offldr.o2.o
-alshd=offldr.o2.lst offldr.o2.s
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
87.96 3.508911 3 1020658 read
3.58 0.142899 11 13246 munmap
3.17 0.126437 9 13386 136 open
2.04 0.081426 51 1599 write
1.35 0.053725 4 13250 close
0.92 0.036537 3 13272 mmap
0.53 0.021277 1 26204 lseek
0.34 0.013416 1 13250 fstat
0.06 0.002233 447 5 fadvise64
0.02 0.000840 2 387 brk
0.02 0.000824 6 129 126 stat
0.01 0.000589 589 1 execve
0.01 0.000217 217 1 unlink
0.00 0.000051 7 7 mprotect
0.00 0.000008 8 1 1 access
0.00 0.000002 1 2 fcntl
0.00 0.000002 2 1 lstat
0.00 0.000001 1 1 getrusage
------ ----------- ----------- --------- --------- ----------------
100.00 3.989395 1115400 263 total
As we see, the compilation time grows by a factor of 15X. I see the
rebuffer_line() (listing.c:541) routine doing an ftell(FILE *, last_pos,
SEEK_SET) which sets the file pointer close to and before the line it
seeks, but then it issues an ftell(FILE*) so we "remember" where we are
(why?), then we reset all the way back to zero with fseek(FILE *, 0,
SEEK_SET) and only now it starts reading forward up to the line number
it seeks. In effect, the C source file is being read over and over again
(unnecessarily?) all the way from offset 0 every time it executes that
fseek/ftell/fseek code. This explains the dramatic increase in read
syscalls, almost a million. The behavior is consistent from version 2.20
through 2.23 for this particular TU.
It looks like most of listing.c was rewritten for 2.20. I tried removing
the fseek(FILE *, 0, SEEK_SET) call and made a couple of refactoring
changes to make sure I'm seeking the proper source line. The syscall
increase reduces dramatically; but I'm not sure if this change works for
all assemblies or not. I have attached the patch anyway in hopes that it
will be useful. I'm changing a brand new code routine and I'm not sure
that I'm not breaking something -- this particular use case came out
okay vis-a-vis the listing, at least that I could tell, but I have no
guarantees that I've generated correct relocatable object. Application
of the attached patch turns in the following strace numbers:
address@hidden ~ 14:24:52$ strace -c as-2.20-1 -o offldr.o2.o
-alshd=offldr.o2.lst offldr.o2.s
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
36.98 0.190710 5 39720 read
15.66 0.080763 6 13246 munmap
13.93 0.071843 5 13450 201 open
7.65 0.039442 3 13267 mmap
7.51 0.038721 24 1600 write
6.48 0.033411 2 19731 lseek
6.41 0.033073 2 13249 close
4.72 0.024317 2 13249 fstat
0.23 0.001172 6 192 189 stat
0.22 0.001147 3 388 brk
0.14 0.000737 737 1 unlink
0.04 0.000230 230 1 execve
0.01 0.000046 12 4 fadvise64
0.01 0.000033 6 6 mprotect
0.00 0.000005 5 1 1 access
0.00 0.000005 3 2 fcntl
0.00 0.000002 2 1 getrusage
0.00 0.000002 2 1 lstat
------ ----------- ----------- --------- --------- ----------------
100.00 0.515659 128109 391 total
There is still an increase in read syscalls, but its nowhere as radical
as before; with an attendant reduction in userspace residence time. I've
sent you my very simplistic patch only as a suggestion, I'm sure there's
more work to be done to take care of corner cases, & c. I confess to not
knowing what I'm doing inside this code and I need you guys to determine
what the correct fix is. This particular patch gives me a total assembly
time of:
real 0m3.244s
user 0m2.400s
sys 0m0.363s
I would really love to attach this test case I have, but it's IBM
licensed (i.e. proprietary) code so I can't do that and still keep my
head on the end of my neck. Might I suggest you find a monolithic C
program someplace (there have got to be some you know of) to the order
of 1 MiB+ and use that? This behavior should show itself with a
monolithic program of any size.
I would have opened a PR on this, but I can't seem to get to
sourceware.org today. I'll keep checking until I can get to bugzilla.
Thanks,
--Jim--
work: address@hidden (203) 885-7335
home: address@hidden (860) 946-8088
[1] We actually upgraded to binutils-2.21 with SLES 11 SP2. But I
suspected the change happened in 2.20 thanks to the ChangeLog showing a
lot of activity around its last change date, and it turned out to be
true. This is probably a good candidate for backporting, provided I
understand the problem correctly (of which there are no guarantees).
listing.c-2.20.patch
Description: Text document
- gas new (2.20) rebuffer_line() excessive read() syscalls,
Jim Tison <=