bug-binutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

gas new (2.20) rebuffer_line() excessive read() syscalls


From: Jim Tison
Subject: gas new (2.20) rebuffer_line() excessive read() syscalls
Date: Tue, 19 Mar 2013 06:28:15 -0400
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2

Hi,

I can't get a web browser connection to sourceware.org, so I can't open a PR on this (I will as soon as I can unless you advise me not to) but I wanted to warn you guys this is coming.

I've got a C language source code TU that is 1.3 MiB large. I didn't write it (I know better), but I have to make it coexist in a rather lengthy system build, and the leader of the system build team is justifably upset with me. Using binutils-2.19, I see this in an strace -c summary:

address@hidden ~ 13:24:40$ strace -c as -o offldr.o2.o -alshd=offldr.o2.lst offldr.O2.s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.75    0.033194          25      1333           write
 19.96    0.012801          10      1337           read
 15.86    0.010175          23       438       134 open
  3.16    0.002026           7       300           munmap
  1.78    0.001144           9       129       127 stat
  1.74    0.001114           3       322           mmap
  1.66    0.001067           4       304           close
  1.41    0.000902           2       388           brk
  1.34    0.000858           3       303           fstat
  0.85    0.000546           2       312           lseek
  0.27    0.000170         170         1           execve
  0.17    0.000108          27         4 fadvise64
  0.05    0.000034           6         6           mprotect
  0.01    0.000005           5         1         1 access
  0.00    0.000001           1         1 getrusage
  0.00    0.000001           1         1           uname
------ ----------- ----------- --------- --------- ----------------
100.00    0.064146                  5180       262 total

All is still well with as-2.19. But now I run the 'upgrade' (as-2.21), and I see this: address@hidden ~ 16:46:36$ time as-2.21 -o offldr.o2.o -alshd=offldr.o2.lst offldr.o2.s

real    4m18.258s
user    4m12.802s
sys     0m3.251s

Note the total assembly time; see how the clear majority of time is spent in userspace.For now, I've fallen back to as-2.19 (the assembler only, I brought along a copy of libbfd and libopcodes that match) just to get past this behavior.

When I build 2.20[1] without trying to fix it, I see this for the same C-generated code at -O2:

address@hidden ~ 13:27:23$ strace -c as-2.20 -o offldr.o2.o -alshd=offldr.o2.lst offldr.o2.s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.96    3.508911           3   1020658           read
  3.58    0.142899          11     13246           munmap
  3.17    0.126437           9     13386       136 open
  2.04    0.081426          51      1599           write
  1.35    0.053725           4     13250           close
  0.92    0.036537           3     13272           mmap
  0.53    0.021277           1     26204           lseek
  0.34    0.013416           1     13250           fstat
  0.06    0.002233         447         5 fadvise64
  0.02    0.000840           2       387           brk
  0.02    0.000824           6       129       126 stat
  0.01    0.000589         589         1           execve
  0.01    0.000217         217         1           unlink
  0.00    0.000051           7         7           mprotect
  0.00    0.000008           8         1         1 access
  0.00    0.000002           1         2           fcntl
  0.00    0.000002           2         1           lstat
  0.00    0.000001           1         1 getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    3.989395               1115400       263 total

As we see, the compilation time grows by a factor of 15X. I see the rebuffer_line() (listing.c:541) routine doing an ftell(FILE *, last_pos, SEEK_SET) which sets the file pointer close to and before the line it seeks, but then it issues an ftell(FILE*) so we "remember" where we are (why?), then we reset all the way back to zero with fseek(FILE *, 0, SEEK_SET) and only now it starts reading forward up to the line number it seeks. In effect, the C source file is being read over and over again (unnecessarily?) all the way from offset 0 every time it executes that fseek/ftell/fseek code. This explains the dramatic increase in read syscalls, almost a million. The behavior is consistent from version 2.20 through 2.23 for this particular TU.

It looks like most of listing.c was rewritten for 2.20. I tried removing the fseek(FILE *, 0, SEEK_SET) call and made a couple of refactoring changes to make sure I'm seeking the proper source line. The syscall increase reduces dramatically; but I'm not sure if this change works for all assemblies or not. I have attached the patch anyway in hopes that it will be useful. I'm changing a brand new code routine and I'm not sure that I'm not breaking something -- this particular use case came out okay vis-a-vis the listing, at least that I could tell, but I have no guarantees that I've generated correct relocatable object. Application of the attached patch turns in the following strace numbers:

address@hidden ~ 14:24:52$ strace -c as-2.20-1 -o offldr.o2.o -alshd=offldr.o2.lst offldr.o2.s
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 36.98    0.190710           5     39720           read
 15.66    0.080763           6     13246           munmap
 13.93    0.071843           5     13450       201 open
  7.65    0.039442           3     13267           mmap
  7.51    0.038721          24      1600           write
  6.48    0.033411           2     19731           lseek
  6.41    0.033073           2     13249           close
  4.72    0.024317           2     13249           fstat
  0.23    0.001172           6       192       189 stat
  0.22    0.001147           3       388           brk
  0.14    0.000737         737         1           unlink
  0.04    0.000230         230         1           execve
  0.01    0.000046          12         4           fadvise64
  0.01    0.000033           6         6           mprotect
  0.00    0.000005           5         1         1 access
  0.00    0.000005           3         2           fcntl
  0.00    0.000002           2         1           getrusage
  0.00    0.000002           2         1           lstat
------ ----------- ----------- --------- --------- ----------------
100.00    0.515659                128109       391 total

There is still an increase in read syscalls, but its nowhere as radical as before; with an attendant reduction in userspace residence time. I've sent you my very simplistic patch only as a suggestion, I'm sure there's more work to be done to take care of corner cases, & c. I confess to not knowing what I'm doing inside this code and I need you guys to determine what the correct fix is. This particular patch gives me a total assembly time of:

real    0m3.244s
user    0m2.400s
sys     0m0.363s

I would really love to attach this test case I have, but it's IBM licensed (i.e. proprietary) code so I can't do that and still keep my head on the end of my neck. Might I suggest you find a monolithic C program someplace (there have got to be some you know of) to the order of 1 MiB+ and use that? This behavior should show itself with a monolithic program of any size.

I would have opened a PR on this, but I can't seem to get to sourceware.org today. I'll keep checking until I can get to bugzilla.

Thanks,
--Jim--
work: address@hidden  (203) 885-7335
home: address@hidden (860) 946-8088

[1] We actually upgraded to binutils-2.21 with SLES 11 SP2. But I suspected the change happened in 2.20 thanks to the ChangeLog showing a lot of activity around its last change date, and it turned out to be true. This is probably a good candidate for backporting, provided I understand the problem correctly (of which there are no guarantees).

Attachment: listing.c-2.20.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]