guile-user



From: Arne Babenhauserheide
Subject: Re: Playing with guile (vs python). Generate file for GDP suitable for gnuplot.
Date: Tue, 31 Jan 2017 10:41:06 +0100

Hi Germán,


If I understand your script correctly, you want to grab all lines with
GDP, sort the values by year and country and output them. Is that right?

As a first warning: the csv module in Python mainly calls into a C-based
implementation (_csv, see csv.__file__), so it will be hard to beat this
in pure Scheme.


But now, let’s begin with the optimization. These are my times:

$ time guile-2.0 extract_gdp.scm
real    0m0.509s
$ time python3 extract_gdp.py
real    0m0.089s

The first step is using Guile 2.1.6 instead of 2.0. That reduces the
runtime by about 40%, to 0.3s. Source: ftp://alpha.gnu.org/gnu/guile/guile-2.1.6.tar.xz

$ time guile extract_gdp.scm
real    0m0.296s
$ time python3 extract_gdp.py
real    0m0.089s

So there’s a factor of 3.3 between Python and Guile on my machine.


Aside from using a more recent Guile, however, I do not see any obvious
optimizations (more precisely: all my attempts to speed up the code
only made it slower). There might still be optimizations I am missing,
because 80% of the remaining time is spent in string parsing.
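
A minimal sketch of how such a profile can be collected with Guile's
statprof module. The parse-line and workload procedures are hypothetical
stand-ins for illustration, not the code from extract_gdp.scm:

```scheme
(use-modules (statprof))

;; Hypothetical stand-in for the per-line work of the real script:
;; split a CSV-like line and convert every field to a number.
(define (parse-line line)
  (map string->number (string-split line #\,)))

;; Repeat the work often enough for the sampling profiler to see it.
(define (workload)
  (let loop ((i 0))
    (when (< i 200000)
      (parse-line "1970,1971,1972,1973")
      (loop (+ i 1)))))

;; Runs the thunk under the profiler and prints a flat profile
;; in the same layout as the table quoted below.
(statprof workload)
```

Replacing workload with the real entry point of the script should show
where the time goes on your machine.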


One thing I don’t see how to make cheaper in pure Scheme is
string->number. It calls directly into libguile/numbers.c, which does
much more than Python's int() does (internally it calls
mem2complex). But using a pure-Scheme function which does less only
makes it slower:

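    ;; Parse a string of decimal digits; no sign or error handling.
    (define (string->integer s)
      ;; Fold step: shift the accumulator one decimal place and add
      ;; the digit value (48 is the character code of #\0).
      (define (b10fold x kept)
        (+ (* 10 kept)
           (- (char->integer x) 48)))
      (string-fold b10fold 0 s))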

As I said: the above makes the code run slower, not faster. A native C
function for string->integer (which only handles integers) could provide
a speedup for that, but I don’t know whether you want to go that far. See
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=libguile/numbers.c;hb=475772ea57c97d0fa0f9ed9303db137d9798ddd3#l6439
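
To check this on your own machine, here is a rough timing sketch. The
time-it helper and the sample data are made up for illustration, and I
am not claiming any particular numbers for it:

```scheme
;; Pure-Scheme digit parser, same logic as string->integer above.
(define (string->integer s)
  (string-fold (lambda (c acc)
                 (+ (* 10 acc) (- (char->integer c) 48)))
               0 s))

;; Hypothetical helper: wall-clock seconds spent running THUNK.
(define (time-it thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (/ (- (get-internal-real-time) start)
       (exact->inexact internal-time-units-per-second))))

;; Made-up sample data: the decimal strings "0" to "99999".
(define samples (map number->string (iota 100000)))

(format #t "string->number:  ~as~%"
        (time-it (lambda () (for-each string->number samples))))
(format #t "string->integer: ~as~%"
        (time-it (lambda () (for-each string->integer samples))))
```

Both procedures return the same values for plain digit strings, so only
the timings differ.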

However, every time I thought I had a program optimized as far as
possible, talking with Andy Wingo made it much faster, so there might be
lots I’m missing.

Given that just converting a bytevector read from the file to integers
takes 0.8s, I do not think just using bytevectors will help:

    (bytevector->u8-list bv) ; takes 0.8s for your file

Maybe there are more efficient ways to do this, though.
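
One direction that might be worth trying is to read the digits straight
out of the bytevector instead of building a list first. This is only a
sketch: bytevector->integer is a hypothetical helper, it assumes ASCII
digits with no sign, and I have not measured whether it is actually
faster:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: parse an unsigned decimal integer from the
;; bytes of BV in the half-open range [START, END), without
;; allocating an intermediate list or string.
(define (bytevector->integer bv start end)
  (let loop ((i start) (acc 0))
    (if (>= i end)
        acc
        (loop (+ i 1)
              (+ (* 10 acc)
                 (- (bytevector-u8-ref bv i) 48))))))

;; Example input standing in for one line of the file.
(define bv (string->utf8 "1970,2017"))

(display (bytevector->integer bv 0 4)) ; prints 1970
(newline)
(display (bytevector->integer bv 5 9)) ; prints 2017
(newline)
```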

Best wishes,
Arne



Germán Diago writes:

> Hello everyone,
>
> I did a script that parses some file with the GDP since 1970 for many
> countries.  I filter the file and discard uninteresting fields, later I
> write in a format suitable for gnuplot.
>
> I did this in python and guile.
>
> In python it takes around 1.1 seconds in my raspberry pi.
>
> In Guile it is taking around 11 seconds.
>
> I do not claim they are doing exactly the same: in python I use arrays and
> dictionaries, in guile I am using mainly lists, I would like to know if you
> could give me advice on how to optimize it. I am just training for now.
>
> The scripts in both python and guile are attached and the profile data for
> scheme is below. Just place in the same directory the .csv file and it
> should generate an output file with the data ready for gnuplot :)
>
> %     cumulative   self
> time   seconds     seconds      name
>  26.24      3.45      3.43  %read-line
>  20.51      2.68      2.68  string->number
>  15.54      2.05      2.03  string-delete
>   7.39      7.75      0.97  map
>   5.13      3.96      0.67  transform-data
>   4.07      1.75      0.53  format:format-work
>   3.17      0.41      0.41  string=?
>   2.87      0.37      0.37  string-ref
>   1.81      2.50      0.24  tilde-dispatch
>   1.81      0.24      0.24  number->string
>   1.51      0.34      0.20  is-a-digit
>   1.06      0.28      0.14  anychar-dispatch
>   1.06      0.14      0.14  display
>   1.06      0.14      0.14  string-length
>   1.06      0.14      0.14  char>=?
>   1.06      0.14      0.14  char<=?
>   1.06      0.14      0.14  string-split
>   0.60      0.08      0.08  length
>   0.45      0.49      0.06  format:out-num-padded
>   0.45      0.06      0.06  remove-dots
>   0.30      0.04      0.04  %after-gc-thunk
>   0.30      0.04      0.04  list-tail
>   0.30      0.04      0.04  write-char
>   0.15      3.53      0.02  loop
>   0.15      3.47      0.02  read-line
>   0.15      0.02      0.02  substring
>   0.15      0.02      0.02  list-ref
>   0.15      0.02      0.02  reverse!
>   0.15      0.02      0.02  #<procedure 2360350 at extract_gdp.scm:58:10 (e)>
>   0.15      0.02      0.02  integer?
>   0.15      0.02      0.02  char=?
>   0.00     13.07      0.00  load-compiled/vm
>   0.00     13.07      0.00  #<procedure 18c6180 at ice-9/top-repl.scm:31:6 (thunk)>
>   0.00     13.07      0.00  #<procedure 1a92e00 at ice-9/boot-9.scm:4045:3 ()>
>   0.00     13.07      0.00  call-with-prompt
>   0.00     13.07      0.00  #<procedure 18c6100 at ice-9/top-repl.scm:66:5 ()>
>   0.00     13.07      0.00  apply-smob/1
>   0.00     13.07      0.00  catch
>   0.00     13.07      0.00  #<procedure 1a919c0 at statprof.scm:655:4 ()>
>   0.00     13.07      0.00  run-repl*
>   0.00     13.07      0.00  save-module-excursion
>   0.00     13.07      0.00  statprof
>   0.00     13.07      0.00  start-repl*
>   0.00     11.22      0.00  #<procedure 1a8a170 ()>
>   0.00      3.53      0.00  call-with-input-file
>   0.00      1.85      0.00  call-with-output-file
>   0.00      1.79      0.00  for-each
>   0.00      1.75      0.00  format
>   0.00      0.14      0.00  get-fields
>   0.00      0.10      0.00  #<procedure 2d398a0 at extract_gdp.scm:48:18 (year)>
>   0.00      0.06      0.00  #<procedure 2d021c8 at extract_gdp.scm:46:6 (p)>
>   0.00      0.02      0.00  format:out-obj-padded
>   0.00      0.02      0.00  remove
>   0.00      0.02      0.00  call-with-output-string


