Re: [O] Bug? R: Org babel block execution *drastically* slower than in E

From: John Hendy
Subject: Re: [O] Bug? R: Org babel block execution *drastically* slower than in ESS session directly
Date: Thu, 1 Nov 2012 09:53:51 -0500

On Wed, Oct 31, 2012 at 5:53 PM, Nick Dokos <address@hidden> wrote:
John Hendy <address@hidden> wrote:

> On Wed, Oct 31, 2012 at 3:12 PM, <address@hidden> wrote:
>     John Hendy <address@hidden> writes:
>     > On Wed, Oct 31, 2012 at 11:41 AM, <address@hidden> wrote:
>     > John Hendy <address@hidden> writes:
>     >
>     >> I edited the subject to be more concise/clear. I let orgmode chug away
>     >> on reading in some ~10-30mb csv files for nearly 30min.
>     >
>     > [rest deleted]
>     >
>     > You need an ECM.
>     >
>     > I did my best to provide one, other than the file, which I offered to provide
>     if others requested that I upload it somewhere. Since you have done so, so have I:
>     > - https://docs.google.com/open?id=0BzQupOSnvw08WHdabHh5VVczRGM
>     > Let me know if that doesn't work. I put it on Google docs and sometimes have issues with
>     the sharing settings...
>     Not an ECM in my book, but ...
> What else would you like? I provided:
> - the config
> - the data
> - how to [attempt to] reproduce
> - the org-mode text

Smaller set of data I'd guess :-) But it does not seem to be the
size of the data that matters.

>     On my 4 year old MacBook:
>     ,----
>     |
>     | #+PROPERTY: session *R*
>     |
>     | #+name: bigcsv
>     | #+begin_src R
>     | bigcsv <- Sys.glob("~/Downloads/*.csv")
>     | #+end_src
>     |
>     | #+RESULTS: bigcsv
>     | : /Users/cberry/Downloads/test-file.csv
>     |
>     | #+name: readbig
>     | #+begin_src R :results output
>     |   system.time(
>     |     tmp <- read.csv(bigcsv)
>     |     )
>     |
>     | #+end_src
>     |
>     | #+RESULTS: readbig
>     | :    user  system elapsed
>     | :   5.679   0.306   6.002
>     |
>     `----
>     About the same as running from ESS.
> Not sure what to say. Looking for ways to troubleshoot or confirm. Since you can't confirm, any
> suggestions on where I should look for my issue? I can't explain it! All I know is that org chugs
> and chugs and the direct execution in ESS session is lightning fast.

A few things to try in no particular order:

This was extremely helpful. Thanks for the suggestions.

Here's my attempt at an ECM, though I'm going to keep using the big file since that's what's actually doing it and I've already uploaded it :)
- Using emacs config here: http://pastebin.com/raw.php?i=iTbRtCE9
- Using this org-mode file: 

#+begin_src org

* headline

#+begin_src R :session r :results silent
# file here: https://docs.google.com/uc?export=download&confirm=no_antivirus&id=0BzQupOSnvw08WHdabHh5VVczRGM
data <- read.csv("path/to/file.csv")
#+end_src

#+end_src
- Execute block with C-c C-c after downloading and changing path
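
For a baseline outside Emacs entirely, the same read can also be timed from a terminal. This is only a sketch: `time_read_csv` is a made-up helper name, it assumes `Rscript` is on the PATH, and the path is the same placeholder as in the block above.

```shell
# Hypothetical helper: time read.csv() on one file in a fresh R process,
# with no Emacs, ESS, or Org involved.
time_read_csv() {
    Rscript -e 'args <- commandArgs(trailingOnly = TRUE)' \
            -e 'print(system.time(tmp <- read.csv(args[1])))' \
            "$1"
}

# Usage (placeholder path, as in the Org block above):
#   time_read_csv path/to/file.csv
```

If this takes a few seconds while the Org block takes minutes, the extra time is being spent on the Emacs side rather than in R.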

o run top (or whatever equivalent is available on your OS) and see
  whether the CPU (or one of the CPUs) gets pegged at 100% utilization
  and stays there. If yes, that's an indication of an infinite loop

- quit any other instances of emacs/R
- start `top` in terminal
- execute block
- Use '<' '>' to sort back and forth between cpu and ram

- R is at 80-100% cpu for about 5sec
- Then emacs shifts to fairly constant ~100% cpu usage 
- After about a minute, the minibuffer expands to ~1/3 of the window height and fills with the csv data
- Finished after ~5min total time
- So, R took about 5sec, emacs took another 5min to finish
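
The hand-off from R to emacs seen in `top` can also be logged non-interactively. A minimal sketch, assuming a `ps` that accepts `-o` and `-p` (procps on Linux, or the BSD/macOS one); `watch_cpu_time` is a made-up helper name, and the two PIDs in the usage line are placeholders that have to be looked up first (e.g. in `top`). TIME is cumulative CPU time, so whichever process's TIME keeps growing between samples is the one doing the work.

```shell
# Hypothetical helper: print PID, cumulative CPU time and command name for
# each given PID, COUNT times, INTERVAL seconds apart.
watch_cpu_time() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%H:%M:%S'
        for pid in "$@"; do
            # A process that has already exited is simply skipped.
            ps -o pid= -o time= -o comm= -p "$pid" || true
        done
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage: sample emacs (placeholder PID 1234) and R (placeholder PID 5678)
# every 5 seconds, 60 times (roughly 5 minutes).
#   watch_cpu_time 5 60 1234 5678
```
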
o run vmstat (or equivalent) and see if any of the counters are out of whack.
  That requires some experience though.

I'll skip for now; no experience with that.

o use elp-instrument-package to instrument org and run the test, getting
  a profile. I'm not sure whether the results will be useful, since you
  are going to interrupt the test when you run out of patience, but it
  cannot hurt and it might tell you something useful.

o run your ECM on a different computer/OS/emacs installation. Being able
  to compare things side by side is often very useful.

o Halve your file and run the test on each half (but that's probably not
  the problem given Chuck's results).

o Reinstall org from scratch - you might have some corruption in one of
  the compiled files that's causing it to go into an infinite loop.

- `cd ~/.elisp`
- `sudo rm -r org.git`
- `git clone git://orgmode.org/org-mode.git org.git`
- `cd org.git && make clean && make && make doc`
- Quit previous emacs instance; reopen
- Remove (require 'org-install) per prompt; restart again
- Repeat `top` experiment

- Didn't even see R flash on the screen this time; emacs just jumped to 100%
- After 1min 10sec, the minibuffer filled with data
- At that point I quit, as I think it will be a repeat of the above
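
The file-halving test suggested above is easy to script. A rough sketch, assuming the CSV has a single header line and no quoted fields containing newlines; `halve_csv` and the `.part1.csv`/`.part2.csv` output names are invented for illustration.

```shell
# Hypothetical helper: split FILE.csv into FILE.part1.csv and FILE.part2.csv,
# repeating the header line at the top of each half.
halve_csv() {
    src=$1
    base=${src%.csv}
    total=$(($(wc -l < "$src") - 1))   # data rows, excluding the header
    first=$(( (total + 1) / 2 ))       # rows that go into the first half
    head -n 1 "$src" > "$base.part1.csv"
    head -n 1 "$src" > "$base.part2.csv"
    tail -n +2 "$src" | head -n "$first" >> "$base.part1.csv"
    tail -n +$((first + 2)) "$src" >> "$base.part2.csv"
}

# Usage (placeholder path):
#   halve_csv path/to/file.csv
```

Each half can then be substituted for the path in the `read.csv` block and the test repeated.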
o Turn on debug-on-quit, start your test, wait a bit and then interrupt
  it. Check the backtrace.  Do it again and check whether the backtrace
  looks the same. That's often an indication of an infinite loop
  (inferring an infinite loop from a two element sample is statistically
  suspect of course, but surprisingly effective nevertheless). The point
  here is that the infinite loop is in emacs and the backtrace tells you
  something about the parties involved.

- =M-x customize-variable RET debug-on-quit RET=
- Toggled to on; saved for current session
- Waited about a min (till the minibuffer filled), then did C-g

Don't have experience with debugging. It brings me to a *Backtrace* buffer, which is empty except for the line "Debugger entered--Lisp error: (quit)".

Thanks for the suggestions and help. That was quite above and beyond. Much appreciated.

Best regards,

These are obviously not independent and the results of one experiment will
have to guide you in what you try next.

Good luck,

