
Re: [bug-gawk] Problem with printing 5000 lines to a coprocess


From: Hermann Peifer
Subject: Re: [bug-gawk] Problem with printing 5000 lines to a coprocess
Date: Mon, 22 Dec 2014 12:24:29 -0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0


Thanks again for the explanations.

On 2014-12-21 12:50, Andrew J. Schorr wrote:

> Hmmm.  The "pty" trick is just a way to solve the flushing problem.

Which is exactly the problem I wanted to solve: the flushing issue made my initial code (send one line, then read one result) either hang, or run terribly slowly after I forced the coprocess to flush its output via close().

Another "trick" would be to use a 2-way pipe and request line-buffering like this, as I learned from [0]:
command = "stdbuf -o L " subprogram

I am adding a brief description of my "odyssey" below. Maybe it is of use to someone else who runs into a similar issue.

Hermann

[0]
Unix buffering delays output to stdout, ruins your day
http://www.turnkeylinux.org/blog/unix-buffering


The idea: send one line of data via a two-way pipe to a separate program for processing, then read back the resulting line. In my case, the program is the geod utility from the PROJ.4 library, which expects "lat1 lon1 lat2 lon2" as input and returns the azimuths and the distance between the two points. My initial code was:

[1]

command = "geod -I +ellps=WGS84"

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)

The above code hangs after printing the first line, because the coprocess does not flush its output. After some trial and error, I changed the code to:

[2]

for (...) {
  print one_data_line |& command
  close(command, "to")

  command |& getline one_result_line
  ...
  close(command, "from")
}

The above works, but is terribly slow, for obvious reasons: once both ends are closed, gawk has to start a fresh coprocess for every single line. So I changed the code to "send all data first, then read all results":

[3]

for (...) {
  print one_data_line |& command
}
close(command, "to")

while ((command |& getline one_result_line) > 0) {
  ...
}
close(command, "from")

The above code seemed to work fine when sending, say, 1000 lines. It did, however, hang after sending some 4000+ lines, due to the "output buffer is full" problem: the unread results fill up the pipe, the coprocess blocks writing and stops reading, and gawk's print then blocks as well, a classic deadlock. So I changed to the tempfile option:

[4]

tempfile = ("mydata." PROCINFO["pid"])
command = "geod -I +ellps=WGS84 > " tempfile

# Write the data for processing
while (not done with data)
  print data | command
close(command)

# Read the results, remove tempfile when done
while ((getline one_result_line < tempfile) > 0)
  ...
close(tempfile)
system("rm " tempfile)

The above worked fine and fast as far as I can tell, but the manual says this approach is not elegant and that I should use two-way communication with a coprocess instead. So I went back to where I started and fixed the "output buffer is not flushed" problem like this:

[5a]

command = "geod -I +ellps=WGS84"
PROCINFO[command, "pty"] = 1

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)


[5b]

command = "stdbuf -o L geod -I +ellps=WGS84"

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)

As mentioned in the manual, option 5a is somewhat slower than 5b: around 20% in my code.


