
Re: [bug-gawk] Problem with printing 5000 lines to a coprocess


From: Hermann Peifer
Subject: Re: [bug-gawk] Problem with printing 5000 lines to a coprocess
Date: Mon, 22 Dec 2014 12:24:29 -0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0


Thanks again for the explanations.

On 2014-12-21 12:50, Andrew J. Schorr wrote:

> Hmmm.  The "pty" trick is just a way to solve the flushing problem.

Which is exactly the problem I wanted to solve: the flushing issue made my initial code (send one line, then read one result) either hang, or run terribly slowly after I forced the coprocess to flush its output via close().

Another "trick" would be to use a 2-way pipe and request line-buffering like this, as I learned from [0]:
command = "stdbuf -o L " subprogram

I am adding a brief description of my "odyssey" below. Maybe it is of use to someone else who runs into a similar issue.

Hermann

[0]
Unix buffering delays output to stdout, ruins your day
http://www.turnkeylinux.org/blog/unix-buffering


The idea: send one line of data via a two-way pipe to a separate program for processing, then read back the resulting line. In my case, the program is the geod utility from the PROJ.4 library, which expects "lat1 lon1 lat2 lon2" as input and returns the azimuths and the distance between the two points. My initial code was:

[1]

command = "geod -I +ellps=WGS84"

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)

The above code hangs after printing the first line, because the coprocess does not flush its output. After some trial and error, I changed the code to:

[2]

for (...) {
  print one_data_line |& command
  close(command, "to")

  command |& getline one_result_line
  ...
  close(command, "from")
}

The above works, but is terribly slow, for obvious reasons: once both ends are closed, gawk has to start a fresh coprocess for every single line. So I changed the code to "send all data first, then read all results":

[3]

for (...) {
  print one_data_line |& command
}
close(command, "to")

while ((command |& getline one_result_line) > 0) {
  ...
}
close(command, "from")

The above code seemed to work fine when sending, say, 1000 lines. It did, however, hang after sending some 4000+ lines, due to the "output buffer is full" problem: the unread results fill up the pipe, the coprocess blocks writing and stops reading, and gawk's print then blocks as well, a classic deadlock. So I changed to the tempfile option:

[4]

tempfile = ("mydata." PROCINFO["pid"])
command = "geod -I +ellps=WGS84 > " tempfile

# Write the data for processing
while (not done with data)
  print data | command
close(command)

# Read the results, remove tempfile when done
while ((getline one_result_line < tempfile) > 0)
  ...
close(tempfile)
system("rm " tempfile)

The above worked fine and fast as far as I can tell, but the manual says this approach is not elegant and that I should use two-way communication with a coprocess instead. So I went back to where I started and fixed the "output buffer is not flushed" problem like this:

[5a]

command = "geod -I +ellps=WGS84"
PROCINFO[command, "pty"] = 1

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)


[5b]

command = "stdbuf -o L geod -I +ellps=WGS84"

for (...) {
  print one_data_line |& command
  command |& getline one_result_line
  ...
}
close(command)

As mentioned in the manual, option 5a is somewhat slower than 5b: around 20% in my code.


