octave-bug-tracker



From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #54622] test importdata fails in dev octave with windows
Date: Fri, 7 Sep 2018 04:20:07 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0

Follow-up Comment #33, bug #54622 (project octave):

I did a little test here on Linux.  By adding a CR (\r) to the stream, the
problem partly reproduces here as well:


octave:30> A = [3.1 Inf NA; -Inf NaN 128];
octave:31> fn  = tempname ();
octave:32> fid = fopen (fn, "w");
octave:33> fputs (fid, "3.1\tInf\tNA\n-Inf\tNaN\t128");
octave:34> fclose (fid);
octave:35> dlmread (fn, '\t', 0, 0, "emptyvalue", NA)
ans =

     3.1000        Inf         NA
       -Inf        NaN   128.0000

octave:36> [a,d,h] = importdata (fn, '\t');
octave:37> unlink (fn);
octave:38> assert (a, A);
octave:39> assert (d, "\t");
octave:40> assert (h, 0);
octave:41> 
octave:41> A = [3.1 Inf NA; -Inf NaN 128];
octave:42> fn  = tempname ();
octave:43> fid = fopen (fn, "w");
octave:44> fputs (fid, "3.1\tInf\tNA\r\n-Inf\tNaN\t128");
octave:45> fclose (fid);
octave:46> dlmread (fn, '\t', 0, 0, "emptyvalue", NA)
ans =

     3.1000        Inf         NA
       -Inf        NaN   128.0000

octave:47> [a,d,h] = importdata (fn, '\t');
octave:48> unlink (fn);
octave:49> assert (a, A);
error: ASSERT errors for:  assert (a,A)

  Location  |  Observed  |  Expected  |  Reason
     .          O(1x1)       E(2x3)      Dimensions don't match
octave:49> assert (d, "\t");
octave:50> assert (h, 0);


But notice that dlmread() does *not* fail here, unlike in the Windows version;
it is importdata() that fails.  In other words, on Linux, dlmread() handles
either <CR><LF> or just <LF> line endings in an ASCII file.  The Matlab
documentation makes no mention of the DOS scenario, so let's assume that our
goal is for both Windows and Linux to handle either line ending.  I fixed this
here with the following change:


diff --git a/scripts/io/importdata.m b/scripts/io/importdata.m
--- a/scripts/io/importdata.m
+++ b/scripts/io/importdata.m
@@ -257,7 +257,7 @@ function [output, delimiter, header_rows
   endif
   if (any (na_idx(:)))
 
-    file_content = ostrsplit (fileread (fname), "\n");
+    file_content = ostrsplit (fileread (fname), "\r\n")
 
     na_rows = find (any (na_idx, 2));
     for ridx = na_rows(:)'


and re-running (the semicolon is left off the changed line so that
file_content is displayed):


octave:51> A = [3.1 Inf NA; -Inf NaN 128];
octave:52> fn  = tempname ();
octave:53> fid = fopen (fn, "w");
octave:54> fputs (fid, "3.1\tInf\tNA\n-Inf\tNaN\t128");
octave:55> fclose (fid);
octave:56> dlmread (fn, '\t', 0, 0, "emptyvalue", NA)
ans =

     3.1000        Inf         NA
       -Inf        NaN   128.0000

octave:57> [a,d,h] = importdata (fn, '\t');
file_content =
{
  [1,1] = 3.1   Inf     NA
  [1,2] = -Inf  NaN     128
}

octave:58> unlink (fn);
octave:59> assert (a, A);
octave:60> assert (d, "\t");
octave:61> assert (h, 0);
octave:62> 
octave:62> A = [3.1 Inf NA; -Inf NaN 128];
octave:63> fn  = tempname ();
octave:64> fid = fopen (fn, "w");
octave:65> fputs (fid, "3.1\tInf\tNA\r\n-Inf\tNaN\t128");
octave:66> fclose (fid);
octave:67> dlmread (fn, '\t', 0, 0, "emptyvalue", NA)
ans =

     3.1000        Inf         NA
       -Inf        NaN   128.0000

octave:68> [a,d,h] = importdata (fn, '\t');
file_content =
{
  [1,1] = 3.1   Inf     NA
  [1,2] = 
  [1,3] = -Inf  NaN     128
}

octave:69> unlink (fn);
octave:70> assert (a, A);
octave:71> assert (d, "\t");
octave:72> assert (h, 0);


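Notice how, with the <CR><LF> DOS format, ostrsplit() returns an extra empty
element: each character in "\r\n" is treated as a separator, so an empty
string falls between the \r and the \n.  That empty line gets tossed by the
rest of the routine.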
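
To illustrate why the empty element shows up, here is a minimal C++ sketch of
splitting on *any* character of a separator set.  The helper name
split_on_any() is hypothetical and this is not Octave's implementation of
ostrsplit(); it only assumes the same "every separator character splits,
empty fields are kept" behavior seen in the output above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not Octave code: split TEXT at every character that
// appears in SEPS, keeping empty fields (as the session output suggests
// ostrsplit does by default).
std::vector<std::string>
split_on_any (const std::string& text, const std::string& seps)
{
  std::vector<std::string> fields;
  std::string::size_type start = 0;

  while (true)
    {
      // Position of the next separator character at or after START.
      std::string::size_type pos = text.find_first_of (seps, start);

      if (pos == std::string::npos)
        {
          // No more separators: the remainder is the last field.
          fields.push_back (text.substr (start));
          break;
        }

      // Field between the previous separator and this one (may be empty).
      fields.push_back (text.substr (start, pos - start));
      start = pos + 1;
    }

  return fields;
}
```

Calling split_on_any ("3.1\tInf\tNA\r\n-Inf\tNaN\t128", "\r\n") gives three
fields with an empty one in the middle, while the same text with a bare "\n"
gives two fields.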

That's my way of thinking about this: we need to handle both Linux and DOS
ASCII line endings.  So, John, if you apply the diff hunk above, I have a
feeling it might work on Windows.

As a secondary note, I notice that on Linux, unlike Windows, dlmread() is
properly handling INFINITY *and* NAN, as opposed to just INFINITY on Windows.
That is fine because the "touch-up" code that follows addresses these
differences.  But we still need to answer why dlmread() on Windows reads
slightly differently in the <CR><LF> and <LF> scenarios.  Let's hold off on
that until there is some feedback from John, but jumping ahead a bit, I wonder
if instead of the following from dlmread.cc:


  // Read the data one field at a time, growing the data matrix as needed.
  while (getline (*input, line))
    {


which has a Linux definition and perhaps a slightly different Windows
definition with regard to <CR><LF>, we could switch to the C++


istream& istream::getline (char* s, streamsize n, char delim);


which might be more consistent between the two platforms, i.e., like the
conventional Linux behavior in which \r is lumped in with the resulting data
string and need only be tossed out if present.
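
The "toss out \r if present" idea could look roughly like the sketch below.
This is not the dlmread.cc code: the function names are hypothetical, and for
brevity it uses the std::string overload of std::getline with an explicit '\n'
delimiter rather than the char* buffer form shown above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read INPUT line by line, splitting on '\n' only and
// dropping a single trailing '\r' so <CR><LF> and <LF> files give the same
// lines.
std::vector<std::string>
read_lines_any_eol (std::istream& input)
{
  std::vector<std::string> lines;
  std::string line;

  while (std::getline (input, line, '\n'))
    {
      // With <CR><LF> input the '\r' is left at the end of the line.
      if (! line.empty () && line.back () == '\r')
        line.pop_back ();

      lines.push_back (line);
    }

  return lines;
}

// Convenience wrapper for trying the helper on an in-memory string.
std::vector<std::string>
lines_from_text (const std::string& text)
{
  std::istringstream input (text);
  return read_lines_any_eol (input);
}
```

With this, lines_from_text ("3.1\tInf\tNA\r\n-Inf\tNaN\t128") and the same
text with a bare "\n" both return the same two lines.  Whether the stream is
opened in text or binary mode on Windows would still matter for what getline
sees, which is part of the open question above.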

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54622>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
