octave-bug-tracker
[Octave-bug-tracker] [bug #54661] textscan() continues from next line if


From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #54661] textscan() continues from next line if line ends with delimiter
Date: Sun, 16 Sep 2018 00:19:08 -0400 (EDT)

Follow-up Comment #5, bug #54661 (project octave):

It seems to me that there is a flaw in the way the delimiter table is
constructed.  The newline characters (eol1 and eol2) are always included in
the list of delimiters:


    bool is_delim (unsigned char ch) const
    {
      return ((delim_table.empty ()
               && (isspace (ch) || ch == eol1 || ch == eol2))
              || delim_table[ch] != '\0');
    }


and


    // Create look-up table of delimiters, based on 'delimiter'
    delim_table = std::string (256, '\0');
    if (eol1 >= 0 && eol1 < 256)
      delim_table[eol1] = '1';        // EOL is always a delimiter
    if (eol2 >= 0 && eol2 < 256)
      delim_table[eol2] = '1';        // EOL is always a delimiter


Fine control is lost as a result, because sometimes the newline character is
supposed to behave differently.  In the following hunk of code, see how the
character c1 is checked for being a delimiter, i.e., is_delim (c1):


  // Skip delimiters -- multiple if MultipleDelimsAsOne specified.
  int
  textscan::skip_delim (delimited_stream& is)
  {
    int c1 = skip_whitespace (is, true);  // 'true': stop once EOL is read
    if (delim_list.numel () == 0)         // single character delimiter
      {
        if (is_delim (c1) || c1 == eol1 || c1 == eol2)
          {
            is.get ();
            if (c1 == eol1 && is.peek_undelim () == eol2)
              is.get_undelim ();          // if \r\n, skip the \n too.


Checking for c1 == eol1 or c1 == eol2 is superfluous, because is_delim() will
already test true given that eol1 and eol2 are currently in the list of
delimiters.  The thing is, there are circumstances where the EOL should not be
discarded.  I tried adding an EOLstop argument to skip_delim(), just as there
is with skip_whitespace(), i.e.,


        if (is_delim (c1) || ((c1 == eol1 || c1 == eol2) && ! EOLstop))
          {
            is.get ();


but that doesn't do anything in light of what I just pointed out.
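
To make the point concrete, here is a small stand-alone model of the logic
(not the Octave sources).  The struct delim_model, its member would_consume(),
and the defaults eol1 = '\r', eol2 = '\n' are assumptions made for
illustration only.  Because the table construction always marks the EOL
characters, the first operand of the condition is already true for an EOL and
the EOLstop term never gets a say:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the textscan delimiter state; not the Octave class.
struct delim_model
{
  std::string delim_table = std::string (256, '\0');
  int eol1 = '\r';   // assumed default
  int eol2 = '\n';   // assumed default

  // Mirrors the quoted table construction: EOL is always entered as a delimiter.
  explicit delim_model (const std::string& delims)
  {
    for (unsigned char ch : delims)
      delim_table[ch] = '1';
    delim_table[eol1] = '1';
    delim_table[eol2] = '1';
  }

  bool is_delim (unsigned char ch) const
  { return delim_table[ch] != '\0'; }

  // The attempted condition from skip_delim() with an EOLstop flag.
  bool would_consume (int c1, bool EOLstop) const
  {
    return (is_delim (static_cast<unsigned char> (c1))
            || ((c1 == eol1 || c1 == eol2) && ! EOLstop));
  }
};

// Self-check of the claim: EOLstop makes no difference for an EOL character.
inline void check_eolstop_is_noop ()
{
  delim_model m (",");
  assert (m.would_consume ('\n', true) == m.would_consume ('\n', false));
}
```

In this model the EOL is consumed whether EOLstop is true or false, which is
the behavior described above.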

Any agreement that the EOL characters shouldn't be included in the list of
delimiters?  I think those need to be tested separately, based on whether the
EOL should be skipped or not.  Maybe we could conditionally include the EOL
characters in the list of delimiters, but because of the two-character EOL
sequence \r\n I think it might be best to keep them separate.
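
A rough sketch of what keeping them separate could look like, again as a
stand-alone model and not a patch: scanner_model, is_eol(), and the use of
std::istream in place of delimited_stream are all assumptions for
illustration, as are the defaults eol1 = '\r' and eol2 = '\n'.  The table
holds only the user delimiters, and the EOL is tested on its own so that
EOLstop can take effect and the \r\n pair is handled in one place:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical model: EOL characters are kept OUT of the delimiter table.
struct scanner_model
{
  std::string delim_table = std::string (256, '\0');
  int eol1 = '\r';   // assumed default
  int eol2 = '\n';   // assumed default

  explicit scanner_model (const std::string& delims)
  {
    for (unsigned char ch : delims)
      delim_table[ch] = '1';
  }

  bool is_delim (int ch) const
  { return ch >= 0 && ch < 256 && delim_table[ch] != '\0'; }

  bool is_eol (int ch) const
  { return ch == eol1 || ch == eol2; }

  // Consume one delimiter.  An EOL is consumed only when EOLstop is false,
  // and a "\r\n" pair is consumed as a unit.  Returns true if something
  // was consumed.
  bool skip_delim (std::istream& is, bool EOLstop) const
  {
    int c1 = is.peek ();
    if (is_delim (c1))
      {
        is.get ();
        return true;
      }
    if (is_eol (c1) && ! EOLstop)
      {
        is.get ();
        if (c1 == eol1 && is.peek () == eol2)
          is.get ();                // if \r\n, skip the \n too.
        return true;
      }
    return false;
  }
};

// Usage: a line that ends with a delimiter keeps its EOL when EOLstop is set.
inline void demo_trailing_delim ()
{
  scanner_model s (",");
  std::istringstream in ("1,\n2");
  in.get ();                          // read the field "1"
  assert (s.skip_delim (in, true));   // the trailing ',' is consumed ...
  assert (in.peek () == '\n');        // ... but the EOL is left in the stream
}
```

With the EOL left in the stream, the caller can still see where the line ends
after a trailing delimiter, which is the case this bug is about.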

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?54661>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/