Re: .substring bug - indicies don't work as documented(?)

From: Ralph Corderoy
Subject: Re: .substring bug - indicies don't work as documented(?)
Date: Sun, 28 Oct 2001 08:51:32 +0000

Hi Werner,

> But this is not the behavior documented in the man page.  It talks
> about character indices, not the `point' between two characters (as
> used e.g. in Emacs).  So the table has to be
>          1  2  3  4  5  6  7
>          a  b  c  d  e  f  g
>         -6 -5 -4 -3 -2 -1  0

Yep, looks like my equivalent mail crossed in the post.

> I believe that this is natural for computing substrings.

Languages seem to do it all kinds of ways.

    % python
    >>> s = 'abcdefg'
    >>> s[1]
    'b'
    >>> s[1:1]
    ''
    >>> s[1:2]
    'b'
    >>> s[0:2]
    'ab'
    >>> s[-2]
    'f'
    >>> s[-2:]
    'fg'

From http://python.org/doc/current/lib/typesseq.html

    s[i]      i'th item of s, origin 0    (2)
    s[i:j]    slice of s from i to j      (2), (3)

    (2) If i or j is negative, the index is relative to the end of the
    string, i.e., len(s) + i or len(s) + j is substituted. But note
    that -0 is still 0. 

    (3) The slice of s from i to j is defined as the sequence of items
    with index k such that i <= k < j. If i or j is greater than
    len(s), use len(s). If i is omitted, use 0. If j is omitted, use
    len(s). If i is greater than or equal to j, the slice is empty. 
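
Notes (2) and (3) can be written out by hand to check one's reading of them. The following is only a sketch: the name slice_by_rules is made up for illustration, and clamping a still-negative index to 0 is Python's actual behaviour rather than something the quoted notes state.

```python
def slice_by_rules(s, i=None, j=None):
    """Re-implement s[i:j] by following notes (2) and (3) step by step."""
    n = len(s)
    # Note (3): an omitted i means 0, an omitted j means len(s).
    if i is None:
        i = 0
    if j is None:
        j = n
    # Note (2): a negative index is relative to the end of the string.
    # Clamping at 0 is not in the quoted notes, but it is what Python does.
    if i < 0:
        i = max(n + i, 0)
    if j < 0:
        j = max(n + j, 0)
    # Note (3): anything greater than len(s) is treated as len(s).
    i = min(i, n)
    j = min(j, n)
    # Note (3): items with index k such that i <= k < j; empty if i >= j.
    return "".join(s[k] for k in range(i, j))


s = 'abcdefg'
print(repr(slice_by_rules(s, 1, 1)))    # same as s[1:1]
print(repr(slice_by_rules(s, -2)))      # same as s[-2:]
```

Comparing the result against the built-in s[i:j] over a range of i and j is a quick way to confirm the rules have been read correctly.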

In fact, when I saw it, I thought .substring had been based on Icon's


    In Icon, the characters in a string are identified by their
    position, counting from 1.  The positions refer to the gaps between
    the characters, not to the characters themselves. Positive numbers
    count from the left starting at 1, and nonpositive numbers count
    from the right starting at 0. For example, the positions of the
    gaps in the string "abc" are numbered like this:

           -3    -2    -1     0
            |  a  |  b  |  c  |
            1     2     3     4

    So position 1 (or -3) refers to the position before the first
    character; position 2 (or -2) refers to the position before the
    second character; and so on. Position 0 is always the end of the
    string.
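
The gap numbering described above maps directly onto Python's 0-based offsets. A rough sketch, in Python rather than Icon: icon_offset and icon_substring are hypothetical names, and the sketch assumes the first position is not to the right of the second.

```python
def icon_offset(s, pos):
    """Convert an Icon-style gap position to a 0-based Python offset."""
    n = len(s)
    if pos > 0:
        offset = pos - 1      # positive positions count from the left, from 1
    else:
        offset = n + pos      # nonpositive positions count from the right, from 0
    if not 0 <= offset <= n:
        raise ValueError("position %d out of range for %r" % (pos, s))
    return offset


def icon_substring(s, i, j):
    """Characters lying between gap positions i and j (assumes i is left of j)."""
    return s[icon_offset(s, i):icon_offset(s, j)]


print(repr(icon_substring("abc", 1, 2)))    # between gaps 1 and 2
print(repr(icon_substring("abc", 1, 1)))    # nothing lies between a gap and itself
print(repr(icon_substring("abc", -1, 0)))   # last gap but one, up to the end
```

Because the positions name gaps, asking for the text from position 1 to position 1 gives the empty string, which is the same answer Python gives for s[0:0].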

> This is especially noteworthy.  The substring starting at index 1 and
> ending at index 1 is of course "a" and not the empty string.

Not in Python or Icon  ;-)  But at the end of the day, this area
differs so much between languages that people just have to be certain
of how the current one does it.
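
For comparison, the character-index reading in Werner's table (1 to 7 from the left, -6 to 0 from the right, both ends inclusive) could be sketched in Python like this. It only illustrates the table as quoted; the function name is made up and it is not a description of what groff's .substring actually does.

```python
def char_index_substring(s, start, end):
    """Substring by inclusive character indices, as in the quoted table."""
    n = len(s)

    def offset(idx):
        # 1, 2, 3, ... name characters from the left;
        # 0, -1, -2, ... name characters from the right (0 is the last one).
        return idx - 1 if idx > 0 else n - 1 + idx

    # Both ends are inclusive, so the Python slice must run one past the end.
    return s[offset(start):offset(end) + 1]


s = 'abcdefg'
print(repr(char_index_substring(s, 1, 1)))    # one character, not the empty string
print(repr(char_index_substring(s, -6, 0)))   # first to last character
```

Under this convention index 1 to index 1 is "a", whereas the gap-based conventions above give the empty string for the same pair of numbers.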

I did see an explanation as to why Icon had got it right and it seemed
a good argument at the time, but I can't find or recall it now.


