Re: .substring bug - indicies don't work as documented(?)
From: Ralph Corderoy
Subject: Re: .substring bug - indicies don't work as documented(?)
Date: Sun, 28 Oct 2001 08:51:32 +0000
Hi Werner,
> But this is not the behavior documented in the man page. It talks
> about character indices, not the `point' between two characters (as
> used e.g. in Emacs). So the table has to be
>
>    1  2  3  4  5  6  7
>    a  b  c  d  e  f  g
>   -6 -5 -4 -3 -2 -1  0
Yep, looks like my equivalent mail crossed in the post.
> I believe that this is natural for computing substrings.
Languages seem to do it all kinds of ways.
% python
>>> s = 'abcdefg'
>>> s[1]
'b'
>>> s[1:1]
''
>>> s[1:2]
'b'
>>> s[0:2]
'ab'
>>> s[-2]
'f'
>>> s[-2:]
'fg'
From http://python.org/doc/current/lib/typesseq.html
s[i] i'th item of s, origin 0 (2)
s[i:j] slice of s from i to j (2), (3)
(2) If i or j is negative, the index is relative to the end of the
string, i.e., len(s) + i or len(s) + j is substituted. But note
that -0 is still 0.
(3) The slice of s from i to j is defined as the sequence of items
with index k such that i <= k < j. If i or j is greater than
len(s), use len(s). If i is omitted, use 0. If j is omitted, use
len(s). If i is greater than or equal to j, the slice is empty.
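Notes (2) and (3) can be spelled out step by step. What follows is only a sketch of the quoted rules, not how Python implements slicing; `slice_by_rules` is a made-up name, and the final clamp to 0 is not in the quoted text but is added so that the result agrees with the built-in slice for far-out negative indices.

```python
def slice_by_rules(s, i=None, j=None):
    """Rebuild s[i:j] from notes (2) and (3), for illustration only."""
    n = len(s)
    # Note (3): an omitted i means 0, an omitted j means len(s).
    if i is None:
        i = 0
    if j is None:
        j = n
    # Note (2): a negative index is relative to the end of the string.
    if i < 0:
        i += n
    if j < 0:
        j += n
    # Note (3): an index greater than len(s) is replaced by len(s).
    i = min(i, n)
    j = min(j, n)
    # Assumption, not in the quoted text: an index that is still
    # negative after the adjustment is treated as 0.
    i = max(i, 0)
    j = max(j, 0)
    # Note (3): the items with index k such that i <= k < j;
    # the range is empty when i >= j.
    return "".join(s[k] for k in range(i, j))


s = 'abcdefg'
print(repr(slice_by_rules(s, 1, 1)))   # ''
print(repr(slice_by_rules(s, 0, 2)))   # 'ab'
print(repr(slice_by_rules(s, -2)))     # 'fg'
```

The three calls reproduce the corresponding lines of the session above.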
In fact, when I saw it, I thought .substring had been based on Icon's
indexing.
http://www.nmt.edu/tcc/help/lang/icon/positions.html
In Icon, the characters in a string are identified by their
position, counting from 1. The positions refer to the gaps between
the characters, not to the characters themselves. Positive numbers
count from the left starting at 1, and nonpositive numbers count
from the right starting at 0. For example, the positions of the
gaps in the string "abc" are numbered like this:
    -3    -2    -1     0
     +-----+-----+-----+
     |  a  |  b  |  c  |
     +-----+-----+-----+
     1     2     3     4
So position 1 (or -3) refers to the position before the first
character; position 2 (or -2) refers to the position before the
second character; and so on. Position 0 is always the end of the
string.
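To make the gap numbering concrete, here is a rough Python sketch that maps Icon-style positions onto Python's 0-based offsets. The names `icon_pos_to_offset` and `icon_substring` are invented for this example, and it covers only what the quoted paragraph describes; in particular it refuses a start position that lies to the right of the end position rather than guess what Icon does there.

```python
def icon_pos_to_offset(s, pos):
    """Convert an Icon-style gap position to a 0-based Python offset."""
    n = len(s)
    if pos > 0:
        # Positive positions count from the left, starting at 1.
        offset = pos - 1
    else:
        # Nonpositive positions count from the right; 0 is the end.
        offset = n + pos
    if not 0 <= offset <= n:
        raise IndexError("position %d is outside the string" % pos)
    return offset


def icon_substring(s, i, j):
    """Characters lying between gap positions i and j."""
    a = icon_pos_to_offset(s, i)
    b = icon_pos_to_offset(s, j)
    if a > b:
        # Out of scope for this sketch; see the lead-in.
        raise ValueError("start position lies to the right of end position")
    return s[a:b]


print(repr(icon_substring("abc", 1, 1)))   # ''
print(repr(icon_substring("abc", 1, 2)))   # 'a'
print(repr(icon_substring("abc", -3, 0)))  # 'abc'
```

Under this numbering, positions 1 and -3 name the same gap in "abc", as do 4 and 0, which matches the diagram.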
> This is especially noteworthy. The substring starting at index 1 and
> ending at index 1 is of course "a" and not the empty string.
Not in Python or Icon ;-) But at the end of the day, this area
differs so much between languages that people just have to be certain of
how the current one does it.
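For comparison, the character-index reading from the quoted table (1..7 from the left, -6..0 from the right, both ends inclusive) could be sketched in Python as below. This only illustrates the table as quoted, not what groff's .substring actually does; `char_index_substring` is a hypothetical name.

```python
def char_index_substring(s, start, end):
    """Inclusive substring using the character indices of the quoted table."""
    n = len(s)

    def to_offset(p):
        # Positive indices: 1 is the first character.
        # Nonpositive indices: 0 is the last character, -1 the one before.
        return p - 1 if p > 0 else n - 1 + p

    a, b = to_offset(start), to_offset(end)
    if not (0 <= a < n and 0 <= b < n):
        raise IndexError("index outside the string")
    # Both ends are inclusive, hence b + 1.
    # If start lies after end, this returns the empty string.
    return s[a:b + 1]


print(repr(char_index_substring('abcdefg', 1, 1)))   # 'a'
print(repr(char_index_substring('abcdefg', -6, 0)))  # 'abcdefg'
```

With character indices, start 1 and end 1 select "a"; with gap positions, as in Python or Icon, the same pair selects nothing.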
I did see an explanation as to why Icon had got it right and it seemed
a good argument at the time, but I can't find or recall it now.
Cheers,
Ralph.