Re: docs for insert-file-contents use 'bytes'

From: Kenichi Handa
Subject: Re: docs for insert-file-contents use 'bytes'
Date: Thu, 02 Oct 2008 10:33:49 +0900
User-agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/23.0.60 (i686-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

In article <address@hidden>, Ted Zlatanov <address@hidden> writes:

KH> It's not that easy.  Some encodings require seeking back to an
KH> escape sequence to decode the next character.  And, for UTF-16
KH> with a BOM, we have to check the first 2 bytes.

> OK.  Does it ever require going more than N*2 (where N = max sequence
> length for the encoding) bytes back?  Is N ever bigger than 10?  If not,
> it may be complicated code but at least it will be fairly fast.

N can be much, much larger than 10.  For instance, the
following is the iso-2022-jp byte sequence for a Japanese
sentence (the ESC code is represented by "^[").


We must search backward for the sequence ^[$B or ^[(B in
iso-2022-jp.  Which pattern to search for depends on the

> The semantics could be (given N as above):

> 1) jump to character number C: scan from beginning of file and count
> characters up to C if the encoding has a variable length.  Otherwise the
> offset is obvious.

> 2) jump to character around/at byte B: jump to B-N*2 and scan characters
> forward until you find the one that straddles or begins at B.  Also
> should have a way to report that character's actual starting byte
> position.

> 3) jump to byte: operate as now, just an fseek

> For my purposes (2) is most useful, but I can use (3) and bypass
> encodings.  (1) is not good for me, since the application is to view
> large files, but (1) is OK for small files.

As you now see from the above example, implementing (2) is
very difficult.  And, for small files, we don't need (1).
We can just read the whole file.
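
To make the backward search concrete, here is a minimal Python sketch
(an illustration only, not Emacs code).  It assumes plain iso-2022-jp
as in RFC 1468, where every escape sequence is exactly 3 bytes; the
name char_start_at and the two lookup sets are hypothetical.  It
covers only this one encoding, and it shows that the distance scanned
back is bounded by the length of the current run, not by a small N.

```python
ESC = b"\x1b"
# Assumption: only the four RFC 1468 designations, all 3 bytes long.
TWO_BYTE = {b"\x1b$@", b"\x1b$B"}   # JIS X 0208: 2 bytes per character
ONE_BYTE = {b"\x1b(B", b"\x1b(J"}   # ASCII / JIS X 0201 Roman: 1 byte


def char_start_at(data: bytes, pos: int) -> int:
    """Return the offset of the character (or escape sequence) that
    begins at or straddles byte `pos` in iso-2022-jp `data`."""
    if not 0 <= pos < len(data):
        raise IndexError("pos out of range")

    # Search backward for the nearest ESC at or before pos.  The scan
    # may run all the way to the start of the data.
    esc = data.rfind(ESC, 0, pos + 1)
    if esc == -1:
        return pos          # no escape seen: initial state is ASCII

    body = esc + 3          # first byte after the 3-byte escape
    if pos < body:
        return esc          # pos lies inside the escape sequence

    designation = data[esc:body]
    if designation in TWO_BYTE:
        # Characters start at even distances from the end of the escape.
        return body + (pos - body) // 2 * 2
    if designation in ONE_BYTE:
        return pos
    raise ValueError("unsupported escape sequence: %r" % designation)
```

For a long Japanese run, pos - data.rfind(ESC, 0, pos + 1) is as large
as the run itself, which is why a fixed look-back of N*2 bytes is not
enough.  Encodings with longer or stateful escapes (for example the
4-byte designations of iso-2022-jp-2) would need more cases than this
sketch handles.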

Kenichi Handa
