Re: BOM mark from Windows notepad

From: Hans Aberg
Subject: Re: BOM mark from Windows notepad
Date: Thu, 19 Nov 2009 13:28:36 +0100

On 19 Nov 2009, at 11:14, Francisco Vila wrote:

I think that it was changed. If the BOM is only allowed in the beginning of
the file, it becomes a state-dependent character. For example, if one
includes two files verbatim in another, then the BOMs will no longer be in
the beginning of the combined stream. So therefore this state-less
definition is to be preferred.

This problem is more frequent than you may think, at least in my
environment. Last week I promised to bring a case of faulty LY from
Windows notepad; now I realize that all cases which might have failed
were previously edited by me, putting the BOM away from the start of
the file. All my students work on Windows and every instance of their
documents that I edited did fail.

On UNIX-like systems, one can chain commands that handle only byte sequences, even though they are often used for text processing. UTF-8 was designed to make such usage possible. For example,
  cat file1 file2 > file3
will concatenate file1 and file2 into file3. It is not feasible to change 'cat': as it is part of the operating system, one would then have to assess the impact on all tools that may use 'cat'.
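
As a rough illustration of what happens to the BOMs, here is a minimal Python sketch that models the byte-level concatenation done by 'cat'. The file contents are made up for the example, and the variable names file1, file2 and file3 simply mirror the command above. It assumes both inputs were saved as UTF-8 with a leading BOM, as Windows notepad does.

```python
import codecs

# Hypothetical contents of two files saved by an editor that prepends a BOM.
file1 = codecs.BOM_UTF8 + "first\n".encode("utf-8")
file2 = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# Byte-level concatenation, which is all that 'cat file1 file2 > file3' does.
file3 = file1 + file2

# Both BOMs survive as bytes; the second one is no longer at the start.
bom_count = file3.count(codecs.BOM_UTF8)

# The 'utf-8-sig' codec strips a BOM only at the very start of the data,
# so the second BOM is decoded as an ordinary U+FEFF character.
text = file3.decode("utf-8-sig")

print(bom_count)
print(repr(text))
```

A reader that accepts the BOM only at the beginning of a file would have to treat that second U+FEFF as an error or as stray text, which is the failure described in the quoted message.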

One can also use pipes and RPC; files can be made to look like streams and vice versa. A state-dependent BOM (accepted only at the beginning of a file) does not really work on UNIX-like platforms. So I think that the state-less definition was prompted by the requirements of those platforms, though it has wider applicability.
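
To make the contrast concrete, a stream filter that follows the state-less reading might look like the sketch below. This is only an assumption about how a tool could behave, not a description of any existing program; drop_bom_chars is a hypothetical helper name. It never asks whether it is at the start of a file, and simply drops U+FEFF wherever it turns up, so it behaves the same on a file, a pipe, or concatenated input.

```python
import codecs

def drop_bom_chars(chunks):
    """Yield decoded text from UTF-8 byte chunks, dropping U+FEFF anywhere.

    Hypothetical helper: it tracks no "start of file" position. The
    incremental decoder only buffers bytes of a character that is split
    across two chunks.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.replace("\ufeff", "")
    # Flush the decoder; this raises if the input ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.replace("\ufeff", "")

# Made-up pipe input: two BOM-prefixed pieces, with the chunk boundaries
# deliberately falling inside the three BOM bytes (EF BB BF).
chunks = [b"\xef\xbb", b"\xbfone\n\xef", b"\xbb\xbftwo\n"]
result = "".join(drop_bom_chars(chunks))
print(repr(result))
```

Because the filter has no notion of position, it gives the same output however the input was split or joined upstream.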

