help-gnu-emacs

Re: Checking if a file is binary (non-textual)


From: Jeff Clough
Subject: Re: Checking if a file is binary (non-textual)
Date: Mon, 28 Sep 2009 14:34:44 +0000

From: Nordlöw <per.nordlow@gmail.com>
Date: Mon, 28 Sep 2009 06:29:28 -0700 (PDT)

> What characters (bytes) should *not* be present in a text-file that
> may contain variable-length unicode characters.
> What does the unicode standard say about this?

If all you care about is UTF-8 and you trust Wikipedia:

http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

If you want more detail, that article has links to the relevant
standards.
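
As a rough illustration of those rules, here is a minimal Python sketch (the helper names `first_invalid_utf8_offset` and `has_forbidden_byte` are made up for this example). It leans on Python's strict UTF-8 decoder, which rejects overlong forms, encoded surrogates, truncated sequences and stray continuation bytes. It also lists the byte values that can never appear in well-formed UTF-8 under RFC 3629:

```python
from typing import Optional

# Byte values that never occur in well-formed UTF-8 (RFC 3629):
# 0xC0/0xC1 could only start overlong 2-byte sequences, and 0xF5..0xFF
# would encode code points above U+10FFFF or are unused lead bytes.
NEVER_VALID = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8", errors="strict")
        return None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the decoder rejected.
        return exc.start


def has_forbidden_byte(data: bytes) -> bool:
    """Cheap pre-check: True if any never-valid byte value is present."""
    return any(b in NEVER_VALID for b in data)


print(first_invalid_utf8_offset("héllo".encode("utf-8")))  # None
print(first_invalid_utf8_offset(b"abc\xc0\x80"))           # 3 (overlong NUL)
print(first_invalid_utf8_offset(b"ab\x80"))                # 2 (stray continuation byte)
print(has_forbidden_byte(b"\xff\xfe"))                     # True
```

Note that the byte-value check alone is not enough: a file can avoid every forbidden byte and still contain malformed sequences, which is why the decoder does the real work here.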

For what it's worth, though, following these standards isn't really
going to help you sort out the "binary" files from the "text" files.
Sure, if software wants to encode "text" in a way that is readable to
the rest of the world, it needs to follow these standards.  But if the
software wants to store data in the file in a non-"text" format, it
can do whatever it wants, including popping out a "binary" file that
looks like perfectly valid UTF-8.
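
To make that concrete, here is a small Python sketch that packs ordinary binary integers with the standard `struct` module and feeds them to a naive "is it valid UTF-8?" test (the function `looks_like_utf8_text` is a hypothetical name for illustration, not anything from Emacs or a library):

```python
import struct


def looks_like_utf8_text(data: bytes) -> bool:
    """Naive test: treat data as 'text' if it decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A made-up binary record: three little-endian unsigned 16-bit integers.
record = struct.pack("<3H", 1, 2, 3)    # b'\x01\x00\x02\x00\x03\x00'

# A single little-endian 32-bit integer whose bytes happen to be ASCII.
number = struct.pack("<I", 0x6C6C6548)  # b'Hell'

print(looks_like_utf8_text(record))  # True: NULs and control bytes are valid UTF-8
print(looks_like_utf8_text(number))  # True: indistinguishable from the text "Hell"
print(b"\x00" in number)             # False: a "contains NUL?" heuristic misses it too
```

Both values are plainly binary data, yet both pass the UTF-8 check, so any such test can only ever be a heuristic.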

Microsoft systems tried to solve this by keeping a binary/text flag bit
in the filesystem entry, but even that can't be trusted.

Anyway, hope I helped! :)

Jeff



----------
Author of the Genesys System
A "free" universal role-playing game.
http://www.chaosphere.com/genesys/ 
