bug#22838: New 'Binary file' detection considered harmful
From: Eric Blake
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 10:54:52 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 02/29/2016 10:40 AM, Marcello Perathoner wrote:
>> Wrong, at least according to the POSIX definition of text file. A text
>> file is one with no encoding errors.
>
>
> """
> 3.397 Text File
>
> A file that contains characters organized into zero or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline> character. Although POSIX.1-2008 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections.
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
>
> 3.206 Line
>
> A sequence of zero or more non- <newline> characters plus a terminating
> <newline> character.
>
> 3.87 Character
>
> A sequence of one or more bytes representing a single graphic symbol or
> control code.
>
> Note:
> This term corresponds to the ISO C standard term multi-byte character, where
> a single-byte character is a special case of a multi-byte character. Unlike
> the usage in the ISO C standard, character here has no necessary relationship
> with storage space, and byte is used when storage space is discussed.
>
> See the definition of the portable character set in Portable Character Set
> for a further explanation of the graphical representations of (abstract)
> characters, as opposed to character encodings.
>
Encoding errors are not characters; they are bytes that do not form any
character. A line is defined as a sequence of characters, so a line cannot
contain encoding errors. Therefore, a file with encoding errors is not a
text file.
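
To make the reading of the quoted definitions concrete, here is a minimal
sketch of a check that follows them literally. It is only an illustration,
not how grep decides what to call a binary file. The function name
is_posix_text is hypothetical, UTF-8 stands in for the locale's encoding,
and 2048 (the minimum value POSIX permits for {LINE_MAX}) stands in for the
system's actual limit.

```python
def is_posix_text(data: bytes, encoding: str = "utf-8", line_max: int = 2048) -> bool:
    """Return True if data fits the quoted POSIX 'text file' definition.

    Assumptions (not from the thread): UTF-8 as the locale encoding and
    line_max=2048 as a stand-in for {LINE_MAX}.
    """
    # "Zero or more lines": an empty file qualifies.
    if not data:
        return True

    # Lines must not contain NUL characters.
    if b"\0" in data:
        return False

    # Every line needs a terminating <newline>, so the last byte must be one.
    if not data.endswith(b"\n"):
        return False

    # No line may exceed line_max bytes, counting the <newline> itself.
    # The final split element is empty because data ends with a newline.
    for line in data.split(b"\n")[:-1]:
        if len(line) + 1 > line_max:
            return False

    # A line is a sequence of characters; bytes that fail to decode are
    # encoding errors, not characters, so they disqualify the file.
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

With these assumptions, b"caf\xc3\xa9\n" (valid UTF-8) passes, while
b"caf\xe9\n" (a lone Latin-1 byte) fails on the decoding step alone, which
is the case under discussion.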
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org