Re: [Groff] mom : unicode in .INCLUDE'd files

From:	Mike Bianchi
Subject:	Re: [Groff] mom : unicode in .INCLUDE'd files
Date:	Sun, 23 Jul 2017 08:23:51 -0400
User-agent:	Mutt/1.5.23 (2014-03-12)

This library purports to be a way to approach the problem ...

  
https://www.autoitconsulting.com/site/development/utf-8-utf-16-text-encoding-detection-library/
 

        UTF-8 and UTF-16 Text Encoding Detection Library
        by Jonathan Bennett | Aug 23, 2014 | Development |

This post shows how to detect UTF-8 and UTF-16 text and presents a fully
functional C++ and C# library that can be used to help with the detection.

I recently had to upgrade the text file handling feature of AutoIt to better
handle text files where no byte order mark (BOM) was present.  The older
version of code I was using worked fine for UTF-8 files (with or without BOM)
but it wasn't able to detect UTF-16 files without a BOM. I tried to use the
IsTextUnicode Win32 API function but this seemed extremely unreliable and
wouldn't detect UTF-16 Big-Endian text in my tests.

Note, especially for UTF-16 detection, there is always an element of ambiguity.
This post by Raymond shows that however you try and detect encoding there will
always be some sequence of bytes that will make your guesses look stupid.

Here are the detection methods I'm currently using for the various types of
text file.  The order of the checks I perform is:

    BOM
    UTF-8
    UTF-16 (newline)
    UTF-16 (null distribution)
        :
        :
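
To make the order of checks above concrete, here is a rough Python sketch of
the same sequence.  It is not the library's code (that is C++ and C#); the
function name guess_encoding, the 70%/10% null thresholds, and the rule that
any NUL byte disqualifies UTF-8 are all assumptions made for illustration.

```python
import codecs

# BOMs for the encodings discussed here (UTF-32 is deliberately ignored).
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def guess_encoding(data: bytes) -> str:
    """Guess an encoding using the check order: BOM, UTF-8, UTF-16."""
    # 1. BOM: an explicit marker wins outright.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # 2. UTF-8: accept only if the bytes decode strictly.
    #    Assumption: NUL bytes are taken as a hint of UTF-16, so data
    #    containing them skips this step.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # Work on whole 16-bit units only.
    half = len(data) // 2
    if half == 0:
        return "unknown"
    trimmed = data[: half * 2]

    # 3. UTF-16 (newline): U+000A is 0A 00 in LE and 00 0A in BE.
    units = [trimmed[i:i + 2] for i in range(0, len(trimmed), 2)]
    le_newlines = units.count(b"\n\x00")
    be_newlines = units.count(b"\x00\n")
    if le_newlines and not be_newlines:
        return "utf-16-le"
    if be_newlines and not le_newlines:
        return "utf-16-be"

    # 4. UTF-16 (null distribution): mostly-ASCII text has its NULs
    #    in the high byte of each unit.  Thresholds are hypothetical.
    even_nulls = trimmed[0::2].count(b"\x00") / half
    odd_nulls = trimmed[1::2].count(b"\x00") / half
    if odd_nulls > 0.7 and even_nulls < 0.1:
        return "utf-16-le"
    if even_nulls > 0.7 and odd_nulls < 0.1:
        return "utf-16-be"

    return "unknown"
```

As the quoted post warns, any such heuristic can be fooled; a file that
comes back "unknown" (or wrong) still needs the user to name the encoding.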

--
 Mike Bianchi
