Changes to html_node/Character-Encoding.html

From: Jim Meyering
Subject: Changes to html_node/Character-Encoding.html
Date: Sun, 27 Sep 2020 23:36:54 -0400 (EDT)

+<h3 class="section">3.7 Character Encoding</h3>
+<a name="index-character-encoding"></a>
+<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding.  See <a 
+<p>In the &lsquo;<samp>C</samp>&rsquo; or &lsquo;<samp>POSIX</samp>&rsquo; locale, every character is encoded as
+a single byte and every byte is a valid character.  In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character.  POSIX does not
+specify the behavior of <code>grep</code> when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage.  As an extension to POSIX, GNU <code>grep</code> treats
+null characters like any other character.  However, unless the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU <code>grep</code> to treat the file as binary and suppress
+details about matches.  See <a href="File-and-Directory-Selection.html#File-and-Directory-Selection">File and Directory Selection</a>.
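
The distinction drawn above between characters, bytes, encoding errors, and null characters can be illustrated with a short Python sketch. It is not how GNU grep is implemented; the helper name `describe` is hypothetical, and UTF-8 is assumed as the encoding purely for illustration.

```python
def describe(data: bytes) -> str:
    """Classify a byte string along the lines of the paragraph above.

    Hypothetical helper for illustration only; it assumes UTF-8 and does
    not reproduce GNU grep's actual binary-file detection.
    """
    # A null byte is one of the conditions the text says makes grep
    # treat a file as binary unless -a is given.
    if b"\x00" in data:
        return "contains null byte"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Some bytes do not contribute to the representation of any character.
        return "encoding error"
    return f"{len(text)} character(s) in {len(data)} byte(s)"


# U+00E9 needs two bytes in UTF-8.
print(describe("\u00e9".encode("utf-8")))
# The same letter as a lone Latin-1 byte is not valid UTF-8.
print(describe(b"caf\xe9"))
# A null byte embedded in otherwise plain ASCII.
print(describe(b"a\x00b"))
```

In a single-byte locale such as C or POSIX, the second case could not arise, because every byte is a valid character there.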
+<p>Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
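
As a rough check of the claim about single-byte ASCII encodings, the following Python sketch encodes all 128 ASCII characters under a few codecs. The helper names are hypothetical, and the codec `cp037` (an EBCDIC code page) is used here only as an assumed stand-in for an "oddball" platform.

```python
ASCII_CHARS = [chr(i) for i in range(128)]


def single_byte(codec: str) -> bool:
    """True if every ASCII character encodes to exactly one byte."""
    return all(len(c.encode(codec)) == 1 for c in ASCII_CHARS)


def usual_values(codec: str) -> bool:
    """True if every ASCII character encodes to its usual byte value."""
    return all(chr(i).encode(codec) == bytes([i]) for i in range(128))


for codec in ("utf-8", "latin-1", "cp037"):
    print(codec, single_byte(codec), usual_values(codec))
```

Under UTF-8 and Latin-1 both checks hold. Under the EBCDIC code page each ASCII character is still one byte, but the byte values differ from the usual ones (for example, `A` is encoded as 0xC1).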

