Changes to html_node/Character-Encoding.html

From: Jim Meyering
Subject: Changes to html_node/Character-Encoding.html
Date: Sun, 27 Sep 2020 23:36:54 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 20/09/27 23:36:49

Index: html_node/Character-Encoding.html
RCS file: html_node/Character-Encoding.html
diff -N html_node/Character-Encoding.html
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ html_node/Character-Encoding.html   28 Sep 2020 03:36:49 -0000      1.1
@@ -0,0 +1,100 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
+<!-- This manual is for grep, a pattern matching engine.
+Copyright (C) 1999-2002, 2005, 2008-2020 Free Software Foundation,
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts.  A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<!-- Created by GNU Texinfo 6.5, -->
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Character Encoding (GNU Grep 3.5)</title>
+<meta name="description" content="Character Encoding (GNU Grep 3.5)">
+<meta name="keywords" content="Character Encoding (GNU Grep 3.5)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index.html#Index" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html#Regular-Expressions" rel="up" title="Regular Expressions">
+<link href="Matching-Non_002dASCII.html#Matching-Non_002dASCII" rel="next" title="Matching Non-ASCII">
+<link href="Basic-vs-Extended.html#Basic-vs-Extended" rel="prev" title="Basic vs Extended">
+<style type="text/css">
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+ul.no-bullet {list-style: none}
+<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css">
+<body lang="en">
+<a name="Character-Encoding"></a>
+<div class="header">
+Next: <a href="Matching-Non_002dASCII.html#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII</a>, Previous: <a href="Basic-vs-Extended.html#Basic-vs-Extended" accesskey="p" rel="prev">Basic vs Extended</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" 
+<a name="Character-Encoding-1"></a>
+<h3 class="section">3.7 Character Encoding</h3>
+<a name="index-character-encoding"></a>
+<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding.  See <a 
+<p>In the &lsquo;<samp>C</samp>&rsquo; or &lsquo;<samp>POSIX</samp>&rsquo; locale, every character is encoded as
+a single byte and every byte is a valid character.  In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character.  POSIX does not
+specify the behavior of <code>grep</code> when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage.  As an extension to POSIX, GNU <code>grep</code> treats
+null characters like any other character.  However, unless the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU <code>grep</code> to treat the file as binary and suppress
+details about matches.  See <a href="File-and-Directory-Selection.html#File-and-Directory-Selection">File and Directory Selection</a>.
+<p>Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
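
The encoding rules described in the added section can be illustrated with a minimal Python sketch. This is not grep's implementation. The function name `classify` is hypothetical, Latin-1 is used only as a stand-in for a locale in which every byte is a valid single-byte character, and the check is simplified to look at the whole input rather than only at lines that would be output.

```python
def classify(data: bytes, encoding: str = "utf-8") -> str:
    """Roughly imitate the binary-file heuristic described above.

    Hypothetical helper for illustration only: input is treated as
    binary if it contains a null byte or cannot be decoded in the
    given encoding.
    """
    if b"\x00" in data:
        return "binary: null byte"
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        # Some bytes do not contribute to the representation of any character.
        return "binary: encoding error"
    return "text"


# In UTF-8, a character may need a sequence of several bytes.
print(len("é".encode("utf-8")))

# The same bytes are an encoding error in UTF-8 but valid in a
# single-byte encoding where every byte is a character.
sample = b"caf\xe9\n"
print(classify(sample))
print(classify(sample, "latin-1"))

# Every ASCII character has a single-byte encoding in UTF-8.
print(all(len(chr(c).encode("utf-8")) == 1 for c in range(128)))
```

With GNU grep itself, the `-a` (`--binary-files=text`) option mentioned in the section turns off the binary treatment that this sketch imitates.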
