[Groff-commit] groff ChangeLog src/preproc/preconv/preconv.cpp...

From:	Werner LEMBERG
Subject:	[Groff-commit] groff ChangeLog src/preproc/preconv/preconv.cpp...
Date:	Thu, 08 Nov 2007 00:46:10 +0000
CVSROOT:        /cvsroot/groff
Module name:    groff
Changes by:     Werner LEMBERG <wl>     07/11/08 00:46:10

Modified files:
        .              : ChangeLog 
        src/preproc/preconv: preconv.cpp preconv.man 

Log message:
        * src/preproc/preconv/preconv.cpp (emacs_to_mime): Add `utf-16be',
        `utf-16le', `utf-16be-with-signature', `utf-16le-with-signature'.
        (is_comment_line): Handle '\" and '\# also.
        
        * src/preproc/preconv/preconv.man: Revise and make complete.

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/groff/ChangeLog?cvsroot=groff&r1=1.1107&r2=1.1108
http://cvs.savannah.gnu.org/viewcvs/groff/src/preproc/preconv/preconv.cpp?cvsroot=groff&r1=1.13&r2=1.14
http://cvs.savannah.gnu.org/viewcvs/groff/src/preproc/preconv/preconv.man?cvsroot=groff&r1=1.2&r2=1.3

Patches:
Index: ChangeLog
===================================================================
RCS file: /cvsroot/groff/groff/ChangeLog,v
retrieving revision 1.1107
retrieving revision 1.1108
diff -u -b -r1.1107 -r1.1108
--- ChangeLog   30 Oct 2007 09:31:37 -0000      1.1107
+++ ChangeLog   8 Nov 2007 00:46:09 -0000       1.1108
@@ -1,3 +1,11 @@
+2007-11-08  Werner LEMBERG  <address@hidden>
+
+       * src/preproc/preconv/preconv.cpp (emacs_to_mime): Add `utf-16be',
+       `utf-16le', `utf-16be-with-signature', `utf-16le-with-signature'.
+       (is_comment_line): Handle '\" and '\# also.
+
+       * src/preproc/preconv/preconv.man: Revise and make complete.
+
 2007-10-25  Werner LEMBERG  <address@hidden>
 
        * tmac/cs.tmac: New file holding Czech strings, contributed by

Index: src/preproc/preconv/preconv.cpp
===================================================================
RCS file: /cvsroot/groff/groff/src/preproc/preconv/preconv.cpp,v
retrieving revision 1.13
retrieving revision 1.14
diff -u -b -r1.13 -r1.14
--- src/preproc/preconv/preconv.cpp     13 Aug 2006 12:37:20 -0000      1.13
+++ src/preproc/preconv/preconv.cpp     8 Nov 2007 00:46:10 -0000       1.14
@@ -151,9 +151,13 @@
   {"us-ascii",                         "US-ASCII"},    // Emacs
   {"utf8",                             "UTF-8"},       // alias
   {"utf-16",                           "UTF-16"},      // Emacs
+  {"utf-16be",                         "UTF-16BE"},    // Emacs
   {"utf-16-be",                        "UTF-16BE"},    // Emacs
+  {"utf-16be-with-signature",          "UTF-16"},      // Emacs, not UTF-16BE
   {"utf-16-be-with-signature",         "UTF-16"},      // Emacs, not UTF-16BE
+  {"utf-16le",                         "UTF-16LE"},    // Emacs
   {"utf-16-le",                        "UTF-16LE"},    // Emacs
+  {"utf-16le-with-signature",          "UTF-16"},      // Emacs, not UTF-16LE
   {"utf-16-le-with-signature",         "UTF-16"},      // Emacs, not UTF-16LE
   {"utf-8",                            "UTF-8"},       // Emacs
 
@@ -857,7 +861,7 @@
 {
   if (!s || !*s)
     return 0;
-  if (*s == '.')
+  if (*s == '.' || *s == '\'')
   {
     s++;
     while (*s == ' ' || *s == '\t')
@@ -932,11 +936,16 @@
 //
 // We search for the following line:
 //
-//   .\"...-*-<local variables list>-*-
+//   <comment> ... -*-<local variables list>-*-
 //
-// (`...' might be anything).  There can be blanks after
-// the leading `.'; additionally, you might use `\#' starting
-// a line instead of `.\"'.
+// (`...' might be anything).
+//
+// <comment> can be one of the following syntax forms at the
+// beginning of the line:
+//
+//   .\"   .\#   '\"   '\#   \#
+//
+// There can be whitespace after the leading `.' or "'".
 //
 // The local variables list must occur within the first
 // comment block at the very beginning of the data stream.

Index: src/preproc/preconv/preconv.man
===================================================================
RCS file: /cvsroot/groff/groff/src/preproc/preconv/preconv.man,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -b -r1.2 -r1.3
--- src/preproc/preconv/preconv.man     2 Jul 2006 09:06:19 -0000       1.2
+++ src/preproc/preconv/preconv.man     8 Nov 2007 00:46:10 -0000       1.3
@@ -1,5 +1,5 @@
 .ig
-Copyright (C) 2006 Free Software Foundation, Inc.
+Copyright (C) 2006, 2007 Free Software Foundation, Inc.
 
 Permission is granted to make and distribute verbatim copies of
 this manual provided the copyright notice and this permission notice
@@ -25,16 +25,22 @@
 .
 .
 .SH SYNOPSIS
-.B preconv
-[
-.B \-dhrv
-]
-[
-.BI \-e encoding
-]
-[
-.IR files \|.\|.\|.\|
-]
+.SY preconv
+.OP \-dr
+.OP \-e encoding
+.RI [ files
+.IR .\|.\|. ]
+.
+.SY preconv
+.B \-h
+|
+.B \-\-help
+.
+.SY preconv
+.B \-v
+|
+.B \-\-version
+.YS
 .
 .PP
 It is possible to have whitespace between the
@@ -79,6 +85,8 @@
 uses the algorithm described below to select the input encoding.
 .
 .TP
+.B \-\-help
+.TQ
 .B \-h
 Print help message.
 .
@@ -87,6 +95,8 @@
 Do not add .lf requests.
 .
 .TP
+.B \-\-version
+.TQ
 .B \-v
 Print version number.
 .
@@ -125,15 +135,15 @@
 .BR \-k .
 .
 .SS "Byte Order Mark"
-The Unicode Standard defines character U+FEFF as the the Byte Order Mark
+The Unicode Standard defines character U+FEFF as the Byte Order Mark
 (BOM).
 On the other hand, value U+FFFE is guaranteed not to be a Unicode character at
 all.
 This makes it possible to detect the byte order within the data stream (either
-big-endian or little-endian), and the MIME encodings `UTF-16' and `UTF-32'
-mandate that the data stream starts with U+FEFF.
-Similarly, the data stream encoded as `UTF-8' might start with a BOM (to
-ease the conversion from and to UTF-16 and UTF-32).
+big-endian or little-endian), and the MIME encodings \%`UTF-16' and
+\%`UTF-32' mandate that the data stream starts with U+FEFF.
+Similarly, the data stream encoded as \%`UTF-8' might start with a BOM (to
+ease the conversion from and to \%UTF-16 and \%UTF-32).
 In all cases, the byte order mark is
 .I not
 part of the data but part of the encoding protocol; in other words,
@@ -147,14 +157,136 @@
 .BR groff .
 .
 .SS "Coding Tags"
-To be written.
+Editors which support more than a single character encoding need tags
+within the input files to mark the file's encoding.
+While it is possible to guess the right input encoding with the help of
+heuristic algorithms for data that contains a large amount of
+natural-language text, it is still just a guess.
+Additionally, all such algorithms fail easily for input that is either too
+short or doesn't represent a natural language.
+.
+.PP
+For these reasons,
+.B preconv
+supports the coding tag convention (with some restrictions) as used by
+.B "GNU Emacs"
+and
+.B XEmacs
+(and probably other programs too).
+.
+.PP
+Coding tags in
+.B "GNU Emacs"
+and
+.B XEmacs
+are stored in so-called
+.IR "File Variables" .
+.B preconv
+recognizes the following syntax form which must be put into a troff comment
+in the first or second line.
+.
+.RS
+.PP
+\-*\-
+.IR tag1 :
+.IR value1 ;
+.IR tag2 :
+.IR value2 ;
+\&.\|.\|.\& \-*\-
+.RE
+.
+.PP
+The only relevant tag for
+.B preconv
+is `coding' which can take the values listed below.
+Here is an example line that tells
+.B Emacs
+to edit a file in troff mode, and to use \%latin-2 as its encoding.
+.
+.RS
+.PP
+.EX
+\&.\[rs]" \-*\- mode: troff; coding: latin-2 \-*\-
+.EE
+.RE
+.
+.PP
+The following list gives all MIME coding tags (either lowercase or
+uppercase) supported by
+.BR preconv ;
+this list is hard-coded in the source.
+.
+.RS
+.PP
+.ad l
+\%big5, \%cp1047, \%euc-jp, \%euc-kr, \%gb2312, \%iso-8859-1, \%iso-8859-2,
+\%iso-8859-5, \%iso-8859-7, \%iso-8859-9, \%iso-8859-13, \%iso-8859-15,
+\%koi8-r, \%us-ascii, \%utf-8, \%utf-16, \%utf-16be, \%utf-16le
+.ad
+.RE
+.
+.PP
+In addition, the following hard-coded list of other tags is recognized;
+each tag eventually maps to a value from the list above.
+.
+.RS
+.PP
+.ad l
+\%ascii, \%chinese-big5, \%chinese-euc, \%chinese-iso-8bit, \%cn-big5,
+\%cn-gb, \%cn-gb-2312, \%cp878, \%csascii, \%csisolatin1,
+\%cyrillic-iso-8bit, \%cyrillic-koi8, \%euc-china, \%euc-cn, \%euc-japan,
+\%euc-japan-1990, \%euc-korea, \%greek-iso-8bit, \%iso-10646/utf8,
+\%iso-10646/utf-8, \%iso-latin-1, \%iso-latin-2, \%iso-latin-5,
+\%iso-latin-7, \%iso-latin-9, \%japanese-euc, \%japanese-iso-8bit, \%jis8,
+\%koi8, \%korean-euc, \%korean-iso-8bit, \%latin-0, \%latin1, \%latin-1,
+\%latin-2, \%latin-5, \%latin-7, \%latin-9, \%mule-utf-8, \%mule-utf-16,
+\%mule-utf-16be, \%mule-utf-16-be, \%mule-utf-16be-with-signature,
+\%mule-utf-16le, \%mule-utf-16-le, \%mule-utf-16le-with-signature, \%utf8,
+\%utf-16-be, \%utf-16-be-with-signature, \%utf-16be-with-signature,
+\%utf-16-le, \%utf-16-le-with-signature, \%utf-16le-with-signature
+.ad
+.RE
+.
+.PP
+Those tags are taken from
+.B "GNU Emacs"
+and 
+.BR XEmacs ,
+together with some aliases.
+Trailing \%`-dos', \%`-unix', and \%`-mac' suffixes of coding tags (which
+give the end-of-line convention used in the file) are stripped off before
+the comparison with the above tags happens.
 .
 .SS "Iconv Issues"
-To be written.
+.B preconv
+by itself only supports three encodings: \%latin-1, cp1047, and \%UTF-8;
+all other encodings are passed to the
+.B iconv
+library functions.
+At compile time, the build process looks for and checks a valid
+.B iconv
+implementation; a call to `preconv \-\-version' shows whether
+.B iconv
+is used.
+.
+.
+.SH BUGS
+.B preconv
+doesn't support
+.I "local variable lists"
+yet.
+This is a different syntax form for specifying local variables at the end of
+a file.
 .
 .
 .SH "SEE ALSO"
 .BR groff (@MAN1EXT@)
+.br
+the
+.B "GNU Emacs"
+and
+.B XEmacs
+info pages
 .
 .\" Local Variables:
 .\" mode: nroff
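
Illustration (not part of the commit): the two preconv.cpp changes in this patch can be sketched in a few lines of C++. This is a minimal sketch under stated assumptions, not the code in preconv.cpp. The names TagMapping, tag_table, normalize_tag, mime_name_for, starts_with_comment and check_examples are hypothetical, the table is a short excerpt, and the `latin-2' row is assumed from the tag lists in the man page. Lowercasing the tag and stripping `-dos', `-unix' and `-mac' follow the man page text in the patch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical excerpt of an Emacs-name -> MIME-name table; the real
// emacs_to_mime table in preconv.cpp is much longer.
struct TagMapping { const char *emacs_name; const char *mime_name; };

static const TagMapping tag_table[] = {
  {"latin-2",                 "ISO-8859-2"},  // assumed row, for illustration
  {"utf-8",                   "UTF-8"},
  {"utf-16be",                "UTF-16BE"},
  {"utf-16be-with-signature", "UTF-16"},      // BOM present, so not UTF-16BE
  {"utf-16le",                "UTF-16LE"},
  {"utf-16le-with-signature", "UTF-16"},      // BOM present, so not UTF-16LE
};

// Lowercase the tag and strip one trailing end-of-line suffix, if present.
std::string normalize_tag(const std::string &tag) {
  std::string s;
  for (unsigned char c : tag)
    s += static_cast<char>(std::tolower(c));
  static const char *const suffixes[] = {"-dos", "-unix", "-mac"};
  for (const char *suffix : suffixes) {
    const std::size_t n = std::strlen(suffix);
    if (s.size() > n && s.compare(s.size() - n, n, suffix) == 0) {
      s.erase(s.size() - n);
      break;
    }
  }
  return s;
}

// Return the MIME name for a coding tag, or nullptr if it is unknown.
const char *mime_name_for(const std::string &tag) {
  const std::string s = normalize_tag(tag);
  for (const TagMapping &m : tag_table)
    if (s == m.emacs_name)
      return m.mime_name;
  return nullptr;
}

// True if the line starts with one of the comment forms
//   .\"   .\#   '\"   '\#   \#
// Blanks or tabs may follow the leading `.' or "'".
bool starts_with_comment(const char *s) {
  if (!s || !*s)
    return false;
  if (*s == '.' || *s == '\'') {
    s++;
    while (*s == ' ' || *s == '\t')
      s++;
    if (*s != '\\')
      return false;
    s++;
    return *s == '"' || *s == '#';
  }
  return s[0] == '\\' && s[1] == '#';
}

// Usage examples, written as assertions.
void check_examples() {
  assert(starts_with_comment(".\\\" -*- mode: troff; coding: latin-2 -*-"));
  assert(starts_with_comment("'  \\# a comment"));
  assert(starts_with_comment("\\# a comment"));
  assert(!starts_with_comment(".TH PRECONV 1"));
  assert(std::string(mime_name_for("UTF-16LE-with-signature-unix")) == "UTF-16");
  assert(std::string(mime_name_for("utf-16be")) == "UTF-16BE");
  assert(mime_name_for("klingon") == nullptr);
}
```

The `-with-signature' variants map to plain UTF-16 rather than to the BE or LE name because, as the table comments in the patch note, a signature (BOM) at the start of the data already conveys the byte order.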