bison-patches

[RFA/RFC] extract strings from m4 skeletons


From: Paolo Bonzini
Subject: [RFA/RFC] extract strings from m4 skeletons
Date: Thu, 18 Jan 2007 12:04:13 +0100
User-agent: Thunderbird 1.5.0.9 (Macintosh/20061207)

This patch implements the idea I suggested a few hours ago. Conventional wisdom is that only m4 can parse m4 correctly, so that's what we do.

It also fixes a couple of messages that could not be translated correctly.

One is what Joel had anticipated:

contained M4 macro invocations will not be expanded.

It's actually worse -- such strings are wrong, because the string must be fully known at extraction time. My solution does not cater for this, and indeed there are a couple of places in the skeletons where this happens. But we can put big flashing warnings in bison.m4.
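
To illustrate why the message has to be a constant, here is a small shell sketch. The `translate` function and its one-entry catalog are made up for illustration (a real lookup would go through gettext); only the message text is taken from the glr.cc change in this patch.

```shell
#!/bin/sh
# Hypothetical stand-in for a catalog lookup: returns the translation
# when the msgid is known, otherwise the msgid itself.
translate () {
  case $1 in
    '%s: using %%defines is mandatory')
      # Made-up "translation", only here to make a catalog hit visible.
      printf '%s' '[translated] %s: using %%defines is mandatory' ;;
    *)
      printf '%s' "$1" ;;
  esac
}

skeleton=glr.cc

# Constant msgid, skeleton name passed as an argument: the lookup succeeds.
good () {
  fmt=`translate '%s: using %%defines is mandatory'`
  printf "$fmt\n" "$skeleton"
}

# Msgid assembled at run time: it never matches the extracted string,
# so the message silently stays untranslated.
bad () {
  fmt=`translate "$skeleton: using %%defines is mandatory"`
  printf "$fmt\n"
}

good
bad
```

The second form is roughly what happens when a macro is expanded inside the format argument: the extractor only ever sees the unexpanded text, while the lookup sees the expanded one.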

In the other case, instead of saying "%define variable `foo'" we have to say "`%define foo'" to make the translator's life easier. The problematic case occurred in bison.m4, but I made the same wording change in parse-gram.y too, for consistency.

Anyway, back to the meat of the patch, which is data/m4-xgettext.sh.

1) What does it do? The script takes the list of skeletons and the list of macros to trace, and produces `fake' C output with the strings. This C output is then fed to xgettext. Producing C has the advantage that xgettext understands #line directives: this way the m4 source file names end up in bison.pot, instead of the fake C file.
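
As a rough sketch of that conversion (not the sed pipeline from the patch itself), the function below turns one trace line into the fake C form. The function name `to_fake_c` and the line number 56 are made up; the field layout `line,,file,,macro,,[string]` mirrors the trace format the script requests.

```shell
#!/bin/sh
# Convert "LINE,,FILE,,MACRO,,[STRING]" trace lines into fake C that
# xgettext can read.  Simplified: strips only one level of [...] quoting.
to_fake_c () {
  # Escape backslashes and double quotes so STRING is a valid C string.
  LC_ALL=C sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
  while IFS= read -r line; do
    lineno=${line%%,,*}; rest=${line#*,,}
    file=${rest%%,,*};   rest=${rest#*,,}
    macro=${rest%%,,*};  text=${rest#*,,}
    text=${text#\[}; text=${text%\]}
    printf '#line %s "%s"\n_("%s") /* %s */\n' \
      "$lineno" "$file" "$text" "$macro"
  done
}

# Hypothetical trace line; the line number is invented for the example.
printf '%s\n' \
  '56,,data/glr.cc,,b4_fatal,,[%s: using %%defines is mandatory]' |
  to_fake_c
```

Output in this shape could then be handed to xgettext with a keyword option such as `--keyword=_`, and the #line directive is what makes the location comment point at data/glr.cc rather than at the generated file.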

2) How does it do that? The script will look inside macro invocations to find the traced macros. We should use a technique similar to what is done in autoupdate to achieve perfect results. For now, however, we rely on something much simpler. We redefine a few key macros in order to expand all arms of the conditional statements and in order to expand the loops only once -- which is enough if we only need to look at autom4te traces.
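
The "expand all arms" trick can be tried in isolation with plain m4 (no M4sugar). In this sketch `my_if` and `my_warn` are made-up names standing in for the redefined conditionals and for the traced macros; it assumes an `m4` binary is on the PATH and skips otherwise.

```shell
#!/bin/sh
# my_if ignores its condition and expands both branches, so every
# my_warn call hidden in either arm gets expanded (and could be traced).
expand_all_arms () {
  m4 <<'EOF'
define(`my_if', `$2`'$3')dnl
define(`my_warn', `STRING<$1>
')dnl
my_if(`cond', `my_warn(`then arm')', `my_warn(`else arm')')dnl
EOF
}

if command -v m4 >/dev/null 2>&1; then
  expand_all_arms
else
  echo "m4 not found; skipping" >&2
fi
```

The output is meaningless as a program, which is fine here: only the fact that both `my_warn` calls were reached matters.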

3) Is it good enough? There is a case in which this mechanism breaks, that is, if a macro defines another macro. Of course, there is an example of such a thing in bison.m4. However, the script is coded so that workarounds can be easily established. In fact, m4 makes "macros that define macros" complicated enough that I doubt the workaround will need to be replicated in the future.

Paul, Akim, what do you think?  You are the real m4-fu-ers.

Paolo
2007-01-18  Paolo Bonzini  <address@hidden>

        * data/.cvsignore: Add m4strings.c.
        * data/Makefile.am (m4_xgettext_files,
        dist_noinst_DATA, dist_noinst_SCRIPTS): New.
        (m4strings.c): New target.
        * data/m4-xgettext.sh: New file.
        * data/bison.m4: Fix strings that cannot be translated.  Warn about
        correct practices for errors issued in the skeleton.
        * data/glr.cc, data/lalr1.cc: Don't invoke macros in translated strings.
        * src/parse-gram.y: Adapt string to be consistent with data/bison.m4
        change.
        * po/POTFILES.in: Add m4strings.c.
        * tests/input.at: Adjust expected output.

Index: data/.cvsignore
===================================================================
RCS file: /sources/bison/bison/data/.cvsignore,v
retrieving revision 1.2
diff -u -r1.2 .cvsignore
--- data/.cvsignore     20 Jun 2006 11:32:19 -0000      1.2
+++ data/.cvsignore     18 Jan 2007 10:21:29 -0000
@@ -1,2 +1,3 @@
 Makefile.in
 Makefile
+m4strings.c
Index: data/Makefile.am
===================================================================
RCS file: /sources/bison/bison/data/Makefile.am,v
retrieving revision 1.16
diff -u -r1.16 Makefile.am
--- data/Makefile.am    19 Dec 2006 00:34:36 -0000      1.16
+++ data/Makefile.am    18 Jan 2007 10:21:29 -0000
@@ -15,9 +15,22 @@
 ## Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
 ## 02110-1301  USA
 
-dist_pkgdata_DATA = README bison.m4 \
+m4_xgettext_files = \
    c-skel.m4 c.m4 yacc.c glr.c push.c \
    c++-skel.m4 c++.m4 location.cc lalr1.cc glr.cc
 
+dist_pkgdata_DATA = README bison.m4 $(m4_xgettext_files)
+
 m4sugardir = $(pkgdatadir)/m4sugar
 dist_m4sugar_DATA = m4sugar/m4sugar.m4
+
+dist_noinst_DATA = m4strings.c
+dist_noinst_SCRIPTS = m4-xgettext.sh
+
+m4strings.c: $(m4_xgettext_files) m4-xgettext.sh
+       srcdir="$(srcdir)" AUTOM4TE="$(AUTOM4TE)" \
+         $(SHELL) $(srcdir)/m4-xgettext.sh -M -Bdata \
+               $(m4_xgettext_files) -- \
+               b4_warn:1 b4_warn_at:3 \
+               b4_complain:1 b4_complain_at:3 \
+               b4_fatal:1 b4_fatal_at:3 > $(srcdir)/m4strings.c
Index: data/bison.m4
===================================================================
RCS file: /sources/bison/bison/data/bison.m4,v
retrieving revision 1.14
diff -u -r1.14 bison.m4
--- data/bison.m4       18 Jan 2007 08:32:33 -0000      1.14
+++ data/bison.m4       18 Jan 2007 10:58:52 -0000
@@ -85,7 +85,9 @@
 
 # b4_warn(FORMAT, [ARG1], [ARG2], ...)
 # ------------------------------------
-# Write @warn(FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Write @warn(FORMAT@,ARG1@,ARG2@,...@) to diversion 0.  Note that FORMAT
+# should not contain any macros, or the extraction of strings for translators
+# will not work correctly.
 #
 # As a simple test suite, this:
 #
@@ -118,12 +120,17 @@
 # b4_warn_at(START, END, FORMAT, [ARG1], [ARG2], ...)
 # ---------------------------------------------------
 # Write @warn(START@,END@,FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Note that FORMAT should not contain any macros, or the extraction of
+# strings for translators will not work correctly.
+
 m4_define([b4_warn_at],
 [b4_error_at([[warn]], $@)])
 
 # b4_complain(FORMAT, [ARG1], [ARG2], ...)
 # ----------------------------------------
 # Write @complain(FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Note that FORMAT should not contain any macros, or the extraction of
+# strings for translators will not work correctly.
 #
 # See the test suite for b4_warn above.
 m4_define([b4_complain],
@@ -132,12 +139,16 @@
 # b4_complain_at(START, END, FORMAT, [ARG1], [ARG2], ...)
 # -------------------------------------------------------
 # Write @complain(START@,END@,FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Note that FORMAT should not contain any macros, or the extraction of
+# strings for translators will not work correctly.
 m4_define([b4_complain_at],
 [b4_error_at([[complain]], $@)])
 
 # b4_fatal(FORMAT, [ARG1], [ARG2], ...)
 # -------------------------------------
 # Write @fatal(FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Note that FORMAT should not contain any macros, or the extraction of
+# strings for translators will not work correctly.
 #
 # See the test suite for b4_warn above.
 m4_define([b4_fatal],
@@ -146,6 +157,8 @@
 # b4_fatal_at(START, END, FORMAT, [ARG1], [ARG2], ...)
 # ----------------------------------------------------
 # Write @fatal(START@,END@,FORMAT@,ARG1@,ARG2@,...@) to diversion 0.
+# Note that FORMAT should not contain any macros, or the extraction of
+# strings for translators will not work correctly.
 m4_define([b4_fatal_at],
 [b4_error_at([[fatal]], $@)])
 
@@ -328,7 +341,7 @@
 m4_pushdef([b4_end], m4_shift(m4_shift(b4_occurrence)))dnl
 m4_ifndef($3[(]m4_quote(b4_user_name)[)],
           [b4_warn_at([b4_start], [b4_end],
-                      [[%s `%s' is not used]],
+                      [[`%s %s' is not used]],
                       [$1], [b4_user_name])])[]dnl
 m4_popdef([b4_occurrence])dnl
 m4_popdef([b4_user_name])dnl
@@ -391,12 +404,12 @@
 ## --------------------------------------------------------- ##
 
 m4_define([b4_check_user_names_wrap],
-[m4_ifdef([b4_percent_]$1[_user_]$2[s],
-          [b4_check_user_names([[%]$1 $2],
-                               [b4_percent_]$1[_user_]$2[s],
-                               [[b4_percent_]$1[_skeleton_]$2[s]])])])
+[m4_ifdef([b4_percent_]$1[_user_]$2,
+          [b4_check_user_names([[%]$1],
+                               [b4_percent_]$1[_user_]$2,
+                               [[b4_percent_]$1[_skeleton_]$2])])])
 
 m4_wrap([
-b4_check_user_names_wrap([[define]], [[variable]])
-b4_check_user_names_wrap([[code]], [[qualifier]])
+b4_check_user_names_wrap([[define]], [[variables]])
+b4_check_user_names_wrap([[code]], [[qualifiers]])
 ])
Index: data/glr.cc
===================================================================
RCS file: /sources/bison/bison/data/glr.cc,v
retrieving revision 1.38
diff -u -r1.38 glr.cc
--- data/glr.cc 18 Jan 2007 08:32:33 -0000      1.38
+++ data/glr.cc 18 Jan 2007 10:21:31 -0000
@@ -53,7 +53,7 @@
 
 # The header is mandatory.
 b4_defines_if([],
-              [b4_fatal([b4_skeleton[: using %%defines is mandatory]])])
+              [b4_fatal([[%s: using %%defines is mandatory]], [b4_skeleton])])
 
 m4_include(b4_pkgdatadir/[c++.m4])
 m4_include(b4_pkgdatadir/[location.cc])
Index: data/lalr1.cc
===================================================================
RCS file: /sources/bison/bison/data/lalr1.cc,v
retrieving revision 1.156
diff -u -r1.156 lalr1.cc
--- data/lalr1.cc       18 Jan 2007 08:32:33 -0000      1.156
+++ data/lalr1.cc       18 Jan 2007 10:21:33 -0000
@@ -27,7 +27,7 @@
 
 # The header is mandatory.
 b4_defines_if([],
-              [b4_fatal([b4_skeleton[: using %%defines is mandatory]])])
+              [b4_fatal([[%s: using %%defines is mandatory]], [b4_skeleton])])
 
 # Backward compatibility.
 m4_define([b4_location_constructors])
Index: data/m4-xgettext.sh
===================================================================
RCS file: data/m4-xgettext.sh
diff -N data/m4-xgettext.sh
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ data/m4-xgettext.sh 18 Jan 2007 10:21:33 -0000
@@ -0,0 +1,186 @@
+#! /bin/sh
+
+# Extract translated strings from the skeletons.
+
+# This file is part of GNU Bison
+# Copyright 2007 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+# 02111-1307  USA
+
+# Usage: m4-xgettext.sh input-file.m4 ... -- MACRO:ARG MACRO:ARG ...
+#
+# The script will extract the ARG-th argument of each specified
+# macro.
+#
+# We should use a technique similar to what is done in autoupdate
+# to look inside every macro invocation.  For now, however, we rely
+# on something much simpler.  We produce completely garbled output
+# but there is actually no need to look at it because we only
+# need the autom4te traces.  It only breaks in the case of a macro
+# that defines another macro: there is an example of such a thing in
+# bison.m4 and we work around it below.
+
+: ${srcdir=.}
+: ${AUTOM4TE=autom4te}
+
+gettext_m4=`pwd`/gettext.m4
+trap 'exit_status=$?
+      rm -f $gettext_m4
+      exit $exit_status' 0
+
+lc_all_sed ()
+{
+  LC_ALL=C sed "$@"
+}
+
+# ------------------------------
+# Gather command line arguments.
+# ------------------------------
+
+AUTOM4TE_ARGS="-l M4sugar --no-cache"
+while true; do
+  case "$1" in
+    -B)
+      AUTOM4TE_ARGS="$AUTOM4TE_ARGS -B $2"
+      file_prefix=$2/
+      shift
+      shift
+      ;;
+    -B*)
+      AUTOM4TE_ARGS="$AUTOM4TE_ARGS $1"
+      file_prefix=`echo $1 | sed 's,-B,,' `/
+      shift
+      ;;
+    -M|--melt|-v|--verbose|-d|--debug)
+      AUTOM4TE_ARGS="$AUTOM4TE_ARGS $1"
+      shift
+      ;;
+    *)
+      break
+      ;;
+  esac
+done
+
+# ---------------------------------------
+# Create the m4 file that will be traced.
+# ---------------------------------------
+
+cat > gettext.m4 <<\_EOF
+# Use a different prefix for macros that we need later.
+m4_copy([m4_builtin], [_m4_builtin])
+m4_copy([m4_ifdef], [_m4_ifdef])
+m4_copy([m4_define], [_m4_define])
+
+# Throw away all the output.  We only use the autom4te traces.
+m4_define([m4_divert_push], [])
+m4_define([m4_divert_pop], [])
+m4_define([m4_divert], [])
+
+# Throw away include files, we include all we need explicitly.
+m4_define([m4_include], address@hidden)
+m4_define([m4_sinclude], address@hidden)
+
+# Process both sides of conditional macros.
+m4_define([m4_eval], [$1])
+m4_define([m4_foreach], [$3])
+m4_define([m4_map], [$2])
+m4_define([m4_map_sep], [$2])
+m4_define([m4_if], address@hidden)
+m4_define([m4_ifdef], [$2[]$3])
+m4_define([m4_ifndef], [$2[]$3])
+m4_define([m4_ifval], [$2[]$3])
+m4_define([m4_case], address@hidden)
+
+# Throw away the special meanings of macros defined by/for the skeletons:
+# just expand the definition in order to get the occurrences of traced
+# macros in the expansion, then define the macro to "$@".
+#
+# Don't redefine already defined macros, though, so that we can override
+# problematic cases below.
+
+_m4_define([m4_define], [$2[]_m4_ifdef([$1], [], [_m4_define([$1], address@hidden)])])
+_m4_define([m4_pushdef], [$2[]_m4_ifdef([$1], [], [_m4_define([$1], address@hidden)])])
+_m4_define([m4_define_default], [$2[]_m4_ifdef([$1], [], [_m4_define([$1], address@hidden)])])
+
+# Almost the same as above, but no need to expand the macro name.
+_m4_define([m4_copy], [_m4_ifdef([$1], [], [_m4_define([$1], address@hidden)])])
+
+# These add nothing, since we expand the definition of the macro below.
+_m4_define([m4_defn], [])
+_m4_define([m4_indir], [])
+_m4_define([m4_shift], [])
+
+# This is unnecessary because our m4_pushdef is equivalent to m4_define.
+_m4_define([m4_popdef], [])
+
+# A hack: we need to override the definition of b4_define_flag_if in bison.m4,
+# because it defines other macros.
+_m4_define([b4_define_flag_if],
+[_m4_define([b4_$1_if], [$][1][$][2])])])
+
+# Include bison.m4 first to get the b4_*_if macros.
+_m4_builtin([include], [bison.m4])
+
+_EOF
+
+# Add includes for the requested files.
+while [ $# -gt 0 ] && [ "$1" != -- ]; do
+  case "$1" in
+    bison.m4 | gettext.m4 ) ;;
+    *) echo "_m4_builtin([include], [$1])" ;;
+  esac
+  shift
+done >> gettext.m4
+
+# -------------------------------------
+# Compose the -t arguments to autom4te.
+# -------------------------------------
+
+shift
+args="$*"
+set fnord
+for i in $args; do
+  name=`echo $i | sed 's,:.*,,' `
+  arg=`echo $i | sed 's,.*:,,' `
+  fmt=
+
+  set -- "$@" -t "$name:\$l,,${file_prefix}\$f,,\$n,,\$$arg"
+done
+
+# Remove the fnord.
+shift
+
+# ------------------
+# Gather the traces.
+# ------------------
+
+# The sed script formats the traces as C source code so that xgettext
+# interprets the #line directives.
+#
+# We need to remove [...] to cater for double-quoted arguments, and
+# to format the argument as a C string.
+
+cd $srcdir
+$AUTOM4TE $AUTOM4TE_ARGS "$@" $gettext_m4 | lc_all_sed                 \
+  -e 's:\(.*\),,\(.*\),,\(.*\),,\(.*\):#line \1 "\2"                   \
+_(\4) /* \3 */: '                                                      \
+  -e 'P'                                                               \
+  -e 's:.*\n::'                                                        \
+  -e 's:\\:\\\\:g'                                                     \
+  -e 's:":\\":g'                                                       \
+  -e 's:^_(\[\(.*\)]) /\*:_(\1) /*:g'                                  \
+  -e 's:^_(\(.*\)) /\*:_("\1") /*:g' || exit 1
+
Index: po/POTFILES.in
===================================================================
RCS file: /sources/bison/bison/po/POTFILES.in,v
retrieving revision 1.23
diff -u -r1.23 POTFILES.in
--- po/POTFILES.in      1 Oct 2006 23:35:37 -0000       1.23
+++ po/POTFILES.in      18 Jan 2007 10:21:34 -0000
@@ -1,3 +1,5 @@
+data/m4strings.c
+
 src/complain.c
 src/conflicts.c
 src/files.c
Index: src/parse-gram.y
===================================================================
RCS file: /sources/bison/bison/src/parse-gram.y,v
retrieving revision 1.117
diff -u -r1.117 parse-gram.y
--- src/parse-gram.y    18 Jan 2007 02:18:17 -0000      1.117
+++ src/parse-gram.y    18 Jan 2007 10:33:30 -0000
@@ -242,7 +242,7 @@
       strcpy (name + sizeof name_prefix - 1, $2);
       strcpy (name + sizeof name_prefix - 1 + length, ")");
       if (muscle_find_const (name))
-        warn_at (@2, _("%s `%s' redefined"), "%define variable", $2);
+        warn_at (@2, _("`%%define %s' redefined"), $2);
       MUSCLE_INSERT_STRING (uniqstr_new (name), $3);
       free (name);
       muscle_grow_user_name_list ("percent_define_user_variables", $2, @2);
Index: tests/input.at
===================================================================
RCS file: /sources/bison/bison/tests/input.at,v
retrieving revision 1.75
diff -u -r1.75 input.at
--- tests/input.at      17 Jan 2007 08:36:07 -0000      1.75
+++ tests/input.at      18 Jan 2007 10:21:35 -0000
@@ -718,9 +718,9 @@
 start: ;
 ]])
 AT_CHECK([[bison input-c.y]], [0], [],
-[[input-c.y:1.7: warning: %code qualifier `q' is not used
-input-c.y:2.7-9: warning: %code qualifier `bad' is not used
-input-c.y:3.7-9: warning: %code qualifier `bad' is not used
+[[input-c.y:1.7: warning: `%code q' is not used
+input-c.y:2.7-9: warning: `%code bad' is not used
+input-c.y:3.7-9: warning: `%code bad' is not used
 ]])
 
 AT_DATA([input-c-glr.y],
@@ -731,9 +731,9 @@
 start: ;
 ]])
 AT_CHECK([[bison input-c-glr.y]], [0], [],
-[[input-c-glr.y:1.7: warning: %code qualifier `q' is not used
-input-c-glr.y:2.7-9: warning: %code qualifier `bad' is not used
-input-c-glr.y:3.8-10: warning: %code qualifier `bad' is not used
+[[input-c-glr.y:1.7: warning: `%code q' is not used
+input-c-glr.y:2.7-9: warning: `%code bad' is not used
+input-c-glr.y:3.8-10: warning: `%code bad' is not used
 ]])
 
 AT_DATA([input-c++.y],
@@ -744,9 +744,9 @@
 start: ;
 ]])
 AT_CHECK([[bison input-c++.y]], [0], [],
-[[input-c++.y:1.7: warning: %code qualifier `q' is not used
-input-c++.y:2.7-9: warning: %code qualifier `bad' is not used
-input-c++.y:3.8: warning: %code qualifier `q' is not used
+[[input-c++.y:1.7: warning: `%code q' is not used
+input-c++.y:2.7-9: warning: `%code bad' is not used
+input-c++.y:3.8: warning: `%code q' is not used
 ]])
 
 AT_DATA([input-c++-glr.y],
@@ -757,9 +757,9 @@
 start: ;
 ]])
 AT_CHECK([[bison input-c++-glr.y]], [0], [],
-[[input-c++-glr.y:1.7-9: warning: %code qualifier `bad' is not used
-input-c++-glr.y:2.7: warning: %code qualifier `q' is not used
-input-c++-glr.y:3.7: warning: %code qualifier `q' is not used
+[[input-c++-glr.y:1.7-9: warning: `%code bad' is not used
+input-c++-glr.y:2.7: warning: `%code q' is not used
+input-c++-glr.y:3.7: warning: `%code q' is not used
 ]])
 
 AT_DATA([special-char-@@.y],
@@ -770,9 +770,9 @@
 start: ;
 ]])
 AT_CHECK([[bison special-char-@@.y]], [0], [],
-[[special-char-@@.y:1.7-9: warning: %code qualifier `bad' is not used
-special-char-@@.y:2.7: warning: %code qualifier `q' is not used
-special-char-@@.y:3.7: warning: %code qualifier `q' is not used
+[[special-char-@@.y:1.7-9: warning: `%code bad' is not used
+special-char-@@.y:2.7: warning: `%code q' is not used
+special-char-@@.y:3.7: warning: `%code q' is not used
 ]])
 
 AT_DATA([special-char-@:>@.y],
@@ -783,9 +783,9 @@
 start: ;
 ]])
 AT_CHECK([[bison special-char-@:>@.y]], [0], [],
-[[special-char-@:>@.y:1.7-9: warning: %code qualifier `bad' is not used
-special-char-@:>@.y:2.7: warning: %code qualifier `q' is not used
-special-char-@:>@.y:3.7: warning: %code qualifier `q' is not used
+[[special-char-@:>@.y:1.7-9: warning: `%code bad' is not used
+special-char-@:>@.y:2.7: warning: `%code q' is not used
+special-char-@:>@.y:3.7: warning: `%code q' is not used
 ]])
 
 AT_CLEANUP
@@ -808,13 +808,13 @@
 ]])
 
 AT_CHECK([[bison input.y]], [0], [],
-[[input.y:2.9-11: warning: %define variable `var' redefined
-input.y:3.10-12: warning: %define variable `var' redefined
-input.y:1.9-11: warning: %define variable `var' is not used
-input.y:2.9-11: warning: %define variable `var' is not used
-input.y:3.10-12: warning: %define variable `var' is not used
-input.y:4.9-16: warning: %define variable `special1' is not used
-input.y:5.9-16: warning: %define variable `special2' is not used
+[[input.y:2.9-11: warning: `%define var' redefined
+input.y:3.10-12: warning: `%define var' redefined
+input.y:1.9-11: warning: `%define var' is not used
+input.y:2.9-11: warning: `%define var' is not used
+input.y:3.10-12: warning: `%define var' is not used
+input.y:4.9-16: warning: `%define special1' is not used
+input.y:5.9-16: warning: `%define special2' is not used
 ]])
 
 AT_CLEANUP
