From: Patrice Dumas
Subject: branch master updated: One function in Texinfo::Common to handle file name encoding
Date: Thu, 24 Feb 2022 17:42:29 -0500
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 69aa96fccc One function in Texinfo::Common to handle file name encoding
69aa96fccc is described below
commit 69aa96fccccb2fe1fa6e8609f80e697b977264be
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Thu Feb 24 23:42:15 2022 +0100
One function in Texinfo::Common to handle file name encoding
* tp/Texinfo/Common.pm (encode_file_name),
tp/Texinfo/Convert/Converter.pm (encoded_file_name),
tp/Texinfo/Convert/DocBook.pm, tp/Texinfo/Convert/HTML.pm,
tp/Texinfo/Convert/IXIN.pm, tp/Texinfo/Convert/Info.pm,
tp/Texinfo/Convert/LaTeX.pm, tp/Texinfo/Convert/Utils.pm
(expand_verbatiminclude), tp/Texinfo/ParserNonXS.pm: put the
main function encode_file_name() doing file name encoding
in Texinfo::Common and use encoded_file_name for Converters.
Return the file name encoding if there is a need to decode
the file name for error messages.
* tp/Texinfo/ParserNonXS.pm (_save_line_directive): encode CPP
line directive file name.
---
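The round trip described in the log message (encode a file name to bytes
for file system access, keep the encoding so the name can be decoded
again for error messages) can be sketched as below. This is only an
illustration under stated assumptions: sketch_encode_file_name is a
hypothetical stand-alone helper, not the function added to
Texinfo::Common, and returning the name unchanged when no encoding is
known is an assumption of the sketch, not behaviour taken from the
commit.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper modelled on the idea of encode_file_name();
# it is not the Texinfo implementation.
sub sketch_encode_file_name {
  my ($file_name, $input_encoding) = @_;

  # Assumption of this sketch: with no known encoding, return the
  # name unchanged and report no encoding.
  return ($file_name, undef) unless defined $input_encoding;

  if ($input_encoding eq 'utf-8' or $input_encoding eq 'utf-8-strict') {
    utf8::encode($file_name);   # in place: characters -> UTF-8 bytes
    return ($file_name, 'utf-8');
  }
  return (Encode::encode($input_encoding, $file_name), $input_encoding);
}

# Character string, as it would come out of a parsed document.
my $text = "os\x{e9}.texi";
my ($bytes, $encoding) = sketch_encode_file_name($text, 'utf-8');

# $bytes is what should be passed to -e, open() and similar.
# For a message, decode back to characters with the returned encoding.
my $for_message = defined($encoding)
  ? Encode::decode($encoding, $bytes)
  : $bytes;
```

Returning the encoding together with the bytes means a caller does not
have to look up the document encoding a second time when it builds an
error message.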
ChangeLog | 18 ++++++
tp/TODO | 11 ++++
tp/Texinfo/Common.pm | 27 +++++++++
tp/Texinfo/Convert/Converter.pm | 30 ++++------
tp/Texinfo/Convert/DocBook.pm | 3 +-
tp/Texinfo/Convert/HTML.pm | 4 +-
tp/Texinfo/Convert/IXIN.pm | 3 +-
tp/Texinfo/Convert/Info.pm | 3 +-
tp/Texinfo/Convert/LaTeX.pm | 3 +-
tp/Texinfo/Convert/Utils.pm | 38 ++++++++----
tp/Texinfo/ParserNonXS.pm | 33 +++++------
tp/t/input_files/cpp_lines.texi | 4 ++
tp/t/results/include/cpp_lines.pl | 63 +++++++++++++++++++-
tp/t/test_utils.pl | 55 ++++++++++++-----
tp/tests/formatting/list-of-tests | 5 +-
"tp/tests/formatting/os\303\251.texi" | 4 ++
.../formatting/res_parser/cpp_lines/cpp_lines.1 | 0
.../formatting/res_parser/cpp_lines/cpp_lines.2 | 3 +
.../formatting/res_parser/cpp_lines/cpp_lines.html | 68 ++++++++++++++++++++++
.../non_ascii_command_line/Chapteur.html | 2 +
.../os\303\251-texinfo.texi" | 4 ++
.../non_ascii_command_line/os\303\251.2" | 2 +
tp/tests/test_scripts/formatting_cpp_lines.sh | 19 ++++++
23 files changed, 333 insertions(+), 69 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 3d46c554fc..68261ceb60 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,21 @@
+2022-02-24 Patrice Dumas <pertusus@free.fr>
+
+ One function in Texinfo::Common to handle file name encoding
+
+ * tp/Texinfo/Common.pm (encode_file_name),
+ tp/Texinfo/Convert/Converter.pm (encoded_file_name),
+ tp/Texinfo/Convert/DocBook.pm, tp/Texinfo/Convert/HTML.pm,
+ tp/Texinfo/Convert/IXIN.pm, tp/Texinfo/Convert/Info.pm,
+ tp/Texinfo/Convert/LaTeX.pm, tp/Texinfo/Convert/Utils.pm
+ (expand_verbatiminclude), tp/Texinfo/ParserNonXS.pm: put the
+ main function encode_file_name() doing file name encoding
+ in Texinfo::Common and use encoded_file_name for Converters.
+ Return the file name encoding if there is a need to decode
+ the file name for error messages.
+
+ * tp/Texinfo/ParserNonXS.pm (_save_line_directive): encode CPP
+ line directive file name.
+
2022-02-24 Gavin Smith <gavinsmith0123@gmail.com>
Include file name encoding for XS parser
diff --git a/tp/TODO b/tp/TODO
index 9143ff6a06..fbc3b5878a 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -19,6 +19,17 @@ Before next release
for @example args, use *-user as class?
+
+byte encoding, check how used, check XS parser?
+l 3226 ParserNonXS.pm
+ unshift @{$self->{'input'}}, {
+ 'name' => $file,
+
+bytes: (global_information)
+$self->{'info'}->{'input_file_name'}
+$self->{'info'}->{'input_directory'}
+
+
Bugs
====
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 746e1e4a60..b04ae79c49 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -1505,6 +1505,33 @@ sub parse_node_manual($)
# misc functions also interesting for converters
+# Reverse the decoding of the file name from the input encoding. When
+# dealing with file names, we want Perl strings representing sequences of
+# bytes, not Unicode codepoints.
+# This is necessary even if the name of the included file is purely
+# ASCII, as the name of the directory it is located within may contain
+# non-ASCII characters.
+# Otherwise, the -e operator and similar may not work correctly.
+# TODO document and add the possibility to use configuration_information
+sub encode_file_name($$;$)
+{
+ my $configuration_information = shift;
+ my $file_name = shift;
+ my $input_encoding = shift;
+
+ my $encoding;
+
+ if ($input_encoding and ($input_encoding eq 'utf-8'
+ or $input_encoding eq 'utf-8-strict')) {
+ utf8::encode($file_name);
+ $encoding = 'utf-8';
+ } else {
+ $file_name = Encode::encode($input_encoding, $file_name);
+ $encoding = $input_encoding;
+ }
+ return ($file_name, $encoding);
+}
+
sub locate_include_file($$)
{
my $configuration_information = shift;
diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
index b70c0d1e56..4ca8a64835 100644
--- a/tp/Texinfo/Convert/Converter.pm
+++ b/tp/Texinfo/Convert/Converter.pm
@@ -1009,36 +1009,26 @@ sub present_bug_message($$;$)
warn "You found a bug: $message\n\n".$additional_information;
}
-# Reverse the decoding of the file name from the input encoding. When
-# dealing with file names, we want Perl strings representing sequences of
-# bytes, not Unicode codepoints.
-# This is necessary even if the name of the included file is purely
-# ASCII, as the name of the directory it is located within may contain
-# non-ASCII characters.
-# Otherwise, the -e operator and similar may not work correctly.
-sub encode_file_name($$)
+# Reverse the decoding of the file name from the input encoding.
+# TODO document
+sub encoded_file_name($$)
{
my $self = shift;
my $file_name = shift;
- # FIXME use the locale instead?
- my $info = $self->{'parser_info'};
- if ($info) {
- my $encoding = $info->{'input_perl_encoding'};
- if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
- utf8::encode($file_name);
- } else {
- $file_name = Encode::encode($encoding, $file_name);
- }
- }
- return $file_name;
+ my $document_encoding;
+ $document_encoding = $self->{'parser_info'}->{'input_perl_encoding'}
+ if ($self->{'parser_info'}
+ and defined($self->{'parser_info'}->{'input_perl_encoding'}));
+ return Texinfo::Common::encode_file_name($self, $file_name, $document_encoding);
}
sub txt_image_text($$$)
{
my ($self, $element, $basefile) = @_;
- my $text_file_name = $self->encode_file_name($basefile.'.txt');
+ my ($text_file_name, $file_name_encoding)
+ = $self->encoded_file_name($basefile.'.txt');
my $txt_file = Texinfo::Common::locate_include_file($self, $text_file_name);
if (!defined($txt_file)) {
diff --git a/tp/Texinfo/Convert/DocBook.pm b/tp/Texinfo/Convert/DocBook.pm
index e997d6f542..c240c25a02 100644
--- a/tp/Texinfo/Convert/DocBook.pm
+++ b/tp/Texinfo/Convert/DocBook.pm
@@ -1118,7 +1118,8 @@ sub _convert($$;$)
}
my @files;
foreach my $extension (@docbook_image_extensions) {
- my $file_name = $self->encode_file_name("$basefile.$extension");
+ my ($file_name, $file_name_encoding)
+ = $self->encoded_file_name("$basefile.$extension");
if ($self->Texinfo::Common::locate_include_file($file_name)) {
push @files, ["$basefile.$extension", uc($extension)];
}
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 484ef4e09d..374b41c4d8 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -271,7 +271,8 @@ sub html_image_file_location_name($$$$)
unshift @extensions, ("$extension", ".$extension");
}
foreach my $extension (@extensions) {
- my $file_name = $self->encode_file_name($image_basefile.$extension);
+ my ($file_name, $file_name_encoding)
+ = $self->encoded_file_name($image_basefile.$extension);
my $located_image_path
= $self->Texinfo::Common::locate_include_file($file_name);
if (defined($located_image_path) and $located_image_path ne '') {
@@ -296,6 +297,7 @@ sub html_image_file_location_name($$$$)
}
}
}
+ # TODO set and return $image_path_encoding?
return ($image_file, $image_basefile, $image_extension, $image_path);
}
diff --git a/tp/Texinfo/Convert/IXIN.pm b/tp/Texinfo/Convert/IXIN.pm
index aca2ca24e3..2ec7066016 100644
--- a/tp/Texinfo/Convert/IXIN.pm
+++ b/tp/Texinfo/Convert/IXIN.pm
@@ -839,7 +839,8 @@ sub output_ixin($$)
}
foreach my $extension (@extension, @image_files_extensions) {
my $file_name_text = "$basefile.$extension";
- my $file_name = $self->encode_file_name($file_name_text);
+ my ($file_name, $file_name_encoding)
+ = $self->encoded_file_name($file_name_text);
my $file = $self->Texinfo::Common::locate_include_file($file_name);
if (defined($file)) {
my $filehandle = do { local *FH };
diff --git a/tp/Texinfo/Convert/Info.pm b/tp/Texinfo/Convert/Info.pm
index 712d2d3d8f..7d0be98af3 100644
--- a/tp/Texinfo/Convert/Info.pm
+++ b/tp/Texinfo/Convert/Info.pm
@@ -510,7 +510,8 @@ sub format_image($$)
}
my $image_file;
foreach my $extension (@extensions) {
- my $file_name = $self->encode_file_name($basefile.$extension);
+ my ($file_name, $file_name_encoding)
+ = $self->encoded_file_name($basefile.$extension);
if ($self->Texinfo::Common::locate_include_file($file_name)) {
# use the basename and not the file found. It is agreed that it is
# better, since in any case the files are moved.
diff --git a/tp/Texinfo/Convert/LaTeX.pm b/tp/Texinfo/Convert/LaTeX.pm
index 666fc488bd..0e1bf16fb3 100644
--- a/tp/Texinfo/Convert/LaTeX.pm
+++ b/tp/Texinfo/Convert/LaTeX.pm
@@ -2308,7 +2308,8 @@ sub _convert($$)
my $image_file;
foreach my $extension (@LaTeX_image_extensions) {
- my $file_name = $self->encode_file_name("$basefile.$extension");
+ my ($file_name, $file_name_encoding)
+ = $self->encoded_file_name("$basefile.$extension");
my $located_file =
$self->Texinfo::Common::locate_include_file($file_name);
if (defined($located_file)) {
diff --git a/tp/Texinfo/Convert/Utils.pm b/tp/Texinfo/Convert/Utils.pm
index 317ae979c5..218a986f7e 100644
--- a/tp/Texinfo/Convert/Utils.pm
+++ b/tp/Texinfo/Convert/Utils.pm
@@ -196,28 +196,38 @@ sub expand_verbatiminclude($$$)
my $configuration_information = shift;
my $current = shift;
- return unless ($current->{'extra'} and defined($current->{'extra'}->{'text_arg'}));
+ my $input_encoding;
+
+ return unless ($current->{'extra'}
+ and defined($current->{'extra'}->{'text_arg'}));
my $file_name_text = $current->{'extra'}->{'text_arg'};
- # FIXME $file_name_text should be encoded to the file system
- # encoding here to be passed to locate_include_file
+ $input_encoding = $current->{'extra'}->{'input_perl_encoding'}
+ if (defined($current->{'extra'}->{'input_perl_encoding'}));
+
+ my ($file_name, $file_name_encoding)
+ = Texinfo::Common::encode_file_name($configuration_information,
+ $file_name_text,
+ $input_encoding);
+
my $file = Texinfo::Common::locate_include_file($configuration_information,
- $file_name_text);
+ $file_name);
my $verbatiminclude;
if (defined($file)) {
if (!open(VERBINCLUDE, $file)) {
if ($registrar) {
- # FIXME $file should be decoded to perl internal codepoints here
+ my $decoded_file = $file;
+ # need to decode to the internal perl codepoints for error message
+ $decoded_file = Encode::decode($file_name_encoding, $file)
+ if (defined($file_name_encoding));
$registrar->line_error($configuration_information,
- sprintf(__("could not read %s: %s"), $file, $!),
- $current->{'line_nr'});
+ sprintf(__("could not read %s: %s"), $decoded_file, $!),
+ $current->{'line_nr'});
}
} else {
- if (defined $current->{'extra'}->{'input_perl_encoding'}) {
- binmode(VERBINCLUDE, ":encoding("
- . $current->{'extra'}->{'input_perl_encoding'}
- . ")");
+ if (defined($input_encoding)) {
+ binmode(VERBINCLUDE, ":encoding(" . $input_encoding . ")");
}
$verbatiminclude = { 'cmdname' => 'verbatim',
'parent' => $current->{'parent'},
@@ -229,10 +239,14 @@ sub expand_verbatiminclude($$$)
}
if (!close (VERBINCLUDE)) {
if ($registrar) {
+ my $decoded_file = $file;
+ # need to decode to the internal perl codepoints for error message
+ $decoded_file = Encode::decode($file_name_encoding, $file)
+ if (defined($file_name_encoding));
$registrar->document_warn(
$configuration_information, sprintf(__(
"error on closing \@verbatiminclude file %s: %s"),
- $file, $!));
+ $decoded_file, $!));
}
}
}
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 6845082616..ac48d2c1a0 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -1989,7 +1989,13 @@ sub _save_line_directive
my $input = $self->{'input'}->[0];
return if !$input;
$input->{'line_nr'} = $line_nr if $line_nr;
- $input->{'name'} = $file_name if $file_name;
+ # need to convert to bytes for file name
+ if (defined($file_name)) {
+ my ($encoded_file_name, $file_name_encoding)
+ = Texinfo::Common::encode_file_name($self, $file_name,
+ $self->{'info'}->{'input_perl_encoding'});
+ $input->{'name'} = $encoded_file_name;
+ }
}
# returns next text fragment, be it pending from a macro expansion or
@@ -3206,21 +3212,11 @@ sub _end_line($$$)
} elsif ($superfluous_arg) {
# An error message is issued below.
} elsif ($command eq 'include') {
- my $file_name = $text;
- # When dealing with file names, we want Perl strings representing sequences
+ # We want Perl strings representing sequences
# of bytes, not codepoints in the internal perl encoding.
- # This is necessary even if the name of the included file is purely
- # ASCII, as the name of the directory it is located within may contain
- # non-ASCII characters.
- # Otherwise, the -e operator and similar may not work correctly.
- if (defined $self->{'info'}->{'input_perl_encoding'}) {
- my $encoding = $self->{'info'}->{'input_perl_encoding'};
- if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
- utf8::encode($file_name);
- } else {
- $file_name = Encode::encode($encoding, $file_name);
- }
- }
+ my ($file_name, $file_name_encoding)
+ = Texinfo::Common::encode_file_name($self, $text,
+ $self->{'info'}->{'input_perl_encoding'});
my $file = Texinfo::Common::locate_include_file($self, $file_name);
if (defined($file)) {
my $filehandle = do { local *FH };
@@ -3233,13 +3229,16 @@ sub _end_line($$$)
'line_nr' => 0,
'pending' => [],
'fh' => $filehandle };
+ # TODO note that it is bytes. No reason to have it used much
+ # Make sure to document that it is bytes.
+ # TODO add $file_name_encoding information?
$current->{'extra'}->{'file'} = $file;
# we set the type to replaced to tell converters not to
# expand the @-command
$current->{'type'} = 'replaced';
} else {
- # FIXME $text does not show the include directory. However using $file
- # would require to decode it to perl internal codepoints
+ # FIXME $text does not show the include directory. Using $file
+ # would require to decode it to perl internal codepoints with $file_name_encoding
$self->_command_error($current, $line_nr,
__("\@%s: could not open %s: %s"),
$command, $text, $!);
diff --git a/tp/t/input_files/cpp_lines.texi b/tp/t/input_files/cpp_lines.texi
index d3e56b6f4e..06dbde59f4 100644
--- a/tp/t/input_files/cpp_lines.texi
+++ b/tp/t/input_files/cpp_lines.texi
@@ -47,4 +47,8 @@ line before
@email{after verb}
+# line 5 "accentêd"
+
+@documentlanguage làng
+
@bye
diff --git a/tp/t/results/include/cpp_lines.pl b/tp/t/results/include/cpp_lines.pl
index bbf2e73de5..3b942f5488 100644
--- a/tp/t/results/include/cpp_lines.pl
+++ b/tp/t/results/include/cpp_lines.pl
@@ -675,6 +675,47 @@ $result_trees{'cpp_lines'} = {
{
'parent' => {},
'text' => '
+',
+ 'type' => 'empty_line'
+ },
+ {
+ 'parent' => {},
+ 'text' => '
+',
+ 'type' => 'empty_line'
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'parent' => {},
+ 'text' => "l\x{e0}ng"
+ }
+ ],
+ 'extra' => {
+ 'spaces_after_argument' => '
+'
+ },
+ 'parent' => {},
+ 'type' => 'line_arg'
+ }
+ ],
+ 'cmdname' => 'documentlanguage',
+ 'extra' => {
+ 'spaces_before_argument' => ' ',
+ 'text_arg' => 'là ng'
+ },
+ 'line_nr' => {
+ 'file_name' => 'accentêd',
+ 'line_nr' => 7,
+ 'macro' => ''
+ },
+ 'parent' => {}
+ },
+ {
+ 'parent' => {},
+ 'text' => '
',
'type' => 'empty_line'
}
@@ -817,6 +858,11 @@ $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'contents'}[0]{'parent
$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'contents'}[1]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32];
$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[32]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[33]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[34]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0]{'contents'}[0]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'args'}[0]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[35]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
+$result_trees{'cpp_lines'}{'contents'}[1]{'contents'}[36]{'parent'} = $result_trees{'cpp_lines'}{'contents'}[1];
$result_trees{'cpp_lines'}{'contents'}[1]{'extra'}{'node_content'}[0] = $result_trees{'cpp_lines'}{'contents'}[1]{'args'}[0]{'contents'}[0];
$result_trees{'cpp_lines'}{'contents'}[1]{'extra'}{'nodes_manuals'}[0]{'node_content'}[0] = $result_trees{'cpp_lines'}{'contents'}[1]{'args'}[0]{'contents'}[0];
$result_trees{'cpp_lines'}{'contents'}[1]{'parent'} = $result_trees{'cpp_lines'};
@@ -873,6 +919,9 @@ line before
@email{after verb}
+
+@documentlanguage làng
+
@bye
';
@@ -915,6 +964,8 @@ after inc.
after verb
+
+
';
$result_nodes{'cpp_lines'} = {
@@ -933,7 +984,17 @@ $result_menus{'cpp_lines'} = {
'structure' => {}
};
-$result_errors{'cpp_lines'} = [];
+$result_errors{'cpp_lines'} = [
+ {
+ 'error_line' => "warning: l\x{e0}ng is not a valid language code
+",
+ 'file_name' => 'accentêd',
+ 'line_nr' => 7,
+ 'macro' => '',
+ 'text' => "l\x{e0}ng is not a valid language code",
+ 'type' => 'warning'
+ }
+];
$result_floats{'cpp_lines'} = {};
diff --git a/tp/t/test_utils.pl b/tp/t/test_utils.pl
index edf3f0ef15..d6d5e9ec77 100644
--- a/tp/t/test_utils.pl
+++ b/tp/t/test_utils.pl
@@ -27,6 +27,8 @@ require Texinfo::ModulePath;
Texinfo::ModulePath::init(undef, undef, 'updirs' => 2);
# For consistent test results, use the C locale
+# Note that this should prevent displaying some for non ascii characters
+# in error messages in particular
$ENV{LC_ALL} = 'C';
$ENV{LANGUAGE} = 'en';
@@ -34,17 +36,10 @@ $ENV{LANGUAGE} = 'en';
use Test::More;
-use Texinfo::Parser;
-use Texinfo::Convert::Text;
-use Texinfo::Convert::Texinfo;
-use Texinfo::Structuring;
-use Texinfo::Convert::Plaintext;
-use Texinfo::Convert::Info;
-use Texinfo::Convert::HTML;
-use Texinfo::Convert::TexinfoXML;
-use Texinfo::Convert::DocBook;
-use Texinfo::Convert::LaTeX;
-use Texinfo::Config;
+# to determine the locale encoding to output the Texinfo to Texinfo
+# result when regenerating
+use I18N::Langinfo qw(langinfo CODESET);
+use Encode;
use File::Basename;
use File::Copy;
use File::Compare; # standard since 5.004
@@ -57,6 +52,19 @@ use Storable qw(dclone); # standard in 5.007003
#use Struct::Compare;
use Getopt::Long qw(GetOptions);
+use Texinfo::Common;
+use Texinfo::Convert::Texinfo;
+use Texinfo::Config;
+use Texinfo::Parser;
+use Texinfo::Convert::Text;
+use Texinfo::Structuring;
+use Texinfo::Convert::Plaintext;
+use Texinfo::Convert::Info;
+use Texinfo::Convert::LaTeX;
+use Texinfo::Convert::HTML;
+use Texinfo::Convert::TexinfoXML;
+use Texinfo::Convert::DocBook;
+
# FIXME Is it really useful?
use vars qw(%result_texis %result_texts %result_trees %result_errors
%result_indices %result_sectioning %result_nodes %result_menus
@@ -105,6 +113,9 @@ foreach my $dir ('t', 't/results', $output_files_dir) {
}
}
+my $locale_encoding = langinfo(CODESET);
+$locale_encoding = undef if ($locale_encoding eq '');
+
ok(1);
our %formats = (
@@ -895,6 +906,8 @@ sub test($$)
$result = $parser->parse_texi_piece($test_text);
}
if (defined($test_input_file_name)) {
+ # FIXME should we need to encode or do we assume that
+ # $test_input_file_name is already bytes?
$parser->{'info'}->{'input_file_name'} = $test_input_file_name;
}
} else {
@@ -1144,8 +1157,16 @@ sub test($$)
print OUT 'use utf8;'."\n\n";
#print STDERR "Generate: ".Data::Dumper->Dump([$result], ['$res']);
+ # NOTE $test_name is in general used for directories and
+ # file names, and therefore should be be bytes. Here it is used as a
+ # text string, if non ascii, it should be decoded to internal
+ # perl codepoints as OUT is encoded as utf8. Alternatively it
+ # could be encoded to be used as file name, but it probably is not the
+ # best solution.
my $out_result;
{
+ # NOTE rare extra keys could be bytes. They could be incorrectly
+ # encoded here. Let's wait for actual cases before fixing.
local $Data::Dumper::Sortkeys = \&filter_tree_keys;
$out_result = Data::Dumper->Dump([$split_result],
['$result_trees{\''.$test_name.'\'}']);
}
@@ -1172,6 +1193,8 @@ sub test($$)
}
{
local $Data::Dumper::Sortkeys = 1;
+ # NOTE file names are bytes, therefore ther could be a need to
+ # decode them
$out_result .= Data::Dumper->Dump([$errors],
['$result_errors{\''.$test_name.'\'}']) ."\n\n";
$out_result .= Data::Dumper->Dump([$indices],
['$result_indices{\''.$test_name.'\'}']) ."\n\n"
if ($indices);
@@ -1207,8 +1230,13 @@ sub test($$)
print OUT $out_result;
close (OUT);
- print STDERR "--> $test_name\n".Texinfo::Convert::Texinfo::convert_to_texinfo($result)."\n"
- if ($self->{'generate'});
+ if ($self->{'generate'}) {
+ my $texinfo_text = Texinfo::Convert::Texinfo::convert_to_texinfo($result);
+ if (defined($locale_encoding)) {
+ $texinfo_text = Encode::encode($locale_encoding, $texinfo_text);
+ }
+ print STDERR "--> $test_name\n". $texinfo_text ."\n";
+ }
}
if (!$self->{'generate'}) {
%result_converted = ();
@@ -1377,6 +1405,7 @@ sub output_texi_file($)
mkdir $dir or die
unless (-d $dir);
my $file = "${dir}$test_name.texi";
+ # We have no idea about encodings, better use bytes everywhere
open (OUTFILE, ">$file") or die ("Open $file: $!\n");
my $first_line = "\\input texinfo \@c -*-texinfo-*-";
diff --git a/tp/tests/formatting/list-of-tests b/tp/tests/formatting/list-of-tests
index f3751f1e68..ae16d92253 100644
--- a/tp/tests/formatting/list-of-tests
+++ b/tp/tests/formatting/list-of-tests
@@ -10,6 +10,10 @@ simplest_test_css simplest.texi --css-include file.css
# check that command line overrides document
documentlanguage_cmdline documentlanguage.texi --document-language=fr
+# already tested in t/*.t, but here want to have a result with
+# accented characters in error messages
+cpp_lines ../../t/input_files/cpp_lines.texi
+
# some command-line arguments when incorrect cause texi2any to die.
# easily tested by calling directly ./texi2any.pl and checking visually:
# ./texi2any.pl --footnote-style=bâd
@@ -18,5 +22,4 @@ documentlanguage_cmdline documentlanguage.texi --document-language=fr
non_ascii_command_line osé.texi --html --split=Mekanïk --document-language=Destruktïw -c 'Kommandöh vâl' -D TÛT -D 'vùr ké' -U ôndef -c 'FORMAT_MENU mînù' --macro-expand=@OUT_DIR@osé-texinfo.texi --internal-links=@OUT_DIR@intérnal.txt --css-include çss.css --css-include cêss.css --css-ref=rëf --css-ref=öref
# test for the copying of image with non ascii characters for epub
-# to be added when it does not fail anymore
#non_ascii_test_epub osé.texi --init epub3.pm -c 'EPUB_CREATE_CONTAINER 0'
diff --git "a/tp/tests/formatting/os\303\251.texi" "b/tp/tests/formatting/os\303\251.texi"
index db36c2c7f1..10141774bc 100644
--- "a/tp/tests/formatting/os\303\251.texi"
+++ "b/tp/tests/formatting/os\303\251.texi"
@@ -21,3 +21,7 @@ value vùr @value{vùr}.
@image{dîrectory/imàge,,,âlt,.êxt}
@include not_existïng.téxi
+
+@verbatiminclude included_akçentêd.texi
+
+@verbatiminclude vi_not_existïng.téxi
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.1 b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.1
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2 b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2
new file mode 100644
index 0000000000..e493afa3e6
--- /dev/null
+++ b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.2
@@ -0,0 +1,3 @@
+g_f:74: @include: could not find file_with_cpp_lines.texi
+accentêd:7: warning: làng is not a valid language code
+cpp_lines.texi: warning: must specify a title with a title command or @top
diff --git a/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html
new file mode 100644
index 0000000000..d07733d1ad
--- /dev/null
+++ b/tp/tests/formatting/res_parser/cpp_lines/cpp_lines.html
@@ -0,0 +1,68 @@
+<!DOCTYPE html>
+<html>
+<!-- Created by texinfo, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Untitled Document</title>
+
+<meta name="description" content="Untitled Document">
+<meta name="keywords" content="Untitled Document">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="texi2any">
+<meta name="viewport" content="width=device-width,initial-scale=1">
+
+<style type="text/css">
+<!--
+span.program-in-footer {font-size: smaller}
+-->
+</style>
+
+
+</head>
+
+<body lang="en">
+
+
+<p><a class="email" href="mailto:before top">before top</a>.
+</p>
+<a class="node" id="Top"></a>
+<p># 10 25 209
+# 1 2
+</p>
+<pre class="verbatim">
+ #line 5 "f"
+</pre>
+
+<p><a class="email" href="mailto:after lacro def">after lacro def</a>
+</p>
+<p># line 7 "k"
+</p>
+<p><a class="email" href="mailto:after macro call">after macro call</a>.
+</p>
+
+<p><a class="email" href="mailto:after macrotwo def">after macrotwo def</a>
+</p>
+<p>line before
+# line 666 "x"
+</p>
+<p><a class="email" href="mailto:after macrotwo call">after macrotwo call</a>.
+</p>
+<p><a class="email" href="mailto:after inc">after inc</a>.
+</p>
+<p><tt class="verb">
+#line 5 "in verb"
+</tt>
+</p>
+<p><a class="email" href="mailto:after verb">after verb</a>
+</p>
+
+
+<hr>
+<p>
+ <span class="program-in-footer">This document was generated on <em class="emph">a sunny day</em> using <a class="uref" href="http://www.gnu.org/software/texinfo/"><em class="emph">texi2any</em></a>.</span>
+</p>
+
+
+</body>
+</html>
diff --git a/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html b/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
index 98c49e654a..71f800ef1a 100644
--- a/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
+++ b/tp/tests/formatting/res_parser/non_ascii_command_line/Chapteur.html
@@ -68,6 +68,8 @@ ul.mark-néni {list-style-type: "vàça"}
<img class="image" src="dîrectory/imàge.êxt" alt="âlt">
+
+
</div>
<hr>
<p>
diff --git "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi" "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
index 4ea951c406..e4ad1dc5aa 100644
--- "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
+++ "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251-texinfo.texi"
@@ -19,3 +19,7 @@ In included téxt.
@image{dîrectory/imàge,,,âlt,.êxt}
@include not_existïng.téxi
+
+@verbatiminclude included_akçentêd.texi
+
+@verbatiminclude vi_not_existïng.téxi
diff --git "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2" "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
index 4dbb7790d5..054aa9681a 100644
--- "a/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
+++ "b/tp/tests/formatting/res_parser/non_ascii_command_line/os\303\251.2"
@@ -3,3 +3,5 @@ texi2any: warning: Destruktïw is not a valid language code
texi2any: warning: unknown variable from command line: Kommandöh
osé.texi:23: @include: could not find not_existïng.téxi
osé.texi:21: warning: @image file `dîrectory/imàge' (for HTML) not found, using `dîrectory/imàge.êxt'
+osé.texi:25: @verbatiminclude: could not find included_akçentêd.texi
+osé.texi:27: @verbatiminclude: could not find vi_not_existïng.téxi
diff --git a/tp/tests/test_scripts/formatting_cpp_lines.sh b/tp/tests/test_scripts/formatting_cpp_lines.sh
new file mode 100755
index 0000000000..c20e239e41
--- /dev/null
+++ b/tp/tests/test_scripts/formatting_cpp_lines.sh
@@ -0,0 +1,19 @@
+#! /bin/sh
+# This file generated by maintain/regenerate_cmd_tests.sh
+
+if test z"$srcdir" = "z"; then
+ srcdir=.
+fi
+
+one_test_logs_dir=test_log
+
+
+dir=formatting
+name='cpp_lines'
+mkdir -p $dir
+
+"$srcdir"/run_parser_all.sh -dir $dir $name
+exit_status=$?
+cat $dir/$one_test_logs_dir/$name.log
+exit $exit_status
+