[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] libpoke: Enable octal and hex \-sequence in string litera
From: |
Jose E. Marchesi |
Subject: |
Re: [PATCH v2] libpoke: Enable octal and hex \-sequence in string literals |
Date: |
Tue, 03 Nov 2020 12:58:27 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) |
Hi Mohammad.
> 2020-11-03 Mohammad-Reza Nabipoor <m.nabipoor@yahoo.com>
>
> * libpoke/pkl-trans.c (pkl_trans1_ps_string): Add support for octal
> and hexadecimal escape sequences.
> * doc/poke.texi (String Literals): Document new escape sequences.
> * testsuite/poke.pkl/strings-esc-1.pk: New test.
> * testsuite/poke.pkl/strings-esc-2.pk: Likewise.
> * testsuite/poke.pkl/strings-esc-diag-1.pk: Likewise.
> * testsuite/poke.pkl/strings-esc-diag-2.pk: Likewise.
> * testsuite/poke.pkl/strings-esc-diag-3.pk: Likewise.
> * testsuite/poke.pkl/strings-esc-diag-4.pk: Likewise.
> * testsuite/poke.pkl/string-diag-1.pk: `\x` is now a valid sequence;
> use `\z` as an invalid escape sequence.
> * testsuite/Makefile.am (EXTRA_DIST): Add new tests.
> ---
>
> Hi, Jose.
>
> Thanks for your answers and comments.
>
> Changes from version 1:
> - Added documentation in `poke.texi`.
> - Updateed tests to add more normal characters.
> - Changed interpretation of octal escape sequences.
> Old behavior: `"\400" == " 0"` (space character + 0 character).
> Because 0o400 does not fit inside a byte it treats the last digit
> as a separate character (not part of the octal number).
> This behavior was incompatible with C compilers (gcc emits a warning
> and clang emits an error). I droped this "feature" in favor of an
> error.
> New behavior: `"\400"` (error: octal escape sequence out of range).
>
>
> Now two more questions :)
> - Is documentation understandable?
I think it is. But see a few notes below :)
> - Do you like the new behavior regarding octal escape sequences?
Makes sense to me, to error if the specified quantity overflows a byte.
>
> Regards,
> Mohammad-Reza
>
>
> ChangeLog | 15 +++
> doc/poke.texi | 11 ++
> libpoke/pkl-trans.c | 126 ++++++++++++++++++-----
> testsuite/Makefile.am | 6 ++
> testsuite/poke.pkl/string-diag-1.pk | 2 +-
> testsuite/poke.pkl/strings-esc-1.pk | 25 +++++
> testsuite/poke.pkl/strings-esc-2.pk | 25 +++++
> testsuite/poke.pkl/strings-esc-diag-1.pk | 3 +
> testsuite/poke.pkl/strings-esc-diag-2.pk | 3 +
> testsuite/poke.pkl/strings-esc-diag-3.pk | 5 +
> testsuite/poke.pkl/strings-esc-diag-4.pk | 5 +
> 11 files changed, 200 insertions(+), 26 deletions(-)
> create mode 100644 testsuite/poke.pkl/strings-esc-1.pk
> create mode 100644 testsuite/poke.pkl/strings-esc-2.pk
> create mode 100644 testsuite/poke.pkl/strings-esc-diag-1.pk
> create mode 100644 testsuite/poke.pkl/strings-esc-diag-2.pk
> create mode 100644 testsuite/poke.pkl/strings-esc-diag-3.pk
> create mode 100644 testsuite/poke.pkl/strings-esc-diag-4.pk
>
> diff --git a/ChangeLog b/ChangeLog
> index 9098ddd2..9cf32a73 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,18 @@
> +2020-11-03 Mohammad-Reza Nabipoor <m.nabipoor@yahoo.com>
> +
> + * libpoke/pkl-trans.c (pkl_trans1_ps_string): Add support for octal
> + and hexadecimal escape sequences.
> + * doc/poke.texi (String Literals): Document new escape sequences.
> + * testsuite/poke.pkl/strings-esc-1.pk: New test.
> + * testsuite/poke.pkl/strings-esc-2.pk: Likewise.
> + * testsuite/poke.pkl/strings-esc-diag-1.pk: Likewise.
> + * testsuite/poke.pkl/strings-esc-diag-2.pk: Likewise.
> + * testsuite/poke.pkl/strings-esc-diag-3.pk: Likewise.
> + * testsuite/poke.pkl/strings-esc-diag-4.pk: Likewise.
> + * testsuite/poke.pkl/string-diag-1.pk: `\x` is now a valid sequence;
> + use `\z` as an invalid escape sequence.
> + * testsuite/Makefile.am (EXTRA_DIST): Add new tests.
> +
> 2020-11-01 Egeyar Bagcioglu <egeyar@gmail.com>
>
> * libpoke/ios-dev-file.h: Include errno.h.
> diff --git a/doc/poke.texi b/doc/poke.texi
> index fb3aad87..16d81f5c 100644
> --- a/doc/poke.texi
> +++ b/doc/poke.texi
> @@ -6133,8 +6133,19 @@ Denotes an horizontal tab.
> Denotes a backlash @code{\} character.
> @item \"
> Denotes a double-quote @code{"} character.
> +@item \o
> +@item \oo
> +@item \ooo
> +Denotes an octal number (@code{o} represents an octal digit).
> +The number should be less than 256 (0o400).
I would use @var{num} instead of these `o', `oo' and `ooo'. Then you
can say that NUM is a number expressed with octal digits from 0 to 255.
> +@item \xX
> +@item \xXX
> +Denotes a hexadecimal number (@code{X} represents a hexadecimal digit).
Ditto for @var{X}. Better to use \x@var{num} IMO.
> @end table
>
> +Due to the fact that strings in Poke are @code{NULL}-terminated, inserting
> +a @code{NULL} character using escape sequence leads to compilation error.
I would make it explicit that in this context a "NULL character" is the
character with code 0.
> +
> @node String Types
> @subsection String Types
>
> diff --git a/libpoke/pkl-trans.c b/libpoke/pkl-trans.c
> index 18366ca0..6f414e8a 100644
> --- a/libpoke/pkl-trans.c
> +++ b/libpoke/pkl-trans.c
> @@ -20,6 +20,7 @@
>
> #include <gettext.h>
> #define _(str) gettext (str)
> +#include <ctype.h>
> #include <stdio.h>
> #include <string.h>
> #include <xalloc.h>
> @@ -304,6 +305,10 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
> size_t string_length, i;
> bool found_backslash = false;
>
> +#define ISODIGIT(c) ((unsigned)(c) - '0' < 8) // is octal digit
> +#define XDIGIT(x) \
> + ((unsigned)(x) - '0' < 10 ? (x) - '0' : ((x) | 0x20) - 'a' + 10)
> +
Please do not use C++ comments.
> /* Please keep this code in sync with the string printer in
> pvm-val.c:pvm_print_val. */
>
> @@ -311,27 +316,45 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
> \-expansion, and report errors in the contents of the string. */
> for (p = string_pointer, string_length = 0; *p != '\0'; ++p)
> {
> - if (p[0] == '\\')
> + string_length++;
> + if (*p != '\\')
> + continue;
> +
> + found_backslash = true;
> + ++p;
> +
> + if (ISODIGIT (p[0]))
> + {
> + if (ISODIGIT (p[1]))
> + p += ISODIGIT (p[2]) ? 2 : 1; // reason: 0377 == 0xff
Likewise.
> + continue;
> + }
> +
> + switch (*p)
> {
> - switch (p[1])
> + case '\\':
> + case 'n':
> + case 't':
> + case '"':
> + break;
> + case 'x':
> + ++p;
> + if (!isxdigit (p[0]))
> {
> - case '\\':
> - case 'n':
> - case 't':
> - case '"':
> - string_length++;
> - break;
> - default:
> PKL_ERROR (PKL_AST_LOC (string),
> - "invalid \\%c sequence in string", p[1]);
> + _ ("\\x used with no following hex digits"));
> PKL_TRANS_PAYLOAD->errors++;
> PKL_PASS_ERROR;
> }
> - p++;
> - found_backslash = true;
> + if (isxdigit (p[1]))
> + ++p;
> + break;
> + default:
> + PKL_ERROR (PKL_AST_LOC (string),
> + _ ("invalid \\%c sequence in string"), *p);
> + PKL_TRANS_PAYLOAD->errors++;
> + PKL_PASS_ERROR;
> }
> - else
> - string_length++;
> }
>
> if (!found_backslash)
> @@ -342,24 +365,77 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
>
> for (p = string_pointer, i = 0; *p != '\0'; ++p, ++i)
> {
> - if (p[0] == '\\')
> + if (*p != '\\')
> + {
> + new_string_pointer[i] = *p;
> + continue;
> + }
> + ++p;
> +
> + // octal escape sequence
Likewise.
> + if (ISODIGIT (p[0]))
> {
> - switch (p[1])
> + unsigned int num = p[0] - '0';
> +
> + if (ISODIGIT (p[1]))
> {
> - case '\\': new_string_pointer[i] = '\\'; break;
> - case 'n': new_string_pointer[i] = '\n'; break;
> - case 't': new_string_pointer[i] = '\t'; break;
> - case '"': new_string_pointer[i] = '"'; break;
> - default:
> - assert (0);
> + ++p;
> + num = (num << 3) | (p[0] - '0');
> + if (ISODIGIT (p[1]))
> + {
> + ++p;
> + num = (num << 3) | (p[0] - '0');
> + }
> }
> - p++;
> + if (num == '\0')
> + {
> + PKL_ERROR (PKL_AST_LOC (string),
> + _ ("string literal cannot contain NULL character"));
> + PKL_TRANS_PAYLOAD->errors++;
> + PKL_PASS_ERROR;
> + }
> + else if (num > 255)
> + {
> + PKL_ERROR (PKL_AST_LOC (string),
> + _ ("octal escape sequence out of range"));
> + PKL_TRANS_PAYLOAD->errors++;
> + PKL_PASS_ERROR;
> + }
> + new_string_pointer[i] = num;
> + continue;
> + }
> +
> + switch (*p)
> + {
> + case '\\': new_string_pointer[i] = '\\'; break;
> + case 'n': new_string_pointer[i] = '\n'; break;
> + case 't': new_string_pointer[i] = '\t'; break;
> + case '"': new_string_pointer[i] = '"'; break;
> + case 'x':
> + ++p;
> + new_string_pointer[i] = XDIGIT (p[0]);
> + if (isxdigit (p[1]))
> + {
> + new_string_pointer[i] = (XDIGIT (p[0]) << 4) | XDIGIT (p[1]);
> + ++p;
> + }
> + if (new_string_pointer[i] == '\0')
> + {
> + PKL_ERROR (PKL_AST_LOC (string),
> + _ ("string literal cannot contain NULL character"));
> + PKL_TRANS_PAYLOAD->errors++;
> + PKL_PASS_ERROR;
> + }
> + break;
> + default:
> + assert (0);
> }
> - else
> - new_string_pointer[i] = p[0];
> }
> new_string_pointer[i] = '\0';
>
> +#undef ISODIGIT
> +#undef XDIGIT
> +
> free (string_pointer);
> PKL_AST_STRING_POINTER (string) = new_string_pointer;
> }
> diff --git a/testsuite/Makefile.am b/testsuite/Makefile.am
> index 5ba9900a..7b0a7c7c 100644
> --- a/testsuite/Makefile.am
> +++ b/testsuite/Makefile.am
> @@ -1370,6 +1370,12 @@ EXTRA_DIST = \
> poke.pkl/strings-2.pk \
> poke.pkl/strings-3.pk \
> poke.pkl/strings-4.pk \
> + poke.pkl/strings-esc-1.pk \
> + poke.pkl/strings-esc-2.pk \
> + poke.pkl/strings-esc-diag-1.pk \
> + poke.pkl/strings-esc-diag-2.pk \
> + poke.pkl/strings-esc-diag-3.pk \
> + poke.pkl/strings-esc-diag-4.pk \
> poke.pkl/strings-index-1.pk \
> poke.pkl/strings-index-2.pk \
> poke.pkl/strings-index-diag-1.pk \
> diff --git a/testsuite/poke.pkl/string-diag-1.pk
> b/testsuite/poke.pkl/string-diag-1.pk
> index 877bf6ad..1daf5c4e 100644
> --- a/testsuite/poke.pkl/string-diag-1.pk
> +++ b/testsuite/poke.pkl/string-diag-1.pk
> @@ -2,4 +2,4 @@
>
> /* Invalid \-sequence in string. */
>
> -defvar s = "foo\xbar"; /* { dg-error "" } */
> +defvar s = "foo\zbar"; /* { dg-error "" } */
> diff --git a/testsuite/poke.pkl/strings-esc-1.pk
> b/testsuite/poke.pkl/strings-esc-1.pk
> new file mode 100644
> index 00000000..35dd6a11
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-1.pk
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +
> +/* Octal escape sequence */
> +
> +defvar s = "\1\12\1234;\12==\n";
> +
> +/* { dg-command { .set obase 8 } } */
> +
> +/* { dg-command { s'length } } */
> +/* { dg-output "0o11UL" } */
> +
> +/* { dg-command {s[0]} } */
> +/* { dg-output "\n0o1UB" } */
> +
> +/* { dg-command {s[1]} } */
> +/* { dg-output "\n0o12UB" } */
> +
> +/* { dg-command {s[2]} } */
> +/* { dg-output "\n0o123UB" } */
> +
> +/* { dg-command {s[3]} } */
> +/* { dg-output "\n0o64UB" } */
> +
> +/* { dg-command {s[4:]} } */
> +/* { dg-output "\n\";\\\\n==\\\\n\"" } */
> diff --git a/testsuite/poke.pkl/strings-esc-2.pk
> b/testsuite/poke.pkl/strings-esc-2.pk
> new file mode 100644
> index 00000000..35696ea0
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-2.pk
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +
> +/* Hexadecimal escape sequence */
> +
> +defvar s = "He\x1llo \x12\x123, World!";
> +
> +/* { dg-command {.set obase 16} } */
> +
> +/* { dg-command {s'length} } */
> +/* { dg-output "0x12UL" } */
> +
> +/* { dg-command {s[2]} } */
> +/* { dg-output "\n0x1UB" } */
> +
> +/* { dg-command {s[7]} } */
> +/* { dg-output "\n0x12UB" } */
> +
> +/* { dg-command {s[8]} } */
> +/* { dg-output "\n0x12UB" } */
> +
> +/* { dg-command {s[9]} } */
> +/* { dg-output "\n0x33UB" } */
> +
> +/* { dg-command {s[0:1] + s[3:5] + s[10:]} } */
> +/* { dg-output "\n\"Hello, World!\"" } */
> diff --git a/testsuite/poke.pkl/strings-esc-diag-1.pk
> b/testsuite/poke.pkl/strings-esc-diag-1.pk
> new file mode 100644
> index 00000000..c51cbccd
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-diag-1.pk
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +
> +defvar s = "a\0b"; /* { dg-error "" } */
> diff --git a/testsuite/poke.pkl/strings-esc-diag-2.pk
> b/testsuite/poke.pkl/strings-esc-diag-2.pk
> new file mode 100644
> index 00000000..f4934e36
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-diag-2.pk
> @@ -0,0 +1,3 @@
> +/* { dg-do compile } */
> +
> +defvar s = "x\x0z"; /* { dg-error "" } */
> diff --git a/testsuite/poke.pkl/strings-esc-diag-3.pk
> b/testsuite/poke.pkl/strings-esc-diag-3.pk
> new file mode 100644
> index 00000000..96f5a513
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-diag-3.pk
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +
> +/* Invalid \-sequence */
> +
> +defvar s = "a\8b"; /* { dg-error "" } */
> diff --git a/testsuite/poke.pkl/strings-esc-diag-4.pk
> b/testsuite/poke.pkl/strings-esc-diag-4.pk
> new file mode 100644
> index 00000000..241f301e
> --- /dev/null
> +++ b/testsuite/poke.pkl/strings-esc-diag-4.pk
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +
> +/* Invalid octal \-sequence */
> +
> +defvar s = "\400"; /* { dg-error "" } */