Re: character ranges in regular expressions
From: Bruno Haible
Subject: Re: character ranges in regular expressions
Date: Thu, 23 Sep 2010 23:55:25 +0200
User-agent: KMail/1.9.9
Paolo,

> Bruno, ... Can you shed light on what __collseq_table_lookup is supposed
> to mean?

It is a runtime lookup function into a table that maps Unicode characters to
uint32_t values. For a 'char' value, the most efficient way to implement
a mapping from 'char' to uint32_t is through an array: uint32_t[UCHAR_MAX+1].
For a 'wchar_t' value whose width is up to 21 bits, the data structure we
use in glibc (and also in gnulib / libunistring) is a 3-level lookup table.
See the file locale/programs/3level.h for details.
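
To illustrate the two data structures described above, here is a minimal
sketch in C. It is not the glibc code: the 7+7+7 bit split, the pointer-based
layout, and the names seq_table, seq_set, seq_lookup, char_seq and SEQ_DEFAULT
are all assumptions made for this example. The actual layout is the one
defined in locale/programs/3level.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* For 'char': a flat array indexed by the unsigned char value. */
static uint32_t char_seq[UCHAR_MAX + 1];

static uint32_t char_lookup(unsigned char c)
{
    return char_seq[c];
}

/* For 'wchar_t': a 3-level table. The bit split is an assumption
   (7 + 7 + 7 = 21 bits), chosen only to keep the example simple. */
enum { L1_BITS = 7, L2_BITS = 7, L3_BITS = 7 };
enum { L1_SIZE = 1 << L1_BITS, L2_SIZE = 1 << L2_BITS, L3_SIZE = 1 << L3_BITS };

#define SEQ_DEFAULT UINT32_MAX  /* hypothetical "no entry" value */

struct seq_table {
    /* level1[i] is NULL when the whole block of 2^14 characters is unset. */
    uint32_t **level1[L1_SIZE];
};

/* Store VALUE for character WC, allocating intermediate levels on demand.
   Returns 0 on success, -1 on allocation failure or out-of-range WC. */
static int seq_set(struct seq_table *t, uint32_t wc, uint32_t value)
{
    assert(t != NULL);
    uint32_t i1 = wc >> (L2_BITS + L3_BITS);
    uint32_t i2 = (wc >> L3_BITS) & (L2_SIZE - 1);
    uint32_t i3 = wc & (L3_SIZE - 1);

    if (i1 >= L1_SIZE)
        return -1;
    if (t->level1[i1] == NULL) {
        /* calloc zero-fills; this sketch assumes that reads back as NULL. */
        t->level1[i1] = calloc(L2_SIZE, sizeof *t->level1[i1]);
        if (t->level1[i1] == NULL)
            return -1;
    }
    uint32_t **l2 = t->level1[i1];
    if (l2[i2] == NULL) {
        l2[i2] = malloc(L3_SIZE * sizeof *l2[i2]);
        if (l2[i2] == NULL)
            return -1;
        for (int i = 0; i < L3_SIZE; i++)
            l2[i2][i] = SEQ_DEFAULT;
    }
    l2[i2][i3] = value;
    return 0;
}

/* Look up WC: at most three array accesses, no searching. */
static uint32_t seq_lookup(const struct seq_table *t, uint32_t wc)
{
    assert(t != NULL);
    uint32_t i1 = wc >> (L2_BITS + L3_BITS);
    if (i1 >= L1_SIZE)
        return SEQ_DEFAULT;
    uint32_t **l2 = t->level1[i1];
    if (l2 == NULL)
        return SEQ_DEFAULT;
    uint32_t *l3 = l2[(wc >> L3_BITS) & (L2_SIZE - 1)];
    if (l3 == NULL)
        return SEQ_DEFAULT;
    return l3[wc & (L3_SIZE - 1)];
}
```

The point of the three levels is that blocks of characters without an entry
cost only a NULL pointer, while a lookup still takes constant time.
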
The _NL_COLLATE_COLLSEQWC field of the LC_COLLATE part of the locale, which
regcomp.c and regexec.c use, is encoded in this way. In
glibc/locale/programs/ld-collate.c this field is constructed from a table
called 'collate->wcseqorder'.

This table exists for use in regular expression matching and wildcard
matching. It is derived from the LC_COLLATE portion of the locale input file,
but does not contain all of the information found there.
Bruno