[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#63225: Using char table-based finite-state machines as a replacement
From: |
Ihor Radchenko |
Subject: |
bug#63225: Using char table-based finite-state machines as a replacement for re-search-forward (was: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)) |
Date: |
Tue, 09 May 2023 08:36:25 +0000 |
Mattias Engdegård <mattias.engdegard@gmail.com> writes:
>> (I now start thinking that it might be more efficient to create a bunch
>> or char tables and step over them to match "regexp", just like any
>> finite automaton would)
>
> I wish elisp were fast enough that we could do that in general.
I tried the following simple implementation for searching via char
tables:
(defun yant/make-char-re (string)
"Create a regexp matching STRING using char tables.
Return entry char table. The values are either char tables or return value for
terminals. This is a dumb version not trying to account for substring cycles."
(let (root (table (make-char-table 'regexp-table)) (idx 0))
(setq root table)
(set-char-table-range root t root)
(seq-map
(lambda (char)
(let ((next-table (make-char-table 'regexp-table root)))
(aset table char next-table)
(setq table next-table)))
string)
(set-char-table-range table t t)
root))
(defun yant/search-forward (regexp-table)
"Move point after next string matching REGEXP-TABLE."
(let ((pos (point-marker)))
(while (and (< pos (point-max))
(char-table-p
(setq regexp-table
(aref regexp-table (char-after pos))))
(move-marker pos (1+ pos))))
(unless (char-table-p regexp-table) (goto-char pos))))
DEFUN ("auto-search-forward", Fauto_search_forward, Sauto_search_forward, 1, 1,
0,
doc: /* Search forward using CHAR-TABLE. (WIP) */)
(Lisp_Object table)
{
CHECK_TYPE (CHAR_TABLE_P (table), Qchar_table_p, table);
ptrdiff_t pos_byte = PT_BYTE;
ptrdiff_t pos_char = PT;
if (pos_byte < BEGV_BYTE || pos_byte >= ZV_BYTE)
return Qnil;
table = CHAR_TABLE_REF (table, FETCH_CHAR (pos_byte));
while (pos_byte < ZV_BYTE && CHAR_TABLE_P (table))
{
pos_byte += next_char_len(pos_byte);
pos_char ++;
table = CHAR_TABLE_REF (table, FETCH_CHAR (pos_byte));
}
if (pos_byte >= ZV_BYTE)
return Qnil;
else
{
SET_PT (pos_char);
return Qt;
}
}
(setq yant/test (yant/make-char-re ":ID:"))
(benchmark-run 10 (org-with-wide-buffer (goto-char 1) (while
(yant/search-forward yant/test))))
(benchmark-run 10 (org-with-wide-buffer (goto-char 1) (while
(auto-search-forward yant/test))))
(benchmark-run 10 (org-with-wide-buffer (goto-char (point-min)) (while
(re-search-forward ":ID:" nil t))))
- yant/search-forward :: (24.677463432 0 0.0)
- re-search-forward :: (0.569693718 0 0.0)
- auto-search-forward :: (1.177098129 0 0.0)
The lisp version suffers from (1) Extra redirection when checking
function symbol value; (2) moving markers, as there is no efficient way
to move point forward from Elisp; `forward-char' is even slower.
auto-search-forward is twice slower than re-search-forward, but I expect
this to change if we ask for a more complex regexp. auto-search-forward
will not care about the complexity of the finite-state machine provided.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), (continued)
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/03
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/04
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/05
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/05
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/06
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/07
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/08
- bug#63225: Using char table-based finite-state machines as a replacement for re-search-forward (was: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)),
Ihor Radchenko <=
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/09
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/09
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/09
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/09
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/07
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Mattias Engdegård, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/08
- bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Ihor Radchenko, 2023/05/04
bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c), Po Lu, 2023/05/02