[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[elpa] externals/trie b6ba36b 002/111: Minor improvements to trie-comple
From: |
Stefan Monnier |
Subject: |
[elpa] externals/trie b6ba36b 002/111: Minor improvements to trie-complete[-ordered] |
Date: |
Mon, 14 Dec 2020 11:35:08 -0500 (EST) |
branch: externals/trie
commit b6ba36b2e199e0569eb7bdbbb3b7a21d3470716d
Author: Toby Cubitt <toby-predictive@dr-qubit.org>
Commit: tsc25 <toby-predictive@dr-qubit.org>
Minor improvements to trie-complete[-ordered]
---
trie.el | 228 +++++++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 168 insertions(+), 60 deletions(-)
diff --git a/trie.el b/trie.el
index 2e59811..1a8e941 100644
--- a/trie.el
+++ b/trie.el
@@ -30,27 +30,88 @@
;;; Commentary:
;;
-;; A ternary search tree associates data with keys. The keys can be any
-;; ordered sequence of elements: vector, list or string. It stores them
-;; in such a way that both storage size and data lookup are reasonably
-;; space- and time-efficient, respectively. But more importantly,
-;; returning all strings with a given prefix in alphabetical or any other
-;; sort-order is also time-efficient.
+;; A trie stores keys that are ordered sequences of elements (vectors,
+;; lists or strings in Elisp), in such a way that both storage and
+;; retrieval are reasonably space- and time-efficient. But, more
+;; importantly, searching for keys that match various patterns can also
+;; be done efficiently. For example, returning all strings with a given
+;; prefix, and sorting them in an arbitrary order. Or searching for keys
+;; matching a pattern containing wildcards (not yet implemented in this
+;; package).
;;
-;; Internally, a ternary search tree consists of two elements, the root
-;; node and the comparison function. The latter must take two arguments
-;; of the same type as individual elements of a key, and must return nil if
-;; the first is smaller than the second.
+;; Note that there are two uses for a ternary search tree: as a lookup
+;; table, in which case only presence of absence of a key is significant,
+;; or as an associative array, in which case keys are associated with
+;; data. Other similar data types often only implement lookup tables,
+;; leaving it up to you to implement an associative array on top of this
+;; (by storing key+data pairs in the data structure's keys, then defining
+;; a comparison function that only compares the key part). However, this
+;; would be somewhat space-inefficient in a trie, so this package does
+;; the opposite: it implements associative arrays, and leaves it up to
+;; you to use them as lookup tables if you so desire.
+;;
+;; There are numerous ways to implement trie data structures internally,
+;; each with advantages and disadvantages. By viewing a trie as a tree
+;; whose nodes are themselves lookup tables, this package is able to
+;; support all types of trie, providing there exists (or you write!) an
+;; Elisp implementation of the corresponding type of lookup table. The
+;; best implementation will depend on what trade-offs are appropriate for
+;; your particular application. The following gives an overview of the
+;; advantages and disadvantages of various types of trie. (Not all of the
+;; underlying lookup tables have been implemented in Elisp yet, so using
+;; some of them would require writing the missing Elisp package!)
+;;
+;; One of the most effective all-round implementations of a trie is a
+;; ternary search tree, which can be viewed as a tree of binary trees. If
+;; simple binary search trees are used for the nodes of the trie, we get
+;; a basic ternary search tree. If self-balancing binary trees are used
+;; (e.g. AVL or red-black trees), we get a self-balancing ternary search
+;; tree. If splay trees are used, we get yet another self-organising
+;; variant of a ternary search tree. All ternary search trees have in
+;; common reasonably good space-efficiency. The time-efficiency for trie
+;; operations is also reasonably good, providing the underlying binary
+;; trees are balanced. Assuming the binary trees are balanced, all
+;; variants of ternary search trees described below have the same
+;; time-complexity for all trie operations.
+;;
+;; Self-balancing trees ensure this is the case, with the usual
+;; trade-offs between different the types of self-balancing binary tree:
+;; AVL trees are slightly more efficient for lookup operations than
+;; red-black trees, but are slightly less efficienct for insertion
+;; operations, and less efficient for deletion operations, as compared to
+;; red-black trees. Splay trees give good average-case complexity and are
+;; simpler to implement than AVL or red-black trees (which can mean
+;; faster in practice!), at the expense of poor worst-case complexity.
+;;
+;; If your tries are going to be static (i.e. created once and rarely or
+;; never changed), then using perfectly balanced binary search trees
+;; might be more appropriate. Perfectly balancing the binary trees is
+;; very inefficient, but it only has to be done once after the trie is
+;; created, or on the rare occarions that it is modified. Lookup
+;; operations will then be as efficient as possible for ternary search
+;; trees.
+;;
+;; On the other hand, adding data to a binary search tree in a random
+;; order usually results in a reasonably balanced tree. If this is the
+;; likely scenario, using a simple binary tree will likely be quite
+;; efficient, and, being simpler to implement, could be faster overall.
+;;
+;; A digital trie is a different implementation of a trie, which can be
+;; viewed as a tree of arrays, and has different space- and
+;; time-complexity than a ternary search tree. Essentially, a digital
+;; trie has worse space-complexity, but better time-complexity. Using
+;; hash tables instead of arrays for the nodes gives something similar to
+;; a digital trie, potentially with better space-complexity and the same
+;; time-complexity most of the time, but at the expense of occasional
+;; significant inefficiency when inserting and deleting (whenever the
+;; hash table has to be resized). Indeed, an array can be viewed as a
+;; perfect hash table, but as such it requires the number of possible
+;; values to be known in advance.
+;;
+;; Finally, if you really need optimal efficiency from your trie, you
+;; could even write a custom lookup table optimised for your specific
+;; needs.
;;
-;; Note that there are two uses for a ternary search tree: as a
-;; "dictionary", in which case only presence of absence of a key is
-;; significant, or as an associative array, in which case keys are
-;; associated with data. Other similar data types often only implement
-;; the first of these, leaving it up to you to implement an associative
-;; on top of this. However, this would be somewhat space-inefficient in a
-;; ternary search tree, so this package does the opposite: it implements
-;; associative arrays, and leaves it up to you to use them as
-;; dictionaries if you so desire.
;;
;; You create a ternary search tree using `trie-create', create an
;; association using `trie-insert', retrieve an association using
@@ -271,13 +332,12 @@ If START or END is negative, it counts from the end."
(defun trie--mapc (trie--mapc--function trie--mapc--mapfun
- trie--root seq
- &optional reverse)
+ trie--root seq
+ &optional reverse)
;; Apply FUNCTION to all elements in a trie beneath ROOT, which should
;; correspond to the sequence SEQ. TRIE-FUNCTION is passed two arguments:
;; the trie node itself and the sequence it corresponds to. It is applied
- ;; in ascending order, or descending order if REVERSE is non-nil. ARGS are
- ;; passed as additional arguments to TRIE-FUNCTION
+ ;; in ascending order, or descending order if REVERSE is non-nil.
(funcall
trie--mapc--mapfun
(lambda (node)
@@ -443,13 +503,13 @@ Returns the new association of KEY."
(trie--node-create (elt key i) tree)
keep-old)))
;; If UPDATEFUN was supplied, wrap it for passing to `trie--enterfun'.
- (if updatefun
- (setq update-old
- (lambda (a b)
- (setf (trie--node-data b)
- (funcall updatefun
- (trie--node-data a)
- (trie--node-data b))))))
+ (when updatefun
+ (setq update-old
+ (lambda (a b)
+ (setf (trie--node-data b)
+ (funcall updatefun
+ (trie--node-data a)
+ (trie--node-data b))))))
;; Create or update data node.
(setq node (funcall (trie--insertfun tree)
(trie--node-subtree node)
@@ -639,10 +699,10 @@ also `trie-member-p', which does this for you.)"
(defun trie-complete (tree key &optional maxnum filter)
- "Return an alist containing all completions of KEY
-in ternary searh tree TREE along with their associated data, in
-\"lexical\" order (i.e. the order defined by the tree's
-comparison function). If no completions are found, return nil.
+ "Return an alist containing all completions of KEY in tree TREE
+along with their associated data, in \"lexical\" order (i.e. the
+order defined by the tree's comparison function). If no
+completions are found, return nil.
KEY must be a sequence (vector, list or string) containing
elements of the type used to reference data in the tree. (If KEY
@@ -660,36 +720,62 @@ completion with two arguments: the completion, and its
associated
data. If the filter function returns nil, the completion is not
included in the results, and does not count towards MAXNUM."
- (let* ((node (trie--node-find tree key))
- (trie--stack (make-vector 1 nil))
- (accumulator
- (lambda (node seq)
- (let ((data (trie--node-data node)))
- (when (or (null filter) (funcall filter seq data))
+ (let ((node (trie--node-find tree key))
+ (trie--stack (make-vector 1 nil))
+ accumulator)
+
+ ;; construct function to accumulate completions in stack
+ (setq accumulator
+ (cond
+ ((and filter maxnum)
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (when (funcall filter seq data)
+ (aset trie--stack 0
+ (cons (cons seq data) (aref trie--stack 0)))
+ (and (>= (length (aref trie--stack 0)) maxnum)
+ (throw 'trie-complete--done nil))))))
+ ((and (not filter) maxnum)
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
(aset trie--stack 0
(cons (cons seq data) (aref trie--stack 0)))
- (and maxnum
- (>= (length (aref trie--stack 0)) maxnum)
- (throw 'trie-complete--done nil)))))))
+ (and (>= (length (aref trie--stack 0)) maxnum)
+ (throw 'trie-complete--done nil)))))
+ ((and filter (not maxnum))
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (when (funcall filter seq data)
+ (aset trie--stack 0
+ (cons (cons seq data) (aref trie--stack 0)))))))
+ ((and (not filter) (not maxnum))
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (aset trie--stack 0
+ (cons (cons seq data) (aref trie--stack 0))))))
+ ))
+
+ ;; accumulate completions in stack
(when node
(catch 'trie-complete--done
(trie--mapc accumulator (trie--mapfun tree) node key t))
(aref trie--stack 0))))
+
;; Note: We use a partial heap-sort to find the k=MAXNUM highest ranked
;; completions among n possibilities. This has worst-case time complexity
;; O(n log k), and is both simple and elegant. An optimal algorithm
;; (e.g. partial quick-sort where the irrelevant partition is discarded
;; at each step) would have complexity O(n + k log k), but is probably
;; not worth the extra coding effort, and would have worse space
-;; complexity. (I haven't done any benchmarking, though, so feel free to
-;; do so and let me know the results!)
+;; complexity unless coded to work "in-place". (I haven't done any
+;; benchmarking, though, so feel free to do so and let me know the
+;; results!)
(defun trie-complete-ordered (tree key rankfun &optional maxnum filter)
"Return an alist containing all completions of SEQUENCE found
-in ternary searh tree TREE along with their associated data, in
-the order defined by RANKFUN. If no completions are found, return
-nil.
+in tree TREE along with their associated data, in the order
+defined by RANKFUN. If no completions are found, return nil.
KEY must be a sequence (vector, list or string) containing
elements of the type used to reference data in the tree. (If KEY
@@ -711,20 +797,42 @@ completion with two arguments: the completion, and its
associated
data. If the filter function returns nil, the completion is not
included in the results, and does not count towards MAXNUM."
- (let* ((node (trie--node-find tree key))
- (trie--heap (heap-create
- `(lambda (a b) (not (,rankfun a b)))
- (when maxnum (1+ maxnum))))
- (accumulator
- (lambda (node seq)
- (let ((data (trie--node-data node)))
- (when (or (null filter) (funcall filter seq data))
+ (let ((node (trie--node-find tree key))
+ (trie--heap (heap-create
+ `(lambda (a b) (not (,rankfun a b)))
+ (when maxnum (1+ maxnum))))
+ accumulator)
+
+ ;; construct function to accumulate completions in heap
+ (setq accumulator
+ (cond
+ ((and filter maxnum)
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (when (funcall filter seq data)
+ (heap-add trie--heap (cons seq data))
+ (and (> (heap-size trie--heap) maxnum)
+ (heap-delete-root trie--heap))))))
+ ((and filter (not maxnum))
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (when (funcall filter seq data)
+ (heap-add trie--heap (cons seq data))))))
+ ((and (not filter) maxnum)
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
(heap-add trie--heap (cons seq data))
- (and maxnum
- (> (heap-size trie--heap) maxnum)
- (heap-delete-root trie--heap)))))))
- (when node (trie--mapc accumulator (trie--mapfun tree) node key))
+ (and (> (heap-size trie--heap) maxnum)
+ (heap-delete-root trie--heap)))))
+ ((and (not filter) (not maxnum))
+ (lambda (node seq)
+ (let ((data (trie--node-data node)))
+ (heap-add trie--heap (cons seq data)))))))
+
+ ;; accumulate completions in heap
+ (when node (trie--mapc accumulator (trie--mapfun tr) node key))
+ ;; extract completions from heap
(let (completions)
(while (not (heap-empty trie--heap))
(setq completions (cons (heap-delete-root trie--heap) completions)))
- [elpa] externals/trie 15b4de9 018/111: Simplified trie--create by storing functions for predefined trie types in symbol property lists, (continued)
- [elpa] externals/trie 15b4de9 018/111: Simplified trie--create by storing functions for predefined trie types in symbol property lists, Stefan Monnier, 2020/12/14
- [elpa] externals/trie fb1d096 034/111: Changed trie-wildcard-match to return grouping data if pattern matches and contains groups, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 4ff2b48 057/111: Minor typo-fixes in docstrings., Stefan Monnier, 2020/12/14
- [elpa] externals/trie a8bc50f 041/111: Minor code tidying, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 0d81a80 066/111: Remove dependency on Emacs version, since this is currently broken in ELPA., Stefan Monnier, 2020/12/14
- [elpa] externals/trie 673ca83 013/111: Don't compile wrapped functions explicitly, Stefan Monnier, 2020/12/14
- [elpa] externals/trie db78411 107/111: Switch to keyword arguments for trie/dictree query functions., Stefan Monnier, 2020/12/14
- [elpa] externals/trie f676ea0 091/111: Fix off-by-1 bug in Lewenstein distance queries., Stefan Monnier, 2020/12/14
- [elpa] externals/trie eef13c4 079/111: Document fuzzy matching functions and bump version number., Stefan Monnier, 2020/12/14
- [elpa] externals/trie 77aafc9 108/111: Fix byte-compilation errors and warnings., Stefan Monnier, 2020/12/14
- [elpa] externals/trie b6ba36b 002/111: Minor improvements to trie-complete[-ordered],
Stefan Monnier <=
- [elpa] externals/trie 503b286 004/111: Make bare avl trees which don't store cmpfun with tree, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 0162b74 003/111: Added trie-stacks implementation., Stefan Monnier, 2020/12/14
- [elpa] externals/trie 45569c2 007/111: Added optional TEST function to trie-delete, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 45accae 019/111: Bug-fix in trie--do-delete, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 4f11b37 022/111: Docstring, change log, and version number updates, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 510844e 035/111: trivial variable name change, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 4b24754 008/111: Converted function wrapping macros into functions, Stefan Monnier, 2020/12/14
- [elpa] externals/trie a17e6df 056/111: Minor bug-fixes to [trie/dict-tree]--edebug-pretty-print, Stefan Monnier, 2020/12/14
- [elpa] externals/trie 3b61c64 065/111: More minor whitespace and commentary changes., Stefan Monnier, 2020/12/14
- [elpa] externals/trie 19e6dbe 010/111: Make weird variable names used to avoid dynamic scoping bugs more consistent, Stefan Monnier, 2020/12/14