emacs-elpa-diffs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[elpa] externals/trie b6ba36b 002/111: Minor improvements to trie-comple


From: Stefan Monnier
Subject: [elpa] externals/trie b6ba36b 002/111: Minor improvements to trie-complete[-ordered]
Date: Mon, 14 Dec 2020 11:35:08 -0500 (EST)

branch: externals/trie
commit b6ba36b2e199e0569eb7bdbbb3b7a21d3470716d
Author: Toby Cubitt <toby-predictive@dr-qubit.org>
Commit: tsc25 <toby-predictive@dr-qubit.org>

    Minor improvements to trie-complete[-ordered]
---
 trie.el | 228 +++++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 168 insertions(+), 60 deletions(-)

diff --git a/trie.el b/trie.el
index 2e59811..1a8e941 100644
--- a/trie.el
+++ b/trie.el
@@ -30,27 +30,88 @@
 
 ;;; Commentary:
 ;;
-;; A ternary search tree associates data with keys. The keys can be any
-;; ordered sequence of elements: vector, list or string. It stores them
-;; in such a way that both storage size and data lookup are reasonably
-;; space- and time-efficient, respectively. But more importantly,
-;; returning all strings with a given prefix in alphabetical or any other
-;; sort-order is also time-efficient.
+;; A trie stores keys that are ordered sequences of elements (vectors,
+;; lists or strings in Elisp), in such a way that both storage and
+;; retrieval are reasonably space- and time-efficient. But, more
+;; importantly, searching for keys that match various patterns can also
+;; be done efficiently. For example, returning all strings with a given
+;; prefix, and sorting them in an arbitrary order. Or searching for keys
+;; matching a pattern containing wildcards (not yet implemented in this
+;; package).
 ;;
-;; Internally, a ternary search tree consists of two elements, the root
-;; node and the comparison function. The latter must take two arguments
-;; of the same type as individual elements of a key, and must return nil if
-;; the first is smaller than the second.
+;; Note that there are two uses for a ternary search tree: as a lookup
+;; table, in which case only presence of absence of a key is significant,
+;; or as an associative array, in which case keys are associated with
+;; data. Other similar data types often only implement lookup tables,
+;; leaving it up to you to implement an associative array on top of this
+;; (by storing key+data pairs in the data structure's keys, then defining
+;; a comparison function that only compares the key part). However, this
+;; would be somewhat space-inefficient in a trie, so this package does
+;; the opposite: it implements associative arrays, and leaves it up to
+;; you to use them as lookup tables if you so desire.
+;;
+;; There are numerous ways to implement trie data structures internally,
+;; each with advantages and disadvantages. By viewing a trie as a tree
+;; whose nodes are themselves lookup tables, this package is able to
+;; support all types of trie, providing there exists (or you write!) an
+;; Elisp implementation of the corresponding type of lookup table. The
+;; best implementation will depend on what trade-offs are appropriate for
+;; your particular application. The following gives an overview of the
+;; advantages and disadvantages of various types of trie. (Not all of the
+;; underlying lookup tables have been implemented in Elisp yet, so using
+;; some of them would require writing the missing Elisp package!)
+;;
+;; One of the most effective all-round implementations of a trie is a
+;; ternary search tree, which can be viewed as a tree of binary trees. If
+;; simple binary search trees are used for the nodes of the trie, we get
+;; a basic ternary search tree. If self-balancing binary trees are used
+;; (e.g. AVL or red-black trees), we get a self-balancing ternary search
+;; tree. If splay trees are used, we get yet another self-organising
+;; variant of a ternary search tree. All ternary search trees have in
+;; common reasonably good space-efficiency. The time-efficiency for trie
+;; operations is also reasonably good, providing the underlying binary
+;; trees are balanced. Assuming the binary trees are balanced, all
+;; variants of ternary search trees described below have the same
+;; time-complexity for all trie operations.
+;;
+;; Self-balancing trees ensure this is the case, with the usual
+;; trade-offs between different the types of self-balancing binary tree:
+;; AVL trees are slightly more efficient for lookup operations than
+;; red-black trees, but are slightly less efficienct for insertion
+;; operations, and less efficient for deletion operations, as compared to
+;; red-black trees. Splay trees give good average-case complexity and are
+;; simpler to implement than AVL or red-black trees (which can mean
+;; faster in practice!), at the expense of poor worst-case complexity.
+;;
+;; If your tries are going to be static (i.e. created once and rarely or
+;; never changed), then using perfectly balanced binary search trees
+;; might be more appropriate. Perfectly balancing the binary trees is
+;; very inefficient, but it only has to be done once after the trie is
+;; created, or on the rare occarions that it is modified. Lookup
+;; operations will then be as efficient as possible for ternary search
+;; trees.
+;;
+;; On the other hand, adding data to a binary search tree in a random
+;; order usually results in a reasonably balanced tree. If this is the
+;; likely scenario, using a simple binary tree will likely be quite
+;; efficient, and, being simpler to implement, could be faster overall.
+;;
+;; A digital trie is a different implementation of a trie, which can be
+;; viewed as a tree of arrays, and has different space- and
+;; time-complexity than a ternary search tree. Essentially, a digital
+;; trie has worse space-complexity, but better time-complexity. Using
+;; hash tables instead of arrays for the nodes gives something similar to
+;; a digital trie, potentially with better space-complexity and the same
+;; time-complexity most of the time, but at the expense of occasional
+;; significant inefficiency when inserting and deleting (whenever the
+;; hash table has to be resized). Indeed, an array can be viewed as a
+;; perfect hash table, but as such it requires the number of possible
+;; values to be known in advance.
+;;
+;; Finally, if you really need optimal efficiency from your trie, you
+;; could even write a custom lookup table optimised for your specific
+;; needs.
 ;;
-;; Note that there are two uses for a ternary search tree: as a
-;; "dictionary", in which case only presence of absence of a key is
-;; significant, or as an associative array, in which case keys are
-;; associated with data. Other similar data types often only implement
-;; the first of these, leaving it up to you to implement an associative
-;; on top of this. However, this would be somewhat space-inefficient in a
-;; ternary search tree, so this package does the opposite: it implements
-;; associative arrays, and leaves it up to you to use them as
-;; dictionaries if you so desire.
 ;;
 ;; You create a ternary search tree using `trie-create', create an
 ;; association using `trie-insert', retrieve an association using
@@ -271,13 +332,12 @@ If START or END is negative, it counts from the end."
 
 
 (defun trie--mapc (trie--mapc--function trie--mapc--mapfun
-                                           trie--root seq
-                                           &optional reverse)
+                                       trie--root seq
+                                       &optional reverse)
   ;; Apply FUNCTION to all elements in a trie beneath ROOT, which should
   ;; correspond to the sequence SEQ. TRIE-FUNCTION is passed two arguments:
   ;; the trie node itself and the sequence it corresponds to. It is applied
-  ;; in ascending order, or descending order if REVERSE is non-nil. ARGS are
-  ;; passed as additional arguments to TRIE-FUNCTION
+  ;; in ascending order, or descending order if REVERSE is non-nil.
   (funcall
    trie--mapc--mapfun
    (lambda (node)
@@ -443,13 +503,13 @@ Returns the new association of KEY."
                          (trie--node-create (elt key i) tree)
                          keep-old)))
     ;; If UPDATEFUN was supplied, wrap it for passing to `trie--enterfun'.
-    (if updatefun
-       (setq update-old
-             (lambda (a b)
-               (setf (trie--node-data b)
-                     (funcall updatefun
-                              (trie--node-data a)
-                              (trie--node-data b))))))
+    (when updatefun
+      (setq update-old
+           (lambda (a b)
+             (setf (trie--node-data b)
+                   (funcall updatefun
+                            (trie--node-data a)
+                            (trie--node-data b))))))
     ;; Create or update data node.
     (setq node (funcall (trie--insertfun tree)
                        (trie--node-subtree node)
@@ -639,10 +699,10 @@ also `trie-member-p', which does this for you.)"
 
 
 (defun trie-complete (tree key &optional maxnum filter)
-  "Return an alist containing all completions of KEY
-in ternary searh tree TREE along with their associated data, in
-\"lexical\" order (i.e. the order defined by the tree's
-comparison function). If no completions are found, return nil.
+  "Return an alist containing all completions of KEY in tree TREE
+along with their associated data, in \"lexical\" order (i.e. the
+order defined by the tree's comparison function). If no
+completions are found, return nil.
 
 KEY must be a sequence (vector, list or string) containing
 elements of the type used to reference data in the tree. (If KEY
@@ -660,36 +720,62 @@ completion with two arguments: the completion, and its 
associated
 data. If the filter function returns nil, the completion is not
 included in the results, and does not count towards MAXNUM."
 
-  (let* ((node (trie--node-find tree key))
-        (trie--stack (make-vector 1 nil))
-        (accumulator
-         (lambda (node seq)
-           (let ((data (trie--node-data node)))
-             (when (or (null filter) (funcall filter seq data))
+  (let ((node (trie--node-find tree key))
+       (trie--stack (make-vector 1 nil))
+       accumulator)
+
+    ;; construct function to accumulate completions in stack
+    (setq accumulator
+         (cond
+          ((and filter maxnum)
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (when (funcall filter seq data)
+                 (aset trie--stack 0
+                       (cons (cons seq data) (aref trie--stack 0)))
+                 (and (>= (length (aref trie--stack 0)) maxnum)
+                      (throw 'trie-complete--done nil))))))
+          ((and (not filter) maxnum)
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
                (aset trie--stack 0
                      (cons (cons seq data) (aref trie--stack 0)))
-               (and maxnum
-                    (>= (length (aref trie--stack 0)) maxnum)
-                    (throw 'trie-complete--done nil)))))))
+               (and (>= (length (aref trie--stack 0)) maxnum)
+                    (throw 'trie-complete--done nil)))))
+          ((and filter (not maxnum))
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (when (funcall filter seq data)
+                 (aset trie--stack 0
+                       (cons (cons seq data) (aref trie--stack 0)))))))
+          ((and (not filter) (not maxnum))
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (aset trie--stack 0
+                     (cons (cons seq data) (aref trie--stack 0))))))
+          ))
+
+    ;; accumulate completions in stack
     (when node
       (catch 'trie-complete--done
        (trie--mapc accumulator (trie--mapfun tree) node key t))
       (aref trie--stack 0))))
 
 
+
 ;; Note: We use a partial heap-sort to find the k=MAXNUM highest ranked
 ;; completions among n possibilities. This has worst-case time complexity
 ;; O(n log k), and is both simple and elegant. An optimal algorithm
 ;; (e.g. partial quick-sort where the irrelevant partition is discarded
 ;; at each step) would have complexity O(n + k log k), but is probably
 ;; not worth the extra coding effort, and would have worse space
-;; complexity. (I haven't done any benchmarking, though, so feel free to
-;; do so and let me know the results!)
+;; complexity unless coded to work "in-place". (I haven't done any
+;; benchmarking, though, so feel free to do so and let me know the
+;; results!)
 (defun trie-complete-ordered (tree key rankfun &optional maxnum filter)
   "Return an alist containing all completions of SEQUENCE found
-in ternary searh tree TREE along with their associated data, in
-the order defined by RANKFUN. If no completions are found, return
-nil.
+in tree TREE along with their associated data, in the order
+defined by RANKFUN. If no completions are found, return nil.
 
 KEY must be a sequence (vector, list or string) containing
 elements of the type used to reference data in the tree. (If KEY
@@ -711,20 +797,42 @@ completion with two arguments: the completion, and its 
associated
 data. If the filter function returns nil, the completion is not
 included in the results, and does not count towards MAXNUM."
 
-  (let* ((node (trie--node-find tree key))
-        (trie--heap (heap-create
-                       `(lambda (a b) (not (,rankfun a b)))
-                       (when maxnum (1+ maxnum))))
-        (accumulator
-         (lambda (node seq)
-           (let ((data (trie--node-data node)))
-             (when (or (null filter) (funcall filter seq data))
+  (let ((node (trie--node-find tree key))
+       (trie--heap (heap-create
+                    `(lambda (a b) (not (,rankfun a b)))
+                    (when maxnum (1+ maxnum))))
+       accumulator)
+
+    ;; construct function to accumulate completions in heap
+    (setq accumulator
+         (cond
+          ((and filter maxnum)
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (when (funcall filter seq data)
+                 (heap-add trie--heap (cons seq data))
+                 (and (> (heap-size trie--heap) maxnum)
+                      (heap-delete-root trie--heap))))))
+          ((and filter (not maxnum))
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (when (funcall filter seq data)
+                 (heap-add trie--heap (cons seq data))))))
+          ((and (not filter) maxnum)
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
                (heap-add trie--heap (cons seq data))
-               (and maxnum
-                    (> (heap-size trie--heap) maxnum)
-                    (heap-delete-root trie--heap)))))))
-    (when node (trie--mapc accumulator (trie--mapfun tree) node key))
+               (and (> (heap-size trie--heap) maxnum)
+                    (heap-delete-root trie--heap)))))
+          ((and (not filter) (not maxnum))
+           (lambda (node seq)
+             (let ((data (trie--node-data node)))
+               (heap-add trie--heap (cons seq data)))))))
+
+    ;; accumulate completions in heap
+    (when node (trie--mapc accumulator (trie--mapfun tr) node key))
 
+    ;; extract completions from heap
     (let (completions)
       (while (not (heap-empty trie--heap))
        (setq completions (cons (heap-delete-root trie--heap) completions)))



reply via email to

[Prev in Thread] Current Thread [Next in Thread]