Re: Extending the ecomplete.el data store.

From: Karl Fogel
Subject: Re: Extending the ecomplete.el data store.
Date: Tue, 06 Feb 2018 14:17:33 -0600
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Lars Ingebrigtsen <address@hidden> writes:
>> * Mailaprop remembers all the real-name variations and case variations
>> individually, including case variations in the email address portion
>> as well as in the real name portion.  So each variation gets its own
>> record, but they're all tied together under the same case-folded KEY
>> so they can be scored together.  (Contrast with ecomplete, where I
>> believe `ecomplete-add-item' just remembers the most recently-seen
>> variant for a given key.)
>Yes, I see the advantages of storing all the variations (it gives us a
>larger search space).
>However, I've found that in practice the simple "store the last
>variation" thing works surprisingly well.  But the disadvantage is that
>you basically lose the completion if the last variation is degenerate,
>like if you'd written "From: HAHAHA <address@hidden>", then my
>Message/icomplete wouldn't be able to complete on "Karl" (which is what
>you'd get normally).
>On the other hand, if you store all variations, then HAHAHA will forever
>be an available completion, too, which also has disadvantages.

That's where creative scoring comes in.  For example, mailaprop handles that 
case by inspecting the variants and simply assigning higher scores to the 
better ones.  It has an idea of what "better" means: "Lars Ingebrigtsen 
<address@hidden>" is better than "L. Ingebrigtsen <address@hidden>", according 
to mailaprop.
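
To make the idea concrete, here is a minimal sketch in Python (not mailaprop's actual code or scoring rules). It keeps every variant under one case-folded key and picks the "better" one with a made-up heuristic: count the real-name words that are neither bare initials nor all-caps. The names `fold_key`, `variant_score`, `remember`, and `best_variant`, and the example.org addresses, are all hypothetical.

```python
from collections import defaultdict

# Maps a case-folded address to every (real_name, address) variant seen.
store = defaultdict(list)

def fold_key(address):
    """Case-fold the address so all variants share one key."""
    return address.strip().casefold()

def variant_score(real_name):
    """Hypothetical heuristic: count words that are neither initials nor all-caps."""
    words = real_name.split()
    return sum(1 for w in words if len(w.rstrip(".")) > 1 and not w.isupper())

def remember(real_name, address):
    """Record a variant without discarding earlier ones."""
    store[fold_key(address)].append((real_name, address))

def best_variant(address):
    """Return the highest-scoring variant for this address, or None if unknown."""
    variants = store.get(fold_key(address), [])
    if not variants:
        return None
    return max(variants, key=lambda v: variant_score(v[0]))
```

With this, a degenerate "HAHAHA" variant stays in the store but loses to a full name whenever one has been seen.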

>So: Either complete historical completion, or uncomplete, but pretty
>up-to-date completion.

I don't think that's the choice we face.  Rather, the choice is: have enough 
information to make interesting decisions, or not have enough information :-).

I think you're conflating the storage format with the in-session UI behavior.  
Ecomplete can continue to throw away all but the most recent variant, if it 
wishes.  Other programs can have use all of the data and run it through super 
fancy machine-learning convoluted neural network AI bots working in tandem with 
a crowdsourced social media strategy that leverages the power of decentralized 
blockchain advertising affiliate networks to determine what completions they're 
going to offer.

But for programs to have this choice, the storage format must hold all the data 
that seems obviously relevant (and be extensible, in case somebody thinks of 
something later).  Then it's up to the programs to decide what subset of that 
data they want to use.  They don't have to use all of it.

>If you have too much to complete on, you just end up with noise.

Not really, because scoring allows one to put the right completions near the 
top.  I rely on this every day now: for the vast majority of recipient 
addresses, I only have to type one or two letters and hit Return, because the 
choice I wanted is also the one that's scored highest.  Very occasionally I 
have to type a longer substring -- and in those cases, being able to type just, 
say, "lars ing RET" and have the Right Thing happen is a lovely user experience.
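
As a rough illustration of why scoring keeps the noise down, here is a small Python sketch under stated assumptions: matching is a plain case-insensitive substring test, and each candidate already carries a numeric score (how that score is computed is left open). The names `matches` and `rank` and the example.org addresses are hypothetical, not anything from ecomplete or mailaprop.

```python
def matches(query, candidate):
    """True if query occurs anywhere in candidate, ignoring case."""
    return query.casefold() in candidate.casefold()

def rank(query, scored_candidates):
    """Return matching candidates, highest score first.

    scored_candidates maps a display string to an assumed numeric score.
    """
    hits = [c for c in scored_candidates if matches(query, c)]
    return sorted(hits, key=lambda c: scored_candidates[c], reverse=True)
```

A short query may match many records, but only the order matters: the top entry is what Return selects, and a longer substring narrows the list when the top entry is wrong.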

>> I guess we would also switch to UTF-8 for the coding system for the
>> database?  (Right now `ecomplete-database-file-coding-system' defaults
>> to `iso-2022-7bit'.)
>The latter can store more than the former, but UTF-8 is fine by me.

Thanks.  I didn't know that; until now, I didn't realize what ISO-2022 actually 
is [1].  I tend to lean toward UTF-8 because it's a widely-supported standard, e.g., 
if someone brings up their database file in a buffer or pages through it with a 
command-line pager, it'll usually be readable in both cases.  

Best regards,

[1] Just looked at 
https://en.wikipedia.org/wiki/ISO/IEC_2022#Comparison_with_other_encodings now.
