nano-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Nano-devel] WIP: Use caching for regex


From: Devin Hussey
Subject: [Nano-devel] WIP: Use caching for regex
Date: Tue, 30 Oct 2018 17:32:10 -0400

Sorry about the HTML from before. I send a lot of things from my phone
and apparently, the Gmail app doesn't have an option for plain text,
despite the fact that there are open feature requests dating back to
2012. I do a lot of coding in Termux on my phone.

I was trying to figure out how to make RC file parsing faster. I
initially thought it was strcmp hell, but I made a gperf and there was
a whopping 0.010 second speedup, even with the nano syntax files
included 8 times.

So, I profiled nano, and I found the real bottleneck. Nano regcomps
and regfrees aggressively (see: found_in_list). regcomp is a pretty
expensive function, and that is the real reason that parsing is slow.

Nano has a lot of syntax rules that use the same regex, so I was
thinking we should reuse the regexes whenever possible.

Here is what I have now (it is a GitHub Gist, because I am not sure
whether attaching stuff is a good idea):

    https://gist.github.com/easyaspi314/07e667636fde11216dae190f03d362c6

That is a standalone test program with a bunch of logging. Nano-isms
are defined to macros, and the code should fit right in once the
macros are removed and proto.h is included.

I use the FNV-1a (32-bit) hashing algorithm and a hash table to do the
caching, and I use reference counting to allow code to call regfree
safely.
Collisions are handled by using a doubly-linked list (I am thinking of
only a one-way list, I am not sure yet) and comparing the following
information:
 - The full 32-bit hash
 - The regex flags
 - The regex string itself

I also check if any regex special characters exist in the string. If
so, I skip compiling the regex altogether, instead opting for a
strstr/strcasestr approach.

Obviously, it isn't complete and it needs more testing (especially
nmatch and pmatch), but do you have any thoughts?

Would it be better to just not free the regexes until the program
closes instead of reference counting? I don't know what effect it
would have on the memory usage. If so, we can easily remove the
backward link and refcount because we don't need it.

Either way, I think this would be better than what we have now.

-- Devin Hussey



reply via email to

[Prev in Thread] Current Thread [Next in Thread]