

Re: incremental history i/o? (was Re: A Feature Request for History)

From: Marcel (Felix) Giannelia
Subject: Re: incremental history i/o? (was Re: A Feature Request for History)
Date: Fri, 17 Jun 2011 14:56:55 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20110601 Thunderbird/3.1.10

I wonder how much de-duping the really old history would help. It seems that HISTCONTROL='erasedups' only affects the history of the current bash process (i.e. commands that were typed since you started that shell), and it leaves all the stuff it loaded from .bash_history alone.

As a quick test, removing duplicates from a 4 MB history file reduced the number of commands in it from 125236 to 36937, so that file was about 70% duplicated data (not quite, 'cause the longer and more interesting commands mostly stayed...). Doing that to your 11 MB file might get rid of that loading delay.
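
A minimal sketch of that kind of one-off de-duplication, in Python, assuming one command per line (no multi-line commands, and no "#<epoch>" timestamp lines like the ones bash writes when HISTTIMEFORMAT is set). It keeps the *most recent* occurrence of each command, so Ctrl+R still finds recently used commands near the end of the file. The function names and the script usage are hypothetical, not part of bash.

```python
import sys


def dedupe_history(lines):
    """Return lines with duplicates removed, keeping each command's last occurrence.

    The relative order of the surviving lines is preserved.
    """
    seen = set()
    kept = []
    # Walk newest-to-oldest, so the first time we see a command is its latest use.
    for line in reversed(lines):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    kept.reverse()  # restore oldest-to-newest order
    return kept


def main(src, dst):
    # surrogateescape round-trips any non-UTF-8 bytes unchanged
    with open(src, encoding="utf-8", errors="surrogateescape") as f:
        lines = f.read().splitlines()
    kept = dedupe_history(lines)
    with open(dst, "w", encoding="utf-8", errors="surrogateescape") as f:
        f.write("\n".join(kept) + ("\n" if kept else ""))
    removed = len(lines) - len(kept)
    print(f"{len(lines)} -> {len(kept)} lines ({removed / max(len(lines), 1):.1%} removed)")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as e.g. `python3 dedupe_history.py ~/.bash_history /tmp/bash_history.dedup` and inspect the result before replacing anything. For reference, going from 125236 to 36937 lines is about 70.5% of lines removed.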

Of course, de-duplicating the history destroys its role of "accurately record everything I've done", so if you also use your history for that, it's not a good idea. For that latter use, though, I can't think of a good reason to load it on shell start, so maybe those roles should be split: .bash_log and .bash_commands? The log is write-only, never clobbered, and has the equivalent of HISTTIMEFORMAT set; the commands file is an efficiently stored hash table of unique commands, maybe with tweakable parameters for how "interesting" a command has to be to go in it (store "mount -o loop,ro,uid=1000 -t vfat /some/file /mnt/temp" but ignore "cd ~", 'cause you really don't need Ctrl+R to remember the latter).
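
A rough sketch of that split, in Python, assuming single-line commands, an append-only .bash_log with one "<epoch>\t<command>" record per line, and a .bash_commands file rebuilt from an in-memory dict (Python's built-in hash table) of unique commands. The class name and the `min_length`/`ignore` knobs standing in for "how interesting a command has to be" are hypothetical; bash has no such files or settings.

```python
import os
import time


class SplitHistory:
    """Hypothetical two-file history: append-only log plus a table of unique commands."""

    def __init__(self, directory, min_length=6, ignore=("cd ~", "ls", "exit")):
        self.log_path = os.path.join(directory, ".bash_log")
        self.commands_path = os.path.join(directory, ".bash_commands")
        self.min_length = min_length
        self.ignore = set(ignore)
        # command -> last-used epoch seconds; dict order tracks last use
        self.commands = {}
        if os.path.exists(self.commands_path):
            with open(self.commands_path, encoding="utf-8") as f:
                for line in f:
                    stamp, _, cmd = line.rstrip("\n").partition("\t")
                    self.commands[cmd] = int(stamp)

    def is_interesting(self, cmd):
        """Crude stand-in for a tunable 'interestingness' test (an assumption)."""
        return len(cmd) >= self.min_length and cmd not in self.ignore

    def record(self, cmd, now=None):
        now = int(time.time()) if now is None else now
        # The log is only ever appended to: every command goes in, duplicates included.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(f"{now}\t{cmd}\n")
        if self.is_interesting(cmd):
            # Pop and re-insert so the command moves to the end (most recent).
            self.commands.pop(cmd, None)
            self.commands[cmd] = now

    def save_commands(self):
        tmp = self.commands_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for cmd, stamp in self.commands.items():
                f.write(f"{stamp}\t{cmd}\n")
        os.replace(tmp, self.commands_path)  # atomic rename on POSIX filesystems
```

Only the small commands file would need loading at shell start; the log is never read back by the shell at all.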


On 16/06/11 12:55, Bradley M. Kuhn wrote:
I agree with Marcel's points about keeping a big bash history, although
I wasn't sure if discussing "why" users keep a big bash history was on
topic or not.

Marcel (Felix) Giannelia wrote at 13:16 (EDT) on Tuesday:
A .bash_history file going back years and years is still only a few
Actually, this relates to a thing I'd been looking into recently.  My
bash history is 11MB now, and on some machines I have a noticeable load
time as it reads the history.  I'd thought about adding support for
incremental reads to the bash history/readline code.  Basically, it would
load only the parts of the history it needed based on the history
requested.  Obviously running "history" would read it all, but if
reverse-search was requested, it could perhaps be read incrementally
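
A minimal sketch of the incremental reverse-search idea, in Python rather than readline's C, assuming one command per line and no timestamp lines. Lines are yielded newest first by reading the file backwards in fixed-size chunks, so a search that matches a recent command never touches the older part of the file. `lines_newest_first`, `reverse_search` and `chunk_size` are hypothetical names, not readline API.

```python
import os


def lines_newest_first(path, chunk_size=64 * 1024):
    """Yield the lines of a file from last to first, reading backwards in chunks.

    Older chunks are only read if the consumer keeps iterating.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1  # ignore the final newline so no phantom empty line is yielded
        tail = b""  # partial line carried over from the previously read chunk
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step) + tail
            parts = block.split(b"\n")
            tail = parts[0]  # may be incomplete; finished by the next (older) chunk
            for line in reversed(parts[1:]):
                yield line.decode("utf-8", errors="replace")
        yield tail.decode("utf-8", errors="replace")


def reverse_search(path, needle, chunk_size=64 * 1024):
    """Return the most recent line containing needle, or None."""
    lines = lines_newest_first(path, chunk_size)
    try:
        for line in lines:
            if needle in line:
                return line
        return None
    finally:
        lines.close()  # stop early and close the file without reading older chunks
```

Splitting on raw bytes before decoding is deliberate: "\n" never occurs inside a multi-byte UTF-8 sequence, so a character straddling a chunk boundary is reassembled before it is decoded.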

Given that this would be a big change (esp. to make it seamless to
existing readline API users), and would provide a feature that clearly
isn't universally desired (ability to have really big history files),
I'm asking, albeit with some trepidation, if such a rewrite of the
history reading/writing code would likely be accepted, and if so what it
would need to look like to be an acceptable patch.

I noticed someone previously attempted to use mmap() in the
history code, but it's #ifdef'd out (IIRC from my investigations a few
weeks ago).  I theorized that it was #ifdef'd out because using
mmap() didn't help anything: the history reading code immediately
goes through the whole history array anyway, so, the way the code
currently operates, the entire file gets read into RAM even if you
mmap() it.  In other words, just slapping mmap() in place doesn't
work (in fact, it seems to have been tried and abandoned); more
in-depth changes would need to be made.
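
To illustrate the point about mmap(): mapping the file only saves work if the code then touches just the part it needs. A sketch using Python's `mmap` module (not readline's C code), assuming one command per line, that fetches the most recent `n` lines by scanning backwards with `rfind`, so roughly only the pages near the end of the file get faulted in. Splitting the whole mapping into lines instead would read every page, much like the eager loop described above. `last_n_lines` is a hypothetical helper.

```python
import mmap
import os


def last_n_lines(path, n):
    """Return up to the last n lines of a file, oldest first, via mmap + rfind."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return []  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = size
            if mm[end - 1:end] == b"\n":
                end -= 1  # ignore the trailing newline
            lines = []
            while len(lines) < n:
                # rfind returns -1 when there is no earlier newline, giving start == 0
                start = mm.rfind(b"\n", 0, end) + 1
                lines.append(mm[start:end].decode("utf-8", errors="replace"))
                if start == 0:
                    break  # reached the beginning of the file
                end = start - 1  # step over the newline that precedes this line
            lines.reverse()
            return lines
```

Everything before the last `n` lines is mapped but never read, which is the behaviour the history reading code would need before mmap() could pay off.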

Thoughts on this idea?

