On Sun, 2012-08-12 at 22:31 +0800, rushan chen wrote:good
> Hi Mark,
> Very appreciate for your reply.
> I see you mention that it's useful to implement a larger library of
> efficient data structure, and I'm interested in that very much. I used to
> work on projects which involve complicated but very interesting data
> structures, implementing them could be challenging, but once done I feel a
> great sense of achievement.
OK, it's very good
> One such project is implementing a language model (LM) which is a core
> component of speech recognition and machine translation. I don't know if
> you heard of it before. Unfortunately, I can't cover it too detailed here,
> that would complicate things too much.
> Basically, one of the key operations LM supports is it should return a
> probability associated with any given id sequence. All id sequences are of
> the same length, and there are a mass amount of such id sequences (a
> commonly-seen LM may contain billions of them). So it's required to store
> LM in a concise way, and at the same time make the search for each id
> sequence very quickly.
Seems we don't have any prefix-tree implementation yet?
> Trie is finally chosen to be the data structure for LM (there were many
> papers discussing this issue). All id sequences with the same prefix share
> the same internal node, for example, for <1, 2, 3, 4> and <1, 2, 3, 5>,
> only one copy of <1, 2, 3> will be stored in LM, and a search for a id
> sequence is done by a sequence of binary search until the leaf is met. One
> extra thing worth mentioning is that I store the whole trie structure in a
> single large piece of memory (usually around 2 gigabytes), which makes
> it convenient to write out to disk and load into memory by simply using
> mmap, and I think it also makes the system faster than if you allocate
> memory every time it's needed.
Maybe some hero too busy to share it? ;-)
I'd like to see you make it, or I must write myself one.
IIRC, many guys here wrote their own data-structure/algorithm
Sharing makes our world better.
But, sometimes we reinvent wheels just for fun.
So just do what you want to do if it's interesting to you.
> There are some other projects I worked or working on like Spell Corrector,
> I can't say much about it.
If something force you not to share, you may rewrite it all by
friends will see your creativity, and your work be enhanced by others.
> All in all, I'm very interested in it, and I really really hope I can help.
> Looking forward to your reply. Thanks in advance.
> Have fun!
> Rushan Chen