[silpa-discuss] Modified spellchecker
From: Vasudev Kamath
Subject: [silpa-discuss] Modified spellchecker
Date: Wed, 5 May 2010 22:38:30 +0530
User-agent: KMail/1.12.4 (Linux/2.6.33-3.slh.4-sidux-686; KDE/4.3.4; i686; ; )
Hi,
Please find attached the diff for spellchecker.py in the spellchecker module.
The patch changes the logic to integrate indexing. I'm also attaching the
indexer script, which must be run first to generate an index file for each
dictionary. Please note that, for now, the indexer script has to be placed
inside the spellchecker module to work properly. In the coming days I'll modify
it so that it works independently of its location. Also note that the indexer
script fails for mr_IN.dic; I'm still not sure of the reason. Here is the
traceback, followed by the output of the file command for mr_IN.dic:
Traceback (most recent call last):
  File "indexer.py", line 129, in <module>
    index.createIndex("mr_IN.dic")
  File "indexer.py", line 60, in createIndex
    item = self.fp.readline()
  File "/usr/lib/python2.5/codecs.py", line 622, in readline
    return self.reader.readline(size)
  File "/usr/lib/python2.5/codecs.py", line 477, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python2.5/codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
dicts/mr_IN.dic: UTF-8 Unicode text, with CRLF, LF line terminators
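One way to narrow this down might be to scan the file in binary mode and report which lines fail to decode. The following is a minimal sketch in Python 3 (the traceback above is from Python 2.5). The function name find_bad_lines and its parameters are hypothetical and are not part of indexer.py.

```python
def find_bad_lines(path, encoding="utf-8", limit=5):
    """Return (line_number, byte_offset, raw_line) for lines that fail to decode.

    Hypothetical helper for diagnosis only; it does not modify the file.
    """
    bad = []
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as fp:
        # In binary mode, iteration splits on b"\n" only, so a CRLF line
        # keeps its trailing b"\r\n" and mixed line endings are preserved.
        for lineno, raw in enumerate(fp, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first offending byte in raw.
                bad.append((lineno, offset + exc.start, raw))
                if len(bad) >= limit:
                    break
            offset += len(raw)
    return bad
```

Calling find_bad_lines("dicts/mr_IN.dic") would list the first few offending lines with their byte offsets, which can then be inspected with a hex viewer.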
One more thing: we need to convert the English dictionary to UTF-8. It is
currently ISO-8859-1, which causes data loss when the file is read. If that's
OK, I'll convert the encoding and commit the dictionary to the repo.
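The conversion itself could look roughly like the sketch below, written for Python 3. The function name convert_to_utf8 and the file paths are assumptions for illustration; the actual dictionary file name is not given here.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file from src_encoding to UTF-8.

    Hypothetical helper. newline="" disables newline translation, so the
    original line endings are written out unchanged.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Writing to a separate destination file, rather than converting in place, leaves the original available for comparison before the result is committed.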
A note on performance: the new version of the spellchecker runs noticeably
faster than the existing code. On the first run there is a slight delay
(possibly from loading the index from the file). I tested this by running the
new code standalone and the existing silpa over Apache, so please verify it.
Thanks and Regards
Vasudev Kamath
spellchecker.diff
Description: Text Data
indexer.py
Description: Text Data