Re: [Bug-apl] Spell corrector - APL
From: Ala'a Mohammad
Subject: Re: [Bug-apl] Spell corrector - APL
Date: Sat, 10 Sep 2016 12:02:02 +0400
Thanks to all for the input.
Replacing Find and Each-OR with Match helped; I can now parse a 159K
(~1545 lines) text file (a sample chunk of big.txt).
What I'm still trying to understand is why the APL process, when fed
the 159K text file, keeps allocating memory until it reaches 2.7GiB,
and then settles down to 50MiB after printing the result.
Why does it need 2.7GiB? Are there any memory utilities (e.g. a garbage
collection utility) that could be used to mitigate this?
Here is the updated code:
a ← 'abcdefghijklmnopqrstuvwxyz'
A ← 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
downcase ← { (a,⎕AV)[(A,⎕AV)⍳⍵] }
cr ← ⎕UCS 13 ◊ nl ← ⎕UCS 10 ◊ tab ← ⎕UCS 9
nonalpha ← nl, cr, tab, ' 0123456789()[]!?%$,.:;/+*=<>-_#"`~@&'
alphamask ← { ~ ⍵ ∊ nonalpha }
words ← { (alphamask ⍵) ⊂ downcase ⍵ }
hist ← { (⍪∪⍵),+/(∪⍵)∘.≡⍵ } ⍝ as suggested by Kacper
desc ← {⍵[⍒⍵[;2];]}
ftxt ← { ⎕FIO[26] ⍵ }
fhist ← { hist words ftxt ⍵ }
file ← '/misc/llaa' ⍝ llaa contains 1546 text lines
⎕ ← ⍴w ← words ftxt file
⎕ ← ⍴u ← ∪w
desc 39 2 ⍴ fhist file ⍝ first 39 rows of the histogram, sorted by count
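One possible way to avoid the large intermediate: `(∪⍵)∘.≡⍵` builds one cell for every (unique word, word) pair before the row sums are taken. The sketch below instead looks each word up in the unique list, sorts the resulting indices, and measures the runs of equal indices. It is only a sketch and not something confirmed in this thread: the sample vector `w` and the names `u`, `i`, `s`, `c` are made up for illustration, and it assumes ⎕IO←1 and APL2-style partition (dyadic ⊂), which `words` above already relies on.

```apl
⍝ Hypothetical sample data; in the script this would be w ← words ftxt file
w ← 'the' 'to' 'the' 'of' 'to' 'the'
u ← ∪w            ⍝ unique words in order of first appearance
i ← u⍳w           ⍝ index of each word in u: 1 2 1 3 2 1 (assumes ⎕IO←1)
s ← i[⍋i]         ⍝ sorted indices: 1 1 1 2 2 3
c ← ∊⍴¨s⊂s        ⍝ length of each run of equal indices: 3 2 1
(⍪u),c            ⍝ same two-column layout that hist returns
```

The largest intermediates here are vectors with one element per word, rather than a matrix with one row per unique word. Whether this actually lowers peak memory in GNU APL would need to be measured.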
And here is a sample run:
: apl -s -f fhist.apl
30186
4155
the 1560
to 804
of 781
in 493
for 219
be 173
holmes 164
your 132
this 114
all 99
by 97
are 97
or 73
other 56
over 51
our 48
should 47
before 43
sherlock 39
any 35
sir 26
sure 13
country 9
project 6
gutenberg 6
ebook 5
adventures 5
world 5
arthur 4
conan 4
doyle 4
series 2
copyright 2
laws 2
check 2
header 2
changing 1
downloading 1
redistributing 1
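For a sense of scale, the two shapes printed at the top of the run give the size of the matrix that `(∪⍵)∘.≡⍵` builds inside `hist`. The arithmetic below is just that product, plus the observed peak divided by it. Attributing the whole 2.7GiB to that one matrix is an assumption, not something measured.

```apl
⍝ cells in the (unique words) × (words) match matrix
4155 × 30186                 ⍝ 125422830
⍝ observed peak in bytes per cell, assuming the matrix accounts for all of it
(2.7 × 2*30) ÷ 4155 × 30186  ⍝ roughly 23
```

So even a small input produces over a hundred million cells, because the matrix grows with the product of the two lengths.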
I have also attached the sample input file.
Regards,
On Sat, Sep 10, 2016 at 9:20 AM, Kacper Gutowski <address@hidden> wrote:
> On 9 September 2016 at 23:39, Ala'a Mohammad wrote:
>> the errors happened inside 'hist' function, and I presume mostly due
>> to the jot dot find (if understand correctly, operating on a matrix of
>> length equal to : unique-length * words-length)
>
> Try (∪⍵)∘.≡⍵ instead of ∨/¨(∪⍵)∘.⍷⍵.
>
> -k
Attachment: llaa (binary data)