Re: [Bug-apl] Use with word2vec


From: Fred Weigel
Subject: Re: [Bug-apl] Use with word2vec
Date: Sat, 29 Apr 2017 16:26:06 -0400

Thanks!

I'll probably go with SHMEM for future CUDA/OpenCL use (I was thinking
along those lines). I don't yet need to handle the typical size: the
model I am working with this weekend is vector8.bin, which is
71000 x 200 floats (71000 words, each with 200 floats = 57MB), but the
*big* one is much larger.
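
As a rough illustration of the sizes involved and of the shared-memory
idea, here is a minimal Python sketch. It assumes 32-bit floats (which
is what the 57MB figure implies) and uses Python's standard
multiprocessing.shared_memory module as a stand-in for SHMEM; it is not
a GNU APL interface, and the helper names matrix_bytes,
create_shared_matrix and row_view are hypothetical.

```python
from multiprocessing import shared_memory

FLOAT_BYTES = 4  # assumption: vectors are stored as 32-bit floats


def matrix_bytes(rows, cols, item_bytes=FLOAT_BYTES):
    """Bytes needed for a dense rows x cols matrix of floats."""
    return rows * cols * item_bytes


def create_shared_matrix(rows, cols):
    """Allocate a named shared-memory block large enough for the matrix."""
    return shared_memory.SharedMemory(create=True, size=matrix_bytes(rows, cols))


def row_view(shm, row, cols):
    """Return a writable float view of one row, without copying.

    The caller must release the view (or use it in a with-statement)
    before closing the shared-memory block.
    """
    start = row * cols * FLOAT_BYTES
    return shm.buf[start:start + cols * FLOAT_BYTES].cast("f")


if __name__ == "__main__":
    # 71000 words x 200 floats x 4 bytes
    print(matrix_bytes(71000, 200), "bytes")

    shm = create_shared_matrix(4, 3)  # tiny matrix for demonstration
    try:
        with row_view(shm, 2, 3) as v:
            v[0] = 1.5  # write directly into shared memory
        with row_view(shm, 2, 3) as v:
            print(v[0])
    finally:
        shm.close()
        shm.unlink()
```

Another process could attach to the same block with
shared_memory.SharedMemory(name=shm.name), so the consumer (for example
a GPU feeder) reads the floats in place rather than receiving a copy.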

Fred Weigel

On Fri, 2017-04-28 at 21:32 -0400, Xiao-Yong Jin wrote:
> If shared variables can go through SHMEM, you can probably interface
> with CUDA that way without much of a bottleneck.
> But with the way GNU APL is implemented now, there are just too many
> other limitations on performance with arrays of such size.
> 
> > On Apr 28, 2017, at 9:19 PM, Fred Weigel <address@hidden> wrote:
> > 
> > Juergen, and other GNU APL experts.
> > 
> > I am exploring neural nets, word2vec and some other AI-related
> > areas.
> > 
> > Right now, I want to tie in Google's word2vec trained models (the
> > billion-word one, GoogleNews-vectors-negative300.bin.gz).
> > 
> > This is a binary file containing a lot of floating point data, about
> > 3.5GB of it. These are words, followed by cosine distances. I could
> > attempt to feed this in the slow way and put it into an APL
> > workspace. But I also intend to try feeding the data to a GPU. So
> > what I am looking for is a modification to GNU APL (and yes, I am
> > willing to do the work) that allows the complete suppression of
> > normal C++ allocations, etc., and the introduction of simple
> > float/double vectors or matrices. It would also help to allow
> > "C"-ish or UTF-8-ish strings: the data is (C string containing word
> > name) (fixed number of floating point values), repeated LOTs of
> > times.
> > 
> > The data set(s) may be compressed, so I don't want to read them
> > directly. Possibly they could come from a shared memory region
> > (64-bit system only, of course) or, perhaps, through shared
> > variables, but I don't think that would be fast enough.
> > 
> > Anyway, this begins to allow the push into "big data" and AI
> > applications. Just looking for some input and ideas here.
> > 
> > Many thanks
> > Fred Weigel
> > 
> 
> 
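
For reference, the record layout described in the quoted message (a
word followed by a fixed number of floats, repeated many times) can be
streamed from a possibly gzip-compressed file roughly as follows. This
is a sketch under assumptions, not a tested loader for the GoogleNews
file: it assumes the usual word2vec binary layout of a text header line
"vocab_size dim", then for each entry the word terminated by a space,
then dim little-endian 32-bit floats, optionally followed by a newline.
The names open_maybe_gzip and read_word2vec are hypothetical.

```python
import gzip
import io
import struct


def open_maybe_gzip(path):
    """Open path for binary reading, decompressing on the fly if gzipped."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_word2vec(stream):
    """Yield (word, vector) pairs from a binary stream.

    Assumed layout: header line b"<vocab_size> <dim>\\n", then per entry
    the word bytes, a single space, and dim little-endian float32 values.
    """
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    floats = struct.Struct("<%df" % dim)

    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("file ended inside a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline trailing the previous record
                word.extend(ch)
        raw = stream.read(floats.size)
        if len(raw) != floats.size:
            raise EOFError("file ended inside a vector")
        yield word.decode("utf-8", errors="replace"), floats.unpack(raw)


if __name__ == "__main__":
    # Tiny in-memory example: two words with three floats each.
    sample = (b"2 3\n"
              + b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
              + b"dog " + struct.pack("<3f", 0.5, -1.0, 4.0) + b"\n")
    for word, vec in read_word2vec(io.BytesIO(sample)):
        print(word, vec)
```

Because the reader is a generator over a stream, the compressed file
never has to be expanded on disk, and each vector can be copied into a
preallocated buffer (for example a shared memory region) as it arrives.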


