[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Matrix operations on large datasets
From: |
Ed |
Subject: |
Re: Matrix operations on large datasets |
Date: |
Fri, 30 May 2008 21:51:34 +0100 |
Nb I forgot to mention matrix inverse too! I saw the code already has
a sweep operator for generalized inverses. I'm not clear on how to
best calculate the inverse of a very large, very nonsquare matrix. The
form of the generalized inverse is I think something like:
Q^(-1)- = | Q^-1 A |
| B C |
where the top left corner is the significant bit, and the other three
submatrices are much simpler. Since the matrix is very nonsquare, its
possible that the top corner is small relative to the total matrix
size. This might be a dumb question but is it possible to avoid even
constructing A,B,C and using some simple generated form (perhaps less
optimal than the sweep would give us, but much much cheaper)?
Ed
2008/5/30 Ed <address@hidden>:
> (To avoid polluting the standard thread I copied this part across)
>
>> 2008/5/30 Ben Pfaff <address@hidden>:
>> But I'm not sure why you think that supporting large data sets
>> has to be highly complex. A lot of it is just Computer Science
>> 101 type stuff; for example, merge sorts for efficiently sorting
>> large data sets (which we already implement). What do you have
>> in mind?
>
> I guess I was thinking of matrix operations in particular. I've been
> reading the linear regression code as Jason suggested, and the best
> way to implement linear algebra on a large dataset isn't immediately
> clear to me.
>
> My recollection of the maths in this area is a little hazy. The
> obvious cases we might need seem to be:
>
> vector <dot> vector:
>
> is straightforward (stream in matching chunks of the two vectors).
> working memory is 2* chunk size + 1 (for the total)
>
> matrix x vector
>
> the matrix could be wide or high (we assume its very nonsquare since
> otherwise no algorithm can probably run in useful time anyway). All
> this basically changes is the order we stream things in. If wide, we
> stream across the matrix, calculating all n result elements
> simultaneously. If high, we stream down the matrix, calculating each
> of the n result elements one after another and saving the result to an
> output stream.
>
> memory usage:
> wide: n totals + 2n chunks.
> high: 1 total + m locations (much cheaper I guess)
>
> matrix x matrix
>
> In the case (n x m) x (m x p) where n,p <<< m, can we build the entire
> result matrix as we stream the inputs across m?
>
> The other cases I'm not thinking so clearly about...
>
> Is there an existing library that would do this for us?
>
> This is the sort of thing that seems complicated to me anyway.
>
> Ed
>