help-octave

uint2?


From: Mike Miller
Subject: uint2?
Date: Thu, 2 Dec 2010 18:49:51 -0600 (CST)
User-agent: Alpine 2.00 (DEB 1167 2008-08-23)

I know that we have a uint8 data type, but that's as small as we're allowed to go -- one byte per element. In genetics research we're doing a lot these days with very high dimensional data (millions of markers per person) where every marker genotype can be encoded as 0, 1, 2 or missing.

Side remark: We typically count the number of minor alleles for a biallelic marker. Suppose a single-nucleotide marker has alleles A and T and suppose that A is the rarer of the two (the "minor allele"). Then we would count the number of A alleles per genotype: TT = 0, AT = 1, AA = 2.

Thus, each genotype could be stored as one of the 2-bit codes 00, 01, 10 and 11 (missing), and we could store four genotypes per byte instead of only one.
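To make the four-per-byte idea concrete, here is a minimal sketch in Python (used only for illustration; it is not Octave code and not PLINK's implementation). The function names and the choice to put the first genotype in the lowest two bits of each byte are assumptions of this sketch, not something specified above.

```python
MISSING = 0b11  # binary 11, the code reserved above for a missing genotype


def pack_genotypes(codes):
    """Pack 2-bit genotype codes (0, 1, 2 or MISSING) four to a byte."""
    packed = bytearray((len(codes) + 3) // 4)  # round up to whole bytes
    for i, code in enumerate(codes):
        if code not in (0, 1, 2, MISSING):
            raise ValueError(f"genotype code out of range: {code}")
        # Genotype i lives in byte i // 4 at bit offset 2 * (i % 4)
        # (lowest bits first; this ordering is an assumption).
        packed[i // 4] |= code << (2 * (i % 4))
    return packed


def unpack_genotypes(packed, count):
    """Recover the first `count` 2-bit codes from a packed buffer."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


codes = [0, 1, 2, MISSING, 2, 0]
packed = pack_genotypes(codes)
print(len(codes), "genotypes stored in", len(packed), "bytes")
print(unpack_genotypes(packed, len(codes)))
```

The count has to be kept alongside the buffer, because the unused bits in the last byte are indistinguishable from genotype 0.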

A 2-bit packing of this kind is used by the GPL-licensed program PLINK, both for its data files and for working with the data in memory. Even with PLINK's packing it's pretty easy to have data that use a full gigabyte, so storing four genotypes per byte provides a very significant savings in RAM.

I'm asking because I wonder whether a uint2 type could conceivably be developed for Octave. Alternatively, it could be a special snp type in which binary 11 always denotes a missing value (shown as NA when displayed or written to text output).
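As a rough picture of how such an snp type might behave, here is a small Python sketch. The class name `SnpVector`, its methods, and the bit layout are all hypothetical, and it says nothing about how a real type would hook into Octave's internals. It only shows the two properties asked for: 2-bit storage, and binary 11 surfacing as NA.

```python
class SnpVector:
    """Hypothetical fixed-length vector of 2-bit genotype codes."""

    MISSING = 0b11  # binary 11 always means "missing"

    def __init__(self, length):
        self._length = length
        self._data = bytearray((length + 3) // 4)  # every genotype starts as 0

    def __len__(self):
        return self._length

    def _check(self, index):
        if not 0 <= index < self._length:
            raise IndexError("genotype index out of range")

    def __getitem__(self, index):
        self._check(index)
        code = (self._data[index // 4] >> (2 * (index % 4))) & 0b11
        return None if code == self.MISSING else code  # None stands in for NA

    def __setitem__(self, index, value):
        self._check(index)
        if value is None:
            code = self.MISSING
        elif value in (0, 1, 2):
            code = int(value)
        else:
            raise ValueError("genotype must be 0, 1, 2 or None (missing)")
        shift = 2 * (index % 4)
        cleared = self._data[index // 4] & ~(0b11 << shift) & 0xFF
        self._data[index // 4] = cleared | (code << shift)

    def __str__(self):
        values = (self[i] for i in range(len(self)))
        return " ".join("NA" if g is None else str(g) for g in values)


v = SnpVector(5)
v[1], v[2], v[3] = 1, 2, None
print(v)  # missing entries are shown as NA
```

Keeping the missing code inside the type, rather than leaving it to each caller, is what would distinguish an snp type from a plain uint2.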

I have no idea how much work that would be. I'm willing to work on it, but I'm also not much of a programmer, so I doubt I could add much. The availability of the PLINK GPL'd code could help a lot, I suppose. I don't know if R developers have been working on this problem, but I'll find out (a lot of genetics researchers use R).

Mike

