Re: [Discuss-gnuradio] Continuously Write FFT Samples to a File

On Tue, Jan 24, 2017 at 1:28 PM, Marcus Müller <address@hidden> wrote:

Dear Mallesham,

> 2. The closeness of the sampled data from the actual spectral data can be measured easily using some test hypotheses such as KS-test

ah, yes, but throwing Kolmogorov at an estimation problem requires knowledge of the CDF of what is to be observed, or am I mistaken here? Would that not contradict

> The characteristics of transmitters are not known.

?

Best regards,
Marcus
On 01/24/2017 07:11 PM, Mallesham Dasari wrote:
Hi Marcus,

1. Yes, as you said, collecting raw IQ samples at that speed (100MBps) is not a feasible option here, as it throttles the network as well as other resources. I still don't know how to handle this. As of now, I am just collecting the PSD and running some spectrum monitoring applications.

2. The closeness of the sampled data to the actual spectral data can easily be measured using hypothesis tests such as the KS test, or using time series prediction. The characteristics of the transmitters are not known: they can be continuously active, bursty, or intermittent traffic generators. It is independent of the transmitters.
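For reference: the one-sample KS test needs the CDF of what is being observed, but the two-sample form only compares two empirical CDFs, e.g. power readings from a full-rate capture against those kept by a duty-cycled capture. Below is a minimal sketch of the statistic in plain Python, assuming per-frame power values in dB as floats (the numbers are made up). It computes only D, not a p-value (scipy.stats.ks_2samp returns both), and the KS test assumes independent samples, which consecutive PSD frames may not be.

```python
import bisect


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Both CDFs are empirical (built from the samples themselves), so no
    reference distribution has to be known in advance.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Made-up per-frame power readings (dBFS) from a full-rate capture ...
full = [-90.0, -88.5, -60.2, -89.9, -61.0, -91.2, -59.8, -90.4]
# ... and the subset a 50 % duty cycle would have kept.
sub = full[::2]
d = ks_two_sample(full, sub)
print(d)  # -> 0.375
```

Note that this still needs the full-rate data as the reference, so it does not remove the need to define what the "actual spectral data" is.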
On Mon, Jan 23, 2017 at 1:14 PM, Marcus Müller <address@hidden> wrote:
Hi Mallesham,

this is one of the cases where capitalization makes a difference: mbps = millibit per second is really not that much, and even 100 Mb/s would still be pretty tolerable, but that would only be a little more than 3 MS/s; what you're referring to is multiples of 100 megabytes per second, which doesn't even fit through Gigabit Ethernet anymore.
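The numbers above can be checked with a few lines of Python. This sketch assumes 16-bit I + 16-bit Q samples (4 bytes, UHD's "sc16") on the wire and 32-bit float I + Q (8 bytes, "fc32") on the host, and counts Gigabit Ethernet at its nominal 1 Gb/s line rate with no protocol overhead.

```python
GIGE_BYTES_PER_S = 1e9 / 8   # 1 Gb/s line rate, protocol overhead ignored
SC16 = 4                     # bytes per complex sample, 16-bit I + 16-bit Q
FC32 = 8                     # bytes per complex sample, float32 I + Q


def stream_bytes_per_s(sample_rate, bytes_per_sample):
    """Data rate of a continuous complex sample stream, in bytes/s."""
    return sample_rate * bytes_per_sample


def max_sample_rate(link_bits_per_s, bytes_per_sample):
    """Highest complex sample rate a link of the given bit rate can carry."""
    return link_bits_per_s / (8 * bytes_per_sample)


print(max_sample_rate(100e6, SC16) / 1e6)    # -> 3.125 (MS/s through 100 Mb/s)
print(stream_bytes_per_s(32e6, SC16) / 1e6)  # -> 128.0 (MB/s at 32 MS/s, sc16)
print(stream_bytes_per_s(32e6, FC32) / 1e6)  # -> 256.0 (MB/s at 32 MS/s, fc32)
print(GIGE_BYTES_PER_S / 1e6)                # -> 125.0 (MB/s, GbE line rate)
```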

> I want to find the suitable duty cycle parameters for this particular setup with minimal error compared to actual spectrum data.

"actual spectral data" is the interesting phrase here. Because that isn't actually all that easy to define: Is having a higher rate at which you "revisit" one frequency better than covering a larger overall bandwidth? What is the actual dynamic range that you want to achieve? Are the transmitters you want to observe expected to be active for prolonged periods, or are they only active in short, temporally uncorrelated bursts?

What's your metric for how "close" your observation is to the "actual spectral data"? I think you must define that first!

Best regards,
Marcus
On 01/23/2017 06:44 PM, Mallesham Dasari wrote:
Hi Marcus and Kyeong,

Thanks for the suggestions, particularly regarding fixed-point storage. Coming to Marcus's question: my aim is to find the optimal sampling of spectrum sensor data. A large number of low-cost spectrum sensors are deployed around an area to monitor the spectrum. As these sensors generate a huge amount of data (around 100mbps, considering raw IQ) at 32msps (USRP B12), I want to find suitable duty cycle parameters for this particular setup with minimal error compared to the actual spectrum data.
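One way to make "minimal error" concrete, assumed here purely for illustration and not taken from the thread, is to compare the channel occupancy estimated from duty-cycled frames with the occupancy estimated from every frame. A sketch with hypothetical helper names (duty_cycle_mask, occupancy_error), a simple energy-detector threshold, and a synthetic bursty trace:

```python
def duty_cycle_mask(n_frames, on, period):
    """Indices of frames kept when sensing `on` frames out of every `period`."""
    return [i for i in range(n_frames) if i % period < on]


def occupancy(frames_db, threshold_db):
    """Fraction of frames above a power threshold (simple energy detector)."""
    return sum(1 for p in frames_db if p > threshold_db) / len(frames_db)


def occupancy_error(frames_db, on, period, threshold_db):
    """Absolute occupancy error of the duty-cycled view vs. all frames."""
    kept = [frames_db[i] for i in duty_cycle_mask(len(frames_db), on, period)]
    return abs(occupancy(kept, threshold_db) - occupancy(frames_db, threshold_db))


# Synthetic bursty channel: 3 busy frames (-60 dB) out of every 12.
trace = [-60.0 if (i // 3) % 4 == 0 else -90.0 for i in range(48)]

for on, period in [(1, 4), (2, 4), (4, 4)]:
    err = occupancy_error(trace, on, period, threshold_db=-75.0)
    print(on, period, round(err, 4))
```

In this synthetic trace, keeping 1 or 2 frames out of every 4 overestimates occupancy (1/3 instead of 1/4) because the burst pattern lines up with the duty-cycle period; the temporal behaviour of the transmitters directly affects which parameters look "good".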
On Sat, Jan 21, 2017 at 9:23 PM, Kyeong Su Shin <address@hidden> wrote:
To Whom it may concern:

I don't know much about distributed computing, but I also agree that it is the right idea to store PSD data on a dBFS (or dBm, if calibrated) scale in a fixed-point format. Microsoft Spectrum Observatory uses a 16-bit Q-format fixed-point number to store PSD data in dBFS, and you can probably go down to 8 bits per point if you are happy with 0.5 to 1 dB resolution.
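A sketch of the 8-bit variant, assuming a fixed 0.5 dB step with 0 dBFS as the top of the scale (both are illustrative choices, not the Spectrum Observatory format): values are clamped to [-127.5, 0] dBFS and stored as one unsigned byte each, a quarter of the size of float32, with a rounding error of at most 0.25 dB inside that range.

```python
STEP_DB = 0.5       # assumed resolution (the thread mentions 0.5 to 1 dB)
FLOOR_DB = -127.5   # 255 steps below 0 dBFS: the lowest storable level


def encode_dbfs(levels_db):
    """Quantize dBFS values to one unsigned byte each (code 0 = 0 dBFS)."""
    out = bytearray()
    for v in levels_db:
        v = min(0.0, max(FLOOR_DB, v))        # clamp into the representable range
        out.append(int(round(-v / STEP_DB)))  # code in 0 ... 255
    return bytes(out)


def decode_dbfs(data):
    """Map byte codes back to dBFS values."""
    return [-code * STEP_DB for code in data]


levels = [-0.1, -3.3, -60.26, -127.4, -200.0]   # made-up PSD values, dBFS
packed = encode_dbfs(levels)
restored = decode_dbfs(packed)
print(len(packed), "bytes instead of", 4 * len(levels), "as float32")
print(restored)
```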

Regards,

Kyeong Su Shin
On Sat, Jan 21, 2017 at 11:04 AM, Marcus Müller <address@hidden> wrote:
Hi Mallesham,

that does indeed sound interesting, but first of all you have a local problem – that of data volume concentration on your single receiver node. 32MS/s is already more than you can shift out through a single Gigabit Ethernet connection – so either you must immediately upgrade to more datacenter-style interconnects, or you must start thinking about consolidating your data where it happens. On the other hand, compared to other SDR systems, a mere 32 MS/s from a single channel with a non-100% duty cycle is "not that much"; I really feel like you might be running this on slightly undersized hardware.

I, again, ask you to describe what you *want* rather than what you *do* – a system specification is crucial here, and I hope that Greg agrees with my opinion that the ability to handle Big Data (whatever that is, in the end) alone is not a solution to a data problem. Partitioning, analyzing, reducing / compressing, filtering and discarding of data can only be designed if you have a clear concept of what your target is – and in the case of signal processing, much more than in many other big data applications, that concept is often pretty well-known a priori.

So, whilst I really think that you're on to something very interesting here, combining distributed computing with SDR, and hope you can share a lot of your insights in the future, I also really think that you should start with a well-thought-out design of what you want to process and store. So far, you've only told us you have "FFT data" (by which you imply "spectral power estimates", which already is a reduction by half), but you haven't really explained how much of it, and in how much detail, you need. A lot of interesting aspects might arise from that – for example, if you're really after power spectra, logarithmic storage (dB!) would make a lot of sense; combine that with storing these logarithmic values in a fixed-point format, and you could easily save another factor of two in storage bandwidth – without ever losing the "essence" of your data. The way in which you capture your data might, as Greg mentioned, be a key indicator of the granularity in which you distribute it.
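The size reductions described above can be sketched like this: complex64 FFT output (8 bytes per bin), float32 power estimates (4 bytes per bin), and dB values in signed 16-bit fixed point (2 bytes per bin). The 7 fractional bits (1/128 dB resolution, roughly +/-256 dB range) are a hypothetical choice, and to_q_db / from_q_db are illustrative names.

```python
import array
import math


def fft_frame_bytes(n_bins):
    """Raw FFT output: complex64 = two float32 per bin."""
    return 8 * n_bins


def power_frame_bytes(n_bins):
    """|X[k]|^2 as float32: half the size of the complex frame."""
    return 4 * n_bins


def to_q_db(power_linear, frac_bits=7):
    """Linear power -> dB -> signed 16-bit fixed point (int16 array)."""
    scale = 1 << frac_bits
    out = array.array("h")  # C signed short, 2 bytes on common platforms
    for p in power_linear:
        db = 10.0 * math.log10(max(p, 1e-30))  # guard against log10(0)
        out.append(max(-32768, min(32767, round(db * scale))))  # saturate
    return out


def from_q_db(q, frac_bits=7):
    """Fixed point back to float dB values."""
    return [v / (1 << frac_bits) for v in q]


N = 1024
q = to_q_db([1.0, 1e-3, 0.0] + [1e-6] * (N - 3))
sizes = (fft_frame_bytes(N), power_frame_bytes(N), len(q) * q.itemsize)
print(sizes)  # bytes per frame: complex FFT, float32 power, int16 dB
```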

In short: it might be helpful if you could formulate what you want to *do* with your data, not only how you want to do that.

Best regards,
Marcus
On 01/21/2017 07:37 PM, Mallesham Dasari wrote:
Dear Greg,

Thanks for bringing this into the picture. My long-term goal is exactly what you have just explained. My plan is to use Spark for storing this big data and come up with data processing algorithms to monitor the spectrum data. For instance, a simple case would be to find the duty cycle parameters that tell me how coarse-grained my sub-sampling can be without losing much of the spectrum data. Similarly, there could be many applications from integrating Big Data and SDR platforms.

I will share the same if I can integrate these successfully. Thanks!
On Sat, Jan 21, 2017 at 11:33 AM, Gregory Ratcliff <address@hidden> wrote:
I spend my working hours on big data and Hadoop.

It occurs to me you really need to be thinking about something outside of a normal file system. HDFS lets you write out data in chunks that you later combine when you have time. There are some really (really) fast implementation projects that write to HDFS. Most of the new work is in Java, but I think you are asking for something pretty light.

I can visualize a "gatherer" for RF and a "filer" in HDFS that writes out xx MB chunks every period. Now as others have said, you don't just slap some stuff together, you will need to optimize the integration points and think about the best caching and write speeds of the "filer" system and the persistent storage.
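A minimal local-disk sketch of such a "filer", assuming a plain directory rather than HDFS: ChunkedWriter is a hypothetical helper that splits a byte stream into fixed-size files and, if max_chunks is set, deletes the oldest finished chunk so disk use stays bounded (at most max_chunks finished chunks plus the one being written). Choosing chunk_bytes as a multiple of the FFT frame size (e.g. fft_len * 4 bytes for float32) keeps every file aligned to whole frames, so each one can be read on its own.

```python
import os
import tempfile


class ChunkedWriter:
    """Write a byte stream into <prefix>_NNNNNN.bin files of chunk_bytes each."""

    def __init__(self, directory, prefix, chunk_bytes, max_chunks=None):
        self.directory = directory
        self.prefix = prefix
        self.chunk_bytes = chunk_bytes
        self.max_chunks = max_chunks
        self.index = 0        # number of the chunk currently being written
        self.written = 0      # bytes already in the current chunk
        self.completed = []   # paths of finished chunks, oldest first
        self.f = None

    def _path(self, i):
        return os.path.join(self.directory, f"{self.prefix}_{i:06d}.bin")

    def write(self, data):
        view = memoryview(data)
        while len(view):
            if self.f is None:  # open lazily, so no empty trailing file
                self.f = open(self._path(self.index), "wb")
            n = min(self.chunk_bytes - self.written, len(view))
            self.f.write(view[:n])
            self.written += n
            view = view[n:]
            if self.written == self.chunk_bytes:
                self._finish_chunk()

    def _finish_chunk(self):
        self.f.close()
        self.f = None
        self.completed.append(self._path(self.index))
        # Ring behaviour: drop the oldest finished chunk beyond the limit.
        if self.max_chunks is not None and len(self.completed) > self.max_chunks:
            os.remove(self.completed.pop(0))
        self.index += 1
        self.written = 0

    def close(self):
        if self.f is not None:
            self.f.close()
            self.f = None


def demo():
    """Write 30 bytes in 8-byte chunks, keeping at most 2 finished chunks."""
    with tempfile.TemporaryDirectory() as d:
        w = ChunkedWriter(d, "fft", chunk_bytes=8, max_chunks=2)
        w.write(bytes(range(30)))  # 3 full chunks + 6 bytes
        w.close()
        result = {}
        for name in sorted(os.listdir(d)):
            with open(os.path.join(d, name), "rb") as f:
                result[name] = f.read()
        return result


print(list(demo()))
```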

Likewise, there are plenty of Apache tools that will recombine the HDFS chunks back into files of arbitrary size... which you can then analyze later with GNU Radio, when time doesn't matter as much. You might not need much of Hadoop other than the file system and some tools.
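Recombining chunks locally is just concatenation in write order. A sketch assuming chunk files named <prefix>_NNNNNN.bin with zero-padded indices (a hypothetical naming scheme), so that lexical order equals write order; recombine and demo_recombine are illustrative names. On HDFS itself, `hdfs dfs -getmerge` does the equivalent.

```python
import glob
import os
import shutil
import tempfile


def recombine(directory, prefix, out_path):
    """Concatenate <prefix>_NNNNNN.bin chunks, in index order, into one file."""
    paths = sorted(glob.glob(os.path.join(directory, f"{prefix}_*.bin")))
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                shutil.copyfileobj(f, out)  # streamed copy, no full read into RAM
    return paths


def demo_recombine():
    with tempfile.TemporaryDirectory() as d:
        for i, chunk in enumerate([b"abc", b"def", b"gh"]):
            with open(os.path.join(d, f"fft_{i:06d}.bin"), "wb") as f:
                f.write(chunk)
        out = os.path.join(d, "combined.dat")  # does not match the glob
        recombine(d, "fft", out)
        with open(out, "rb") as f:
            return f.read()


print(demo_recombine())
```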

I have always thought HDFS + GNU Radio are destined for each other. It may be a bit early for this with today's hardware, but Mr. Moore is helping us along just fine, and so is AWS.

Greg

Nz8r

On Jan 20, 2017, at 2:46 PM, Marcus Müller <address@hidden> wrote:
I can assure you that 32 GHz is not your sampling rate. Do you mean 32 MHz?

The problem here is that, at first, your operating system can be smart and cache write accesses to files on mass storage devices in RAM (or you use a RAM disk, so everything happens in RAM). But at some point, RAM is going to run out – and then your recording speed is effectively limited by how fast you can write to your storage (in the case of a RAM disk, you simply run full, or your OS starts "swapping", i.e. writing RAM to storage; same problem).

So, unless you find a way to *reduce* the amount of data you want to record, or simply buy a faster storage system, there's not much you can do.
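How long a RAM cache or RAM disk lasts follows directly from the difference between the rate at which data is produced and the rate at which storage absorbs it. A sketch; the 16 GiB RAM disk, the 150 MB/s disk and the PSD frame rate are made-up assumptions, not measurements.

```python
import math


def seconds_until_full(buffer_bytes, in_rate, out_rate):
    """Time until a RAM buffer of buffer_bytes fills up.

    in_rate:  bytes/s produced by the flowgraph
    out_rate: bytes/s the storage behind it absorbs (0 for a pure RAM disk)
    Returns math.inf if the storage keeps up.
    """
    surplus = in_rate - out_rate
    if surplus <= 0:
        return math.inf
    return buffer_bytes / surplus


GIB = 2 ** 30
ram_disk = 16 * GIB                      # hypothetical 16 GiB RAM disk
rate_in = 32e6 * 8                       # 32 MS/s of fc32 samples, bytes/s

t_ramdisk = seconds_until_full(ram_disk, rate_in, 0.0)
t_cached = seconds_until_full(ram_disk, rate_in, 150e6)   # hypothetical disk
# Hypothetical: 100 float32 PSD frames of 1024 bins per second instead of IQ.
t_psd = seconds_until_full(ram_disk, 1024 * 4 * 100, 0.0)

print(f"raw IQ to RAM disk: {t_ramdisk:.1f} s")
print(f"raw IQ, disk behind cache: {t_cached:.1f} s")
print(f"PSD frames to RAM disk: {t_psd / 3600:.1f} h")
```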

Best regards,

Marcus

On 01/20/2017 08:42 PM, Mallesham Dasari wrote:
Hi Marcus,

Thanks for the quick response. I am recording the FFT samples continuously, but I am getting overflows after some time, once the file has grown huge. My sample rate is high (32GHz), so writing to the file takes too long, and usrp_spectrum_sense overflows.
On Fri, Jan 20, 2017 at 2:33 PM, Marcus Müller <address@hidden> wrote:
Hello Mallesham,

I'm afraid not, since, to my current understanding, what you want is mathematically impossible. Either you want a lot of data – and that seems to be the case, since you want to record 24h of raw IQ data – or you can store it in the comparatively little RAM modern computers have.

Maybe, however, we haven't fully understood the problem. Can you, mathematically, define what you want to observe and record?

Best regards,

Marcus
On 01/20/2017 08:28 PM, Mallesham Dasari wrote:
Hello everyone,

Can anyone suggest a solution for this? Even writing to a RAM disk is not enough to run the flowgraph for that long. I am facing the same issue.

Thank you!
On Thu, Jan 12, 2017 at 5:41 PM, Hasini Abeywickrama <address@hidden> wrote:
Hi all,

Thank you very much for the informative responses.

My requirement is to run the flowgraph for a long time (ideally 24 hours) and store the FFT data in memory (a RAM disk) so that it can be processed later or in chunks, not all at the same time.

So far, I have increased the size of the ramdisk and it works fine for a few hours. But it still is not the solution I'm looking for.

Regards,

Hasini
On Thu, Jan 12, 2017 at 8:30 PM, Marcus Müller <address@hidden> wrote:
But if you do a single 1024-FFT, you'd only operate on 1024 of the input samples!

And: the FFT doesn't just give you power values, but complex values; mathematically, the FFT is a DFT, and the DFT is an invertible linear operator:

$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N} nk},\quad k=0,\ldots,N-1,$

which maps complex vectors of size $N$ to complex vectors of size $N$. It is, in fact, representable as a square matrix whose column (and row) vectors are samples of the orthogonal complex sinusoids $e^{j\frac{2\pi}N nk},\, k=0,\ldots,N-1$; that is, it can also be understood as a change-of-basis matrix that just represents the "input vector" in a different, orthogonal basis.

In the physical sense: the input vector was represented in the standard basis $\mathbf e_N$, meaning that each basis vector represents a single point in time – the sample time of the respective entry; the "output" of the transform is represented in a basis of orthogonal frequencies. This is an invertible operation – really just another way to look at the same signal. I think this is really important to keep in mind:
The Fourier transforms are not magical by any means. What they do is represent the same signal from a different point of view. It can be interpreted as a transform between the time and frequency domains (or space and momentum, or...). The DFT is still just a boring, old, square, orthogonal, invertible matrix that produces output of the same dimensionality as its input.

As you can see, the DFT/FFT itself never reduces the amount of data.
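A small numerical check of the points above, in plain Python with N = 8 (the input vector is arbitrary): the DFT written as an explicit N×N matrix, its inverse (the conjugate transpose scaled by 1/N), and the round trip returning the original vector with the same number of complex entries.

```python
import cmath


def dft_matrix(n):
    """N x N DFT matrix, entry (k, m) = exp(-j 2 pi k m / N)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def inverse_dft_matrix(n):
    """Conjugate transpose of the DFT matrix, scaled by 1/N."""
    f = dft_matrix(n)
    return [[f[m][k].conjugate() / n for m in range(n)] for k in range(n)]


def matvec(a, x):
    """Plain matrix-vector product."""
    return [sum(a_i * x_i for a_i, x_i in zip(row, x)) for row in a]


x = [1 + 0j, 2 - 1j, 0.5j, -1, 3, 0, 1j, -2 + 2j]  # arbitrary input, N = 8
X = matvec(dft_matrix(len(x)), x)                # "frequency domain" view
x_back = matvec(inverse_dft_matrix(len(x)), X)   # and back again

print(len(x), len(X))                              # same dimensionality
print(max(abs(u - v) for u, v in zip(x, x_back)))  # round-off error only
```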

What you might be referring to is some kind of PSD estimate, computed by first taking |·|² of a lot of DFTed vectors and then averaging them. The data reduction here lies in the magnitude-square operation and the averaging, not in the DFT.
The point here is that you're throwing away a whole lot of information, and I'm not convinced that's what Hasini needs!
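The reduction described above can be made explicit with a Bartlett-style estimate (non-overlapping blocks, rectangular window, no scaling; a simplification chosen for brevity). averaged_psd is a hypothetical helper that turns n_blocks * nfft complex samples into nfft real values. For a test tone at bin 2, 64 complex inputs (512 bytes as complex64) become 8 power values (32 bytes as float32), a factor of 2 * n_blocks, and that factor comes from |·|² and the averaging, not from the DFT.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; output has as many complex values as the input."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def averaged_psd(samples, nfft):
    """Average |DFT|^2 over consecutive, non-overlapping blocks of nfft samples."""
    n_blocks = len(samples) // nfft
    acc = [0.0] * nfft
    for b in range(n_blocks):
        X = dft(samples[b * nfft:(b + 1) * nfft])
        for k in range(nfft):
            acc[k] += abs(X[k]) ** 2  # magnitude square: phase is discarded
    return [a / n_blocks for a in acc]  # averaging: n_blocks frames -> 1


# Complex tone exactly at bin 2 of an 8-point DFT, 8 blocks long.
samples = [cmath.exp(2j * cmath.pi * 2 * m / 8) for m in range(64)]
psd = averaged_psd(samples, nfft=8)
print(len(samples), "complex samples ->", len(psd), "real power values")
```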

Best regards,

Marcus

On 12.01.2017 05:54, Mallesham Dasari wrote:
Hi Marcus,

Raw IQ samples take lots of memory because each sample takes around 8 bytes. Suppose we use a 1 Msps sample rate: just 10 minutes of data gives 10*60*1M*8B = 4.8 GB. On the other hand, if you store just a 1024-bin FFT, we get 4.8GB/1024 worth of power values, right (which is much smaller)?

Please correct me if I am wrong.

Thanks
On Wed, Jan 11, 2017 at 7:32 AM, Marcus Müller <address@hidden> wrote:
Hi Mallesham,

I don't understand – the raw IQ samples and their FFT have the same size, and data type.

Maybe you've understood something that I (and Martin) didn't – could you elaborate?

Best regards,
Marcus
On 01/11/2017 12:56 AM, Mallesham Dasari wrote:
Hi Hasini,

If you are trying to write just the FFT, it should not be an issue. If you write raw IQ samples, then you will run out of memory. By "long", how long do you mean? Days?

On Tue, Jan 10, 2017 at 3:16 PM, Martin Braun <address@hidden> wrote:

Hasini,

can you please re-state what you're trying to do? That might help you
getting some answers. It is not quite clear from this email.

Cheers,
Martin

On 01/02/2017 09:16 PM, Hasini Abeywickrama wrote:
> Hi all,
>
> I have a flowgraph that reads a signal and writes its FFT samples to a
> file. I need to run this continuously (for a long time), without running
> out of memory.
>
> I tried deleting the earlier FFT samples from the file, but that messes
> up reading the data. I also tried writing to a different file after
> some time so the initial file can be completely deleted. But that did
> not work either.
>
> What would be the best approach for this? Any thought would be very much
> appreciated.
>
> Regards,
> Hasini
>
>

> _______________________________________________
> Discuss-gnuradio mailing list
> address@hidden
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>


--

Best Regards,
Mallesham Dasari

Department of Computer Science

Stony Brook University

USA - 11794

From:	Mallesham Dasari
Subject:	Re: [Discuss-gnuradio] Continuously Write FFT Samples to a File
Date:	Tue, 24 Jan 2017 13:42:30 -0500