libextractor

Re: [libextractor] Question about python bindings for libextract (extrac


From: M. David Allen
Subject: Re: [libextractor] Question about python bindings for libextract (extractor)
Date: Tue, 17 Sep 2013 09:12:10 -0400

Thanks Christian - that answer makes complete sense.

At times I need to read data streaming off of a URL download. I certainly could save it to a local file, but I'd rather not, given the size of some of the files and how often they arrive. I have noticed that for many files you can still get useful results by feeding libextractor a buffer holding just the first 64KB of the file. If the stream is a ZIP file, for example, it will work out that it's a ZIP file, but it won't give you the list of contained files that it would if you gave it the entire file.

In short, feeding it an initial buffer as big as I'm willing to hold does produce some useful results - "not ideal, but better than nothing".  :)
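
As a rough illustration of the "initial buffer" approach described above, here is a minimal Python sketch that collects at most the first 64KB from any file-like stream (for example, the response object returned by urllib.request.urlopen). The names read_prefix and PREFIX_LIMIT are made up for this example and are not part of libextractor or its Python bindings; the call that hands the buffer to the extractor is deliberately left out, since its exact signature depends on the binding version.

```python
import io

PREFIX_LIMIT = 64 * 1024  # 64KB, the prefix size mentioned above (an arbitrary cap)


def read_prefix(stream, limit=PREFIX_LIMIT):
    """Read up to `limit` bytes from a file-like object.

    Sockets and HTTP responses may return fewer bytes than requested,
    so keep reading until the limit is reached or the stream ends.
    """
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Stand-in for a large download: 200KB of data behind a ZIP-style signature.
fake_download = io.BytesIO(b"PK\x03\x04" + b"x" * (200 * 1024))
prefix = read_prefix(fake_download)
```

The resulting bytes object is what would be passed to the extractor as its in-memory buffer, with no filename. The rest of the stream is simply never read.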

David

On Tue, Sep 17, 2013 at 8:58 AM, Christian Grothoff <address@hidden> wrote:
While LE doesn't have to run on a file (you can pass NULL for the filename),
it currently does not expose an API to incrementally give it parts of a
file; while it is theoretically possible to add an API to do so, please
remember that
some LE plugins require random access to the data, so mere streaming
wouldn't work
anyway.  Adding such an API would thus still require the client to support
random access, which does not match what you would usually be getting
from say a TCP stream.
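
The ZIP format is one way to see why random access matters: the member list lives in a central directory at the end of the archive, so a reader has to seek there before it can list anything. The sketch below uses Python's standard zipfile module as a stand-in. It is not libextractor's ZIP plugin, and build_zip and list_members are hypothetical helper names chosen for this example.

```python
import io
import zipfile


def build_zip(names, payload_size=100_000):
    """Build an uncompressed in-memory ZIP with one zero-filled member per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"\0" * payload_size)
    return buf.getvalue()


def list_members(data):
    """Return the member names, or None if the central directory cannot be found."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.namelist()
    except zipfile.BadZipFile:
        return None


full = build_zip(["a.txt", "b.txt"])   # roughly 200KB in total
prefix = full[:64 * 1024]              # what a 64KB initial buffer would hold

has_zip_signature = prefix.startswith(b"PK\x03\x04")  # type is still detectable
names_from_full = list_members(full)                  # directory is reachable
names_from_prefix = list_members(prefix)              # directory was cut off
```

With the whole archive the member names are available; with only the prefix the leading signature still identifies the data as ZIP, but the listing is gone because the end of the file was never seen.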

The best place to ask these questions is on the mailing list, which I've
cc'ed.

Happy hacking!

Christian

On 09/17/2013 02:16 PM, M. David Allen wrote:
> Hello,
>
> I am wondering if it is possible to incrementally feed buffers to the
> extractor object in order to extract all metadata items from a file.
>
> The extractor program already provides for passing a file name as an
> argument, and an individual buffer - but sometimes it may be desirable to
> have it look over more data than is just in one buffer (e.g. when reading
> from a socket which may contain a lot of data).   Is it possible to
> incrementally feed the object data read from this socket, rather than save
> all of the data to a local file and then run the extractor on the file?
>
> If there is a more appropriate place to take this question, please let me
> know.
>
> Thanks,
>

