Re: [Orgmode] Searching inside of attachments (pdf, odt)?

From: Samuel Wales
Date: Tue, 13 Oct 2009 10:09:10 -0700


My idea is to keep it simple at first.  Everybody will come
up with great ways to integrate with their favorite IR tool.

Here I want to focus on the org interface.

The org interface can be the same as any other agenda
search, with all the same controls.  The back end can use
special-purpose textifiers like pdftotext (or whatever) or
general-purpose textifiers from IR tools.  Doesn't matter.
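
To make the "doesn't matter" point concrete, here is a rough
sketch of a pluggable textifier back end.  It is in Python
only for illustration; everything named here (TEXTIFIERS,
textify, the helper functions) is hypothetical and not part
of Org.  It assumes poppler's pdftotext is on PATH for PDFs
and treats an .odt file as a zip archive whose body is in
content.xml.

```python
import re
import subprocess
import zipfile
from pathlib import Path
from typing import Callable, Dict


def pdf_to_text(path: str) -> str:
    # Assumption: poppler's `pdftotext` is installed; "-" writes to stdout.
    result = subprocess.run(
        ["pdftotext", path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def odt_to_text(path: str) -> str:
    # An .odt file is a zip archive; the document body is in content.xml.
    with zipfile.ZipFile(path) as zf:
        xml = zf.read("content.xml").decode("utf-8")
    # Crude tag stripping, good enough for searching.
    return re.sub(r"<[^>]+>", " ", xml)


def plain_text(path: str) -> str:
    # Fallback: read the attachment as text, replacing undecodable bytes.
    return Path(path).read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: file extension -> textifier function.
TEXTIFIERS: Dict[str, Callable[[str], str]] = {
    ".pdf": pdf_to_text,
    ".odt": odt_to_text,
}


def textify(path: str) -> str:
    # Pick a textifier by extension; unknown types are read as plain text.
    handler = TEXTIFIERS.get(Path(path).suffix.lower(), plain_text)
    return handler(path)
```

The search front end only ever calls textify, so swapping in
an IR tool's general-purpose textifier later would mean
changing the registry and nothing else.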

Later, the mechanism can get fancier if desired.  But
first, we should implement existing behavior.  I often move
things to attachments merely because they are large.  I
don't want search to work differently just because I did
that.  Search should IMO work the same as it does for
outline bodies.

This includes regexp syntax.  If we use anything other than
Emacs, we risk one regexp syntax for attachments and another
for outline bodies.  That makes me shudder.

Later, we can use the fancier IR tools, or use reverse
indexes.  But not everybody has IR tools installed, and
reverse indexes might be premature optimization.

If you're worried about speed, this is a perfect, simple
application for caching.  I'd try it before concluding that
it is too slow.  If it is, we have a good foundation into
which we can hook your favorite IR.
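
Here is a minimal sketch of the kind of caching I mean,
again in Python purely for illustration.  The names
(cached_text, search_attachments) are made up, and the
textifier is passed in as a function so any back end can be
used.  One caveat: Python's re syntax is not Emacs regexp
syntax, so a real implementation would do the matching
inside Emacs to keep one syntax for bodies and attachments.

```python
import os
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory cache: path -> ((mtime_ns, size), extracted text).
_cache: Dict[str, Tuple[Tuple[int, int], str]] = {}


def cached_text(path: str, textify: Callable[[str], str]) -> str:
    # Re-run the textifier only when the file's mtime or size has changed.
    st = os.stat(path)
    stamp = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(path)
    if hit is not None and hit[0] == stamp:
        return hit[1]
    text = textify(path)
    _cache[path] = (stamp, text)
    return text


def search_attachments(
    paths: List[str], pattern: str, textify: Callable[[str], str]
) -> List[str]:
    # Return the attachments whose extracted text matches the pattern.
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(cached_text(p, textify))]
```

With this, the expensive conversion happens once per
attachment version, and repeated searches only pay for the
regexp match.  If that is still too slow, the cache is the
natural place to hook in an index.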

I don't think there's a downside to achieving compatibility
and full agenda integration first, then only after that
doing the fancy stuff.

Have you tried the agenda search feature yet?  If not,
perhaps trying it first will help ground the discussion.

