Re: [O] Org Mode and PDF Notes!

From: Ramon Diaz-Uriarte
Subject: Re: [O] Org Mode and PDF Notes!
Date: Fri, 13 Nov 2015 00:55:14 +0100
User-agent: mu4e 0.9.13; emacs 24.5.1

On Thu, 12-11-2015, at 15:28, Matt Lundin <address@hidden> wrote:
> Ramon Diaz-Uriarte <address@hidden> writes:
>> so we get the location of the highlight (and its properties), but not the
>> textual contents. And this is the case whether I make the annotation with
>> EzPDF or Okular or, for that matter, with pdf-tools itself.
>> So it seems RepliGO is actually giving you a lot more by default :-)
>>> Politza and I are discussing this here:
>>> https://github.com/politza/pdf-tools/issues/137
>>> that might be a good place to continue the conversation.
>> Will do. In the meantime, I think this is a limitation coming from
>> poppler. Other people have mentioned similar things (e.g.,
>> http://coda.caseykuhlman.com/entries/2014/pdf-extract.html) and using other
>> tools that depend on poppler (such as Leela:
>> https://github.com/TrilbyWhite/Leela) also will not give us the text
>> itself. 
> I don't think this is a limitation of poppler so much as the way that
> pdf annotations work. Typically, the subject/text field is not populated
> by the text of the highlighted region. Rather, a highlight annotation
> specifies bounds, color, style, etc. Basically what Repligo does (I
> wouldn't recommend using it, as it is closed source and severely out of
> date) is to grab the text *at the time of highlighting* and add it to
> the notes field. I don't know of any other annotation tool that does the
> same thing. Applications built on poppler could do it, though they
> currently do not.

I stand corrected. You are right; sorry for the sloppiness in the wording
and ideas.

> For extracting the text of highlighted regions *after the fact*, I've
> had good luck with this script that relies on the pdf-reader gem for
> ruby:
> https://gist.github.com/danlucraft/5277732

That is also what I use for extracting the text from the highlighted
regions.
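
To make the "after the fact" idea concrete, here is a minimal Python sketch
of the geometric step only. It is not a port of the Ruby script above. It
assumes some PDF library has already produced per-word bounding boxes in
page coordinates; the Word tuple and the function names are hypothetical,
made up for illustration. A highlight annotation carries its geometry in
/QuadPoints (eight numbers per quadrilateral), and the sketch keeps the
words whose centre falls inside one of those quadrilaterals.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Word(NamedTuple):
    """Hypothetical word record: text plus its bounding box on the page."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


Box = Tuple[float, float, float, float]


def quads_to_boxes(quad_points: Sequence[float]) -> List[Box]:
    """Turn a flat /QuadPoints array (8 numbers per quad) into bounding boxes."""
    boxes = []
    for i in range(0, len(quad_points) - 7, 8):
        xs = quad_points[i:i + 8:2]       # x coordinates of the 4 corners
        ys = quad_points[i + 1:i + 8:2]   # y coordinates of the 4 corners
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def center_inside(word: Word, box: Box) -> bool:
    """True if the centre of the word's box lies inside the given box."""
    cx = (word.x0 + word.x1) / 2
    cy = (word.y0 + word.y1) / 2
    x0, y0, x1, y1 = box
    return x0 <= cx <= x1 and y0 <= cy <= y1


def highlighted_text(words: Sequence[Word], quad_points: Sequence[float]) -> str:
    """Join the words (assumed to be in reading order) covered by a highlight."""
    boxes = quads_to_boxes(quad_points)
    picked = [w.text for w in words if any(center_inside(w, b) for b in boxes)]
    return " ".join(picked)
```

Using the word centre rather than requiring full containment is a design
choice: highlights drawn by hand rarely line up exactly with the glyph
boxes, so a strict containment test tends to drop the first or last word.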


> Matt

Ramon Diaz-Uriarte
Department of Biochemistry, Lab B-25
Facultad de Medicina
Universidad Autónoma de Madrid 
Arzobispo Morcillo, 4
28029 Madrid

Phone: +34-91-497-2412

Email: address@hidden

