Re: [O] Org Mode and PDF Notes!

From: Matt Price
Subject: Re: [O] Org Mode and PDF Notes!
Date: Wed, 11 Nov 2015 15:33:52 -0500

On Wed, Nov 11, 2015 at 3:17 PM, Ramon Diaz-Uriarte <address@hidden> wrote:
> Dear Matt,
>
> On Wed, 11-11-2015, at 15:42, Matt Price <address@hidden> wrote:
>> I've just written up a post on my workflow for PDFs. Since my blog has, I
>> think, a readership of 0 (surely there's a way to get Emacsers to follow
>> me? ah well), I will post a link here in the hope that someone will be
>
> Add another 1 :-)
>
>> interested:
>> http://matt.hackinghistory.ca/2015/11/11/note-taking-with-pdf-tools/
>
> Really neat! A few comments/questions/ramblings:

> - The highlights you get from RepliGo contain the text itself. I mean,
>   when I use C-c C-a l on your PDF, the buffer showing the contents of
>   each highlight contains the highlighted text.
>
>   This is not what I get from, say, EzPDF (which is what I use on Android),
>   from highlighting with pdf-tools itself using C-c C-a h, or from
>   highlighting in Okular: the contents just give the rectangle. Hmmm...
>
>   Because of this, when I use your code on my PDFs, I only get things
>   such as
>
>
>   instead of the text. Bummer! I wonder if RepliGo gives you a lot more
>   than the rest, or if I am doing something silly.

I think there is no standard way of storing the contents of a highlight. I chose RepliGo over EzPDF because it gives you access to the text of the highlights!

Okular, I think, stores your annotations in its own database rather than in the PDF. You can (I think!) attach the annotations to the PDF from inside Okular. At least, that's what I remember from when I was looking around.

RepliGo stores the highlighted text in the "subject" field of the annotation. It's possible that your reader stores the text in some other field, like "content". Maybe you can try:

M-: (pdf-annot-get-annots) and look at the output in the *Messages* buffer. Can you see any evidence of the text? Can you share what you learned?

Politza and I are discussing this here:

That might be a good place to continue the conversation.

> - You have to call mwp/pdf-multi-extract on each file/set of files. I guess
>   if I knew elisp, I'd find it trivial to iterate over a set of directories
>   and subdirectories (and do this using a cron job at night), and also
>   place everything in one single org file. Would this be something
>   reasonable to do?

For sure. My elisp sucks too, but I bet someone here on the list will answer you.
>   (This might be related to your second Todo.)

Well, that wasn't what I was planning, but it would still be useful.
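A minimal sketch of the iteration asked about above, under stated assumptions: it needs `directory-files-recursively` (Emacs 25.1 or later), and the extractor is a hypothetical function that takes a PDF file name and returns a string of Org text. The names `my/pdf-files-under` and `my/collect-pdf-notes` are illustrative only; they are not part of pdf-tools or of the code in the blog post, and mwp/pdf-multi-extract would probably need a small wrapper to fit the assumed extractor signature.

```elisp
;; Illustrative sketch only: the function names are hypothetical and the
;; extractor passed in as EXTRACT-FN is an assumption (file name -> Org string).
(require 'cl-lib)

(defun my/pdf-files-under (dirs)
  "Return a sorted list of all PDF files found below DIRS, recursively."
  ;; `directory-files-recursively' matches the regexp against each
  ;; file's base name, so this keeps names ending in \".pdf\".
  (sort (cl-loop for dir in dirs
                 append (directory-files-recursively dir "\\.pdf\\'"))
        #'string<))

(defun my/collect-pdf-notes (dirs org-file extract-fn)
  "Write one Org heading per PDF found under DIRS into ORG-FILE.
EXTRACT-FN is called with each PDF's file name and must return a
string of Org text (for instance, the extracted highlights)."
  ;; `with-temp-file' rebuilds ORG-FILE from scratch on every call.
  (with-temp-file org-file
    (insert "#+TITLE: PDF annotations\n\n")
    (dolist (pdf (my/pdf-files-under dirs))
      ;; One top-level heading per PDF, followed by its extracted notes.
      (insert (format "* %s\n" (file-name-nondirectory pdf)))
      (insert (funcall extract-fn pdf) "\n"))))
```

For example, `(my/collect-pdf-notes '("~/papers") "~/pdf-notes.org" #'my-extractor)`, where `my-extractor` is again a placeholder. Because the Org file is regenerated on each run, this could suit a nightly job (e.g. `emacs --batch -l FILE` from cron), but any edits made by hand to that file would be overwritten.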

> - I know nothing about how it works, and it does not use pdf-tools, but in
>   your first Todo you mention: "extend the pdfview link type (in
>   org-pdfview) to permit me to specify the precise location of an
>   annotation". PDF.js (https://mozilla.github.io/pdf.js/), which is
>   used, for instance, by zotfile (http://zotfile.com/), does that, and it works
>   out of the box with Okular (but I've not been able to get it to work with

Until I found pdf-tools, I had planned to write a Node wrapper for pdf.js and grab the annotations that way. But I don't really know how to do that, so this turned out to be easier :-)

Anyway, I've updated the post, and it's now possible to create links to individual annotations, though you will have to use my updated version of org-pdfview until/unless Markus accepts my patch.

> - In case it matters, I have a somewhat similar modus operandi. I do a lot
>   of PDF reading, including note-taking and highlighting, on Android
>   tablets: I use EzPDF, which also embeds the notes in the PDF. I have a
>   cron job that extracts all the highlights and annotations of all the PDFs
>   and places them in a single org file. The kludge is explained here:
>   The truth is I use two mechanisms for PDF annotation and highlighting
>   extraction, since neither is fully satisfactory to me, but the one that uses
>   Ruby (i.e., that does not depend on poppler) is able to actually extract
>   the text of the highlights.

Ah, man, that looks really cool, and I'm sorry I didn't know about it earlier! I haven't read through your whole document, but it looks like there's a lot of useful stuff there.


> Best, and thanks again for sharing,

You're welcome, and thank you!
