Re: I'm is really I'm
From: Lennart Borgman
Subject: Re: I'm is really I'm
Date: Wed, 7 Jul 2010 03:57:39 +0200
On Wed, Jul 7, 2010 at 3:38 AM, Harald Hanche-Olsen <address@hidden> wrote:
> + Lennart Borgman <address@hidden>:
>
>> I was just copying some text from a pdf file to store in org-mode in
>> Emacs. Some of the characters are not readable with my font. For
>> example when pasting something that looked like
>>
>> I'm
>>
>> I got in Emacs
>>
>> I‟m
>>
>> where the middle char is
>
> DOUBLE HIGH-REVERSED-9 QUOTATION MARK
> which is really sort of wrong.
>
> I suspect that the PDF file has used a non-unicode font for that
> character. And once the PDF file creator resorts to such then ...
The PDF file says "PDF Producer: Microsoft Office Word 2007".
>> Do we have any tool for replacing such characters in Emacs? Or is
>> there a better way?
>
> ... all bets for an automatic recovery are off, except for some AI
> technique. Otherwise, I am very much afraid that good old search and
> replace is the best you can do. Of course, if you have a lot of these
> files and they all suffer the same symptoms, you might want to build a
> translation table of sorts. Is your question really about how you can
> build and apply such tables to text?
I hoped there were some easy cases where characters commonly used
for typographic reasons could be replaced by more "well-known"
characters.
Otherwise a very simple "AI" technique could perhaps be to just build
a table of pairs such as ("I‟m" "I'm").
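
The translation-table idea discussed above could be sketched roughly as follows. This is only an illustration in Python, not an existing Emacs tool: the names TYPOGRAPHIC_TO_ASCII, WORD_FIXES and normalize are hypothetical, and the choice of which characters to map is an assumption. Whole-string pairs like ("I‟m" "I'm") are applied first, because U+201F is really a double quotation mark and only stands in for an apostrophe in this particular PDF.

```python
# Hypothetical table: common typographic characters -> plain ASCII stand-ins.
# Which characters belong here is an assumption, not taken from the thread.
TYPOGRAPHIC_TO_ASCII = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}

# Context-dependent pairs (wrong, right), in the spirit of ("I‟m" "I'm").
# \u201F is DOUBLE HIGH-REVERSED-9 QUOTATION MARK.
WORD_FIXES = [
    ("I\u201Fm", "I'm"),
]


def normalize(text):
    """Apply the whole-string fixes first, then the per-character table."""
    for wrong, right in WORD_FIXES:
        text = text.replace(wrong, right)
    return text.translate(str.maketrans(TYPOGRAPHIC_TO_ASCII))


if __name__ == "__main__":
    print(normalize("I\u201Fm copying \u201Csome text\u201D\u2026"))
```

Characters that are not in either table pass through unchanged, so the tables can be grown one PDF at a time.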