
Re: I'm is really I'm

From: Lennart Borgman
Subject: Re: I'm is really I'm
Date: Wed, 7 Jul 2010 03:57:39 +0200

On Wed, Jul 7, 2010 at 3:38 AM, Harald Hanche-Olsen <address@hidden> wrote:
> + Lennart Borgman <address@hidden>:
>> I was just copying some text from a pdf file to store in org-mode in
>> Emacs. Some of the characters are not readable with my font. For
>> example when pasting something that looked like
>>   I'm
>> I got in Emacs
>>   I‟m
>> where the middle char is
> which is really sort of wrong.
> I suspect that the PDF file has used a non-unicode font for that
> character. And once the PDF file creator resorts to such then ...

The PDF file says "PDF Producer: Microsoft Office Word 2007".

>> Do we have any tool for replacing such characters in Emacs? Or is
>> there a better way?
> ... all bets for an automatic recovery are off, except for some AI
> technique. Otherwise, I am very much afraid that good old search and
> replace is the best you can do. Of course, if you have a lot of these
> files and they all suffer the same symptoms, you might want to build a
> translation table of sorts. Is your question really about how you can
> build and apply such tables to text?

I hoped there were some easy cases where characters commonly used
for typographic reasons could be replaced by more "well-known" ones.

Otherwise, a very simple "AI" technique could perhaps be to just build
a table of pairs such as ("I‟m" "I'm").
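
A minimal sketch of such a table, written in Python rather than Emacs
Lisp just to show the idea. The names REPLACEMENTS and normalize are
made up for this example, and the entries are only assumptions about
which typographic characters tend to show up in pasted PDF text. Note
that U+201F is normally a double quotation mark; mapping it to an
apostrophe only makes sense for files like this one, where it stands
in for the apostrophe in "I'm".

```python
# Hypothetical replacement table: typographic character -> plain ASCII.
# The entries are illustrative assumptions, not a complete list.
REPLACEMENTS = {
    "\u201f": "'",    # DOUBLE HIGH-REVERSED-9 QUOTATION MARK (the "I‟m" case)
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
}

# str.maketrans accepts a dict of single characters mapped to strings.
TABLE = str.maketrans(REPLACEMENTS)


def normalize(text: str) -> str:
    """Return text with every character in REPLACEMENTS substituted."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(normalize("I\u201fm"))
```

A per-character table like this only covers the easy cases. If the PDF
maps a glyph to a character that is also used legitimately elsewhere in
the text, a table of whole words, as in the ("I‟m" "I'm") pair, would
be the safer choice.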
