Re: [ft] FW: Getting the charcode Value when the Glyph ID is known

Dear Suzuki,

I see. What software are you using to translate from PDF to HTML?
Could you post (or upload to a website) a sample PDF with which
you have an issue?

We are using Argus software for PDF-to-HTML conversion to create the ePUB format. Not all Unicode characters are supported by ePUB, so we need some Unicode characters as images. To avoid a manual process, we plan to extract the character as an image at the time of extraction itself. Argus has a feature to remap characters (e.g. xFB03 =ffi;, xFB03 =<img src="" in the conversion stage.

We have successfully extracted a character as an image using one of the samples (example2.cpp) provided on the FreeType site. But after extraction there is no visible difference between the comma and the right quote images. We have taken the comma and the right quote as examples only.

Please suggest a way to get the expected output.
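One likely reason the two images look identical is that each glyph is cropped to its own bounding box, so its vertical position relative to the baseline is lost. The following is a minimal sketch of one way to keep that position: the glyph is copied into a fixed-height cell whose baseline row is the same for every glyph. GlyphBitmap and place_in_cell are hypothetical names used only for illustration; the fields are assumed to mirror what FreeType reports in face->glyph->bitmap (width, rows, buffer) and in face->glyph->bitmap_left and face->glyph->bitmap_top after rendering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the values FreeType reports for a rendered glyph.
struct GlyphBitmap {
    int width;                          // assumed to mirror bitmap.width
    int rows;                           // assumed to mirror bitmap.rows
    int left;                           // assumed to mirror bitmap_left
    int top;                            // assumed to mirror bitmap_top (baseline to top row, upwards)
    std::vector<unsigned char> pixels;  // row-major coverage, 0 = empty
};

// Copy the glyph into a cell of height ascent + descent.
// Row 0 is the top of the cell; the baseline sits at row `ascent`.
std::vector<std::string> place_in_cell(const GlyphBitmap& g,
                                       int cell_width, int ascent, int descent) {
    assert(g.pixels.size() == static_cast<std::size_t>(g.width) * g.rows);
    const int cell_height = ascent + descent;
    std::vector<std::string> cell(cell_height, std::string(cell_width, '.'));
    for (int y = 0; y < g.rows; ++y) {
        const int cy = ascent - g.top + y;      // shift by distance from baseline
        if (cy < 0 || cy >= cell_height) continue;
        for (int x = 0; x < g.width; ++x) {
            const int cx = g.left + x;
            if (cx < 0 || cx >= cell_width) continue;
            if (g.pixels[y * g.width + x]) cell[cy][cx] = '#';
        }
    }
    return cell;
}
```

With the same ink shape, a glyph whose top value is large (a right quote) lands near the top of the cell, while one whose top value is small (a comma) lands around the baseline, so the two images differ even though the shapes match.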

Current Output:

Expected Output:

Regards,

Balraj Balakrishnan

Assistant Manager – IRL

Integra Software Services Pvt. Ltd.

100 Feet Road, ECR, Pondicherry-605008

Phone: +91 413 4212124 x 321

Mobile: +91 9842385151

-----Original Message-----
From: address@hidden [mailto:address@hidden
Sent: 04 May 2011 18:26
To: Balraj Balakrishnan, Integra-PDY, IN
Cc: address@hidden; address@hidden
Subject: Re: [ft] FW: Getting the charcode Value when the Glyph ID is known

On Mon, 2 May 2011 09:19:37 +0000

"Balraj Balakrishnan, Integra-PDY, IN"

<address@hidden> wrote:

>As I am new to FreeType and all this font stuff, I couldn't quite
>frame my requirement in the right manner. I shall make another
>attempt to bring more clarity to what I really want from
>FreeType:

OK. I don't think there is any special manner specific to
this list; clarifying the input, the process, and the
output is important on any open-source mailing list.

>1. The scenario is that we are trying to convert the source PDF into
>HTML; while doing so, many fonts in the PDF have characters that are
>extracted or mapped to the wrong character.

I see. What software are you using to translate from PDF to HTML?
Could you post (or upload to a website) a sample PDF with which
you have an issue?

Basically, an elementary font object in a PDF (the data
segment which you spliced from the PDF and pass to
FT_New_Face()) is not expected to hold an interface
to character encoding. The relationship between the
glyph index (or glyph name) and the character code is
given by the /Encoding or /ToUnicode elements in the wrapping
font object in the PDF (which refers to its elementary font object
via the /BaseFont object). The referrer's /Encoding dictionary
can override the built-in encoding info in the referred
font.
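The override order described above can be sketched as follows. This is only an illustration under stated assumptions, not a PDF parser: PdfFontInfo, glyph_name_for_code and unicode_for_code are hypothetical names, the three maps are assumed to have been filled in by whatever PDF library is in use, and the font is assumed to be a simple font with single-byte codes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the encoding information around one PDF font.
struct PdfFontInfo {
    std::map<int, std::string> builtin_encoding;  // code -> glyph name, from the font program
    std::map<int, std::string> differences;       // code -> glyph name, from the PDF /Encoding
    std::map<int, char32_t>    to_unicode;        // code -> Unicode, from the PDF /ToUnicode
};

// Resolve a character code to a glyph name. The PDF-level /Encoding
// entry wins over the encoding built into the font program.
// Returns nullptr when neither source knows the code.
const std::string* glyph_name_for_code(const PdfFontInfo& f, int code) {
    assert(code >= 0 && code <= 255);  // simple fonts use single-byte codes
    std::map<int, std::string>::const_iterator it = f.differences.find(code);
    if (it != f.differences.end()) return &it->second;
    it = f.builtin_encoding.find(code);
    if (it != f.builtin_encoding.end()) return &it->second;
    return nullptr;
}

// Resolve a character code to Unicode through /ToUnicode, if present.
bool unicode_for_code(const PdfFontInfo& f, int code, char32_t* out) {
    std::map<int, char32_t>::const_iterator it = f.to_unicode.find(code);
    if (it == f.to_unicode.end()) return false;
    *out = it->second;
    return true;
}
```

For example, if the font program maps code 0x27 to quotesingle but the PDF /Encoding remaps 0x27 to quoteright, the lookup yields quoteright. That information lives in the PDF, not in the extracted font file.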

I think existing software such as pdftohtml already
does this kind of work reasonably well.

>So we are extracting the font files from the PDF in order to
>convert the glyphs (symbols, Unicode) in the font file
>to images and to replace the wrongly extracted characters
>/symbols/Unicode in the HTML file with those images.

As I wrote above, an extracted font file is an insufficient
resource for guessing the code points of the glyphs.
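To illustrate why the font file alone may not be enough: going from a glyph index back to a character code only works by inverting whatever forward charmap the font carries. The sketch below assumes the forward map (character code to glyph index) has already been collected, for instance by walking the face with FT_Get_First_Char() and FT_Get_Next_Char(); ForwardCharmap, invert_charmap and codes_for_glyph are hypothetical names for illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical forward charmap: character code -> glyph index.
typedef std::map<unsigned long, unsigned int> ForwardCharmap;
typedef std::map<unsigned int, std::vector<unsigned long> > ReverseCharmap;

// Invert the forward map. One glyph may be reachable from several codes,
// so each glyph index maps to a list of codes (in ascending code order).
ReverseCharmap invert_charmap(const ForwardCharmap& cmap) {
    ReverseCharmap rev;
    for (ForwardCharmap::const_iterator it = cmap.begin(); it != cmap.end(); ++it) {
        if (it->second == 0) continue;  // glyph index 0 is the "missing glyph"
        rev[it->second].push_back(it->first);
    }
    assert(rev.find(0) == rev.end());   // the missing glyph never gets a code
    return rev;
}

// Look up all codes that map to a glyph index; empty if there are none.
std::vector<unsigned long> codes_for_glyph(const ReverseCharmap& rev, unsigned int gid) {
    ReverseCharmap::const_iterator it = rev.find(gid);
    if (it == rev.end()) return std::vector<unsigned long>();
    return it->second;
}
```

If the embedded font has no usable charmap, or the glyph in question is not listed in it, the result is simply empty, and the code point has to come from the PDF side instead.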

>In the above-mentioned scenario the image should maintain
>its position in the outline in order to place it in an HTML
>file. If you look at the image below, the Quote right
>and the Comma glyphs are differentiated by their position
>in a given line.

Are you saying that your program (at present) cannot detect
the character code points for the single quote glyph and
the comma glyph from the PDF, and so you want to guess the
code points by inspecting the internals of the font?

Does Adobe Acrobat extract the text from your PDF?

From:	Balraj Balakrishnan, Integra-PDY, IN
Subject:	Re: [ft] FW: Getting the charcode Value when the Glyph ID is known
Date:	Wed, 4 May 2011 14:17:20 +0000