From: Balraj Balakrishnan, Integra-PDY, IN
Subject: Re: [ft] FW: Getting the charcode Value when the Glyph ID is known
Date: Wed, 4 May 2011 14:17:20 +0000

Dear Suzuki,

>I see. What software are you using to translate from PDF to HTML?
>Could you post (or upload to a web site) a sample PDF that shows
>the issue?

We are using Argus software for PDF-to-HTML conversion to create the ePUB format. Not all Unicode characters are supported by ePUB, so we need some Unicode characters as images. To avoid a manual process, we plan to extract each such character as an image at the time of extraction itself. Argus has a feature to remap characters (e.g. xFB03 =ffi;, xFB03 =<img src="") in the conversion stage.

We have successfully extracted a character as an image using one of the samples (example2.cpp) provided on the FreeType site. But after extraction there is no difference between the comma and the right quote. We have taken the comma and the right quote as an example only. Please suggest an option for extracting the expected output.

Current Output: [image]
Expected Output: [image]

Regards,
Balraj Balakrishnan
Assistant Manager – IRL
Integra Software Services Pvt. Ltd.
100 Feet Road, ECR, Pondicherry-605008

-----Original Message-----
On Mon, 2 May 2011 09:19:37 +0000 "Balraj Balakrishnan, Integra-PDY, IN" <address@hidden> wrote:

>As I am new to freetype and all this font stuff, I couldn't quite
>frame my requirement in the right manner. I shall make another
>attempt to bring more clarity to what I really want from
>freetype:

OK. I don't think there is any special manner specific to this list; clarifying the input, process and output is important on any open source mailing list.

>1. The scenario here is, we are trying to convert the source PDF into
>an HTML, while doing this there are many fonts in the PDF which are
>extracted or mapped to a wrong character.

I see. What software are you using to translate from PDF to HTML? Could you post (or upload to a web site) a sample PDF that shows the issue?

Basically, an elementary font object in PDF (a data segment which you spliced from the PDF and pass to FT_New_Face()) is not expected to hold an interface to character encoding. The relationship between glyph index (or glyph name) and character code is given by the /Encoding or /ToUnicode elements of the wrapping font object in the PDF (which refers to its elementary font object via the /BaseFont object). The referrer's /Encoding dictionary can override the built-in encoding info in the referred font. I think existing software such as pdftohtml does this work reasonably well.

>So we are extracting the font files from the PDF, to
>convert glyph's (Symbols, Unicode) in the font file
>as an image and replace the wrongly extracted characters
>/Symbols/Unicode in the HTML file with the image.

As I wrote above, an extracted font file alone is an insufficient resource for guessing the code points of the glyphs.

>In the above mentioned scenario the image should maintain
>its position in the outline in order to place it in an HTML
>file. If you look at the image below the fonts Quote
>right and the Comma is differentiated based on its position
>in a given line.

Are you saying that your program (at present) cannot detect the character code points for the single quote glyph and the comma glyph from the PDF, and so you want to guess the code points by inspecting the font in depth? Can Adobe Acrobat extract the text from your PDF?
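
The position issue raised in this thread (a comma and a right quote that look identical once extracted) can be illustrated with a minimal sketch in C. It assumes the glyph has already been rendered to a tightly cropped 8-bit bitmap, and that its offsets from the pen position are known, as FreeType reports them in `face->glyph->bitmap_left` and `face->glyph->bitmap_top`. The cell size, the baseline row and the function name `place_in_cell` are hypothetical choices made for this illustration only. This is not the method used by example2.cpp or by Argus.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cell geometry, chosen only for this illustration. */
enum { CELL_W = 8, CELL_H = 12, BASELINE_ROW = 9 };

/*
 * Copy a tightly cropped 8-bit glyph bitmap (rows x width, no padding
 * between rows) into a fixed-size cell. `left` and `top` play the role of
 * FreeType's bitmap_left and bitmap_top: `top` is the distance in pixels
 * from the baseline up to the top row of the bitmap.
 * Returns the cell row where the bitmap starts, or -1 if it does not fit.
 */
int place_in_cell(unsigned char cell[CELL_H][CELL_W],
                  const unsigned char *bitmap,
                  int rows, int width, int left, int top)
{
    int y0 = BASELINE_ROW - top;   /* top edge of the bitmap inside the cell */

    assert(cell != NULL && bitmap != NULL);
    memset(cell, 0, (size_t)CELL_H * CELL_W);   /* start from an empty cell */

    if (rows < 0 || width < 0 || left < 0 || left + width > CELL_W ||
        y0 < 0 || y0 + rows > CELL_H)
        return -1;

    /* Copy row by row, shifted by the glyph's offsets from the pen position. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < width; c++)
            cell[y0 + r][left + c] = bitmap[r * width + c];

    return y0;
}
```

With the same three-row mark, `top = 1` (a comma hanging below the baseline) starts at cell row 8, while `top = 9` (a right quote near the top of the line) starts at cell row 0. The two cell images therefore differ even though the cropped bitmaps are identical.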