
Re: lynx-dev tech. question: translating strings to different charsets


From: Klaus Weide
Subject: Re: lynx-dev tech. question: translating strings to different charsets
Date: Thu, 2 Sep 1999 16:45:24 -0500 (CDT)

On Fri, 3 Sep 1999, Vlad Harchev wrote:

> On Wed, 1 Sep 1999, Klaus Weide wrote:
> 
> > On Thu, 2 Sep 1999, Vlad Harchev wrote:
> > 
> > >  I've started implementing support for hyphenation. I need to know how I
> > > can translate a string from the current charset of the document to some
> > > other charset, given by a chset handle (I don't want to dig through the
> > > lynx headers to discover this). No entity conversion is desired.
> > 
> > The most complete function for that is in LYCharUtils.c,
> > LYUCFullyTranslateString_1 or one of the wrappers around it
> > (or make a new one).
> > 
> > I hope you know what you mean by "current charset of the document".
> 
>  OK, now I understand that I will need to translate from the display charset
> to some other charset (whose name will be known after reading lynx.cfg).
>  I checked LYUCFullyTranslateString_1 - and was very upset. 

You are _upset_?  You want to add a luxury item like hyphenation support
and then you get _upset_ when you find out it's not as convenient as you
thought?  I hope you're not serious...

> For hyphenation to work, the last word on each line that doesn't fit on
> that line has to be hyphenated. For it to be hyphenated, it has to be
> translated to the charset of the hyphenation rules (which are loaded at
> startup).

Then maybe the whole approach is flawed.

> But using this function, especially since it uses dynamic allocation,
> will take a lot of CPU time. 

Do you know how much it takes?

It can't be too bad, at least in normal cases.  We already run most
attribute strings that get handled in some way through it.

> So I ask: which is the preferable way to get rid of dynamic allocation?
> 1) Use the UC* tables directly (maybe hacking them in order to do this)
> 2) Ensure (by looking at the code) or hack LYUCFullyTranslateString_1 so it
> won't reallocate the buffer - so it will be possible to pass a pointer to
> static storage to it (but it will be slow due to its generality).

It's good enough for "normal" use.  I don't see why it's suddenly not fast
enough for _your_ purpose.

> PS: information about which characters are "human letters" and the
>  "tolower" mapping will be included in the file with hyphenation rules, so
>  the translation from one charset to another is the only performance
>  problem I see.

I think your approach is terribly flawed.

Given charsets A and B, in the general case
 - You cannot assume that A can be translated to B at all.
 - You cannot assume that strings will stay the same length.
 - You cannot assume that translations are reversible without loss.

Maybe you should translate your "rules", not the text.

But having hyphenation rules bound to a specific character encoding is
flawed from the outset.  They should be expressed in terms of _characters_,
i.e. Unicode values.  Everything else is a hack.

That means you should apply them to the text strings ("words") while they
are in a compatible encoding.  Yes, that probably means changing the
whole chartrans thing, as to when/where things get transformed.  If you
try to add hyphenation support without that, you are trying to take the
easy way out, which I predict won't work reliably.

   Klaus

