Re: lynx-dev hyphenation (was tech. question: translating strings)

From: Vlad Harchev
Subject: Re: lynx-dev hyphenation (was tech. question: translating strings)
Date: Wed, 8 Sep 1999 17:04:03 +0500 (SAMST)

Tom, please read the last two paragraphs of this message!

On Tue, 7 Sep 1999, Klaus Weide wrote:

> [...] 
> >  Then you'll spend a month redesigning lynx in this fashion (if you were
> > serious).
> And if it takes a month, that would be a month better spent than by
> adding user features that no user but one wants.

 There are a lot of features everybody wants implemented, like complete
table support, key sequences, and style sheets. If you really have spare time (or
time to make that internal redesign), ask the lynx-dev people what they want most
before starting it.

 As for your redesign - it would be useful if more modifications to
GridText.c were going to be added. I don't see any except table support. So why
would you do it - only for the sake of an ideal design (the results won't be
used in any code)? It seems everything works fine now even without that.

> > > There are lots of things that could become simpler if a Unicode
> > > representation were used throughout.
> > 
> >  They could be done more simply (i.e. they are already done). Why do you plan to spend
> > precious time on unnecessary internal redesigns (be pragmatic not paranoid)
> > that can be spent on more useful things?
> If everyone had been thinking that way, lynx would have long collapsed
> completely under the weight of arbitrary features.  I don't find the
> idea satisfying that I am contributing to that (although in practice
> I probably am).

  How can it collapse? As long as people make changes in a clever
way, there won't be such a problem. It seems all new features are spread around
the lynx code, and a lot of aspects of lynx are close to ideal - I can't think of
more than 10 features to be added to lynx.
 And an overweight lynx is a pain for programmers, not users.

> Your idea of "useful" is obviously different from mine.  I find the
> code more useful if I can understand better what it does.  Lots of
> ad hoc features for a very limited purpose don't help there.

 Why don't you simply trust the implementor to do the right
thing? What don't you understand in the hyphenation code in particular?

> > > >  I'm glad that you understand that UTF-8 (and UCS*) doesn't have anything
> > > > to do with "mixing several languages that use the same repertoire in one
> > > > document" (I thought you thought that this was a solution).
> > > 
> > > Huh?  It was you who seemed to somehow see a connection between "UTF8
> > > in documents" (i.e. externally) and "mixing languages".  Now you seem
> > > to change the topic to something else completely.
> > 
> >  Maybe it's my bad English. But I tried to explain to you that the use of
> > Unicode can't prevent hyrules collisions (or incorrect hyphenation) for
> > documents with mixed languages with non-disjoint repertoires.
> > 
> > > > The 'lang=' is for solving 
> > > > this. Why do you push "unicode" everywhere?
> > > 
> > > It is already used in lynx for the character translations.  Whether you
> > > know it or not, when you view a cp<something> Russian text with KOI8-R
> > > you are using it.  Using it as a common lingua franca allows translation
> > > between N charsets with O(N) instead of O(N**2) tables.  That alone
> > > should be good enough reasons for using it internally.
> > 
> >  But conversion between 2 given charsets would take much more time if Unicode
> > is used (and libhnj would have to be rewritten).
> You still don't get it that it *already is being used* for exactly that, for
> conversion between 2 given charsets.  Since we are already using it, we might
> as well use it everywhere where it makes sense.

 I didn't look at UC*.c at all, so I don't know how the conversion is
implemented. My guess is that when UCTrans is initialized, some lookup
table is built from those Unicode tables.
 I am personally confused about how the lookup tables for Unicode symbols should
be used - how big are they? How much time would a lookup by Unicode character
(i.e. 16 or 32 bits) take, compared to a byte-table lookup (i.e. a direct
load of the value)? How would the info about whether a character is a letter in
a given language, whether it is uppercase, and what its lowercase equivalent is
be stored, distributed, etc.?
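
 To make the comparison concrete, here is a minimal sketch in C of conversion
through a Unicode pivot. It is not taken from UC*.c; the Charset struct, the
table names and the functions byte_to_ucs, ucs_to_byte and convert_byte are all
hypothetical, and only four Cyrillic letters are listed per charset. The
byte -> Unicode direction is a direct array load, the same cost as a byte-table
lookup; the Unicode -> byte direction is a binary search over a sorted table,
so it costs O(log n) comparisons instead of one load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a "Unicode -> byte" table (hypothetical layout). */
typedef struct { uint16_t ucs; uint8_t byte; } UcsToByte;

/* Per-charset data: one table in each direction (hypothetical layout). */
typedef struct {
    uint16_t to_ucs[256];       /* byte -> code point; 0 means "not listed" */
    const UcsToByte *from_ucs;  /* code point -> byte, sorted by code point */
    size_t from_count;
} Charset;

/* Illustrative fragments only: A, a, b, v of the Cyrillic alphabet. */
static const UcsToByte koi8r_from[] = {
    {0x0410, 0xE1}, {0x0430, 0xC1}, {0x0431, 0xC2}, {0x0432, 0xD7},
};
static const Charset koi8r = {
    .to_ucs = { [0xE1] = 0x0410, [0xC1] = 0x0430, [0xC2] = 0x0431, [0xD7] = 0x0432 },
    .from_ucs = koi8r_from,
    .from_count = sizeof koi8r_from / sizeof koi8r_from[0],
};

static const UcsToByte cp1251_from[] = {
    {0x0410, 0xC0}, {0x0430, 0xE0}, {0x0431, 0xE1}, {0x0432, 0xE2},
};
static const Charset cp1251 = {
    .to_ucs = { [0xC0] = 0x0410, [0xE0] = 0x0430, [0xE1] = 0x0431, [0xE2] = 0x0432 },
    .from_ucs = cp1251_from,
    .from_count = sizeof cp1251_from / sizeof cp1251_from[0],
};

/* Direct load: one array index, like a plain byte-to-byte table. */
static uint16_t byte_to_ucs(const Charset *cs, uint8_t b)
{
    assert(cs != NULL);
    return b < 0x80 ? b : cs->to_ucs[b];   /* ASCII maps to itself */
}

/* Binary search over the sorted table; returns -1 if the charset
   has no byte for this code point. */
static int ucs_to_byte(const Charset *cs, uint16_t ucs)
{
    size_t lo = 0, hi;

    assert(cs != NULL);
    if (ucs < 0x80)
        return (int)ucs;
    hi = cs->from_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cs->from_ucs[mid].ucs == ucs)
            return cs->from_ucs[mid].byte;
        if (cs->from_ucs[mid].ucs < ucs)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Convert one byte between two charsets via Unicode; -1 if untranslatable. */
static int convert_byte(const Charset *from, const Charset *to, uint8_t b)
{
    uint16_t ucs = byte_to_ucs(from, b);

    if (ucs == 0 && b != 0)
        return -1;                          /* byte not listed in source table */
    return ucs_to_byte(to, ucs);
}
```

 With N charsets this layout needs 2*N tables (one per direction per charset)
rather than one table for every ordered pair of charsets.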

> >  Well, lynx without hyphenation doesn't look too bad :)
> >  But it seems Russian is one of the very few languages that don't use Latin
> > letters - Hebrew, Arabic, Greek, Turkish and Ukrainian are others. So, such a
> > problem is very rare.
> Wrong about Turkish (at least in Turkey).  Wrong about "Very few languages".
> And probably wrong about the conclusion (although I lost track of what the
> "such problem" is you are talking about now)

  "such problem" - the absence of letters from different languages in one charset.
Compare E+G - all letters of G and E are in one charset - the one that G uses.

> Basically you are saying you don't care because it's not worth your time,
> but apparently you expect your stuff to be added to everyone's lynx. Right?

 I expect my stuff to be added to the source of everyone's lynx. But I
don't dream that it will be compiled into everyone's lynx. Everything
has its limitations...

> > > > As for utf8-encoded hyrules - the hyphenation simply
> > > > won't work or the dictionary won't be loaded by libhnj. In other words,
> > > > each single byte in hyrules denotes a single "human letter", and each
> > > > single byte in d.c.s. denotes a single "human letter" (and not part of
> > > > a letter) - to make direct table-driven translation possible.
> > > 
> > > You could change it to operate on shorts instead of bytes, right?
> > 
> >  Of course, but this will take a lot of my time (5 days of 8-hour hacking
> > to implement exactly what you want - hacking libhnj, gathering SGML tables,
> > etc) - I can't spend so much time (remember - I have to implement lynx.cfg
> > settings too - this is 3 days more). So I prefer not to deal with unicode; I
> > will describe to interested people how to add support for utf8-d.c.s
> > hyphenation in lynx. Currently, hyphenation won't ever take place if d.c.s
> > is utf8 or
> That's decidedly half-assed.  And if you can describe to people how to do
> it, you could also do it!

 Describing will consist of some theory of how hyphenation takes place and the
words "hack it". Mostly, modifications will be required to libhnj and UC* -
info about lowercase and uppercase letters should be added - and probably to
one place in GridText.c (your redesign will make life slightly easier, but not
much compared to the modifications to libhnj).
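
 As an illustration of why the byte-oriented tables don't carry over to a utf8
d.c.s.: one letter there takes several bytes, so the bytes have to be decoded
to a code point before any letter-case table can be consulted. A minimal
sketch, assuming 16-bit code points; utf8_decode and ucs_tolower are
hypothetical helpers (not libhnj or lynx functions), and the lowercase rule
covers only ASCII and the basic Cyrillic capitals U+0410..U+042F.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence of 1 to 3 bytes starting at s (n bytes available).
   On success stores the code point in *out and returns the number of bytes
   consumed; returns 0 on truncated, malformed, overlong or unsupported input
   (the value of *out is then unspecified). */
static size_t utf8_decode(const unsigned char *s, size_t n, uint16_t *out)
{
    assert(s != NULL && out != NULL);

    if (n >= 1 && s[0] < 0x80) {                    /* 1 byte: ASCII */
        *out = s[0];
        return 1;
    }
    if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return *out >= 0x80 ? 2 : 0;                /* reject overlong forms */
    }
    if (n >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *out = (uint16_t)(((s[0] & 0x0F) << 12) |
                          ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        if (*out < 0x800 || (*out >= 0xD800 && *out <= 0xDFFF))
            return 0;                               /* overlong or surrogate */
        return 3;
    }
    return 0;
}

/* Lowercase mapping keyed by code point instead of by byte.
   Only two ranges are handled in this sketch. */
static uint16_t ucs_tolower(uint16_t c)
{
    if (c >= 'A' && c <= 'Z')
        return (uint16_t)(c + 0x20);
    if (c >= 0x0410 && c <= 0x042F)                 /* basic Cyrillic capitals */
        return (uint16_t)(c + 0x20);
    return c;
}
```

 The hyphenation code would then work on arrays of such code points (shorts)
rather than on raw bytes.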

> [...] It just doesn't make sense to me to add hyphenation that works only
> in some display character sets when it *could* be done in a more general
> way.  Well that, and I still think adding hyphenation at all makes little
> sense except for hack value.

 A lot of things could be done in a more general or clever way (all of lynx, for
example). But "could" shouldn't imply "should". As for hack value - I
really love lynx with justification. Hyphenation makes the display even nicer.
And for the majority of people, who don't use utf8 d.c.s., hyphenation will work
well (provided the hyrules for that language exist).

> > HTCJK != NOCJK, so no crashes, just silent rejection. You won't use it, so
> > you won't suffer.
> But I *will* suffer, if you get Tom to include the code in the general
> lynx, by having to wade through confusing #ifdefs and so on.

 You shouldn't. And there is strong locality in their placement (for example,
only one big #ifdef insertion corresponding to hyphenation is located in
HText_appendCharacter). And I try to use meaningful variable names.
And you can always ask me (as I asked you).

> > I don't set utf8 d.c.s., so I won't suffer. IMO very few people use utf8
> > d.c.s.
> More will.

 I hope.

> >  I'm afraid that if I try to implement utf8 in a limited period of time,
> > I'll be fired.
> Nobody is imposing a limit.

 There is a desired limit - the time between dev* (and not dev*) releases.
 And I can't hack on something for an entire week, 8 hours per day, if I wouldn't
use the results and I know that the results will be used by no more than
10 people on earth.

 So, it's time to ask Tom whether he will accept the patch.
 Asking lynx-dev people whether they use the utf8 display character set will
also be helpful. I would also like to hear from people who have used lynx with
justification enabled (since hyphenation doesn't make sense with justification
disabled).

 I'm prepared to leave the hy. patch as my local patch (I really don't have
time to do what you ask, Klaus).  Which will be better for lynx - to have
hyphenation integrated into lynx (it won't activate if the display
character set is utf8) or have it integrated (with hopes that the missing
functionality will be implemented sooner once requests for it are posted
and someone has time) - the people should decide.

 Best regards,
