
Re: [microdc-devel] German Umlauts


From: Vladimir Chugunov
Subject: Re: [microdc-devel] German Umlauts
Date: Thu, 14 Dec 2006 08:50:48 +0300
User-agent: Thunderbird 1.5.0.8 (Windows/20061025)

Steffen Schulz wrote:

On 061213 at 21:40, Vladimir Chugunov wrote:
> Probably I'll disappoint you, but according to the dcpp client used in
> StrongDC, it looks like the transmission between hub and client is made
> using the current codepage set in Windows.

Hub and client? You mean client system and client software. Yes, it
would be typical for windows software to not care at all and just send
what they got...

I mean the client software. It looks strange to me: according to the dcpp client sources in my StrongDC package, StrongDC stores all strings internally in UTF8 but translates them to the active codepage before sending them to a hub. I have only one explanation for this behavior: it was done for backward compatibility.
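
To make that behavior concrete, here is a minimal Python sketch of "store as UTF8, translate to a codepage before sending". It is not taken from the StrongDC or microdc2 sources; the function name `encode_for_hub` is hypothetical, and CP1251 is only an assumed example of an active codepage.

```python
def encode_for_hub(message_utf8: bytes, hub_charset: str = "cp1251") -> bytes:
    """Hypothetical helper: convert an internally stored UTF-8 string
    to the legacy codepage before it goes on the wire."""
    # Internal representation is UTF-8, so decode it to text first.
    text = message_utf8.decode("utf-8")
    # Characters the target codepage cannot represent become '?'.
    return text.encode(hub_charset, errors="replace")


if __name__ == "__main__":
    wire = encode_for_hub("Привет".encode("utf-8"))
    print(wire.hex())
```

Anything outside the chosen codepage is lost at this step, which is why a German umlaut sent through a CP1251 client cannot survive the trip.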

> there is no way to automatically distinguish between the CP1250 and
> CP1251 codepages, for example. Detection can be implemented only in
> some special cases, such as UTF8 encoding detection.

I think so, too.
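
A small Python illustration of why this is so: the same three bytes are valid in both single-byte codepages and only the meaning differs, while strict UTF8 decoding rejects them. The helper name `decode_candidates` is made up for this sketch.

```python
def decode_candidates(raw: bytes, charsets):
    """Hypothetical helper: decode the same bytes under several charsets.
    A charset that rejects the bytes maps to None."""
    results = {}
    for name in charsets:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    raw = bytes([0xE4, 0xF6, 0xFC])
    for name, text in decode_candidates(raw, ["cp1250", "cp1251", "utf-8"]).items():
        print(name, repr(text))
```

Both legacy decodings succeed without error, so nothing in the bytes themselves says which one the sender meant. UTF8 is the special case because its multi-byte structure makes most legacy text invalid.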

> > This may be an interesting feature. I think it would already help
> > a lot if at least utf8 is recognised automagically. Hubs using
> > different old charsets should not be that common, are they?
> Unfortunately hub has no encoding at all, I think. Just because it
> doesn't need to understand the client messages.

Yes, I wasn't very accurate (one could go as far as to say I was wrong).
I meant the set of clients connected to the hub.

Windows guys will indeed just send whatever they like. But as Hermann
said, he can see "some" umlauts, so some people are obviously sending
in utf8.

The point I wanted to make is that this may indeed be pretty common, as
(windows-)clients are getting modernized to use utf8. A set of clients
connected to a hub with different local, old charsets by contrast
should not be so common.

This is why utf8 detection may be interesting. Use the encoding
that should be used by legacy systems and detect utf8 automatically.
But it's an ugly hack and I don't really care..
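
The idea in the paragraph above can be sketched in Python as follows. This is only an illustration of the heuristic, not microdc2 code: the function name `decode_chat_line` is hypothetical, and the `hub_charset` parameter merely mirrors the variable of that name mentioned in this thread, with CP1252 as an assumed default.

```python
def decode_chat_line(raw: bytes, hub_charset: str = "cp1252") -> str:
    """Hypothetical helper: treat incoming bytes as UTF-8 when they are
    valid UTF-8, otherwise fall back to the configured legacy charset."""
    try:
        # Strict decoding: any malformed sequence raises an error.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy codepage.
        return raw.decode(hub_charset, errors="replace")


if __name__ == "__main__":
    print(decode_chat_line("Grüße".encode("utf-8")))
    print(decode_chat_line("Grüße".encode("cp1252")))
```

The heuristic covers receiving only. A legacy byte sequence can occasionally be valid UTF8 by coincidence, and pure ASCII decodes identically either way, so it is a best guess rather than a real detection.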

I don't like this solution much because it looks like just a workaround. Which charset should we use in this case when sending a message to the public chat? According to the dcpp.net documentation, there is a flag in the chat string that marks utf8-encoded strings. But I've never seen it, so if somebody can provide me with a complete log file containing such *mixed* chat, I'll implement the correct algorithm in microdc2. (To produce such a log, add the "debug" value to the log variable alongside the values you already have there.) Of course, I could just trust the documentation and implement it the way it proposes, but I prefer to check first.

> P.S. Hermann, try to set the hub_charset variable to CP1250 or CP1252
> value.

This may break the umlauts for the cases where they were shown correctly.
It's simply stupid protocol design to not specify the encoding..

Sure, it would have been easy to specify utf8 encoding at the protocol level and then have no problems with national characters at all. Unfortunately, we are powerless to make such changes because the implementation is dictated by this weak specification.

Regards, Vladimir.



