From: Christian Grothoff
Subject: [GNUnet-developers] meta-data and keyword encoding [Was: Music insertion]
Date: Sun, 5 Dec 2004 15:09:47 -0500
User-agent: KMail/1.7.1

On Friday 03 December 2004 17:56, N. Durner wrote:
> Hi,
> > To find precisely the music files and albums, I use keywords like
> > <title:>foo or <encoding:>ogg.
> There's a request for a date field (Mantis #789), too.
> Perhaps we should put all the meta-data into an extensible format with
> certain fixed and well-known fields (the ones you mentioned) in GNUnet 0.7.

Actually, I was thinking of having a format with a variable set of fields, but
with well-known field types (using libextractor's list of types, extended by
some more entries).
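
As a rough illustration of "variable set of fields, well-known field types",
here is a minimal sketch in Python. The names FieldType, Field and
fields_of_type are hypothetical, and the three types shown are only the ones
mentioned in this thread, not libextractor's actual list.

```python
from enum import Enum
from typing import List, NamedTuple


class FieldType(Enum):
    # Hypothetical subset of well-known types; the real list would be
    # libextractor's, extended by some more entries.
    TITLE = "title"
    ENCODING = "encoding"
    DATE = "date"


class Field(NamedTuple):
    kind: FieldType  # one of the well-known types
    value: str       # free-form value for that type


def fields_of_type(meta: List[Field], wanted: FieldType) -> List[str]:
    """Return the values of all fields of the given type, in order."""
    return [f.value for f in meta if f.kind is wanted]


# Any number of fields in any combination, but each has a known type.
meta = [
    Field(FieldType.TITLE, "foo"),
    Field(FieldType.ENCODING, "ogg"),
]
```

A file without a date simply carries no DATE field, so nothing in the format
has to be fixed in advance except the list of types.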

> > Is this a problem or not ?
> Rather not.
> > Other question : are keywords in UTF-8 ?
> Not yet. It's planned for 0.7.
> > And which encoding does
> > libextractor use ?
> Plain ASCII using your locale.

Not quite.  libextractor currently returns whatever was in the file and
totally ignores character sets.  libextractor _should_ be changed to use
UTF-8 everywhere: convert if necessary, guess the encoding if the file format
does not specify one, and convert from UTF-8 to the locale when writing to
the console.  That's not currently the case, but I'd definitely like to have
LE use UTF-8.  So if anyone wants to start on this, please let me know.
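
The three steps named above (convert if necessary, guess if unspecified,
convert to the locale for console output) could look roughly like the
following Python sketch. This is not libextractor code: to_utf8 and
for_console are hypothetical names, and the "guess" is a deliberately crude
assumption (keep valid UTF-8, otherwise read the bytes as Latin-1).

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Return raw re-encoded as UTF-8.

    declared is the character set named by the file format, if any.
    """
    if declared is not None:
        # The file format specifies a character set: convert from it.
        text = raw.decode(declared)
    else:
        # No character set given: guess. Valid UTF-8 is kept as-is,
        # anything else is assumed to be Latin-1 (an assumption of this
        # sketch, not a rule from libextractor).
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            text = raw.decode("latin-1")
    return text.encode("utf-8")


def for_console(utf8: bytes) -> bytes:
    """Convert UTF-8 to the locale's encoding for console output."""
    enc = locale.getpreferredencoding(False)
    # Characters the locale cannot represent are replaced, not dropped.
    return utf8.decode("utf-8").encode(enc, errors="replace")
```

For example, to_utf8(b"\xe4") treats the single byte as Latin-1 "ä" and
returns its two-byte UTF-8 form, while input that is already valid UTF-8
passes through unchanged.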

> > In other words : should I convert keywords before
> > inserting them ?
> It doesn't make too much sense at the moment.

It does.  UTF-8 will be the future default, so I would recommend that new
code convert keywords to UTF-8 if possible.

> I have thought about a module for libExtractor that converts special
> national characters to an alternative representation. For example, the
> German umlauts ä, ö and ü can be written as ae, oe and ue. Is there a
> similar rule for other characters like "ç" (c cedilla)?
> This would be a solution to the problem that I usually don't know how to
> type these chars using a foreign keyboard layout.

Character input is a different (UI) issue.  I would prefer to just handle
character sets correctly and nicely in GNUnet/LE, and that means UTF-8.  As
for typing umlauts on a non-German keyboard, that does not feel like a
problem that we should even try to address.  It gets far too complicated once
you add more and more languages, and I have the impression that most people
have had to figure out how to type their native language on an English
keyboard; it's mostly us Germans that are lazy and use "ae" :-).

