Re: Text collation

guile-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Text collation

From:	Ludovic Courtès
Subject:	Re: Text collation
Date:	Sun, 10 Dec 2006 13:30:00 +0100
User-agent:	Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)

Hi,

Just a few notes about your remarks regarding `(ice-9 i18n)'.  A patch
will follow soon.

Kevin Ryde <address@hidden> writes:

>> address@hidden The ice-9 i18n Module
>
> See if you can think of a better section name.

Actually, since we're not going to include this module in 1.8, I think
I'd be in favor of moving the `gettext'-related functions in `(ice-9
i18n)'.  Then the doc would have to be rearranged accordingly.

>> address@hidden {Scheme Procedure} string-locale<? s1 s2 [locale]
>> address@hidden {Scheme Procedure} string-locale>? s1 s2 [locale]
>> address@hidden {Scheme Procedure} string-locale-ci<? s1 s2 [locale]
>> address@hidden {Scheme Procedure} string-locale-ci>? s1 s2 [locale]
>> address@hidden {Scheme Procedure} string-locale-ci=? s1 s2 [locale]
>
> These could be described in one block I think, to avoid five very
> similar descriptions.  Likewise the char ones.

Yes, done.

>> +...  Note that SRFI-13 provides procedures that
>> +look similar (@pxref{Alphabetic Case Mapping}).  However, the SRFI-13
>> +procedures are locale-independent.
>
> That's the intention of the srfi I guess, but it's not true currently
> is it?  Don't they use toupper() and therefore get whatever nonsense
> the current setlocale() gives.  Perhaps better leave the description
> of srfi-13 to that section.

Perhaps, but this is undocumented behavior.  :-)

> Do you need a caveat about multibyte characters there, for now?  Like
> "Note that in the current implementation Guile has no notion of
> multibyte characters and in a multibyte locale characters may not be
> converted correctly."

Yes.

>> address@hidden {Scheme Procedure} locale-string->integer str [base [locale]]
>> address@hidden {Scheme Procedure} locale-string->inexact str [locale]
>
> I think you should cross-reference strtol and strtod here, since their
> parsing is rather idiosyncratic.  I'd even be a bit tempted to name
> them strtol and strtod in guile, to make it clear they're only one
> possible way of parsing.  Except those names aren't very nice ...

I added a cross-ref to glibc's `strto{dl}', but I'm not willing to
change the names to the C library names (I'm not sure that's what you
were suggesting though).

>> +...  Return two values:
>
> Consider @pxref{Multiple Values}, since multi-values are (thankfully)
> fairly rare.

Yes, done.

>> -    scmconfig.h.top gettext.h
>> +    scmconfig.h.top libgettext.h
>
> I don't think that's good.  Best leave gettext.h the gettext one, and
> use another name for guile.  Gettext got there first, and it doesn't
> really matter which guile header has which prototypes.

The former `i18n.c' (which contained only `gettext'-related code) was
renamed to `gettext.c' which seems more appropriate.  Thus, for
consistency, the corresponding header file had to be renamed from
`i18n.h' to `gettext.h'.  Since `gettext.h' was already used for the one
coming from Gettext, it had to be renamed.  `libgettext.h' doesn't seem
such a bad name to me.

>> +/* This mutex is used to serialize invocations of `setlocale ()' on non-GNU
>> +   systems (i.e., systems where a reentrant locale API is not available).
>> +   See `i18n.c' for details.  */
>> +scm_i_pthread_mutex_t scm_i_locale_mutex;
>
> There's an scm_i_misc_mutex for use when protection is (or should be)
> rarely needed.

It seems more robust to use a dedicated mutex.

>> +/* Provide the locale category masks as found in glibc (copied from
>> +   <locale.h> as found in glibc 2.3.6).  This must be kept in sync with
>> +   `locale-categories.h'.  */
>> +# define LC_CTYPE_MASK              (1 << LC_CTYPE)
>> +# define LC_COLLATE_MASK    (1 << LC_COLLATE)
>> +# define LC_MESSAGES_MASK   (1 << LC_MESSAGES)
>> +# define LC_MONETARY_MASK   (1 << LC_MONETARY)
>> +# define LC_NUMERIC_MASK    (1 << LC_NUMERIC)
>> +# define LC_TIME_MASK               (1 << LC_TIME)
>
> I think you should put some privately selected bits there, not depend
> on LC_CTYPE etc being in range 0 to 31.

Good point, done.

>> +/* Alias for glibc's locale type.  */
>> +typedef locale_t scm_t_locale;
>
> I suppose the emulation could provide locale_t.  Might make it hard to
> exercise on an actual gnu system.  A #define locale_t would likely be
> ok.

It seems safer to make changes only in the `scm_' name space.  As a
matter of fact, I just discovered that Darwin now implements the
`locale_t' "GNU" API (I suppose that change is quite recent):

  
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man3/newlocale.3.html

Thus, defining `locale_t', `newlocale', et al. internally would have
been a potential source of problems when building on that platform.

>> +#ifdef USE_GNU_LOCALE_API
>> +  freelocale ((locale_t)c_locale);
>> +#else
>> +  c_locale->base_locale = SCM_UNDEFINED;
>> +  free (c_locale->locale_name);
>> +  scm_gc_free (c_locale, sizeof (* c_locale), "locale");
>> +#endif
>
> A possibility there, and with other funcs, would be to implement a
> compatible freelocale(), instead of sticking conditionals in each
> usage.

(See above).

>> +#ifdef USE_GNU_LOCALE_API
>> +
>> +  c_locale = newlocale (c_category_mask, c_locale_name, c_base_locale);
>> +  if (!c_locale)
>> +    locale = SCM_BOOL_F;
>
> Your docs call for an exception on unknown locale don't they?

Indeed, fixed.

> And should you tell the gc something about the size of a locale_t, and
> perhaps extra for its underlying data?  To approximate memory used,
> for the gc triggers.

Yes, but `locale_t' is typically a pointer type, and the size of the
struct pointed to by `locale_t' could be opaque (although that is
currently not the case with glibc).  So we could provide a guess for the
underlying object size, but maybe we can also just safely ignore it?

>> +void
>> +scm_init_i18n ()
>> +{
>> +  scm_add_feature ("ice-9-i18n");
>
> Is there any point adding a feature after the module is loaded? :)

Indeed, removed.  :-)

>> +(define (under-french-locale-or-unresolved thunk)
>> +  ;; On non-GNU systems, an exception may be raised only when the locale is
>> +  ;; actually used rather than at `make-locale'-time.  Thus, we must guard
>> +  ;; against both.
>> +  (if %french-locale
>> +      (catch 'system-error thunk
>> +             (lambda (key . args)
>> +               (throw 'unresolved)))
>> +      (throw 'unresolved)))
>
> Do you mean 'unsupported rather than 'unresolved, when fr_FR isn't
> available from the system?

I really meant "unresolved", in the sense that the test cannot be run
when `fr_FR' isn't available.

>> +(with-test-prefix "number parsing"
>
> Some french number parsing too?  Just to show there's a point to
> locale dependent parsing :).

Done.

Thanks for your detailed review!

Ludovic.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: Text collation, Kevin Ryde, 2006/12/02
- Re: Text collation, Ludovic Courtès, 2006/12/04
  - Re: Text collation, Kevin Ryde, 2006/12/04
    - Re: Text collation, Carl Witty, 2006/12/05
    - Re: Text collation, Kevin Ryde, 2006/12/05
    - Re: Text collation, Carl Witty, 2006/12/05
- Re: Text collation, Kevin Ryde, 2006/12/02
- Re: Text collation, Kevin Ryde, 2006/12/04
- Re: Text collation, Ludovic Courtès <=
  - Re: Text collation, Kevin Ryde, 2006/12/11
    - Re: Text collation, Ludovic Courtès, 2006/12/12
    - Re: Text collation, Kevin Ryde, 2006/12/12
    - Re: Text collation, Ludovic Courtès, 2006/12/13
    - Re: Text collation, Neil Jerram, 2006/12/31
    - Re: Text collation, Kevin Ryde, 2006/12/15
- Re: Text collation, Kevin Ryde, 2006/12/12
  - Re: Text collation, Ludovic Courtès, 2006/12/13
- Re: Text collation, Kevin Ryde, 2006/12/12
  - Re: Text collation, Ludovic Courtès, 2006/12/13

Prev by Date: Re: top-repl priority of guile module
Next by Date: More i18n
Previous by thread: Re: Text collation
Next by thread: Re: Text collation
Index(es):
- Date
- Thread