From: boud
Subject: [bug #20932] locale2lang-0.1 - BUG + fix: fallback to language_only extracted from Accept-Language http header is needed (fwd)
Date: Fri, 31 Aug 2007 01:59:59 +0200 (CEST)

hi samizdat-devel,

i think the bug + fix below could solve a practical problem for many
non-English speaking indymedia collectives or other independent media
groups: "activist spam", which someone posts as identical articles, in
English, on several dozen different local indymedia sites. This sort
of article is sometimes serious and sometimes more like conspiracy
theory, but AFAIK the people doing it usually have "en-US" in their
browser's HTTP Accept-Language header. If the mono option is enabled
by the sysadmin, the user chooses this option, and his/her preferred
language is non-English, then s/he will not even notice the presence
of the "activist spam" article.

This could mean that less intervention, or less urgent intervention,
is needed by moderators (depending on the editorial policy, of course):
the decision and filtering of what to read (ignoring non-preferred languages
rather than just not preferring them) is made by the reader, not by
an editorial collective de facto deciding on behalf of the whole local
activist community.  (Of course, ignoring real spam is not a good idea.
For that we have the Antispam class in antispam.rb.)

Anyway, read on if you're interested. :)


[bug #20932] locale2lang-0.1 - BUG + fix: fallback to language_only
    extracted from Accept-Language http header is needed


                 Summary: locale2lang-0.1 - BUG + fix: fallback to
language_only extracted from Accept-Language http header is needed
                 Project: Samizdat
            Submitted by: boud
            Submitted on: Wednesday 08/29/2007 at 22:04
                Category: None
                Severity: 3 - Normal
                  Status: Works For Me
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any



Even though RFC 2616 suggests that user agents (e.g. Firefox)
encourage their users to add a backup generic language
without a country code (e.g. "en" in addition to "en-US"),
in practice most users do not do this.

In particular, for non-English language Samizdat sites, this
means that people whose browser sends only "en-US"
end up getting the default local language of the site. Their
article then gets published with message.language = the local
language, not "en", since formally speaking, they state that
they prefer "en-US" to the local language, but they are not
interested in "en".

This implies that if people want to add a local translation
rather than hiding an article, then moderator intervention
is required to change the language (unless the user chose
open editing).

Moreover, the monolanguage patch (still under development)
will fail to exclude these types of articles under
the mono option, since their language is wrongly tagged (except
under a pedantic interpretation of their request).

For these reasons, i'm putting this as a bug (with a proposed
fix) rather than a patch.

This requires a reasonably modern version of Ruby gettext, e.g.
Debian 1.7.0-1 or later.  Copying gettext/locale_object.rb
into an older installation and using an appropriate require
statement is a hack to avoid a full installation of a recent

The idea is that if a requested accept-language in the list
is not found, then parse off the language part of it and try
that instead. This could potentially create multiple entries
of the same language, but i suspect that shouldn't be a

--- s070818/samizdat/lib/samizdat/engine/request.rb     2007-08-14 01:16:53.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/engine/request.rb        2007-08-29 23:02:06.869866760 +0200
@@ -165,8 +173,17 @@
       accept.scan(/([^ ,;]+)(?:;q=([^ ,;]+))?/).collect {|l, q|
         [l, (q ? q.to_f : 1.0)]
       }.sort_by {|l, q| -q }.each {|l, q|
-        @accept_language.push l if config_lang.include? l
+#        @accept_language.push l if config_lang.include? l
+        if config_lang.include? l
+          @accept_language.push l
+        else
+          # try converting full locale (language tag) to ISO-639 language
+          #
+          lang_only =
+          @accept_language.push lang_only if config_lang.include? lang_only
+        end
     # lang cookie overrides Accept-Language
     lang = cookie('lang') and config_lang.include? lang and
       @accept_language.unshift lang
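
For readers who want to try the fallback idea outside Samizdat, here is a
minimal standalone sketch. It is not the patch itself: the method name
accept_languages is hypothetical, the language part is taken with a plain
split on "-" (an assumption that avoids any dependency on ruby gettext),
and the final uniq is an addition that drops the duplicate entries
mentioned above.

```ruby
# Standalone sketch of the fallback (hypothetical helper, not Samizdat code).
# accept:      raw Accept-Language header value, e.g. "en-US,pl;q=0.8"
# config_lang: array of language codes the site is configured for
def accept_languages(accept, config_lang)
  result = []
  accept.scan(/([^ ,;]+)(?:;q=([^ ,;]+))?/).collect { |l, q|
    [l, (q ? q.to_f : 1.0)]           # a missing q-value means q=1.0
  }.sort_by { |_l, q| -q }.each { |l, _q|
    if config_lang.include?(l)
      result.push(l)
    else
      # Assumption: the language part is whatever precedes the first "-".
      lang_only = l.split('-').first
      result.push(lang_only) if config_lang.include?(lang_only)
    end
  }
  result.uniq                         # keep only the first entry per language
end
```

With config languages "pl" and "en", the header "en-US,pl;q=0.8" gives
["en", "pl"], where the unpatched logic would give only ["pl"].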

The relations between human languages, and how close or distant
they are, are well studied. A measure of the distance between
different languages could potentially be used as a backup to
find the likely closest language that a user would prefer
rather than just taking what is considered the "language"
part of the locale/Accept-Language string.
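
A rough sketch of how such a backup lookup might look is given here. All
names (DISTANCE, distance, closest_language) are hypothetical, and the
numbers in the table are placeholders for illustration only, not
linguistic measurements; a real table would have to come from an
external source.

```ruby
# Hypothetical distance table keyed by sorted language pairs.
# The values are placeholders, NOT real linguistic measurements.
DISTANCE = {
  %w[cs sk] => 0.1,
  %w[cs pl] => 0.4
}

# Distance between two language codes, or nil if the pair is unknown.
def distance(a, b, table)
  return 0.0 if a == b
  table[[a, b].sort]
end

# Closest configured language within max_distance, or nil if none qualifies.
def closest_language(requested, config_lang, table, max_distance)
  candidates = config_lang.map { |c| [c, distance(requested, c, table)] }
                          .reject { |_c, d| d.nil? || d > max_distance }
  best = candidates.min_by { |_c, d| d }
  best && best.first
end
```

Returning nil when no pair is known or close enough leaves the caller
free to fall through to the site's default language.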

Since the "narratives" which claim different national identities
often try to claim sharp distinctions between closely related
languages, this could potentially be a quite politically
sensitive issue. This is not surprising, and is not IMHO an
argument against doing this: an RDF engine specifically aimed
at grassroots, non-authoritarian media is necessarily going
to challenge artificial linguistic barriers if it's to get
somewhere near doing its task.

In any case, users with their own notions of language preferences
would still be able to state this by all the presently available
methods; adding a language metric would only be used as a

The Locale:: module could probably also be used to check the
config files for valid languages and warn about invalid


File Attachments:

Date: Wednesday 08/29/2007 at 22:04  Name: 070829_locale2lang-0.1  Size: 997B
  By: boud



  Message sent via/by Savannah
