From: Bob Proulx
Subject: [savannah-help-public] [sr #109439] Commit notification hook mishandles non-ASCII author names
Date: Tue, 9 Jan 2018 14:22:10 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
Follow-up Comment #2, sr #109439 (project administration):
I spent some time looking into this problem and the issue is much too
complicated to type into a web page text area. I thought about dragging this
conversation over to the mailing list but decided to give it a shot here
anyway. Glenn is correct about the Latin1 encoding being the problem.
There are many problems. One is that Savannah's web interface is designed
around Latin1, not UTF-8. I don't know what needs to be done to migrate the
web UI from Latin1 to UTF-8. I have not tried it, but I am pretty sure that if
I updated the database to contain UTF-8 content instead of Latin1 content then
the web page would show the reverse mangling.
https://savannah.gnu.org/users/civodul
Oh, and there is also a lot of content stored in the database as UTF-8, even
though the database character encoding is specified as Latin1. Assaf has an
entry describing this problem in the TODO list. That mismatch is also a
problem for other data in the other direction.
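To make the two directions of mangling concrete, here is a small Python sketch. It is an illustration only: the name is the one from the byte dumps further down, and nothing here reads from the Savannah database or the web UI.

```python
# Illustration only: the name matches the byte dumps in this comment.
# Nothing here touches the real database or web interface.
name = "Ludovic Court\u00e8s"

latin1_bytes = name.encode("latin-1")   # e-grave is the single byte e8
utf8_bytes = name.encode("utf-8")       # e-grave is the two bytes c3 a8

# UTF-8 bytes read by something that assumes Latin1: every byte becomes
# its own character, so the one e-grave shows up as two characters.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# Latin1 bytes read by something that assumes UTF-8: the lone e8 byte is
# not a valid UTF-8 sequence, so a strict decode fails.
try:
    latin1_bytes.decode("utf-8")
    latin1_read_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_read_as_utf8_ok = False

print(repr(utf8_read_as_latin1))
print(latin1_read_as_utf8_ok)
```

Whichever encoding a consumer assumes, the data stored in the other encoding comes out wrong, which is why a mixed database cannot be fixed just by changing one setting.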
In any case here are some data factoids just as general information. I will
dump some data from the MySQL database.
vcs0:~# getent passwd civodul | awk -F: '{print$5}' | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  e8  73  0a
           L   u   d   o   v   i   c       C   o   u   r   t 350   s  \n
vcs0:~# getent passwd civodul | awk -F: '{print$5}' | iconv -f LATIN1 -t UTF-8 | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  c3  a8  73
           L   u   d   o   v   i   c       C   o   u   r   t 303 250   s
0000020  0a
          \n
This shows that indeed the content from the database is returned in a Latin1
encoding. This is then used by git-multimail and onward. If it were UTF-8
then from here onward through the email it should all work okay.
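For the Python folks, the same conversion that iconv performs above can be sketched in a few lines. This assumes Python 3, and the input bytes are copied by hand from the first od dump rather than fetched with getent.

```python
# Bytes copied from the first od dump above (name field plus the newline).
raw = bytes.fromhex("4c756469766963".replace("6469", "646f") + "20436f757274e8730a")

# Same operation as: iconv -f LATIN1 -t UTF-8
converted = raw.decode("latin-1").encode("utf-8")

# Print in the same space-separated hex style as od -tx1.
converted_hex = " ".join(f"{b:02x}" for b in converted)
print(converted_hex)
```

The single e8 byte becomes the pair c3 a8, as in the second dump, and every ASCII byte passes through unchanged.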
At the moment I think a reasonable workaround would be to handle this in the
wrapper that we are already using with git-multimail. It's all Python and I
am a Perl guy, so please forgive me if I don't know Python well enough to make
the changes myself. But if someone were to propose patches to the Python then
I think this could be fixed there. Here is raw access to the git repository's
hooks, including the config for git-multimail. The file needing patching is
post-receive. Looking at that file should give a Python person enough
information on the process, and they should be able to hack in a workaround.
https://git.savannah.gnu.org/git/guix.git/hooks/
If the fromaddr could be passed through "iconv -f LATIN1 -t UTF-8" then I
think the result would work around the current Latin1 issues. Patches
solicited.
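As a starting point for such a patch, here is a rough sketch of what the Python equivalent might look like. It is not taken from the post-receive file: the function names to_text and format_from are made up for illustration, the address is a placeholder, and it assumes Python 3, which may not match the interpreter the hook actually runs under. It also goes one step beyond a plain iconv by trying UTF-8 first, since some database content is already UTF-8.

```python
from email.utils import formataddr


def to_text(raw):
    """Decode a name read from the passwd database (hypothetical helper).

    Try strict UTF-8 first, because some stored content is already UTF-8.
    Fall back to Latin1, which accepts every byte value. A Latin1 string
    that happens to be valid UTF-8 would be misread, but that is rare for
    real names.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def format_from(raw_name, address):
    """Build a From value; non-ASCII names are RFC 2047 encoded as UTF-8."""
    return formataddr((to_text(raw_name), address), charset="utf-8")


# Placeholder address; the real one would come from the hook's own lookup.
from_latin1 = format_from(b"Ludovic Court\xe8s", "civodul@example.org")
from_utf8 = format_from(b"Ludovic Court\xc3\xa8s", "civodul@example.org")
print(from_latin1)
print(from_utf8)
```

Both inputs decode to the same name, so the resulting From value is identical whichever encoding the database happened to hand back.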
And one more thing. We are using git-multimail from just after the 1.0.0 tag
(plus 3), with two local changes on top of that, from 2014. It's been working
well so there hasn't been a need to update. But if someone were bothered that
we aren't using the latest version of git-multimail and were willing to test
out the new version then I'd be happy to work through the upgrade with them.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/support/?109439>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/