From: | Radim Tobolka |
Subject: | Re: [Duplicity-talk] unicode support strategy |
Date: | Sun, 21 Oct 2018 17:40:21 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 |
Hi Aaron,

When did you last check the backport you mentioned, and which input made it fail? I have been poking at it for some time now, and so far it seems to produce correct results.

I have summarized my attempts in a test case (see the attached test_os_backport.py). I had to include the backport and modify it slightly to allow testing with different encodings; the actual tests are at the end of the file, after the "import pytest" statement. More importantly, the backport successfully performs a full conversion cycle on Markus Kuhn's UTF-8 decoder capability and stress test from https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Even if some input does make it fail, I think that can be fixed. Maybe the entire encoding error handler logic will have to be bypassed and handled purely in Python. Still, I think it is worth the effort compared with hunting down all the adorned strings that force implicit ASCII decoding of byte filenames, which is bound to fail on non-ASCII (8-bit and wider) characters.

There is another concern. What will happen in Python 3 when a b""-adorned string gets combined with a filename that is, this time, unicode? How do you intend to deal with that?

I will stop my train of thought here. Looking forward to your comments.
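To make the two concerns above concrete, here is a minimal Python 3 sketch. It is not the attached test file and not the backport itself: the helper names decode_filename and encode_filename are hypothetical, and the sketch assumes the standard "surrogateescape" error handler, which is what os.fsdecode and os.fsencode use on Linux. The encoding is passed explicitly so that the result does not depend on the locale.

```python
def decode_filename(raw, encoding="utf-8"):
    """Hypothetical helper: bytes -> str, keeping undecodable bytes as lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_filename(text, encoding="utf-8"):
    """Hypothetical helper: str -> bytes, turning escaped surrogates back into the original bytes."""
    return text.encode(encoding, "surrogateescape")


# A byte filename holding a Latin-1 byte that is neither ASCII nor valid UTF-8.
raw_name = b"caf\xe9.txt"

# Concern 1: strict ASCII decoding of a byte filename fails on any byte above 0x7f.
try:
    raw_name.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The surrogateescape round trip preserves the original bytes exactly.
text_name = decode_filename(raw_name)
round_trip = encode_filename(text_name)

# Concern 2: Python 3 refuses to combine a b""-adorned string with a unicode filename.
try:
    b"prefix-" + text_name
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

With this approach every filename would be converted once at the boundary, and the rest of the code would only ever see one string type.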
Best regards,
test_uni.patch
Description: Text Data
test_os_backport.py
Description: Text Data