Re: [Duplicity-talk] unicode support strategy

From:	Radim Tobolka
Subject:	Re: [Duplicity-talk] unicode support strategy
Date:	Sun, 21 Oct 2018 17:40:21 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

Hi Aaron,
sorry for the late reply, I needed time to research this.
I'd like to submit two tests for unicode filenames that fail with the current codebase (see attached test_uni.patch). As for the fix, that will depend on how the discussion on fsencode/fsdecode turns out.

When did you last check the mentioned backport, and which input made it fail? I've been poking at it for some time now, and so far it seems to produce correct results. I've summarized my attempts in a test case (see attached test_os_backport.py). I had to include the backport and modify it slightly to allow testing with different encodings; the actual tests are at the end of the file, after the "import pytest" statement.

More importantly, the backport successfully performs a full conversion cycle on Markus Kuhn's UTF-8 decoder capability and stress test from https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
This file contains numerous invalid UTF-8 sequences. If the backport handles this exhaustive list, I'd say it's pretty stable.
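
To illustrate what I mean by a full conversion cycle, here is a minimal sketch. It is not the backport from the attachment; it assumes Python 3 and uses the built-in "surrogateescape" error handler directly with an explicit encoding, so the result does not depend on the locale. The helper name roundtrip and the sample filenames are made up for this example.

```python
# Decode raw filename bytes to text and encode them back again.
# With "surrogateescape", undecodable bytes become lone surrogates
# (U+DC80..U+DCFF) and are restored unchanged on the way back.

def roundtrip(raw, encoding="utf-8"):
    """Hypothetical helper: bytes -> str -> bytes."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")

samples = [
    b"plain.txt",                       # pure ASCII
    "caf\u00e9.txt".encode("utf-8"),    # valid UTF-8
    b"\xff\xfe-invalid.txt",            # bytes that never occur in UTF-8
    b"\xc3(.txt",                       # lead byte without a continuation byte
]

# Each entry is True when the bytes survive the cycle unchanged.
results = [roundtrip(raw) == raw for raw in samples]
print(results)
```

The same check could be run line by line over the stress test file, which is roughly what the attached test case does against the backport.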

Even if some input does make it fail, I think that can be fixed. Maybe the entire encoding error handler logic will have to be bypassed and handled purely in Python. Still, I think it's worth the effort, as opposed to hunting down all the adorned strings that force implicit ASCII decoding of byte filenames, which is bound to fail on any byte above 0x7F.
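
A small sketch of the failure mode I have in mind. In Python 2 the ASCII decoding happens implicitly when a byte string meets a unicode string; the snippet below makes that step explicit so it also runs on Python 3. The filename is made up for the example.

```python
# A UTF-8 encoded filename containing one non-ASCII character.
name = "caf\u00e9.txt".encode("utf-8")

try:
    # This is the step Python 2 performs implicitly when mixing
    # a byte string with a unicode string.
    name.decode("ascii")
    failed = False
    bad_offset = None
except UnicodeDecodeError as exc:
    failed = True
    bad_offset = exc.start  # index of the first byte that is not ASCII

print(failed, bad_offset)
```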

There is another concern. What will happen in Python 3 when a b""-adorned string is combined with what is, this time, a unicode filename? How do you intend to deal with that?
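
To make the concern concrete, here is a minimal sketch, assuming Python 3. Mixing bytes and str raises TypeError there instead of decoding implicitly, so one side has to be converted explicitly. The helper to_bytes and the names byte_prefix and text_name are hypothetical; os.fsencode would do a similar conversion using the filesystem encoding, but an explicit encoding is used here to keep the result independent of the locale.

```python
def to_bytes(name, encoding="utf-8"):
    """Hypothetical helper: return name as bytes, whatever type it came in as."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding, "surrogateescape")

byte_prefix = b"backup/"        # a b""-adorned literal
text_name = "caf\u00e9.txt"     # a unicode filename

try:
    byte_prefix + text_name     # Python 3 refuses to mix the two types
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Converting explicitly before combining works.
joined = byte_prefix + to_bytes(text_name)
print(mixed_ok, joined)
```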

I will stop my train of thought here - looking forward to your comments.

Best regards,
Radim

test_uni.patch
Description: Text Data

test_os_backport.py
Description: Text Data
