monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Filesystem normalisation


From: Stephen Leake
Subject: Re: [Monotone-devel] Filesystem normalisation
Date: Mon, 01 Dec 2008 09:38:16 -0500
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Peter Stirling <address@hidden> writes:

> On 29 Nov 2008, at 11:48 am, Stephen Leake wrote:
>
>> Peter Stirling <address@hidden> writes:
>>
>>> I've written a small module (it's one 'public' function and some
>>> support functions to make it clearer as to what it's doing) that
>>> solves the 'mtn add apple Apple' on Mac OS X.
>>
>> What exactly is the problem? Is Mac OS X file system case insensitive?
>
> It is insensitive by default, and a number of programs won't work
> if you set it to case-sensitive, so the vast majority of people
> will not set it that way. According to
> http://www.venge.net/mtn-wiki/CaseInsensitiveFilesystems
> case insensitivity is also an issue on windows.

Ok. Yes, there are case sensitivity issues on Windows.

> Also, HFS+ stores files in NFD utf-16 (it makes case insensitive
> sorting/searching much quicker), this means that you will also have
> issues with filenames entered as precomposed unicode (and my patch
> covers that case too).

As I understand it, your patch provides a hook that lets a user
function rewrite a file path. So the hook would have to handle that
case. I didn't see the actual hook code in your attachment; did I
miss it somewhere?

>>> The idea is to have a function
>>>     void plaform_fs_normalisation_hook( std::string & const in,
>>> std::string & out )
>>> that can be called from
>>>     args_to_paths() (cmd.hh:146 on nvm
>>> 93e7e626c6ee8db33a21eabcfab00d97f1bed8c2).
>>
>> This is what paths::to_external is supposed to do, mostly. Does that
>> not work here?
>>
>> On the other hand, args should already be in the fs format, so I don't
>> understand why you need to do anything.
>>
>> "paths" in mtn are supposed to be in a filesystem-neutral internal
>> format; to_external converts them to the current filesystem format.
>>
>
> grep -r to_external . returns nothing, I assume that you meant
>
>       string
>       any_path::as_external() const

Yes, sorry for the wrong name.

>       {
>       #ifdef __APPLE__
>         // on OS X paths for the filesystem/kernel are UTF-8
> encoded,  regardless of
>         // locale.
>         return data;
>       #else
>         // on normal systems we actually have some work to do, alas.
>         // not much, though, because utf8_to_system_string does all
> the  hard work.
>         // it is carefully optimized.  do not screw it up.
>         external out;
>         utf8_to_system_strict(utf8(data), out);
>         return out();
>       #endif
>       }
>
> (paths.cc:639)
>
> Which is, as should be obvious, not all that useful. 

Ah; on __APPLE__ it does nothing. But if you put the appropriate code
there, would that solve your problem? That should be simpler than
adding a hook.

> as_external() wasn't anywhere obvious in normalise_external_path(),
> so even if it did the right thing you could still add the same file
> several times.

normalize_external_path is the inverse (approximately) of as_external.
So it should contain the inverse path-conversion code for APPLE.

> To demonstrate what I'm trying to solve:
>
> <snip>

Thanks for the clear example.
>
> # create a file add it to the workspace twice with
> # the same name and verify that both names refer to the same contents
> ##############
> g5:temp peter$ echo foo > apple
> g5:temp peter$ mtn add apple APPLE
> mtn: adding APPLE to workspace manifest
> mtn: adding apple to workspace manifest

I agree it would be _very_ nice if mtn gave some sort of error here. I
gather your proposed hook would change "APPLE" to "apple" so the paths
would collide, and mtn would say something like "'apple' already
accounted for" on the second file.

One way to make that happen is for mtn normalized paths to be all
lowercase on case-insensitive file systems. However, I prefer
case-insensitive but case-preserving tools, so mtn would have to store
both the original user path, and the normalized path. That's a major
complication to the current code.

Another way is to always to path comparisons in a case-insensitive
way, on case-insensitive file systems. That preserves the user case.

> g5:temp peter$ mtn update -r p:
> mtn: expanding selection 'p:'
> mtn: expanded to 'c28046695cac17a722edf6a591d5f52e13d331b7'
> mtn: selected update target c28046695cac17a722edf6a591d5f52e13d331b7
> mtn: [left]  6d7859995aaff97d24086a456d0d9e5cc22c44d4
> mtn: [right] c28046695cac17a722edf6a591d5f52e13d331b7
> mtn: modifying APPLE
> mtn: error: content of file 'apple' has changed, not overwriting
>
> # since that didn't work, let's try checking it out into a new
> # workspace
> ##############
> g5:temp peter$ cd ..
> g5:temp peter$ mtn co --db=temp.mtn --branch=temp temp2
> mtn: misuse: rename target 'apple' already exists
>
> # nuke from orbit
> ##############
> g5:temp peter$ cd ..
> g5:test_paths peter$ rm -r temp
>
>>> If this code is eventually included in monotone, then it would be
>>> possible to identify 2 files in a manifest that will conflict on
>>> disk,
>>
>> This is done now, although not in a very friendly way.
>>
>
> As you can read above, there's no hint of friendliness
> in the diagnostic

Well, in the first case, it mentions both "APPLE" and "apple"; how
would you make that better?

In the second case, the it should mention "APPLE" in the error
message.

One way to make this better is to extend the new 'conflicts' commands
to handle update conflicts as well as merge conflicts.

>>> Currently it is appears impossible to checkout a revision that has
>>> conflicting filenames.
>>
>> yes, if they conflict according to the local filesystem.
>>
>
> And you don't think this is a problem?

I do think it's a problem. It is not at all clear how to fix it. 

For example, I can easily have several different file systems
accessible to mtn, via NFS. So just knowing I'm on Windows doesn't
tell me the case sensitivity of the file system. It would probably be
a close approximation on most projects.

The general "fix" is to not commit files that would conflict on any
filesystem that they might be checked out on. Each project needs to
determine the set of acceptable file systems, and thus acceptable file
names. 

Currently, enforcing that set of acceptable file names is a manual
process; it would be nice if mtn provided tools to enforce it.

Providing a user hook that returns the case sensitivity of a path
would work, if the internal mtn code handled case sensitivity
correctly (which it doesn't now). 

Your hook goes beyond that by allowing an arbitrary conversion for any
path. That is clearly more general. If we provide an example hook that
works for Mac OSX and Windows filesystems, it's probably a good idea.

The inverse conversion also needs to be available as another hook.
That may require storing a translation table or some other internal
state.

The hooks need to be called from the right places in the code. I think
that's in 'as_external' and 'normalize_filename', but we'd have to do
a thorough review to be sure.

> It may just have been a side-effect of the point-by-point examination,
> but your email came across as quite hostile. 

I appologize for that. I could not figure out what you were
trying to accomplish, since I didn't know what the "mtn add apple
APPLE" problem was. I suspect the hostility you detect is just my
frustration.

> I'm only trying to improve a defect that affects OS X, which has the
> potential as a start toward windows too. To that end, can someone
> point me as to where I should change the build configuration scripts
> to add this file, and the required linker argument? 

For a test that you won't commit, you can add fs.cc to MOST_SOURCES in
Makefile.am.

But since this file is currently MAC OSX specific, to be fully correct
you'll need to add a new variable MAXOSX_PLATFORM_SOURCES in
Makefile.am, and use it in a way similar to WIN32_PLATFORM_SOURCES.

The linker argument should go in mtn_LDFLAGS in Makefile.am (I think).
If it's a MACOSX specific argument, that would need to be in an 'if'.

Currently the code uses "__APPLE__" as a synonym for MACOSX. Since
there are other operating systems for Apple hardware (and probably
will be still more in the future), I think MACOSX is better.

> I need that so that I can check whether it works (ie hook in the
> right place), and hopefully write some tests to show that it works.

That would be a good first step. Then we can try to extend the test
cases to the situations identified in the other posts on this thread,
and experiment with moving the hook into 'as_external'.

If you want to commit experimental code, we'll need a branch.

-- 
-- Stephe




reply via email to

[Prev in Thread] Current Thread [Next in Thread]