Re: [Gnu-arch-users] Encoding handling proposal

gnu-arch-users

From:	John Meinel
Subject:	Re: [Gnu-arch-users] Encoding handling proposal
Date:	Sun, 29 Aug 2004 13:13:09 -0500
User-agent:	Mozilla Thunderbird 0.7 (Windows/20040616)

Marcus Sundman wrote:

D) There should be a filter/plugin architecture to enable a transcoding of files on input and output based on their content-types and user settings and user-provided parameters.
E) Utilities such as "diff", "merge" and "annotate" (aka "blame") should be provided by plugins mapped to content-types.

You definitely have some interesting proposals here. One thing to watch out for, though... Once we stop having one type of diff (say an xdelta diff for binary files, and another type for xml files, etc.), how do we make (or at least help) everyone have all of these programs?

Maybe it's something that happens outside of tla, but one of the nice things is that tla uses diff, patch, and tar, which are reasonably simple programs that everyone is likely to have.

If I *don't* have the xmldiff/xmlpatch program, then it is likely that I won't be able to check out a project that used them, as I doubt the format of the .patch file will be the same as diff/patch. Also, what about versions: is xmldiff 1.0 compatible with xmlpatch 2.0? (I checked it in 1 year ago, but now I'm getting it back.)

Will there be "blessed" diff/transcode programs? Will it only be the ones that are bundled inside of tla?

I'm not sure about your statement that files are typically stored in the "local" encoding. The editors I use (gvim, scintilla) allow me to specify the encoding. (Admittedly it's mostly latin-1, or utf-8, or utf-16.) So in that situation, when I write out a file and then try to check it into arch, I have to worry about telling arch *not* to use the local encoding.

I know one of your reasons for wanting encoding to be included is so you can keep the "official" repository in the official encoding. One way to do that is to put a person in there. So people are allowed to work on any repository they want, but only a few people commit to the "official" one, and they are all knowledgeable about watching out for file encoding issues.

F) Commit comments and other string attributes should use UTF-8.
G) Filenames and paths should use UTF-8 in the repository, and be transcoded to the proper encoding when a client accesses the local file system.

This I do agree with. But I seem to recall that Tom's position is that people will probably want the files in the local encoding, so that

        cat <patch-log>

will be readable on that system.
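
To make the trade-off concrete, here is a small Python sketch of what "readable with cat" means. The function names are made up for illustration and nothing here is tla or hackerlab code; it only assumes a log stored as UTF-8 and a terminal that interprets bytes as latin-1.

```python
def as_seen_by_cat(stored: bytes, terminal_encoding: str) -> str:
    """What a terminal would show if the stored bytes were dumped unchanged."""
    return stored.decode(terminal_encoding, errors="replace")


def transcode_for_local(stored_utf8: bytes, local_encoding: str) -> bytes:
    """Re-encode a UTF-8 log for the local system (hypothetical helper).

    Characters the local encoding cannot represent are replaced with '?'.
    """
    return stored_utf8.decode("utf-8").encode(local_encoding, errors="replace")


if __name__ == "__main__":
    log = "Fix caf\u00e9 menu".encode("utf-8")
    # Dumped as-is on a latin-1 terminal: the two UTF-8 bytes of the
    # accented letter show up as two separate characters.
    print(as_seen_by_cat(log, "latin-1"))
    # Transcoded first: the text reads correctly on that terminal.
    print(as_seen_by_cat(transcode_for_local(log, "latin-1"), "latin-1"))
```

So a UTF-8 patch log is only "cat-readable" on systems whose terminal is also UTF-8; everywhere else something has to transcode it on the way out.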

I remember a big discussion about this in the past, but I don't think it was thoroughly resolved.

I think Tom designed hackerlab such that you deal with characters, and never know how many bytes/codepoints/etc. are used underneath.

[...]

D) Since editors and other programmers' tools tend to use whatever the local system encoding happens to be and a project might include people with different systems there needs to be some transcoding of most text files. The contents of files whose "Auto-Filter" attribute is set to "true" will be stored UTF-8 encoded with U+2028 newlines in the repository and transcoded from/to the local encoding and local newlines on input/output. The contents of files whose "Auto-Filter" attribute is set to "false" will not be transcoded on input/output. Often the proper local encoding and line breaks can be detected automatically, but the user should be able to override the auto-detection in his settings and/or by a parameter to the cm client.

This is where I feel "use the local system encoding" may not be perfectly true. But it is possible that "Auto-Filter" will handle this.
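
As a rough illustration of the "Auto-Filter" round trip described in D, here is a minimal Python sketch. The function and parameter names are assumptions made up for this example; the only things taken from the proposal are UTF-8 storage, U+2028 as the repository newline, and the true/false switch.

```python
REPO_ENCODING = "utf-8"
REPO_NEWLINE = "\u2028"  # LINE SEPARATOR, the repository newline in the proposal


def to_repository(raw: bytes, local_encoding: str, auto_filter: bool) -> bytes:
    """Convert working-copy bytes to the (assumed) repository form."""
    if not auto_filter:
        return raw  # "Auto-Filter" false: bytes are stored untouched
    text = raw.decode(local_encoding)
    # Normalise CRLF, lone CR and LF to the repository newline.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", REPO_NEWLINE).encode(REPO_ENCODING)


def from_repository(stored: bytes, local_encoding: str,
                    local_newline: str, auto_filter: bool) -> bytes:
    """Convert repository bytes back to the local encoding and newlines."""
    if not auto_filter:
        return stored
    text = stored.decode(REPO_ENCODING).replace(REPO_NEWLINE, local_newline)
    return text.encode(local_encoding)
```

The weak spot is the `local_encoding` argument. If I save a file as utf-16 in my editor but the client assumes latin-1, the decode step will not fail, because latin-1 accepts any byte sequence; the file is just silently mangled on the way in. That is why an explicit override matters.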

E) E.g. if two files with the content-type "application/vnd.sun.xml.writer" are diffed the system should use a diff plugin that knows how to interpret OpenOffice.org Writer documents. If no such plugin is found it defaults to the standard diff which regards the files as byte blobs.

This is where the problem with plugins exists. On *my* machine, I have the application/vnd.sun.xml.writer diff program. You don't have it on *your* machine. You can no longer read my archive.

If you just treat everything as blobs, at least you can get version 1 and version 10, and create your own diff, and manually patch so that you get nice context-sensitive diffs.
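
The lookup-with-fallback in E could be sketched like this in Python. The registry, the function names, and the use of difflib are all assumptions for illustration; none of it is how tla works.

```python
import difflib
from typing import Callable, Dict

DiffPlugin = Callable[[bytes, bytes], str]


def blob_diff(old: bytes, new: bytes) -> str:
    """Fallback: treat both versions as opaque byte blobs."""
    if old == new:
        return ""
    return f"Binary blobs differ ({len(old)} -> {len(new)} bytes)"


def text_diff(old: bytes, new: bytes) -> str:
    """Line-based unified diff, assuming both inputs are UTF-8 text."""
    a = old.decode("utf-8").splitlines(keepends=True)
    b = new.decode("utf-8").splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, "old", "new"))


# Hypothetical registry: content-type -> diff plugin installed on this machine.
PLUGINS: Dict[str, DiffPlugin] = {"text/plain": text_diff}


def diff_for(content_type: str) -> DiffPlugin:
    """Return the plugin for a content-type, or the blob fallback."""
    return PLUGINS.get(content_type, blob_diff)
```

Note that the fallback only helps when *producing* a diff. If the archive already holds a patch written by a plugin I don't have, `diff_for` falling back on my side does nothing for me; I still can't apply what is stored.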

My personal feeling is that we could do this two ways. Have tla generate the standard diff and the special one. Clients who understand the special format use it, else you can rely on the standard one. (This was proposed for xdelta use with pure binary files.)

The other way is to have tla start to incorporate more diff/patch programs. Keep in mind that adding a new diff/patch effectively changes the archive format, which is not something to do lightly.


I favor the former, though it doesn't allow for compact archive size.
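
The first option (store the standard diff alongside the special one) might look roughly like this on the client side. This is a Python sketch with invented names; the format labels and the `StoredChange` layout are assumptions, not the real archive format.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

STANDARD = "diff"  # hypothetical label for the plain diff/patch format


@dataclass
class StoredChange:
    # Maps a format label to the patch payload stored in that format.
    patches: Dict[str, bytes] = field(default_factory=dict)


def pick_patch(change: StoredChange, supported: Set[str]) -> str:
    """Prefer a special format this client understands, else the standard one."""
    for fmt in change.patches:
        if fmt != STANDARD and fmt in supported:
            return fmt
    if STANDARD in change.patches:
        return STANDARD
    raise LookupError("no usable patch format in this changeset")
```

A client with no extra tools passes an empty `supported` set and always gets the standard patch, so the archive stays readable for everyone. The cost is visible in `patches`: every change is stored at least twice.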

[...]

Notice that there is no distinction between "text files" and "binary files". The same system that converts between different text encodings might just as well be used to convert between different "raw" audio formats. Just add the appropriate plugin/filter and you're set.


Interesting idea, but I have to wonder if it is what you would really want.


- Marcus Sundman

Overall, I think you raise some good points. It just takes a lot of care with something that could potentially fragment repositories.


John
=:->
