Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c
From: Bruno Haible
Subject: Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c
Date: Thu, 30 Jan 2003 14:30:06 +0100 (CET)
Paul Eggert writes:
> First, that code has forked from an old version of diffutils, and is
> missing some minor improvements in the latest diffutils version. It
> shouldn't be a fork: it should be the exact same code as that used in
> diffutils.
I agree. So we now have two users of this code:
1) GNU diff itself,
2) fstrcmp,
and soon also:
3) msgdiff (which needs the routines for sequences of strings, as in
1), and for sequences of multibyte characters [I want msgdiff to
produce output similar to Emacs ediff mode]).
So we need this code with at least 3 different sequence element types:
- string, as in the original GNU diff,
- 'char', as in fstrcmp,
- mbchar, which will be something like a
struct { unsigned short count; char bytes[6]; }
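Comparing such mbchar elements needs a custom equality test, since a plain == does not apply to structs. A minimal sketch, assuming the layout above where COUNT says how many of the six bytes are in use (the names struct mbchar and mbchar_equal are hypothetical):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical multibyte-character element with the layout suggested
   above: the first COUNT bytes of BYTES are in use, the rest are
   don't-care.  */
struct mbchar
{
  unsigned short count;
  char bytes[6];
};

/* Element equality for struct mbchar: same length and same bytes.
   Bytes past COUNT are not compared, so callers need not clear them.  */
static bool
mbchar_equal (const struct mbchar *a, const struct mbchar *b)
{
  return a->count == b->count
         && memcmp (a->bytes, b->bytes, a->count) == 0;
}
```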
Since you probably want to stay in C and not use C++ templates, can
you arrange to put the core of these routines (at least up to and
including 'compareseq') in a gnulib module whose element type can be
parametrized through #defines?
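To illustrate the kind of #define parametrization I mean, here is a small sketch. The macro names ELEMENT and EQUAL and the function lcs_length are hypothetical, and the body is plain O(N*M) dynamic programming for the length of a longest common subsequence, not the faster algorithm that compareseq implements; only the way the element type is plugged in matters here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameters a client would #define before including a
   generic sequence-comparison source file.  */
#define ELEMENT char
#define EQUAL(x, y) ((x) == (y))

/* Length of a longest common subsequence of SEQ1 (SIZE1 elements) and
   SEQ2 (SIZE2 elements), using the classic dynamic programming
   recurrence while keeping only one row of the table.
   Returns (size_t) -1 if memory is exhausted.  */
static size_t
lcs_length (const ELEMENT *seq1, size_t size1,
            const ELEMENT *seq2, size_t size2)
{
  size_t *row = calloc (size2 + 1, sizeof *row);
  size_t i, j, result;

  if (row == NULL)
    return (size_t) -1;
  for (i = 1; i <= size1; i++)
    {
      size_t diag = 0;          /* L[i-1][j-1]: old value of row[j-1] */
      for (j = 1; j <= size2; j++)
        {
          size_t up = row[j];   /* L[i-1][j] */
          if (EQUAL (seq1[i - 1], seq2[j - 1]))
            row[j] = diag + 1;
          else if (row[j - 1] > up)
            row[j] = row[j - 1];
          /* otherwise row[j] keeps the value UP */
          diag = up;
        }
    }
  result = row[size2];
  free (row);
  return result;
}
```

Switching to another element type would then only mean changing the two #defines, e.g. ELEMENT as an mbchar struct and EQUAL as a function that compares the byte counts and the bytes.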
> Of course this will take some code merging, but I can take
> responsibility for that. I'll do a merge and report back in a
> couple of weeks or so.
Thanks, I'll continue work on other gnulib modules in the meantime.
> Second, as far as the external interface goes:
>
> > extern double fstrcmp (const char *s1, const char *s2);
>
> It would be better to have a function that returns an integer
> containing the length of the common subsequence of s1 and s2. If I
> have such a function, I can easily write fstrcmp, but the converse is
> not true. Also, it's better to avoid inexact results if it's easy to
> avoid them, which is the case here. So, the interface should define a
> function something like this:
>
> /* Return the length of a common subsequence of chars of both BUF1 (of
> size BUF1SIZE) and BUF2 (of size BUF2SIZE). The common subsequence
> need not be contiguous in each buffer. The returned length need
> not be maximal, but should be as long as can easily be computed. */
> size_t commlen (char const *buf1, size_t buf1size,
> char const *buf2, size_t buf2size);
>
> We can of course have fstrcmp as another function, for people who
> prefer its interface as a convenience. However, it seems to me that
> fstrcmp should be a separate module that invokes commlen.
I have no objections to this.
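The claim that fstrcmp is easy to write on top of commlen can be checked with a short sketch. It assumes fstrcmp's similarity is 2 * common / (len1 + len2), i.e. 1.0 for equal strings and 0.0 for strings sharing no character; the commlen body is again simple dynamic programming, which happens to return a maximal length.

```c
#include <stdlib.h>
#include <string.h>

/* One possible commlen matching the quoted specification.  This body
   computes a longest common subsequence in O(BUF1SIZE * BUF2SIZE)
   time, so the result is in fact maximal.  */
size_t
commlen (char const *buf1, size_t buf1size,
         char const *buf2, size_t buf2size)
{
  size_t *row = calloc (buf2size + 1, sizeof *row);
  size_t i, j, result;

  /* The specification allows a non-maximal answer, so 0 is a
     legitimate (if unhelpful) result when memory is short.  */
  if (row == NULL)
    return 0;
  for (i = 0; i < buf1size; i++)
    {
      size_t diag = 0;            /* previous row's value at column j */
      for (j = 0; j < buf2size; j++)
        {
          size_t up = row[j + 1]; /* previous row's value at column j+1 */
          if (buf1[i] == buf2[j])
            row[j + 1] = diag + 1;
          else if (row[j] > up)
            row[j + 1] = row[j];
          diag = up;
        }
    }
  result = row[buf2size];
  free (row);
  return result;
}

/* fstrcmp layered on commlen.  The formula 2 * common / (len1 + len2)
   is an assumption about the intended similarity measure.  */
double
fstrcmp (const char *s1, const char *s2)
{
  size_t len1 = strlen (s1);
  size_t len2 = strlen (s2);

  if (len1 + len2 == 0)
    return 1.0;   /* two empty strings are identical */
  return 2.0 * commlen (s1, len1, s2, len2) / (double) (len1 + len2);
}
```

Going the other way, recovering an exact integer length from the double returned by fstrcmp, would require undoing a floating-point division, which is the asymmetry Paul points out.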
For msgdiff, however, I'll need more than just counts: I'll need the
precise element indices of each hunk. That is, there should also be a
function like
  struct hunk * diff_sequences (const ELEMENT *seq1, size_t size1,
                                const ELEMENT *seq2, size_t size2);
where
  struct hunk
  {
    struct hunk *next;
    struct change *changes; /* list of changes, see diff.h */
  };
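As a rough illustration of the hunk output msgdiff would need, the sketch below returns hunks for two char sequences. The struct edit_hunk layout (one start/length pair per side instead of a list of struct change) and the helper names are hypothetical, and the hunks come from walking a full O(N*M) longest-common-subsequence table rather than from compareseq. Error handling is simplified: allocation failure aborts, in the spirit of xmalloc.

```c
#include <stdlib.h>

#define ELEMENT char
#define EQUAL(x, y) ((x) == (y))

/* Hypothetical hunk: elements [del_start, del_start + del_len) of SEQ1
   are replaced by elements [ins_start, ins_start + ins_len) of SEQ2.  */
struct edit_hunk
{
  struct edit_hunk *next;
  size_t del_start, del_len;
  size_t ins_start, ins_len;
};

static void *
xcalloc_or_die (size_t n, size_t size)
{
  void *p = calloc (n, size);
  if (p == NULL)
    abort ();
  return p;
}

static void
free_hunks (struct edit_hunk *h)
{
  while (h != NULL)
    {
      struct edit_hunk *next = h->next;
      free (h);
      h = next;
    }
}

/* Return the list of hunks turning SEQ1 into SEQ2, in order; NULL if
   the sequences are equal.  */
static struct edit_hunk *
diff_sequences (const ELEMENT *seq1, size_t size1,
                const ELEMENT *seq2, size_t size2)
{
  size_t cols = size2 + 1;
  /* L[i * cols + j] = LCS length of seq1[i..] and seq2[j..]; the last
     row and column stay 0 thanks to calloc.  */
  size_t *L = xcalloc_or_die ((size1 + 1) * cols, sizeof *L);
  struct edit_hunk *head = NULL, **tail = &head, *cur = NULL;
  size_t i, j;

  for (i = size1; i-- > 0; )
    for (j = size2; j-- > 0; )
      {
        size_t down = L[(i + 1) * cols + j], right = L[i * cols + j + 1];
        L[i * cols + j] = EQUAL (seq1[i], seq2[j])
                          ? L[(i + 1) * cols + j + 1] + 1
                          : (down > right ? down : right);
      }

  i = j = 0;
  while (i < size1 || j < size2)
    {
      if (i < size1 && j < size2 && EQUAL (seq1[i], seq2[j]))
        {
          cur = NULL;           /* a matching element closes the hunk */
          i++, j++;
          continue;
        }
      if (cur == NULL)          /* open a new hunk at (i, j) */
        {
          cur = xcalloc_or_die (1, sizeof *cur);
          cur->del_start = i;
          cur->ins_start = j;
          *tail = cur;
          tail = &cur->next;
        }
      /* Delete from SEQ1 when that keeps the common subsequence as long
         as inserting from SEQ2 would.  */
      if (j == size2
          || (i < size1 && L[(i + 1) * cols + j] >= L[i * cols + j + 1]))
        i++, cur->del_len++;
      else
        j++, cur->ins_len++;
    }
  free (L);
  return head;
}
```

For example, diff_sequences ("abcd", 4, "axcy", 4) yields two hunks, replacing "b" by "x" at index 1 and "d" by "y" at index 3.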
> Third, part of the credit for the authorship of the algorithm has been
> removed from the code's commentary.
Oops, I wasn't aware of this. Fixed it now.
Bruno