Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c


From: Bruno Haible
Subject: Re: [Bug-gnulib] addition: fstrcmp.h, fstrcmp.c
Date: Thu, 30 Jan 2003 14:30:06 +0100 (CET)

Paul Eggert writes:

> First, that code has forked from an old version of diffutils, and is
> missing some minor improvements in the latest diffutils version.  It
> shouldn't be a fork: it should be the exact same code as that used in
> diffutils.

I agree. So we now have two users of this code:

  1) GNU diff itself,
  2) fstrcmp,

and soon also:

  3) msgdiff (which needs the routines for sequences of strings, as in
     1, and for sequences of multibyte characters [I want msgdiff to
     produce output similar to emacs ediff mode]).

So we need this code with at least 3 different sequence element types:

  - string, as in the original GNU diff,
  - 'char', as in fstrcmp,
  - mbchar, which will be something like a
    struct { unsigned short count; char bytes[6]; }

Since you probably want to stay in C and not use C++ templates, can
you arrange to put the core of these routines (at least up to and
including 'compareseq') in a gnulib module whose element type can be
parametrized through #defines?
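
To make the request concrete, here is a minimal sketch of one way the
element type could be parametrized with the preprocessor. It is not the
diffutils code: the macro DEFINE_COMMON_PREFIX, the EQUAL-style macros,
the helper make_mbchar and the function names are all hypothetical, and
the generated routine only counts a common prefix rather than running
'compareseq'. In a real module the shared body would more likely live in
its own file that each user #includes after defining ELEMENT and EQUAL; a
function-generating macro is used here only so the example fits in one
file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The mbchar element type, following the struct outlined above.  */
struct mbchar { unsigned short count; char bytes[6]; };

/* A string element, as in GNU diff's sequences of lines (assumed type).  */
typedef const char *string_elt;

/* Stand-in for the shared source: defines a function NAME that returns
   the length of the common prefix of two ELEMENT sequences, comparing
   elements with EQUAL.  All three parameters are hypothetical names.  */
#define DEFINE_COMMON_PREFIX(NAME, ELEMENT, EQUAL)                    \
  static size_t                                                       \
  NAME (const ELEMENT *xv, size_t xn, const ELEMENT *yv, size_t yn)   \
  {                                                                   \
    size_t i = 0;                                                     \
    while (i < xn && i < yn && EQUAL (xv[i], yv[i]))                  \
      i++;                                                            \
    return i;                                                         \
  }

/* One equality test per element type.  */
#define CHAR_EQUAL(a, b) ((a) == (b))
#define STRING_EQUAL(a, b) (strcmp (a, b) == 0)
#define MBCHAR_EQUAL(a, b) \
  ((a).count == (b).count && memcmp ((a).bytes, (b).bytes, (a).count) == 0)

/* Three instantiations of the same code, one per user.  */
DEFINE_COMMON_PREFIX (prefix_chars, char, CHAR_EQUAL)
DEFINE_COMMON_PREFIX (prefix_strings, string_elt, STRING_EQUAL)
DEFINE_COMMON_PREFIX (prefix_mbchars, struct mbchar, MBCHAR_EQUAL)

/* Hypothetical helper: build an mbchar from N bytes (N <= 6).  */
static struct mbchar
make_mbchar (const char *bytes, size_t n)
{
  struct mbchar c;
  assert (n <= sizeof c.bytes);
  c.count = (unsigned short) n;
  memset (c.bytes, 0, sizeof c.bytes);
  memcpy (c.bytes, bytes, n);
  return c;
}
```

The point of the sketch is only that the algorithm text is written once
and the element type and equality test are supplied from outside, which
is what C++ templates would otherwise provide.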

> Of course this will take some code merging, but I can take
> responsibility for that.  I'll do a merge and report back in a
> couple of weeks or so.

Thanks, I'll continue work on other gnulib modules in the meantime.

> Second, as far as the external interface goes:
> 
> > extern double fstrcmp (const char *s1, const char *s2);
> 
> It would be better to have a function that returns an integer
> containing the length of the common subsequence of s1 and s2.  If I
> have such a function, I can easily write fstrcmp, but the converse is
> not true.  Also, it's better to avoid inexact results if it's easy to
> avoid them, which is the case here.  So, the interface should define a
> function something like this:
> 
> /* Return the length of a common subsequence of chars of both BUF1 (of
>    size BUF1SIZE) and BUF2 (of size BUF2SIZE).  The common subsequence
>    need not be contiguous in each buffer.  The returned length need
>    not be maximal, but should be as long as can easily be computed.  */
> size_t commlen (char const *buf1, size_t buf1size,
>                 char const *buf2, size_t buf2size);
> 
> We can of course have fstrcmp as another function, for people who
> prefer its interface as a convenience.  However, it seems to me that
> fstrcmp should be a separate module that invokes commlen

I have no objections to this.
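
As an illustration of the layering, here is a rough sketch of commlen
with fstrcmp written on top of it. Two assumptions are made loudly here:
commlen uses a plain O(size1 * size2) dynamic-programming table rather
than the diffutils algorithm (so it happens to return the maximal
length, which the proposed contract allows but does not require), and
fstrcmp is taken to mean "twice the common length divided by the total
length", with two empty strings treated as identical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the length of a common subsequence of BUF1 and BUF2.
   Sketch only: simple dynamic programming, one table row in memory.  */
size_t
commlen (char const *buf1, size_t buf1size,
         char const *buf2, size_t buf2size)
{
  /* row[j] holds the common-subsequence length of the part of BUF1
     processed so far and the first j chars of BUF2.  */
  size_t *row = calloc (buf2size + 1, sizeof *row);
  size_t i, j, result;

  if (row == NULL)
    return 0;   /* 0 is always a valid, if not maximal, answer */

  for (i = 0; i < buf1size; i++)
    {
      size_t diag = 0;          /* previous row's value at column j */
      for (j = 0; j < buf2size; j++)
        {
          size_t up = row[j + 1];   /* previous row's value at j + 1 */
          if (buf1[i] == buf2[j])
            row[j + 1] = diag + 1;
          else if (row[j] > up)
            row[j + 1] = row[j];
          /* otherwise row[j + 1] keeps the value from the previous row */
          diag = up;
        }
    }

  result = row[buf2size];
  free (row);
  return result;
}

/* Similarity in [0, 1], built on the exact integer result of commlen.
   The formula is an assumption about what fstrcmp should compute.  */
double
fstrcmp (const char *s1, const char *s2)
{
  size_t len1, len2;

  assert (s1 != NULL && s2 != NULL);
  len1 = strlen (s1);
  len2 = strlen (s2);
  if (len1 + len2 == 0)
    return 1.0;                 /* assumed: two empty strings are equal */
  return 2.0 * commlen (s1, len1, s2, len2) / (len1 + len2);
}
```

With this split the inexact floating-point division happens only in the
convenience wrapper; callers that want exact results use commlen
directly.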

For msgdiff, however, I'll need more than just counts: I'll need the
precise element numbers of each hunk. That is, there should also be a
function like

    struct hunk * diff_sequences (const ELEMENT *seq1, size_t size1,
                                  const ELEMENT *seq2, size_t size2);

where

    struct hunk
      {
        struct hunk *next;
        struct change *changes; /* list of changes, see diff.h */
      };
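
To show how a caller such as msgdiff might consume that result, here is
a small sketch that walks a hunk list and totals the deleted and
inserted elements. The fields of struct change are an assumption,
simplified from what diff.h in diffutils declares (it uses its own
integer type and has further members); struct hunk_totals and
summarize_hunks are hypothetical names that exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified layout of a change record.  */
struct change
{
  struct change *link;  /* next change in the list, or NULL */
  size_t inserted;      /* number of elements of seq2 inserted here */
  size_t deleted;       /* number of elements of seq1 deleted here */
  size_t line0;         /* index in seq1 of the first deleted element */
  size_t line1;         /* index in seq2 of the first inserted element */
};

struct hunk
{
  struct hunk *next;
  struct change *changes;
};

/* Hypothetical summary type for this example.  */
struct hunk_totals
{
  size_t hunks;
  size_t deleted;
  size_t inserted;
};

/* Walk every hunk and every change in it, adding up the counts.  */
struct hunk_totals
summarize_hunks (const struct hunk *h)
{
  struct hunk_totals t = { 0, 0, 0 };

  for (; h != NULL; h = h->next)
    {
      const struct change *c;

      t.hunks++;
      for (c = h->changes; c != NULL; c = c->link)
        {
          /* A change that neither deletes nor inserts is meaningless.  */
          assert (c->inserted > 0 || c->deleted > 0);
          t.deleted += c->deleted;
          t.inserted += c->inserted;
        }
    }
  return t;
}
```

The line0 and line1 members are what give msgdiff the precise element
numbers; the totals here are just the simplest thing to compute from the
same traversal.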

> Third, part of the credit for the authorship of the algorithm has been
> removed from the code's commentary.

Oops, I wasn't aware of this. Fixed it now.

Bruno