

Markup mode for wdiff to compare XML/HTML files

From: Bert Bos
Subject: Markup mode for wdiff to compare XML/HTML files
Date: Tue, 10 Mar 2020 00:14:29 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

I made an option -m (--ignore-markup) for wdiff to more easily compare
XML or HTML files (see attached). Any interest?

Longer story:

Some twenty years ago, I adapted an old version of wdiff to compare HTML
or XML files instead of plain text files. I still use it. But last week I
decided to see if I could port my modification to a newer version of
wdiff and also improve it a bit.

This time, rather than making a program that only works for HTML and XML
files, I added an option -m to wdiff.

It is still a _word_ diff, even when it compares HTML/XML files. It
doesn't tell you whether the markup changed, only whether the content
did. It removes the markup, compares the remaining words, and then
reinserts the markup. (By default, it copies the markup from the
second argument.)
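A minimal sketch of the first step, separating markup from words, might look like the C below. It is not the code from the attached patch: the names `tokenize_markup`, `struct token`, `TOK_WORD` and `TOK_MARKUP` are hypothetical, and it assumes a simplified view of markup (a tag runs from `<` to the next `>`), so comments, CDATA sections and `>` inside quoted attribute values are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not taken from the wdiff-m patch. */
enum token_kind { TOK_WORD, TOK_MARKUP };

struct token {
    enum token_kind kind;
    const char *start; /* points into the original input, not a copy */
    size_t len;
};

/* Split TEXT into word and markup tokens.  A markup token runs from '<'
   through the next '>' (or to end of input); a word token is a maximal
   run of characters that are neither whitespace nor '<'.  Whitespace is
   skipped.  Returns the number of tokens stored (at most MAX). */
size_t tokenize_markup(const char *text, struct token *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        const char *start = p;
        if (*p == '<') {
            /* Markup: consume up to and including the closing '>'. */
            while (*p != '\0' && *p != '>')
                p++;
            if (*p == '>')
                p++;
            out[n].kind = TOK_MARKUP;
        } else {
            /* Word: stop at whitespace or at the start of a tag. */
            while (*p != '\0' && *p != '<' && !isspace((unsigned char)*p))
                p++;
            out[n].kind = TOK_WORD;
        }
        out[n].start = start;
        out[n].len = (size_t)(p - start);
        n++;
    }
    return n;
}
```

Only the `TOK_WORD` tokens would be handed to the word diff; because each token keeps a pointer into the original text, the `TOK_MARKUP` tokens of the second file can afterwards be written back between the words in their original order.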

It is also not guaranteed to produce valid HTML. If the input is
well-formed XML, so is the output, but it is not necessarily _valid_.
For example, the output may contain change markers (by default <del> and
<ins> tags) where they are normally not allowed, such as a <del> tag
inside a <title> or <style> element.

Still, I find it useful. (And you could make a front end that splits
HTML into a head and a body part and only applies wdiff to the body.
That's what W3C's online htmldiff[1] does, with a program similar to wdiff.)
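A front end like that could begin by locating the body start tag and splitting the file there. Here is a rough sketch in C, assuming the document contains a literal `<body` start tag that does not also appear earlier inside a comment or script; `find_body_start` is a hypothetical helper, not part of wdiff or of htmldiff.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to the first "<body" start tag
   in HTML (ASCII case-insensitive), or NULL if there is none.  Text
   before the returned pointer is the "head part", the rest the "body
   part".  Comments and scripts are not skipped in this sketch. */
const char *find_body_start(const char *html)
{
    static const char needle[] = "<body";

    for (const char *p = html; *p != '\0'; p++) {
        size_t i = 0;
        /* Compare case-insensitively; stops at end of input because
           tolower('\0') never equals a needle character. */
        while (needle[i] != '\0' &&
               tolower((unsigned char)p[i]) == needle[i])
            i++;
        if (needle[i] == '\0') {
            /* Require the tag name to end here, so "<bodyguard>" is skipped. */
            char c = p[i];
            if (c == '>' || c == '/' || isspace((unsigned char)c))
                return p;
        }
    }
    return NULL;
}
```

The front end would then pass only the text from the returned pointer onward to `wdiff -m` and copy the head part of one of the files through unchanged, which keeps change markers out of <title> and <style>.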

I attached a patch. It contains code for wdiff.c, text for the manual,
and a regression test.

[1] https://github.com/w3c/htmldiff-ui/

  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  address@hidden                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Attachment: wdiff-m.patch
Description: Text document

Attachment: signature.asc
Description: OpenPGP digital signature

