[Fwd: Re: [lmi] Which is the best C++ wrapper for libxml2?]

lmi

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Fwd: Re: [lmi] Which is the best C++ wrapper for libxml2?]

From:	Greg Chicares
Subject:	[Fwd: Re: [lmi] Which is the best C++ wrapper for libxml2?]
Date:	Mon, 27 Nov 2006 16:47:53 +0000
User-agent:	Thunderbird 1.5.0.4 (Windows/20060516)

[my ISP lost some outgoing email yesterday, apparently including this:]

-------- Original Message --------
Subject: Re: [lmi] Which is the best C++ wrapper for libxml2?
Date: Sun, 26 Nov 2006 17:23:23 +0000
From: Greg Chicares <address@hidden>
To: Let me illustrate... <address@hidden>
References: <address@hidden>    <address@hidden>        <address@hidden>        
<address@hidden>
<address@hidden>        <address@hidden><address@hidden>        
<address@hidden> <address@hidden>

On 2006-11-8 14:42 UTC, Vadim Zeitlin wrote:
> > On Wed, 08 Nov 2006 04:27:01 +0000 Greg Chicares <address@hidden> wrote:
[snip example comparing two libraries]
> >  The code seems pretty similar to me in both cases to be honest...

The difference is the responsibility they place on a programmer
who uses one library or the other. This:
  parser.get_document()->get_root_node() // libxml++
dereferences a pointer without checking it, and therefore looks
unsafe to me, whereas this:
  parser.get_document().get_root_node()  // xmlwrapp
would not raise that code-review objection. When a C++ library
encounters an unrecoverable error, I don't want it to give me a
null pointer that I might carelessly dereference with disastrous
consequences. I'd much rather get an exception that I cannot
accidentally ignore.

Here's a libxml++ testcase that actually does segfault:

  xmlpp::Document d;
  std::string name = d.get_root_node()->get_name();

whereas this xmlwrapp testcase does not:

  xml::document d;
  std::string name = d.get_root_node().get_name();

To me at least, the return type expresses a contract:

  // Returns either a pointer to an object, or a null pointer.
  // Warning: don't use it without checking for null first.
  NodeClass* DocumentClass::get_root_node()
  {
    NodeClass* r = call_some_libxml2_function();
    // It could be null, but that's the user's problem.
    return r;
  }

versus

  // Returns an object. It's safe to use.
  NodeClass& DocumentClass::get_root_node()
  {
    NodeClass* r = call_some_libxml2_function();
    // If it's null, then it would be a grave error to
    // dereference it here, so either throw:
    if(NULL == r) throw "No root node.";
    // or create a dummy root node in the constructor
    // so that it can't be null.
    return *r;
  }

If the pointer is dereferenced inside the library, then the
library author accepts a duty to confront the problem. Returning
a reference doesn't guarantee that the author fulfilled that
duty, but it does make the responsibility plain, so that any
failure is clearly a serious library defect that must be fixed.

> > But maybe this can be mitigated by submitting patches
> > to libxml++ throwing an exception instead of returning NULL pointers?

I don't think that alone is adequate:

  // Return a pointer to an object. Maintainers: make sure it's
  // not null, because the documentation promises that to users!
  NodeClass* DocumentClass::get_root_node()
  {
    NodeClass* r = call_some_libxml2_function();
    if(NULL == r) throw "No root node.";
    // Here, a later maintainer adds some other code that might
    // make the pointer null, and doesn't read the comment above,
    // which after all might be in a header separate from the
    // implementation.
    r = do_something_else_with_the_pointer(r);
    return r; // Now it could be null.
  }

I want a reference returned. I want the contract expressed in the
physical design, not in a comment. Otherwise, maintainers who
break it might claim that their only fault was not changing the
comment: it's a pointer, after all, so of course it can be null,
and it wasn't their code that dereferenced it.

I don't want to give up safety we already have with xmlwrapp. If
libxml++ returns a pointer where xmlwrapp returns a reference,
then I feel compelled to write wrappers like
  NodeClass& safely_get_root_node(DocumentClass&);
and enforce their use by making it physically impossible to call
into libxml++ directly. Just insisting that we always write
  // std::string name = d.get_root_node()->get_name(); // Don't.
  NodeClass const* root = d.get_root_node();
  if(!root) throw "some error message";
  std::string name = root->get_name();
requires too much discipline. In recent work, I see a dozen
unwrapped calls to add_child(); that libxml++ function doesn't
throw if xmlAddChild() fails, but xmlwrapp's push_back() does.

But writing and maintaining a comprehensive set of wrappers to
wrap a wrapper library is too much work and seems wrongheaded.

> >  In any case, I still think libxml++ has more chances of being developed
> > [again] than xmlwrapp.

I'd agree. But I can't imagine they'd accept a wholesale rewrite
of their API that would break every application that uses it. The
change I'd want is more profound than your 'ustring' patch here:
  http://bugzilla.gnome.org/show_bug.cgi?id=320197
whose status a year later is still "unconfirmed". And we depend
on several other patches that haven't been submitted.

I suppose it's just a matter of time before someone develops a
better C++ xml library. It's been discussed by boost, e.g.:
  http://lists.boost.org/Archives/boost/2005/10/96129.php
The API for boost's somewhat-related "property tree" library:
  http://lists.boost.org/boost-announce/2006/05/0092.php
  http://kaalus.atspace.com/ptree/doc/index.html
looks much more like xmlwrapp's than libxml++'s, and I think
it'll be easier to convert xmlwrapp code to a better C++ xml
library when one emerges.

We don't need a richer xml API today anyway: the older version
of xmlwrapp that we'd been using for years was good enough, and
now we've upgraded to its final version. Evgeniy has pointed out
that it would be good to use xml namespaces, but AFAICT we've
tried that in only one spot, and the code works without it--and
it looks easier to add an xmlNewNs() wrapper to xmlwrapp than to
solve the libxml++ problems.

But we could use a libxslt wrapper today, and xsltwrapp might be
a better solution than continuing to develop our own.

> > This advantage is not nearly big enough to make me
> > argue for switching from xmlwrapp to libxml++ but seeing that we've already
> > done the switch, I think we should stick to it.

We didn't finish the work that switching would require.

We've wrapped some potentially-unsafe libxml++ operations, but
the wrappers aren't really comprehensive or even very well done,
and they aren't used uniformly. They were originally written only
to hide API differences, as an aid to validating a contemplated
conversion to libxml++. They'd need to be hardened and probably
expanded for long-term production use of libxml++. Upgrading to a
new libxml++ release might necessitate maintenance work on those
wrappers as well as on our local patches. I see no chance of
getting all this work accepted into libxml++ even if a new
maintainer steps forward.

I can't justify the additional work it would take to put all of
this in place and maintain it. We'll use xmlwrapp-0.5.0 because
it provides the features and safety we need with far less work;
we'll switch to a different library when something that's truly
better comes along.

[Prev in Thread]

Current Thread

[Next in Thread]

[Fwd: Re: [lmi] Which is the best C++ wrapper for libxml2?], Greg Chicares <=
- Re: [lmi] Which is the best C++ wrapper for libxml2?, Vadim Zeitlin, 2006/11/28

Prev by Date: Re: [lmi] Enumerating open technical issues
Next by Date: Re: [lmi] Which is the best C++ wrapper for libxml2?
Previous by thread: [lmi] a TODO Marker: move "PolicyYear" into Effective Calculation Summary columns list
Next by thread: Re: [lmi] Which is the best C++ wrapper for libxml2?
Index(es):
- Date
- Thread