Re: [Classpathx-xml] possible bug?

From: David Brownell
Subject: Re: [Classpathx-xml] possible bug?
Date: Sun, 14 Apr 2002 10:57:14 -0700

> > > I don't think a protocol is necessary is it? For an embedded DTD? 
> >  
> > Err, no it isn't ... because it came from the URL that DocumentBuilder 
> > is supposed to provide. 
> Also, I've tried setting the parser to non-validating and it still
> tries to get the URL. That's really wrong I think but xerces (at
> least the version I've got) does it too.

A lot of folk get confused about "validating" versus "using DTDs".

Some folk (including sometimes MSFT :) have been spreading
some misinformation, along the lines that they're the same thing.
They're not, but that misinformation is extremely widespread and
it has magnified some needless confusion about entities.

Short version of the story:  entities can (and for portability, should)
get processed whenever you parse XML that includes a DTD.  And
when you validate, they MUST get used.  DTDs define entities and
provide typing and defaults for attributes ... which get used every
time the DTD is processed.  They _also_ provide some rules that
can be checked ... that's optional, and is called "validation".
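
To make that concrete, here's a minimal sketch (the class and
method names are invented for illustration, they're not part of
any parser's API).  It asks JAXP for a non-validating SAX parser
and reads back an attribute that only exists because the internal
DTD subset supplies a default for it:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class DtdWithoutValidation {

    /**
     * Parses xml with a NON-validating parser and returns the value of
     * the named attribute on the root element (null if it is absent).
     */
    static String rootAttribute(String xml, String name) {
        final String[] found = new String[1];
        final boolean[] seenRoot = { false };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);   // no validity checking requested
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes atts) {
                        if (!seenRoot[0]) {
                            // Only the first (root) element is inspected.
                            seenRoot[0] = true;
                            found[0] = atts.getValue(name);
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found[0];
    }
}
```

Given a document like
<!DOCTYPE doc [<!ATTLIST doc kind CDATA "plain">]><doc/>
a conforming parser should report kind="plain" on the root element
even though nothing was validated.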

SAX has two feature flags controlling whether or not parsers
process parameter entities (including the unnamed one that's
used for the external subset of a DTD) or general entities.
When you use a SAX parser that's not validating, you can try to
tell it not to read those entities.
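
As a rough sketch of what that request looks like (the class and
method names are made up for illustration; the two feature URIs
are the standard SAX2 ones), something like this asks the parser
to skip external entities and reports whether it agreed:

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper, for illustration only.
final class EntityFlags {
    // Standard SAX2 feature identifiers.
    static final String GENERAL =
        "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER =
        "http://xml.org/sax/features/external-parameter-entities";

    /** Creates a non-validating XMLReader through JAXP. */
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Tries to turn a feature off.  Returns true only if the parser
     * accepted the request and now reports the feature as false.
     */
    static boolean tryDisable(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, false);
            return !reader.getFeature(feature);
        } catch (SAXException e) {
            // SAXNotRecognizedException or SAXNotSupportedException:
            // this parser can't (or won't) change the setting.
            return false;
        }
    }
}
```

Calling tryDisable(reader, EntityFlags.GENERAL) and
tryDisable(reader, EntityFlags.PARAMETER) before parsing tells you
whether a given parser will honor the request; a false result means
you're back to workarounds.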

However, so far as I know only AElfred2 actually lets you
disable external entity processing.  All SAX parsers have a
mostly complete workaround for skipping general entities:  plug
in an entity resolver that returns empty documents.  That won't
work when the system IDs are illegal URLs, including your case
(where they might be legal as relative URLs, if only the document
had a base URL).
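
A minimal sketch of that workaround, assuming a JAXP-provided
parser (the class name and the parses() helper are invented for
illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver, for illustration only.
final class EmptyEntityResolver implements EntityResolver {

    /** Answers every external entity request with an empty document. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId);
        return empty;
    }

    /** Returns true if xml parses with the empty resolver plugged in. */
    static boolean parses(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(new EmptyEntityResolver());
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

The resolver only gets a chance to run once the parser has a
system ID it is willing to hand over, so this sketch assumes
absolute system IDs; it doesn't help when the parser rejects the
ID before ever consulting the resolver.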

But it's not so simple for parameter entities, though for some
degenerate cases like documents with only external subsets,
that workaround will mostly do the Right Thing.  (Of course
you won't usually be able to know in advance whether all
the documents you handle will be sufficiently degenerate.)

- Dave
