Re: [Bug-wget] Validity of angle brackets around WARC-Target-URI value

From: William Prescott
Subject: Re: [Bug-wget] Validity of angle brackets around WARC-Target-URI value
Date: Fri, 17 Nov 2017 03:31:16 -0500

For what it's worth, I confirmed that Heritrix (Internet Archive's
crawling tool) produces WARC files without the angle brackets.
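The sketch below shows one way a consumer could stay compatible with both forms while the standard remains unclear. It strips a single matched pair of surrounding angle brackets, so it accepts both the bracketed form (as in the BNF grammar) and the bare form (as in the standard's examples and Heritrix output). The function names are hypothetical. They are not taken from Wget or from any WARC library, and the case-insensitive field-name comparison is an assumption made for leniency.

```python
from typing import Tuple


def normalize_target_uri(value: str) -> str:
    """Return a WARC-Target-URI value without surrounding angle brackets.

    Hypothetical helper: accepts both "<http://example.com/>" (bracketed,
    as in the grammar) and "http://example.com/" (bare, as in the examples).
    """
    value = value.strip()
    # Only strip a matched pair; a lone "<" or ">" is left untouched.
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    return value


def parse_header_line(line: str) -> Tuple[str, str]:
    """Split a 'Name: value' header line, normalizing WARC-Target-URI.

    Hypothetical helper for illustration only.
    """
    # partition() splits at the first colon, so colons inside the URI survive.
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    name = name.strip()
    value = value.strip()
    # Assumption: compare the field name case-insensitively to be lenient.
    if name.lower() == "warc-target-uri":
        value = normalize_target_uri(value)
    return name, value


if __name__ == "__main__":
    print(parse_header_line("WARC-Target-URI: <http://example.com/>"))
    print(parse_header_line("WARC-Target-URI: http://example.com/"))
```

Both calls above yield ('WARC-Target-URI', 'http://example.com/'). A writer such as Wget still has to pick one form when emitting records. This kind of tolerance only helps on the reading side.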
Best regards,
William Prescott

On Tue, Nov 14, 2017 at 11:45 PM, William Prescott
<address@hidden> wrote:
> Hello,
> It seems that there may be some ambiguity in the WARC standard
> regarding the usage of angle brackets surrounding the URI given for a
> WARC-Target-URI field.
> In short, while the BNF grammar includes the brackets, the examples
> presented in the standard do not. It appears that tools have been
> built assuming no brackets, and they may have problems when the
> brackets are present (which is how I learned about this).
> There is some discussion about this here:
> https://github.com/iipc/warc-specifications/issues/23
> https://github.com/iipc/warc-specifications/pull/24
> One commenter states that the brackets have been removed in a newer
> draft. I see that a new standard (ISO 28500:2017) was published in
> August, but I don't have access to it to confirm whether it says
> anything about this.
> A Wget bug report (http://savannah.gnu.org/bugs/?47281) was
> submitted, which resulted in the addition of the brackets in commit
> 100da11312a1781a3d5aa38760ce0e8bd9384659. An additional commit
> (63c2aea2557b84640272629c7dc0caccab66ab6d) extended the use of
> brackets to more record types that contain WARC-Target-URI; this was
> mentioned in
> http://lists.gnu.org/archive/html/bug-wget/2017-03/msg00006.html.
> The specification PDF referenced in the report and the mailing list
> post contains this discrepancy (see "uri" in the grammar on page 5,
> and "WARC-Target-URI" in the example on page 22 [C.2]).
> Given the unclear nature of this aspect of the standard, I don't know
> exactly what action to suggest, but I did want to point it out.
> Best regards,
> William Prescott

