bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug-wget] [PATCH 21/27] New option --metalink-index to process Metalink


From: Matthew White
Subject: [Bug-wget] [PATCH 21/27] New option --metalink-index to process Metalink application/metalink4+xml
Date: Thu, 29 Sep 2016 06:03:01 +0200

* NEWS: Mention the effect of --metalink-index over Metalink
* src/init.c: Add new option metalinkindex (opt.metalink_index),
  initialize to -1
* src/main.c: Add new option metalink-index (--metalink-index=NUMBER)
* src/options.h: Add new option metalink_index (int)
* src/metalink.h: Add declaration of functions fetch_metalink_file(),
  replace_metalink_basename()
* src/metalink.c: Add functions fetch_metalink_file() simple file
  fetch, replace_metalink_basename() replace file basename
* src/metalink.c (retrieve_from_metalink): New. Process Metalink
  application/metalink4+xml of opt.metalink_index ordinal number
* doc/wget.texi: Add new option metalink-index (--metalink-index)
  documentation
* doc/metalink-standard.txt: Updated doc. Add documentation about
  Metalink application/metalink4+xml metaurls download naming system
* doc/metalink-standard.txt: Update Metalink/XML and HTTP examples
* testenv/Makefile.am: Add new files
* testenv/Test-metalink-http-xml.py: New file. Metalink/HTTP automated
  Metalink/XML "application/metalink4+xml" --metalink-index tests
* testenv/Test-metalink-http-xml-trust.py: New file. Metalink/HTTP
  automated Metalink/XML "application/metalink4+xml" --metalink-index
  retrieval with --trust-server-names tests

WARNING: Do not use lib/dirname.c (dir_name) to get the directory
name, it may append a dot '.' character to the directory name.
---
 NEWS                                    |   3 +
 doc/metalink-standard.txt               |  36 +++-
 doc/wget.texi                           |   9 +
 src/init.c                              |   5 +
 src/main.c                              |   3 +
 src/metalink.c                          | 348 ++++++++++++++++++++++++++++++++
 src/metalink.h                          |   4 +
 src/options.h                           |   1 +
 testenv/Makefile.am                     |   2 +
 testenv/Test-metalink-http-xml-trust.py | 272 +++++++++++++++++++++++++
 testenv/Test-metalink-http-xml.py       | 272 +++++++++++++++++++++++++
 11 files changed, 947 insertions(+), 8 deletions(-)
 create mode 100755 testenv/Test-metalink-http-xml-trust.py
 create mode 100755 testenv/Test-metalink-http-xml.py

diff --git a/NEWS b/NEWS
index 2153d9a..ca7eaba 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Please send GNU Wget bug reports to <address@hidden>.
 
 * Changes in Wget X.Y.Z
 
+* When processing a Metalink header, --metalink-index=<number> allows
+  to process the header's application/metalink4+xml files.
+
 * When processing a Metalink file, --trust-server-names enables the
   use of the destination file names specified in the Metalink file,
   otherwise a safe destination file name is computed.
diff --git a/doc/metalink-standard.txt b/doc/metalink-standard.txt
index 18acaaa..78709fb 100644
--- a/doc/metalink-standard.txt
+++ b/doc/metalink-standard.txt
@@ -65,10 +65,16 @@ ignored, see '1. Security features'.
 ====================================
 
 When --trust-server-names is off, the basename of the --input-metalink
-file, if available, or of the mother URL is trusted.
+file, if available, or of the mother URL is trusted. This trusted name
+is the radix of any subsequent file name.
+
+When a Metalink/HTTP in encountered, any fetched Metalink/XML file has
+its own ordinal number appended as suffix to the trusted name. In this
+case scenario, an unique Metalink/XML file is saved each time applying
+an additional suffix to the currently computed name when necessary.
 
 The files described by a Metalink/XML file will be named sequentially
-applying a suffix to the trusted name.
+applying an additional suffix to the currently trusted/computed name.
 
 3.1.1.2 With --trust-server-names
 =================================
@@ -91,6 +97,8 @@ found unsafe too, the file is not downloaded.
 4.1 Example files
 =================
 
+See [1 #section-1.1].
+
 cat > bugus.meta4 << EOF
 <?xml version="1.0" encoding="UTF-8"?>
 <metalink xmlns="urn:ietf:params:xml:ns:metalink">
@@ -134,8 +142,9 @@ be informed to the caller of libmetalink's 
metalink_parse_file().
 Fetched metalink:file elements shall be wrote using the unique "name"
 field as file name [1 #section-4.1.2.1].
 
-A metalink:file url's file name shall not substitute the "name" field,
-see '3. Download file name'.
+A metalink:file url's file name shall not substitute the "name" field.
+
+Security exceptions are explained in '3. Download file name'.
 
 4.5 Multi-Source download
 =========================
@@ -160,9 +169,19 @@ $ wget --metalink-over-http http://127.0.0.1/dir/file.ext
 5.3 Metalink/HTTP header answer
 ===============================
 
-Link: http://ftpmirror.gnu.org/bash/bash-4.3-patches/bash43-001; 
rel=duplicate; pref; pri=2
-Link: http://another.url/common_name; rel=duplicate; pref; pri=1
-Digest: SHA-256=7LPf8mSGZ1E+MVVLOtBUzNifzjjjM2fJRZrDooUVN0I=
+See [2 #section-1.1].
+
+Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
+Link: <http://www2.example.com/example.ext>; rel=duplicate
+Link: <ftp://ftp.example.com/example.ext>; rel=duplicate
+Link: <http://example.com/example.ext.torrent>; rel=describedby;
+type="application/x-bittorrent"
+Link: <http://example.com/example.ext.meta4>; rel=describedby;
+type="application/metalink4+xml"
+Link: <http://example.com/example.ext.asc>; rel=describedby;
+type="application/pgp-signature"
+Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
+DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
 
 5.4 Saving files
 ================
@@ -173,7 +192,8 @@ purpose of the "Directory Options" is as usual, and the 
file name is
 the cli's url file name, see wget(1).
 
 The url followed to download the file shall not substitute the cli's
-url to compute the file name to wrote, see '3. Download file name'.
+url to compute the file name to wrote, except when it redirects to a
+Metalink/XML file, following the rules in '3. Download file name'.
 
 5.5 Multi-Source download
 =========================
diff --git a/doc/wget.texi b/doc/wget.texi
index c19e9db..8cf3230 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -524,6 +524,15 @@ Issues HTTP HEAD request instead of GET and extracts 
Metalink metadata
 from response headers. Then it switches to Metalink download.
 If no valid Metalink metadata is found, it falls back to ordinary HTTP 
download.
 
address@hidden metalink-index
address@hidden address@hidden
+Set the Metalink @samp{application/metalink4+xml} metaurl ordinal
+NUMBER. From 1 to the total number of ``application/metalink4+xml''
+available.  Specify 0 or @samp{inf} to choose the first good one.
+Metaurls, such as those from a @samp{--metalink-over-http}, may have
+been sorted by priority key's value; keep this in mind to choose the
+right NUMBER.
+
 @cindex preferred-location
 @item --preferred-location
 Set preferred location for Metalink resources. This has effect if multiple
diff --git a/src/init.c b/src/init.c
index 6729c5a..26f3886 100644
--- a/src/init.c
+++ b/src/init.c
@@ -248,6 +248,7 @@ static const struct {
   { "login",            &opt.ftp_user,          cmd_string },/* deprecated*/
   { "maxredirect",      &opt.max_redirect,      cmd_number },
 #ifdef HAVE_METALINK
+  { "metalinkindex",    &opt.metalink_index,     cmd_number_inf },
   { "metalinkoverhttp", &opt.metalink_over_http, cmd_boolean },
 #endif
   { "method",           &opt.method,            cmd_string_uppercase },
@@ -385,6 +386,10 @@ defaults (void)
      bit pattern will be the least of the implementors' worries.  */
   xzero (opt);
 
+#ifdef HAVE_METALINK
+  opt.metalink_index = -1;
+#endif
+
   opt.cookies = true;
   opt.verbose = -1;
   opt.ntry = 20;
diff --git a/src/main.c b/src/main.c
index ac6ee2c..d48e3b2 100644
--- a/src/main.c
+++ b/src/main.c
@@ -354,6 +354,7 @@ static struct cmdline_option option_data[] =
     { "rejected-log", 0, OPT_VALUE, "rejectedlog", -1 },
     { "max-redirect", 0, OPT_VALUE, "maxredirect", -1 },
 #ifdef HAVE_METALINK
+    { "metalink-index", 0, OPT_VALUE, "metalinkindex", -1 },
     { "metalink-over-http", 0, OPT_BOOLEAN, "metalinkoverhttp", -1 },
 #endif
     { "method", 0, OPT_VALUE, "method", -1 },
@@ -714,6 +715,8 @@ Download:\n"),
     N_("\
        --keep-badhash              keep files with checksum mismatch (append 
.badhash)\n"),
     N_("\
+       --metalink-index=NUMBER     Metalink application/metalink4+xml metaurl 
ordinal NUMBER\n"),
+    N_("\
        --metalink-over-http        use Metalink metadata from HTTP response 
headers\n"),
     N_("\
        --preferred-location        preferred location for Metalink 
resources\n"),
diff --git a/src/metalink.c b/src/metalink.c
index 29cfee7..16e247d 100644
--- a/src/metalink.c
+++ b/src/metalink.c
@@ -189,6 +189,184 @@ retrieve_from_metalink (const metalink_t* metalink)
             }
         }
 
+      /* Process the chosen application/metalink4+xml metaurl.  */
+      if (opt.metalink_index >= 0)
+        {
+          int _metalink_index = opt.metalink_index;
+
+          metalink_metaurl_t **murl_ptr;
+          int abs_count = 0, meta_count = 0;
+
+          uerr_t x_retr_err = METALINK_MISSING_RESOURCE;
+
+          opt.metalink_index = -1;
+
+          DEBUGP (("Searching application/metalink4+xml ordinal number 
%d...\n", _metalink_index));
+
+          if (mfile->metaurls && mfile->metaurls[0])
+            for (murl_ptr = mfile->metaurls; *murl_ptr; murl_ptr++)
+              {
+                metalink_t* metaurl_xml;
+                metalink_error_t meta_err;
+                metalink_metaurl_t *murl = *murl_ptr;
+
+                char *_dir_prefix = opt.dir_prefix;
+                char *_input_metalink = opt.input_metalink;
+
+                char *metafile = NULL;
+                char *metadest = NULL;
+                char *metadir = NULL;
+
+                abs_count++;
+
+                if (strcmp (murl->mediatype, "application/metalink4+xml"))
+                  continue;
+
+                meta_count++;
+
+                DEBUGP (("  Ordinal number %d: %s\n", meta_count, quote 
(murl->url)));
+
+                if (_metalink_index > 0)
+                  {
+                    if (meta_count < _metalink_index)
+                      continue;
+                    else if (meta_count > _metalink_index)
+                      break;
+                  }
+
+                logprintf (LOG_NOTQUIET,
+                           _("Processing metaurl %s...\n"), quote (murl->url));
+
+                /* Metalink/XML download file name.  */
+                metafile = xstrdup (safename);
+
+                if (opt.trustservernames)
+                  replace_metalink_basename (&metafile, murl->url);
+                else
+                  append_suffix_number (&metafile, ".meta#", meta_count);
+
+                if (!metalink_check_safe_path (metafile))
+                  {
+                    logprintf (LOG_NOTQUIET,
+                               _("Rejecting metaurl file %s. Unsafe name.\n"),
+                               quote (metafile));
+                    xfree (metafile);
+                    if (_metalink_index > 0)
+                      break;
+                    continue;
+                  }
+
+                /* For security reasons, always save metalink metaurl
+                   files as new unique files. Keep them on failure.  */
+                x_retr_err = fetch_metalink_file (murl->url, false, false,
+                                                  metafile, &metadest);
+
+                /* On failure, try the next metalink metaurl.  */
+                if (x_retr_err != RETROK)
+                  {
+                    logprintf (LOG_VERBOSE,
+                               _("Failed to download %s. Skipping metaurl.\n"),
+                               quote (metadest ? metadest : metafile));
+                    inform_exit_status (x_retr_err);
+                    xfree (metadest);
+                    xfree (metafile);
+                    if (_metalink_index > 0)
+                      break;
+                    continue;
+                  }
+
+                /* Parse Metalink/XML.  */
+                meta_err = metalink_parse_file (metadest, &metaurl_xml);
+
+                /* On failure, try the next metalink metaurl.  */
+                if (meta_err)
+                  {
+                    logprintf (LOG_NOTQUIET,
+                               _("Unable to parse metaurl file %s.\n"), quote 
(metadest));
+                    x_retr_err = METALINK_PARSE_ERROR;
+                    inform_exit_status (x_retr_err);
+                    xfree (metadest);
+                    xfree (metafile);
+                    if (_metalink_index > 0)
+                      break;
+                    continue;
+                  }
+
+                /* We need to sort the resources if preferred location
+                   was specified by the user.  */
+                if (opt.preferred_location && opt.preferred_location[0])
+                  {
+                    metalink_file_t **x_mfile_ptr;
+                    for (x_mfile_ptr = metaurl_xml->files; *x_mfile_ptr; 
x_mfile_ptr++)
+                      {
+                        metalink_resource_t **x_mres_ptr;
+                        metalink_file_t *x_mfile = *x_mfile_ptr;
+                        size_t mres_count = 0;
+
+                        for (x_mres_ptr = x_mfile->resources; *x_mres_ptr; 
x_mres_ptr++)
+                          mres_count++;
+
+                        stable_sort (x_mfile->resources,
+                                     mres_count,
+                                     sizeof (metalink_resource_t *),
+                                     metalink_res_cmp);
+                      }
+                  }
+
+                /* Insert the current "Directory Options".  */
+                if (metalink->origin)
+                  {
+                    /* WARNING: Do not use lib/dirname.c (dir_name) to
+                       get the directory name, it may append a dot '.'
+                       character to the directory name. */
+                    metadir = xstrdup (planname);
+                    replace_metalink_basename (&metadir, NULL);
+                  }
+                else
+                  {
+                    metadir = xstrdup (opt.dir_prefix);
+                  }
+
+                opt.dir_prefix = metadir;
+                opt.input_metalink = metadest;
+
+                x_retr_err = retrieve_from_metalink (metaurl_xml);
+
+                if (x_retr_err != RETROK)
+                  logprintf (LOG_NOTQUIET,
+                             _("Could not download all resources from %s.\n"),
+                             quote (metadest));
+
+                metalink_delete (metaurl_xml);
+                metaurl_xml = NULL;
+
+                opt.input_metalink = _input_metalink;
+                opt.dir_prefix = _dir_prefix;
+
+                xfree (metadir);
+                xfree (metadest);
+                xfree (metafile);
+
+                break;
+              }
+
+          if (x_retr_err != RETROK)
+            logprintf (LOG_NOTQUIET, _("Metaurls processing returned with 
error.\n"));
+
+          xfree (destname);
+          xfree (filename);
+          xfree (trsrname);
+          xfree (planname);
+
+          opt.output_document = _output_document;
+          output_stream_regular = _output_stream_regular;
+          output_stream = _output_stream;
+
+          opt.metalink_index = _metalink_index;
+
+          return x_retr_err;
+        }
+
       /* Resources are sorted by priority.  */
       for (mres_ptr = mfile->resources; *mres_ptr && !skip_mfile; mres_ptr++)
         {
@@ -740,6 +918,72 @@ gpg_skip_verification:
 }
 
 /*
+  Replace/remove the basename of a file name.
+
+  The file name is permanently modified.
+
+  Always set NAME to a string, even an empty one.
+
+  Use REF's basename as replacement.  If REF is NULL or if it doesn't
+  provide a valid basename candidate, then remove NAME's basename.
+*/
+void
+replace_metalink_basename (char **name, char *ref)
+{
+  int n;
+  char *p, *new, *basename;
+
+  if (!name)
+    return;
+
+  /* Strip old basename.  */
+  if (*name)
+    {
+      basename = last_component (*name);
+
+      if (basename == *name)
+        xfree (*name);
+      else
+        *basename = '\0';
+    }
+
+  /* New basename from file name reference.  */
+  if (ref)
+    ref = last_component (ref);
+
+  /* Replace the old basename.  */
+  new = aprintf ("%s%s", *name ? *name : "", ref ? ref : "");
+  xfree (*name);
+  *name = new;
+
+  /* Remove prefix drive letters if required, i.e. when in w32
+     environments.  */
+  p = new;
+  while (p[0] != '\0')
+    {
+      while ((n = FILE_SYSTEM_PREFIX_LEN (p)))
+        p += n;
+
+      if (p != new)
+        {
+          while (ISSLASH (p[0]))
+            ++p;
+          new = p;
+          continue;
+        }
+
+      break;
+    }
+
+  if (*name != new)
+    {
+      new = xstrdup (new);
+      xfree (*name);
+      *name = new;
+    }
+}
+
+/*
   Strip the directory components from the given name.
 
   Return a pointer to the end of the leading directory components.
@@ -891,6 +1135,110 @@ badhash_or_remove (char *name)
     }
 }
 
+/*
+  Simple file fetch.
+
+  Set DESTNAME to the name of the saved file.
+
+  Resume previous download if RESUME is true.  To disable
+  Metalink/HTTP, set METALINK_HTTP to false.
+*/
+uerr_t
+fetch_metalink_file (const char *url_str,
+                     bool resume, bool metalink_http,
+                     const char *filename, char **destname)
+{
+  FILE *_output_stream = output_stream;
+  bool _output_stream_regular = output_stream_regular;
+  char *_output_document = opt.output_document;
+  bool _metalink_http = opt.metalink_over_http;
+
+  char *local_file = NULL;
+
+  uerr_t retr_err = URLERROR;
+
+  struct iri *iri;
+  struct url *url;
+  int url_err;
+
+  /* Parse the URL.  */
+  iri = iri_new ();
+  set_uri_encoding (iri, opt.locale, true);
+  url = url_parse (url_str, &url_err, iri, false);
+
+  if (!url)
+    {
+      char *error = url_error (url_str, url_err);
+      logprintf (LOG_NOTQUIET, "%s: %s.\n", url_str, error);
+      inform_exit_status (retr_err);
+      iri_free (iri);
+      xfree (error);
+      return retr_err;
+    }
+
+  output_stream = NULL;
+
+  if (resume)
+    /* continue previous download */
+    output_stream = fopen (filename, "ab");
+  else
+    /* create a file with an unique name */
+    output_stream = unique_create (filename, true, &local_file);
+
+  output_stream_regular = true;
+
+  /*
+    If output_stream is NULL, the file couldn't be created/opened.
+    This could be due to the impossibility to create a directory tree:
+    * stdio.h (fopen)
+    * src/utils.c (unique_create)
+
+    A call to retrieve_url() can indirectly create a directory tree,
+    when opt.output_document is set to the destination file name and
+    output_stream is left to NULL:
+    * src/http.c (open_output_stream): If output_stream is NULL,
+      create the destination opt.output_document "path/file"
+  */
+  if (!local_file)
+    local_file = xstrdup (filename);
+
+  /* Store the real file name for displaying in messages, and for
+     proper "path/file" handling.  */
+  opt.output_document = local_file;
+
+  opt.metalink_over_http = metalink_http;
+
+  DEBUGP (("Storing to %s\n", local_file));
+  retr_err = retrieve_url (url, url_str, NULL, NULL,
+                           NULL, NULL, opt.recursive, iri, false);
+
+  if (retr_err == RETROK)
+    {
+      if (destname)
+        *destname = local_file;
+      else
+        xfree (local_file);
+    }
+
+  if (output_stream)
+    {
+      fclose (output_stream);
+      output_stream = NULL;
+    }
+
+  opt.metalink_over_http = _metalink_http;
+  opt.output_document = _output_document;
+  output_stream_regular = _output_stream_regular;
+  output_stream = _output_stream;
+
+  inform_exit_status (retr_err);
+
+  iri_free (iri);
+  url_free (url);
+
+  return retr_err;
+}
+
 int metalink_res_cmp (const void* v1, const void* v2)
 {
   const metalink_resource_t *res1 = *(metalink_resource_t **) v1,
diff --git a/src/metalink.h b/src/metalink.h
index ed8c62f..2a8481d 100644
--- a/src/metalink.h
+++ b/src/metalink.h
@@ -51,12 +51,16 @@ int metalink_meta_cmp (const void* meta1, const void* 
meta2);
 int metalink_check_safe_path (const char *path);
 
 char *last_component (char const *name);
+void replace_metalink_basename (char **name, char *ref);
 char *get_metalink_basename (char *name);
 void append_suffix_number (char **str, const char *sep, wgint num);
 void clean_metalink_string (char **str);
 void dequote_metalink_string (char **str);
 void badhash_suffix (char *name);
 void badhash_or_remove (char *name);
+uerr_t fetch_metalink_file (const char *url_str,
+                            bool resume, bool metalink_http,
+                            const char *filename, char **destname);
 
 bool find_key_value (const char *start,
                      const char *end,
diff --git a/src/options.h b/src/options.h
index 2724388..9fcefc3 100644
--- a/src/options.h
+++ b/src/options.h
@@ -67,6 +67,7 @@ struct options
   char *input_filename;         /* Input filename */
 #ifdef HAVE_METALINK
   char *input_metalink;         /* Input metalink file */
+  int metalink_index;           /* Metalink application/metalink4+xml metaurl 
ordinal number. */
   bool metalink_over_http;      /* Use Metalink if present in HTTP response */
   char *preferred_location;     /* Preferred location for Metalink resources */
 #endif
diff --git a/testenv/Makefile.am b/testenv/Makefile.am
index b6bad8d..daba609 100644
--- a/testenv/Makefile.am
+++ b/testenv/Makefile.am
@@ -29,6 +29,8 @@
 if METALINK_IS_ENABLED
   METALINK_TESTS = Test-metalink-http.py            \
     Test-metalink-http-quoted.py                    \
+    Test-metalink-http-xml.py                       \
+    Test-metalink-http-xml-trust.py                 \
     Test-metalink-xml.py                            \
     Test-metalink-xml-continue.py                   \
     Test-metalink-xml-relpath.py                    \
diff --git a/testenv/Test-metalink-http-xml-trust.py 
b/testenv/Test-metalink-http-xml-trust.py
new file mode 100755
index 0000000..1df6ce4
--- /dev/null
+++ b/testenv/Test-metalink-http-xml-trust.py
@@ -0,0 +1,272 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from misc.wget_file import WgetFile
+import hashlib
+from base64 import b64encode
+
+"""
+    This is to test Metalink/HTTP with Metalink/XML Link headers.
+
+    With --trust-server-names, trust the metalink:file names.
+
+    Without --trust-server-names, don't trust the metalink:file names:
+    use the basename of --input-metalink, and add a sequential number
+    (e.g. .#1, .#2, etc.).
+
+    Strip the directory from unsafe paths.
+"""
+
+############# File Definitions ###############################################
+bad = "Ouch!"
+bad_sha256 = hashlib.sha256 (bad.encode ('UTF-8')).hexdigest ()
+
+File1 = "Would you like some Tea?"
+File1_lowPref = "Do not take this"
+File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
+
+File2 = "This is gonna be good"
+File2_lowPref = "Not this one too"
+File2_sha256 = hashlib.sha256 (File2.encode ('UTF-8')).hexdigest ()
+
+File3 = "A little more, please"
+File3_lowPref = "That's just too much"
+File3_sha256 = hashlib.sha256 (File3.encode ('UTF-8')).hexdigest ()
+
+File4 = "Maybe a biscuit?"
+File4_lowPref = "No, thanks"
+File4_sha256 = hashlib.sha256 (File4.encode ('UTF-8')).hexdigest ()
+
+File5 = "More Tea...?"
+File5_lowPref = "I have to go..."
+File5_sha256 = hashlib.sha256 (File5.encode ('UTF-8')).hexdigest ()
+
+MetaXml1 = \
+"""<?xml version="1.0" encoding="utf-8"?>
+<metalink version="3.0" xmlns="http://www.metalinker.org/";>
+  <publisher>
+    <name>GNU Wget</name>
+  </publisher>
+  <license>
+    <name>GNU GPL</name>
+    <url>http://www.gnu.org/licenses/gpl.html</url>
+  </license>
+  <identity>Wget Test Files</identity>
+  <version>1.2.3</version>
+  <description>Wget Test Files description</description>
+  <files>
+    <file name="dir/File1">
+      <verification>
+        <hash type="sha256">{{FILE1_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File1_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File1</url>
+      </resources>
+    </file>
+    <file name="dir/File2">
+      <verification>
+        <hash type="sha256">{{FILE2_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File2_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File2</url>
+      </resources>
+    </file>
+    <file name="/dir/File3"> <!-- rejected by libmetalink -->
+      <verification>
+        <hash type="sha256">{{FILE3_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File3_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File3</url>
+      </resources>
+    </file>
+    <file name="dir/File4">
+      <verification>
+        <hash type="sha256">{{FILE4_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File4_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File4</url>
+      </resources>
+    </file>
+    <file name="dir/File5">
+      <verification>
+        <hash type="sha256">{{FILE5_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File5_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File5</url>
+      </resources>
+    </file>
+  </files>
+</metalink>
+"""
+
+MetaXml2 = \
+"""<?xml version="1.0" encoding="utf-8"?>
+<metalink version="3.0" xmlns="http://www.metalinker.org/";>
+  <publisher>
+    <name>GNU Wget</name>
+  </publisher>
+  <license>
+    <name>GNU GPL</name>
+    <url>http://www.gnu.org/licenses/gpl.html</url>
+  </license>
+  <identity>Wget Test Files</identity>
+  <version>1.2.3</version>
+  <description>Wget Test Files description</description>
+  <files>
+    <file name="bad">
+      <verification>
+        <hash type="sha256">{{BAD_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/bad</url>
+      </resources>
+    </file>
+  </files>
+</metalink>
+"""
+
+LinkHeaders = [
+    # This file has the lowest priority, and should go last
+    "<http://{{SRV_HOST}}:{{SRV_PORT}}/test1.metalink>; rel=describedby; 
pri=2; type=\"application/metalink4+xml\"",
+    # This file has the highest priority, and should go first
+    "<http://{{SRV_HOST}}:{{SRV_PORT}}/test2.metalink>; rel=describedby; 
pri=1; type=\"application/metalink4+xml\""
+]
+
+# This will be filled as soon as we know server hostname and port
+MetaHTTPRules = {'SendHeader' : {}}
+
+MetaHTTP = WgetFile ("main.metalink", rules=MetaHTTPRules)
+
+wrong_file = WgetFile ("wrong_file", bad)
+
+File1_orig = WgetFile ("File1", File1)
+File1_down = WgetFile ("dir/File1", File1)
+File1_nono = WgetFile ("File1_lowPref", File1_lowPref)
+
+File2_orig = WgetFile ("File2", File2)
+File2_down = WgetFile ("dir/File2", File2)
+File2_nono = WgetFile ("File2_lowPref", File2_lowPref)
+
+# rejected by libmetalink
+File3_orig = WgetFile ("File3", File3)
+File3_nono = WgetFile ("File3_lowPref", File3_lowPref)
+
+File4_orig = WgetFile ("File4", File4)
+File4_down = WgetFile ("dir/File4", File4)
+File4_nono = WgetFile ("File4_lowPref", File4_lowPref)
+
+File5_orig = WgetFile ("File5", File5)
+File5_down = WgetFile ("dir/File5", File5)
+File5_nono = WgetFile ("File5_lowPref", File5_lowPref)
+
+MetaFile1 = WgetFile ("test1.metalink", MetaXml1)
+MetaFile1_down = WgetFile ("test1.metalink", MetaXml1)
+
+MetaFile2 = WgetFile ("test2.metalink", MetaXml2)
+
+WGET_OPTIONS = "--trust-server-names --metalink-over-http --metalink-index=2"
+WGET_URLS = [["main.metalink"]]
+
+RequestList = [[
+    "HEAD /main.metalink",
+    "GET /404",
+    "GET /wrong_file",
+    "GET /test1.metalink",
+    "GET /File1",
+    "GET /File2",
+    "GET /File4",
+    "GET /File5"
+]]
+
+Files = [[
+    MetaHTTP,
+    wrong_file,
+    MetaFile1, MetaFile2,
+    File1_orig, File1_nono,
+    File2_orig, File2_nono,
+    File3_orig, File3_nono,
+    File4_orig, File4_nono,
+    File5_orig, File5_nono
+]]
+Existing_Files = []
+
+ExpectedReturnCode = 0
+ExpectedDownloadedFiles = [
+    MetaFile1_down,
+    File1_down,
+    File2_down,
+    File4_down,
+    File5_down
+]
+
+################ Pre and Post Test Hooks #####################################
+pre_test = {
+    "ServerFiles"       : Files,
+    "LocalFiles"        : Existing_Files
+}
+test_options = {
+    "WgetCommands"      : WGET_OPTIONS,
+    "Urls"              : WGET_URLS
+}
+post_test = {
+    "ExpectedFiles"     : ExpectedDownloadedFiles,
+    "ExpectedRetcode"   : ExpectedReturnCode,
+    "FilesCrawled"      : RequestList
+}
+
+http_test = HTTPTest (
+                pre_hook=pre_test,
+                test_params=test_options,
+                post_hook=post_test
+)
+
+http_test.server_setup()
+### Get and use dynamic server sockname
+srv_host, srv_port = http_test.servers[0].server_inst.socket.getsockname ()
+
+MetaXml1 = MetaXml1.replace('{{FILE1_HASH}}', File1_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE2_HASH}}', File2_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE3_HASH}}', File3_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE4_HASH}}', File4_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE5_HASH}}', File5_sha256)
+MetaXml1 = MetaXml1.replace('{{SRV_HOST}}', srv_host)
+MetaXml1 = MetaXml1.replace('{{SRV_PORT}}', str (srv_port))
+MetaFile1.content = MetaXml1
+MetaFile1_down.content = MetaXml1
+
+MetaXml2 = MetaXml2.replace('{{BAD_HASH}}', bad_sha256)
+MetaXml2 = MetaXml2.replace('{{SRV_HOST}}', srv_host)
+MetaXml2 = MetaXml2.replace('{{SRV_PORT}}', str (srv_port))
+MetaFile2.content = MetaXml2
+
+# Helper function for hostname, port and digest substitution
+def SubstituteServerInfo (text, host, port):
+    text = text.replace('{{SRV_HOST}}', host)
+    text = text.replace('{{SRV_PORT}}', str (port))
+    return text
+
+MetaHTTPRules["SendHeader"] = {
+        'Link': [ SubstituteServerInfo (LinkHeader, srv_host, srv_port)
+                    for LinkHeader in LinkHeaders ]
+}
+
+err = http_test.begin ()
+
+exit (err)
diff --git a/testenv/Test-metalink-http-xml.py 
b/testenv/Test-metalink-http-xml.py
new file mode 100755
index 0000000..9cc49e9
--- /dev/null
+++ b/testenv/Test-metalink-http-xml.py
@@ -0,0 +1,272 @@
+#!/usr/bin/env python3
+from sys import exit
+from test.http_test import HTTPTest
+from misc.wget_file import WgetFile
+import hashlib
+from base64 import b64encode
+
+"""
+    This is to test Metalink/HTTP with Metalink/XML Link headers.
+
+    With --trust-server-names, trust the metalink:file names.
+
+    Without --trust-server-names, don't trust the metalink:file names:
+    use the basename of --input-metalink, and add a sequential number
+    (e.g. .#1, .#2, etc.).
+
+    Strip the directory from unsafe paths.
+"""
+
+############# File Definitions ###############################################
+bad = "Ouch!"
+bad_sha256 = hashlib.sha256 (bad.encode ('UTF-8')).hexdigest ()
+
+File1 = "Would you like some Tea?"
+File1_lowPref = "Do not take this"
+File1_sha256 = hashlib.sha256 (File1.encode ('UTF-8')).hexdigest ()
+
+File2 = "This is gonna be good"
+File2_lowPref = "Not this one too"
+File2_sha256 = hashlib.sha256 (File2.encode ('UTF-8')).hexdigest ()
+
+File3 = "A little more, please"
+File3_lowPref = "That's just too much"
+File3_sha256 = hashlib.sha256 (File3.encode ('UTF-8')).hexdigest ()
+
+File4 = "Maybe a biscuit?"
+File4_lowPref = "No, thanks"
+File4_sha256 = hashlib.sha256 (File4.encode ('UTF-8')).hexdigest ()
+
+File5 = "More Tea...?"
+File5_lowPref = "I have to go..."
+File5_sha256 = hashlib.sha256 (File5.encode ('UTF-8')).hexdigest ()
+
+MetaXml1 = \
+"""<?xml version="1.0" encoding="utf-8"?>
+<metalink version="3.0" xmlns="http://www.metalinker.org/";>
+  <publisher>
+    <name>GNU Wget</name>
+  </publisher>
+  <license>
+    <name>GNU GPL</name>
+    <url>http://www.gnu.org/licenses/gpl.html</url>
+  </license>
+  <identity>Wget Test Files</identity>
+  <version>1.2.3</version>
+  <description>Wget Test Files description</description>
+  <files>
+    <file name="dir/File1">
+      <verification>
+        <hash type="sha256">{{FILE1_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File1_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File1</url>
+      </resources>
+    </file>
+    <file name="dir/File2">
+      <verification>
+        <hash type="sha256">{{FILE2_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File2_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File2</url>
+      </resources>
+    </file>
+    <file name="/dir/File3"> <!-- rejected by libmetalink -->
+      <verification>
+        <hash type="sha256">{{FILE3_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File3_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File3</url>
+      </resources>
+    </file>
+    <file name="dir/File4">
+      <verification>
+        <hash type="sha256">{{FILE4_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File4_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File4</url>
+      </resources>
+    </file>
+    <file name="dir/File5">
+      <verification>
+        <hash type="sha256">{{FILE5_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="25">http://{{SRV_HOST}}:{{SRV_PORT}}/File5_lowPref</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/File5</url>
+      </resources>
+    </file>
+  </files>
+</metalink>
+"""
+
+MetaXml2 = \
+"""<?xml version="1.0" encoding="utf-8"?>
+<metalink version="3.0" xmlns="http://www.metalinker.org/";>
+  <publisher>
+    <name>GNU Wget</name>
+  </publisher>
+  <license>
+    <name>GNU GPL</name>
+    <url>http://www.gnu.org/licenses/gpl.html</url>
+  </license>
+  <identity>Wget Test Files</identity>
+  <version>1.2.3</version>
+  <description>Wget Test Files description</description>
+  <files>
+    <file name="bad">
+      <verification>
+        <hash type="sha256">{{BAD_HASH}}</hash>
+      </verification>
+      <resources>
+        <url type="http" 
preference="35">http://{{SRV_HOST}}:{{SRV_PORT}}/wrong_file</url>
+        <url type="http" 
preference="40">http://{{SRV_HOST}}:{{SRV_PORT}}/404</url>
+        <url type="http" 
preference="30">http://{{SRV_HOST}}:{{SRV_PORT}}/bad</url>
+      </resources>
+    </file>
+  </files>
+</metalink>
+"""
+
+LinkHeaders = [
+    # This file has the lowest priority, and should go last
+    "<http://{{SRV_HOST}}:{{SRV_PORT}}/test1.metalink>; rel=describedby; 
pri=2; type=\"application/metalink4+xml\"",
+    # This file has the highest priority, and should go first
+    "<http://{{SRV_HOST}}:{{SRV_PORT}}/test2.metalink>; rel=describedby; 
pri=1; type=\"application/metalink4+xml\""
+]
+
+# This will be filled as soon as we know server hostname and port
+MetaHTTPRules = {'SendHeader' : {}}
+
+MetaHTTP = WgetFile ("main.metalink", rules=MetaHTTPRules)
+
+wrong_file = WgetFile ("wrong_file", bad)
+
+File1_orig = WgetFile ("File1", File1)
+File1_down = WgetFile ("main.metalink.meta#2.#1", File1)
+File1_nono = WgetFile ("File1_lowPref", File1_lowPref)
+
+File2_orig = WgetFile ("File2", File2)
+File2_down = WgetFile ("main.metalink.meta#2.#2", File2)
+File2_nono = WgetFile ("File2_lowPref", File2_lowPref)
+
+# rejected by libmetalink
+File3_orig = WgetFile ("File3", File3)
+File3_nono = WgetFile ("File3_lowPref", File3_lowPref)
+
+File4_orig = WgetFile ("File4", File4)
+File4_down = WgetFile ("main.metalink.meta#2.#3", File4)
+File4_nono = WgetFile ("File4_lowPref", File4_lowPref)
+
+File5_orig = WgetFile ("File5", File5)
+File5_down = WgetFile ("main.metalink.meta#2.#4", File5)
+File5_nono = WgetFile ("File5_lowPref", File5_lowPref)
+
+MetaFile1 = WgetFile ("test1.metalink", MetaXml1)
+MetaFile1_down = WgetFile ("main.metalink.meta#2", MetaXml1)
+
+MetaFile2 = WgetFile ("test2.metalink", MetaXml2)
+
+WGET_OPTIONS = "--metalink-over-http --metalink-index=2"
+WGET_URLS = [["main.metalink"]]
+
+RequestList = [[
+    "HEAD /main.metalink",
+    "GET /404",
+    "GET /wrong_file",
+    "GET /test1.metalink",
+    "GET /File1",
+    "GET /File2",
+    "GET /File4",
+    "GET /File5"
+]]
+
+Files = [[
+    MetaHTTP,
+    wrong_file,
+    MetaFile1, MetaFile2,
+    File1_orig, File1_nono,
+    File2_orig, File2_nono,
+    File3_orig, File3_nono,
+    File4_orig, File4_nono,
+    File5_orig, File5_nono
+]]
+Existing_Files = []
+
+ExpectedReturnCode = 0
+ExpectedDownloadedFiles = [
+    MetaFile1_down,
+    File1_down,
+    File2_down,
+    File4_down,
+    File5_down
+]
+
+################ Pre and Post Test Hooks #####################################
+pre_test = {
+    "ServerFiles"       : Files,
+    "LocalFiles"        : Existing_Files
+}
+test_options = {
+    "WgetCommands"      : WGET_OPTIONS,
+    "Urls"              : WGET_URLS
+}
+post_test = {
+    "ExpectedFiles"     : ExpectedDownloadedFiles,
+    "ExpectedRetcode"   : ExpectedReturnCode,
+    "FilesCrawled"      : RequestList
+}
+
+http_test = HTTPTest (
+                pre_hook=pre_test,
+                test_params=test_options,
+                post_hook=post_test
+)
+
+http_test.server_setup()
+### Get and use dynamic server sockname
+srv_host, srv_port = http_test.servers[0].server_inst.socket.getsockname ()
+
+MetaXml1 = MetaXml1.replace('{{FILE1_HASH}}', File1_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE2_HASH}}', File2_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE3_HASH}}', File3_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE4_HASH}}', File4_sha256)
+MetaXml1 = MetaXml1.replace('{{FILE5_HASH}}', File5_sha256)
+MetaXml1 = MetaXml1.replace('{{SRV_HOST}}', srv_host)
+MetaXml1 = MetaXml1.replace('{{SRV_PORT}}', str (srv_port))
+MetaFile1.content = MetaXml1
+MetaFile1_down.content = MetaXml1
+
+MetaXml2 = MetaXml2.replace('{{BAD_HASH}}', bad_sha256)
+MetaXml2 = MetaXml2.replace('{{SRV_HOST}}', srv_host)
+MetaXml2 = MetaXml2.replace('{{SRV_PORT}}', str (srv_port))
+MetaFile2.content = MetaXml2
+
+# Helper function for hostname, port and digest substitution
+def SubstituteServerInfo (text, host, port):
+    text = text.replace('{{SRV_HOST}}', host)
+    text = text.replace('{{SRV_PORT}}', str (port))
+    return text
+
+MetaHTTPRules["SendHeader"] = {
+        'Link': [ SubstituteServerInfo (LinkHeader, srv_host, srv_port)
+                    for LinkHeader in LinkHeaders ]
+}
+
+err = http_test.begin ()
+
+exit (err)
-- 
2.7.3




reply via email to

[Prev in Thread] Current Thread [Next in Thread]