
From: Joshua Branson
Subject: [PATCH] hurd/translator/httpfs.mdwn: I added a Intro, how to use, and TODO section. hurd/translator/xmlfs.mdwn: I added a How to use and TODO wishlist section.
Date: Thu, 10 Sep 2020 09:50:28 -0400

I copied most of the text from the Hurd extras repos.
 hurd/translator/httpfs.mdwn | 73 ++++++++++++++++++++++++++++++++++++
 hurd/translator/xmlfs.mdwn  | 74 +++++++++++++++++++++++++++++++++++++
 2 files changed, 147 insertions(+)

diff --git a/hurd/translator/httpfs.mdwn b/hurd/translator/httpfs.mdwn
index 8b02aa06..0fc6fbbd 100644
--- a/hurd/translator/httpfs.mdwn
+++ b/hurd/translator/httpfs.mdwn
@@ -12,6 +12,79 @@ License|/fdl]]."]]"""]]
 While the httpfs translator works, it is only suitable for very simple use 
 cases: it just provides the actual file contents downloaded from the URL, but 
 no additional status information that are necessary for interactive use. 
 (Progress indication, error codes, HTTP redirects etc.)
+# Intro
+Here we describe the structure of the /http filesystem for the Hurd.
+The Hurd provides a translator called 'httpfs', which presents the contents
+of a web server as a filesystem.
+The httpfs translator accepts an http:// URL as an argument. The underlying
+node of the translator can be a file or a directory; this is controlled by the
+--mode command line option. The default is a directory.
+
+If it is a file, only filesystem read requests are supported on that node. If
+it is a directory, we can cd into that directory, and ls will list the files
+on the web server. A web server may or may not provide a directory listing;
+in either case, the web server always returns an HTML stream for a user
+request (a GET command). So, to get the files residing on the web server, we
+have to parse the incoming HTML stream to find the anchor tags. These anchor
+tags point to different pages or files on the web server. These file names
+are extracted and filled into the nodes of the translator. An anchor tag can
+also point to an external URL; in such a case we just show that URL as a
+regular file, so that the user can make filesystem read requests on it. When
+a file is a URL, we change the name of the URL by replacing all the '/'s with
+'.'s so that it can be displayed in the filesystem.
+Only the root node is filled in when the translator is set; subdirectories
+inside it are filled on demand, i.e. when a cd or ls occurs on that particular
+directory.
+The file size is currently displayed as 0. One way of getting individual file
+sizes would be to send a GET request for each file and read the file size from
+the Content-Length field of the HTTP response. But this may put a very heavy
+burden on the network, so for now we have not incorporated this method into
+the httpfs translator.
+The translator uses the libxml2 library to parse the HTML stream. libxml2
+provides SAX interfaces for the parser, which are used for finding the
+beginning of anchor tags such as <A href="i.html">. So the translator depends
+on the libxml2 library.
+If the connection to the Internet is through a proxy, then the user must
+give the IP address and port of the proxy server by using the command line
+options --proxy and --port.
+# How to Use httpfs
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html
+(Remember to give the / at the end of the URL, unless you are specifying a
+specific file like www.hurd-project.com/httpfs.html.)
+    # cd tmp/
+    # ls -l
+    # settrans -a tmp/ /hurd/httpfs http://www.gnu.org/software/hurd/index.html \
+        --proxy= --port=3126
+The above command should be used if access to the Internet is through a proxy
+server; substitute your proxy's IP address and port number.
+# TODO
+- https:// support
+- scheme-relative URL support (e.g. "//example.com/")
+- query-string and fragment support
+- HTTP/1.1 support
+- HTTP/2 support
+- HTTP/3 support
+- Teach httpfs to understand HTTP status codes like redirects, 404 Not Found,
+  etc.
+- Teach httpfs to look for "sitemaps".  Many sites offer a sitemap, and this
+  would be a nifty way for httpfs to allow grep-ing the entire site's contents.
 # Source
diff --git a/hurd/translator/xmlfs.mdwn b/hurd/translator/xmlfs.mdwn
index a4de1668..bde5960b 100644
--- a/hurd/translator/xmlfs.mdwn
+++ b/hurd/translator/xmlfs.mdwn
@@ -11,6 +11,80 @@ License|/fdl]]."]]"""]]
 `xmlfs` is a translator that provides access to XML documents through the
+# How to Use xmlfs
+       xmlfs - a translator for accessing XML documents
+This is only an alpha version. It works read-only. It supports text nodes and
+attributes. It doesn't do anything fancy like size computing, though. Here is
+an example of how to use it:
+    $ wget 
+    $ settrans -ca xml /hurd/xmlfs example.xml  #the website says to use 
+    $ cd xml; ls
+      library0 library1
+    $ cd library0; ls -A
+      .text1  .text2  @name  book0  book1  book2  sub-library0  sub-library1
+    $ cat .text2
+CDATA, again !
+    $ cat book0
+      <book>
+      <author>Mark Twain</author>
+      <title>La case de l'oncle Tom</title>
+      <isbn>4242</isbn>
+      </book>
+    $ cat book0/author/.text
+      Mark Twain
+As you can see, text nodes are named .textN, with N an integer starting
+from 0. Sorting is supposed to be stable, so you get the same N every time
+you access the same file. If there is only one text node at this level, N is
+omitted. Attributes are prefixed with @.
+An example file, example.xml, is provided. Of course, it does not contain
+anything useful. xmlfs has been tested on several-megabyte XML documents,
+though.
+Comments are welcome.
+       -- Manuel Menal <mmenal@hurdfr.org>
+# TODO wishlist
+- Handle memory usage in a clever way:
+     - do not dump the nodes at each read; try to guess if read()
+       is called in a sequence of read() operations (e.g. cat reads
+       8192 bytes at a time) and if it is, cache the node
+       contents. That would need a very small ftpfs-like GC.
+     - perhaps we shouldn't store the node information from
+       first access to the end, but have a pool of them. That might
+       come with the next entries, though.
+- Handle changes of the backing store (XML document) while running.
+  (Idea: we should probably attach to the XML node and handle
+  read()/write() operations ourselves, with libxml primitives.)
+- Write support. Making things like echo >, sed and so on work is
+  quite obvious. Editing is not that simple, because we could want
+  to save a document that is not well-formed XML, and libxml will
+  just return an error. Perhaps we should use something like 'sync'.
+- Handle error cases in a more clever way; there are many error
+  conditions that will just cause xmlfs to crash or do strange
+  things. We should review them.
+- Make sorting *really* stable.
+- Kilobug suggested a --xslt option that would make xmlfs provide
+  a tree matching the XSLT-modified document.
+  (Problem: In this case we cannot attach easily to the .xml, because
+  the user would lose access to their original document. Perhaps
+  we should allow an optional "file.xml" argument and check that it
+  is not the same as the file we are attaching to when --xslt is
+  specified.)
+- DTD support; perhaps XML Schema/RelaxNG when I'm sure I understand
+  them ;-)
 # Source
