[GNUnet-SVN] r37148 - Extractor-docs/WWW


From: gnunet
Subject: [GNUnet-SVN] r37148 - Extractor-docs/WWW
Date: Sun, 8 May 2016 01:16:51 +0200

Author: grothoff
Date: 2016-05-08 01:16:51 +0200 (Sun, 08 May 2016)
New Revision: 37148

Modified:
   Extractor-docs/WWW/index.html
Log:
fixes from Therese Godefroy

Modified: Extractor-docs/WWW/index.html
===================================================================
--- Extractor-docs/WWW/index.html       2016-05-07 23:12:49 UTC (rev 37147)
+++ Extractor-docs/WWW/index.html       2016-05-07 23:16:51 UTC (rev 37148)
@@ -1,53 +1,30 @@
 <!--#include virtual="/server/header.html" -->
-<!-- Parent-Version: 1.69 -->
-
-<!-- Instructions for adapting this boilerplate to a new project: -->
-
-<!-- 1. In the line above starting "Parent-Version:", remove the
-        "$Revision...$" from around the revision number,
-        leaving just Parent-Version: and the number. -->
-
-<!-- 2. Replace "baz" with the name of your project.
-        You should be able to do this with search and replace;
-        making sure that the search is case insensitive and
-        that the case of the replacement matches the case
-        of the string found. In Emacs, query-replace will do this
-        when case-fold-search and case-replace are both non-nil
-        and both search and replacement string are given in lower case. -->
-
-<!-- 3. Of course update the actual information according to your project,
-        such as mailing lists, project locations, and maintainer name.  -->
-
-<!-- 4. You can use the patch-from-parent script to semi-automate
-        merging future changes to the boilerplate with your file:
-        http://web.cvs.savannah.gnu.org/viewvc/*checkout*/www/server/standards/patch-from-parent?root=www&content-type=text%2Fplain
-        -->
-
-<!-- If you would like to make sure your page validates with HTML5, that
-     would be a good thing.  To do that, change the first line from
-     to /server/html5-header.html before trying the validation.  Maybe
-     someday we will be able to make /server/header be HTML5.  -->
-
+<!-- Parent-Version: 1.79 -->
 <title>Libextractor
 - GNU Project - Free Software Foundation</title>
-<meta name="content-language" content="en">
-<meta name="language" content="en">
-<meta name="description" content="a simple library for keyword extraction">
-<meta name="author" content="Vids Samanta and Christian Grothoff">
-<meta name="rights" content="(C) 2002--2012 by Vids Samanta and Christian Grothoff">
-<meta name="keywords" content="keyword, meta data, extraction, mp3, html, images, jpeg, gif, ps, mime, real, qt, asf, mpeg, avi, riff, tiff, summary, summaries, kbps, format, mime-type, zip, elf, doc, ppt, xls, sha-1, md5, open office, sxw, dvi, id3, id3v2, id3v2.3, id3v2.4, thumbnails, exiv2, nsf, sid, flv, flac, pax, cpio, ISO9960, shar, raw, mtree, rar, 7-zip, cab, lha, lzh, xar, ar">
-<meta name="robots" content="index,follow">
-<meta name="revisit-after" content="28 days">
-<meta name="content-language" content="en">
-<meta name="language" content="en">
-<meta http-equiv="expires" content="43200">
-<meta http-equiv="content-type" content="text/html; charset=UTF-8">
-<link rel="SHORTCUT ICON" href="favicon.ico">
+<style type="text/css"><!--
+img { width: 9em; }
+#content .emph-box { padding: .7em 1.2em; }
+#content dd ul { margin-left: 0; }
+#content dd li { margin-bottom: 1em; }
+--></style>
+<meta name="content-language" content="en" />
+<meta name="language" content="en" />
+<meta name="description" content="a simple library for keyword extraction" />
+<meta name="author" content="Vids Samanta and Christian Grothoff" />
+<meta name="rights" content="(C) 2002--2012 by Vids Samanta and Christian Grothoff" />
+<meta name="keywords" content="keyword, meta data, extraction, mp3, html, images, jpeg, gif, ps, mime, real, qt, asf, mpeg, avi, riff, tiff, summary, summaries, kbps, format, mime-type, zip, elf, doc, ppt, xls, sha-1, md5, open office, sxw, dvi, id3, id3v2, id3v2.3, id3v2.4, thumbnails, exiv2, nsf, sid, flv, flac, pax, cpio, ISO9960, shar, raw, mtree, rar, 7-zip, cab, lha, lzh, xar, ar" />
+<meta name="robots" content="index,follow" />
+<meta name="revisit-after" content="28 days" />
+<meta name="content-language" content="en" />
+<meta name="language" content="en" />
+<meta http-equiv="expires" content="43200" />
+<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+<link rel="SHORTCUT ICON" href="favicon.ico" />
+<!--#include virtual="/server/gnun/initial-translations-list.html" -->
 <!--#include virtual="/server/banner.html" -->
-<!--#set var="article_name" value="/server/standards/boilerplate" -->
-<!--#include virtual="/server/gnun/initial-translations-list.html" -->
 <h2>GNU Libextractor</h2>
-<img src="extractor_logo.png" alt="libextractor" vspace="0" width="136" border="0" height="94" hspace="0" align="right">
+<img src="extractor_logo.png" alt="libextractor" class="imgright" />
 
 <p>
 GNU Libextractor is a library used to extract meta data from files.
@@ -56,18 +33,18 @@
 keywords and meta data to match against queries and to show to users
 instead of only relying on filenames.  libextractor contains the shell
 command <tt>extract</tt> that, similar to the well-known <tt>file</tt>
-command, can extract meta data from a file an print the results to
+command, can extract meta data from a file and print the results to
 stdout.
 </p>
 <p>
 Currently, libextractor supports the following formats:
 HTML, MAN,
-PS, DVI, 
+PS, DVI,
 OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw),
 FLAC, MP3 (ID3v1 and ID3v2), OGG, WAV,
 S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), NSF(E) (NES music), SID (C64 music),
 EXIV2, JPEG, GIF, PNG, TIFF,
-DEB, RPM, 
+DEB, RPM,
 TAR(.GZ), LZH, LHA, RAR, ZIP, CAB, 7-ZIP, AR, MTREE, PAX, CPIO, ISO9660, SHAR, RAW, XAR
 FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
 Also, various additional MIME types are detected.
@@ -111,23 +88,23 @@
 
 <dt>Tar Package</dt>
 <dd>
-The latest version can be found on <a href="http://ftpmirror.gnu.org/libextractor/">GNU mirrors</a>.  
-If the mirror does not work, you should be able to find them on the main FTP server at 
-<a href="ftp://ftp.gnu.org/gnu/libextractor/">ftp://ftp.gnu.org/gnu/libextractor/</a>.  
-<br>
+The latest version can be found on <a href="http://ftpmirror.gnu.org/libextractor/">GNU mirrors</a>.
+If the mirror does not work, you should be able to find them on the main FTP server at
+<a href="ftp://ftp.gnu.org/gnu/libextractor/">ftp://ftp.gnu.org/gnu/libextractor/</a>.
+<br />
 Latest release is <a href="http://ftpmirror.gnu.org/libextractor/libextractor-1.1.tar.gz">libextractor-1.1.tar.gz</a>.
-<br>
+<br />
 Latest Java-binding is <a href="http://ftpmirror.gnu.org/libextractor/libextractor-java-1.0.0.tar.gz">libextractor-java-1.0.0.tar.gz</a>.
-<br>
+<br />
 Latest Mono-binding is <a href="http://ftpmirror.gnu.org/libextractor/libextractor-mono-0.5.23.tar.gz">libextractor-mono-0.5.23.tar.gz</a>.
-<br>
+<br />
 Latest Python-binding is <a href="http://ftpmirror.gnu.org/libextractor/libextractor-python-0.5.tar.gz">libextractor-python-0.5.tar.gz</a>.
 </dd>
 
 <dt>RPM Package</dt>
 <dd>
-RPMs for SuSE 9.3 can be found here (<a href="ftp://ftp.suse.com/pub/people/ke/9.3-i386/">i386</a>, 
-<a href="ftp://ftp.suse.com/pub/people/ke/9.3-x86_64/">x86_64</a>, 
+RPMs for SuSE 9.3 can be found here (<a href="ftp://ftp.suse.com/pub/people/ke/9.3-i386/">i386</a>,
+<a href="ftp://ftp.suse.com/pub/people/ke/9.3-x86_64/">x86_64</a>,
 <a href="ftp://ftp.suse.com/pub/people/ke/SRPM/">SRPM</a>)
 </dd>
 
@@ -163,12 +140,12 @@
 </p>
 <p>
 Articles related to libextractor:
+</p>
 <ul>
 <li><a href="http://www.linuxjournal.com/article/7552">Reading File Metadata with extract and libextractor</a></li>
 <li><a href="http://servers.linux.com/servers/06/08/21/1558230.shtml">How to recover lost files after you accidentally wipe your hard drive</a></li>
 <li><a href="http://www.gnucitizen.org/blog/all-your-metadata-are-belong-to-us">All your Metadata are belong to Us</a></li>
 </ul>
-</p>
 
 
 <h3 id="mail">Mailing lists</h3>
@@ -191,7 +168,7 @@
 and most other GNU software are made on
 <a href="http://lists.gnu.org/mailman/listinfo/info-gnu">info-gnu</a>
 (<a href="http://lists.gnu.org/archive/html/info-gnu/">archive</a>).
-If you only want to get notifications about Libextractor, we 
+If you only want to get notifications about Libextractor, we
 suggest you subscribe to the project at
 <a href="http://freshmeat.net/projects/libextractor/">freshmeat</a>.
 </p>
@@ -198,7 +175,7 @@
 
 <p>
 Security reports that should not be made immediately public can be
-sent directly to <a href="http://grothoff.org/christian/">the maintainer</a>. 
+sent directly to <a href="http://grothoff.org/christian/">the maintainer</a>.
 If there is no response to an urgent
 issue, you can escalate to the general
 <a href="http://lists.gnu.org/mailman/listinfo/security">security</a>
@@ -217,31 +194,50 @@
 
 <dt>Development</dt>
 
-<dd>Known bugs and open feature requests are tracked in 
+<dd>Known bugs and open feature requests are tracked in
     <a href="https://gnunet.org/bugs/">our bugtracker</a>.</dd>
 
 <dt>Subversion access</dt>
 <dd>
+<ul><li><p>
 You can access the current development version of libextractor using
-<pre>$ svn checkout https://gnunet.org/svn/Extractor</pre><br>
+</p>
+<pre class="emph-box">$ svn checkout https://gnunet.org/svn/Extractor</pre>
+</li>
+<li><p>
 A Java binding for libextractor is in
-<pre>$ svn checkout https://gnunet.org/svn/Extractor-java</pre><br>
+</p>
+<pre class="emph-box">$ svn checkout https://gnunet.org/svn/Extractor-java</pre>
+</li>
+<li><p>
 A Mono binding for libextractor is in
-<pre>$ svn checkout https://gnunet.org/svn/Extractor-mono</pre><br>
+</p>
+<pre class="emph-box">$ svn checkout https://gnunet.org/svn/Extractor-mono</pre>
+</li>
+<li><p>
 A Python binding can be found under
-<pre>$ svn checkout https://gnunet.org/svn/Extractor-python</pre>
+</p>
+<pre class="emph-box">$ svn checkout https://gnunet.org/svn/Extractor-python</pre>
+<p>
 A source package is <a href="http://ftpmirror.gnu.org/libextractor/libextractor-python-0.5.tar.gz">here</a>.
-This binding has been packaged as a python egg, available <a href="http://cheeseshop.python.org/pypi/Extractor">here</a>
+This binding has been packaged as a python egg, available <a href="http://cheeseshop.python.org/pypi/Extractor">here</a>.
 A second Python binding that includes a binding for doodle can be found <a href="http://grothoff.org/christian/doodle/download/nokos_extractor_doodle_python.zip">here</a>.
-<br>
-A Perl binding is in <a href="http://search.cpan.org/~flora/File-Extractor/">CPAN</a>
-The latest version of the Perl binding is available using <tt>git clone git://git.perldition.org/File-Extractor.git/</tt>
-<br>
+</p></li>
+<li><p>
+A Perl binding is in <a href="http://search.cpan.org/~flora/File-Extractor/">CPAN</a>.
+The latest version of the Perl binding is available using
+</p>
+<pre class="emph-box">$ git clone git://git.perldition.org/File-Extractor.git/</pre>
+</li>
+<li><p>
 A Ruby binding has been published <a href="http://raa.ruby-lang.org/project/extractor/">here</a> (<a href="http://gnunet.org/libextractor/download/libextractor-ruby-0.9.tar.gz">mirror</a>).
 Another Ruby binding has been published <a href="http://extractor.rubyforge.org/">here</a> (<a href="http://ftpmirror.gnu.org/libextractor/libextractor-ruby-0.1.gem">mirror</a>).
-<br>
+</p></li>
+<li><p>
 An initial draft of a PHP binding can be found under
-<pre>$ svn checkout https://gnunet.org/svn/Extractor-php</pre>
+</p>
+<pre class="emph-box">$ svn checkout https://gnunet.org/svn/Extractor-php</pre>
+</li></ul>
 </dd>
 
 
@@ -276,6 +272,7 @@
 <dl>
 <dt>Installation</dt>
 <dd>
+<p>
 The simplest way to install GNU libextractor is to use one of the binary
 packages which are available online for many distributions.  Note that
 under Debian, the extract tool is in a separate
@@ -282,11 +279,14 @@
 package <tt>extract</tt> and headers required to compile other
 applications against libextractor are in <tt>libextractor-dev</tt>.
 Thus, under Debian, you should use:
-<pre>
+</p>
+<pre class="emph-box">
 # apt-get install libextractor-dev extract
 </pre>
+<p>
 Compiling by hand follows the usual sequence:
-<pre>
+</p>
+<pre class="emph-box">
 $ tar xzvf libextractor.x.y.z.tar.gz
 $ cd libextractor.x.y.z
 $ ./configure
@@ -293,11 +293,14 @@
 $ make
 # make install
 </pre>
+<p>
 Note that you need various dependencies (read <tt>README</tt>
 for an up-to-date list) in order to compile all of the plugins.
+</p>
 </dd>
 <dt>Using the extract tool</dt>
 <dd>
+<p>
 After installing GNU libextractor, the extract tool can be used to obtain
 meta data from documents.  By default, the extract tool uses the
 canonical set of plugins, which consists of all format-specific
@@ -307,12 +310,14 @@
 the option <tt>-b</tt> is likely to come in handy to automatically
 create bibtex entries from documents that have been properly equipped
 with meta-data (if available).
-<br>
+</p>
+<p>
 Further options are described in the extract manpage (<tt>man&nbsp;1&nbsp;extract</tt>).
+</p>
 </dd>
 <dt>Example Output</dt>
 <dd>
-<pre>
+<pre class="emph-box">
 $ extract libextractor-0.1.3-1.src.rpm
 Keywords for file libextractor-0.1.3-1.src.rpm:
 os - linux
@@ -331,7 +336,7 @@
 unknown - SOURCE RPM 3.0
 mimetype - application/x-rpm
 </pre>
-<pre>
+<pre class="emph-box">
 $ extract extractor_logo.png
 Keywords for file extractor_logo.png:
 image dimensions - 272x188
@@ -348,6 +353,7 @@
 </dd>
 <dt>Using the GNU libextractor library in your programs</dt>
 <dd>
+<p>
 The following listing shows the code of a minimalistic program that
 uses GNU libextractor.  Compiling the fragment requires passing the
 option <tt>-lextractor</tt> to gcc.  For details and additional
@@ -357,17 +363,17 @@
 communicate with libextractor is also available.  Python programmers
 will find that libextractor (since 0.5.0) can also be used from
 Python, just <tt>import Extractor</tt>.
-<br>
-<pre>
+</p>
+<pre class="emph-box">
 #include &lt;extractor.h&gt;
 
-int 
-main (int argc, char * argv[]) 
+int
+main (int argc, char * argv[])
 {
   struct EXTRACTOR_PluginList *plugins
     = EXTRACTOR_plugin_add_defaults (EXTRACTOR_OPTION_DEFAULT_POLICY);
   EXTRACTOR_extract (plugins, argv[1],
-                     NULL, 0, 
+                     NULL, 0,
                      &amp;EXTRACTOR_meta_data_print, stdout);
   EXTRACTOR_plugin_remove_all (plugins);
   return 0;
@@ -376,6 +382,7 @@
 </dd>
 <dt>Writing new Plugins for GNU libextractor</dt>
 <dd>
+<p>
 The most complicated thing when writing a new plugin for GNU
 libextractor is the writing of the actual parser for a specific
 format.  Nevertheless, the basic pattern is always the same.  The
@@ -384,11 +391,12 @@
 the plugin directory (typically <tt>$PREFIX/lib/libextractor/</tt>).
 The library must export a method <tt>EXTRACTOR_XXX_extract_method</tt>
 with the following signature:
-<pre>
+</p>
+<pre class="emph-box">
 void
 EXTRACTOR_XXX_extract_method (struct EXTRACTOR_ExtractContext *ec);
 </pre>
-<br>
+<p>
 <tt>ec</tt> provides a callback to invoke with meta data as well as
 functions for reading data from the file that is being processed.
 Most plugins start by reading the first bytes of the file and checking that
@@ -398,7 +406,8 @@
 argument to <tt>proc</tt> and other function invoked from within <tt>ec</tt>.
 Finally, <tt>ec-&gt;config</tt> is an arbitrary string of options that the plugin is
 free to interpret. Most plugins ignore <tt>config</tt>.
-<br>
+</p>
+<p>
 If the meta data extracted is a string, it is supposed to be converted
 into the UTF-8 character set by the plugin.  However, in cases where
 the character encoding used in the document is unknown, no conversion
@@ -412,7 +421,8 @@
 the meta data type.  Common meta data types are &quot;author&quot;,
 &quot;title&quot; and &quot;mime-type&quot;.  The full signature of
 the &quot;proc&quot; callback is:
-<pre>
+</p>
+<pre class="emph-box">
 typedef int (*EXTRACTOR_MetaDataProcessor)(void *cls,
                                            const char *plugin_name,
                                            enum EXTRACTOR_MetaType type,
@@ -421,8 +431,10 @@
                                            const char *data,
                                            size_t data_len);
 </pre>
-If &quot;proc&quot; returns non-zero, the plugin should abort 
-processing the current file and return.  
+<p>
+If &quot;proc&quot; returns non-zero, the plugin should abort
+processing the current file and return.
+</p>
 </dd>
 
 <dt>Related projects and useful resources</dt>
@@ -464,64 +476,50 @@
 option) any later version.</p>
 
 
-<!-- If needed, change the copyright block at the bottom. In general,
-     pages on the GNU web server should be under CC BY-ND 3.0 US.
-     Please do NOT change or remove this without talking
-     with the webmasters or licensing team first.
-     Please make sure the copyright date is consistent with the document.
-     For web pages, it is ok to list just the latest year the document
-     was modified, or published.
-     
-     If you wish to list earlier years, that is ok too.
-     Either "2001, 2002, 2003" or "2001-2003" are ok for specifying
-     years, as long as each year in the range is in fact a copyrightable
-     year, i.e., a year in which the document was published (including
-     being publicly visible on the web or in a revision control system).
-     
-     There is more detail about copyright years in the GNU Maintainers
-     Information document, www.gnu.org/prep/maintain. -->
-
-
 </div><!-- for id="content", starts in the include above -->
 <!--#include virtual="/server/footer.html" -->
 <div id="footer">
+<div class="unprintable">
 
-<p><!-- WEBMASTERS: Replace address@hidden with address@hidden and
-        remove this comment after completion. -->
-
-Please send general FSF &amp; GNU inquiries to
+<p>Please send general FSF &amp; GNU inquiries to
 <a href="mailto:address@hidden">&lt;address@hidden&gt;</a>.
 There are also <a href="/contact/">other ways to contact</a>
 the FSF.  Broken links and other corrections or suggestions can be sent
 to <a href="mailto:address@hidden">&lt;address@hidden&gt;</a>.</p>
 
-<p><!-- TRANSLATORS: Ignore the original text in this paragraph,
-        replace it with the translation of these two:
-
-        We work hard and do our best to provide accurate, good quality
-        translations.  However, we are not exempt from imperfection.
-        Please send your comments and general suggestions in this regard
-        to <a href="mailto:address@hidden">
-        &lt;address@hidden&gt;</a>.</p>
-
-        <p>For information on coordinating and submitting translations of
-        our web pages, see <a
-        href="/server/standards/README.translations.html">Translations
-        README</a>. -->
-Please see the <a
+<p>Please see the <a
 href="/server/standards/README.translations.html">Translations
 README</a> for information on coordinating and submitting translations
 of this article.</p>
+</div>
 
+<!-- Regarding copyright, in general, standalone pages (as opposed to
+     files generated as part of manuals) on the GNU web server should
+     be under CC BY-ND 4.0.  Please do NOT change or remove this
+     without talking with the webmasters or licensing team first.
+     Please make sure the copyright date is consistent with the
+     document.  For web pages, it is ok to list just the latest year the
+     document was modified, or published.
+
+     If you wish to list earlier years, that is ok too.
+     Either "2001, 2002, 2003" or "2001-2003" are ok for specifying
+     years, as long as each year in the range is in fact a copyrightable
+     year, i.e., a year in which the document was published (including
+     being publicly visible on the web or in a revision control system).
+
+     There is more detail about copyright years in the GNU Maintainers
+     Information document, www.gnu.org/prep/maintain. -->
 <p>Copyright &copy; 2012 Free Software Foundation, Inc.</p>
 
 <p>This page is licensed under a <a rel="license"
-href="http://creativecommons.org/licenses/by-nd/3.0/us/">Creative
-Commons Attribution-NoDerivs 3.0 United States License</a>.</p>
+href="http://creativecommons.org/licenses/by-nd/4.0/">Creative
+Commons Attribution-NoDerivatives 4.0 International License</a>.</p>
 
-<p>Updated:
-<!-- timestamp start -->
-$Date: 2013/03/28 09:00:55 $
+<!--#include virtual="/server/bottom-notes.html" -->
+
+<p class="unprintable">Updated:
+ <!-- timestamp start -->
+$Date: 2015/05/18 06:12:54 $
 <!-- timestamp end -->
 </p>
 </div>



