[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Accuracy of importers?
From: |
Ricardo Wurmus |
Subject: |
Re: Accuracy of importers? |
Date: |
Thu, 28 Oct 2021 12:25:56 +0000 |
User-agent: |
mu4e 1.6.6; emacs 27.2 |
Ludovic Courtès <ludovic.courtes@inria.fr> writes:
Hello Guix!
As I’m preparing my PackagingCon talk and wondering how language
package
managers could make our lives easier, I thought it’d be
interesting to
know how well our importers are doing.
My understanding is that most of them require manual
intervention—i.e.,
one has to tweak what ‘guix import’ produces, even if we ignore
synopsis/description/license, to set the right inputs, etc. If
we were
to estimate the fraction of imported packages for which manual
changes
are needed, what would it look like?
importer fraction of imported packages needing changes
[…]
cran 5% (Ricardo? Simon? seems to almost always
work?)
Like Lars and Simon wrote: the importers work *really* well for
both CRAN and Bioconductor, so much so that I’m using them in the
background here:
https://git.elephly.net/gitweb.cgi?p=software/r-guix-install.git;a=blob;f=guix-install.R;h=2766aa1f2d248a8ed2a4eb4c3244b85574d326e2;hb=HEAD
The biggest annoyance is the missing “license:” prefix when
packaging things for gnu/packages/cran.scm or
gnu/packages/bioconductor.scm. Descriptions need regular clean-
up work (e.g. to complete sentences), even though we’re using some
heuristics to fix the most common stylistic problems. It’s really
not a big deal, though.
The biggest missing feature is recursive import of dependencies
hosted on Github or Mercurial (with “-r -a git” or “-r -a hg”).
I.e. a package on Github that declares a dependency on another
package that’s also only hosted on Github will fail to import that
dependency. This is pretty rare, but it happens with experimental
bioinfo software.
texlive (Ricardo? Thiago? Marius?)
This one is not usable. I’d even add “at all”. I keep announcing
that one day I’ll replace it with a new importer, but that new
importer just isn’t ready yet.
What about licensing info: which ones provide accurate licensing
info?
My guess:
gnu
pypi
cpan
cran
The CRAN importer is as accurate as upstream allows. CRAN
requires a free license, Bioconductor requires a license
declaration (there have been very few cases where the license was
not correct, but a number of cases where the license was non-free,
such as the Artistic 1.0 license. Bioconductor sometimes is
sneaky and the R code is free but a necessary library is not.
texlive
Pretty terrible. The license declaration is generally too vague.
Licenses are often declared without version number, and sometimes
it’s just some generic “free” license. A new importer based on
texlive.tlpdb would not improve this by much, because the upstream
declarations are just spotty and unreliable.
--
Ricardo
PS: attached is a rough WIP patch of what I had been using to
import new texlive stuff.
diff --git a/guix/import/texlive.scm b/guix/import/texlive.scm
index 18d8b95ee0..b94aa1cf40 100644
--- a/guix/import/texlive.scm
+++ b/guix/import/texlive.scm
@@ -19,10 +19,12 @@
(define-module (guix import texlive)
#:use-module (ice-9 match)
+ #:use-module (ice-9 rdelim)
#:use-module (sxml simple)
#:use-module (sxml xpath)
#:use-module (srfi srfi-11)
#:use-module (srfi srfi-1)
+ #:use-module (srfi srfi-2)
#:use-module (srfi srfi-26)
#:use-module (srfi srfi-34)
#:use-module (web uri)
@@ -125,9 +127,9 @@ (define (fetch-sxml name)
(xml->sxml (http-fetch url)
#:trim-whitespace? #t))))
-(define (guix-name component name)
+(define (guix-name name)
"Return a Guix package name for a given Texlive package NAME."
- (string-append "texlive-" component "-"
+ (string-append "texlive-"
(string-map (match-lambda
(#\_ #\-)
(#\. #\-)
@@ -186,12 +188,123 @@ (define (sxml-value path)
((lst ...) `(list ,@lst))
(license license)))))))
+(define tlpdb
+ (memoize
+ (lambda ()
+ (let ((file "/home/rekado/dev/gx/branches/master/texlive.tlpdb")
+ (fields
+ '((name . string)
+ (shortdesc . string)
+ (longdesc . string)
+ (catalogue-license . string)
+ (catalogue-ctan . string)
+ (srcfiles . list)
+ (runfiles . list)
+ (docfiles . list)
+ (depend . list)))
+ (record
+ (lambda* (key value alist #:optional (type 'string))
+ (let ((new
+ (or (and=> (assoc-ref alist key)
+ (lambda (existing)
+ (cond
+ ((eq? type 'string)
+ (string-append existing " " value))
+ ((eq? type 'list)
+ (cons value existing)))))
+ (cond
+ ((eq? type 'string)
+ value)
+ ((eq? type 'list)
+ (list value))))))
+ (acons key new (alist-delete key alist))))))
+ (call-with-input-file file
+ (lambda (port)
+ (let loop ((all (list))
+ (current (list))
+ (last-property #false))
+ (let ((line (read-line port)))
+ (cond
+ ((eof-object? line) all)
+
+ ;; End of record.
+ ((string-null? line)
+ (loop (cons (cons (assoc-ref current 'name) current)
+ all)
+ (list) #false))
+
+ ;; Continuation of a list
+ ((and (zero? (string-index line #\space)) last-property)
+ ;; Erase optional second part of list values like
+ ;; "details=Readme" for files
+ (let ((plain-value (first
+ (string-split
+ (string-trim-both line) #\space))))
+ (loop all (record last-property
+ plain-value
+ current
+ 'list)
+ last-property)))
+ (else
+ (or (and-let* ((space (string-index line #\space))
+ (key (string->symbol (string-take line
space)))
+ (value (string-drop line (1+ space)))
+ (field-type (assoc-ref fields key)))
+ ;; Erase second part of list keys like "size=29"
+ (if (eq? field-type 'list)
+ (loop all current key)
+ (loop all (record key value current field-type)
key)))
+ (loop all current #false))))))))))))
+
+(define (files->directories files)
+ (map (cut string-join <> "/" 'suffix)
+ (delete-duplicates (map (lambda (file)
+ (drop-right (string-split file #\/) 1))
+ files)
+ equal?)))
+
+(define (tlpdb->package name)
+ (and-let* ((data (assoc-ref (tlpdb) name))
+ (dirs (files->directories
+ (append (or (assoc-ref data 'docfiles) (list))
+ (or (assoc-ref data 'runfiles) (list))
+ (or (assoc-ref data 'srcfiles) (list))))))
+ (pk data)
+ ;; TODO
+ `(package
+ (name ,(guix-name name))
+ (version (number->string %texlive-revision))
+ (source (texlive-origin name version
+ ',dirs
+ (base32
+ "TODO"
+ #;
+ ,(bytevector->nix-base32-string
+ (let-values (((port get-hash)
(open-sha256-port)))
+ (write-file checkout port)
+ (force-output port)
+ (get-hash))))))
+ (build-system texlive-build-system)
+ (arguments ,`(,'quote (#:tex-directory "TODO")))
+ ,@(or (and=> (assoc-ref data 'depend)
+ (lambda (inputs)
+ `((propagated-inputs ,inputs))))
+ '())
+ ,@(or (and=> (assoc-ref data 'catalogue-ctan)
+ (lambda (url)
+ `((home-page ,(string-append "https://ctan.org" url)))))
+ '((home-page "https://www.tug.org/texlive/")))
+ (synopsis ,(assoc-ref data 'shortdesc))
+ (description ,(beautify-description
+ (assoc-ref data 'longdesc)))
+ (license ,(string->license
+ (assoc-ref data 'catalogue-license))))))
+
(define texlive->guix-package
(memoize
(lambda* (package-name #:optional (component "latex"))
"Fetch the metadata for PACKAGE-NAME from REPO and return the `package'
s-expression corresponding to that package, or #f on failure."
- (and=> (fetch-sxml package-name)
- (cut sxml->package <> component)))))
+ (tlpdb->package package-name))))
;;; ctan.scm ends here
- Accuracy of importers?, Ludovic Courtès, 2021/10/28
- Re: Accuracy of importers?, Lars-Dominik Braun, 2021/10/28
- Re: Accuracy of importers?, zimoun, 2021/10/28
- Re: Accuracy of importers?, Julien Lepiller, 2021/10/28
- Re: Accuracy of importers?,
Ricardo Wurmus <=
- Re: Accuracy of importers?, Katherine Cox-Buday, 2021/10/28
- Re: Accuracy of importers?, Nicolas Goaziou, 2021/10/29
- Re: Accuracy of importers?, Xinglu Chen, 2021/10/30