Re: python-pyarrow broken for parquet?
From: Phil Beadling
Subject: Re: python-pyarrow broken for parquet?
Date: Mon, 5 Jul 2021 13:13:51 +0100
As promised, here is my fix. It works for me, but the patching of the CMake
files, in particular the two sed commands, is very brittle to any changes in
the underlying project. I'm not sure it should go into Guix proper as-is, but
if people think it's useful I'm happy to submit the patch.
I may try to improve on this when I have a moment, but if someone else wants
to run with it, it's a good starting point at least.
The problem is that the generated values of PARQUET_INCLUDE_DIR and
PARQUET_LIB_DIR end up concatenating the include and lib dirs together:
For example, when debugging the cmake file, the value of
"PARQUET_INCLUDE_DIR/parquet" becomes:
"/gnu/store/ywklhws3ccb457gsb605z95azbfpsbyl-apache-arrow-3.0.0-lib//gnu/store/zzzb4ymfj3igynsflxwxsn58kvnpa6qb-apache-arrow-3.0.0-include/share/include/parquet"
The leading lib directory (the "...-apache-arrow-3.0.0-lib/" prefix)
shouldn't be there at all.
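As a rough illustration, the Guile sketch below rebuilds the observed value
and the value the header lookup actually needs from the two store paths
quoted above. The variable names (lib-dir, include-dir, observed, expected)
are made up for this sketch and do not appear in the Arrow or Guix sources.

```scheme
;; Store paths copied from the debug output quoted above.
(define lib-dir
  "/gnu/store/ywklhws3ccb457gsb605z95azbfpsbyl-apache-arrow-3.0.0-lib")
(define include-dir
  "/gnu/store/zzzb4ymfj3igynsflxwxsn58kvnpa6qb-apache-arrow-3.0.0-include")

;; What CMake ends up with: the lib output glued in front of the
;; include output.
(define observed
  (string-append lib-dir "/" include-dir "/share/include/parquet"))

;; What is wanted: a path under the "include" output only.
(define expected
  (string-append include-dir "/share/include/parquet"))

(display observed) (newline)
(display expected) (newline)
```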
--8<---------------cut here---------------start------------->8---
(define-public python-pyarrow-parquet
  (package/inherit python-pyarrow
    (arguments
     (substitute-keyword-arguments (package-arguments python-pyarrow)
       ((#:phases phases)
        `(modify-phases ,phases
           (add-before 'install 'patch-cmake-variables
             (lambda* (#:key inputs #:allow-other-keys)
               ;; Replace cmake locations with hardcoded guix links for the
               ;; underlying C++ lib - this is a pretty awful hack
               (invoke "sed" "-i"
                       (string-append
                        "1s#^#set(PARQUET_INCLUDE_DIR \""
                        (assoc-ref inputs "apache-arrow:include")
                        "/share/include\")\\n#")
                       "cmake_modules/FindParquet.cmake")
               (invoke "sed" "-i"
                       (string-append
                        "116s#^#set(PARQUET_LIB_DIR \""
                        (assoc-ref inputs "apache-arrow:lib")
                        "/lib\")\\n#")
                       "cmake_modules/FindParquet.cmake")))
           (add-before 'install 'patch-parquet-library
             (lambda _
               ;; Another nasty hack - there must be a better way to
               ;; change this?
               (substitute* "CMakeLists.txt"
                 (("parquet_shared") "parquet"))))
           (add-before 'install 'set-PYARROW_WITH_PARQUET
             (lambda _
               (setenv "PYARROW_WITH_PARQUET" "1")
               ;;(setenv "VERBOSE" "1") ;; useful debug for cmake
               #t))))))
    ;; we need includes from apache as well as libs for parquet
    (propagated-inputs
     `(("python-pandas" ,python-pandas-simm)
       ("apache-arrow:lib" ,apache-arrow "lib")
       ("apache-arrow:include" ,apache-arrow "include")
       ,@(fold alist-delete (package-propagated-inputs python-pyarrow)
               '("python-pandas" "apache-arrow"))))))
--8<---------------cut here---------------end--------------->8---
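For reference, here is a minimal Guile sketch of the sed script that the
first invoke call builds. It assumes the "include" output sits at the store
path quoted earlier in this message (the real path will differ per build).
The alist and the name sed-script are illustrative only and are not part of
the package definition.

```scheme
;; Stand-in for the phase's `inputs' alist; the store path is the one
;; from the debug output earlier in this message.
(define inputs
  '(("apache-arrow:include"
     . "/gnu/store/zzzb4ymfj3igynsflxwxsn58kvnpa6qb-apache-arrow-3.0.0-include")))

;; Same string-append as in the 'patch-cmake-variables phase.
(define sed-script
  (string-append "1s#^#set(PARQUET_INCLUDE_DIR \""
                 (assoc-ref inputs "apache-arrow:include")
                 "/share/include\")\\n#"))

(display sed-script)
(newline)
```

GNU sed reads the trailing \n in the replacement as a newline, so the script
puts a set(PARQUET_INCLUDE_DIR "...") line in front of line 1 of
cmake_modules/FindParquet.cmake. The second command does the same in front of
line 116, and that hardcoded line number is what makes the patch so sensitive
to upstream changes in that file.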
On Fri, 2 Jul 2021 at 16:34, <phil@beadling.co.uk> wrote:
> Thanks Simon.
>
> Yep I got this far too - and I have a candidate fix for building parquet.
> But it's tremendously hacky (sed'ing hardcoded variables into the cmake
> files to trample the derived settings in several places). It seems to work
> but needs finessing. I'll post here shortly, but not sure it's stable
> enough to be updated in Guix proper. We can debate that when everyone sees
> my horrendous fix.