bug#27684: Can't build disk-images or vm-images on core-updates

From: Ludovic Courtès
Subject: bug#27684: Can't build disk-images or vm-images on core-updates
Date: Mon, 17 Jul 2017 16:02:14 +0200
Hi Leo,

Leo Famulari <address@hidden> skribis:

> Both `guix system disk-image` and `guix system vm-image` fail for me on
> core-updates. Read-only VMs seem to work fine. I'm currently checking to
> see if it fails when building on GuixSD.

The log shows that building the disk-image derivation starts with:

--8<---------------cut here---------------start------------->8---
creating raw image of 102400.00 MiB...
Formatting '/gnu/store/yv5r65584aaml86hc0xrgyffnp70ri36-disk-image', fmt=raw 
--8<---------------cut here---------------end--------------->8---

That’s 100 GiB, which is a lot, no?

Regardless, memory consumption in the VM is not supposed to be
proportional to the size of the image being created.

The allocation failure happens while copying files:

--8<---------------cut here---------------start------------->8---
[  108.852534] init: page allocation stalls for 10004ms, order:0, mode:0x1400040(GFP_NOFS), nodemask=(null)
[  108.853423] init cpuset=/ mems_allowed=0
[  108.853781] CPU: 0 PID: 1 Comm: init Not tainted 4.12.0-gnu #1
[  108.854356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[  108.855441] Call Trace:
[  108.855825]  dump_stack+0x63/0x82
[  108.856355]  warn_alloc+0x114/0x1b0
[  108.856838]  __alloc_pages_slowpath+0x91d/0xd90
[  108.857265]  __alloc_pages_nodemask+0x245/0x260
[  108.857937]  alloc_pages_current+0x95/0x140
[  108.858581]  __page_cache_alloc+0xb5/0xc0
[  108.859194]  pagecache_get_page+0x88/0x220
[  108.859582]  ext4_mb_load_buddy_gfp+0x214/0x400
--8<---------------cut here---------------end--------------->8---

Does that work on previous master with Linux-libre 4.12.0 (current
master is at 4.12.2)?  (This would allow us to determine if this is an
ext4 bug, who knows…)

If it does, then the only other issue I can think of is if Guile itself,
while running ‘copy-recursively’ from (guix build utils), eats memory
proportional to the number of files, leading to an OOM condition.
However, the kernel messages don’t report it as an OOM, AFAICS.
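
To make the “memory proportional to the number of files” hypothesis
concrete, here is a rough sketch of how one could measure it for a
recursive copy. It is a Python analogue, not the Guile code: it uses
‘shutil.copytree’ rather than ‘copy-recursively’, the helper names
(‘make_tree’, ‘peak_memory_of_copy’) are made up for illustration, and
‘tracemalloc’ only sees allocations made by the Python interpreter, not
the kernel’s page cache. So it says nothing about Guile or ext4; it only
shows the kind of measurement that would confirm or rule out the idea.

```python
import os
import shutil
import tempfile
import tracemalloc


def make_tree(root, n_files):
    """Create n_files one-byte files under root (hypothetical test fixture)."""
    os.makedirs(root, exist_ok=True)
    for i in range(n_files):
        with open(os.path.join(root, f"file-{i}"), "w") as f:
            f.write("x")


def peak_memory_of_copy(n_files):
    """Return peak interpreter-level allocation, in bytes, while copying n_files files."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        make_tree(src, n_files)
        # Trace only the copy itself, not the fixture creation.
        tracemalloc.start()
        try:
            shutil.copytree(src, dst)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        return peak


if __name__ == "__main__":
    # If the peak grows roughly linearly with the file count, the copy
    # routine holds per-file state in memory for the whole traversal.
    for n in (100, 1000, 10000):
        print(n, peak_memory_of_copy(n))
```

The equivalent experiment for the real code would be to run
‘copy-recursively’ on trees of increasing size and watch the process’s
heap usage.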


