bug#34110: feature request: dual-column du output, showing "real" and "on-disk" sizes (and about that "apparent-size" concept)
Wed, 16 Jan 2019 16:06:50 -0700
I'll address only the "apparent-size" issue (not the two-columns, or
On 2019-01-16 1:13 p.m., René J.V. Bertin wrote:
According to `du --help`, the apparent-size option reports a size that is not
the actual disk usage. The numbers above seem to show the opposite.
If anything, I find the concept of "apparent size" more appropriate to the size a file
occupies on the storage medium because ultimately that storage device will not give you more than
"struct stat : st_size" bytes for uncompressed filesystems.
Another way to say it: with "--apparent-size", du returns the actual file size;
without, it returns how large the file appears to be (judging from its disk footprint).
"apparent-size" shows how much content/data the file has.
Without "apparent-size", du shows the amount of storage consumed (or
"wasted"?) on the storage medium (accounting for sparse file holes, though
I'm not sure about compression).
To illustrate, create three files with specific sizes:
$ head --bytes=1700 /dev/zero > a
$ head --bytes=4097 /dev/zero > b
$ truncate --size=1050000 c # will be a sparse file
These are their sizes, as in the number of bytes they contain:
$ ls -log
-rw-r--r-- 1 1700 Jan 16 15:36 a
-rw-r--r-- 1 4097 Jan 16 15:36 b
-rw-r--r-- 1 1050000 Jan 16 15:37 c
These are their "apparent-sizes", rounded up to the nearest
$ du --apparent-size a b c
e.g. file "a" is 1700 bytes, rounded-up to 2K, and "du --apparent-size"
Using "--apparent-size --block-size=1" (and its equivalent, "--bytes")
will show the exact sizes:
$ du --apparent-size --block-size=1 a b c
Without "--apparent-size", du shows how much storage space is actually
used/wasted/consumed on the storage medium by the files:
$ du a b c
How are these numbers calculated?
The simplest case is file "c" - it is completely sparse - so despite
logically containing 1,050,000 zeros, on the actual storage medium it
consumes zero data blocks (ignoring inode blocks and the like).
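
The difference between content and allocation can also be checked per file. The following is only a sketch, assuming GNU stat, where %s is the size in bytes, %b the number of allocated blocks and %B the size in bytes of each of those blocks. The helper name "alloc_report" is made up for this illustration.

```shell
# Hypothetical helper: print a file's content size and its allocated size, in bytes.
# Assumes GNU stat: %s = size in bytes, %b = allocated blocks, %B = bytes per %b block.
alloc_report() {
    size=$(stat --format=%s "$1")
    blocks=$(stat --format=%b "$1")
    unit=$(stat --format=%B "$1")
    echo "$1: content=$size allocated=$((blocks * unit))"
}

truncate --size=1050000 c   # sparse file: only the length is set, no data is written
alloc_report c
```

For a fully sparse file the "allocated" figure should be far smaller than "content", but the exact value depends on the filesystem.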
File "a" has 1,700 bytes of data.
On my filesystem the basic block size is 4096, as shown by "stat -f":
$ stat -f /
ID: 5a2cade519bada6a Namelen: 255 Type: ext2/ext3
->Block size: 4096 Fundamental block size: 4096 <-----
Blocks: Total: 27559017 Free: 18845977 Available: 17435289
Inodes: Total: 7036928 Free: 6496730
Therefore, any file from size 1 to size 4096 will consume exactly one
disk block. On most common filesystems, disk blocks cannot be shared
between files, so the whole block counts as consumed.
That's why for file "a" du shows "4" - meaning 4K bytes (exactly one
block) is consumed on the storage medium by this file.
Similarly for file "b" - its size is 4097, which is 1 byte more than one
filesystem block. Hence, file "b" consumes 2 blocks, coming up to 8K.
du then shows "8" for file "b".
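
The arithmetic above can be sketched in shell. This assumes a non-sparse file whose data is stored in whole blocks of a fixed size (4096 here, as on my filesystem). The function name "round_up" is only an illustration, not something du provides.

```shell
# Hypothetical helper: round SIZE up to a whole number of UNIT-byte blocks
# and print the result in bytes. Only meaningful for non-sparse files.
round_up() {
    size=$1
    unit=$2
    echo $(( (size + unit - 1) / unit * unit ))
}

round_up 1700 4096   # file "a": one block, prints 4096 (4K)
round_up 4097 4096   # file "b": two blocks, prints 8192 (8K)
```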
Now to your examples:
%> du -hcs /Volumes/nif64/tmp/.npm/ ; du -hcs --apparent-size
340M /Volumes/nif64/tmp/.npm/
180M /Volumes/nif64/tmp/.npm/
Same folder on btrfs (mounted with compress=lzo):
%> du -hcs /mnt/.npm/ ; du -hcs --apparent-size /mnt/.npm
198M /mnt/.npm/
181M /mnt/.npm
In both cases, "du --apparent-size" shows about 180MB of actual data
(181MB in the second example). That is the amount of actual content
(number of total bytes in these files).
In the first case, these files consume 340MB of space on your disk.
In the second case, these files consume 198MB of space on your disk.
The reason they consume MORE than their actual data is explained above
with the file-system blocks.
This suggests to me that compression is not accounted for in these
values. If it were, then the consumed size (without "--apparent-size")
should've been less than the actual size (with "--apparent-size").
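
To make that comparison on any directory, something along these lines could be used. It is a rough sketch assuming GNU du. The name "du_compare" is invented for this example, and a "smaller" result by itself does not say whether holes or compression are the cause.

```shell
# Hypothetical helper: compare consumed space with content size for a directory.
# Assumes GNU du; both values are reported in bytes (du separates fields with a tab).
du_compare() {
    used=$(du -s --block-size=1 "$1" | cut -f1)
    content=$(du -s --apparent-size --block-size=1 "$1" | cut -f1)
    if [ "$used" -lt "$content" ]; then
        echo "used=$used content=$content consumed space is smaller than content"
    else
        echo "used=$used content=$content consumed space is not smaller than content"
    fi
}

du_compare .
```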
A quick online search shows that btrfs's default block size is 16K,
while ZFS's default record-size is 128KB. That might explain
why a similar amount of data (and I assume, a similar number of files and
sizes) consumes more disk space on ZFS (Could be wrong, though, comments
I hope this helps to clarify "apparent-size".
I'll leave it to others to comment on how compressed file systems
come into play with du.