From: Assaf Gordon
Subject: bug#34110: feature request: dual-column du output, showing "real" and "on-disk" sizes (and about that "apparent-size" concept)
Date: Wed, 16 Jan 2019 16:06:50 -0700


I'll address only the "apparent-size" issue (not the two-columns, or compressed file-systems):

On 2019-01-16 1:13 p.m., René J.V. Bertin wrote:

> According to `du --help`, the apparent-size option reports a size that is not
> the actual disk usage. The numbers above seem to show the opposite.
> If anything, I find the concept of "apparent size" more appropriate to the size a file
> occupies on the storage medium because ultimately that storage device will not give you more than
> "struct stat : st_size" bytes for uncompressed filesystems.
> Another way to say it: with "--apparent-size", du returns the actual file size;
> without, it returns how large the file appears to be (judging from its disk footprint).

"--apparent-size" shows how much content/data the file has.
Without "--apparent-size", du shows the amount of storage consumed
(or "wasted"?) on the storage medium (accounting for sparse-file holes,
though I'm not sure about compression).

To illustrate, create three files with specific sizes:

  $ head --bytes=1700 /dev/zero > a
  $ head --bytes=4097 /dev/zero > b
  $ truncate --size=1050000 c        # will be a sparse file

These are their sizes, as in the amount of bytes they contain:

  $ ls -log
  total 12
  -rw-r--r-- 1    1700 Jan 16 15:36 a
  -rw-r--r-- 1    4097 Jan 16 15:36 b
  -rw-r--r-- 1 1050000 Jan 16 15:37 c

These are their "apparent-sizes", rounded up to the nearest
1K block:

  $ du --apparent-size a b c
  2     a
  5     b
  1026  c

e.g. file "a" is 1700 bytes, rounded-up to 2K, and "du --apparent-size"
shows "2".
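
The rounding here is plain ceiling division by 1024. A minimal sketch in
shell arithmetic (the function name "apparent_kib" is made up for this
illustration and is not part of du):

```shell
# Hypothetical helper: round a byte count up to whole 1K (1024-byte) units,
# mirroring the numbers "du --apparent-size" printed for the files above.
apparent_kib() {
  bytes=$1
  echo $(( (bytes + 1023) / 1024 ))
}

apparent_kib 1700      # file "a"
apparent_kib 4097      # file "b"
apparent_kib 1050000   # file "c"
```

For the three sizes above this gives 2, 5 and 1026, matching the du output.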

Using "--apparent-size --block-size=1" (and its equivalent, "--bytes")
will show the exact sizes:

  $ du --apparent-size --block-size=1 a b c
  1700     a
  4097     b
  1050000  c

Without "--apparent-size", du shows how much storage space is actually used/wasted/consumed on the storage medium by the files:

  $ du a b c
  4    a
  8    b
  0    c

How are these numbers calculated?

The simplest case is file "c": it is completely sparse, so despite
logically containing 1,050,000 zero bytes, it consumes zero data blocks
on the actual storage medium (ignoring inode blocks and the like).
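
One way to see both numbers for a sparse file side by side is stat(1).
The sketch below assumes GNU coreutils, where "%s" is the size in bytes,
"%b" the number of allocated blocks and "%B" the size of each such block.
The scratch directory and variable names are only for illustration, and
the allocated figure depends on the filesystem:

```shell
# Create a sparse file in a scratch directory and read its two sizes.
tmpdir=$(mktemp -d)
truncate -s 1050000 "$tmpdir/c"

apparent=$(stat -c %s "$tmpdir/c")   # bytes of content (st_size)
blocks=$(stat -c %b "$tmpdir/c")     # allocated blocks (st_blocks)
bsize=$(stat -c %B "$tmpdir/c")      # bytes per block reported by %b
allocated=$(( blocks * bsize ))

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

rm -r "$tmpdir"
```

On a filesystem that supports sparse files, the allocated figure should be
far below the apparent one.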

File "a" has 1,700 bytes of data.
On my filesystem the basic block size is 4096, as shown by "stat -f":

  $ stat -f /
    File: "/"
      ID: 5a2cade519bada6a Namelen: 255     Type: ext2/ext3
->Block size: 4096       Fundamental block size: 4096    <-----
  Blocks: Total: 27559017   Free: 18845977   Available: 17435289
  Inodes: Total: 7036928    Free: 6496730

Therefore, any file from size 1 to size 4096 will consume exactly one
disk block. On most common filesystems, disk blocks cannot be shared
between files, meaning that this block is fully consumed.

That's why for file "a" du shows "4" - meaning 4K bytes (exactly one
block) is consumed on the storage medium by this file.

Similarly for file "b" - its size is 4097, which is 1 byte more than one
filesystem block. Hence, file "b" consumes 2 blocks, coming up to 8K.
du then shows "8" for file "b".
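
The same calculation as a small sketch, assuming a fully allocated
(non-sparse) file and whole-block allocation with no tail packing, inline
data or compression. The function name "allocated_kib" is hypothetical:

```shell
# Hypothetical helper: space consumed by a non-sparse file, in 1K units.
#   $1 = file size in bytes, $2 = filesystem block size in bytes
allocated_kib() {
  size=$1
  block=$2
  blocks=$(( (size + block - 1) / block ))   # whole blocks needed
  echo $(( blocks * block / 1024 ))
}

allocated_kib 1700 4096   # file "a": one block
allocated_kib 4097 4096   # file "b": two blocks
```

With a 4096-byte block size this prints 4 and 8, the values du showed
for "a" and "b".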

Now to your examples:

> %> du -hcs /Volumes/nif64/tmp/.npm/ ; du -hcs --apparent-size
> 340M    /Volumes/nif64/tmp/.npm/
> 180M    /Volumes/nif64/tmp/.npm/
> Same folder on btrfs (mounted with compress=lzo):
> %> du -hcs /mnt/.npm/ ; du -hcs --apparent-size /mnt/.npm
> 198M    /mnt/.npm/
> 181M    /mnt/.npm

In both cases, "du --apparent-size" shows about 180MB of actual data (181MB in the second example). That is the amount of actual content
(number of total bytes in these files).

In the first case, these files consume 340MB of space on your disk.
In the second case, these files consume 198MB of space on your disk.
The reason they consume MORE than their actual data is explained above
with the file-system blocks.

This suggests to me that compression is not accounted for in these
values. If it were, then the consumed size (without "--apparent-size")
should have been less than the actual size (with "--apparent-size").

A quick online search shows that btrfs's default block size is 16K,
while ZFS's default record size is 128KB. That might explain
why a similar amount of data (and I assume, a similar number of files and
sizes) consumes more disk space on ZFS (I could be wrong, though; comments
are welcome).
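
To make the block-size argument concrete, here is a purely arithmetic
sketch. It assumes every file is rounded up to whole blocks of one fixed
size, which real filesystems may not do (small files stored inline,
variable record sizes, compression). The function name "total_allocated"
and the three file sizes are made up for illustration and are not taken
from the examples above:

```shell
# Hypothetical model: total bytes consumed if each file is rounded up
# to whole blocks of the given size.
#   $1 = block size in bytes, remaining arguments = file sizes in bytes
total_allocated() {
  block=$1
  shift
  total=0
  for size in "$@"; do
    total=$(( total + (size + block - 1) / block * block ))
  done
  echo "$total"
}

# The same three made-up files under three block sizes.
total_allocated 4096   1700 4097 20000
total_allocated 16384  1700 4097 20000
total_allocated 131072 1700 4097 20000
```

Under this simplified model the same 25,797 bytes of content take more
space as the block size grows, which is the effect suspected above.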

I hope this helps to clarify "apparent-size".

I'll leave it to others to comment on how compressed file systems
come into play with du.

 - assaf
