Re: [libreplanet-discuss] Dropbox vs. Linux
Mon, 13 Aug 2018 15:20:45 -0400
On Sun, Aug 12, 2018 at 07:46:15PM -0400, Arthur Torrey wrote:
> I am not a Dropbox user, so don't know a huge amount about it...
> However a friend was just posting that apparently DropBox is dropping
> their support for most of Linux - according to the SlashDot thing he
> linked to, DropBox is only going to support files from non-encrypted
> Ext4 file systems...
> He was wondering about alternatives....
> I pointed him at OwnCloud and a few others I found by searching for
> DropBox in the FSF Free Software Directory, but any other suggestions?
> I will say that I am a bit perplexed by the description of what
> DropBox is supposedly doing. My understanding has always been that
> all the file handling was done by the O/S (Gnu/Linux or other) in a
> transparent manner - you (a program) requests read/write on a file and
> the O/S handles it through the file system driver such that it always
> looks the same to the requester... If so, how would DropBox even
> know what file system you were using, let alone care???
Knowing is easy--the filesystem type ID is available by invoking a system
call on a directory (see 'stat -f .' output).
There are innumerable filesystem features on Linux, and each Linux
filesystem is free to accept, reject, or ignore each one individually
with assorted differences in behavior. There are also core design
assumptions that are different on each filesystem, such as: whether
the filesystem has unique stable inode numbers or not, whether each
logical block in a file has a unique stable linear address on disk
or not, and whether readdir() enumerates every file exactly once in a
directory if the directory is modified between calls. ext4 meets all
of the above assumptions, but many filesystems do not--especially the
popular modern ones.
Common VFS requirements force Linux filesystem implementations to produce
highly convincing emulations of some features that are not present in
the underlying filesystem. The veil can be pierced by tools that rely
on the features really existing.
For example, a common optimization is to assume that inode numbers are stable, so
a tool may cache inode numbers and some stat fields in a table to improve
performance. If inode number and ctime observed last week are different
from the inode number and ctime observed today, the tool might assume
the file contents have changed and perform an expensive update operation.
If the filesystem doesn't have inode numbers, the Linux implementation may
generate the inode number and store it temporarily in memory, or the ctime
may be derived from other file attributes like data modification time
and change at times not specified by POSIX. This difference in behavior
can make a tool run very slowly through a lot of false positives and may
even produce incorrect results. Many tools that synchronize data between
two filesystem trees have some variant of this optimization, and even
tools like svn, git, and rsync can fail on some filesystems because of it
(note most such tools also have features to turn this optimization off).
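A minimal sketch of that kind of cache, to make the failure mode concrete. The names `fingerprint` and `changed_paths` are invented for this example and are not taken from any of the tools mentioned; real tools track more fields and persist the table on disk.

```python
import os

def fingerprint(path):
    """Cheap identity for a file: inode number, ctime and size."""
    st = os.stat(path)
    return (st.st_ino, st.st_ctime_ns, st.st_size)

def changed_paths(paths, cache):
    """Return the paths whose fingerprint differs from the cached one.

    cache maps path -> fingerprint and is updated in place. A path that
    is not in the cache yet counts as changed.
    """
    changed = []
    for p in paths:
        fp = fingerprint(p)
        if cache.get(p) != fp:
            # This is where the expensive re-read or re-upload would happen.
            changed.append(p)
            cache[p] = fp
    return changed
```

On a filesystem where st_ino is generated in memory and differs after a remount, every cached fingerprint mismatches, so every file is reported as changed even though no data was touched.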
Filesystem support for overlay, NFS, the full gamut of available stat
attributes, and the specific quirks of deduplication, update notification,
free space estimation, and hole mapping are the ones I struggle with on
an every-other-week basis. Some of these require filesystem-specific
workarounds to make applications work properly, others make some
applications practically unusable with some filesystems.
Quite often things seem to work, but then a favorite tool silently fails
when it transitions from dev/test to production workloads, because the
system's capacity to maintain compatibility with a non-portable assumption
is exceeded. This case is a nightmare for support personnel, who then
have to deeply understand the interaction between their application
and _two_ Linux filesystems in worst-case situations to deal with the
I don't know the specific DropBox requirements, but if they have to call
out ext4 encryption (as opposed to dm-crypt) then they are almost
certainly relying on assumptions or features that are specific to ext4
and maybe a handful of other filesystems. It's likely that DropBox would
use at least some ext4-specific extensions or have some non-portable
assumptions baked into their code as a result of aggressive performance
optimization efforts or inappropriate design experience carried over
from other operating systems.
It's also possible that DropBox merely wants to minimize the number of
filesystem-specific extensions that they support to a subset of those
supported by ext4 (i.e. they think ext4 has too many features already
and want to avoid overcommitment of developer resources). That would
also explain calling out ext4 encryption, since that is a feature that
only three Linux filesystems support, and it's not a feature DropBox can
ignore or override and still mostly work properly. xattrs, attribute
bits, sparse files, and ACLs can be ignored or disabled, but if you
don't go out of your way to handle the crypto extension properly you
get a garbage file or no file at all.
> (Of course I've never understood why you would want to store your
> files on other people's computers to begin with, but that is a separate
A data transport is a data transport, whether it's implemented on top
of IP datagrams or a giant subsidized honeypot.
> Arthur Torrey - <address@hidden>