[lmi] Validating paths with std::filesystem


From: Vadim Zeitlin
Subject: [lmi] Validating paths with std::filesystem
Date: Thu, 8 Oct 2020 15:22:58 +0200

 Hello,

 Sorry for yet another distracting question, but this is relatively
important and, since you've just made 14d582d49 (Trap exceptions from
filesystem library, 2020-10-07), you might already have the answer in your
L1 cache, so I'd like to ask it now, before it's evicted:

 Do we need to validate paths at all when using std::filesystem?

 Currently, using invalid characters in path components results in an
exception, as you've seen with the commit above. (I'm not sure exactly how
this exception can be reproduced: if you could explain what needs to be in
configurable_settings.xml to trigger it, that would be useful for testing
the new std::filesystem branch.) But with std::filesystem there are no such
exceptions at all, i.e. it simply doesn't do any validation of the path
components, unlike Boost.Filesystem.
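
 To make the difference concrete, here is a minimal sketch of what "no
validation" means in practice. The helper names path_ctor_throws() and
check_no_validation_on_construction() are made up for this example, and the
behaviour shown is what I'd expect from the common standard library
implementations rather than something the standard spells out as a
guarantee:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: reports whether merely constructing a path from 's'
// throws. Nothing is accessed on disk here.
bool path_ctor_throws(std::string const& s)
{
    try
    {
        fs::path p(s);
        (void)p;
        return false;
    }
    catch(...)
    {
        return true;
    }
}

// Hypothetical self-check: names using characters or device names that MSW
// reserves are still accepted at construction time.
void check_no_validation_on_construction()
{
    assert(!path_ctor_throws("foo<bar>|baz?.txt"));
    assert(!path_ctor_throws("NUL.txt"));

    // The "invalid" component is stored and returned unchanged.
    fs::path const p("a*b");
    assert(p.filename() == fs::path("a*b"));
}
```

 Calling check_no_validation_on_construction() from a test should pass
silently; any error would only appear later, when the path is actually
passed to a function that touches the file system.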

 The question is whether this is really a problem. In principle, constructing
an invalid path in memory doesn't do any harm, as long as any attempt to
actually use it results in an error -- as it will under MSW. But this is
different from the current behaviour, and it could be argued that detecting
the problem earlier is better than detecting it later. So do you think we
should implement our own path validation on top of std::filesystem::path,
just as we've already done with our own formatting, used instead of the
default behaviour of the standard class?

 And if you do think it's worth doing, should we faithfully reproduce the
partial path validation done by the Boost.Filesystem version we currently
use, or should we actually perform the full validation according to the MSW
rules described at
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
which I'll quote here in case this URL changes:

        * Use any character in the current code page for a name, including
          Unicode characters and characters in the extended character set
          (128–255), except for the following:

          - The following reserved characters:
                < (less than)
                > (greater than)
                : (colon)
                " (double quote)
                / (forward slash)
                \ (backslash)
                | (vertical bar or pipe)
                ? (question mark)
                * (asterisk)

          - Integer value zero, sometimes referred to as the ASCII NUL
            character.

          - Characters whose integer representations are in the range from 1
            through 31, except for alternate data streams where these
            characters are allowed. For more information about file streams,
            see File Streams.

          - Any other character that the target file system does not allow.

        * Do not use the following reserved names for the name of a file:

          CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7,
          COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and
          LPT9. Also avoid these names followed immediately by an extension;
          for example, NUL.txt is not recommended.

 We could implement all of these rules except the filesystem-dependent one.
Boost.Filesystem doesn't check for all of them (e.g. it makes no attempt to
check for any of the reserved file names), and if we do implement
validation, I think it would be better to be complete.
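
 For illustration only, the rules quoted above could be checked roughly as
follows. This is a sketch, not a proposal for the final code: the function
name is_valid_msw_filename() is hypothetical, it checks a single path
component rather than a whole path, and it makes a few assumptions that the
quoted text doesn't state explicitly: that an empty name is invalid, that
reserved names are matched case-insensitively, and that everything after the
first dot counts as the extension. It ignores the alternate data stream
exception and, of course, the filesystem-dependent rule.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: checks one path component (not a full path) against
// the MSW naming rules quoted above.
bool is_valid_msw_filename(std::string_view name)
{
    // Assumption: an empty component is never a valid file name.
    if(name.empty())
        return false;

    constexpr std::string_view reserved_chars = "<>:\"/\\|?*";
    for(unsigned char const c : name)
    {
        // Rejects NUL (0) and the control characters 1 through 31.
        if(c < 32)
            return false;
        if(reserved_chars.find(static_cast<char>(c)) != std::string_view::npos)
            return false;
    }

    // Assumption: the part before the first dot is compared, ignoring
    // case, so "nul.txt" is rejected just like "NUL".
    std::string stem(name.substr(0, name.find('.')));
    std::transform
        (stem.begin(), stem.end(), stem.begin()
        ,[](unsigned char ch) { return static_cast<char>(std::toupper(ch)); }
        );

    static constexpr std::string_view reserved_names[] =
        {"CON", "PRN", "AUX", "NUL"
        ,"COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9"
        ,"LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
        };
    return std::find
        (std::begin(reserved_names), std::end(reserved_names), stem
        ) == std::end(reserved_names);
}
```

 With this sketch, "report.txt" and "COM10" are accepted, while "a<b",
"COM1" and "nul.txt" are rejected. Validating a complete path would
additionally require splitting it into components and treating the root
name (e.g. "C:") separately, which is not shown here.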

 So, as usual, there are several choices:

0. Don't implement any validation at all.
1. Implement minimal validation exactly reproducing the current behaviour.
2. Implement [as] full validation [as possible].

 I think (0) should actually be fine, but if not, then I'd rather do (2)
than (1).

 What do you think?
VZ
