bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS


From: George Valkov
Subject: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 19:23:57 +0200


> On 2023-02-09, at 6:32 PM, Pádraig Brady <P@draigBrady.com> wrote:
> 
> On 09/02/2023 15:57, George Valkov wrote:
>>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P@draigBrady.com> wrote:
>>> 
>>> On 09/02/2023 09:20, George Valkov wrote:
>>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>>> While reading the entire file returns valid data, scanning for
>>>> allocated segments may return holes where valid data is present.
>>>> In this case a sparse copy does not contain these segments and returns
>>>> zeroes instead. Once the virtual disk is dismounted and then
>>>> mounted again, a sparse copy produces correct results.
>>>> This breaks OpenWRT build on macOS. Details:
>>>> https://github.com/openwrt/openwrt/pull/11960
>>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>>> Signed-off-by: Georgi Valkov <gvalkov@gmail.com>
>>>> ---
>>>>  src/copy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> diff --git a/src/copy.c b/src/copy.c
>>>> index e16fedb28..4f56138a6 100644
>>>> --- a/src/copy.c
>>>> +++ b/src/copy.c
>>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>>        return PLAIN_SCANTYPE;
>>>>      }
>>>>  -#ifdef SEEK_HOLE
>>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>>    scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>>    if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>>      return LSEEK_SCANTYPE;
>>> 
>>> 
>>> Thanks for the detailed report.
>>> The patch might very well be appropriate.
>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>> In October 2021 macOS Finder was also affected, and that points directly at 
>> Apple.
>> I tested again today, they have fixed Finder. After performing a copy in 
>> Finder,
>> coreutils cp produces a good copy. I have to run
>> make toolchain/gcc/initial/{clean,compile} -j 16
>> before I can reproduce it again with the same file.
>> So while Apple didn't fix the underlaying issue with APFS, they did
>> provide a solution for Finder. And we can make coreutils work correctly too.
> 
> That suggests that Finder may have sync'd the file.
> Now syncing has overheads of course, so not an option to take lightly.

#include "stdio.h"
#include "unistd.h"
#include "fcntl.h"

int main(int argc, char ** argv)
{
    sync();

    int fd = open("cc1", O_RDWR);
    //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
    int a = fdatasync(fd);
    int b = fsync(fd);
    int c = close(fd);

    printf("fdatasync %u  fsync %u  close %u\n", a, b, c);

    return 0;
}

fdatasync 0  fsync 0  close 0

I agree. It takes about one second to run this code.
All calls are successful. The sparse copy after that is still corrupted.
I also tried doing this on the sparse image while it is mounted.

> We may be safer just doing the normal copy on __APPLE__ as per your orig 
> patch.
> A dtruss of Finder might be instructive BTW.

Yes, sync is definitely not good for performance. What is ‘dtruss'?

> Why I suspect syncing is that old fiemap code we used to have in copy
> needed the FIEMAP_FLAG_SYNC flag to avoid bugs like this on some file systems.

I’m not familiar with the coreutils code. Check an old commit.


>>> We avoided copy_file_range() for a long time on Linux for example
>>> until it became usable.
>>> 
>>> Thanks for confirming the latest git is also impacted.
>> Yes, I tested on master today.
>>> I see you're on macOS 12.
>>> macOS 13 supports fclonefileat() which may
>>> avoid the issue, or also be impacted?
>> Yes, I use macOS 12. There are too many complains about 13.
>> I can boot macOS 13 recovery, so that might be an option,
>> if I can setup a working build environment.
>> Can you invite someone with macOS 13 to test the new API?
>> Github has hosted runners. But I’ve never used one. And I think they use 
>> macOS 12.
> 
> OK let's assume fclonefileat() on macOS 13 is fine.
> That's just a thing to consider if one is testing on macOS 13,
> rather than jumping through hoops to test it.
> 
>>> I wonder might an fdatasync(fd) avoid your issue,
>>> which we might do if defined(__APPLE__)?
>> Send me patches, and I will test them.
>> #ifdef SEEK_HOLE
>>   fdatasync(fd); // this did not work, is fd writable? Do we need to call it 
>> on the disk image?
>>    scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>    if (0 <= scan_inference->ext_start || errno == ENXIO)
>>      return LSEEK_SCANTYPE;
> 
> 
> Oh right interesting. When you say doesn't work, do you mean the sync
> fails, or just doesn't help.  Note the separate coreutils sync util
> only opens with write mode on AIX and Cygwin.

fdatasync returns 0, which means success
The sparse copy after that is still corrupted.


> Relatedly you could just try that external `sync file && cp file dest` also 
> to test the sync hypothesis.

I tried these on cc1 and on the sparse image file, the copy still gets 
corrupted.
sync -d cc1
sync -f cc1
sync


> If the sync doesn't work, or requires write mode
> then that might be enough to decide to just avoid SEEK_DATA on __APPLE__ for 
> now.

manpage: On some UNIX systems (but not Linux), fd must be a writable file 
descriptor.
I’m not sure about macOS. The call succeeds if we open with O_RDONLY,
so I guess we don’t need a writable file descriptor.


>> It would be best if we test on an Intel Mac with T2 chip. I use MBP 2019 16”.
>> That security chip and the encrypted physical disk with APFS might also 
>> affect our results.
>> Georgi Valkov
>> httpstorm.com
>> nano RTOS
>> off-topic: can you please point me at API to take a disk offline or lock it 
>> for exclusive access?
>> I wrote a disk cloning tool and I have that functionality on Windows, I’d 
>> like to add it to Linux, BSD and macOS.
> 
> 
> Note it's best to keep this on list.

I apologise, I accidentally used replay instead of replay all.
Is there a safe way to add the group once a message is sent?

Good luck!


Georgi Valkov
httpstorm.com
nano RTOS






reply via email to

[Prev in Thread] Current Thread [Next in Thread]