Re: [Linux-NTFS-Dev] NTFS resizer


From: Anton Altaparmakov
Subject: Re: [Linux-NTFS-Dev] NTFS resizer
Date: Thu, 09 Aug 2001 02:04:37 +0100

At 01:18 09/08/2001, Andrew Clausen wrote:
> On Wed, Aug 08, 2001 at 12:45:22PM +0100, Anton Altaparmakov wrote:
> > You have a structure: the mft record. It contains all attributes nicely
> > sorted.

> It contains the on-disk attributes, not ntfs_attr.

That's my point. There should be no such thing as ntfs_attr. The on-disk representation is entirely sufficient to do everything.

> I think attributes need to be abstracted from MFT records, because
> they may need to be shuffled around between MFTs.  (E.g. an attribute
> gets inserted, or a run list grows...)

Then space is made for this attribute and it is inserted. mkntfs even goes as far as CREATING the attributes in their correct place in the mft record itself, i.e. no memcpy involved for that.

This is the only way to catch when attributes become too big and need to be made non-resident or when they have to be moved out to other mft records.
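To make that in-place space check concrete, here is a minimal sketch. It assumes only the bytes_in_use/bytes_allocated fields of the on-disk mft record header; the struct subset and the helper name are illustrative, not taken from any existing driver:

// Illustrative subset of the on-disk mft record header; the real header
// has many more fields before and after these two.
typedef struct {
        unsigned int bytes_in_use;      // bytes actually used in this mft record
        unsigned int bytes_allocated;   // total size of the mft record
} MFT_RECORD;

// Non-zero if an attribute record of attr_len bytes still fits in place.
// If it does not, the attribute has to be made non-resident or moved out
// to an extension mft record, exactly as described above.
int mft_record_has_space(const MFT_RECORD *m, unsigned int attr_len)
{
        attr_len = (attr_len + 7) & ~7U;        // attribute records are 8-byte aligned
        return m->bytes_in_use + attr_len <= m->bytes_allocated;
}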

> I was thinking this shuffling should happen when the file is synced
> to disk.  (ntfs_file_sync()?)

This is a possible approach, and it is indeed what the old NTFS driver does (very badly, so it trashes your fs most of the time...). I hate that approach (though I would accept it if it were written properly, which I think is extremely hard to do: you either create one single mammoth function which you will have a lot of fun debugging, or you end up with millions of if/else or switch statements to capture all the special cases). It also has the nasty side effect of resetting the attribute sequence numbers, or even reshuffling them completely, which is plain WRONG, but considering we overwrite the journal with 0xff bytes, not too big a problem. BTW, we really need to delete the $Extend\$UsnJrnl file when writing to the partition, or that could screw us badly too, but deletion is not implemented yet at all.

For example, if you are doing a new layout of all attributes from scratch, you will need to check every single attribute for being a certain type and for whether it is allowed to be resident, non-resident, or both; then you need to check whether there is still enough space in the mft record to add it, and if not, you have to start from scratch again or edit what you have done so far to make space. Now, if you start editing what you have already created, you end up having _all_ the functionality required to handle attributes in place in their mft records, AND, because you are doing the over-the-top flush function, you have almost all of that code duplicated there for doing all the checks etc. That can only lead to ugly code IMHO, which is why the NTFS TNG driver does not make use of this approach and instead works on the mft records directly, never keeping copies of attributes anywhere else.

If I were writing it, I would never use the "keep everything in separate structures in memory and then write it to the mft record" approach, but as I said, if you write it all and you get it to work without trashing filesystems every time your code is used, then I will use it. It is a valid approach, it's just a coding nightmare AFAICS.

> BTW: I would cache by file, not MFT in this case.

> So, basically, when you get a file (i.e. a file isn't in the cache,
> and you request it via ntfs_volume_get_file(), or whatever), it would
> do something like:

> MFT_REF _file_parse_mft_record_attrs(ntfs_file *file, MFT_RECORD *rec)
> {
>         for [all attributes in rec] {
>                 attr = malloc(sizeof(ntfs_attr));
>                 _attr_parse(attr, (char *)rec + offset);
>                 _file_add_attr(file, attr);    // add to linked list
>         }

>         return next_record;    // 0 when there is no further extension record
> }

> ntfs_file *ntfs_file_get_from_disk(ntfs_volume *v, MFT_REF mref)
> {
>         ntfs_file       *file;
>         MFT_RECORD      rec;

>         file = malloc(sizeof(ntfs_file));
>         file->dirty = 0;

>         ntfs_volume_read_mft_record(v, mref, &rec);
>         assert(!rec.base_mft_record);
>         _file_parse_mft_record_header(file, &rec);

>         while ((mref = _file_parse_mft_record_attrs(file, &rec)))
>                 ntfs_volume_read_mft_record(v, mref, &rec);

>         return file;
> }

> I don't have any strong attraction to this approach, but it seems
> to be easy to implement, and has a nice interface.  I don't think
> CPU efficiency is an issue.  IO-wise, it looks OK... it supports
> caching, etc.

What about the mft record then? I mean, when you are writing back, which mft record will you write to? The same one (you have to, otherwise you would have to release the previous one and allocate a new one...)? How will you know which one that was?

Also, surely parted will not be working at the file level but much deeper down, at the inode/mft record level? Or will it treat files as opaque structures and use them to access the underlying mft records?

For example, if the resize requires some data to be moved because it would otherwise be left in unallocated space, how would you do that? You need low level control of cluster allocations; file level access is useless in this context.

Also, you will need to rewrite every single run list on the volume, adding or subtracting a fixed delta to every LCN value. You can't do this with file level access either.
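For what it's worth, over a decompressed in-memory run list that delta shift is trivial; the element layout below is an assumption modelled on the TNG driver's style, with negative lcn values marking sparse runs, which must not be shifted:

typedef long long VCN, LCN;

typedef struct {
        VCN vcn;                // first virtual cluster of this run
        LCN lcn;                // first logical cluster, or < 0 for a hole
        long long length;       // run length in clusters; 0 terminates the list
} runlist_element;

// Shift every real LCN by the fixed delta the resize introduces; holes
// (sparse runs) carry no LCN and are skipped.
void runlist_shift_lcns(runlist_element *rl, LCN delta)
{
        for (; rl->length; rl++)
                if (rl->lcn >= 0)
                        rl->lcn += delta;
}

(On disk the run list is stored as compressed mapping pairs in which each LCN is a delta from the previous one, so after re-encoding only the first stored value of each run list actually changes, though its encoded size can grow or shrink.)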

This is why I don't understand why you want to work at the file level...

My getfile would look like:
{
        is buffer in cache? -> yes: return buffer
        -> no: read from disk into new locked buffer()
        post read mst fixup()
        unlock buffer()
        return buffer()
}

Five lines of code or so (minus error handling).

> Huh?  Presumably, the only way you write to the MFT record is via
> syncing a dirty file.  So, you keep the disk in sync with the internal
> representation, not the other way around.

No. In my approach there is no internal representation: disk = internal representation (but copied into a memory cache with mst fixups applied; on write back, mst fixups are applied, the data is committed synchronously, and fast mst fixups are applied to make the data readable in the cache again).
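To spell out what those fixups do, here is a minimal sketch of the post read mst fixup, assuming the standard NTFS update sequence array layout (usa offset and count at byte offsets 4 and 6 of the record, 512-byte fixup blocks) and a little endian host; the function name is illustrative:

#include <stdint.h>

// Undo the multi sector transfer protection on a freshly read record:
// on disk, the last two bytes of every 512-byte block are replaced by
// the update sequence number (usn); the genuine values are saved in the
// update sequence array (usa) in the record header.
int post_read_mst_fixup(uint8_t *b, uint32_t size)
{
        uint16_t usa_ofs = *(uint16_t *)(b + 4);
        uint16_t usa_count = *(uint16_t *)(b + 6);      // usn + one entry per block
        uint16_t *usa = (uint16_t *)(b + usa_ofs);
        uint16_t usn = usa[0], i;

        if ((uint32_t)(usa_count - 1) * 512 != size)
                return -1;                              // corrupt record header
        for (i = 1; i < usa_count; i++) {
                uint16_t *end = (uint16_t *)(b + i * 512) - 1;
                if (*end != usn)
                        return -1;                      // torn multi sector write
                *end = usa[i];                          // restore the real bytes
        }
        return 0;
}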

> ntfs_file_sync() would look something like:

> BOOL ntfs_file_sync(ntfs_file *file)
> {
>         if (!file->dirty) return TRUE;

>         for [attributes] {
>                 // syncs the stream, and creates the ATTR_REC
>                 ntfs_attr_sync(attr);
>         }

>         for [attributes] {
>                 if (pos + attr->a_rec->length > mft_rec_size) {
>                         // write this record
>                         // create / use an existing extension MFT rec
>                 }
>                 memcpy(mft_rec + pos, attr->a_rec, attr->a_rec->length);
>         }
>         // write the record

>         file->dirty = 0;
>         return TRUE;
> }

My file sync would look like:

        for (all mft records owned by file) {
                lock mft record cached copy()
                pre write mst_fixup buffer()
                write to disk()
                fast post write mft fixup buffer()
                unlock buffer()
        }

Simple, only 6 lines of code (minus error handling).
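The matching pre write / fast post write pair from the loop above, sketched under the same assumptions (little endian host, 512-byte fixup blocks; the reserved usn values 0 and 0xffff are skipped); again, the names are illustrative:

#include <stdint.h>

// Apply mst protection before writing: bump the usn, save the real last
// two bytes of each 512-byte block into the usa, then overwrite them
// with the usn so torn writes become detectable.
void pre_write_mst_fixup(uint8_t *b)
{
        uint16_t usa_ofs = *(uint16_t *)(b + 4);
        uint16_t usa_count = *(uint16_t *)(b + 6);
        uint16_t *usa = (uint16_t *)(b + usa_ofs);
        uint16_t usn = usa[0] + 1, i;

        if (!usn || usn == 0xffff)
                usn = 1;                        // 0 and 0xffff are reserved
        usa[0] = usn;
        for (i = 1; i < usa_count; i++) {
                uint16_t *end = (uint16_t *)(b + i * 512) - 1;
                usa[i] = *end;                  // save the real bytes
                *end = usn;                     // protect the block
        }
}

// After the write has completed, put the real bytes back so the cached
// copy is directly readable again; nothing to verify, hence "fast".
void fast_post_write_mst_fixup(uint8_t *b)
{
        uint16_t *usa = (uint16_t *)(b + *(uint16_t *)(b + 4));
        uint16_t usa_count = *(uint16_t *)(b + 6);
        uint16_t i;

        for (i = 1; i < usa_count; i++)
                *((uint16_t *)(b + i * 512) - 1) = usa[i];
}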

> I'm not convinced we want this [un]map() thing on records.  Records
> aren't accessed on their own, except in the context of a file.  Files
> are the atomic pieces... So, I think we should just have {read,write}
> on records, and [un]map() on files, although I've called it get()
> here.  (unmap() is just free() on a clean file)

They are in my implementation... Files are not the same thing as mft records: directories are mft records too, there are empty mft records not representing files at all, and there are the system files, which need special handling in places, so you can't just treat them as files. The MFT especially is a file containing itself! Everything to do with the mft needs special handling.

> > The user space side doesn't have to work this way. It just
> > would be nice if the two are consistent from a maintenance point of view...
> > or at least that they are as consistent as possible.

> Yeah, I can understand this.  But I think the kernel code is going to
> be a nightmare here... (there's a reason it hasn't been written yet ;)

Yes, probably. It would basically require an implementation of the kernel's VFS and memory management in userspace... I was thinking of misusing UML (User Mode Linux), or parts of it, for this if I can.

> > Having said all that: if you were to implement something different and it
> > works nicely then I would have no problem with using it. I would much
> > rather use something that doesn't fit my design but DOES work nicely and
> > has been written already rather than not have anything because I have no
> > time to write it myself!

> That's very open-minded of you :)

Well, that's how I am... (-;

That still stands. Even if you go with the approach I don't like, I am willing to accept it...

Anton


--
  "Nothing succeeds like success." - Alexandre Dumas
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/



