bug-parted

Re: [Linux-NTFS-Dev] NTFS resizer


From: Andrew Clausen
Subject: Re: [Linux-NTFS-Dev] NTFS resizer
Date: Thu, 16 Aug 2001 18:25:36 +1000
User-agent: Mutt/1.2.5i

Hi,

Sorry for taking so long... non-maskable interrupts (maintenance!)

On Thu, Aug 09, 2001 at 02:04:37AM +0100, Anton Altaparmakov wrote:
> At 01:18 09/08/2001, Andrew Clausen wrote:
> >On Wed, Aug 08, 2001 at 12:45:22PM +0100, Anton Altaparmakov wrote:
> > > You have a structure: the mft record. It contains all attributes nicely
> > > sorted.
> >
> >It contains the on-disk attributes, not ntfs_attr.
> 
> That's my point. There should be no such thing as ntfs_attr. The on disk 
> representation is entirely sufficient to do everything.

I guess this is OK with lots of accessor functions, etc.
(These are necessary for endianness, and for making things like
accessing the name easier.)
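For what it's worth, the accessors I have in mind look roughly like this; a sketch in Python for illustration (the offsets follow the standard on-disk NTFS attribute record header layout, but the function names are mine, not from any driver):

```python
import struct

def attr_type(rec: bytes) -> int:
    # Attribute type code: little-endian u32 at offset 0.
    return struct.unpack_from("<I", rec, 0)[0]

def attr_length(rec: bytes) -> int:
    # Total length of this attribute record: little-endian u32 at offset 4.
    return struct.unpack_from("<I", rec, 4)[0]

def attr_name(rec: bytes) -> str:
    # Name length (u8 at offset 9, counted in UTF-16 code units) and
    # name offset (little-endian u16 at offset 10).
    name_len = rec[9]
    name_off = struct.unpack_from("<H", rec, 10)[0]
    return rec[name_off:name_off + 2 * name_len].decode("utf-16-le")
```

The point is just that all endianness conversion and layout knowledge lives in one small place, so the rest of the code never touches raw offsets.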

> >I think attributes need to be abstracted from MFT records, because
> >they may need to be shuffled around between MFTs.  (Eg: an attribute
> >gets inserted, or a run-list grows...)
> 
> Then space is made for this attribute and it is inserted. mkntfs even goes 
> as far as CREATING the attributes in their correct place in the mft record 
> itself, i.e. no memcpy involved for that.
> 
> This is the only way to catch when attributes become too big and need to be 
> made non-resident or when they have to be moved out to other mft records.

Why "the only way"?  BTW: when you need to move an attribute into
another mft record... you might be able to move it into a record with
lots of free space, etc.

I would have thought this would be easy to do at "clean" time.
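For example, picking the target record could be as simple as this sketch (the free-space bookkeeping and the 1024-byte record size are assumptions of mine, not anything in the driver):

```python
def pick_record(records, attr_size, record_size=1024):
    """Return the index of the mft record with the most free space that
    can still hold attr_size bytes, or None if none fits.
    `records` maps record index -> bytes already used (hypothetical)."""
    best, best_free = None, -1
    for idx, used in records.items():
        free = record_size - used
        if free >= attr_size and free > best_free:
            best, best_free = idx, free
    return best
```

A best-fit choice like this tends to leave the other records' free space intact for later growth.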

> >I was thinking this shuffling should happen when the file is synced
> >to disk.  (ntfs_file_sync()?)
> 
> This is a possible approach and is indeed what the old NTFS driver does 
> (very badly so it trashes your fs most of the time...). I hate that 
> approach (but I would accept it, if it were written properly, which I think 
> is extremely hard to do because you are creating one single mammoth 
> function which you will have a lot of fun to debug or alternatively you 
> will have millions of if/else or switch statement to capture all the 
> special cases).

I like this approach for exactly the opposite reason: it has fewer
special cases, and is much more elegant (almost trivial!)

Not a single mammoth function, but a single miniature one...

Maybe I'm not understanding something.

> - It also has nasty side effect of resetting the attribute 
> sequence numbers and even reshuffling them completely which is plain WRONG 
> but considering we overwrite the journal with 0xff bytes it is not too
> big a problem.

why?  (BTW: the update is completely atomic)

(Why is shuffling wrong?)  I'm new to all of this!

> btw. We really need to delete the $Extend\$UsnJrnl file when 
> writing to the partition or that could screw us badly, too, but deletion is 
> not implemented yet at all.

Why?

> For example if you are doing a new layout of all attributes from scratch 
> you will need to check every single attribute for being a certain type and 
> for whether it is allowed to be resident or non-resident or both, then you 
> need to check whether there is still enough space in the mft record to add 
> it and if not you have to start from scratch again or you have to edit what 
> you have done so far to make space.

Ah, the knapsack problem strikes again (I run into this a bit in
partitioning!).  Starting "from scratch" is no worse.  It's NP-hard
either way.

> - Now if you start editing what you 
> have already created you end up having _all_ the functionality required to 
> handle attributes in place in their mft records AND because you are doing 
> the over the top flush function you have almost all the code duplicated 
> there for doing all the checks etc.

I don't see why... it still seems rather trivial to me.

> What about the mft record then? I mean when you are writing back which mft 
> record will you write to? The same one (you have to otherwise you would 
> have to release the previous one and allocate a new one...)? How will you 
> know which one that was?

No problem: writing to the attribute involves no allocation.  So when
you rearrange the MFT's own MFT records, they just move, and that
doesn't hinder writing them.

> Also, surely parted will not be working at file level but much deeper below 
> in the inode/mft record level? Or will it not treat files as opaque 
> structures and use them to access the underlying mft records?

Well, inode == file, NOT mft, IMHO.  It should work at the inode level, yes.

> For example if the resize requires some data to be moved because it would 
> be left in unallocated space otherwise, how would you do that?  You need low 
> level control of cluster allocations, file level access is useless in this 
> context.

Well, in the first pass (traversing all inodes), it marks all blocks
(in a big in-memory bitmap) that need to be copied/relocated.  (This
includes the above-mentioned blocks, and also the MFT, for doing the
atomic update trick)

So, when copying blocks to free space, we need to allocate clusters, yes.
No big deal.
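The first-pass bookkeeping I mean is just a plain bitmap over clusters; a minimal sketch (the class and method names are mine):

```python
class ClusterBitmap:
    """In-memory bitmap marking clusters that must be copied/relocated
    before the resize (first-pass bookkeeping, one bit per cluster)."""

    def __init__(self, nr_clusters):
        self.bits = bytearray((nr_clusters + 7) // 8)

    def mark(self, lcn):
        self.bits[lcn >> 3] |= 1 << (lcn & 7)

    def is_marked(self, lcn):
        return bool(self.bits[lcn >> 3] & (1 << (lcn & 7)))

    def mark_run(self, lcn, length):
        # Mark a whole run of contiguous clusters.
        for c in range(lcn, lcn + length):
            self.mark(c)
```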

> Also you will need to rewrite every single run list on the volume by 
> adding/subtracting a fixed delta to every LCN value. - You can't do this at 
> a file level access either.

File == inode.  Files/inodes have attributes, not MFT records.
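And the run-list rewrite itself is mechanical once the run lists are decoded; a sketch (the `(lcn, length)` tuple representation is a simplification of the on-disk mapping pairs, and sparse runs must be left alone):

```python
def shift_runlist(runlist, delta):
    """Add a fixed delta to every LCN in a decoded run list.
    A run is (lcn, length); lcn None marks a sparse (unallocated) run,
    which has no LCN and must not be shifted."""
    return [(lcn + delta if lcn is not None else None, length)
            for lcn, length in runlist]
```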

> This is why I don't understand why you want to work on a file level...
> 
> My getfile would look like:
> {
>          is buffer in cache? -> yes: return buffer

what is buffer?
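If "buffer" means a cached copy of the mft record, then the getfile pseudocode above is the usual get-or-load pattern; a sketch (the reader callback and names are hypothetical):

```python
class RecordCache:
    """'is buffer in cache? -> yes: return buffer'; otherwise read the
    record from disk and insert it.  A sketch of the cache half of the
    getfile pseudocode quoted above."""

    def __init__(self, read_fn):
        self.read_fn = read_fn   # hypothetical reader: record number -> bytes
        self.buffers = {}

    def get(self, mft_no):
        buf = self.buffers.get(mft_no)
        if buf is None:
            buf = self.read_fn(mft_no)
            self.buffers[mft_no] = buf
        return buf
```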

> my file sync would look like:
> 
>          for (all mft records owned by file) {
>                  lock mft record cached copy()
>                  pre write mst_fixup buffer()
>                  write to disk()
>                  fast post write mft fixup buffer()
>                  unlock buffer()
>          }
> 
> Simple, only 6 lines of code (minus error handling).

But it doesn't handle run lists overflowing.
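For reference, the mst fixup steps in that sync loop are the standard multi-sector transfer protection: save the last two bytes of each 512-byte sector into the update sequence array, then stamp the update sequence number there.  A sketch in Python (the `usa_ofs` value and function names are my assumptions, not the driver's API):

```python
import struct

SECTOR = 512

def pre_write_fixup(buf, usa_ofs, usn):
    """Before writing: store `usn` at the head of the update sequence
    array, save the last 2 bytes of every 512-byte sector into the
    array, then overwrite those bytes with `usn`."""
    struct.pack_into("<H", buf, usa_ofs, usn)
    for i in range(len(buf) // SECTOR):
        end = (i + 1) * SECTOR - 2
        buf[usa_ofs + 2 + 2 * i : usa_ofs + 4 + 2 * i] = buf[end:end + 2]
        struct.pack_into("<H", buf, end, usn)

def post_read_fixup(buf, usa_ofs):
    """After reading: verify every sector's trailing USN matches (else a
    torn multi-sector write happened), then restore the saved bytes."""
    usn = struct.unpack_from("<H", buf, usa_ofs)[0]
    for i in range(len(buf) // SECTOR):
        end = (i + 1) * SECTOR - 2
        if struct.unpack_from("<H", buf, end)[0] != usn:
            raise IOError("torn multi-sector write detected")
        buf[end:end + 2] = buf[usa_ofs + 2 + 2 * i : usa_ofs + 4 + 2 * i]
```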

> >I'm not convinced we want this [un]map() thing on records.  Records
> >aren't accessed on their own, except in the context of a file.  Files
> >are the atomic pieces... So, I think we should just have {read,write}
> >on records, and [un]map() on files, although I've called it get()
> >here.  (unmap() is just free() on a clean file)
> 
> They are in my implementation... Files have nothing to do with mft records. 
> Directories are mft records, too.

Directories are files.  Maybe this is the source of our misunderstanding.
I'm saying file == inode == "set of MFT records".

I'll call it inode if that sounds better.  (Just, I thought that was
NTFS terminology... sorry!)

Andrew



