avr-gcc-list
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [avr-gcc-list] avr-g++


From: Joerg Wunsch
Subject: Re: [avr-gcc-list] avr-g++
Date: Tue, 28 May 2002 16:04:14 +0200
User-agent: Mutt/1.2.5i

As Marek Michalkiewicz wrote:

> > Perhaps not even malloc().  avr-libc offers a malloc(), but its
> > implementation as pretty simple and has a number of deficiencies.  (If
> > i get around, i'll try to make it a bit more efficient.)  Apart from
> > that, i've got an application that is using it (a one-shot timer
> > service that allows for many concurrent timers in one program), so i
> > can confirm that malloc() works.
> 
> Good to know that malloc works, as I haven't tested it for a long time.
> Even better to know that you want to improve it :)

OK, i got around to improve it. ;-)

First, the two main problems with the malloc() implementation are:

. Since we've got no hardware memory management, there's no guarantee
  that stack and heap don't collide.  The old implementation tried to
  work around that problem by ensuring that the heap won't grow beyond
  20 bytes below the current stack pointer value.  So all possible
  stack frames of interrupt routines that could interrupt malloc(),
  and all stack frames of functions to be invoked in the future (which
  could possibly recurse much deeper than the function currently
  calling malloc()) need to fit into this 20 byte space reserve.

  This seems to be about the best possible option, however the 20
  bytes have been a "magic" number and could not be influenced by the
  user.  I've tried to improve this by introducing a global variable
  "malloc_margin" (initialized to 20) that can be adjusted by the
  application program in order to modify this space reserve.  This
  adjustment should be done /before/ the very first call to malloc().

  Also, documentation needs to mention these problems.

. The biggest problem with the current implementation was that it
  tends to fragmentize memory.  Once a call to malloc() had ``sliced
  off'' a chunk of memory, the size of this chunk had been ``set in
  stone'' forever, even after free()ing the block again.  It could
  only be re-used, ever, if another malloc() request of the same size
  was issued.

  It turned out that working around this restriction resulted in a
  complete rewrite of malloc.c.  The only line of code that remained
  was the assembler macro to access the current stackpointer
  value. ;-)

  The new implementation maintains freed up space on a freelist.  When
  freeing memory blocks, attempts are made to aggregate the block
  currently being freed with adjacent blocks already on the free list.
  So after freeing all blocks that have once been allocated, the
  entire heap is collected in a single large entry in the free list.

  In order to get a good compromise between execution time and wasted
  memory, the following allocation strategy has been implemented:

  - The freelist is walked, looking for a block that would fit
    exactly.  If one could be found, it is used immediately, since
    there can't be any better match at all.

  - While walking the list in the first step, the size of the smallest
    memory segment that would still fit the request is noticed.  If we
    didn't find an exact match, step 2 will then use the (first) block
    of that size to satisfy the request.  In case the block found was
    only slightly larger than needed (so little that no further
    freelist entry would fit into the remainder), this block is used
    entirely.  Otherwise, the block found will be split into two
    parts, one to be returned to the caller, and another one that
    still remains on the freelist.  For practical reasons, the latter
    will be formed from the lower part of the block, while the upper
    part is returned to the user.  That way, the freelist itself
    doesn't need to be adjusted during this operation, only the new
    size of the current entry needs to be recorded.

  - If step 2 didn't yield a block satisfying the request, this means
    we need to increase the heap in order to obtain more memory.
    That's the point where the current stackpointer value and the
    "malloc_margin" variable mentioned above come into the game: if
    the new allocation will result in increasing the heap still below
    that limit, the request can be satisfied.

  - If this didn't help, we stand no chance, and return NULL to the
    caller.

  Note that the above process of increasing the heap is irreversible;
  memory that has once been "obtained" for the heap won't be
  "returned" to the (non-existing :) environment.  It can only be
  returned to the freelist, available for future allocations.  A
  strategy that will reduce the heap size when freeing memory could be
  implemented, but IMHO there's not much point in having it, compared
  to the additional cost in free() that would be required.  The only
  situation where it would help is one where all dynamically allocated
  memory segments have been free()d again before someone starts a deep
  recursion of functions that requires a lot of stack.

  Even then, the existing implementation would suffer no harm since
  the only four bytes of the freelist in that case are right at
  __bss_end, and trashing memory by the stack that in theory belongs
  to the range described by this freelist entry would cause nothing
  bad to happen.  (The situation is, however, totally different if
  even a single object is malloc()ed since heap re-use starts at
  top of stack, thus collides with the overflowing stack.)

The implementation should be ISO C conformant.  free() tests for a
NULL pointer (ISO C demands this -- the old implementation failed to
implement this).  free()ing already free memory results in ``undefined
behaviour'', namely in a disaster.  Adding checks to prevent lusers
from doing so will increase the complexity of the implementation and
thus the cost of the function calls and the waste of memory needed to
maintain the heap blocks.  Trying to allocate 0 bytes of memory per
ISO C can (optionally) result in a behaviour ``as if the size were
some nonzero value'', namely 2, as it would also be the case for
trying to allocate just a single byte.  We always need 2 bytes space
in order to fit the freelist's "nx" pointer into the block.  Needless
to say, even though ISO C doesn't allow this allocation to be used for
any object, it needs to be freed up anyway.

I think it would be possible to implement realloc() and calloc() as
well, but it should preferably be done in a separate file then since
in particular the implementation of realloc() will consume quite some
space, and not everybody who's going to link malloc() into their
applications wants to pay for the space of realloc().  Unfortunately,
this would preclude the use of static variables for the malloc
internals, so we had to move towards using __xxx variables (reserved
for the ``implementation'') that can be shared between various library
object files then.  The main advantage of realloc over a user-crafted
function is supposedly that it could in theory decide whether the
existing piece of memory is sufficient to fit the additional required
space, so no memcpy() is needed.  However, due to the memory
allocation strategy outlined above, this will only succeed for blocks
of memory that have been (freshly) allocated on top of the heap, since
for re-used space, allocation always happens from the top of the
freelist block to be used.  Thus, realloc() is likely to fall back to
a malloc(new block) -> memcpy(new block, old block) -> free(old block)
strategy after dynamic memory has been used quite a bit in the
application.

If people think this is bad, it could as well be done to split up the
memory blocks in step 2 mentioned above the other way, i. e. return
the lower part to the caller, and maintain the new freelist entry in
the upper part.  On the pro side, this might help a realloc()
implementation to extend a block of memory in-place.  On the con side,
it'll require another freelist walk in order to record the new
freelist entry, since the freelist is a singly-linked list.  (The link
fields need to fit into the allocated space that is normally owned by
the user, so the size of the "nx" field determines the minimal block
size that can be handled by malloc().  Blocks smaller than this need
to be enlarged to that size so they could later fit a complete
freelist entry.  That's why using a doubly-linked list is not a good
option.  Also, maintaining two link pointers requires even more code.)

I've added a small main() function that i've been using for regression
testing the implementation on a host machine.  So far, it looks good,
memory doesn't seem to ``get lost'' or gets fragmentized.  Note that
this test program requires a host machine to run that can access
unaligned pointers since this is what the AVR can do anyway, and i
didn't want to add to the complexity just for the ability to also run
my tests on RISC machines that can't access unaligned pointers.

Ironically, even though the new memory allocation strategy sounds
quite a bit more complex than the old one used to be, the code
required for malloc() is even smaller than it was in the old
version. ;-)  However, of course, the new version of free() which needs
to handle the freelist block aggregation costs a lot more of code than
the old one that just cleared a bit in the freed block, and was done.

I welcome any comments about the code or the implementation strategies
described above.

Marek, if you feel more comfortable with the FSF-approved three-clause
BSD-style copyright, feel free to replace my copyright notice by that
one.  I don't care, i've just added my ~/src/Copyright file on top.

-- 
J"org Wunsch                                           Unix support engineer
address@hidden        http://www.interface-systems.de/~j/

Attachment: newmalloc.c
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]