[bug-mailutils] iterating through large mailbox: memory consumption

From: Robby Villegas
Subject: [bug-mailutils] iterating through large mailbox: memory consumption
Date: Thu, 3 Mar 2005 12:06:01 -0600

I am writing a program that reads all messages in a mailbox in
sequence, say to convert the mailbox into a structured representation,
or to compile on-the-fly statistics regarding headers, or similar.

One of the mailboxes contains spam sent to a domain.  The sample I
have is a Unix mbox file with 45,000 messages.  Iterating through the
messages with

  mailbox_messages_count(mboxObj, &numMessages);

  for (m = 1; m <= numMessages; ++m) {
    mailbox_get_message(mboxObj, m, &msgObj);
    /* ... some code ... */
    message_destroy(&msgObj, message_get_owner(msgObj));
  }

consumes close to 300 MB of RAM on my machine.

Is there a way to go through the messages one-by-one without using
memory proportional to the total file size?  Maybe I'm doing something
wrong.

I looked at the supplied frm.c, and also ran frm on my big mailbox,
but found that it also consumes a lot of memory.

Robby Villegas

P.S.  Actually, the count alone, mailbox_messages_count(mboxObj,
&numMessages), consumes this much memory.  Seeking to a message at the
end with, say, mailbox_get_message(mboxObj, 45000, &msgObj), without
computing the count first (since I know that 45000 is valid here),
consumes the same amount.
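
As a point of comparison for the question above, a single pass whose
memory use does not depend on file size can be sketched without
mailutils at all.  This is only a minimal sketch under stated
assumptions: count_mbox_messages is a hypothetical helper, not a
mailutils function, and it treats every line that begins with "From "
as a message separator, which is a simplification of the mbox
conventions (it does not check for a preceding blank line).  It shows
the kind of constant-memory traversal being asked for, not how
mailutils works internally.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of mailutils.  Counts messages in an
   mbox stream by scanning for lines that start with "From ", using a
   fixed-size buffer regardless of how large the file is. */
static long count_mbox_messages(FILE *fp)
{
    char buf[4096];
    long count = 0;
    int at_line_start = 1;  /* nonzero when buf holds the start of a line */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (at_line_start && strncmp(buf, "From ", 5) == 0)
            ++count;

        /* If the chunk did not end in a newline, the current line is
           longer than the buffer and continues in the next chunk. */
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return count;
}
```

A caller would open the mbox file with fopen(path, "r"), pass the
stream to count_mbox_messages, and close it afterwards.  The only
storage used is the 4096-byte buffer, so the footprint stays the same
for 45 messages or 45,000.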

