[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gawk 3.1.5, patch for multibyte locales

From: Andrew J. Schorr
Subject: Re: gawk 3.1.5, patch for multibyte locales
Date: Mon, 24 Jul 2006 16:03:07 -0400
User-agent: Mutt/

Hi Arnold,

On Sun, Jul 23, 2006 at 10:30:56PM +0300, Aharon Robbins wrote:
> The problem with gawk and multibyte locales has come up enough that it's
> time to post a patch.  This is extracted from my current code base.
> It is untested but should do the trick.

Thanks for posting this patch.  I have a few questions.

In node.c:mk_number, you have this patch:

@@ -344,11 +341,7 @@ mk_number(AWKNUM x, unsigned int flags)
        r->stref = 1;
        r->stptr = NULL;
        r->stlen = 0;
-#if defined MBS_SUPPORT
-       r->wstptr = NULL;
-       r->wstlen = 0;
-       r->flags &= ~WSTRCUR;
-#endif /* MBS_SUPPORT */
+       free_wsptr(r);
 #endif /* GAWKDEBUG */
        return r;

That seems to be a typo: it should be free_wstr, not free_wsptr
(but doesn't really matter because it's inside #ifdef GAWKDEBUG).

+/* free_wstr --- release the wide string part of a node */
+free_wstr(NODE *n)
+       if ((n->flags & WSTRCUR) != 0) {
+               assert(n->wstptr != NULL);
+               free(n->wstptr);
+       }
+       n->wstptr = NULL;
+       n->wstlen = 0;
+       n->flags &= ~WSTRCUR;

I'm wondering about the safety of this function.  It seems to me
that this implementation would be safer:

free_wstr(NODE *n)

        if ((n->flags & WSTRCUR) != 0) {
                assert(n->wstptr != NULL);
                n->wstptr = NULL;
                n->wstlen = 0;
                n->flags &= ~WSTRCUR;

My concern here relates to whether it is safe to access
the wstptr field of the NODE union in the event that WSTRCUR
was not set.  As I understand it, the wstptr and wstlen
fields of the union are not defined unless the WSTRCUR flag
is set.  If that flag is not set, then it seems to me that
we have no idea whether the wstptr and wstlen parts of the union
are actually currently allocated for this purpose, unless
we presuppose that this function is never called on a NODE
of the incorrect type (that might use the fields for another
purpose).  For example, as I found in my debugging, I think
the exec_count (sub.nodep.reflags) overlaps with wstptr
(sub.val.wsp) on 64-bit opteron.  So it seems dangerous to
me to define free_wstr in such a way that it depends on the
type of the NODE argument.  But if one takes care to call it
only from safe places, then I suppose it's fine.

My most serious concern about this patch is how it handles
the setting of wstptr strings in field variables.  My patch
to the bug I found involved this fix to node.c:unref:

@@ -508,20 +500,13 @@ unref(register NODE *tmp)
-#if defined MBS_SUPPORT
-                       if (tmp->wstptr != NULL) {
-                               assert((tmp->flags & WSTRCUR) != 0);
-                               free(tmp->wstptr);
-                       }
-                       tmp->flags &= ~WSTRCUR;
-                       tmp->wstptr = NULL;
-                       tmp->wstlen = 0;
+                       free_wstr(tmp);
        if ((tmp->flags & FIELD) != 0) {
+               free_wstr(tmp);

I see that your patch does not include the latter change in unref
(to free the wstr nodes with the FIELD flag set).  As a result,
I suspect that your patch may leave a memory leak in place.  But
this is masked by the changes that you made in field.c to add
calls to free_wstr in various places.  Note that those calls
will never actually free anything (because the NODE flags are
always set prior to calling free_wstr, and the WSTRCUR is never
set, so the free_wstr function merely sets wstptr to NULL).
But they do mask out the memory leak, since the non-NULL wstptr
that was being passed to unref is now getting nulled out by the
calls to free_wstr when the NODE is subsequently allocated from
the free list (and thus the assertion is not being triggered).
Was there a problem with the free_wstr called I added to unref?
That seems to me to be the easiest way to fix the memory leak, although
one could take care to do this inside field.c by adding a call to
free_wstr prior to each call to unref...

When I run valgrind on 3.1.5 plus your patch, I see this:

bash-2.05b$ (echo "test"; echo "foo") | LANG=en_US.UTF-8 valgrind 
--leak-check=full ./gawk '{ if (substr($1,1,1) == substr($0,1,1))               
      print "substr matches"; sub(/foo/,"bar"); print nr++}'
==28750== Memcheck, a memory error detector for x86-linux.
==28750== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==28750== Using valgrind-2.4.1, a program supervision framework for x86-linux.
==28750== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==28750== For more details, rerun with: -v
substr matches
substr matches
==28750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
==28750== malloc/free: in use at exit: 17782 bytes in 91 blocks.
==28750== malloc/free: 265 allocs, 174 frees, 61091 bytes allocated.
==28750== For counts of detected errors, rerun with: -v
==28750== searching for pointers to 91 not-freed blocks.
==28750== checked 111856 bytes.
==28750== 88 bytes in 4 blocks are definitely lost in loss record 21 of 32
==28750==    at 0x1B903234: malloc (vg_replace_malloc.c:130)
==28750==    by 0x806C369: str2wstr (node.c:710)
==28750==    by 0x80577BC: do_substr (builtin.c:1387)
==28750==    by 0x807A38C: r_tree_eval (eval.c:991)
==28750==    by 0x807A47D: r_tree_eval (eval.c:1232)
==28750==    by 0x807A98B: eval_condition (eval.c:1355)
==28750==    by 0x8078BF3: interpret (eval.c:482)
==28750==    by 0x8078BC5: interpret (eval.c:477)
==28750==    by 0x8078B34: interpret (eval.c:456)
==28750==    by 0x80655DD: do_input (io.c:457)
==28750==    by 0x8069C01: main (main.c:595)
==28750== LEAK SUMMARY:
==28750==    definitely lost: 88 bytes in 4 blocks.
==28750==      possibly lost: 0 bytes in 0 blocks.
==28750==    still reachable: 17694 bytes in 87 blocks.
==28750==         suppressed: 0 bytes in 0 blocks.
==28750== Reachable blocks (those to which a pointer was found) are not shown.
==28750== To see them, rerun with: --show-reachable=yes

This seems to confirm that memory is still being leaked.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]