sed-devel

Re: weird value flow question


From: Assaf Gordon
Subject: Re: weird value flow question
Date: Sat, 3 Mar 2018 13:04:35 -0700
User-agent: Mutt/1.5.24 (2015-08-30)

Hello Timotej,

On Sat, Mar 03, 2018 at 01:48:04AM +0000, Kapus, Timotej wrote:
> This is a bit of a weird request for help.

Indeed a weird request, especially without additional context
or explanation of what you are trying to achieve.
See http://xyproblem.info/ .

Are you looking for a vulnerability that can be triggered
from external input?

> I'm trying to figure out if there is a way for a value to 
> flow from either a command line argument or a file or 
> stdin, to the return value of calc_state_hash in 
> regex_internal.c .

Before even starting, be aware that if you are using a recent
sed on a recent GNU/Linux system, that code is not used at all.
On systems with a modern glibc, sed uses glibc's regex code (with
the exception of the faster DFA implementation).
The file you are looking at (regex_internal.c) is part of gnulib,
the GNU portability library. Its regex code is used only on systems
where an adequate regex implementation is not found.

To rebuild sed with the gnulib regex implementation, use:
   ./configure --with-included-regex

The gnulib implementation is very similar to glibc's,
but there could be some differences.

> [...] But I failed to find a convincing relation between the DFA
> nodes and re_node_set, but from the names I would assume 
> there is one.
>
> [...] 
> 
> So to perhaps a more precise question I should be asking is 
> what is contained in re_node_set->elements and can it be 
> influenced by something from the outside of the program?
>
> [...]
> 
> I know it's a bit of a long shot, but I would appreciate
> any help with this.

I can suggest the following:

First,
Use the OpenGrok server at
https://opengrok.housegordon.com/source/
to view and search the code (select both "sed" and "gnulib" as projects).
This should allow faster and more intuitive exploration of the code.


Second,
Compile sed with debug information and without optimizations:

  ./configure --with-included-regex CFLAGS="-O0 -g"
  make clean
  make

Then use gdb (or another debugger) to examine the nodes.

For example:

  $ gdb ./sed/sed
  (gdb) b calc_state_hash
  Breakpoint 1 at 0x41cd6d: file lib/regex_internal.c, line 1449.

  (gdb) r 's/A*/B/' /dev/null
  Starting program: /home/gordon/projects/sed/sed/sed 's/A*/B/' /dev/null
  Breakpoint 1, calc_state_hash (nodes=0x7fffffffd980, context=0)
     at lib/regex_internal.c:1449
  1449      re_hashval_t hash = nodes->nelem + context;

  (gdb) p *nodes
  $1 = {alloc = 3, nelem = 3, elems = 0x63e190}

The more Kleene closures (*) in your regex, the more nodes, e.g.:

  (gdb) r 's/A*B*/B/' /dev/null
  Breakpoint 1, calc_state_hash (nodes=0x7fffffffd980, context=0)
    at lib/regex_internal.c:1449
  1449      re_hashval_t hash = nodes->nelem + context;
  (gdb) p *nodes
  $3 = {alloc = 5, nelem = 5, elems = 0x63e3a0}



Third,
After you are more familiar with the code and operation,
consider asking follow-up questions at address@hidden, where
the maintainers of this module will be in a better position
to help.

regards,
 - assaf



