bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#30116: [PATCH] `substitute' crashes when file contains NUL character


From: Maxim Cournoyer
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
Date: Sat, 16 Jun 2018 12:47:01 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Hi Mark,

Mark H Weaver <address@hidden> writes:

> Hi Maxim,
>
> Thanks for working on this.  I found a problem with this patch,
> and I also have a suggestion.  Please see below.
>
> Maxim Cournoyer <address@hidden> writes:
>
>> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <address@hidden>
>> Date: Sun, 14 Jan 2018 20:31:33 -0500
>> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>>  NUL chars.
>>
>> Fixes issue #30116.
>>
>> * guix/build/utils.scm (substitute): Add condition to skip lines containing
>> the NUL character.
>> ---
>>  guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
>>  1 file changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index 7391307c8..975f4e70a 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -3,6 +3,7 @@
>>  ;;; Copyright © 2013 Andreas Enge <address@hidden>
>>  ;;; Copyright © 2013 Nikita Karetnikov <address@hidden>
>>  ;;; Copyright © 2015 Mark H Weaver <address@hidden>
>> +;;; Copyright © 2018 Maxim Cournoyer <address@hidden>
>>  ;;;
>>  ;;; This file is part of GNU Guix.
>>  ;;;
>> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line 
>> that will be written as
>>  a substitution of the original line.  Be careful about using '$' to match 
>> the
>>  end of a line; by itself it won't match the terminating newline of a line."
>>    (let ((rx+proc  (map (match-lambda
>> -                        (((? regexp? pattern) . proc)
>> -                         (cons pattern proc))
>> -                        ((pattern . proc)
>> -                         (cons (make-regexp pattern regexp/extended)
>> -                               proc)))
>> +                         (((? regexp? pattern) . proc)
>> +                          (cons pattern proc))
>> +                         ((pattern . proc)
>> +                          (cons (make-regexp pattern regexp/extended)
>> +                                proc)))
>>                         pattern+procs)))
>>      (with-atomic-file-replacement file
>>        (lambda (in out)
>>          (let loop ((line (read-line in 'concat)))
>> -          (if (eof-object? line)
>> -              #t
>> -              (let ((line (fold (lambda (r+p line)
>> -                                  (match r+p
>> -                                    ((regexp . proc)
>> -                                     (match (list-matches regexp line)
>> -                                       ((and m+ (_ _ ...))
>> -                                        (proc line m+))
>> -                                       (_ line)))))
>> -                                line
>> -                                rx+proc)))
>> -                (display line out)
>> -                (loop (read-line in 'concat)))))))))
>> +          (cond
>> +           ((eof-object? line)
>> +            #t)
>> +           ((string-contains line (make-string 1 #\nul))
>> +            ;; The regexp functions of the GNU C library (which Guile uses)
>> +            ;; cannot deal with NUL characters, so skip to the next line.
>> +            (format #t "skipping line with NUL characters: ~s\n" line)
>> +            (loop (read-line in 'concat)))
>
> This code will unconditionally *delete* all lines that contain NULs.
>
> This will happen because the lines with NULs are not being written to
> the output file, which will replace the original file when this loop
> reaches EOF.  So, any lines that are not copied to the output will be
> lost.
>
> To preserve the lines with NULs, you should call (display line out)
> before calling 'loop'.

Good observation! I agree that we should keep limit the effect of
ignoring NULs only to the substitution.

> Also, please use (string-index line #\nul) to check for NULs instead of
> 'string-contains'.  It should be more efficient.

OK!

>
>> +           (else
>> +            (let ((line (fold (lambda (r+p line)
>> +                                (match r+p
>> +                                  ((regexp . proc)
>> +                                   (match (list-matches regexp line)
>> +                                     ((and m+ (_ _ ...))
>> +                                      (proc line m+))
>> +                                     (_ line)))))
>> +                              line
>> +                              rx+proc)))
>> +              (display line out)
>> +              (loop (read-line in 'concat))))))))))
>
> With the changes suggested above, I would have no objection to pushing
> this to core-updates.  However, it occurs to me that we could handle the
> NUL case in a better way:
>
> Since the C regex functions that we use cannot handle NUL bytes, we
> could use a different code point to represent NUL during those
> operations.  We could choose a code point from one of the Unicode
> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
> does not occur in the string.
>
> Let NUL* be the code point which will represent NUL bytes.  First
> replace all NULs with NUL*s, then perform the substitutions, and finally
> replace all ALT*s with NULs before writing to the output.

Do I understand this transformation as NULs -> NUL*s and back from NUL*s
-> NULs correctly? I'm not sure how NUL*s became ALT*s in your explanation.

> As an important optimization, we should avoid performing these extra
> operations unless (string-index line #\nul) finds a NUL.

OK.

> We could then perform these extra substitutions with simple, inefficient
> code.  Maybe something like this (untested):
>
>     (with-atomic-file-replacement file
>       (lambda (in out)
>         (let loop ((line (read-line in 'concat)))
>           (if (eof-object? line)
>               #t
>               (let* ((nul* (or (and (string-index line #\nul)
>                                     (unused-private-use-code-point line))
>                                #\nul))
>                      (line* (replace-char #\nul nul* line))
>                      (line1* (fold (lambda (r+p line)
>                                      (match r+p
>                                        ((regexp . proc)
>                                         (match (list-matches regexp line)
>                                           ((and m+ (_ _ ...))
>                                            (proc line m+))
>                                           (_ line)))))
>                                    line*
>                                    rx+proc))
>                      (line1 (replace-char nul* #\nul line1*)))
>                 (display line1 out)
>                 (loop (read-line in 'concat)))))))))
>
>
> Where the following additional private procedures would be added to
> (guix build utils) above the definition for 'substitute':
>
> (define (unused-private-use-code-point s)
>   "Find a code point within a Unicode Private Use Area that is not
> present in S, and return the corresponding character object.  If one
> cannot be found, return false."
>   (define (scan lo hi)
>     (and (<= lo hi)
>          (let ((c (integer->char lo)))
>            (if (string-index s c)
>                (scan (+ lo 1) hi)
>                c))))
>   (or (scan   #xE000   #xF8FF)
>       (scan  #xF0000  #xFFFFD)
>       (scan #x100000 #x10FFFD)))
>
> (define (replace-char c1 c2 s)
>   "Return a string which is equal to S except with all instances of C1
> replaced by C2.  If C1 and C2 are equal, return S."
>   (if (char=? c1 c2)
>       s
>       (string-map (lambda (c)
>                     (if (char=? c c1)
>                         c2
>                         c))
>                   s)))
>
> What do you think?

It raises the complexity level a bit for something which doesn't seem to
be a very common scenario, but otherwise seems a very elegant
workaround. It seems to me that your implementation is already pretty
complete. I'll try write a test for validating it and report back.

Thank you for sharing your ideas!

Maxim





reply via email to

[Prev in Thread] Current Thread [Next in Thread]