guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] Do not scan for coding declarations in open-file


From: Mark H Weaver
Subject: [PATCH] Do not scan for coding declarations in open-file
Date: Thu, 31 Jan 2013 00:06:24 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)

Andy Wingo <address@hidden> writes:
> On Tue 15 Jan 2013 10:32, address@hidden (Ludovic Courtès) writes:
>> As usual, backward-compatibility gives us an incentive not to change
>> anything in 2.0.  But perhaps we should change that in 2.2.
>>
>> Thoughts?
>
> IMO we should update the docs and leave it as it is, though I don't care
> much.
>
> Mark?

My position is that the current coding-auto-detection behavior of
'open-file' is likely to lead to security flaws in software built using
Guile.  The issue is that programs that receive text from an untrusted
source, write those strings to a file, and then read them back in, is
potentially vulnerable to hostile coding declarations inserted within
those strings.

I feel strongly that this security issue is important enough to risk the
possibility that someone's code might have become dependent upon the
guesswork in 'open-file'.

While it is possible to work around this problem in various non-portable
ways, it is unlikely that most people would think to apply such
workarounds.  Frankly, I find it quite embarrassing that our basic
low-level text I/O is not robust.

I've attached a patch to fix this.  Please consider it.

  Thoughts?
    Mark


>From f936b2553c809967fd6703d8ec8fc9a7ef7bd5af Mon Sep 17 00:00:00 2001
From: Mark H Weaver <address@hidden>
Date: Wed, 30 Jan 2013 14:45:28 -0500
Subject: [PATCH] Do not scan for coding declarations in open-file.

* libguile/fports.c (scm_open_file): Do not scan for coding
  declarations.  Replace 'use_encoding' local variable with
  'binary'.  Update documentation string.

* module/ice-9/psyntax.scm (include): Add the same file-encoding
  logic that's used in compile-file and scm_primitive_load.

* module/ice-9/psyntax-pp.scm: Regenerate.

* doc/ref/api-io.texi (File Ports): Update docs.

* test-suite/tests/ports.test: Change "open-file HONORS file coding
  declarations" test to "open-file IGNORES file coding declaration".

* test-suite/tests/coding.test (scan-coding): Use 'file-encoding' to
  scan for the encoding, since 'open-input-file' no longer does so.
---
 doc/ref/api-io.texi          |   13 +++----------
 libguile/fports.c            |   33 ++++++++-------------------------
 module/ice-9/psyntax-pp.scm  |   10 ++++++----
 module/ice-9/psyntax.scm     |   13 +++++++++----
 test-suite/tests/coding.test |    4 ++--
 test-suite/tests/ports.test  |    5 ++---
 6 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 11ae580..fca3875 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -1,7 +1,7 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
 @c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009,
address@hidden   2010, 2011  Free Software Foundation, Inc.
address@hidden   2010, 2011, 2013  Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
 @node Input and Output
@@ -884,8 +884,8 @@ Use binary mode, ensuring that each byte in the file will 
be read as one
 Scheme character.
 
 To provide this property, the file will be opened with the 8-bit
-character encoding "ISO-8859-1", ignoring any coding declaration or port
-encoding.  @xref{Ports}, for more information on port encodings.
+character encoding "ISO-8859-1", ignoring the default port encoding.
address@hidden, for more information on port encodings.
 
 Note that while it is possible to read and write binary data as
 characters or strings, it is usually better to treat bytes as octets,
@@ -902,13 +902,6 @@ because of its port encoding ramifications.
 If a file cannot be opened with the access
 requested, @code{open-file} throws an exception.
 
-When the file is opened, this procedure will scan for a coding
-declaration (@pxref{Character Encoding of Source Files}). If a coding
-declaration is found, it will be used to interpret the file.  Otherwise,
-the port's encoding will be used.  To suppress this behavior, open the
-file in binary mode and then set the port encoding explicitly using
address@hidden
-
 In theory we could create read/write ports which were buffered
 in one direction only.  However this isn't included in the
 current interfaces.
diff --git a/libguile/fports.c b/libguile/fports.c
index 10cf671..c3af8b4 100644
--- a/libguile/fports.c
+++ b/libguile/fports.c
@@ -1,5 +1,6 @@
-/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
- *   2004, 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, 
Inc.
+/* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001,
+ *   2002, 2003, 2004, 2006, 2007, 2008, 2009, 2010, 2011,
+ *   2012, 2013 Free Software Foundation, Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
@@ -371,8 +372,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
            "@item b\n"
            "Open the underlying file in binary mode, if supported by the 
system.\n"
            "Also, open the file using the binary-compatible character 
encoding\n"
-           "\"ISO-8859-1\", ignoring the port's encoding and the coding 
declaration\n"
-           "at the top of the input file, if any.\n"
+           "\"ISO-8859-1\", ignoring the default port encoding.\n"
            "@item +\n"
            "Open the port for both input and output.  E.g., @code{r+}: open\n"
            "an existing file for both input and output.\n"
@@ -387,11 +387,6 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
            "Add line-buffering to the port.  The port output buffer will be\n"
            "automatically flushed whenever a newline character is written.\n"
            "@end table\n"
-           "When the file is opened, this procedure will scan for a coding\n"
-           "address@hidden Encoding of Source Files}. If present\n"
-           "will use that encoding for interpreting the file.  Otherwise, 
the\n"
-           "port's encoding will be used.\n"
-           "\n"
            "In theory we could create read/write ports which were buffered\n"
            "in one direction only.  However this isn't included in the\n"
            "current interfaces.  If a file cannot be opened with the access\n"
@@ -399,7 +394,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
 #define FUNC_NAME s_scm_open_file
 {
   SCM port;
-  int fdes, flags = 0, use_encoding = 1;
+  int fdes, flags = 0, binary = 0;
   unsigned int retries;
   char *file, *md, *ptr;
 
@@ -434,7 +429,7 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
          flags = (flags & ~(O_RDONLY | O_WRONLY)) | O_RDWR;
          break;
        case 'b':
-         use_encoding = 0;
+         binary = 1;
 #if defined (O_BINARY)
          flags |= O_BINARY;
 #endif
@@ -473,20 +468,8 @@ SCM_DEFINE (scm_open_file, "open-file", 2, 0, 0,
   port = scm_i_fdes_to_port (fdes, scm_i_mode_bits (mode),
                              fport_canonicalize_filename (filename));
 
-  if (use_encoding)
-    {
-      /* If this file has a coding declaration, use that as the port
-        encoding.  */
-      if (SCM_INPUT_PORT_P (port))
-       {
-         char *enc = scm_i_scan_for_encoding (port);
-         if (enc != NULL)
-           scm_i_set_port_encoding_x (port, enc);
-       }
-    }
-  else
-    /* If this is a binary file, use the binary-friendly ISO-8859-1
-       encoding.  */
+  if (binary)
+    /* Use the binary-friendly ISO-8859-1 encoding. */
     scm_i_set_port_encoding_x (port, NULL);
 
   scm_dynwind_end ();
diff --git a/module/ice-9/psyntax-pp.scm b/module/ice-9/psyntax-pp.scm
index 139c02b..032034e 100644
--- a/module/ice-9/psyntax-pp.scm
+++ b/module/ice-9/psyntax-pp.scm
@@ -2968,10 +2968,12 @@
          (read-file
            (lambda (fn dir k)
              (let ((p (open-input-file (if (absolute-path? fn) fn (in-vicinity 
dir fn)))))
-               (let f ((x (read p)) (result '()))
-                 (if (eof-object? x)
-                   (begin (close-input-port p) (reverse result))
-                   (f (read p) (cons (datum->syntax k x) result))))))))
+               (let ((enc (file-encoding p)))
+                 (set-port-encoding! p (let ((t enc)) (if t t "UTF-8")))
+                 (let f ((x (read p)) (result '()))
+                   (if (eof-object? x)
+                     (begin (close-input-port p) (reverse result))
+                     (f (read p) (cons (datum->syntax k x) result)))))))))
         (let ((src (syntax-source x)))
           (let ((file (if src (assq-ref src 'filename) #f)))
             (let ((dir (if (string? file) (dirname file) #f)))
diff --git a/module/ice-9/psyntax.scm b/module/ice-9/psyntax.scm
index 4abd3c9..d3e2616 100644
--- a/module/ice-9/psyntax.scm
+++ b/module/ice-9/psyntax.scm
@@ -2940,10 +2940,15 @@
 
     (define read-file
       (lambda (fn dir k)
-        (let ((p (open-input-file
-                  (if (absolute-path? fn)
-                      fn
-                      (in-vicinity dir fn)))))
+        (let* ((p (open-input-file
+                   (if (absolute-path? fn)
+                       fn
+                       (in-vicinity dir fn))))
+               (enc (file-encoding p)))
+
+          ;; Choose the input encoding deterministically.
+          (set-port-encoding! p (or enc "UTF-8"))
+
           (let f ((x (read p))
                   (result '()))
             (if (eof-object? x)
diff --git a/test-suite/tests/coding.test b/test-suite/tests/coding.test
index 4152af8..0a15d93 100644
--- a/test-suite/tests/coding.test
+++ b/test-suite/tests/coding.test
@@ -1,6 +1,6 @@
 ;;;; coding.test --- test suite for coding declarations. -*- mode: scheme -*-
 ;;;;
-;;;; Copyright (C) 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2011, 2013 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -40,7 +40,7 @@
      ;; relies on the opportunistic filling of the input buffer, which
      ;; doesn't happen after a seek.
      (let* ((port (open-input-file name))
-            (res (port-encoding port)))
+            (res (file-encoding port)))
        (close-port port)
        res))))
 
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 65c3b3f..5109583 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -269,13 +269,12 @@
                    (delete-file filename)
                    (string=? line2 binary-test-string)))))
 
-;; open-file honors file coding declarations
-(pass-if "file: open-file honors coding declarations"
+;; open-file ignores file coding declaration
+(pass-if "file: open-file ignores coding declarations"
   (with-fluids ((%default-port-encoding "UTF-8"))
                (let* ((filename (test-file))
                       (port (open-output-file filename))
                       (test-string "€100"))
-                 (set-port-encoding! port "ISO-8859-15")
                  (write-line ";; coding: iso-8859-15" port)
                  (write-line test-string port)
                  (close-port port)
-- 
1.7.10.4


reply via email to

[Prev in Thread] Current Thread [Next in Thread]