Re: Parsing CSV files

From: Ken Anderson
Subject: Re: Parsing CSV files
Date: Thu, 02 Oct 2003 09:30:34 -0400

This is an interesting approach, but it assumes that each field is followed
by a delimiter.  In the CSV format that Excel uses, the end of a line also
ends a field.  Also, in Excel, a field that contains a delimiter is wrapped
in double quotes, like "this, and that", and a double quote inside such a
field is escaped by doubling it.
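
To make those quoting rules concrete, here is a rough sketch of them seen
from the writing side.  csv-quote-field is a hypothetical helper made up for
illustration; it also assumes that a field containing a double quote or a
newline gets wrapped in quotes, just like one containing a delimiter.

```scheme
;; Hypothetical helper, for illustration only: quote one field for output
;; following the Excel conventions described above.
(define (csv-quote-field field delimiter)
  ;; True if any character forces the field to be wrapped in quotes.
  (define (needs-quotes? chars)
    (cond ((null? chars) #f)
          ((memv (car chars) (list delimiter #\" #\newline)) #t)
          (else (needs-quotes? (cdr chars)))))
  ;; Replace every double quote by two double quotes.
  (define (double-quotes chars)
    (cond ((null? chars) '())
          ((eqv? (car chars) #\")
           (cons #\" (cons #\" (double-quotes (cdr chars)))))
          (else (cons (car chars) (double-quotes (cdr chars))))))
  (let ((chars (string->list field)))
    (if (needs-quotes? chars)
        (list->string (append (list #\") (double-quotes chars) (list #\")))
        field)))

(csv-quote-field "this, and that" #\,)  ; the text "this, and that", quotes included
(csv-quote-field "say \"hi\"" #\,)      ; the text "say ""hi""", quotes included
(csv-quote-field "plain" #\,)           ; returned unchanged: plain
```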

Here's an approach I use:

(define (csv-read port delimiter cell-action row-action)
  (define (!)                           ; Read the next character from PORT.
    (let ((c (read-char port)))
      c))
  (define k1 (lambda () (state (!))))
  (define k2 (lambda () (row-action k1)))
  (define (give-cell b k) (cell-action (list->string (reverse b)) k))
  (define (state c)
    (cond ((eqv? c delimiter) (cell-action "" k1))
          ((eqv? c #\") (state-string (!) '()))
          ((eqv? c #\newline) (row-action k1))
          ((eof-object? c) #t)
          (else (state-any c '()))))
  (define (state-string c b)
    (cond ((eqv? c #\") (state-string-quote (!) b))
          ((not (eof-object? c)) (state-string (!) (cons c b)))
          (else (error "End of file inside a quoted field."))))
  (define (state-string-quote c b)
    (cond ((eqv? c #\") (state-string (!) (cons c b))) ; Escaped double quote.
          ((eqv? c delimiter) (give-cell b k1))
          ((eqv? c #\newline) (give-cell b k2))
          ((eof-object? c)    (give-cell b k2))
          (else (error "Single double quote at unexpected place."))))
  (define (state-any c b)
    (cond ((eqv? c delimiter) (give-cell b k1))
          ((eqv? c #\newline) (give-cell b k2))
          ((eof-object? c)    (give-cell b k2))
          (else (state-any (!) (cons c b)))))
  (state (!)))

This uses continuation passing style to separate the parsing from what
the user does with each cell and row.

(cell-action value k) is called with the value of the next cell and a
continuation, k, that resumes the computation.

(row-action k) is called at the end of a row, also with a continuation.
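
One consequence of this protocol is that the action decides whether the
computation goes on: nothing more happens until it calls k.  A minimal
sketch, using a toy producer in place of csv-read (each-cell and
cells-until-empty are hypothetical names invented for this example):

```scheme
;; Toy stand-in for csv-read, for illustration only: hands each element
;; of CELLS to CELL-ACTION together with a continuation K.
(define (each-cell cells cell-action)
  (if (null? cells)
      'done
      (cell-action (car cells)
                   (lambda () (each-cell (cdr cells) cell-action)))))

;; Collect cells up to the first empty one, then stop by not calling k.
(define (cells-until-empty cells)
  (let ((seen '()))
    (each-cell cells
               (lambda (value k)
                 (if (string=? value "")
                     'stopped           ; Returning without (k) ends the walk.
                     (begin (set! seen (cons value seen))
                            (k)))))
    (reverse seen)))

(cells-until-empty '("a" "b" "" "c"))   ; => ("a" "b")
```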

The state... procedures form a tail-recursive finite state machine: each
procedure is a state, and each transition is a tail call.
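
The same style in miniature, as a sketch rather than anything taken from the
parser itself: count-fields is a hypothetical two-state machine that counts
the fields in one line and ignores delimiters inside double quotes.

```scheme
;; Hypothetical example: each state is a procedure, each transition a
;; tail call.  A doubled quote just leaves and re-enters the quoted
;; state, so it does not affect the count.
(define (count-fields line delimiter)
  (define (outside chars n)             ; Not inside a quoted field.
    (cond ((null? chars) n)
          ((eqv? (car chars) #\") (inside (cdr chars) n))
          ((eqv? (car chars) delimiter) (outside (cdr chars) (+ n 1)))
          (else (outside (cdr chars) n))))
  (define (inside chars n)              ; Inside a quoted field.
    (cond ((null? chars) n)             ; Unterminated quote: stop counting.
          ((eqv? (car chars) #\") (outside (cdr chars) n))
          (else (inside (cdr chars) n))))
  (outside (string->list line) 1))

(count-fields "a,\"this, and that\",c" #\,)   ; => 3
```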

Here's an example of converting a CSV file to a string of HTML:

(define (csv->html port)
  (let ((result '("<html><table><tr>")))
    (csv-read port #\,
              (lambda (value k)
                (set! result (cons "</td>" (cons value (cons "<td>" result))))
                (k))                    ; Resume parsing after each cell.
              (lambda (k)
                (set! result (cons "</tr><tr>" result))
                (k)))                   ; Resume parsing after each row.
    (apply string-append (reverse (cons "</tr></table></html>" result)))))


At 11:23 PM 9/30/2003 +0200, Wolfgang Jaehrling wrote:
>Hi there!
>For those of you who want to read some interesting code, here is a
>program to parse a file in CSV (Comma Separated Value) format.  I
>think it shows how one should use Scheme, but some might say it goes a
>bit too far... (and I'd like to receive comments on this topic.)
>Note `READ-TABLE' can be called with the source port as argument, or
>without an argument to use the current input port.
>;; Reading a table from a port where it resides in CSV format.
>;; Copyright (C) 2003 Wolfgang Jährling <address@hidden>
>;; This program is free software; you can redistribute it and/or modify
>;; it under the terms of the GNU General Public License as published by
>;; the Free Software Foundation; either version 2 of the License, or
>;; (at your option) any later version.
>;; This program is distributed in the hope that it will be useful,
>;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>;; GNU General Public License for more details.
>(define field-delimiter #\,)
>;; Return a procedure that calls CONSUMER with three arguments: The
>;; value returned by the PRODUCER applied to the procedure's arguments,
>;; a list that is initially empty, and a thunk to restart this process
>;; with the value given by the PRODUCER added at the beginning of the
>;; list given to the CONSUMER.
>(define (collectrec producer consumer)
>  (lambda args
>    (letrec ((loop (lambda (lst)
>                     (let ((x (apply producer args)))
>                       (consumer x lst (lambda ()
>                                        (loop (cons x lst))))))))
>      (loop '()))))
>;; Read and return a field, that ends with the configured delimiter
>;; character, or return false at the end of a line, or the eof-object
>;; at end of file.
>(define read-field
>  (collectrec read-char
>              (lambda (c chars loop)
>                (cond ((eof-object? c) c)
>                      ((char=? c field-delimiter)
>                       (apply string (reverse chars)))
>                      ((char=? c #\newline) #f)
>                      (else (loop))))))
>;; Read a line and split it up into a list of fields which gets
>;; returned, or false at the end of the file.
>(define read-row
>  (collectrec read-field
>              (lambda (f fields loop)
>                (cond ((not f) (reverse fields))
>                      ((eof-object? f) #f)
>                      (else (loop))))))
>;; Read a table and return it as a list of rows, each row being a list
>;; of fields, which are strings.
>(define read-table
>  (collectrec read-row
>              (lambda (r rows loop)
>                (if (not r)
>                    (reverse rows)
>                  (loop)))))
>;;;; End of code. ;;;;
>(define eq? (lambda (x y) #t))  ;; How could it be otherwise?