The delimiter-separated values (DSV) format is a superset of CSV. This library implements a DSV parser for Guile with a streaming interface, plus a more convenient interface that exhausts a port or reads from a string. Headers are not currently supported.
For a better alternative, see guile-dsv. It is modularized, documented, and makes much more use of finite state automata: https://github.com/artyom-poptsov/guile-dsv
```scheme
(import (vsd))

(define file (open-input-file "csv.csv"))

;; These are all the options available to the procedures in this library.
;; The values below are the defaults and do not have to be provided.
;; #:newline can be 'cr, 'lf, 'crlf or 'lax; 'lax also accepts the other
;; newline characters.
(define reader
  (make-dsv-reader file #:delimiter #\, #:newline 'lf #:escape #\"))

;; reader is now a thunk that returns one row per call, as a vector of
;; DSV cells:
(reader) ;; => #("my" "delimited" "data")

;; When there is no more data to be read, #<eof> is returned.
(reader) ;; => #<eof>

(close-port file)

;; There is also a higher-level interface that exhausts the input:
(dsv-file->list "csv.csv")                 ;; => (#("my" "delimited" "data"))
(call-with-input-file "csv.csv" dsv->list) ;; => (#("my" "delimited" "data"))

;; Both procedures above (dsv-file->list and dsv->list) accept the same
;; optional keyword arguments as make-dsv-reader.
```
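To make the shape of the streaming interface concrete, here is a minimal sketch of a reader thunk and a loop that exhausts it. `make-naive-reader` and `reader->list` are hypothetical names, not part of this library, and the reader deliberately ignores quoting and escapes: it only splits each line on the delimiter. It illustrates the contract described above (one vector per call, `#<eof>` at the end), not how the real parser works.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical, deliberately naive reader: splits each line on the
;; delimiter and does no quote/escape handling at all.
;; Returns a thunk yielding one row (a vector of strings) per call,
;; or the EOF object once the port is exhausted.
(define (make-naive-reader port delimiter)
  (lambda ()
    (let ((line (read-line port)))
      (if (eof-object? line)
          line                                   ; propagate #<eof>
          (list->vector (string-split line delimiter))))))

;; Call a reader thunk until it returns EOF, collecting rows in order.
(define (reader->list reader)
  (let loop ((row (reader)) (rows '()))
    (if (eof-object? row)
        (reverse rows)
        (loop (reader) (cons row rows)))))

(define rows
  (call-with-input-string "my,delimited,data\nmore,rows,here\n"
    (lambda (port) (reader->list (make-naive-reader port #\,)))))
;; rows => (#("my" "delimited" "data") #("more" "rows" "here"))
```

Keeping the reader as a closure over the port means the caller controls how much input is consumed, which is what makes the streaming interface suitable for large files.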
It is slightly faster than guile-csv for CSV files, and it correctly parses standard CSV files with CRLF line endings. In practice, a 35 MB CSV file is parsed in about 4 s with Guile 2.9.4. Python's csv module is about twice as fast, because its reader is written in optimized, well-buffered C.
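Why CRLF needs care: Guile's `read-line` stops at `#\newline` only, so a line-based reader that splits on LF leaves a stray carriage return in the last field of every CRLF-terminated row. The sketch below shows the pitfall and a hypothetical `chomp-cr` helper for the simple 'crlf case. It is an illustration under that assumption; a real CSV parser must also cope with CR and LF inside quoted fields, which this does not attempt.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical helper: drop a single trailing #\return, if present.
(define (chomp-cr line)
  (let ((n (string-length line)))
    (if (and (> n 0) (char=? (string-ref line (- n 1)) #\return))
        (substring line 0 (- n 1))
        line)))

;; read-line splits on #\newline only, so the CR stays in the line.
(define raw (call-with-input-string "a,b\r\nc,d\r\n" read-line))
;; raw => "a,b\r"

(define naive (string-split raw #\,))             ; => ("a" "b\r")
(define fixed (string-split (chomp-cr raw) #\,))  ; => ("a" "b")
```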
LGPLv3. See the file header.
I tried hard to use type-specific comparisons (such as char=?), but eqv? turned out to be faster, probably because the generated code performs fewer type checks. That change yielded a noticeable speedup, and I will look for other small optimizations like it.
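One property of eqv? that suits a character scanner: it accepts any two objects, so the EOF object can be compared against delimiter characters without a separate type check, whereas char=? requires both arguments to be characters. A sketch of a hypothetical dispatch helper written in that style (`classify` and `classify-string` are illustrative names, not this library's internals):

```scheme
;; Hypothetical classifier: every test is an eqv? comparison, which is
;; valid for characters and the EOF object alike, so no char? guard is
;; needed before comparing.
(define (classify ch delimiter escape)
  (cond ((eqv? ch delimiter) 'delimiter)
        ((eqv? ch escape)    'escape)
        ((eqv? ch #\newline) 'newline)
        ((eof-object? ch)    'eof)
        (else                'char)))

(define (classify-string str delimiter escape)
  (map (lambda (ch) (classify ch delimiter escape)) (string->list str)))

;; (classify-string "a,\"" #\, #\")  => (char delimiter escape)
;; Reading from an empty port yields the EOF object:
;; (classify (read-char (open-input-string "")) #\, #\") => eof
```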
Enforce a consistent row length.
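A minimal sketch of what enforcing row length could look like, assuming rows are vectors as returned by the readers above and that "consistent" means every row has as many fields as the first one. `check-row-lengths` is a hypothetical helper, not part of the library.

```scheme
;; Hypothetical check: every row must have as many fields as the first.
;; Returns the rows unchanged on success, so it can sit in a pipeline;
;; otherwise signals an error with the 1-based row number, the actual
;; field count and the expected field count.
(define (check-row-lengths rows)
  (if (null? rows)
      rows
      (let ((expected (vector-length (car rows))))
        (let loop ((rest rows) (n 1))
          (cond ((null? rest) rows)
                ((= (vector-length (car rest)) expected)
                 (loop (cdr rest) (+ n 1)))
                (else
                 (error "row has wrong number of fields:"
                        n (vector-length (car rest)) expected)))))))
```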
Change the interface so it composes with call-with-input-file, call-with-input-string, and similar procedures.
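One way to get that composability, sketched with a hypothetical `naive-dsv->list` that does no quote handling: make the port the only required argument and pass options as keywords via `define*`. The procedure can then be handed directly to `call-with-input-string` or `call-with-input-file`, and a lambda is only needed when overriding a default.

```scheme
(use-modules (ice-9 optargs)   ; define* with #:key arguments
             (ice-9 rdelim))   ; read-line

;; Hypothetical and naive: splits each line on the delimiter, no quoting.
;; The port is the sole required argument; options are keywords.
(define* (naive-dsv->list port #:key (delimiter #\,))
  (let loop ((line (read-line port)) (rows '()))
    (if (eof-object? line)
        (reverse rows)
        (loop (read-line port)
              (cons (list->vector (string-split line delimiter)) rows)))))

;; Default options: pass the procedure itself.
(define csv-rows
  (call-with-input-string "a,b\nc,d\n" naive-dsv->list))

;; Non-default delimiter: wrap the call in a lambda.
(define ssv-rows
  (call-with-input-string "a;b\nc;d\n"
    (lambda (port) (naive-dsv->list port #:delimiter #\;))))
```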
I tried using a bigger string buffer and reusing a single buffer for each reader instance, but that ran slower than allocating a new buffer for each line.
Anyway, I would like to write some tests to make sure it produces correct output. Then I would like to make it fast, and after that, make it pretty.