Values, separated by delimiters. For Guile Scheme.
5d7a4b25ecb8 — Linus Björnstam default tip 5 years ago
There is already a guile-dsv. A very pretty one indeed. Use that instead.

Clone (read-only): https://hg.sr.ht/~bjoli/guile-vsd

Clone (read/write): ssh://hg@hg.sr.ht/~bjoli/guile-vsd

#Guile-vsd

The delimiter-separated values (DSV) format is a superset of CSV. This library implements a DSV parser for Guile with a streaming interface and a more convenient interface that exhausts a port or reads a string. Headers are not currently supported.
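As a rough illustration of what parsing one delimited line involves, here is a minimal sketch in Guile Scheme. It is not this library's implementation. split-dsv-line is a hypothetical name. The sketch also treats the escape character as an RFC 4180-style quote, where a doubled quote inside a quoted field stands for one literal quote, and that is an assumption. It handles a single line only, whereas a real streaming parser also has to cope with newlines inside quoted fields.

```scheme
;; Hypothetical sketch: split-dsv-line is NOT part of this library's API.
;; Assumes RFC 4180-style quoting: a field may be wrapped in QUOTE-CHAR, and
;; a doubled QUOTE-CHAR inside a quoted field is one literal quote.
;; Handles a single line; embedded newlines are out of scope.
(define (split-dsv-line line delimiter quote-char)
  (let loop ((chars (string->list line))
             (field '())       ; characters of the current field, reversed
             (fields '())      ; completed fields, reversed
             (in-quotes? #f))
    (cond
     ;; End of line: close the last field and return all fields as a vector.
     ((null? chars)
      (list->vector (reverse (cons (list->string (reverse field)) fields))))
     ;; Inside quotes, delimiters are ordinary characters.
     (in-quotes?
      (let ((c (car chars)))
        (cond
         ((and (eqv? c quote-char)
               (pair? (cdr chars))
               (eqv? (cadr chars) quote-char))
          ;; Doubled quote: keep one literal quote and skip both characters.
          (loop (cddr chars) (cons c field) fields #t))
         ((eqv? c quote-char)
          ;; Closing quote.
          (loop (cdr chars) field fields #f))
         (else
          (loop (cdr chars) (cons c field) fields #t)))))
     ;; Opening quote.
     ((eqv? (car chars) quote-char)
      (loop (cdr chars) field fields #t))
     ;; Unquoted delimiter ends the current field.
     ((eqv? (car chars) delimiter)
      (loop (cdr chars) '() (cons (list->string (reverse field)) fields) #f))
     (else
      (loop (cdr chars) (cons (car chars) field) fields #f)))))
```

For example, `(split-dsv-line "my,delimited,data" #\, #\")` gives `#("my" "delimited" "data")`. The quoted field in `"a,\"b,c\""` stays together as `"b,c"`.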

#If you want it done properly, there is a better library:

Look here. Much better. Modularized. Documented. Many more finite-state automata: https://github.com/artyom-poptsov/guile-dsv

#Documentation

(import (vsd))
(define file (open-input-file "csv.csv"))

;; These are all the options available to the procedures in this library.
;; The values shown are the defaults, so none of them have to be provided.
;; #:newline can be 'cr, 'lf, 'crlf or 'lax; 'lax accepts all the other
;; newline characters as well.
(define reader (make-dsv-reader file #:delimiter #\, #:newline 'lf #:escape #\"))

;; reader is now a thunk; each call returns the next row as a vector of cells:
(reader) ;; => #("my" "delimited" "data")

;; When there is no more data to be read, #<eof> is returned.
(reader) ;; => #<eof>
(close-port file)

;; There is also a higher-level interface that reads all the data at once:

(dsv-file->list "csv.csv") ;; => (#("my" "delimited" "data"))

(call-with-input-file "csv.csv" dsv->list) ;; => (#("my" "delimited" "data"))

;; Both procedures above (dsv-file->list and dsv->list) accept the same
;; optional keyword arguments as make-dsv-reader.
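The streaming reader is documented as a thunk that returns one row per call and #<eof> at the end. That is enough to build a fold over rows, so large files can be processed without keeping every row in memory. The helpers below are a hypothetical sketch and are not part of this library. list->reader is only a stand-in so the sketch can be tried without a CSV file.

```scheme
;; Hypothetical helpers, not part of this library's API.
;; reader-fold calls READER (a thunk) until it returns the EOF object,
;; threading an accumulator through PROC, so rows need not all be kept.
(define (reader-fold proc seed reader)
  (let loop ((acc seed))
    (let ((row (reader)))
      (if (eof-object? row)
          acc
          (loop (proc row acc))))))

;; Collects every row into a list, preserving the order they were read in.
(define (reader-thunk->list reader)
  (reverse (reader-fold cons '() reader)))

;; Stand-in reader over an in-memory list of rows (for trying the helpers).
(define (list->reader rows)
  (lambda ()
    (if (null? rows)
        (read-char (open-input-string ""))   ; yields the EOF object
        (let ((row (car rows)))
          (set! rows (cdr rows))
          row))))
```

With the library loaded, `(call-with-input-file "csv.csv" (lambda (port) (reader-fold (lambda (row n) (+ n 1)) 0 (make-dsv-reader port))))` should count the rows of a file without building a list of them.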

#Speed

It is slightly faster than guile-csv for CSV files, with the bonus that it actually parses proper CSV files with CRLF line endings. A 35 MB CSV file is parsed in about 4 s using Guile 2.9.4. Python is twice as fast, because its csv reader is written in optimized, well-buffered C.
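For scale, 35 MB in about 4 s works out to roughly 35 / 4 = 8.75 MB/s. The figure above does not say how it was measured. One way to reproduce a comparable wall-clock timing is a small timer built from Guile's get-internal-real-time and internal-time-units-per-second. time-thunk is a hypothetical helper, not part of this library.

```scheme
;; Hypothetical timing helper, not part of this library.
;; Runs THUNK once and returns the elapsed wall-clock time in seconds
;; (this includes I/O, which matters when timing a file parser).
(define (time-thunk thunk)
  (let ((start (get-internal-real-time)))
    (thunk)
    (exact->inexact (/ (- (get-internal-real-time) start)
                       internal-time-units-per-second))))
```

With the library loaded, `(time-thunk (lambda () (dsv-file->list "big.csv")))` would return the seconds taken to parse a file. "big.csv" is a placeholder name.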

#License

LGPLv3. See the file header.

#Todo

I was trying my best to use data-type-specific comparisons, but eqv? turned out to be faster (probably due to fewer type checks in the generated code), which yielded quite a speed increase. I will try to find other such small speedups.

Re-add trimming.

Enforce row length.

Change the interface to allow composing with call-with-input-xxxx and the like.

I tried using a bigger string buffer and sharing one buffer between all instantiated readers, but that ran slower than using a new buffer for each line.

Anyway, I would like to write some tests to make sure it produces correct output. Then I would like to make it fast. After that, I would like to make it pretty.
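One possible approach to the row-length item in the list above is a check over parsed rows. It assumes rows are the vectors the reader produces and that every row must have as many cells as the first one. check-row-lengths is a hypothetical sketch, not part of the library. It signals an error naming the first offending row.

```scheme
;; Hypothetical sketch, not part of this library.
;; Assumes ROWS is a list of vectors and that every row must have as many
;; cells as the first. Returns ROWS unchanged, or signals an error naming
;; the first offending row (1-based), its length and the expected length.
(define (check-row-lengths rows)
  (if (null? rows)
      rows
      (let ((expected (vector-length (car rows))))
        (let loop ((rest rows) (row-number 1))
          (cond
           ((null? rest) rows)
           ((= (vector-length (car rest)) expected)
            (loop (cdr rest) (+ row-number 1)))
           (else
            (error "row has the wrong number of cells:"
                   row-number (vector-length (car rest)) expected)))))))
```

Doing the check inside the reader instead would catch the mismatch while streaming, at the cost of one comparison per row.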