Parser for a subset of XML, written in Standard ML
Bitbucket -> Sourcehut
66babaa281dc — Chris Cannam 2 years ago
Fix links
ee71a9146f59 — Chris Cannam 2 years ago
Note on CDATA

heads

tip
browse log

clone

read-only
https://hg.sr.ht/~cannam/sml-subxml
read/write
ssh://hg@hg.sr.ht/~cannam/sml-subxml

#SubXml - A parser for a subset of XML, written in Standard ML

https://hg.sr.ht/~cannam/sml-subxml

SubXml is a parser and serialiser for a format resembling XML. It can be used as a minimal parser for small XML configuration or interchange files, so long as they make suitably limited use of XML and so long as that is not expected to change.

The format supported by SubXml consists of the element, attribute, text, CDATA, and comment syntax from XML. It differs from XML in the following ways:

  • The document is assumed to be in an 8-bit "ASCII-compatible" format such as UTF-8; UTF-16 is not supported

  • Processing instructions () are ignored

  • DOCTYPE declarations are ignored; all other declarations (<! ... >) are rejected except for CDATA, which is handled properly

  • Character and entity references (&-escapes) have no special status and are just passed through literally

Note that although the parser is limited, it is not forgiving -- anything it can't understand is rejected with a clear error message. But because of its super-limited design, it will both accept many documents that are not well-formed XML, and reject many that are.

An equally simplistic serialiser is also provided.

For a proper XML parser in SML, consider fxp, mirrored at https://github.com/cannam/fxp.

Copyright 2018 Chris Cannam. MIT/X11 licence. See the file COPYING for details.