Porter-2 Stemmer for English
This package is intended to be an implementation of the
[Porter Stemming Algorithm] for English.
[Porter Stemming Algorithm]
This package exports a single function, `stem', which uses a heuristic
to strip inflectional endings and affixes from English words:
│ STEMMER> (stem "adverserial")
│ "adverseri"
│ STEMMER> (stem "disjointed")
│ "disjoint"
│ STEMMER> (stem "hangings")
│ "hang"
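As a rough sketch of how `stem' might be applied to running text, the
helpers below split a string on spaces and pass each token to a stemmer
function. `split-on-spaces' and `stem-tokens' are hypothetical names
used for illustration and are not part of this package. The stemmer is
passed in as an argument so the snippet loads on its own.

```lisp
;; Hypothetical helpers for illustration; not exported by this package.

(defun split-on-spaces (text)
  "Return the list of non-empty, space-separated tokens in TEXT."
  (loop with start = 0
        for end = (position #\Space text :start start)
        for token = (subseq text start end)
        unless (string= token "") collect token
        while end
        do (setf start (1+ end))))

(defun stem-tokens (text stemmer)
  "Apply the function STEMMER to every space-separated token in TEXT."
  (mapcar stemmer (split-on-spaces text)))

;; Assumed usage once the package is loaded:
;;   (stem-tokens "disjointed hangings" #'stem)
```

Going by the REPL session above, that call should return
("disjoint" "hang"). Punctuation and case are left to the caller here.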
This originated as a port of the [Snowball Go module]
(MIT-licensed). It has been trimmed down and modified to the point
that it might be recognizable if you squint.
[Snowball Go module] https://github.com/kljensen/snowball
When checked against the published vocabularies, this implementation
produces the following discrepancies:
Input   Output   Canonical
"'"     ""       "'"
"''"    ""       "''"
"'a"    "a"      "'a"
"'s"    "s"      "'s"
"a'"    "a"      "a'"
These discrepancies arise from a (perceived) ambiguity in how
apostrophes are handled: other implementations and the written
description in [the documentation] do not appear to agree. They
account for a 0.016997% error rate on the included test corpus.
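A figure like the one above could be obtained by running a stemmer over
(input . expected) pairs and counting the disagreements. The sketch
below shows one way to do that. `discrepancies' and `error-percentage'
are hypothetical names, not part of this package, and the pair format
is an assumption rather than the layout of the included corpus.

```lisp
;; Hypothetical checking helpers; not part of this package.
;; PAIRS is assumed to be a non-empty list of (input . expected) conses.

(defun discrepancies (stemmer pairs)
  "Return (input actual expected) for each pair where STEMMER disagrees."
  (loop for (input . expected) in pairs
        for actual = (funcall stemmer input)
        unless (string= actual expected)
          collect (list input actual expected)))

(defun error-percentage (stemmer pairs)
  "Return the percentage of PAIRS on which STEMMER disagrees."
  (* 100.0 (/ (length (discrepancies stemmer pairs))
              (length pairs))))

;; Assumed usage once the package is loaded, with rows from the table:
;;   (discrepancies #'stem '(("'s" . "'s") ("a'" . "a'")))
```

Each returned triple corresponds to one row of the table above.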
[the documentation] http://snowball.tartarus.org/texts/apostrophe.html