Porter-2 Stemmer for English
════════════════════════════
Intended to be an implementation of the [Porter Stemming Algorithm]
for English.
[Porter Stemming Algorithm]
http://snowball.tartarus.org/algorithms/english/stemmer.html
Usage
─────
This package exports a single function, `stem', which reduces an English
word to its stem by heuristically stripping inflectional and derivational
suffixes.
Example
╌╌╌╌╌╌╌
┌────
│ STEMMER> (stem "adverserial")
│ "adverseri"
│
│ STEMMER> (stem "disjointed")
│ "disjoint"
│
│ STEMMER> (stem "hangings")
│ "hang"
└────
Origin
──────
This originated as a port of the [Snowball Go module] (MIT
licensed). It has since been trimmed down and modified to the point
that it is only recognizable if you squint.
[Snowball Go module] https://github.com/kljensen/snowball
Caveats
═══════
Measured against the published vocabularies, this implementation
produces the following discrepancies:
━━━━━━━━━━━━━━━━━━━━━━━━━━
Input  Output  Canonical
──────────────────────────
"'"    ""      "'"
"''"   ""      "''"
"'a"   "a"     "'a"
"'s"   "s"     "'s"
"a'"   "a"     "a'"
━━━━━━━━━━━━━━━━━━━━━━━━━━
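These discrepancies arise from a (perceived) ambiguity in apostrophe
handling between other implementations and the written description in
[the documentation]. Every affected input is two characters or shorter;
in each case the canonical output is the input itself, whereas this
implementation strips the apostrophes. Together they amount to an error
rate of 0.016997% on the included test corpus.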
[the documentation] http://snowball.tartarus.org/texts/apostrophe.html
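To make the apostrophe discrepancy concrete, here is a minimal sketch
of the canonical handling as the Snowball English stemmer description
states it: a word of two letters or fewer is left as it is; otherwise
one leading apostrophe is removed, and Step 0 removes the longest of the
suffixes "'s'", "'s" and "'". This is not the code in this package, it
covers only the apostrophe rules rather than the whole algorithm, and
the names `canonical-apostrophes' and `strip-suffix' are hypothetical.

```lisp
;;; Hypothetical sketch of the canonical Porter2 apostrophe rules, written
;;; from the Snowball English stemmer description. Not this package's code.

(defun strip-suffix (word suffix)
  "Return WORD without SUFFIX if WORD ends with SUFFIX, otherwise NIL."
  (let ((start (- (length word) (length suffix))))
    (when (and (>= start 0)
               (string= suffix word :start2 start))
      (subseq word 0 start))))

(defun canonical-apostrophes (word)
  "Apply only the apostrophe-related rules of canonical Porter2 to WORD."
  (if (<= (length word) 2)
      ;; Words of two letters or fewer are returned unchanged,
      ;; before any apostrophe is looked at.
      word
      (let ((w (if (char= (char word 0) #\')
                   (subseq word 1)   ; drop a single leading apostrophe
                   word)))
        ;; Step 0: remove the longest of 's' 's ' found at the end,
        ;; trying the longer suffixes first.
        (or (strip-suffix w "'s'")
            (strip-suffix w "'s")
            (strip-suffix w "'")
            w))))
```

Because the length check runs before any apostrophe is removed, all five
inputs in the Caveats table come back unchanged, matching the Canonical
column, while longer words such as "dog's" still lose their possessive
suffix.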