Porter 2 stemmer in Common Lisp

clone

  read-only   https://hg.sr.ht/~nprescott/stemmer
  read/write  ssh://hg@hg.sr.ht/~nprescott/stemmer

Porter-2 Stemmer for English
════════════════════════════

  Intended to be an implementation of the [Porter Stemming Algorithm]
  for English.


[Porter Stemming Algorithm]
http://snowball.tartarus.org/algorithms/english/stemmer.html

Usage
─────

  This package exports a single function, `stem', which takes an
  English word as a string and returns its stem, stripping inflectional
  endings and affixes by a heuristic process.


Example
╌╌╌╌╌╌╌

  ┌────
  │ STEMMER> (stem "adverserial")
  │ "adverseri"
  │ 
  │ STEMMER> (stem "disjointed")
  │ "disjoint"
  │ 
  │ STEMMER> (stem "hangings")
  │ "hang"
  └────
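
  To stem more than one word at a time, a small helper along the
  following lines may be convenient. This is only a sketch and not part
  of the package: `split-on-spaces' and `stem-tokens' are hypothetical
  names, the stemming function is passed in as an argument so that the
  block runs on its own, and downcasing each token first is an
  assumption about the input `stem' expects.

```lisp
;; Hypothetical helpers, not part of the stemmer package.

(defun split-on-spaces (text)
  "Return the non-empty space-separated tokens of TEXT, in order."
  (loop with start = 0
        for end = (position #\Space text :start start)
        for token = (subseq text start end)
        unless (string= token "") collect token
        while end
        do (setf start (1+ end))))

(defun stem-tokens (text stem-fn)
  "Downcase each token of TEXT and apply STEM-FN to it.
STEM-FN is any function from string to string, e.g. the package's stem."
  (mapcar (lambda (token)
            (funcall stem-fn (string-downcase token)))
          (split-on-spaces text)))
```

  With the package loaded, calling `stem-tokens' on "disjointed
  hangings" with `stem' as the second argument should give "disjoint"
  and "hang", in line with the outputs shown above.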


Origin
──────

  This originated as a port of the [Snowball Go module] (MIT
  licensed). It has been trimmed down and modified to the point that it
  might be recognizable if you squint.


[Snowball Go module] https://github.com/kljensen/snowball


Caveats
═══════

  Checked against the published vocabularies, this implementation
  differs from the canonical output for the following inputs:

  ━━━━━━━━━━━━━━━━━━━━━━━━━━
   Input  Output  Canonical 
  ──────────────────────────
   "'"    ""      "'"       
   "''"   ""      "''"      
   "'a"   "a"     "'a"      
   "'s"   "s"     "'s"      
   "a'"   "a"     "a'"      
  ━━━━━━━━━━━━━━━━━━━━━━━━━━

  This results from the (perceived) ambiguity between how other
  implementations handle apostrophes and the written description in
  [the documentation]. These five discrepancies amount to a 0.016997%
  error rate on the included test corpus.
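
  A comparison of this kind can be sketched as follows. The names
  `discrepancies', `error-percentage' and `strip-apostrophes' are
  hypothetical and not taken from the package. `strip-apostrophes' is
  only a stand-in that deletes apostrophes so the block runs on its own;
  it is not the package's `stem'. The corpus size of 29417 used in the
  comment is inferred from the quoted percentage and the five rows
  above; it is not stated in this document.

```lisp
;; Hypothetical helpers for comparing a stemmer against expected output.

(defun discrepancies (stem-fn inputs expected)
  "Return (input output canonical) triples where STEM-FN disagrees.
INPUTS and EXPECTED are parallel lists of strings."
  (loop for word in inputs
        for canonical in expected
        for output = (funcall stem-fn word)
        unless (string= output canonical)
          collect (list word output canonical)))

(defun error-percentage (mismatch-count corpus-size)
  "Return the mismatches as an exact rational percentage of the corpus."
  (* 100 (/ mismatch-count corpus-size)))

;; Stand-in for demonstration only: deletes every apostrophe.
;; This is NOT the package's stem function.
(defun strip-apostrophes (word)
  (remove #\' word))

;; Assumed corpus size (inferred, see above):
;; (float (error-percentage 5 29417)) is approximately 0.016997
```

  Applied to the five inputs in the table, with the canonical output
  equal to the input itself, `discrepancies' with the stand-in returns
  five triples that match the rows shown. Passing the real `stem' and
  the full vocabulary lists instead would be the actual check.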


[the documentation] http://snowball.tartarus.org/texts/apostrophe.html