How to Write a Spelling Corrector (in 21 lines)


/ Published in: Python
Save to your folder(s)

Requires an external file for the statistical algorithm... (big.txt)


Copy this code and paste it in your HTML
  1. import re, collections
  2.  
  3. def words(text): return re.findall('[a-z]+', text.lower())
  4.  
  5. def train(features):
  6. model = collections.defaultdict(lambda: 1)
  7. for f in features:
  8. model[f] += 1
  9. return model
  10.  
  11. NWORDS = train(words(file('big.txt').read()))
  12.  
  13. alphabet = 'abcdefghijklmnopqrstuvwxyz'
  14.  
  15. def edits1(word):
  16. splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
  17. deletes = [a + b[1:] for a, b in splits if b]
  18. transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
  19. replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
  20. inserts = [a + c + b for a, b in splits for c in alphabet]
  21. return set(deletes + transposes + replaces + inserts)
  22.  
  23. def known_edits2(word):
  24. return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
  25.  
  26. def known(words): return set(w for w in words if w in NWORDS)
  27.  
  28. def correct(word):
  29. candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
  30. return max(candidates, key=NWORDS.get)

URL: http://norvig.com/spell-correct.html

Report this snippet


Comments

RSS Icon Subscribe to comments

You need to login to post a comment.