Posted By

nickleeh on 10/30/08


Tagged

python


Word Frequency


Published in: Python
 

Let’s write a program that analyzes a text file, counts how many times each word appears in it, and reports the n most frequent words.

# wordfreq.py
# Written for Python 2 (print statement, raw_input, cmp-style sort).

import string

def compareItems((w1, c1), (w2, c2)):
    # Order by count, highest first; break ties alphabetically by word.
    if c1 > c2:
        return -1
    elif c1 == c2:
        return cmp(w1, w2)
    else:
        return 1

def main():
    print "This program analyzes word frequency in a file"
    print "and prints a report on the n most frequent words.\n"

    # get the sequence of words from the file
    fname = raw_input("File to analyze: ")
    infile = open(fname, 'r')
    text = infile.read()
    infile.close()
    text = string.lower(text)
    # replace each punctuation character with a space
    for ch in """!"#$%&()*+,-./:;<=>?@[\\]^_'`{|}~""":
        text = string.replace(text, ch, ' ')
    words = string.split(text)

    # construct a dictionary of word counts
    counts = {}
    for w in words:
        try:
            counts[w] = counts[w] + 1
        except KeyError:
            counts[w] = 1

    # output analysis of n most frequent words.
    n = int(raw_input("Output analysis of how many words? "))
    items = counts.items()
    items.sort(compareItems)
    # never index past the end when the file has fewer than n distinct words
    for i in range(min(n, len(items))):
        print "%-10s%5d" % items[i]

if __name__ == '__main__': main()
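
The script above targets Python 2. For readers on Python 3, the same idea can be sketched roughly as follows. This is not the author's code: the function names `count_words` and `top_words` are made up for illustration, `collections.Counter` stands in for the hand-built dictionary, and a sort key replaces the `cmp`-style comparison function. It also assumes that `string.punctuation` is an acceptable stand-in for the punctuation list in the original.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Replace every ASCII punctuation character with a space, then split
    # on whitespace, mirroring the replace-and-split loop in the original.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return Counter(text.lower().translate(table).split())


def top_words(counts, n):
    """Return up to n (word, count) pairs: highest count first, ties A-Z."""
    # Negating the count sorts it in descending order, while the word
    # itself breaks ties in ascending alphabetical order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n]


if __name__ == "__main__":
    # Hypothetical sample text; a real run would read the text from a file.
    sample = "The cat and the hat. The end!"
    for word, count in top_words(count_words(sample), 3):
        print("%-10s%5d" % (word, count))
```

Slicing with `ordered[:n]` returns fewer pairs when the text has fewer than n distinct words, so no explicit bounds check is needed.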
