Posted By

clopez on 01/27/12


Tagged

Bash text frequency distribution


Get word frequency distribution


Published in: Bash
 

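Run this command on a file named text.txt and it writes the word frequency distribution to output_ngram.txt, one "count word" pair per line, most frequent first. Only the ASCII letters A-Z and a-z count as word characters, so accented words are split into fragments (such as aplicaci, presumably from aplicación), and counting is case-sensitive (Por and por are tallied separately). Sample output for a Spanish text: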

    30 m
    29 por
    29 aplicaci
    27 modelo
    27 datos
    24 con
    21 este
    21 esta
    20 En
    18 posible
    18 palabras
    18 como
    17 texto
    14 tem
    14 no
    14 documentos
    14 cada
    14 Por
    13 ya
    13 todo
    13 textos
    13 proceso

    tr -sc 'A-Za-z' '\012' < text.txt | sort | uniq -c | sort -nr > output_ngram.txt
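If merging capitalized and lowercase forms is wanted, the same pipeline can be wrapped in a small function. This is only a sketch: the name `word_freq`, the lowercasing step, the blank-line filter, and the optional "top N" argument are assumptions added for illustration and are not part of the original snippet. It is still ASCII-only, so accented words are split just as before.

```shell
# Hypothetical helper (not from the original post): word_freq FILE [N]
# Prints the N most frequent words in FILE (default 10), ignoring case.
word_freq() {
    tr -sc 'A-Za-z' '\n' < "$1" |   # turn each run of non-letters into one newline
        tr 'A-Z' 'a-z' |            # fold everything to lowercase
        sed '/^$/d' |               # drop the empty line left by leading non-letters
        sort |                      # group identical words together
        uniq -c |                   # prefix each distinct word with its count
        sort -nr |                  # highest counts first
        head -n "${2:-10}"          # keep only the top N lines
}
```

For example, `word_freq text.txt 20 > output_ngram.txt` would write the 20 most frequent words to the same output file used above.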
