Pure Programmer

Project: Frequency Table (Words)

Write a program that generates a table of word frequencies. The program should read a stream from stdin and count the occurrences of each word. Once the stream has been read, the program should print a tab-delimited table with columns for Word, Count, and Frequency. The frequency of each word is its count divided by the total number of words in the input.
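The counting and frequency calculation described above can be sketched roughly as follows. This is only an illustration, not the site's solution: the helper names `count_words` and `frequency_table` are hypothetical, and the sketch assumes that words are lowercased, that a word is any run of Unicode letters, marks, or digits, and that rows are sorted with Perl's default string sort.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # treat stdin/stdout as UTF-8

# Hypothetical helper: count the words in a string.
# Returns a hash reference of per-word counts and the total word count.
sub count_words {
    my ($text) = @_;
    my %count;
    my $total = 0;
    # Assumption: a word is a maximal run of letters, marks, or digits,
    # compared case-insensitively by lowercasing the text first.
    for my $word (lc($text) =~ /[\p{L}\p{M}\p{N}]+/g) {
        $count{$word}++;
        $total++;
    }
    return (\%count, $total);
}

# Hypothetical helper: build the tab-delimited table, one string per row.
sub frequency_table {
    my ($text) = @_;
    my ($count, $total) = count_words($text);
    my @rows = ("Word\tCount\tFreq");
    for my $word (sort keys %$count) {
        # Frequency = count for this word / total number of words.
        push @rows, sprintf("%s\t%d\t%.6f",
            $word, $count->{$word}, $count->{$word} / $total);
    }
    return @rows;
}

# Small demonstration on a fixed string instead of stdin.
print "$_\n" for frequency_table("The cat and the hat");
```

For the demonstration string, the five words give `the` a count of 2 and a frequency of 0.400000, while `and`, `cat`, and `hat` each get 1 and 0.200000. To process a real stream, the whole of stdin could be slurped and passed in, for example `frequency_table(do { local $/; <STDIN> })`.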

You can get books from the Project Gutenberg site to use in testing your program.

Output
$ perl FrequencyTableWords.pl < ../../data/text/GettysburgAddress.txt
Word	Count	Freq
1863	1	0.003484
19	1	0.003484
a	7	0.024390
above	1	0.003484
add	1	0.003484
address	1	0.003484
advanced	1	0.003484
ago	1	0.003484
all	1	0.003484
...
war	2	0.006969
we	10	0.034843
what	2	0.006969
whether	1	0.003484
which	2	0.006969
who	3	0.010453
will	1	0.003484
work	1	0.003484
world	1	0.003484
years	1	0.003484
$ perl FrequencyTableWords.pl < ../../data/text/USConstitution.txt
Word	Count	Freq
1	21	0.002750
10	1	0.000131
10th	1	0.000131
11th	1	0.000131
12th	1	0.000131
13th	1	0.000131
14th	1	0.000131
15th	2	0.000262
16th	1	0.000131
...
would	2	0.000262
writ	1	0.000131
writing	1	0.000131
writings	1	0.000131
writs	2	0.000262
written	5	0.000655
year	10	0.001310
years	23	0.003012
yeas	2	0.000262
york	2	0.000262
$ perl FrequencyTableWords.pl < ../../data/text/UnicodeTest.utf8
Word	Count	Freq
10	1	0.003802
11	1	0.003802
4	1	0.003802
4スコアと7年前	1	0.003802
7	2	0.007605
8	1	0.003802
9	1	0.003802
a	1	0.003802
ago	1	0.003802
...
우리	1	0.003802
인이	1	0.003802
잉태	1	0.003802
전	1	0.003802
점	1	0.003802
제안했습니다	1	0.003802
조상들은	1	0.003802
창조되었다고	1	0.003802
평등하게	1	0.003802
한	1	0.003802

Solution