Pure Programmer

Project: Frequency Table (Characters)

Write a program to generate a table of character frequencies. The program should read a stream from stdin and count the occurrences of each character seen. Once the stream has been read, the program should print a tab-delimited table with columns for Codepoint in hexadecimal, Character, Count, and Frequency. The frequency of each character is the count for that character divided by the total number of characters in the stream.
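As a rough illustration of the counting and formatting steps described above, here is a minimal Java sketch. It is not the project's solution: the class name FrequencyTableSketch and the helper names countCodePoints and formatTable are hypothetical. It assumes the input is UTF-8, reads the whole stream into memory (System.in.readAllBytes needs Java 9 or later), counts Unicode code points rather than UTF-16 units, and prints control characters as 0x followed by their hex value, a convention inferred from the sample output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are illustrative, not taken from the project.
class FrequencyTableSketch {

    // Count every Unicode code point in the text.
    // A TreeMap keeps the keys sorted by code point value.
    static TreeMap<Integer, Long> countCodePoints(String text) {
        TreeMap<Integer, Long> counts = new TreeMap<>();
        text.codePoints().forEach(cp -> counts.merge(cp, 1L, Long::sum));
        return counts;
    }

    // Build the tab-delimited table: Hex, Char, Count, Freq.
    static String formatTable(Map<Integer, Long> counts) {
        long total = 0;
        for (long n : counts.values()) {
            total += n;
        }
        StringBuilder sb = new StringBuilder("Hex\tChar\tCount\tFreq\n");
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            int cp = e.getKey();
            long n = e.getValue();
            // Assumption: show control characters as 0x<hex> instead of printing them raw.
            String shown = Character.isISOControl(cp)
                    ? "0x" + Integer.toHexString(cp)
                    : new String(Character.toChars(cp));
            sb.append(String.format(Locale.ROOT, "%05x\t%s\t%d\t%.4f\n",
                    cp, shown, n, (double) n / total));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stdin is UTF-8 and small enough to hold in memory.
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(formatTable(countCodePoints(text)));
    }
}
```

Counting code points instead of char values matters for characters outside the Basic Multilingual Plane, such as U+1F974, which Java strings store as two UTF-16 units but which should appear as a single row in the table.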

You can use the 'curl' program to fetch webpages as test input for your program. If a webpage is encoded in ISO-8859-1 (Latin-1) instead of UTF-8, you can convert it using the 'iconv' program. For example:

$ curl https://pt.lipsum.com/ | iconv -f ISO-8859-1 -t UTF-8 | ./FrequencyTableChars
$ curl https://cn.lipsum.com/ | ./FrequencyTableChars

Output
$ javac -Xlint FrequencyTableChars.java
$ java -ea FrequencyTableChars < ../../data/text/USConstitution.txt
Hex	Char	Count	Freq
0000a	0xa	978	0.0202
00020	 	10282	0.2129
00022	"	2	0.0000
00028	(	5	0.0001
00029	)	5	0.0001
0002c	,	565	0.0117
0002d	-	51	0.0011
0002e	.	290	0.0060
00030	0	4	0.0001
...
00071	q	47	0.0010
00072	r	2138	0.0443
00073	s	2393	0.0495
00074	t	3647	0.0755
00075	u	747	0.0155
00076	v	416	0.0086
00077	w	347	0.0072
00078	x	95	0.0020
00079	y	492	0.0102
0007a	z	31	0.0006

$ javac -Xlint FrequencyTableChars.java
$ java -ea FrequencyTableChars < ../../data/text/UnicodeTest.utf8
Hex	Char	Count	Freq
0000a	0xa	70	0.0415
00020	 	243	0.1442
0002c	,	11	0.0065
0002d	-	2	0.0012
0002e	.	7	0.0042
00030	0	1	0.0006
00031	1	3	0.0018
00034	4	2	0.0012
00037	7	3	0.0018
...
1f974	?	1	0.0006
1f975	?	1	0.0006
1f976	?	1	0.0006
1f980	?	1	0.0006
1f981	?	1	0.0006
1f984	?	1	0.0006
1f988	?	1	0.0006
1f98a	?	1	0.0006
1f996	?	1	0.0006
1f9d0	?	1	0.0006

Solution