Project: Frequency Table (Characters)
Write a program that generates a character frequency table. The program should read a stream from stdin and count the occurrences of each character. Once the entire stream has been read, the program should print a tab-delimited table with columns for Codepoint in hexadecimal, Character, Count, and Frequency. The frequency of a character is its count divided by the total number of characters in the stream.
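The counting and formatting steps described above can be sketched as follows. This is a minimal illustration, not the reference solution: the names display_char, tally, and format_table are hypothetical, and the input is assumed to be already decoded into Unicode code points (a std::u32string). The sample build links the fmt library, but this sketch uses only the standard library. Showing control characters as hex (for example 0xa for a newline) is an assumption based on the sample output.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: text for the Character column.
// Control characters are shown as hex (e.g. "0xa"), as in the sample output;
// everything else is encoded as UTF-8.
std::string display_char(char32_t cp) {
    if (cp < 0x20 || cp == 0x7f) {
        char buf[16];
        std::snprintf(buf, sizeof buf, "0x%x", static_cast<unsigned>(cp));
        return buf;
    }
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Count how often each code point occurs.
// std::map keeps the keys sorted, so the table comes out in code point order.
std::map<char32_t, std::size_t> tally(const std::u32string& text) {
    std::map<char32_t, std::size_t> counts;
    for (char32_t cp : text) ++counts[cp];
    return counts;
}

// Build the tab-delimited table: Hex, Char, Count, Freq.
std::string format_table(const std::map<char32_t, std::size_t>& counts) {
    std::size_t total = 0;
    for (const auto& entry : counts) total += entry.second;

    std::string out = "Hex\tChar\tCount\tFreq\n";
    for (const auto& entry : counts) {
        char line[64];
        std::snprintf(line, sizeof line, "%05x\t%s\t%zu\t%.4f\n",
                      static_cast<unsigned>(entry.first),
                      display_char(entry.first).c_str(),
                      entry.second,
                      static_cast<double>(entry.second) / static_cast<double>(total));
        out += line;
    }
    return out;
}
```

A main function would decode stdin into a std::u32string, pass it to tally, and write the result of format_table to stdout.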
You can test your program on webpages fetched with the 'curl' program. If a webpage is encoded in ISO-8859-1 (Latin-1) instead of UTF-8, you can convert it with the 'iconv' program. For example:
$ curl https://pt.lipsum.com/ | iconv -f ISO-8859-1 -t UTF-8 | ./FrequencyTableChars
$ curl https://cn.lipsum.com/ | ./FrequencyTableChars
Output
$ g++ -std=c++17 FrequencyTableChars.cpp -o FrequencyTableChars -lfmt
$ ./FrequencyTableChars < ../../data/text/USConstitution.txt
Hex	Char	Count	Freq
0000a	0xa	978	0.0202
00020	 	10282	0.2129
00022	"	2	0.0000
00028	(	5	0.0001
00029	)	5	0.0001
0002c	,	565	0.0117
0002d	-	51	0.0011
0002e	.	290	0.0060
00030	0	4	0.0001
...
00071	q	47	0.0010
00072	r	2138	0.0443
00073	s	2393	0.0495
00074	t	3647	0.0755
00075	u	747	0.0155
00076	v	416	0.0086
00077	w	347	0.0072
00078	x	95	0.0020
00079	y	492	0.0102
0007a	z	31	0.0006
$ g++ -std=c++17 FrequencyTableChars.cpp -o FrequencyTableChars -lfmt
$ ./FrequencyTableChars < ../../data/text/UnicodeTest.utf8
Hex	Char	Count	Freq
0000a	0xa	70	0.0415
00020	 	243	0.1442
0002c	,	11	0.0065
0002d	-	2	0.0012
0002e	.	7	0.0042
00030	0	1	0.0006
00031	1	3	0.0018
00034	4	2	0.0012
00037	7	3	0.0018
...
1f974	🥴	1	0.0006
1f975	🥵	1	0.0006
1f976	🥶	1	0.0006
1f980	🦀	1	0.0006
1f981	🦁	1	0.0006
1f984	🦄	1	0.0006
1f988	🦈	1	0.0006
1f98a	🦊	1	0.0006
1f996	🦖	1	0.0006
1f9d0	🧐	1	0.0006
Solution