MOAR Dictionary Exercises

by Elliott Hauser

07 Apr 2016

Complete the below individually, with reflections, in a PR. Use the starter code, and all of 2nd through 5th exercises can be completed in the same trinket if you want.

You will need regexes to complete these exercises. It’s OK to search for and use regexes from the internet if you document their sources since we’re trying to learn dictionaries. But It’s best if you can write them yourself.

For ease of debugging I suggest commenting out the part that asks for a file name until the very end. Instead, set the variable to "mbox-short.txt". This will save you a lot of typing.


Exercise 1:

Write a program that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.


You can complete all of the below in this one trinket:

Note: with these problems you’ll probably want the .read() method, which returns a file’s contents as a string. You can then regex over the entire file at once, extracting all the matches into a list. Then you know how to count a list into a dictionary.

Exercise 2:

Write a program that categorizes each mail message by which day of the week the commit was done. To do this look for lines that start with “From”, then look for the third word and keep a running count of each of the days of the week. At the end of the program print out the contents of your dictionary (order does not matter).

Sample Line:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

Sample Execution:

Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}

Exercise 3:

Write a program to read through a mail log, build a histogram using a dictionary to count how many messages have come from each email address, and print the dictionary.

Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}

Exercise 4:

Add code to the above program to figure out who has the most messages in the file.

After all the data has been read and the dictionary has been created, look through the dictionary using a maximum loop to find who has the most messages and print how many messages the person has.

Enter a file name: mbox-short.txt
cwen@iupui.edu 5

Note: in the book they try this on mbox.txt, which is 6MB. We won’t. If you’d like to, do so, but don’t include it in what you submit- it’s just a beast to download.

Exercise 5:

This program records the domain name (instead of the address) where the message was sent from instead of who the mail came from (i.e., the whole email address). At the end of the program, print out the contents of your dictionary.

Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}
Elliott Hauser is a PhD Student in information science at UNC Chapel Hill. He's hacking education as one of the cofounders of Trinket.io. Find Elliott Hauser on Twitter, Github, and on the web.