Files Exercises

by Elliott Hauser

22 Mar 2016

We’ll work on these together in class! If you finish you’re done. Otherwise, finish by midnight.

Submit a well-formatted pull request to our class blog with embedded Trinket programs for the below three exercises from Chapter 7 (use these instead of the ones in the book - I added a few explanations) AND the extra one I added.

Complete these on your own, using only the materials in this Chapter. Do not look at other students’ submissions until after you’ve completed your work.

After your programs are done, check other students’ work and other resources online if you had questions. Include a reflection about what you think you’ve learned and any concepts that are still fuzzy to you. Did you encounter frustrating situations? Did you feel a lightbulb turn on?


Exercise 1: Write a program to read through a file and print the contents of the file (line by line) all in upper case. Executing the program will look as follows:

python shout.py
Enter a file name: mbox-short.txt
FROM STEPHEN.MARQUARD@UCT.AC.ZA SAT JAN  5 09:14:16 2008
RETURN-PATH: <POSTMASTER@COLLAB.SAKAIPROJECT.ORG>
RECEIVED: FROM MURDER (MAIL.UMICH.EDU [141.211.14.90])
     BY FRANKENSTEIN.MAIL.UMICH.EDU (CYRUS V2.3.8) WITH LMTPA;
     SAT, 05 JAN 2008 09:14:16 -0500

You can download the file from www.pythonlearn.com/code3/mbox-short.txt and then paste it into a Trinket.

Exercise 2: Write a program to prompt for a file name, and then read through the file and look for lines of the form:

X-DSPAM-Confidence:0.8475

When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

Enter the file name: mbox.txt
Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519

Test your file on the mbox.txt and mbox-short.txt files. Note: the mbox.txt file is VERY LARGE (6mb), so don’t include it in your final trinket as it will make loading the trinket very slow. You can find the file here: www.pythonlearn.com/code3/mbox.txt

Exercise 3: Sometimes when programmers get bored or want to have a bit of fun, they add a harmless Easter Egg to their program (en.wikipedia.org/wiki/Easter_egg_(media)). Modify the program that prompts the user for the file name so that it prints a funny message when the user types in the exact file name “na na boo boo”. The program should behave normally for all other files which exist and don’t exist. Here is a sample execution of the program:

python egg.py
Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

python egg.py
Enter the file name: missing.tyxt
File cannot be opened: missing.tyxt

python egg.py
Enter the file name: na na boo boo
NA NA BOO BOO TO YOU - You have been punk'd!

We are not encouraging you to put Easter Eggs in your programs–this is just an exercise.

Exercise 4: Analyze the data in the following Trinket

  • Build a list of lists data structure
  • Explore: sample the 4th column, rows 0-25
  • Explore: Form a table, rows 0-10, columns 0-4
  • Form a table with only Product, Price, Payment Type, and State
  • Print each Payment Type and its share of the number of transactions

We’ve not learned dictionaries yet, but lists of lists are a good introduction. They’re somewhat powerful but we’ll return to this example with dictionaries in hand and everything will be much easier.

Remember lists of lists:

[ [... , ... , ...]   # Row 0
  [... , ... , ...]   # Row 1
  [... , ... , ...]   # Row 2
  [... , ... , ...]   # Row 3
  [... , ... , ...] ] # Row 4
Elliott Hauser is a PhD Student in information science at UNC Chapel Hill. He's hacking education as one of the cofounders of Trinket.io. Find Elliott Hauser on Twitter, Github, and on the web.