Matt's Third Project Update

Here’s my program so far:

Even though I don’t see as many milestones checked off as I would like, I feel like I’ve made a great deal of progress on this project. The first thing I did was create a new class called “book” to hold all of the functions for importing and splicing the text files of the Project Gutenberg books. Once again, I used my tried-and-true method of printing each step as I go to make sure nothing was broken. I also created a test file that is shorter than either of the two full books I uploaded, and I added “test sentences” to it to check that the program calculated the most common words and sentences correctly. Once everything worked on the test file, I tried it on the full books, but the H.L. Mencken book caused an error. It turned out I had to add a space to one of the lines in that file for my program to read it properly. The word count calculations take quite a while, so I used Aaron’s “print processing” function from his printer module to show that the application hasn’t crashed. I plan to do something else

#Copied from Aaron Plocharczyk <https://silshack.github.io/summer2017/businessowl-project-update-stand-up-2.html>

import time

def print_processing():
  # Tell the user the program is still running, then pause for one second.
  print("Working on it...")
  time.sleep(1)
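
As a rough illustration of the kind of class described above, here is a minimal sketch. It is not the actual project code: the class name `Book`, its method names, and the `*** START OF` / `*** END OF` marker prefixes are assumptions made for the example, and the exact marker wording can differ between Project Gutenberg files.

```python
import re
from collections import Counter


class Book:
    """Hypothetical sketch of a class that holds one Project Gutenberg text."""

    # Assumed marker prefixes; the exact wording differs between files.
    START_MARKER = "*** START OF"
    END_MARKER = "*** END OF"

    def __init__(self, raw_text):
        self.text = self.isolate_body(raw_text)

    @classmethod
    def from_file(cls, path):
        # Assumes the file is UTF-8 encoded.
        with open(path, encoding="utf-8") as handle:
            return cls(handle.read())

    def isolate_body(self, raw_text):
        """Keep only the lines between the start and end marker lines."""
        lines = raw_text.splitlines()
        start, end = 0, len(lines)
        for index, line in enumerate(lines):
            if line.startswith(self.START_MARKER):
                start = index + 1
            elif line.startswith(self.END_MARKER):
                end = index
                break
        return "\n".join(lines[start:end]).strip()

    def words(self):
        # Lowercase so that "The" and "the" count as the same word.
        return re.findall(r"[a-z']+", self.text.lower())

    def most_common_words(self, count=3):
        return Counter(self.words()).most_common(count)


# A tiny stand-in for a test file with known "test sentences".
sample = """Header that should be ignored
*** START OF THE PROJECT GUTENBERG EBOOK TEST ***
The cat sat. The cat ran.
The dog slept.
*** END OF THE PROJECT GUTENBERG EBOOK TEST ***
License text that should be ignored"""

book = Book(sample)
print(book.most_common_words(2))  # [('the', 3), ('cat', 2)]
```

Because the sample text is so short, the expected counts can be checked by hand, which is the same idea as the shorter test file with known sentences.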

I am still confident that I can accomplish my goals because the base calculations are done. I still need to visualize the statistics in a histogram, create a loop that lets the user compare several books at once, and display the most common words for each book. My milestones are the same as before, and I feel I can accomplish them in time:

[X] Read Project Gutenberg text file.
[X] Isolate the relevant text from the file.
[X] Calculate average word, sentence, and paragraph lengths.
[X] Store these statistics for comparison to other files.
[ ] Visualize the data with histograms.
[ ] Save the statistics in a new file.
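
The milestones above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: every function name is hypothetical, sentences are split with a naive punctuation rule, paragraphs are assumed to be separated by blank lines, and a plain-text histogram stands in for a real plotting library.

```python
import json
import re
from collections import Counter

WORD_PATTERN = r"[A-Za-z']+"


def split_sentences(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_paragraphs(text):
    # Assumes paragraphs are separated by one or more blank lines.
    parts = re.split(r"\n\s*\n", text.strip())
    return [part for part in parts if part.strip()]


def count_words(text):
    return len(re.findall(WORD_PATTERN, text))


def average(values):
    return sum(values) / len(values) if values else 0.0


def book_statistics(text):
    """Average word length (letters) and sentence/paragraph length (words)."""
    words = re.findall(WORD_PATTERN, text)
    return {
        "avg_word_length": average([len(w) for w in words]),
        "avg_sentence_length": average(
            [count_words(s) for s in split_sentences(text)]
        ),
        "avg_paragraph_length": average(
            [count_words(p) for p in split_paragraphs(text)]
        ),
    }


def text_histogram(text):
    """One line per word length, with one '#' per word of that length."""
    lengths = Counter(len(w) for w in re.findall(WORD_PATTERN, text))
    return "\n".join(f"{n:2d} {'#' * lengths[n]}" for n in sorted(lengths))


def save_statistics(stats_by_title, path):
    # Store the statistics as JSON so they can be compared with other books later.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(stats_by_title, handle, indent=2)


sample = "The cat sat. The dog ran away.\n\nIt rained."
stats = {"Sample": book_statistics(sample)}
print(stats)
print(text_histogram(sample))
# save_statistics(stats, "book_stats.json")  # hypothetical file name
```

Keeping the statistics in a dictionary keyed by title means a loop over several books only has to add one entry per book before saving.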

Advanced milestones:

[ ] Add the ability to enter a URL to store a new Project Gutenberg book in the program and perform the same analysis.
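
One possible way to approach the URL milestone is sketched below using Python's standard `urllib.request` module. The function name `fetch_book_text`, the `opener` parameter, the fake opener, and the example URL are all hypothetical and exist only so the sketch can be run without a network connection.

```python
import io
import urllib.request


def fetch_book_text(url, opener=urllib.request.urlopen):
    """Download a plain-text book and return it as a string.

    `opener` is a hypothetical hook for trying the function offline;
    by default it is the standard urllib.request.urlopen.
    """
    with opener(url) as response:
        raw_bytes = response.read()
    # Assumes UTF-8; undecodable bytes are replaced instead of raising.
    return raw_bytes.decode("utf-8", errors="replace")


def fake_opener(url):
    # Stand-in for a real download, used only to demonstrate the function.
    return io.BytesIO(
        b"*** START OF THE PROJECT GUTENBERG EBOOK TEST ***\nHello.\n"
    )


# Placeholder URL; with the default opener a real book URL would go here.
text = fetch_book_text("https://example.org/book.txt", opener=fake_opener)
print(text.splitlines()[1])  # Hello.
```

The returned string could then go through the same analysis steps as a book read from a local file.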