Here’s my program so far:
Even though I don’t see as many milestones checked off as I would like, I feel like I’ve made a great deal of progress on this project. The first thing I worked on was to create a new class called “book” which would have all of the functions for importing and splicing the text files of the Project Gutenberg books. Once again, I used my tried and true method of printing each step as I go to make sure nothing was broken. I also decided to create a test file that was shorter than either of the two full books I uploaded, and I added “test sentences” to it to see if it calculated the most common words and sentences correctly. I got everything working right on the test file, so I tried it out on the full books, but it was causing an error on the H.L. Mencken book. It turned out I had to add a space to one of the lines in the file to make my program read it properly. The calculations for word count take quite a while, so I used Aaron’s “print processing” function from his printer module to show that the application hadn’t crashed. I plan to do something else
#Copied from Aaron Plocharczyk <https://silshack.github.io/summer2017/businessowl-project-update-stand-up-2.html>
def print_processing():
print("Working on it...")
time.sleep(1)
I am still confident that I can accomplish my goals, because the base caulculations are done. I need to visualize the statistics in a histogram, create a loop that allows for the user to compare several books at once. And display the most common words for each book. My milestones are the same as before. I feel I can accomplish them in time:
- [X] Read Project Gutenberg text file.
- [X] Isolate the relevant text from the file.
- [X] Calculate average word, sentence, paragraph lengths
- [X] Store these statistics for comparison to other files.
- Visualize the data with histograms.
- Save the statistics in a new file.
Advanced milestones:
- Add abilitiy to enter a URL to store a new Project Gutenberg book into the program and perform the same analysis.