Project Update 3

by Erica Brody

26 Apr 2016

A snapshot of my progress can be found in this GitHub repo: https://github.com/ericabrody/data_explore.git

Between Thursday and Saturday, I made a lot of progress. I got to a major hurdle trying to fomat the output for looking at univariate statistics for a variable by another variable (state/sex). The mean is a float with 2 decimal places, while all the other numbers are integers, so they just were not going to line up with tabs. Eric suggested I look at Python format strings and I found some helpful resources – https://pyformat.info and http://www.python-course.eu/python3_formatted_output.php. It took a lot of time to parse the documentation to figure out exactly what I needed, but I finally got there with a lot of breaks and trial and error. I am confident that I will complete all the project requirements by Friday, but not sure how the stretch goals will go.

Milestones Finished - [x] Review BRFSS documentation for 2014 and 2013 to identify which columns are associated with the variables I want to include in my datasets

[x] Import the desired variables from the 2013 and 2014 data files which are in fixed width format

[x] Convert the fixed width format into a csv format outside of my actual data analysis program (and require users who bring their own data to bring a csv)

[x] New (surpise milestone) Set up new GitHub repo and connect it to Cloud 9 to accomodate the data files that were too big and blew up my Trinket

[x] Input 2 csv data files into GitHub repo and stored them as a dictionary of a list of dictionaries using csv.DictReader after reading the python csv module documenation

[x] Created and review frequencies of each categorical variable using dictionary of response values

[x] Created dictionary of response values for categorical variables

[x] Create text visualization - horizontal bar chart of response values of any variable

[x] Calculate univariate statistics - min, mean, max, mode, median, range

[x] get function working that changes variable response values to response names as defined in “dataDictionary” dictionary in helpers.py

[x] :Calculate bivariate statistics for two specific variables- average tobacco use by state

[x] Generalize bivariate statistics into a function

[x] Create nicely formatted output for the univariate statistics

[x] Create nicely formatted table of bivariate statistics results - first for specific variables and then in generalized function

[x] Put results of twovar function that shows results of a categorical variable by state or sex or another variable into nicely formatted table - first for specific variables and then in a generalized function

[x] Create interface for stage 1 of user interaction - choose topic of interest from list of topics and year of data

[x] Create functionality for program to work with a “bring your own” dataset

[x] stretch goal - Functionality to print out nicely formatted output for categorical variable by state and sex

[x] stretch goal - generalize the categorical variable by state and sex and the corresponding nicely formatted output into functions

For Thursday: Finish creating user interface and documentation, possibly work on stretch goals

[ ] Create interface for stage 2 of user interaction - see data by sex or state or (look at a new topic and/or new year of data - i.e back to stage 1

[ ] Create interface for stage 3 of user interaction

[ ] Create help document

[ ] Create introduction to the program

[ ] Stretch Goal - create vertical bar charts

[ ] Stretch goal: create bar chart that shows results of one variable by another for specific variables tobacco use by state

[ ] Stretch goal: Generalize bar char of one var vs. another into a function

I am a library science student, focused on health information. I am dabbling in information science this semester. Find Erica Brody on Twitter, Github, and on the web.