Natalie's project topic

by Natalie Le

13 Apr 2016

I have decided to use a dataset from Kaggle. The program will explore different aspects of the health plans offered on the marketplaces across the country: https://www.kaggle.com/hhsgov/health-insurance-marketplace

The dataset consists of several smaller CSV files. One file is 2GB, so I will need to take a sample of it and work with the sample first. I’m still figuring out how to do this. I have seen a few suggestions on Stack Overflow and will try them out.

The biggest issue for me at this point is conceptualizing what I want to do with the data: which interesting questions I think these data can answer, and which variables I want to use. I have to anticipate the interesting questions that users might ask of this dataset. It is sort of like the drawing app, where I provide users with a set of tools but they can create something entirely their own.

It is also very important for me to spend time understanding the data, variables, and values. For example, I saw some plans with a premium of only $50 and others at $1000, even though the age range is the same. After looking up the plan IDs in another file, I realized that the really low premiums belong to dental-only plans. That’s why they’re so low! This means I have to include this variable and give the user some way to choose between medical plans and dental plans.
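For the 2GB file, one option I could try is reservoir sampling, which reads the file one row at a time and keeps only a fixed number of rows in memory. The sketch below is just an illustration and not necessarily what I will end up using: the function name `sample_csv_rows` is my own, and the column names and values in the demo data are made up.

```python
import csv
import io
import random


def sample_csv_rows(lines, k, seed=None):
    """Return (header, rows) with up to k data rows chosen uniformly at random.

    Uses reservoir sampling, so only k rows are held in memory at any time.
    `lines` can be an open file or any iterable of CSV text lines.
    """
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)  # first line is assumed to be the header
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


# Small in-memory stand-in for the real file (made-up values).
demo = io.StringIO(
    "PlanId,StateCode,IndividualRate\n"
    + "".join(f"P{i},NC,{100 + i}\n" for i in range(1000))
)
header, sample = sample_csv_rows(demo, k=10, seed=1)
print(header)
print(len(sample))
```

With the real data, the same function should work on an open file, for example `with open(path, newline="") as f: header, sample = sample_csv_rows(f, 10000)`, where `path` points to the large CSV file.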

Milestones:

  • clean and attach relevant files into Trinket
      • main files from Kaggle
      • Consumer Price Index file (to compare premiums with price parity)
      • county codes file/JSON code
  • design menu selection screen
      • description of the program and the variables available
      • help menu on buttons and functions
  • descriptive statistics functions
  • interaction functions (e.g., how many plans in NC cost less than $100?)
  • bar charts
  • transition between functions
  • clear and reset
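
To make the descriptive statistics and interaction milestones more concrete, here is a rough sketch of what two of those functions might look like. Everything in it is an assumption for illustration: the function names, the dictionary keys (`state`, `premium`, `dental_only`), and the sample rows are made up and do not come from the Kaggle files.

```python
from statistics import mean


def filter_plans(rows, state, dental_only=False):
    """Keep plans in one state, either medical plans or dental-only plans."""
    return [
        r for r in rows
        if r["state"] == state and r["dental_only"] == dental_only
    ]


def count_plans_below(rows, state, max_premium, dental_only=False):
    """Answer questions like: how many plans in NC cost less than $100?"""
    plans = filter_plans(rows, state, dental_only)
    return sum(1 for r in plans if r["premium"] < max_premium)


def premium_summary(rows, state, dental_only=False):
    """Return basic descriptive statistics for premiums, or None if no plans match."""
    premiums = [r["premium"] for r in filter_plans(rows, state, dental_only)]
    if not premiums:
        return None
    return {"min": min(premiums), "max": max(premiums), "mean": mean(premiums)}


# Made-up rows, only to show how the functions would be called.
rows = [
    {"plan_id": "A1", "state": "NC", "premium": 45.0, "dental_only": True},
    {"plan_id": "A2", "state": "NC", "premium": 95.0, "dental_only": False},
    {"plan_id": "A3", "state": "NC", "premium": 310.0, "dental_only": False},
    {"plan_id": "B1", "state": "VA", "premium": 80.0, "dental_only": False},
]

print(count_plans_below(rows, "NC", 100))                    # medical plans only
print(count_plans_below(rows, "NC", 100, dental_only=True))  # dental-only plans
print(premium_summary(rows, "NC"))
```

Keeping the medical/dental choice as a parameter means the very low dental premiums would not get mixed into the medical plan results by accident.
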

Natalie is a first-year MSPH student at the Gillings School of Public Health. Find Natalie Le on Twitter, GitHub, and on the web.