Yiyang's Final Project Plan

by Yiyang Shi

13 Apr 2016

Plan:

For the final project, I chose the data analysis project. I found an interesting dataset on airline on-time performance from the US Department of Transportation. The database contains detailed information on each flight in the United States during a chosen month. For each flight, it records the carrier, the flight number, the flight date and time, the origin, the destination and, most importantly, the delay time and its causes. I would like the analysis to give users three types of information. First, basic statistics on delay times and causes for a carrier or airport of their choice. Second, which carriers are most likely to be delayed. Third, which days of the week and which airports to avoid in order to reduce the risk of uncontrollable delays. I would also like to include several months of data so that users can compare across months.
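
A minimal sketch of how each flight record could be represented and read from the downloaded file, using Python's csv module. The column names (UNIQUE_CARRIER, FL_NUM, ARR_DELAY, WEATHER_DELAY and so on) are assumed to match the on-time performance export and should be checked against the header row of the actual file. Flight, REASON_COLUMNS and load_flights are hypothetical names.

```python
import csv
from dataclasses import dataclass

# Assumed column names for the delay causes in the downloaded CSV;
# check them against the file's header row before relying on them.
REASON_COLUMNS = {
    "carrier": "CARRIER_DELAY",
    "weather": "WEATHER_DELAY",
    "nas": "NAS_DELAY",
    "security": "SECURITY_DELAY",
    "late_aircraft": "LATE_AIRCRAFT_DELAY",
}


@dataclass
class Flight:
    """One flight, holding only the fields the analysis needs."""
    carrier: str
    flight_number: str
    date: str
    origin: str
    dest: str
    day_of_week: int
    arr_delay: float  # minutes; negative values mean an early arrival
    reasons: dict     # delay cause -> minutes


def _minutes(cell):
    """Turn a CSV cell into minutes, or None when the cell is blank or missing."""
    cell = (cell or "").strip()
    return float(cell) if cell else None


def load_flights(lines):
    """Parse CSV lines (a file object or a list of strings) into Flight objects.

    Rows without an arrival delay are skipped, on the assumption that
    they are cancelled or diverted flights.
    """
    flights = []
    for row in csv.DictReader(lines):
        arr_delay = _minutes(row.get("ARR_DELAY"))
        if arr_delay is None:
            continue
        # Blank cause columns are treated as zero minutes for that cause.
        reasons = {name: _minutes(row.get(column)) or 0.0
                   for name, column in REASON_COLUMNS.items()}
        flights.append(Flight(
            carrier=row["UNIQUE_CARRIER"],
            flight_number=row["FL_NUM"],
            date=row["FL_DATE"],
            origin=row["ORIGIN"],
            dest=row["DEST"],
            day_of_week=int(row["DAY_OF_WEEK"]),
            arr_delay=arr_delay,
            reasons=reasons,
        ))
    return flights
```

To read a real month of data, an open file can be passed in directly, for example load_flights(open(path, newline="")), where path is wherever the download was saved.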

Milestones:

  • Configure PyCharm. Clean the large dataset and load it into the PyCharm project.
  • Write a function that takes a carrier name as input and calculates the maximum, mean, and minimum delay time for that carrier.
  • Write a function that takes a carrier name as input and calculates the top delay reasons and the corresponding delay time for that carrier.
  • Repeat steps 2 and 3 for airports.
  • Write a function to find the top 10 carriers that are most likely to be delayed.
  • Write a function to calculate the average delay time for each day of the week. Write a function to find the airports with the highest delay times due to weather or other causes.
  • Run the above code on data from another month.
  • Build a visualization of the top 10 carriers.
  • Build the user interface. Allow users to choose among the three functions.
  • Create help functions.
  • Improve the program. Create “graceful” error messages and a quit option.
  • Debug.
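
The user interface, help option, “graceful” error messages and quit option from the milestones above could take roughly this shape. This is a minimal sketch under assumptions: the menu labels, the name run_menu, and the convention that each handler receives read and write functions are all hypothetical, and the three analyses are left as handlers to be plugged in. Passing read and write as parameters lets the loop be tested without typing at the keyboard.

```python
# Hypothetical menu labels for the three analyses described in the plan.
MENU = {
    "1": "Delay statistics for a carrier or airport",
    "2": "Top 10 carriers most likely to be delayed",
    "3": "Days of the week and airports to avoid",
}


def run_menu(handlers, read=input, write=print):
    """Show the menu until the user types 'q'; report bad input instead of crashing.

    handlers maps a menu key to a function taking (read, write).
    """
    while True:
        write("\n".join(f"{key}. {label}" for key, label in MENU.items()))
        write("h. Help    q. Quit")
        choice = read("Choose an option: ").strip().lower()
        if choice == "q":
            write("Goodbye!")
            return
        if choice == "h":
            write("Type the number of an analysis, or 'q' to quit.")
            continue
        handler = handlers.get(choice)
        if handler is None:
            write(f"Sorry, '{choice}' is not an option. Please try again.")
            continue
        try:
            handler(read, write)
        except (KeyError, ValueError) as err:
            # e.g. an unknown carrier code: show a friendly message and keep going
            write(f"Could not complete that request: {err}")
```

From the main script, something like run_menu({"1": show_carrier_stats}) would start the loop with the real input and print functions, where show_carrier_stats stands for whichever (hypothetical) handler is written for option 1.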

For Next Tuesday:

  • Configure PyCharm. Clean the large dataset and load it into the PyCharm project.
  • Write a function that takes a carrier name as input and calculates the maximum, mean, and minimum delay time for that carrier.
  • Write a function that takes a carrier name as input and calculates the top delay reasons and the corresponding delay time for that carrier.
  • Repeat steps 2 and 3 for airports.
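
The carrier functions planned for next Tuesday could share one implementation with their airport versions, since repeating them for airports only changes which field is matched. A minimal sketch, assuming each flight is a dict with hypothetical keys "carrier", "origin", "arr_delay" and "reasons" (a mapping from delay cause to minutes); delay_summary and top_delay_reasons are hypothetical names, and ranking causes by total minutes is one possible reading of "top delay reasons".

```python
from collections import defaultdict
from statistics import mean


def delay_summary(flights, field, value):
    """Return (max, mean, min) arrival delay for flights where flight[field] == value."""
    delays = [f["arr_delay"] for f in flights if f[field] == value]
    if not delays:
        raise ValueError(f"no flights found for {field}={value!r}")
    return max(delays), mean(delays), min(delays)


def top_delay_reasons(flights, field, value, n=3):
    """Total delay minutes per cause for the matching flights, largest first."""
    totals = defaultdict(float)
    for f in flights:
        if f[field] == value:
            for reason, minutes in f["reasons"].items():
                totals[reason] += minutes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

For example, delay_summary(flights, "carrier", "AA") covers the carrier step, and delay_summary(flights, "origin", "JFK") gives the same statistics for flights departing JFK.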

To Be Scheduled:

  • Write a function to find the top 10 carriers that are most likely to be delayed.
  • Write a function to calculate the average delay time for each day of the week. Write a function to find the airports with the highest delay times due to weather or other causes.
  • Run the above code on data from another month.
  • Build a visualization of the top 10 carriers.
  • Build the user interface. Allow users to choose among the three functions.
  • Create help functions.
  • Improve the program. Create “graceful” error messages and a quit option.
  • Debug.
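
The ranking and day-of-week functions still to be scheduled might look like the following. This is a sketch under several assumptions: flights are dicts with hypothetical keys "carrier", "origin", "day_of_week", "arr_delay" and "weather_delay"; day_of_week runs from 1 = Monday to 7 = Sunday; and "most likely to be delayed" is read as the share of a carrier's flights arriving at least 15 minutes late. Fifteen minutes is a commonly used threshold for on-time statistics, but here it is simply a constant that can be changed. All function names are hypothetical.

```python
from collections import Counter, defaultdict

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # assumes 1 = Monday
DELAY_THRESHOLD = 15  # minutes; a flight this late or later counts as "delayed"


def top_carriers_by_delay_rate(flights, n=10):
    """Rank carriers by the share of their flights delayed by DELAY_THRESHOLD or more."""
    total = Counter(f["carrier"] for f in flights)
    delayed = Counter(f["carrier"] for f in flights if f["arr_delay"] >= DELAY_THRESHOLD)
    rates = {carrier: delayed[carrier] / total[carrier] for carrier in total}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


def mean_delay_by_weekday(flights):
    """Average arrival delay for each day of the week present in the data."""
    sums, counts = defaultdict(float), Counter()
    for f in flights:
        sums[f["day_of_week"]] += f["arr_delay"]
        counts[f["day_of_week"]] += 1
    return {DAY_NAMES[day - 1]: sums[day] / counts[day] for day in sorted(counts)}


def top_airports_by_weather_delay(flights, n=10):
    """Origin airports ranked by total minutes of weather delay."""
    totals = Counter()
    for f in flights:
        totals[f["origin"]] += f["weather_delay"]
    return totals.most_common(n)
```

Measuring a rate rather than a raw count keeps large carriers from dominating the ranking simply because they fly more often; a minimum number of flights per carrier could be added if very small carriers produce noisy rates.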

Stretch Goals:

  • Possibly use turtle to draw line graphs.
  • Turn the program into an “App”-like program and make it fun to use.
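
For the turtle stretch goal, most of the work is mapping values onto screen coordinates. A minimal sketch, assuming turtle's default coordinate system (origin at the centre of the window) and a graph area of width by height pixels; scale_points and draw_line_graph are hypothetical names. turtle is imported inside draw_line_graph so that the scaling can be used and tested without a display.

```python
def scale_points(values, width=400, height=300):
    """Map values to (x, y) points spread evenly across a width x height box
    centred on the origin, with the lowest value at the bottom and the highest at the top."""
    if len(values) < 2:
        raise ValueError("need at least two values to draw a line")
    low, high = min(values), max(values)
    span = (high - low) or 1  # all values equal: avoid dividing by zero
    step = width / (len(values) - 1)
    return [(-width / 2 + i * step, -height / 2 + (v - low) / span * height)
            for i, v in enumerate(values)]


def draw_line_graph(values, title=""):
    """Draw values as a connected line with a dot at each point, then wait for the window to close."""
    import turtle  # imported here so scale_points works without a display

    pen = turtle.Turtle()
    pen.hideturtle()
    pen.speed(0)  # fastest drawing speed
    points = scale_points(values)
    pen.penup()
    pen.goto(points[0])
    pen.pendown()
    pen.dot(5)
    for point in points[1:]:
        pen.goto(point)
        pen.dot(5)
    if title:
        pen.penup()
        pen.goto(0, 170)  # just above the default 300-pixel-high graph area
        pen.write(title, align="center")
    turtle.done()
```

A call such as draw_line_graph(monthly_means, title="Average delay by month") would plot one point per month, where monthly_means is a list computed from the data.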