Nat's Final Data Analysis Project

by Natalia Lopez

30 Apr 2016

For my data analysis final project, I used UNC course catalog data and wrote a script that would parse through the course descriptions in search of a user-generated keyword. As this project is for a work assignment, I initially planned on parsing through data from LibAnalytics that documents courses the Research Hub at Davis Library has supported within the past two years to generate a list of terms that were most frequently found in the descriptions. However, after creating a step-by-step plan, I decided to break this project into smaller phases, the first of which is this project.

Ultimately, I think I had two big successes with this project:

WIN 1: Learning Project Management and Scoping Skills!

The biggest takeaway from this project was the ability to develop stronger project management and scoping skills. Developing milestones in the beginning of the project was relatively helpful in terms of getting me going and creating a skeletal framework for how to work. However, this was the first time I basically had a dedicated notebook where I broke down the specific tangible steps I needed to take to carry out the aspect of the project I was working on that day. I frequently wrote out what I wanted my program to do and then went back and inserted programming language into my write-up. From that, I was able to pretty consistently generate a list of three to four steps that I wanted to try to focus on and also to prioritize accordingly. There were times when I just generated a list in a random order and was then able to reflect and better understand how finishing one step was necessary before trying to move forward to the next. This was particularly striking when I compared it to my process during the turtle mid-term assignment, where I was incredibly overwhelmed and had no ability to just focus on any one part at a time. Instead, I kept frustratingly going back and forth between multiple parts and thus was never really able to get any one thing to work. Suddenly, my project became a set of really small, manageable tasks that were immensely rewarding and motivating whenever they were completed. I actually really enjoyed working on this project and TOTALLY want to continue coding now! Thanks, Elliott!

WIN 2: Cleaning My Code

After several days of working on my project, I got to a point where I realized that I had basically made a ton of dictionaries when all I really needed was to make a dictionary within a dictionary. I was mildly terrified and was somewhat avoiding having to do this. However, there did come a point where there was no avoiding it any longer. Once I finally figured out how to set up my dictionary within a dictionary, I realized that my code was extra long and messy and could be significantly simplified by editing it down and consolidating the separate dictionaries into one. I was pretty nervous about moving around code that I had so painstakingly spent days trying to write. To my surprise, the editing and cleaning was actually pretty quick! It was reassuring to see that I had a pretty grounded understanding of what I was doing at that point, since I was able to move a lot more quickly. I had also gotten better at placing print statements in targeted locations to better assess where an error was coming from. With the code cleaned up, I was able to move much faster, read my code more easily, and more effectively discuss it with others when I hit roadblocks.
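To make the "dictionary within a dictionary" idea concrete, here is a minimal sketch of what such a structure might look like. It is not the project's actual code: the course codes, descriptions, and the function name build_results are all made up for illustration, and the real catalog data may be shaped differently.

```python
import re

# Hypothetical sample rows (course code, description); not real catalog data.
sample_courses = [
    ("ABC 101", "Programming for beginners. Data and programming basics."),
    ("DEF 150", "Introduction to drawing."),
    ("WXYZ 202", "Introduction to programming and data."),
]

def build_results(courses, keyword):
    """Return one nested dictionary: {code: {"description": ..., "count": ...}}.

    Only courses whose description mentions the keyword are included.
    """
    # re.escape keeps any punctuation in the keyword from being read as regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    results = {}
    for code, description in courses:
        count = len(pattern.findall(description))
        if count > 0:
            # The inner dictionary keeps everything about one course together.
            results[code] = {"description": description, "count": count}
    return results

results = build_results(sample_courses, "programming")
for code in sorted(results):
    print(code, results[code]["count"], results[code]["description"])
```

Keeping the description and the count under the same course code means one lookup such as results["ABC 101"]["count"] replaces what would otherwise be separate dictionaries that have to be kept in sync.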

User Interface: Using the starter code provided in class, I was able to create a user interface where the user can first select the way in which they would like their data returned on the screen. They have three options: (1) codes and counts, (2) codes, course descriptions, and counts, (3) codes, course descriptions, counts, and visualizations. I built a function that basically searches for the keyword and then call it under each if statement that matches the way the user wants the results reported. The one thing I didn’t really have time to figure out was how to move that helper function into the helper code rather than have it as part of the main page. That just makes the main page a bit longer than I’d prefer.
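The pattern described above, one search function called under each if statement, could be sketched roughly like this. Everything here is an assumption for illustration: the names search_keyword and report, the sample data, and especially the row of asterisks standing in for the visualization, since the actual visualization is not described.

```python
def search_keyword(courses, keyword):
    """Return {code: {"description": ..., "count": ...}} for matching courses."""
    keyword = keyword.lower()
    results = {}
    for code, description in courses.items():
        count = description.lower().count(keyword)
        if count > 0:
            results[code] = {"description": description, "count": count}
    return results

def report(courses, keyword, option):
    """Build the lines to show for the reporting option the user picked."""
    lines = []
    if option == "1":
        # Codes and counts.
        for code, entry in sorted(search_keyword(courses, keyword).items()):
            lines.append("{}: {}".format(code, entry["count"]))
    elif option == "2":
        # Codes, descriptions, and counts.
        for code, entry in sorted(search_keyword(courses, keyword).items()):
            lines.append("{}: {} ({})".format(code, entry["description"], entry["count"]))
    elif option == "3":
        # Same as option 2 plus a stand-in "visualization": one * per match.
        for code, entry in sorted(search_keyword(courses, keyword).items()):
            lines.append("{}: {} ({}) {}".format(
                code, entry["description"], entry["count"], "*" * entry["count"]))
    else:
        lines.append("Please choose 1, 2, or 3.")
    return lines

# Hypothetical sample data; not real catalog entries.
sample_courses = {
    "ABC 101": "Maps and map making.",
    "WXYZ 202": "Statistics for beginners.",
}
for line in report(sample_courses, "map", "3"):
    print(line)
```

In an interactive version, the option and keyword would come from input() rather than being hard-coded. Because both functions take their data as arguments, they could also be moved into a separate helper file and imported, which would shorten the main page.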

Some of the major issues I had with my code are the following:

ISSUE#1: One small (and annoying) bug is with option one, where I made a table that returns the codes with their counts next to them. Some of the codes are three letters long and others are four. Although I was able to write a regex that returns the whole course code, the difference in length throws off the tab spacing, so the rows with three-letter codes don’t tab over as far as the rows with four-letter codes. This is just aesthetically annoying!
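One possible way around the tab problem is to pad every code to the same width instead of relying on a tab character. The sketch below is only an illustration: the regex assumes a code is three or four capital letters, a space, and a three-digit number, which may not match every code in the real catalog, and format_table is a made-up name.

```python
import re

# Assumed code format: 3 or 4 capital letters, a space, then 3 digits.
CODE_PATTERN = re.compile(r"\b[A-Z]{3,4} \d{3}\b")

def format_table(counts):
    """Return rows with the counts lined up, whatever the code length."""
    if not counts:
        return []
    # Pad every code to the length of the longest one.
    width = max(len(code) for code in counts)
    rows = []
    for code in sorted(counts):
        rows.append("{}  {}".format(code.ljust(width), counts[code]))
    return rows

# Hypothetical codes and counts for illustration.
sample_counts = {"ABC 101": 2, "WXYZ 202": 1}
for row in format_table(sample_counts):
    print(row)
```

Since ljust pads with spaces up to a fixed width, the count column starts at the same position for three-letter and four-letter codes.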

ISSUE#2: Data file is too large for trinket. I ran the program with another IDE and was able to run the entire code. Right now, it’s pretty frustrating because I couldn’t figure out how to get it to sift through the whole course catalog in trinket. I tried creating separate CSV files in trinket and basically breaking the catalog up so the user could select the discipline based on alphabet (so pick courses in departments that start with letters A-D), and then only calling and reading a file if it is the one selected, but it slowed trinket down and wouldn’t run. I don’t really know how to handle this situation.

One thing I feel I can still improve upon is my ability to search for answers and resources on my own. For whatever reason, the documentation that exists is sometimes really overwhelming and confusing to parse through. I did keep referring back to our chapters on dictionaries and regex in particular and actually had them just saved as a pdf that I would regularly pull up and go through. I also kept going through previous projects and looking through the code so that I could apply it to my situation. During group work, I had times when I asked people specific tangible questions, but I also definitely saw the value of talking through my problem with someone else aloud. On more than one occasion, I found myself figuring out the answer while talking it out. For instance, while I was explaining the issue of wanting to generate the descriptions of a course code and the count, I realized pretty quickly that the most efficient way to go about that was to just build a dictionary within a dictionary. I also talked through how to create a user interface that would give people options for how to receive their data and realized that one idea I had would’ve been overly complicated and unnecessary.

I also recognized through this project that I really enjoy tangible projects and that even though it felt super daunting, it was really great to have a problem to solve and to do it alongside people who were also invested in the same kind of work. It helped me just generally reflect on my learning style a lot more and what things motivate me most to move forward. I would frequently be surprised at how much I wanted to put off starting to code because of sheer anxiety and a sense of overwhelming dread. Every time I started thinking about this project when I first started, I wanted to curl up into a ball and hide. So, it’s been especially empowering to sit down and slowly see small progress. I think having the group work time in class made a big difference for me. I also think it ended up coinciding with some of the meet ups I went to with friends from class and that was helpful. I just realized how setting aside time to work on things in a specific space made it easier to not get distracted or give in to the anxiety.

Nat (batlopez) is a first-year MSLS student interested in digital research services. Find Natalia Lopez on Twitter, GitHub, and on the web.