Nat's Project Update I

by Natalia Lopez

18 Apr 2016

Since last class, I was able to access the two data files: one containing the course catalog and one listing the classes that the Research Hub has already supported. I got both files to open and generated a list of just the courses that contain a specific term, such as the word “seminar.” Most recently, I went through the other data set, which has information about previous classes, and tried to pull just the course code from the rows that have a course code associated with them. I was able to do that!

Now I’m trying to use the course codes from that sheet to pull the matching course descriptions from the other data file. I have not really figured out how to do that. That, plus creating a dictionary of all the words in the descriptions with their counts so I can find the most frequently used words, is the next major task I want to focus on. I have relied heavily on code from previous exercises to help me. I do wonder whether I’m sometimes writing or using starter code that could be better written. Should I focus on that? Or is that something you can refine as you progress?

Milestones:

  • Create a text-based user interface with options for printing course descriptions based on a key term that can be selected by the user.
  • For file with data about classes supported in the past, create dictionary with values that count number of times each word appears.
  • For file with data about classes supported in the past, join course titles with course descriptions from the larger data set (if needed).
  • Return a list of words with highest values.
  • For file with course data information, search for most frequently used terms. I can create a for loop that goes through each word in the list, searches each row in the course description column, and returns the course title (which means printing a different column than the one being searched) plus the column with the full description.
  • Possibly check whether multiple terms are used within the same course description, and use the number of keywords used plus their frequency to rank the courses in order.
  • If time permits, return a list of departments with the highest returns of frequently used terms.
  • Create graph of departments with courses that have the highest returns of frequently used terms.
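
The word-count milestones could be sketched roughly as follows. This is only a minimal illustration, assuming Python and its standard `collections.Counter`; the sample descriptions, the stop word list, and the `count_words` helper are all made up for the example and are not taken from the actual data files.

```python
import re
from collections import Counter

# Hypothetical sample descriptions; the real ones would come from the data file.
descriptions = [
    "A seminar on digital research methods and data analysis.",
    "Introduction to data visualization and research design.",
]

# A small, made-up stop word list so very common words do not dominate the counts.
STOP_WORDS = {"a", "an", "and", "the", "on", "to", "of", "in"}


def count_words(texts):
    """Return a Counter mapping each lowercase word to how often it appears."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep only runs of letters (and apostrophes).
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


word_counts = count_words(descriptions)

# most_common(n) returns the n words with the highest counts as (word, count) pairs.
top_words = word_counts.most_common(3)
print(top_words)
```

A `Counter` behaves like a regular dictionary, so `word_counts["research"]` gives the count for a single word.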

Steps:

For Tuesday:

- [ ] Compile and prepare data
- [ ] Return course descriptions and course information using test keyword
- [ ] Return list of course codes from data set with courses supported previously.

For Thursday:

- [ ] For file with data about classes supported in the past, join course titles with course descriptions from the larger data set (if needed)
- [ ] For file with data about classes supported in the past, create dictionary with values that count number of times each word appears.
- [ ] Return a list of words with highest values.
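
One possible way to approach the join step is to build a dictionary keyed by course code from the catalog, then look up each previously supported course in it. The sketch below assumes both files are CSVs that share a `course_code` column; the column names, course codes, and sample rows are hypothetical stand-ins, and `io.StringIO` is used only so the example runs without the real files.

```python
import csv
import io

# Hypothetical stand-ins for the two data files; the column names are assumptions.
catalog_csv = io.StringIO(
    "course_code,title,description\n"
    "ABC 101,Intro Course,Hands-on work with information tools.\n"
    "XYZ 398,Sample Seminar,A research seminar with a final project.\n"
)
supported_csv = io.StringIO(
    "course_code,semester\n"
    "XYZ 398,Fall 2015\n"
    "QRS 105,Spring 2016\n"
)

# Build a lookup dictionary from the catalog: course code -> full catalog row.
catalog = {row["course_code"]: row for row in csv.DictReader(catalog_csv)}

matched = []    # (code, title, description) for courses found in the catalog
unmatched = []  # codes with no catalog entry, worth checking by hand

for row in csv.DictReader(supported_csv):
    code = row["course_code"]
    if code in catalog:
        entry = catalog[code]
        matched.append((code, entry["title"], entry["description"]))
    else:
        unmatched.append(code)

print(matched)
print(unmatched)
```

Keeping the unmatched codes in their own list makes it easier to spot formatting differences between the two files, such as extra spaces or missing department prefixes.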

To be scheduled:

  • For file with course data information, search for most frequently used terms. I can create a for loop that goes through each word in the list, searches each row in the course description column, and returns the course title (which means printing a different column than the one being searched) plus the column with the full description.
  • Possibly check whether multiple terms are used within the same course description, and use the number of keywords used plus their frequency to rank the courses in order.
  • If time permits, return a list of departments with the highest returns of frequently used terms.
  • Create graph of departments with courses that have the highest returns of frequently used terms.
  • Create user interface
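
The keyword search and ranking ideas above might look something like this. It is a rough sketch under stated assumptions: the course records, the keyword list, and the `score` helper are hypothetical, and ranking first by the number of distinct keywords and then by total keyword frequency is just one reasonable reading of the plan.

```python
import re

# Hypothetical course records; real ones would be rows from the catalog file.
courses = [
    {"title": "Course B", "description": "An introduction to data visualization."},
    {"title": "Course C", "description": "Readings in medieval poetry."},
    {"title": "Course A", "description": "Research methods for working with data. "
                                         "Students design a research project."},
]

# Hypothetical keywords, e.g. the most frequent words from past supported classes.
keywords = ["research", "data"]


def score(description, keywords):
    """Return (distinct keywords found, total keyword occurrences)."""
    words = re.findall(r"[a-z']+", description.lower())
    hits = [words.count(k) for k in keywords]
    distinct = sum(1 for n in hits if n > 0)
    return distinct, sum(hits)


# Keep only courses that mention at least one keyword, then sort best-first.
scored = [(score(c["description"], keywords), c) for c in courses]
ranked = [c for s, c in sorted(scored, key=lambda pair: pair[0], reverse=True)
          if s[0] > 0]

for course in ranked:
    print(course["title"], "-", course["description"])
```

Because the search runs over the description but prints the title as well, each course is kept as a dictionary so both columns stay together.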
Nat (batlopez) is a first-year MSLS student interested in digital research services. Find Natalia Lopez on Twitter, GitHub, and on the web.