Final Project Topic Ideas

Data Analysis/Explainer

Newspaper Navigator

Newspaper Navigator is a Library of Congress dataset of visual content extracted from historic newspaper pages. Its creators apply crowdsourcing and machine learning techniques to identify photographs, illustrations, maps, comics, cartoons, headlines, and advertisements. Design an interactive explainer that lets people explore different aspects of this dataset.

Data source: https://news-navigator.labs.loc.gov/

Why is my flight delayed? Investigating Flight Delays & Cancellations

Flight delays are estimated to have cost air travelers billions of dollars. FAA/Nextor estimated the annual costs of delays (direct cost to airlines and passengers, lost demand, and indirect costs) in 2017 to be $26.6 billion. With an external dataset, you can try to uncover correlations between flight delays and factors like weather. Using the geographic information, you can also identify and visualize geographic patterns in the flight delays. Investigate the common causes of the delays, or potential patterns in them, and present the insights you find with visualizations.
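As a starting point for the geographic analysis described above, here is a minimal sketch that averages departure delay per origin airport. The column names ORIGIN_AIRPORT and DEPARTURE_DELAY are assumptions about the CSV layout, and the sample rows are made up for illustration; check both against the actual file before relying on this.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_airport(csv_file,
                          airport_col="ORIGIN_AIRPORT",   # assumed column name
                          delay_col="DEPARTURE_DELAY"):   # assumed column name
    """Return {airport: mean delay in minutes}, skipping rows with no delay value."""
    totals = defaultdict(lambda: [0.0, 0])  # airport -> [sum of delays, row count]
    for row in csv.DictReader(csv_file):
        raw = row.get(delay_col) or ""
        if raw.strip() == "":
            continue  # e.g. cancelled flights may have an empty delay field
        entry = totals[row[airport_col]]
        entry[0] += float(raw)
        entry[1] += 1
    return {airport: total / count for airport, (total, count) in totals.items()}


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "ORIGIN_AIRPORT,DEPARTURE_DELAY\n"
    "SFO,10\n"
    "SFO,30\n"
    "JFK,-5\n"
    "JFK,\n"
)
delays = mean_delay_by_airport(sample)
print(delays)
```

The per-airport averages can then be joined with airport coordinates and drawn on a map. For the full file, pass an open file handle instead of the in-memory sample.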

Data source: https://www.kaggle.com/usdot/flight-delays#flights.csv

Other data sources

As noted in assignment 2, there are a variety of data sources available online. Here are some possible sources to consider for a data analysis/explainer project.

Research Projects

Label time-series graphs with news headlines

The Stanford Cable TV News Analyzer allows you to see who and what has appeared on cable TV news (CNN, Fox News, MSNBC) over the last decade. The graphs it produces can be interesting, but they often contain outliers, spikes and trends that are difficult to interpret without additional context. The tool lets users click on the graph to see the corresponding videos, but a more direct way to provide context would be to label outliers, spikes and trends in the graphs with news headlines from major newspapers (e.g. The New York Times, The Wall Street Journal, etc.). For example, in this graph of the occurrences of the phrase “domestic terrorism”, it would be useful context if the system could label the peak in Aug 2017 with the headline news that “Attorney General calls Charlottesville attack domestic terrorism”. For this class project, your implementation should download CSV files from the Stanford Cable TV News Analyzer and develop a tool that automatically identifies features that should be labeled (e.g. outliers, peaks and trends), queries news sites for the relevant headlines, and draws a graph with the headlines as annotations. If you prefer, you may instead label stock charts in this manner rather than the TV news data.
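The feature-identification step could start from something as simple as the sketch below, which flags local maxima that sit well above the series mean. The z-score threshold of 2.0 and the toy counts are assumptions chosen for illustration, not values from the News Analyzer. A real tool would likely need a rolling baseline to handle trends.

```python
from statistics import mean, stdev


def find_peaks(values, z_threshold=2.0):
    """Return indices of local maxima whose z-score exceeds z_threshold."""
    if len(values) < 3:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a flat series has no peaks
    peaks = []
    for i in range(1, len(values) - 1):
        is_local_max = values[i] > values[i - 1] and values[i] >= values[i + 1]
        if is_local_max and (values[i] - mu) / sigma > z_threshold:
            peaks.append(i)
    return peaks


# Toy monthly counts (made up); index 5 is the spike to annotate.
counts = [1, 2, 1, 2, 1, 30, 2, 1, 2, 1, 2, 1]
print(find_peaks(counts))  # [5]
```

Each returned index maps back to a date in the CSV, which can then be used as the query window when searching for headlines.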

Cartograms

Cartograms are maps that scale the area of a region to reflect some other data (e.g. population). As noted by the cartogram central website, cartographers have developed many different types of cartograms. We are beginning to see algorithms capable of generating some types of cartograms. Many of the algorithms use optimization techniques to design a cartogram that maintains a particular set of constraints. One project in this area is to develop a new algorithm for creating cartograms. The project could focus on identifying a particular set of constraints that are important for creating a particular type of cartogram and then implementing the constraints using standard optimization techniques. For example, you might develop an algorithm for producing Dorling cartograms.
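To give a feel for the Dorling case, here is a simplified sketch: each region becomes a circle whose area is proportional to its data value, and overlapping circles are repeatedly pushed apart along the line between their centers. This is an illustrative relaxation loop, not Dorling's full algorithm. In particular it omits any force that keeps neighboring regions close to each other, and the function name, parameters and sample regions are all hypothetical.

```python
import math


def dorling_layout(regions, scale=1.0, iterations=200):
    """regions: list of (name, x, y, value). Returns {name: (x, y, radius)}.

    Circle area is proportional to value; overlapping circles are pushed apart.
    """
    circles = [[name, float(x), float(y), scale * math.sqrt(value / math.pi)]
               for name, x, y, value in regions]
    for _ in range(iterations):
        moved = False
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                dx, dy = b[1] - a[1], b[2] - a[2]
                dist = math.hypot(dx, dy)
                overlap = a[3] + b[3] - dist
                if overlap > 1e-9:
                    if dist == 0:
                        dx, dy, dist = 1.0, 0.0, 1.0  # arbitrary direction for coincident centers
                    ux, uy = dx / dist, dy / dist
                    shift = overlap / 2  # each circle moves half the overlap
                    a[1] -= ux * shift
                    a[2] -= uy * shift
                    b[1] += ux * shift
                    b[2] += uy * shift
                    moved = True
        if not moved:
            break  # no overlaps left
    return {name: (x, y, r) for name, x, y, r in circles}


# Hypothetical regions: (name, centroid x, centroid y, data value).
layout = dorling_layout([("A", 0, 0, math.pi), ("B", 1, 0, math.pi), ("C", 2, 0, math.pi)])
print(layout)
```

A project in this area would replace the naive pairwise push with an explicit set of constraints (for example, preserving adjacency or limiting displacement from the original centroid) and solve them with a standard optimizer.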

Visualizing rhyme structure in musical lyrics

The Wall Street Journal recently published an article visualizing the rhyme patterns in the lyrics of the songs from Hamilton. They have described their process for creating the visualization on this webpage and in a published article. The goal of this project is to apply this analysis process to other lyrics (from musicals, rap songs, etc.) and to automate the analysis as much as possible.
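One possible first step toward automating such an analysis is to group words by their rhyming part, taken here as the phonemes from the last stressed vowel to the end of the word. The sketch below is not the Wall Street Journal's method. It only detects perfect end rhymes, and it relies on a tiny hand-written pronunciation table in ARPAbet-style notation (digits mark vowel stress) that stands in for a real pronunciation dictionary.

```python
from collections import defaultdict

# Hand-written toy pronunciations (assumption); a real project would load a full dictionary.
PRONUNCIATIONS = {
    "shot": ["SH", "AA1", "T"],
    "not": ["N", "AA1", "T"],
    "away": ["AH0", "W", "EY1"],
    "day": ["D", "EY1"],
    "my": ["M", "AY1"],
}


def rhyme_key(phonemes):
    """Return the phonemes from the last stressed vowel (stress 1 or 2) to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no stressed vowel found: fall back to the whole word


def rhyme_groups(words, pronunciations=PRONUNCIATIONS):
    """Group distinct words that share a rhyme key; words missing from the table are skipped."""
    groups = defaultdict(list)
    seen = set()
    for word in words:
        word = word.lower()
        if word in seen or word not in pronunciations:
            continue
        seen.add(word)
        groups[rhyme_key(pronunciations[word])].append(word)
    return [group for group in groups.values() if len(group) > 1]


print(rhyme_groups(["shot", "not", "away", "day", "my", "shot"]))
# [['shot', 'not'], ['away', 'day']]
```

Extending this to internal rhymes, slant rhymes and multi-syllable patterns is where most of the project's work would lie.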