Final Project Topic Ideas
Research Projects
Semantic data types for visualization generation
We have seen how basic type information for data variables, such as nominal, ordinal, and quantitative, provides valuable guidance for automating visualization design. Richer semantic types may further support visualization design as well as facilitate data transformation and integration. For example, knowing that a quantitative variable is a particular type of currency allows software to generate more appropriate labels and perform exchange rate lookups when visualizing multiple currencies at once. Date information might further allow proper historical adjustments. Incorporate semantic typing into a visualization generation tool and explore the impact of various types of semantics (e.g., geography, currency, person, measurement units) on production rules for visualization creation. You might also explore interface mechanisms for tagging data variables with type information and algorithms that automatically infer semantic types from an input data set.
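As a starting point for the automatic inference mentioned above, here is a minimal rule-based sketch. The patterns, the labels ("currency", "measurement", "date"), the function name `infer_semantic_type`, and the 0.8 threshold are all assumptions made for illustration; a real project would likely need much richer rules or a learned model.

```python
import re
from datetime import datetime

# Hypothetical patterns chosen for illustration only.
CURRENCY_RE = re.compile(r"^[$€£]\s?\d[\d,]*(\.\d+)?$")
UNIT_RE = re.compile(r"^\d+(\.\d+)?\s?(km|kg|cm|mm|m|g|lb|mi)$")


def _is_date(value):
    """Return True if the value parses under one of two assumed date formats."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_semantic_type(values, threshold=0.8):
    """Return the first label matched by at least `threshold` of the non-empty values."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    checks = {
        "currency": lambda v: bool(CURRENCY_RE.match(v)),
        "measurement": lambda v: bool(UNIT_RE.match(v)),
        "date": _is_date,
    }
    for label, check in checks.items():
        hits = sum(1 for v in cleaned if check(v))
        if hits / len(cleaned) >= threshold:
            return label
    return "unknown"
```

A visualization generator could use the returned label to pick axis formatting (for example, currency symbols) or to decide whether a unit conversion is needed before plotting.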
Cartograms
Cartograms are maps that scale the area of each region to reflect some other data (e.g., population). As noted by the Cartogram Central website, cartographers have developed many different types of cartograms. We are beginning to see algorithms capable of generating some of these types. Many of the algorithms use optimization techniques to design a cartogram that satisfies a particular set of constraints. One project in this area is to develop a new algorithm for creating cartograms. The project could focus on identifying the constraints that matter most for a particular type of cartogram and then implementing those constraints using standard optimization techniques. For example, you might develop an algorithm for producing Dorling cartograms.
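To make the constraint idea concrete, the sketch below lays out a Dorling-style cartogram with a simple iterative relaxation: each region becomes a circle whose area is proportional to its value, circles are pulled toward their original centroids, and overlapping pairs are pushed apart. This is a simplified illustration, not Dorling's published algorithm; the function name `dorling_layout` and the `scale`, `iterations`, and `pull` parameters are assumptions.

```python
import math


def dorling_layout(regions, scale=1.0, iterations=200, pull=0.05):
    """Lay out circles for regions given as (name, x, y, value) tuples.

    Returns a list of dicts with keys name, x, y, r (plus the original ox, oy).
    """
    circles = [
        {"name": n, "x": float(x), "y": float(y),
         "ox": float(x), "oy": float(y),
         # Circle area is proportional to the value: area = scale^2 * value.
         "r": scale * math.sqrt(v / math.pi)}
        for n, x, y, v in regions
    ]
    for _ in range(iterations):
        # Constraint 1: stay close to the original centroid.
        for c in circles:
            c["x"] += pull * (c["ox"] - c["x"])
            c["y"] += pull * (c["oy"] - c["y"])
        # Constraint 2: circles must not overlap; push overlapping pairs apart.
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy)
                overlap = a["r"] + b["r"] - dist
                if overlap > 0:
                    if dist == 0:
                        # Coincident centers: pick an arbitrary direction.
                        dx, dy, dist = 1.0, 0.0, 1.0
                    ux, uy = dx / dist, dy / dist
                    a["x"] -= ux * overlap / 2
                    a["y"] -= uy * overlap / 2
                    b["x"] += ux * overlap / 2
                    b["y"] += uy * overlap / 2
    return circles
```

With many regions, resolving one overlap can create another, so this relaxation only approximates a non-overlapping layout. A project could replace it with a proper constrained optimizer and add further constraints, such as preserving adjacency between neighboring regions.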
Graphical perception
We have seen a number of papers in the class that conduct graphical perception experiments to evaluate how quickly and accurately people can decode various types of graphs and charts (e.g., Cleveland and McGill’s experiments comparing bar charts to pie charts). However, there are many types of graphs and charts for which such experiments have not yet been performed. For this project you could run a graphical perception experiment for a kind of chart that hasn’t been investigated yet. The challenges are identifying which aspects of perception to test, designing the experimental methodology, and analyzing the resulting data.
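For the analysis step, one common accuracy measure in this literature is the log absolute error used by Cleveland and McGill, log2(|judged percent - true percent| + 1/8). The sketch below computes it per trial and averages it per chart type. The trial layout `(chart_type, judged, true)` and the function names are assumptions, and your own experiment may call for a different measure.

```python
import math
from statistics import mean


def log_abs_error(judged_percent, true_percent):
    """Cleveland-McGill style error: log2(|judged - true| + 1/8)."""
    return math.log2(abs(judged_percent - true_percent) + 1 / 8)


def mean_error_by_chart(trials):
    """Average the log absolute error per chart type.

    `trials` is an iterable of (chart_type, judged_percent, true_percent).
    """
    by_chart = {}
    for chart, judged, true in trials:
        by_chart.setdefault(chart, []).append(log_abs_error(judged, true))
    return {chart: mean(errors) for chart, errors in by_chart.items()}
```

The 1/8 term keeps the logarithm defined when a judgment is exactly right, in which case the error is log2(1/8) = -3.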
Genealogy of D3 code
D3 has become the standard tool for developing interactive visualizations for the Web, and programmers have created tens of thousands of visualizations using it. However, many programmers start by taking existing examples of D3 code and modifying them to solve their own problems. For example, Mike Bostock’s blocks are a common starting point. Other collections include The Big List of D3.js Examples and the D3.js Gallery. Many of these examples are modifications or extensions of earlier examples. The goal of this project is to apply code diffing and comparison techniques to identify the pieces of code shared among these examples and then to visualize the evolution of the code. You should try to identify the primary source examples and then trace how they were modified to produce new examples by looking for shared code.
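One simple way to start tracing lineage is a line-level similarity score. The sketch below uses Python's standard `difflib.SequenceMatcher` to compare examples and guess each example's most likely parent among earlier ones. The function names, the oldest-first ordering of the input, and the 0.5 cutoff are assumptions for illustration; token-level or AST-level comparison may work better on real D3 code.

```python
from difflib import SequenceMatcher


def _lines(code):
    """Split source into stripped, non-blank lines so whitespace changes are ignored."""
    return [ln.strip() for ln in code.splitlines() if ln.strip()]


def similarity(code_a, code_b):
    """Line-level similarity in [0, 1] between two source strings."""
    return SequenceMatcher(None, _lines(code_a), _lines(code_b)).ratio()


def likely_parents(examples, min_similarity=0.5):
    """Guess a parent for each example.

    `examples` is a list of (name, source) pairs ordered oldest first.
    Returns {name: parent_name or None}.
    """
    parents = {}
    for i, (name, source) in enumerate(examples):
        best_name, best_score = None, 0.0
        for prev_name, prev_source in examples[:i]:
            score = similarity(source, prev_source)
            if score > best_score:
                best_name, best_score = prev_name, score
        parents[name] = best_name if best_score >= min_similarity else None
    return parents
```

The resulting parent map is a forest, which can be drawn directly as a tree or node-link diagram to show how examples descend from a few primary sources.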
Visualizing rhyme structure in musical lyrics
The Wall Street Journal recently published an article visualizing the rhyme patterns in the lyrics of the songs from Hamilton. They have described their process for creating the visualization on this webpage and in a published article. The goal of this project is to apply this analysis process to other musical lyrics (from musicals, rap songs, etc.) and to automate the analysis as much as possible.
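As a rough illustration of what automating the analysis could involve, the sketch below groups words that share the same sounds from their last stressed vowel onward. It is not the Wall Street Journal's method. The tiny `PRONUNCIATIONS` table is hand-written for this example in a CMU Pronouncing Dictionary style notation (a trailing `1` marks primary stress); a real project would load a full pronunciation dictionary and handle slant rhymes and multi-word rhymes.

```python
from collections import defaultdict

# Hand-written illustrative entries; not loaded from any real dictionary file.
PRONUNCIATIONS = {
    "shot": ["SH", "AA1", "T"],
    "not": ["N", "AA1", "T"],
    "day": ["D", "EY1"],
    "away": ["AH0", "W", "EY1"],
    "say": ["S", "EY1"],
}


def rhyme_key(phonemes):
    """Return the phonemes from the last primary-stressed vowel to the end of the word."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            return tuple(phonemes[i:])
    return tuple(phonemes)


def rhyme_groups(words):
    """Group words with identical rhyme keys; words missing from the table are skipped."""
    groups = defaultdict(list)
    for word in words:
        phonemes = PRONUNCIATIONS.get(word.lower())
        if phonemes:
            groups[rhyme_key(phonemes)].append(word)
    return [group for group in groups.values() if len(group) > 1]
```

Each group could then be assigned a color and drawn over the lyric text or a syllable timeline.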
Data Analysis/Explainer
Visualizing TED Talk Popularity
TED is a nonprofit organization devoted to ideas worth spreading. TED talks cover a broad range of topics, from Technology, Entertainment, and Design to global issues. The dataset includes comments, dates, tags, views, and even transcripts, so there is plenty of analysis to dig into. For example, you can design an interactive interface that lets people explore topics they may be interested in, or uncover insights about which topics people are most interested in, which topics are overlooked, and how these patterns have shifted over time. Because the dataset contains so much text, it also offers opportunities to explore text mining and effective text visualization.
Data source: https://www.kaggle.com/rounakbanik/ted-talks
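A first pass at the "which topics, and how have they shifted" question could be a simple aggregation of views by tag and year, sketched below. The record layout (`year`, `tags`, `views` keys) is an assumption about how you might reshape the data after loading it; check the actual column names and formats in the dataset before relying on it.

```python
from collections import defaultdict


def views_by_tag_and_year(talks):
    """Sum views per (year, tag) pair.

    `talks` is an iterable of dicts with assumed keys:
    "year" (int), "tags" (list of str), and "views" (int).
    """
    totals = defaultdict(int)
    for talk in talks:
        for tag in talk["tags"]:
            totals[(talk["year"], tag)] += talk["views"]
    return dict(totals)
```

The resulting totals map naturally onto a stacked area chart or a heatmap with years on one axis and tags on the other.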
Why is my flight delayed? Investigating Flight Delays & Cancellations
Flight delays are estimated to have cost air travelers billions of dollars. FAA/Nextor estimated the annual cost of delays in 2017 (direct costs to airlines and passengers, lost demand, and indirect costs) to be $26.6 billion. Using external datasets, you can try to uncover correlations between flight delays and factors such as weather. Using the geographic information, you can also identify and visualize geographic patterns in the delays. Investigate the common causes or potential patterns of delays and present the insights you find with visualizations.
Data source: https://www.kaggle.com/usdot/flight-delays#flights.csv
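To quantify the relationship between delays and a factor such as weather, one option is the Pearson correlation coefficient. The sketch below implements it directly; the function name and the idea of pairing, say, daily precipitation with mean daily delay are assumptions about how you might prepare the data. Keep in mind that it only measures linear association and does not establish a cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

A scatterplot of the two variables alongside the coefficient usually communicates the relationship better than the number alone.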
Other data sources
As noted in assignment 2, there are a variety of data sources available online. Here are some possible sources to consider for a data analysis/explainer project.
- Data is Plural - Variety of datasets and sources covering many topics.
- Stanford Institutional Research & Decision Support - Stanford institutional data (e.g. enrollment, admissions, diversity, etc.).
- data.gov - U.S. Government open datasets.
- U.S. Census Bureau - Census data.
- Federal Election Commission - Campaign finance and expenditures.
- Federal Aviation Administration - FAA data.
- Awesome Public Datasets - Variety of public datasets.