Final Project Topic Ideas
Semantic data types for visualization generation
We have seen how basic type information for data variables, such as nominal, ordinal, and quantitative, provides valuable guidance for automating visualization design. Richer semantic types may further support visualization design as well as facilitate data transformation and integration. For example, knowing that a quantitative variable is a particular type of currency allows software to generate more appropriate labels and perform exchange rate lookups when visualizing multiple currencies at once. Date information might further allow proper historical adjustments. Incorporate semantic typing into a visualization generation tool and explore the impacts of various types of semantics (e.g., geography, currency, person, measurement units, etc.) in production rules for visualization creation. You might also explore interface mechanisms for tagging data variables with type information and algorithms to automatically infer semantic types from an input data set.
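As one possible starting point for automatic inference, the sketch below guesses a semantic type for a column of string values using a few hand-written rules. The type labels, the symbol-to-currency mapping, and the function names are all assumptions made for illustration; a real project would need far richer rules (or a learned model) and would have to handle ambiguity, such as "$" standing for several different currencies.

```python
import re
from datetime import datetime

# Hypothetical rule set: a leading currency symbol followed by a number.
CURRENCY_RE = re.compile(r"^\s*([$€£¥])\s*-?\d[\d,]*(\.\d+)?\s*$")
# Assumption for illustration only: "$" is treated as USD, "¥" as JPY.
CURRENCY_CODES = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY"}


def _is_iso_date(text):
    """True if text parses as YYYY-MM-DD."""
    try:
        datetime.strptime(text.strip(), "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_number(text):
    """True if text parses as a number once thousands separators are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def infer_semantic_type(values):
    """Guess a semantic type for a column of strings using simple rules."""
    values = [v for v in values if v.strip()]  # ignore empty cells
    if not values:
        return {"type": "unknown"}
    matches = [CURRENCY_RE.match(v) for v in values]
    if all(matches):
        symbols = {m.group(1) for m in matches}
        return {"type": "currency",
                "codes": sorted(CURRENCY_CODES[s] for s in symbols)}
    if all(_is_iso_date(v) for v in values):
        return {"type": "date"}
    if all(_is_number(v) for v in values):
        return {"type": "quantitative"}
    return {"type": "nominal"}
```

A column that comes back as a currency with more than one code is exactly the case where a tool could trigger an exchange rate lookup before plotting the values on a shared axis.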
Cartograms
Cartograms are maps that scale the area of a region to reflect some other data (e.g., population). As noted by the Cartogram Central website, cartographers have developed many different types of cartograms. We are beginning to see algorithms capable of generating some types of cartograms. Many of the algorithms use optimization techniques to design a cartogram that maintains a particular set of constraints. One project in this area is to develop a new algorithm for creating cartograms. The project could focus on identifying a particular set of constraints that are important for creating a particular type of cartogram and then implementing the constraints using standard optimization techniques. For example, you might develop an algorithm for producing Dorling cartograms.
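To make the idea of constraints concrete, here is a minimal sketch of a Dorling-style layout: each region becomes a circle whose area is proportional to its data value, and a simple iterative relaxation trades off two constraints (stay near the original centroid, do not overlap a neighbor). This is a simplified illustration, not Dorling's published algorithm; the function name, the parameters `scale`, `iterations`, and `pull`, and the input format are assumptions.

```python
import math


def dorling_layout(regions, scale=1.0, iterations=200, pull=0.05):
    """Place one circle per region.

    regions: list of dicts with 'x', 'y' (centroid) and 'value' (e.g. population).
    Returns a list of dicts with the final 'x', 'y' and radius 'r'.
    """
    circles = [
        {"x": r["x"], "y": r["y"], "ox": r["x"], "oy": r["y"],
         # Area proportional to value, so radius goes with sqrt(value).
         "r": scale * math.sqrt(r["value"] / math.pi)}
        for r in regions
    ]
    for _ in range(iterations):
        # Constraint 1: drift back toward the original centroid.
        for c in circles:
            c["x"] += pull * (c["ox"] - c["x"])
            c["y"] += pull * (c["oy"] - c["y"])
        # Constraint 2: push overlapping pairs apart along their center line.
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                dist = math.hypot(dx, dy)
                overlap = a["r"] + b["r"] - dist
                if overlap > 0:
                    if dist == 0:  # coincident centers: pick a direction
                        dx, dy, dist = 1.0, 0.0, 1.0
                    ux, uy = dx / dist, dy / dist
                    a["x"] -= ux * overlap / 2
                    a["y"] -= uy * overlap / 2
                    b["x"] += ux * overlap / 2
                    b["y"] += uy * overlap / 2
    return circles
```

With many regions, resolving one pair can create a new overlap with a third circle, so this relaxation only approximates a feasible layout. A project could replace it with a proper optimizer and add further constraints, such as preserving which regions are adjacent.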
Graphical perception
We have seen a number of papers in the class that conduct graphical perception experiments to evaluate how quickly and accurately people can decode various types of graphs and charts (e.g., Cleveland and McGill’s experiments comparing bar charts to pie charts). However, there are many types of graphs and charts for which such experiments have not yet been performed. For this project you could run a graphical perception experiment for a kind of chart that hasn’t been investigated yet. The challenges are to identify which aspects of perception to test, design the experimental methodology, and analyze the resulting data.
Genealogy of D3 code
D3 has become the standard tool for developing interactive visualizations for the Web, and programmers have created tens of thousands of visualizations using D3. However, many programmers start by taking existing examples of D3 code and modifying them to solve their own problems. For example, Mike Bostock’s blocks are a common starting point. Other collections include The Big List of D3.js Examples and the D3.js Gallery. Many of these examples are modifications or extensions of earlier examples. The goal of this project is to apply code diffing and comparison techniques to identify the shared pieces of code among these examples and then to visualize the evolution of the code. You should try to identify the primary source examples and then trace how they are modified to generate new examples by looking for shared code.
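One simple way to begin looking for shared code is a line-level similarity score between pairs of examples, using Python's standard `difflib` module. The sketch below links each example to its most similar earlier example. The `(name, order, code)` tuple format, the `order` field (standing in for a publication date), and the `threshold` value are assumptions made for illustration; a real analysis would likely need token-level or syntax-aware comparison to cope with renamed variables and reformatting.

```python
from difflib import SequenceMatcher


def similarity(code_a, code_b):
    """Share of matching lines between two code strings, from 0.0 to 1.0."""
    lines_a = [ln.strip() for ln in code_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in code_b.splitlines() if ln.strip()]
    return SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()


def likely_parents(examples, threshold=0.5):
    """Map each example name to its most similar earlier example, or None.

    examples: list of (name, order, code) tuples, where a smaller `order`
    means the example was published earlier (hypothetical input format).
    """
    ordered = sorted(examples, key=lambda e: e[1])
    parents = {}
    for idx, (name, _, code) in enumerate(ordered):
        best_name, best_score = None, threshold
        for prev_name, _, prev_code in ordered[:idx]:
            score = similarity(code, prev_code)
            if score >= best_score:
                best_name, best_score = prev_name, score
        parents[name] = best_name
    return parents
```

The resulting parent links form a forest whose roots are candidate primary source examples, which is one structure the evolution visualization could be built on.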
Visualizing the Supreme Court
Legal scholars here at Stanford engage in in-depth study of Supreme Court decisions and voting patterns. Analyses range from structural data (such as the citation graph among court cases), to tabular data (voting patterns across justices and cases), to the highly nuanced (cataloging features of the decision text). Much of this Supreme Court data is freely available online. Based on the needs of legal scholars, build a visual analysis tool for Supreme Court data. This will entail visualizations of many data types (including network patterns, tabular data, and text) and will likely involve associated data mining and transformation techniques. Design a coherent interface for legal scholars to access, visualize, and correlate these various data sources.
Visualizing rhyme structure in musical lyrics
The Wall Street Journal recently published an article visualizing the rhyme patterns in the lyrics of the songs from Hamilton. They have described their process for creating the visualization on this webpage and in a published article. The goal of this project is to apply this analysis process to other musical lyrics (from musicals, rap songs, etc.) and to automate the analysis as much as possible.
When is the next rocket launch?
Rocket launch data is available in a number of different databases, but it is difficult to search and filter. However, it’s important to know what is launching, when, and where in order to schedule other flights. Help create a visual explainer that displays useful information about current and past rocket launches (e.g., trajectory, launch time, weather, etc.). It may be helpful to use techniques from the class (filtering, brushing & linking, etc.). Here is the NASA challenge you could also enter with this project.
Explore the Stanford Daily archives
The Stanford Daily Archives contain 18,931 issues comprising 143,685 pages and over a million articles. The archives document campus life and history from 1892 to 2014, but it’s difficult to find trends or insights with the current system. Create an analysis tool for the Stanford Daily Archives. This could include allowing viewers to find trending words, topics, and movements in the archives, or possibly campus views on different political movements or actions.
Visualizing TED Talk Popularity
TED is a nonprofit organization devoted to ideas worth spreading. TED talks cover a broad range of topics, from technology, education, and design to global issues. The dataset offers rich data types, including comments, dates, tags, views, and even transcripts, so there is a lot of analysis you can delve into. For example, you can design an interactive interface for people to explore topics they may be interested in, or uncover insights about which topics people are most interested in, which topics are overlooked, and how these patterns have shifted over time. Because the dataset contains a great deal of text, it also provides opportunities to explore text mining and effective text visualization.
Data source: https://www.kaggle.com/rounakbanik/ted-talks
Why is my flight delayed? Investigating Flight Delays & Cancellations
Flight delays are estimated to have cost air travelers billions of dollars. FAA/Nextor estimated the annual costs of delays (direct cost to airlines and passengers, lost demand, and indirect costs) in 2017 to be $26.6 billion. With external datasets, you can try to uncover the correlation between flight delays and factors like weather. Using the geographic information, you can also identify and visualize the geographic patterns of flight delays. Investigate the common causes or potential patterns of the delays and present the insights you find with visualization.
Data source: https://www.kaggle.com/usdot/flight-delays#flights.csv
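As a first step toward a geographic view, the sketch below computes the mean departure delay per origin airport from CSV rows using only the Python standard library. The column names `ORIGIN_AIRPORT` and `DEPARTURE_DELAY` are assumptions about the layout of `flights.csv` and should be checked against the actual file header; rows with an empty delay value (for example, cancelled flights) are skipped.

```python
import csv
import io
from collections import defaultdict


def mean_delay_by_origin(rows, airport_col="ORIGIN_AIRPORT",
                         delay_col="DEPARTURE_DELAY"):
    """Average delay (in the file's units) per origin airport.

    rows: an iterable of dicts, e.g. a csv.DictReader.
    Column names are assumed, not verified against the dataset.
    """
    totals = defaultdict(lambda: [0.0, 0])  # airport -> [sum, count]
    for row in rows:
        raw = row.get(delay_col)
        if raw is None or raw.strip() == "":
            continue  # no recorded delay for this flight
        entry = totals[row[airport_col]]
        entry[0] += float(raw)
        entry[1] += 1
    return {airport: s / n for airport, (s, n) in totals.items()}


# Usage with a tiny in-memory sample instead of the real file.
sample = io.StringIO(
    "ORIGIN_AIRPORT,DEPARTURE_DELAY\n"
    "SFO,10\n"
    "SFO,20\n"
    "JFK,\n"
    "JFK,-4\n"
)
delays = mean_delay_by_origin(csv.DictReader(sample))
```

Joining these per-airport averages with airport coordinates would give the data needed for a map of delay patterns.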