Final Project Topic Ideas

Semantic data types for visualization generation

We have seen how basic type information for data variables, such as nominal, ordinal, and quantitative, provides valuable guidance for automating visualization design. Richer semantic types may further support visualization design as well as facilitate data transformation and integration. For example, knowing that a quantitative variable is a particular type of currency allows software to generate more appropriate labels and perform exchange rate lookups when visualizing multiple currencies at once. Associated date information might further allow proper historical adjustments, such as correcting for inflation. Incorporate semantic typing into a visualization generation tool and explore the impact of various types of semantics (e.g., geography, currency, person, measurement units) in production rules for visualization creation. You might also explore interface mechanisms for tagging data variables with type information, as well as algorithms to automatically infer semantic types from an input data set.
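
As a starting point for the inference piece, here is a minimal sketch of rule-based semantic type detection; the pattern set, type names, and 90% match threshold are illustrative assumptions, not a complete design:

    import re

    # Illustrative regexes for a few semantic types; real coverage would be broader.
    SEMANTIC_PATTERNS = {
        "currency_usd": re.compile(r"^\$\s?\d[\d,]*(\.\d{2})?$"),
        "percentage":   re.compile(r"^\d+(\.\d+)?%$"),
        "iso_date":     re.compile(r"^\d{4}-\d{2}-\d{2}$"),
        "us_zip_code":  re.compile(r"^\d{5}(-\d{4})?$"),
    }

    def infer_semantic_type(values, threshold=0.9):
        """Return the semantic type matched by at least `threshold` of the
        non-empty values in a column, or None if no type qualifies."""
        values = [v.strip() for v in values if v and v.strip()]
        if not values:
            return None
        for type_name, pattern in SEMANTIC_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                return type_name
        return None

    print(infer_semantic_type(["$1,200.50", "$87.00", "$3,400"]))  # currency_usd
    print(infer_semantic_type(["2021-03-05", "2020-12-31"]))       # iso_date

A real system would also need to resolve conflicts when multiple types match and to propagate inferred types into the visualization production rules.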

Cartograms

Cartograms are maps that scale the area of each region to reflect some other data variable (e.g., population). As noted on the Cartogram Central website, cartographers have developed many different types of cartograms, and we are beginning to see algorithms capable of generating some of them. Many of these algorithms use optimization techniques to produce a cartogram that maintains a particular set of constraints. One project in this area is to develop a new algorithm for creating cartograms. The project could focus on identifying the set of constraints that matter for a particular type of cartogram and then enforcing those constraints using standard optimization techniques. For example, you might develop an algorithm for producing Dorling cartograms, in which each region is replaced by a circle whose area encodes the data value.
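
To make the constraint-based approach concrete, here is a minimal sketch of the relaxation loop at the core of a Dorling-style layout, assuming each region has already been reduced to a circle; the parameter values and the simple O(n^2) collision pass are illustrative assumptions:

    import math

    def dorling_layout(nodes, iterations=200, attraction=0.05):
        """Iteratively resolve circle overlaps while pulling each circle back
        toward its original (geographic) position.
        Each node: dict with keys x, y (seed position) and r (radius ~ sqrt(value))."""
        for n in nodes:
            n["x0"], n["y0"] = n["x"], n["y"]
        for _ in range(iterations):
            # attraction toward the original geographic location
            for n in nodes:
                n["x"] += (n["x0"] - n["x"]) * attraction
                n["y"] += (n["y0"] - n["y"]) * attraction
            # pairwise repulsion to remove overlaps
            for i, a in enumerate(nodes):
                for b in nodes[i + 1:]:
                    dx, dy = b["x"] - a["x"], b["y"] - a["y"]
                    dist = math.hypot(dx, dy) or 1e-6
                    overlap = a["r"] + b["r"] - dist
                    if overlap > 0:
                        # push each circle half the overlap apart
                        ux, uy = dx / dist, dy / dist
                        a["x"] -= ux * overlap / 2
                        a["y"] -= uy * overlap / 2
                        b["x"] += ux * overlap / 2
                        b["y"] += uy * overlap / 2
        return nodes

A production version would add damping and a spatial index, and might enforce additional constraints such as preserving neighbor adjacency.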

Graphical perception

We have seen a number of papers in this class that conduct graphical perception experiments to evaluate how quickly and accurately people can decode various types of graphs and charts (e.g., Cleveland and McGill's experiments comparing bar charts to pie charts). However, there are many types of graphs and charts for which such experiments have not yet been performed. For this project you could run a graphical perception experiment for a kind of chart that has not yet been investigated. The challenges are to identify which aspects of perception to test, to design the experimental methodology, and to analyze the resulting data.
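
For the analysis step, one hedged sketch follows Cleveland and McGill's log-error accuracy measure; the trial tuples below are made-up placeholders standing in for data you would collect in your own study:

    import math
    from collections import defaultdict

    def log_abs_error(judged_pct, true_pct):
        """Cleveland & McGill's accuracy measure: log2(|judged - true| + 1/8),
        where both values are 'percent of the larger element' judgments."""
        return math.log2(abs(judged_pct - true_pct) + 0.125)

    # Placeholder trials: (chart_type, true_pct, judged_pct); real data would
    # come from your experiment (e.g., a crowdsourced study).
    trials = [
        ("bar", 50.0, 47.0), ("bar", 75.0, 74.0),
        ("pie", 50.0, 40.0), ("pie", 75.0, 65.0),
    ]

    errors = defaultdict(list)
    for chart_type, true_pct, judged_pct in trials:
        errors[chart_type].append(log_abs_error(judged_pct, true_pct))

    for chart_type, errs in sorted(errors.items()):
        print(f"{chart_type}: mean log error = {sum(errs) / len(errs):.2f}")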

Perceptual metric for evaluating effectiveness of graphs and charts

We have read and discussed a number of papers on the perception of graphs and charts (Cleveland's The Elements of Graphing Data summarizes the most comprehensive studies on this topic). Two well-known principles for improving perceptual effectiveness are to use as much of the data space as possible to depict the data and to clearly show scale breaks. The goal of this project is to develop a quantitative metric for the effectiveness of a given graph or chart. Given a graph or chart (either as an XML specification or, if you want a bigger challenge, a bitmap), compute how well it conforms to the perceptual principles outlined by Cleveland and others. Other approaches might devise perceptual metrics building on prior work on computing grouping patterns or visual clutter. One application of such metrics would be to visualize the metric values within the original visualization, thereby providing an interactive assessment tool for visualization designers.
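
As one illustration, here is a minimal sketch of a single component metric, data-space usage (how much of the axis range the data actually spans); the chart spec format and the scoring are assumptions for illustration only:

    def data_space_usage(data_min, data_max, axis_min, axis_max):
        """Score in [0, 1]: fraction of the axis range actually spanned by the
        data. Cleveland's principle suggests values near 1 are better."""
        axis_range = axis_max - axis_min
        if axis_range <= 0:
            raise ValueError("axis range must be positive")
        return (data_max - data_min) / axis_range

    # Hypothetical chart specification: data values plus the y-axis extent.
    spec = {"data": [3.1, 4.7, 9.2, 6.5], "y_axis": (0, 50)}
    usage = data_space_usage(min(spec["data"]), max(spec["data"]), *spec["y_axis"])
    print(f"data-space usage: {usage:.2f}")  # low score: the axis far exceeds the data

A full metric would combine several such component scores, each tied to a specific perceptual principle.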

Automated table design

Both Stephen Few and Edward Tufte have described a number of principles for designing more effective tables. Yet the default table designs in Excel and LaTeX do a poor job of highlighting the important information in a table. Develop a system for automatically designing more effective tables of numerical data. You may assume that metadata about the rows and columns is part of your input: for example, the data type of each column (nominal, ordinal, or quantitative) and whether the data represents dates, financial figures, and so on. The challenge is to operationalize the principles of Few and Tufte to automatically generate more effective tables.
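
A minimal sketch of how such principles might be operationalized as declarative formatting rules; the metadata fields and the specific rules are illustrative assumptions loosely following Few's and Tufte's advice (minimize non-data ink, align values for easy comparison, use consistent precision):

    def format_rules(column):
        """Map column metadata to formatting decisions. The metadata fields
        (dtype, is_date, is_financial) and the rule set are illustrative."""
        rules = {"gridlines": "none", "font_weight": "regular"}  # minimal non-data ink
        if column["dtype"] == "Q":
            rules["align"] = "right"      # right-align numbers so digits line up
            rules["precision"] = 2        # consistent decimal places down a column
            if column.get("is_financial"):
                rules["prefix"] = "$"
                rules["negative_style"] = "parentheses"
        else:                             # N or O: textual labels
            rules["align"] = "left"
        if column.get("is_date"):
            rules["date_format"] = "%Y-%m-%d"
        return rules

    print(format_rules({"name": "revenue", "dtype": "Q", "is_financial": True}))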

Genealogy of D3 code

D3 has become the standard tool for developing interactive visualizations for the Web, and programmers have created tens of thousands of visualizations using D3. Many programmers start by taking existing examples of D3 code and modifying them to solve their own problems. For example, Mike Bostock's blocks are a common starting point; other collections include The Big List of D3.js Examples and the D3.js Gallery. Many of these examples are modifications or extensions of earlier examples. The goal of this project is to apply code diffing and comparison techniques to identify the shared pieces of code among these examples and then to visualize the evolution of the code. You should try to identify the primary source examples and then, by looking for shared code, trace out how they are modified to generate new examples.
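
A minimal sketch of the diffing step using Python's standard-library difflib, assuming each example is available as source text with a creation date; the similarity threshold and the "link each example to its most similar predecessor" heuristic are assumptions:

    import difflib

    def similarity(code_a, code_b):
        """Line-based similarity ratio between two source files."""
        return difflib.SequenceMatcher(
            None, code_a.splitlines(), code_b.splitlines()).ratio()

    def genealogy_edges(examples, threshold=0.5):
        """examples: list of (example_id, created_date, source_code), sorted
        by date. Link each example to its most similar earlier example whose
        similarity exceeds `threshold`."""
        edges = []
        for i, (eid, _, code) in enumerate(examples):
            best = max(
                ((similarity(prev_code, code), prev_id)
                 for prev_id, _, prev_code in examples[:i]),
                default=(0.0, None),
            )
            if best[1] is not None and best[0] >= threshold:
                edges.append((best[1], eid, best[0]))  # parent, child, score
        return edges

The resulting parent-child edges form a tree (or forest) of examples that could itself be visualized, naturally enough, with D3.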

Visualize developmental learning data (from Psychology Prof. Mike Frank)

How do children learn to talk? Does this process operate the same way if you’re learning English, Mandarin, or American Sign Language? The Language and Cognition lab in the Psychology Department curates several datasets on children’s language development that can help us find answers to these kinds of questions. Wordbank is a database with more than 60,000 questionnaires about children’s vocabulary across languages and cultures, and childes-db.stanford.edu is a site archiving data about the way parents and children talk to each other as children develop. We have already built some R Shiny-based visualizations of these datasets but there are many more possibilities available. If you’re interested in psychology and would like a visualization project that makes real-world research data more accessible and interactive, consider using these data for your project!

Visualizing the Supreme Court

Legal scholars here at Stanford engage in in-depth study of Supreme Court decisions and voting patterns. Analyses range from structural data (such as the citation graph among court cases), to tabular data (voting patterns across justices and cases), to the highly nuanced (cataloging features of the decision text). Much of this Supreme Court data is freely available online. Based on the needs of legal scholars, build a visual analysis tool for Supreme Court data. This will entail visualizations of many data types, including network patterns, tabular data, and text, and will likely involve associated data mining and transformation techniques. Design a coherent interface for legal scholars to access, visualize, and correlate these various data sources.
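
For the tabular portion, here is a minimal sketch of computing a justice-agreement matrix with pandas; the column names loosely follow the Supreme Court Database (SCDB) coding but should be checked against the actual data dictionary, and the four rows below are placeholder values:

    import pandas as pd

    # Placeholder votes; in SCDB-style coding, majority == 2 means the justice
    # voted with the majority and 1 means dissent (verify against the codebook).
    votes = pd.DataFrame({
        "caseId":      ["c1", "c1", "c2", "c2"],
        "justiceName": ["JGRoberts", "SSotomayor", "JGRoberts", "SSotomayor"],
        "majority":    [2, 1, 2, 2],
    })

    # Pivot to a case x justice matrix, then compute pairwise agreement rates.
    mat = votes.pivot(index="caseId", columns="justiceName", values="majority")
    justices = mat.columns
    agreement = pd.DataFrame(
        [[(mat[a] == mat[b]).mean() for b in justices] for a in justices],
        index=justices, columns=justices,
    )
    print(agreement)

Such a matrix could feed a heatmap or clustering view, one of several coordinated views in the scholars' interface.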