Assignment 2: Exploratory Data Analysis
Due: Monday Oct 15, 2018 by 4:30pm (before class)
A wide variety of digital tools have been designed to help users visually explore data sets and confirm or disconfirm hypotheses about the data. The task in this assignment is to use an existing software tool (Tableau) to formulate and answer a series of specific questions about a data set of your choice. After answering the questions you should create a final visualization that is designed to present the answer to your question to others. You should maintain a notebook that documents all the questions you asked and the steps you performed from start to finish. The goal of this assignment is not to develop a new visualization tool, but to understand better the process of exploring data using an off-the-shelf visualization tool. Documenting the data analysis process you went through is the main pedagogical goal of the assignment and more important than the design of the final visualization.
Here is one way to start.
Step 1. Pick a domain that you are interested in.
Some good possibilities might be the physical properties of chemical elements, the types of stars, or the human genome. Feel free to use an example from your own research, but do not pick an example that you already have created visualizations for.
Step 2. Pose an initial question that you would like to answer.
For example: Is there a relationship between melting point and atomic number? Are the brightness and color of stars correlated? Are there different patterns of nucleotides in different regions in human DNA?
Step 3. Assess the fitness of the data for answering your question.
Inspect the data – it is invariably helpful to first look at the raw values. Does the data seem appropriate for answering your question? If not, you may need to start the process over. If so, does the data need to be reformatted or cleaned prior to analysis? Perform any steps necessary to get the data into shape prior to visual analysis.
You will need to iterate through these steps a few times. It may be challenging to find interesting questions and a dataset that has the information that you need to answer those questions. You may need to try several datasets.
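The inspection described in Step 3 can be partly scripted before you load anything into Tableau. The following is a minimal sketch, not a required part of the assignment: the sample data, the column names, the `MISSING` markers, and the helper `summarize_columns` are all hypothetical stand-ins for whatever dataset you choose. It counts, for each column, how many values are missing and how many are not numeric, which is often enough to tell whether cleaning is needed.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """element,atomic_number,melting_point_k
Hydrogen,1,13.99
Helium,2,
Lithium,3,453.65
Beryllium,4,n/a
"""

# Assumed markers for missing values; adjust to match your own dataset.
MISSING = {"", "na", "n/a", "null", "-"}


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_columns(handle):
    """Count rows, missing values, and non-numeric values per column."""
    reader = csv.DictReader(handle)
    names = reader.fieldnames
    summary = {n: {"rows": 0, "missing": 0, "non_numeric": 0} for n in names}
    for row in reader:
        for name in names:
            value = (row[name] or "").strip()
            stats = summary[name]
            stats["rows"] += 1
            if value.lower() in MISSING:
                stats["missing"] += 1
            elif not is_number(value):
                stats["non_numeric"] += 1
    return summary


summary = summarize_columns(io.StringIO(SAMPLE_CSV))
for name, stats in summary.items():
    print(name, stats)
```

To run this on a real file, replace `io.StringIO(SAMPLE_CSV)` with `open("yourfile.csv", newline="")`. A column you expect to be numeric that reports a high `non_numeric` or `missing` count is a signal to record in your notebook and clean before visual analysis.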
Exploratory Analysis Process
After you have an initial question and a dataset, construct a visualization that provides an answer to your question. As you construct the visualization you will find that your question evolves; often it will become more specific. Keep track of this evolution and the other questions that occur to you along the way. Once you have answered all the questions to your satisfaction, think of a way to present the data and the answers as clearly as possible. In this assignment, you should use an existing visualization software tool (Tableau). You may find it beneficial to use more than one tool.
Before starting, write down the initial question clearly. As you go, maintain a notebook (e.g. a Google or Word document) of what you had to do to construct the visualizations and how the questions evolved. Include in the notebook where you got the data, along with documentation of the dataset's format. Describe any transformations or rearrangements of the dataset that you needed to perform; in particular, describe how you got the data into the format needed by the visualization system. Keep copies of any intermediate visualizations that helped you refine your question. After you have constructed the final visualization for presenting your answer, write a caption and a paragraph describing the visualization and how it answers the question you posed. Think of the figure, the caption and the text as material you might include in a research paper.
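As an illustration of the kind of transformation worth recording in the notebook, here is a minimal sketch of one common rearrangement: reshaping a "wide" table (one column per year) into a "long" table (one row per observation), a layout that visualization tools are often easier to drive from. The data, the column names, and the helper `wide_to_long` are hypothetical; your own dataset may need a different transformation or none at all.

```python
import csv
import io

# Hypothetical wide-format data: one column per year.
WIDE_CSV = """country,2016,2017,2018
A,1.0,1.5,2.0
B,3.0,,3.5
"""


def wide_to_long(handle, id_column, variable_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    reader = csv.DictReader(handle)
    value_columns = [c for c in reader.fieldnames if c != id_column]
    rows = []
    for record in reader:
        for column in value_columns:
            rows.append({
                id_column: record[id_column],
                variable_name: column,
                value_name: record[column],  # empty strings are kept as-is
            })
    return rows


long_rows = wide_to_long(io.StringIO(WIDE_CSV), "country", "year", "value")

# Write the reshaped rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["country", "year", "value"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(long_rows)
print(out.getvalue())
```

Missing values are deliberately passed through unchanged here, so that the decision about how to handle them stays visible and can be documented separately.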
Your assignment must be posted to Canvas before class on Oct 15, 2018.
You should look for data sets online in convenient formats such as Excel or CSV files. The web contains a lot of raw data. In some cases you will need to convert the data to a format you can use. Format conversion is a big part of visualization research, so it is worth learning techniques for doing such conversions. Although it is best to find a data set you are especially interested in, here are pointers to a few datasets:
To create the visualizations, we will be using Tableau, a commercial visualization tool that supports many different ways to interact with the data. Tableau offers free student licenses so that you can install the software on your own computer. One goal of this assignment is for you to learn to use and evaluate the effectiveness of Tableau. Please talk to me if you think it won’t be possible for you to use the tool. In addition to Tableau, you are free to use other visualization tools as you see fit.
Each submission will be graded based on both the analysis process and included visualizations. Here are our grading criteria:
- Exploration Thoroughness (6): Sufficient breadth of analysis, exploring questions in sufficient depth (with appropriate follow-up questions). Appropriate data quality assessment and transformation.
- Documentation (6): Clear documentation of exploratory process, including justification for pivots in approach and intermediate visualizations.
- Final Visualization (3): Clearly designed final visualization communicating final insights with understandable captions and annotations.
This is an individual assignment. You may not work in groups. Your completed assignment is due on Mon Oct 15 before class.
To submit your assignment, prepare a PDF containing your notebook and your final visualization and description with the filename: