Assignment 04

Preliminaries
Part 1: Basic Parallel Coordinates
Part 2: Advanced Features and Data Exploration
Grading

Preliminaries

The exploration and visualization of multidimensional datasets is a fundamental challenge in visualization. In this assignment you will implement a widely used visual representation: parallel coordinates. The goal of the assignment is twofold. First, you will gain experience designing and implementing a variety of interaction mechanisms to support exploration of multidimensional data with parallel coordinates. Second, you will evaluate the effectiveness of parallel coordinates on a variety of multidimensional data sets. This assignment also promotes sketching out some ideas on paper before coding your visualizations and interactions.

To start, you need to understand what parallel coordinates are and how they can be used. They are briefly discussed in section 7.6.2 of Munzner. In addition, it is highly recommended that you read some of the relevant literature in visualization, including:

Multidimensional Detective. Alfred Inselberg. IEEE Information Visualization, 1997.
Hierarchical Parallel Coordinates for Exploration of Large Datasets. Ying-Huey Fua, Matthew O. Ward, and Elke A. Rundensteiner. IEEE Visualization ’99.

Parallel coordinates have a rich history of use and occasional misuse! Before you begin, experiment with the implementations by Michael Bostock in ProtoVis and two examples by Kai Chang.

Be sure to try out all of the different interface features that these tools provide. Note that you can filter along an axis and rearrange axes. Think about what you like in the designs and what you would want to do differently. Also consider scalability issues as the number of data items grows.

There are many different ways to encode, lay out, and interact with data in a parallel coordinates plot; these examples show only a few of them. Please keep in mind that there is no “right” answer for your design!

For this assignment you will submit two directories that demonstrate the various parts of your assignment, named A04P01 and A04P02. In each, I will expect the following structure:

A04P0X/
  src/*
  bin/data/*
  CMakeLists.txt
  report.YYY

Please do not commit any additional files unless requested. Your CMakeLists.txt should use this template and be edited so that any .cpp files that you use are included, one per line, in the ${APP_NAME}_SOURCE_FILES portion. Also be sure to replace the PROJNAME with an appropriate name (e.g. A04P01 for Part 1).

Reports need not be a .txt file, but should be something reasonably easy for me to read (e.g. a .pdf or .docx file). Feel free to use whatever format (LaTeX, Word, etc.) is most convenient to express your ideas.

Part 1: Basic Parallel Coordinates

To start, download and investigate the following datasets:

Cars - contains information about 406 different cars from the 1983 ASA Data Exposition dataset, originally from http://davis.wpi.edu/xmdv/datasets/cars.html. This dataset is not identical to the Auto-MPG dataset used in the last assignment.
Cameras - contains information about 1000+ digital cameras and 13 different attributes. The original source is the ScatterDice project.
Nutrients - contains information about various foods and their nutritional content. Originally from USDA, but modified from the file available at http://bl.ocks.org/syntagmatic/3150059

These files are similar to the .tsv files we worked with in the last assignment, with two differences. First, they can contain data columns that are strings and integers, not just floating point data. You will need to modify your TableReader (or equivalent) from the last assignment to handle such data. Second, these files may be missing certain values. A missing value appears wherever two (or more) tabs are next to each other. You must modify your parser to handle these. You also have the option to manually edit the files to make your parsing easier. For example, you could include a second row in the file that specifies the data type of each column. Be sure to submit any modified data files. In your report for this part, briefly describe how you handled data cleaning.
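One possible way to handle mixed types and missing values is sketched below. It is only an illustration, not the required design: the names Cell, splitTabs, parseCell, and parseRow are hypothetical and are not part of the TableReader from the last assignment, and the sketch assumes C++17 (for std::variant). It treats an empty field as missing, a field that parses completely as a number as numeric, and everything else as a string.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <variant>
#include <vector>

// One table cell: missing (std::monostate), numeric, or text.
using Cell = std::variant<std::monostate, double, std::string>;

// Split on tabs while keeping empty fields, so "a\t\tb" yields {"a", "", "b"}.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}

// Classify one field: empty -> missing, fully numeric -> double, else string.
Cell parseCell(const std::string& field) {
    if (field.empty()) return std::monostate{};
    char* end = nullptr;
    double value = std::strtod(field.c_str(), &end);
    // Accept the number only if strtod consumed the whole field.
    if (end == field.c_str() + field.size()) return value;
    return field;
}

// Parse one line into exactly expectedColumns cells.
std::vector<Cell> parseRow(std::string line, std::size_t expectedColumns) {
    assert(expectedColumns > 0);
    // Drop a trailing carriage return left over from Windows line endings.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    std::vector<Cell> row;
    for (const std::string& field : splitTabs(line)) {
        row.push_back(parseCell(field));
    }
    // Short rows are padded with missing cells; extra fields are dropped.
    row.resize(expectedColumns);
    return row;
}
```

Classifying each cell independently is the simplest approach, but a column could then mix numbers and strings. Adding a type row to the data file, as suggested above, would let the reader enforce one type per column instead.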

After getting comfortable with the data, sketch out the concept of your basic visualization and encoding mechanisms. Make sure to include things like labels and titles in your sketch. Think especially about what you might do to deal with the numerous overlapping lines that are common to parallel coordinates. Summarize your plan in your report, perhaps including scans of drawings that show what you would like to develop in openFrameworks. Be sure to describe which visual encodings you plan to use and why.

Next, create an openFrameworks project with your initial concept. Write a basic renderer which loads the data and displays it as needed. Be sure to label everything you can: titles, tick marks, ranges, etc. Comment in your report about what changed from the sketch of your initial concept.

Finally, work on interactivity (without interaction, parallel coordinates can be a challenging visualization to understand). You must support the following basic tasks:

Filtering the data across multiple attributes,
Reordering the axes, and
Inverting the axes.
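The state behind these three tasks can be kept separate from the drawing code. The sketch below shows one way to do that under some assumptions: the Axis struct and the rowVisible and moveAxis helpers are hypothetical names, every attribute is assumed to be numeric already, and all mouse handling and rendering in openFrameworks is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Per-axis state; the draw order is the order of the std::vector<Axis>.
struct Axis {
    std::size_t column = 0;   // index of the data column shown on this axis
    double dataMin = 0.0;     // smallest value in the column
    double dataMax = 1.0;     // largest value in the column
    bool inverted = false;    // true flips the axis top to bottom
    bool hasBrush = false;    // true when a filter range is active
    double brushLo = 0.0;     // filter range, stored in data units
    double brushHi = 0.0;

    // Map a data value to [0, 1] along the axis (0 = bottom, 1 = top).
    double normalize(double v) const {
        double span = dataMax - dataMin;
        double t = span > 0.0 ? (v - dataMin) / span : 0.5;
        return inverted ? 1.0 - t : t;
    }

    // A value passes if the axis has no brush or the value lies inside it.
    bool accepts(double v) const {
        return !hasBrush || (v >= brushLo && v <= brushHi);
    }
};

// Filtering across multiple attributes: a row is shown only if every axis accepts it.
bool rowVisible(const std::vector<double>& row, const std::vector<Axis>& axes) {
    return std::all_of(axes.begin(), axes.end(),
                       [&row](const Axis& a) { return a.accepts(row[a.column]); });
}

// Reordering: move the axis at position `from` to position `to`,
// shifting the axes in between by one slot.
void moveAxis(std::vector<Axis>& axes, std::size_t from, std::size_t to) {
    assert(from < axes.size() && to < axes.size());
    if (from < to) {
        std::rotate(axes.begin() + from, axes.begin() + from + 1, axes.begin() + to + 1);
    } else {
        std::rotate(axes.begin() + to, axes.begin() + from, axes.begin() + from + 1);
    }
}
```

Storing the brush range in data units rather than screen units means that inverting or moving an axis does not change which rows pass the filter; only normalize() needs to know about inversion.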

Consider sketching out the concepts of these interactions before coding. Think about how you want to interact with the axes and the data. How would you design these interactions to make them effective? In your report, you must both explain all interactions that you implemented AND justify your decisions for which interaction mechanisms you chose. While addons like ofxGui might be helpful in your design, their use is not required.

At this stage, your implementation must be able to support both the Cars2 and Cameras datasets, either using a file open dialog (e.g. ofSystemLoadDialog) or a button toggle. Make sure you document how to use your tool in your report.

Part 2: Advanced Features and Data Exploration

At this stage you will improve your system further. You will implement either:

A multiple-view visualization that connects your parallel coordinates view to your scatterplot view, for example showing a scatterplot of two selected axes from the parallel coordinates view. Filtering the data in one view should highlight the data in the other, in both directions (e.g. filtering axes in the parallel coordinates should select points in the scatterplot, and a brushed selection in the scatterplot should select certain items in the parallel coordinates).
A simple clustering scheme on the data (e.g. k-means clustering where the user specifies k). You could also consider implementing Fua et al.’s hierarchical parallel coordinates for a more sophisticated approach. Clustering should use some visual encoding (for example, color) to convey the different clusters.
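For the clustering option, a minimal sketch of k-means (Lloyd's algorithm) is shown below. It rests on several assumptions that are not part of the assignment: the names Point, squaredDistance, and kMeans are hypothetical, all attributes are numeric with no missing values, and the centroids are seeded with the first k points so that the result is deterministic. Random or k-means++ seeding usually gives better clusters, and because the attributes have very different units, you will probably want to normalize each column (e.g. to [0, 1]) before clustering.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Returns one cluster label in [0, k) per input point.
std::vector<int> kMeans(const std::vector<Point>& points, std::size_t k,
                        int maxIterations = 100) {
    assert(k > 0 && k <= points.size());
    const std::size_t dims = points[0].size();
    // Assumption: seed the centroids with the first k points.
    std::vector<Point> centroids(points.begin(), points.begin() + k);
    std::vector<int> labels(points.size(), -1);

    for (int iter = 0; iter < maxIterations; ++iter) {
        // Assignment step: attach each point to its nearest centroid.
        bool changed = false;
        for (std::size_t i = 0; i < points.size(); ++i) {
            int best = 0;
            double bestDist = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double dist = squaredDistance(points[i], centroids[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = static_cast<int>(c);
                }
            }
            if (labels[i] != best) {
                labels[i] = best;
                changed = true;
            }
        }
        if (!changed) break;  // converged: no label changed in this pass

        // Update step: move each centroid to the mean of its members.
        std::vector<Point> sums(k, Point(dims, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < dims; ++d) sums[labels[i]][d] += points[i][d];
            ++counts[labels[i]];
        }
        for (std::size_t c = 0; c < k; ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its old centroid
            for (std::size_t d = 0; d < dims; ++d) {
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
            }
        }
    }
    return labels;
}
```

The returned labels can then drive the visual encoding, for instance by indexing into a small color palette when drawing each polyline.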

Other improvements are certainly allowed and any extra features will be considered for extra credit. These might include: scented widgets, interactive legends, and additional searching/aggregation/filtering mechanisms. In your report, make sure you document any additional features that you added and how they were helpful for data analysis.

Use your finished tool to investigate at least two different datasets and report any conclusions you were able to draw from the tool. You can use any two of the three datasets provided, as well as any data you find and convert from the web. Many other datasets can be found on the XmdvTool, R data sets, and UCI Machine Learning repository pages.

Your report should also discuss which interaction features you found to be the most fruitful for data exploration. Include screenshots from your tool that illustrate these conclusions. Finally, critique the utility of parallel coordinates. Did you find your implementation easy or hard to use? For the features you identified, could you have found them using other means or without interactivity?

Grading

My expectation is that you will submit two new folders in your git repository. Each folder should contain everything necessary to compile and run your program as described above. Each folder should contain a report document in a format of your choosing.

Each part of the assignment is weighted as follows:

60% Completing Part 1
40% Completing Part 2

For each part, a correct implementation is worth half of the points. A satisfactory report is worth the other half.

This percentage will be scaled to the total value of this assignment for your final grade (8%). For each of the coding parts, I will specifically check that you’ve read in the data correctly and processed it as described above. I will also check for coding style, commenting, and organization where appropriate.

Extra credit will be awarded for implementing features that significantly go beyond the requirements.