Preliminaries

In this assignment, you will progressively build up a tool for interactively visualizing multiple columns of tabular data using scatterplots. This will culminate in producing an interactive scatterplot matrix (SPLOM) in openFrameworks. You may find inspiration from a variety of online examples.

Scatterplots and SPLOMs are also discussed by Munzner in various sections (7.3, 7.5.2, and 15.3). Some very early versions of SPLOMs show up as pairwise plots from J.A. Hartigan’s Printer graphics for clustering.

For this assignment you will submit three directories that demonstrate the various parts of your assignment, named A03P01, A03P02, and A03P03. In each, I will expect the following structure:

A03P0X/
  src/*
  bin/data/*
  CMakeLists.txt
  report.txt

Please do not commit any additional files unless requested. Your CMakeLists.txt should use this template and be edited so that any .cpp files that you use are included, one per line, in the ${APP_NAME}_SOURCE_FILES portion. You are, of course, welcome to build with any other toolchain (e.g. Xcode, Visual Studio), but on my platform I will use CMake.

Part 1: A Static Scatter Plot

In the first part of this assignment, you will implement a basic Scatterplot class to plot bivariate (two-attribute) quantitative data. This will require two key pieces:

  1. A class for reading / parsing data, and

  2. A class for drawing the scatterplot.

First, implement a class TableReader. This class should parse a tab-separated list of data and assume that the first row of data is a tab-separated list of column names. You may assume that all data values are floats. Four example files, all converted from the UC Irvine Machine Learning Repository, are available for you to experiment with.

Note that the column names in the first row may have spaces in them. You need to parse them based on tabs. You’re welcome to download (and convert) any other dataset you would like to experiment with. Other data repositories include http://davis.wpi.edu/xmdv/datasets.html and http://vincentarelbundock.github.io/Rdatasets/datasets.html.

Your TableReader should open, read completely, and close the file, storing all of the data in memory. It should then support the following queries:

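class TableReader {
public:
  // Opens, reads completely, and closes the file, keeping all data in memory.
  void read_data(const std::string& filename);

  // Return the values and the name of the requested column.
  std::vector<float> get_column(int which_column);
  std::string get_column_name(int which_column);
};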

How you store the underlying data within the class is entirely up to you. You are encouraged to create any additional helper functions of your choosing. I am not opposed to your changing the typing (e.g. using const char * instead of std::string).
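As a starting point for the parsing step, here is one possible sketch of tab-based splitting. It is not a required design: the names split_tabs, Table, and parse_table are hypothetical, and the sketch reads from any std::istream (an assumption made so that it can be tried without a file on disk). It also assumes every data value parses as a float, as stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on tab characters only, so that
// column names containing spaces stay intact.
std::vector<std::string> split_tabs(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  std::istringstream stream(line);
  while (std::getline(stream, field, '\t')) {
    fields.push_back(field);
  }
  return fields;
}

// Hypothetical in-memory table: one name and one vector of floats per column.
struct Table {
  std::vector<std::string> names;
  std::vector<std::vector<float>> columns;
};

// Reads the header row, then every data row, from any input stream.
// Assumes every value parses as a float; std::stof throws otherwise.
Table parse_table(std::istream& in) {
  Table table;
  std::string line;
  if (!std::getline(in, line)) return table;  // empty input
  if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
  table.names = split_tabs(line);
  table.columns.resize(table.names.size());
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) continue;  // skip blank lines
    std::vector<std::string> fields = split_tabs(line);
    for (std::size_t c = 0; c < fields.size() && c < table.columns.size(); ++c) {
      table.columns[c].push_back(std::stof(fields[c]));
    }
  }
  return table;
}
```

Inside TableReader::read_data() the same routine could be fed a std::ifstream opened on filename; storing the data column by column makes get_column() a simple lookup.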

Next, implement a class Scatterplot. This basic scatterplot will accept (1) two columns of data, (2) a width and a height, (3) an x and a y position, and (4) any other metadata of your choosing (such as the column names). The basic class signature should thus be:

class Scatterplot {
public:
  void setup(...);
  void draw();

  std::vector<float> x_data;
  std::vector<float> y_data;
  int x_pos, y_pos;   // top left corner of the scatterplot
  int width, height;  // size of the scatterplot canvas
};

Feel free to deviate from this basic structure. The key idea is that you will put all functionality to draw the scatterplot within Scatterplot::draw(). Your ofApp class will use the TableReader to load the dataset by calling TableReader::read_data() in ofApp::setup() and initialize the Scatterplot with Scatterplot::setup(). Thus, ofApp::draw() should simply call Scatterplot::draw() in addition to setting up any other global information you would like (e.g. changing the background color for your app, handling any messages you want to show the user, etc.).

Your basic Scatterplot drawing routine should handle x- and y-data of any range. This data will be mapped to a coordinate space that spans [0,width] and [0,height], respectively. (x_pos,y_pos) refers to the top left corner of your scatterplot. To handle this gracefully, a number of OF utility functions may be of use, including

  • ofMap for rescaling the data values to coordinates
  • ofTranslate for correctly positioning the scatterplot relative to (x_pos,y_pos).
  • Both ofPushMatrix and ofPopMatrix for saving and restoring the coordinate system around your drawing code, and
  • The section on Moving the world from the OF book may also be inspirational.
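To make the rescaling concrete, here is a minimal sketch of the linear mapping involved, written without openFrameworks so it can be tried in isolation. The names map_range, CanvasPoint, and data_to_canvas are hypothetical; in your app, ofMap can play the role of map_range. Flipping the y-axis is a design choice assumed here: pixel y grows downward, so mapping to [height, 0] puts larger data values higher on the canvas.

```cpp
#include <cassert>

// Hypothetical stand-in for the kind of rescaling ofMap performs:
// maps value linearly from [in_min, in_max] to [out_min, out_max].
float map_range(float value, float in_min, float in_max,
                float out_min, float out_max) {
  if (in_max == in_min) return out_min;  // constant column: avoid dividing by zero
  float t = (value - in_min) / (in_max - in_min);
  return out_min + t * (out_max - out_min);
}

// Position of a data point in canvas-local pixels, i.e. relative to
// (x_pos, y_pos), which a translate would account for when drawing.
struct CanvasPoint {
  float x, y;
};

CanvasPoint data_to_canvas(float x, float y,
                           float x_min, float x_max,
                           float y_min, float y_max,
                           float width, float height) {
  CanvasPoint p;
  p.x = map_range(x, x_min, x_max, 0.0f, width);
  p.y = map_range(y, y_min, y_max, height, 0.0f);  // flipped: large y drawn near the top
  return p;
}
```

The data minimum and maximum for each column can be computed once in setup() (for example with std::minmax_element) rather than on every call to draw().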

You should think of the Scatterplot as a rectangular canvas on which you will draw the scatterplot. Scatterplot::draw() should draw this rectangle, in a color different from the background, and then draw a collection of discs (using calls to ofDrawCircle()) whose centers correspond to each pair in (x_data[i],y_data[i]).

After getting the basic positional version of Scatterplot up, consider augmenting it to include:

  • Labels for both the x- and y-axis based on the column name.
  • Axis lines and tick marks that indicate the range of the data
  • Grid lines

You are free to choose to do this in whatever way you like, but please describe in this section’s report what variations you tried and justify why you chose your final design. Unjustified (major) design choices will be penalized. You may have to adjust the canvas size beyond width*height to accommodate these extra elements. Your justification should (briefly) discuss design choices with respect to color, size, and shape, as well as placement and density of text and grids/axes.

In addition, A03P01/report.txt should also include any information necessary for running your program. Use your scatterplot to experiment with viewing:

  1. Petal length vs. Petal width in the Iris data
  2. MPG vs. Horsepower in the Auto-mpg data
  3. Diameter vs Rings in the Abalone data

In your report, describe any conclusions or relationships you identified for each of the three pairs of data. Feel free to include screenshots showing your plots (you can name these A01ReportImg01.png, A01ReportImg02.png, etc.), or anything else you believe necessary to make your point.

Part 2: Interactive Scatterplots

We will next use the Scatterplot class to encapsulate some basic interactions. To make your Scatterplot interactive through the mouse, you must add the following functions to its declaration:

class Scatterplot {
  ...

  void mouseMoved(ofMouseEventArgs & args);
  void mouseDragged(ofMouseEventArgs & args);
  void mousePressed(ofMouseEventArgs & args);
  void mouseReleased(ofMouseEventArgs & args);
  void mouseScrolled(ofMouseEventArgs & args);
  void mouseEntered(ofMouseEventArgs & args);
  void mouseExited(ofMouseEventArgs & args);
};

Even if they have a blank implementation, all of these functions must be both declared and defined. Having such functions will allow you to code specific mouse callbacks that only the Scatterplot class responds to. To connect this class to the event loop, in Scatterplot::setup() call ofRegisterMouseEvents(this);. Nothing else will need to be updated in ofApp. For more information on OF’s event processing, please see http://openframeworks.cc/documentation/events/ as well as the code for examples/events/simpleEventsExamples in the examples included with OF.

Our goal is to add the following interactions to the Scatterplot:

  1. A mouse rollover that displays the precise x- and y-value of the data when the mouse cursor is placed on top of a data point (using Scatterplot::mouseMoved()).

  2. A zooming function based on the user brushing a selected region. This will use a triad of functions: Scatterplot::mousePressed(), Scatterplot::mouseDragged(), and Scatterplot::mouseReleased(). The idea is that the user will click to initiate drawing a selection box and drag the cursor, while holding the mouse button down, to encompass a new selected region. The range of the x- and y-axis should then be updated and only those data points within that range should be drawn. This D3 Zoomable Scatterplot achieves the same effect, but using the mouse wheel.

  3. Right clicking anywhere on the Scatterplot should reset the view to the original extents of the data, removing the zoom.
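The zoom in item 2 amounts to inverting the data-to-pixel mapping for the two corners of the brushed box. The following is a rough sketch of that arithmetic, not a required design: Range, brush_to_range, and points_in_range are hypothetical names, the brush corners are assumed to be in canvas-local pixels, and the y-axis is assumed to be flipped so that larger values are drawn higher.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the data range currently shown on the canvas.
struct Range {
  float x_min, x_max, y_min, y_max;
};

// Converts a brushed box, given in canvas-local pixels, into a new data range.
// (px0, py0) is where the button went down and (px1, py1) where it was released;
// the corners are sorted first so dragging in any direction works.
Range brush_to_range(const Range& view, float width, float height,
                     float px0, float py0, float px1, float py1) {
  float left = std::min(px0, px1), right = std::max(px0, px1);
  float top = std::min(py0, py1), bottom = std::max(py0, py1);
  float x_span = view.x_max - view.x_min;
  float y_span = view.y_max - view.y_min;
  Range zoomed;
  zoomed.x_min = view.x_min + (left / width) * x_span;
  zoomed.x_max = view.x_min + (right / width) * x_span;
  // Pixel y grows downward, so the bottom edge of the box holds the smallest y value.
  zoomed.y_min = view.y_max - (bottom / height) * y_span;
  zoomed.y_max = view.y_max - (top / height) * y_span;
  return zoomed;
}

// Indices of the data points that fall inside the given range (edges included).
std::vector<std::size_t> points_in_range(const std::vector<float>& xs,
                                         const std::vector<float>& ys,
                                         const Range& range) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < xs.size() && i < ys.size(); ++i) {
    bool inside = xs[i] >= range.x_min && xs[i] <= range.x_max &&
                  ys[i] >= range.y_min && ys[i] <= range.y_max;
    if (inside) kept.push_back(i);
  }
  return kept;
}
```

Keeping a copy of the original Range computed from the full data makes the right-click reset in item 3 a single assignment.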

The documentation for ofMouseEventArgs is relevant here. Furthermore, to code these, it will be useful to implement a function Scatterplot::onCanvas() that accepts a pair of integers for the current click position. This should check and return a boolean that indicates whether or not the click is within the bounds of the canvas (ofInRange is a handy function for this). Such functionality will ensure that the Scatterplot only responds to mouse events within its extent.
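The two hit tests involved, "is the cursor on my canvas?" and "is the cursor on a data point?", can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: Canvas and hovered_point are hypothetical names, the explicit comparisons stand in for what ofInRange could do, and the point positions passed to hovered_point are assumed to already be in the same pixel space as the mouse position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the geometry stored in Scatterplot.
struct Canvas {
  int x_pos, y_pos;   // top left corner in window pixels
  int width, height;  // canvas size in pixels

  // True when the window-space position lies on this canvas. The right and
  // bottom edges are excluded so two adjacent canvases never both claim a pixel.
  bool onCanvas(int mouse_x, int mouse_y) const {
    return mouse_x >= x_pos && mouse_x < x_pos + width &&
           mouse_y >= y_pos && mouse_y < y_pos + height;
  }
};

// Returns the index of the point closest to the mouse among those within
// `radius` pixels, or -1 when no point is that close.
int hovered_point(const std::vector<float>& px, const std::vector<float>& py,
                  float mouse_x, float mouse_y, float radius) {
  int best = -1;
  float best_dist2 = radius * radius;  // compare squared distances, no sqrt needed
  for (std::size_t i = 0; i < px.size() && i < py.size(); ++i) {
    float dx = px[i] - mouse_x;
    float dy = py[i] - mouse_y;
    float dist2 = dx * dx + dy * dy;
    if (dist2 <= best_dist2) {
      best = static_cast<int>(i);
      best_dist2 = dist2;
    }
  }
  return best;
}
```

Each mouse callback can return early when onCanvas() is false, which keeps a Scatterplot from reacting to events that belong elsewhere in the window.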

In A03P02/report.txt describe any extra design choices you implemented to make these interface tools. In addition, answer the following analysis questions:

  1. For the Auto-mpg data, what are the minimum and maximum weights for each of the five different cylinder options?

  2. For the Abalone data, length and height tend to follow a linear trend – use your zooming functionality to estimate the slope. Additionally, there are outliers. What values do they have?

Again, feel free to include screenshots to reinforce your answers.

Part 3: Scatterplot Matrices

Finally, you will develop an application for interactive scatterplot matrices. Given a table of data, your goal is to draw multiple scatterplots, appropriately sized, for all possible pairs of attributes. You should be able to reuse your Scatterplot class in this regard, appropriately setting the x- and y-position as well as the width/height.

When drawing multiple scatterplots, it will become increasingly important to ensure that your mouse interactions are cognizant of the extent of the scatterplot. In particular, the function Scatterplot::onCanvas() should be designed in such a way that multiple instances of Scatterplot do not conflict.

As a new interaction, we will augment the Scatterplot so that it can handle a linked selection through brushing. We will remove the zooming feature and repurpose the brushing gesture so that brushing in one scatterplot creates a selection of some subset of the data values. This selection should be drawn using a different visual encoding (e.g. changing the color). The selection should also be propagated to all other scatterplots.

To do this, you will have to register events that occur globally. Your ofApp will handle coordinating these events. examples/events/simpleEventsExamples has a great example of this. The mechanism requires three pieces:

  1. Your Scatterplot class will create an instance of ofEvent<T> that will be used to pass data of type T to the other instances of Scatterplot. This event can be shared, so it will be declared as a static member of Scatterplot. You will choose your type T accordingly to define the data that you want to pass to store the selected set of data points.

  2. On demand, ofNotifyEvent() will be called in your Scatterplot’s mouse functions to propagate the selection back to ofApp.

  3. In ofApp::setup(), a listener callback for this event will be registered with ofAddListener(). Thus each time a Scatterplot calls ofNotifyEvent(), the selection data will be passed to the listener callback, which will then process it accordingly (e.g. by updating every Scatterplot).
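To illustrate how the three pieces fit together, here is a small model of the pattern in plain C++. It deliberately does not use openFrameworks: SimpleEvent is a hypothetical stand-in for ofEvent<T>, its add_listener() and notify() play the roles of ofAddListener() and ofNotifyEvent(), and Plot stands in for Scatterplot. The payload type Selection (one flag per data row) is just one assumed choice for T.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for ofEvent<T>: stores callbacks and calls each on notify.
template <typename T>
class SimpleEvent {
public:
  void add_listener(std::function<void(const T&)> callback) {
    listeners.push_back(std::move(callback));
  }
  void notify(const T& payload) const {
    for (const auto& listener : listeners) listener(payload);
  }

private:
  std::vector<std::function<void(const T&)>> listeners;
};

// Assumed payload type T: one flag per data row, true when the row is selected.
using Selection = std::vector<bool>;

// Stand-in for Scatterplot, reduced to the selection logic.
struct Plot {
  // Piece 1: a single event shared by every plot, hence static.
  static SimpleEvent<Selection> selection_changed;

  Selection selected;  // what this plot would draw in a different color

  // Piece 2: called from the mouse functions once a brush is finished.
  void brush(const Selection& picked) { selection_changed.notify(picked); }
};

SimpleEvent<Selection> Plot::selection_changed;  // definition of the static member
```

Piece 3 then lives in the application: a listener registered once during setup copies the incoming Selection into every plot, for example `Plot::selection_changed.add_listener([&](const Selection& s) { for (Plot& p : plots) p.selected = s; });`, where plots is a hypothetical container of all Plot instances.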

You will then modify your ofApp so that it dynamically creates the appropriate number of scatterplots to form a scatterplot matrix for a given dataset. Your application should at least support the example datasets provided. You should keep your application window size fixed – after loading the data you will then need to determine the maximum possible width and height for the scatterplots. You will then create an instance of Scatterplot for all possible pairs of attributes, and register the appropriate listeners for the brushed selection interaction.
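The sizing step can be sketched as simple integer arithmetic. The following is one possible layout under stated assumptions, not a prescribed one: Cell and splom_layout are hypothetical names, a uniform padding is assumed between cells and around the border, the diagonal is skipped, and both orderings of each pair of attributes get a cell.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one scatterplot in the matrix.
struct Cell {
  int row, col;       // attribute indices: row selects the y column, col the x column
  int x_pos, y_pos;   // top left corner in window pixels
  int width, height;  // canvas size in pixels
};

// Lays out an n-by-n grid inside a fixed window, leaving `padding` pixels
// between neighboring cells and around the border. Diagonal cells are skipped.
std::vector<Cell> splom_layout(int n, int window_w, int window_h, int padding) {
  assert(n > 0);
  // n cells need n + 1 gaps; integer division rounds the cell size down.
  int cell_w = (window_w - (n + 1) * padding) / n;
  int cell_h = (window_h - (n + 1) * padding) / n;
  std::vector<Cell> cells;
  for (int row = 0; row < n; ++row) {
    for (int col = 0; col < n; ++col) {
      if (row == col) continue;  // leave the diagonal free
      cells.push_back({row, col,
                       padding + col * (cell_w + padding),
                       padding + row * (cell_h + padding),
                       cell_w, cell_h});
    }
  }
  return cells;
}
```

Each Cell maps directly onto one call to Scatterplot::setup() with the two corresponding columns from the TableReader.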

It is OK to leave the cells on the SPLOM diagonal empty, or use them to show the attribute labels. There are other choices here as well, including drawing a histogram of possible values in that attribute.
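If you try the histogram option, the counting step might look like the sketch below. The function name histogram and the choice of equal-width bins spanning the column's minimum and maximum are assumptions made for illustration; other binning schemes are equally valid.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Counts how many values fall into each of `bins` equal-width intervals
// spanning [min, max] of the data. Returns an empty vector when bins <= 0.
std::vector<int> histogram(const std::vector<float>& values, int bins) {
  if (bins <= 0) return {};
  std::vector<int> counts(bins, 0);
  if (values.empty()) return counts;
  auto extremes = std::minmax_element(values.begin(), values.end());
  float lo = *extremes.first;
  float hi = *extremes.second;
  float span = hi - lo;
  for (float v : values) {
    // A constant column (span == 0) puts everything in the first bin.
    int b = span > 0.0f ? static_cast<int>((v - lo) / span * bins) : 0;
    b = std::min(b, bins - 1);  // the maximum value belongs to the last bin
    ++counts[b];
  }
  return counts;
}
```

The resulting counts can be rescaled to bar heights with the same kind of linear mapping used for the scatterplot coordinates.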

After completing your SPLOM layout, investigate all four datasets. In A03P03/report.txt, describe interesting relationships that you found between pairs of attributes for all four examples. Use the linked selection to help identify relationships that span more than two variables. Again, feel free to include screenshots where appropriate.

Also briefly describe your experiences with interacting with scatterplots. What worked well? What other interactions do you think might be useful to implement?

Grading

My expectation is that you will submit three new folders in your git repository. Each folder should contain everything necessary to compile and run your program as described above. Each folder should contain a report file.

Each part of the assignment is weighted in the following way:

  • 40% Completing Part 1
  • 30% Completing Part 2
  • 30% Completing Part 3

For each part, a correct implementation is worth half of the points. A satisfactory report is worth the other half.

This percentage will be scaled to the total value of this assignment for your final grade (7%). For each of the coding parts, I will specifically check that you’ve read in the data correctly and processed it as described above. I will also check for coding style, commenting, and organization where appropriate.

Extra credit will be awarded for implementing features that significantly go beyond the requirements.