In this assignment, we’ll be building interactive visualizations for multidimensional datasets. Our main tool will be the scatterplot matrix (often abbreviated as “SPLOM”).

Your main focus will be on generalizing the scatterplot from the previous assignment so that it can be encapsulated and reused. In addition, you will get a first taste of linked visualizations that support interaction through brushing.

The code that you write should be generic and make no assumptions about the dataset other than the format it is provided in. Thus, in addition to the Calvin College 2004 seniors dataset, I’ve provided a few additional datasets for you to experiment with:

  • The well-known Iris Dataset
  • The Palmer Penguins Dataset, which, like Iris, has 4 quantitative dimensions but also includes additional nominal dimensions
  • A much more complex dataset known as Abalone. This dataset has many more data points, and each point has 8 quantitative dimensions rather than 4.

Your visualization should support both the Calvin College and Iris datasets with minimal changes; the other two are included for experimentation only.

Please click here to create your repository

Objectives

This assignment is designed to help you get comfortable with more sophisticated plotting techniques and to introduce you to brushing and linking in d3. Specific objectives include:

  • Practicing the design of multiple-view visualizations
  • Defining generic implementations in d3 that can be reused for each view through .each()
  • Learning how to define interactions in d3, particularly by implementing click events for selection and d3.brush for brushing and linking
  • Experimenting with d3 brush callbacks
  • Practicing the use of d3 selection filters for capturing selected data objects

Introduction

In this assignment, you will create a scatterplot matrix viewer that supports brushing in each plot to select and highlight elements. Many of the ideas in this assignment are modeled on the d3 example Brushable Scatterplot Matrix. You may also find Becker and Cleveland’s 1987 paper Brushing Scatterplots to be of interest.

While I recommend reviewing this example and its source code, your implementation will be decidedly different, even if it borrows a few of the example’s tricks. In particular, that example uses a different data format, and it supports only one brush. Your visualization should work for any dataset in the format I provide, and it will support a brush in each plot. Combined, your brushes should produce a selected set that is the intersection of all brushes.

All code for your implementation should be in a file a02.js, which is included at the end of index.html. As in your last assignment, index.html should (minimally) include d3.js and the datasets in the <head> tag, and it should define a single <div> for which you will create a d3 selection.

As in A01, you will also insert the answers to the three written questions in index.html.

Part 1: Data access and layout

I recommend you develop your SPLOM in phases. Phase 1 will use d3 selections to build a collection of groups associated with the grid of plots for the matrix.

The Iris data has been provided in the file iris.js, which defines the variable iris, and the Calvin College data has been provided in the file scores.js, which defines the variable scores. The format for both datasets is similar: each is an array of JavaScript objects whose fields are the data attributes.

To keep your implementation generic, I would create a new variable called data in a02.js and set it equal to whichever dataset you are experimenting with.

Your first task is to extract the list of attributes from this data. To do so, you will need to find the keys whose values have numeric types. I did this with a single line that produces an array of attribute names by examining the first element of the array:

let attribs = Object.keys(data[0]).filter(d => typeof data[0][d] === "number");
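To see what the one-liner above produces, you can run it against a few hypothetical rows in the same format before wiring it up to iris or scores. The field names below are placeholders, and the real files may name them differently. The numericAttributes() function is an optional, more defensive variant that the assignment does not require. It keeps a key only if its value is a finite number in every row, rather than trusting the first row alone.

```javascript
// Hypothetical rows in the same format as iris.js: an array of objects.
// The field names are placeholders; check iris.js for the real ones.
const data = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, petalLength: 4.7, petalWidth: 1.4, species: "versicolor" },
  { sepalLength: 6.3, sepalWidth: 3.3, petalLength: 6.0, petalWidth: 2.5, species: "virginica" }
];

// The one-liner: keep the keys whose value in the first row is a number.
// Here attribs becomes ["sepalLength", "sepalWidth", "petalLength", "petalWidth"];
// the string-valued species key is skipped.
let attribs = Object.keys(data[0]).filter(d => typeof data[0][d] === "number");

// Optional, more defensive variant (not required by the assignment):
// keep a key only if its value is a finite number in every row.
function numericAttributes(rows) {
  return Object.keys(rows[0]).filter(key =>
    rows.every(row => Number.isFinite(row[key]))
  );
}
```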

Once you have a list of attributes, your next task is to create a rectangular grid of locations, one for each scatterplot. For each plot, you will create an SVG group (<g> tag) that you position appropriately using SVG transforms. If you’re following along with the Brushable Scatterplot Matrix example, it uses some clever shorthand based on the d3.cross() function to produce an \(n\times n\) grid for \(n\) attributes.
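Here is a sketch of the grid layout. The helpers below build the same list of pairs that d3.cross(attribs, attribs) returns, plus a transform string for each cell. They are plain JavaScript so the arithmetic can be checked without a browser. The names crossPairs() and cellTransform() are hypothetical. Using the first attribute of a pair for the column and the second for the row is one possible convention, not the only one.

```javascript
// Plain-JS equivalent of d3.cross(a, b): every ordered pair [x, y],
// with a as the outer loop, e.g. [["a","a"], ["a","b"], ...].
function crossPairs(a, b) {
  return a.flatMap(x => b.map(y => [x, y]));
}

// Transform for one cell of the n x n grid.
// Convention (an assumption): xAttr picks the column, yAttr picks the row.
function cellTransform([xAttr, yAttr], attribs, size) {
  const col = attribs.indexOf(xAttr);
  const row = attribs.indexOf(yAttr);
  return `translate(${col * size},${row * size})`;
}

const attribs = ["a", "b", "c"];             // hypothetical attribute names
const cells = crossPairs(attribs, attribs);  // 9 pairs for 3 attributes
```

In a02.js, the cells array would be the data for the join of <g> tags, and something like .attr("transform", d => cellTransform(d, attribs, size)) would position each group. You can substitute d3.cross(attribs, attribs) for crossPairs() directly.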

You’ll also want to set up variables for the width and height of your SVG canvas, and then use these, together with the number of attributes, to define how big each scatterplot will be. Since we’ll keep everything square, a single variable, size, will serve for both the width and height of any individual scatterplot.
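The sketch below shows the sizing arithmetic. It assumes a fixed canvas with margins reserved for the axes from Part 3. The width, height, and margin values and the cellSize() helper are hypothetical. The point is only that size shrinks as the number of attributes grows.

```javascript
// Hypothetical canvas dimensions; the margins leave room for the axes added in Part 3.
const width = 800;
const height = 800;
const margin = { top: 20, right: 20, bottom: 40, left: 40 };

// Side length of one square scatterplot when n attributes share the canvas.
function cellSize(n) {
  const innerWidth = width - margin.left - margin.right;
  const innerHeight = height - margin.top - margin.bottom;
  // Use the smaller dimension so the n x n matrix stays square and fits.
  return Math.min(innerWidth, innerHeight) / n;
}

const size = cellSize(4); // e.g. the 4 quantitative Iris attributes
```

With these values, Iris (4 attributes) gets 185-pixel cells and Abalone (8 attributes) gets 92.5-pixel cells.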

If done right, you’ll create a d3 selection of <g> tags with one <g> tag for every pair of data attributes. This is an instance where you do a data join on something derived from the dataset (its attribute pairs) rather than on the data points themselves. To verify that these groups are working, I would append a <text> element to each group and use .text() to label it, so that you’re clear on which attribute pair you’ve assigned to which SVG group. You can also test this by drawing simple shapes in each group (e.g. a rectangle of a different color).

Part 2: Creating the scatterplots

Your next task will be to code a single function, makeScatterplot(), that will be used to create the scatterplot shown in each cell of the matrix. This function should be generic, in that it will produce a scatterplot for any pair of attributes, depending on which column/row in the scatterplot matrix you are working with.

The idea here is that we will call makeScatterplot() once for each group, using the group’s joined data (i.e. its pair of attributes) to dictate which attributes are plotted. This is accomplished with d3’s .each() method, which accepts a function to call once for every element in a selection. Inside that function, the first argument is the element’s bound datum and the keyword this refers to the element itself, so an anonymous function can pass both the group’s selection and its pair of attributes into makeScatterplot(). Use a function expression rather than an arrow function here, since arrow functions do not rebind this.

Specifically, makeScatterplot() should accept a selection (to insert the plot into) as well as a pair of attributes. To create the scatterplot, it requires another d3 data join, this time to create the marks we’ll draw, one per data point. We will create a circle element for each data point. You will then set cx and cy for each circle based on the attribute pair passed into the function. Other visual properties of the circles are for you to decide. You can set the radius as you wish, and you might want to consider setting the fill color based on other properties (for example, the Iris dataset has a species attribute that takes on three values: “setosa”, “versicolor”, and “virginica”).
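For the fill color, one generic approach is to treat string-valued keys as nominal and color by the first one found, falling back to a single color for datasets that have none. This is an assumption, not a requirement. In a02.js you would likely use d3.scaleOrdinal(d3.schemeCategory10) for the mapping. The categoricalColor() helper below is a hypothetical plain-JavaScript stand-in that shows the same idea, and the rows are placeholders.

```javascript
// Hypothetical rows; the real datasets may use different field names.
const data = [
  { petalLength: 1.4, species: "setosa" },
  { petalLength: 4.7, species: "versicolor" },
  { petalLength: 6.0, species: "virginica" },
  { petalLength: 1.3, species: "setosa" }
];

// Nominal attributes: keys whose value in the first row is a string.
const nominal = Object.keys(data[0]).filter(k => typeof data[0][k] === "string");

// Stand-in for an ordinal color scale: each distinct value gets the next
// palette color, in order of first appearance in the rows.
function categoricalColor(rows, key, palette) {
  const values = [...new Set(rows.map(d => d[key]))];
  return d => palette[values.indexOf(d[key]) % palette.length];
}

// The first three colors of d3.schemeCategory10.
const palette = ["#1f77b4", "#ff7f0e", "#2ca02c"];

// Color by the first nominal key, or use one fixed color if there is none.
const fill = nominal.length > 0
  ? categoricalColor(data, nominal[0], palette)
  : () => "steelblue";
```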

For this to work well, there are two concepts you need to generalize. First, for any item d in the dataset, you need to use accessors to access the appropriate attribute. By this I mean that you need to pass sufficient information to makeScatterplot() so that, when you get to this stage in the function:

let circles = selection.selectAll("circle")
  .data(data)
  .enter()
  .append("circle")
  .attr("cx", function(d) { /* fill me in */ } )

you know how to go from d to the appropriate attribute of d for the plot you are drawing.
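One way to package the accessor idea is as small higher-order functions. accessor() turns an attribute name into a function from d to that attribute’s value, and position() composes it with a scale. Both names are hypothetical, and any function from data value to pixels can serve as the scale.

```javascript
// Turn an attribute name into an accessor: a function from a data item
// to that attribute's value.
const accessor = attr => d => d[attr];

// Compose an accessor with a scale (any function from data value to pixels)
// to get a function suitable for the cx or cy attribute.
const position = (scale, attr) => {
  const value = accessor(attr);
  return d => scale(value(d));
};
```

Inside makeScatterplot(selection, xAttr, yAttr), the join above could then finish with .attr("cx", position(xScales[xAttr], xAttr)) and .attr("cy", position(yScales[yAttr], yAttr)). Here xScales and yScales are hypothetical lookup objects that hold one scale per attribute.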

Second, you also need to know how to convert from the appropriate attribute of d to the specific visual space you are assigning to the plot. To do this, you will need to make use of scales. I recommend creating scales for each attribute in the attribs array. You should do this by taking the min/max of the data values for each attribute (d3.extent() is particularly helpful here) and mapping them to the range of visual space associated with the plot itself (luckily, this is fixed as the size of the plot). SVG transforms will save you here: once you’ve transformed the group that the plot is in, you can work in a coordinate space where \((0,0)\) is the top-left corner of the group rather than of the canvas. Because the SVG y axis points down, you will likely want to flip the range of the vertical scales so that larger values appear higher in the plot. This means you can define just two scales (a horizontal and a vertical one) for each attribute, rather than redefining scales in each call to makeScatterplot().
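The sketch below makes the scale arithmetic concrete. It builds one horizontal and one vertical scale per attribute, using plain-JavaScript stand-ins for d3.extent() and d3.scaleLinear(). The names buildScales(), extent(), and linearScale() are hypothetical. In a02.js, the d3 version would be along the lines of d3.scaleLinear().domain(d3.extent(data, d => d[a])).range([0, size]), with .range([size, 0]) for the vertical scale. Unlike d3, this stand-in assumes every value is a number and that each attribute has at least two distinct values.

```javascript
// Stand-in for d3.extent(): [min, max] of value(d) over the rows.
// Assumes every value is a number (d3.extent additionally ignores
// null, undefined, and NaN).
function extent(rows, value) {
  let lo = Infinity;
  let hi = -Infinity;
  for (const d of rows) {
    const v = value(d);
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  return [lo, hi];
}

// Stand-in for d3.scaleLinear().domain([d0, d1]).range([r0, r1]).
// Assumes d0 !== d1.
function linearScale([d0, d1], [r0, r1]) {
  return v => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// One horizontal and one vertical scale per attribute, in cell-local pixels.
// The vertical range is flipped because the SVG y axis points down.
function buildScales(rows, attribs, size) {
  const xScales = {};
  const yScales = {};
  for (const a of attribs) {
    const domain = extent(rows, d => d[a]);
    xScales[a] = linearScale(domain, [0, size]);
    yScales[a] = linearScale(domain, [size, 0]);
  }
  return { xScales, yScales };
}
```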

Part 3: Annotations

Once you have a basic visualization up, you’ll realize the plot is quite difficult to read because it is missing axes for each of the rows/columns. You must correct this by adding d3.axis objects in appropriate places. Luckily, you can rely on the d3 scales you created for positioning elements in makeScatterplot() to do this.

There are many ways to do this, and you may want to experiment to see what looks best and makes the most sense to you. One minimal approach, which I used, was to organize my SVG canvas into three top-level groups. The first group was the SPLOM itself (which contains the matrix of groups for the plots). I then added a group on the left side for the left axis of each row and a group along the bottom for each column’s axis.

Within each of these two new groups, I did a data join on the list of attributes and created groups that I transformed to line up with the rows/columns. These groups were then used to place each of the d3.axis objects, which I created using another round of .each() calls.
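You can sketch the axis layout described above as a list of offsets, with one left-axis group per row and one bottom-axis group per column, each shifted by whole cells. This assumes the left and bottom container groups are already positioned beside the matrix and that rows and columns follow the attribute order. The axisOffsets() helper is hypothetical.

```javascript
// Offsets for the per-row (left) and per-column (bottom) axis groups.
// Assumes the container groups are already placed next to the matrix,
// so each axis group only needs to shift by a whole number of cells.
function axisOffsets(attribs, size) {
  return {
    // Row i shows the vertical scale of attribs[i].
    left: attribs.map((attr, row) => ({ attr, transform: `translate(0,${row * size})` })),
    // Column j shows the horizontal scale of attribs[j].
    bottom: attribs.map((attr, col) => ({ attr, transform: `translate(${col * size},0)` }))
  };
}
```

Each entry can then be joined into its container and drawn with .each(function (d) { d3.select(this).call(d3.axisLeft(yScales[d.attr]).ticks(5)); }). The bottom group works the same way with d3.axisBottom() and the horizontal scales. Here yScales is a hypothetical per-attribute lookup.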

Feel free to experiment with additional marks such as grid lines. Your final visualization must include, somewhere, an indication of which position corresponds to which data value, and which data attribute pair is shown in each cell of the matrix.

Part 4: Brushing and linking

Finally, your code should support a selection mechanism that allows the user to brush in one plot, select a set of data points, and then brush in additional plots to refine this selection.

Specifically, when the user drags the mouse on any scatterplot, a rectangular brush is drawn on that scatterplot, indicating the region of interest. All points whose attributes fall inside the brushed region are considered selected. To link the visualization, selected points should be drawn with a distinct stroke color, while unselected points should be drawn with no stroke. This change in visual encoding should be reflected in all plots. I recommend using a more saturated stroke color to help the selection visually pop out.

Next, if the user brushes on a second scatterplot, the selected points must satisfy the intersection of the selections. This means you should only highlight points that are within both selections. If no brush is active, all points should be drawn in their original style (e.g. if you color by species). More generally, when any number of brushes is active, the selected points should be contained within all of them.

To accomplish this, you will implement one common onBrush() function. When makeScatterplot() is called, it will create a d3.brush object attached to the group and sized appropriately. This brush will be linked to a function updateBrush() that captures and updates the selected region for the brush. I have provided code that configures the brush in makeScatterplot() and sets up the appropriate callbacks for the .on("brush",...) and .on("end",...) events. The callback updateBrush() should store the event.selection object for the brush in a global variable. For a two-dimensional d3.brush, each selection stores the brushed region in pixel coordinates as [[x0, y0], [x1, y1]], or is null once the brush has been cleared.
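Here is a minimal sketch of updateBrush(). It assumes the brush (e.g. d3.brush().extent([[0, 0], [size, size]])) is called on each cell’s <g>, so that its datum is the cell’s attribute pair. It also assumes the active selections are kept in a global Map keyed by that pair. The names brushSelections and cellKey() are hypothetical. In d3 v6 and later, brush listeners receive (event, d), and event.selection becomes null when a brush is cleared, which is when the entry is removed.

```javascript
// Active brush selections, keyed by the cell's attribute pair.
// Each value is a 2D brush selection in cell-local pixels: [[x0, y0], [x1, y1]].
const brushSelections = new Map();

// Hypothetical key format; assumes attribute names contain no "|".
const cellKey = ([xAttr, yAttr]) => `${xAttr}|${yAttr}`;

// Brush callback with the d3 v6+/v7 signature (event, d),
// where d is the datum of the group the brush was called on.
function updateBrush(event, d) {
  if (event.selection) {
    brushSelections.set(cellKey(d), event.selection);
  } else {
    // A null selection means this cell's brush was cleared.
    brushSelections.delete(cellKey(d));
  }
}
```

If the same function handles both the "brush" and "end" events, the Map stays current while the user drags and after a brush is cleared. updateBrush() can then call onBrush() to refresh the highlighting.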

Using the stored selection ranges, onBrush() should create a filter function, isSelected(), that returns true if a data element is contained within the active selection regions of all brushes. This lets you use d3’s selection filters (selection.filter()): onBrush() can select all circles (from all plots) and apply this filter to identify the data elements that are selected. Selected elements should have their borders set to a highlight stroke color, while unselected elements should be reverted to their visual appearance based on species.
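Here is a sketch of the isSelected() predicate. It assumes brush selections are stored in a Map from "xAttr|yAttr" keys to [[x0, y0], [x1, y1]] pixel rectangles, and that per-attribute scales are available as lookup objects. The name makeIsSelected() and these structures are hypothetical. A point counts as selected only if, for every active brush, its scaled position in that brush’s cell lies inside the rectangle. Because every() returns true for an empty list, the caller still has to check whether any brush is active.

```javascript
// Build a predicate that is true when d lies inside every active brush.
// selections: Map from "xAttr|yAttr" to [[x0, y0], [x1, y1]] (cell-local pixels).
// xScales / yScales: objects mapping attribute name -> (value -> pixels).
function makeIsSelected(selections, xScales, yScales) {
  const active = [...selections.entries()].map(([key, sel]) => {
    const [xAttr, yAttr] = key.split("|"); // assumes names contain no "|"
    return { xAttr, yAttr, sel };
  });
  return d => active.every(({ xAttr, yAttr, sel: [[x0, y0], [x1, y1]] }) => {
    const px = xScales[xAttr](d[xAttr]);
    const py = yScales[yAttr](d[yAttr]);
    return x0 <= px && px <= x1 && y0 <= py && py <= y1;
  });
}
```

onBrush() could then build const isSelected = makeIsSelected(brushSelections, xScales, yScales), take circles = d3.selectAll("circle"), and clear every stroke with circles.attr("stroke", null). Then, only if brushSelections.size > 0, it would set the highlight stroke on circles.filter(isSelected).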

To help design the callback function, check out d3’s Handling Events documentation. Essentially, the callback will be a function that accepts two parameters, event and d, which are the d3 event object and the current datum. (Note: older versions of d3 had a slightly different signature, so keep this in mind when you’re looking for examples online. See https://observablehq.com/@d3/d3v6-migration-guide#events for more details. We’re on D3 v7.0 right now, but the major changes to event handling happened in v6.0.)

Written Questions (worth 25/90 points)

Each written question should be answered with a brief paragraph (approximately 100 words or less) and appear below your SPLOM in index.html. I will not read extra material if I deem the answer too long.

  1. Is adjusting the stroke color an effective choice for employing popout to highlight the selected points? If so, explain why and whether there are any requirements for it to work well. If not, explain why not and what you might do instead to improve it.

  2. Explain the differences between data attributes that are nominal, ordinal, and quantitative. For the datasets provided in this assignment, how would you classify each of the attributes?

  3. Was brushing and linking with a rectangular brush a satisfying interaction in your experimentation? Briefly state one strength and one limitation of it.

Submission

You should use git to submit all source code files. The expectation is that your code will be graded by cloning your repo and then executing it in a modern browser (Chrome, Firefox, etc.).

Please provide a README.md file that provides a text description of how to run your program and any parameters that you used. Also document any idiosyncrasies, behaviors, or bugs of note that you want us to be aware of.

To summarize, my expectation is that your repo will contain:

  1. A README.md file
  2. An index.html file
  3. An a02.js file
  4. All other JavaScript files necessary to run the code (including iris.js, scores.js, and d3.js, plus any others you require)
  5. Any .css files containing style information

Grading

Deductions

  • Bugs or syntax errors: up to -10 for each bug the grader must fix, at the grader's discretion


Point Breakdown of Features

  • External documentation (README.md) following the template provided in the base repository: 5
  • Consistent modular coding style, indentation, etc.: 5
  • Header documentation and internal documentation (block comments for functions and inline descriptive comments), wherever applicable, for all files: 5
  • Expected output / behavior based on the assignment specification (subtotal 50), including:
      • Correctly implementing the SPLOM grid layout: 10
      • Correct implementation of makeScatterplot(): 20
      • Properly annotating axes and labels: 10
      • Implementing brushing and linking interaction: 10
  • Written Questions (subtotal 25):
      • Written Question 1: 5
      • Written Question 2: 10
      • Written Question 3: 10

Total: 90


Cumulative Relationship to Final Grade

Worth 9% of your final grade

Extra Credit

Implementing features above and beyond the specification may result in extra credit; please document these in your README.md. Notably, you may want to consider additional visual elements to improve the utility of your SPLOM, as well as additional interactions, particularly to support the visualization of the two additional datasets included with the assignment repository.