- Part 1: Data access and layout
- Part 2: Creating the scatterplots
- Part 3: Annotations
- Part 4: Brushing and linking
- Written Questions (worth 15/80 points)
- Extra Credit
In this assignment, we’ll be working on building interactive visualizations for multidimensional datasets. Our main tool for this will be to create scatterplot matrices (also sometimes abbreviated as “SPLOMs”).
Your main focus will be on generalizing the ideas of a scatterplot from the previous assignment, so that we can encapsulate it for repeated usage. In addition, you will get a first taste of linked visualizations that enable interaction through brushing.
The code that you write should be generic and make no assumptions about the dataset other than the format it is provided in. Thus, in addition to the Calvin College 2004 seniors, I’ve also provided a second dataset, the well known Iris Dataset for you to experiment with. Your visualization should support visualizing both with minimal changes.
This assignment is designed to get you comfortable with more sophisticated plotting techniques as well as introduce you to brushing and linking in d3. Specific objectives include:
- Practicing the design of multiple views visualizations
- Defining generic implementations in d3 that can be used repeatedly for each view through
- Learning how to define interactions in d3, particularly through implementing click events for selection and d3.brush objects for brushing and linking.
- Experimenting with d3 brush callbacks
- Practicing the use of d3 filters for capturing selected data objects
In this assignment, you will create a scatterplot matrix viewer that supports brushing in each plot to select and highlight elements. Many of the ideas of this assignment are modeled off of this d3 example: Brushable Scatterplot Matrix. You may also find Becker and Cleveland’s 1987 paper Brushing Scatterplots to be of interest.
While I recommend reviewing this example and its source code, your implementation will be decidedly different even if it takes into account a few of the tricks used in this example. In particular, that example uses a different data format, and it only supports one brush. Your visualization should work for any dataset in the format I provide, and it will support a brush in each plot. Combined, your brush interaction should produce a selected set that is the intersection of all brushes.
All code for your implementation should be in a file a02.js, which is included at the end of a file index.html. Like in your last assignment, index.html should (minimally) include d3.v5.js and the datasets in the <head> tag, and it should define a single <div> for which you will create a d3 selection.
As in A01, you will also insert the answers to the three written questions in index.html.
Part 1: Data access and layout
I recommend you develop your SPLOM in phases. Phase 1 will use d3 selections to build a collection of groups associated with the grid of plots for the matrix.
The Iris data has been provided in the file iris.js that defines the variable iris, and the Calvin College data has been provided in the file scores.js, which defines its own variables. In your implementation, to be generic, I would create a new variable in a02.js and set it equal to whichever dataset you are experimenting with.
Your first task is to extract the list of attributes from this data. To do so, you will need to access the defined keys that have numeric types. I did this using a single line that produced an array of attribute names by examining the first data element in the array.
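One way such a one-liner might look is sketched below. The sample rows and their field names are hypothetical stand-ins for the real dataset, and this is not necessarily the line used in the reference solution.

```javascript
// Hypothetical sample rows standing in for the real dataset.
const data = [
  { sepalLength: 5.1, sepalWidth: 3.5, species: "setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, species: "versicolor" },
];

// Keep only the keys of the first row whose values are numbers.
const attribs = Object.keys(data[0]).filter((key) => typeof data[0][key] === "number");

console.log(attribs); // ["sepalLength", "sepalWidth"]
```

Note that this only inspects the first element, so it assumes every row has the same keys and value types.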
Once you have a list of attributes, your next task is to create a rectangular grid of locations for each of the scatterplots. For each plot, you will create an SVG group <g> tag that you have appropriately positioned using SVG transforms. If you’re following along in the Brushable Scatterplot Matrix example, there is some clever shorthand for this that you can use based on the d3.cross() function to produce an \(n\times n\) grid for \(n\) attributes.
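To make the idea of the \(n\times n\) grid concrete, here is a small sketch in plain JavaScript that produces the same ordered pairs that d3.cross(attribs, attribs) would return. The attribute names are hypothetical.

```javascript
// Hypothetical attribute list; in the assignment this comes from the data.
const attribs = ["sepalLength", "sepalWidth", "petalLength"];

// Plain-JavaScript equivalent of d3.cross(attribs, attribs):
// every ordered pair [x, y], n * n pairs in total.
const pairs = attribs.flatMap((x) => attribs.map((y) => [x, y]));

console.log(pairs.length); // 9
console.log(pairs[1]);     // ["sepalLength", "sepalWidth"]
```

Each pair can then serve as the datum joined to one <g> element of the matrix.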
Notably, you’ll also want to set up some variables for the width and height of your svg canvas, and then use these to define how big your scatterplots will be based on how many attributes you are working with. These can be used to define the size of each scatterplot. Since we’ll keep everything square, size will serve for both the width and height of any individual scatterplot.
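As a rough sketch of this sizing step, the snippet below derives size from the canvas dimensions and turns a column/row index into an SVG translate string. The canvas dimensions, the padding value, and the helper cellTransform() are all assumptions made for illustration.

```javascript
const width = 600;   // hypothetical canvas width in pixels
const height = 600;  // hypothetical canvas height in pixels
const padding = 30;  // hypothetical margin reserved on the left for axes
const attribs = ["a", "b", "c"]; // hypothetical attribute list

// Square cells: one size for both width and height of each scatterplot.
const size = (Math.min(width, height) - padding) / attribs.length;

// Column i and row j of the matrix map to a translate for that cell's <g>.
function cellTransform(i, j) {
  return `translate(${padding + i * size},${j * size})`;
}

console.log(size);                // 190
console.log(cellTransform(1, 2)); // "translate(220,380)"
```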
If done right, you’ll create a d3 selection of <g> tags, where there is one <g> tag for every pair of possible data attributes. This is an instance where you will do a data join on something that is indirectly the “dataset”. To verify that these groups are working, I would use .text() to make sure that you’re clear on which attribute pair you’ve assigned to which SVG group, and you can test this by drawing simple shapes in each group (e.g. a rectangle of a different color).
Part 2: Creating the scatterplots
Your next task will be to code a single function, makeScatterplot(), that will be used to create the scatterplot shown in each cell of the matrix. This function should be generic, in that it will produce a scatterplot for any pair of attributes, depending on which column/row in the scatterplot matrix you are working with.
The idea here is that we will call makeScatterplot() once for each group, using the joined data of the group to dictate which pair of attributes is plotted. This will be accomplished using d3’s .each() method, which accepts a function to call once for every element in a selection. Moreover, it can accept anonymous functions so that one can pass the associated pair of attributes into makeScatterplot().
makeScatterplot() should accept a selection (to insert the plot into) as well as a pair of attributes. Next, to create the scatterplot, it requires another d3 data join, this time to create the objects we’ll draw once per data point. We will create circle elements for each data point. You will then set the cx and cy for each circle based on the associated attribute pair passed into the function. Other visual properties of the circles are for you to decide. You can set the radius as you wish, and you might want to consider setting the fill color to be based on other properties (for example, the Iris dataset has a species attribute that takes on three values: “setosa”, “versicolor”, and “virginica”).
For this to work well, there are two concepts you need to generalize: First, for any item d in the dataset, you need to use accessors to access the appropriate attribute. By this I mean, you need to be able to pass in sufficient information to makeScatterplot() so that when you get to the stage in the function where each circle is positioned, you know how to go from d to the appropriate attribute of d for the given plot you are drawing.
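A minimal sketch of such accessors follows. The helper makeAccessors() and the sample row are hypothetical; the point is only that the attribute names passed in are enough to go from d to the two values being plotted.

```javascript
// Build accessor functions for one attribute pair (hypothetical helper).
function makeAccessors(attrX, attrY) {
  return {
    x: (d) => d[attrX], // value plotted horizontally
    y: (d) => d[attrY], // value plotted vertically
  };
}

// Hypothetical data row.
const d = { sepalLength: 5.1, sepalWidth: 3.5, species: "setosa" };
const acc = makeAccessors("sepalLength", "sepalWidth");

console.log(acc.x(d), acc.y(d)); // 5.1 3.5
```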
Second, you also need to know how to convert from the appropriate attribute of d to the specific visual space you are assigning to the plot. To do this, you will need to make use of scales. I recommend creating one scale for each attribute defined in the attribs array. You should do this by taking the min/max of data values for each attribute (d3.extent() is particularly helpful here) and then mapping them to the range of visual space associated with the plot itself (luckily, this is fixed as the size of the plot). SVG transforms will save you here, as once you’ve transformed the group that contains the plot, you can work in the coordinate space where \((0,0)\) is the top left corner of the group rather than the canvas. This means you can define just two scales (a horizontal and vertical one) for each attribute, rather than redefining scales in each call to makeScatterplot().
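The per-attribute scales can be sketched as follows. To keep the snippet runnable without d3, extent() and linearScale() are small plain-JavaScript stand-ins for d3.extent() and d3.scaleLinear(); the data, the attribute names, and the value of size are assumptions.

```javascript
// Stand-in for d3.extent(): [min, max] of an accessor over the data.
function extent(data, accessor) {
  const values = data.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

// Stand-in for d3.scaleLinear().domain(domain).range(range).
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

const size = 200; // hypothetical side length of one plot
const data = [{ a: 2, b: 10 }, { a: 4, b: 30 }, { a: 6, b: 20 }];
const attribs = ["a", "b"];

// One horizontal and one vertical scale per attribute, built once up front.
const xScales = {};
const yScales = {};
for (const attr of attribs) {
  const domain = extent(data, (d) => d[attr]);
  xScales[attr] = linearScale(domain, [0, size]);
  yScales[attr] = linearScale(domain, [size, 0]); // SVG y grows downward
}

console.log(xScales.a(4));  // 100
console.log(yScales.b(10)); // 200
```

The vertical range is reversed so that larger data values appear higher in the plot.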
Part 3: Annotations
Once you have a basic visualization up, you’ll realize the plot is quite difficult to read because you will be missing axes for each of the rows/columns. You must correct for this by adding d3.axis objects in appropriate places. Luckily, you can rely on the d3 scales you created for positioning elements in makeScatterplot() to do this.
There are many ways to do this, and you may want to experiment to see what looks best and makes the most sense to you. Minimally, the way I did this was by organizing my svg canvas into three top-level groups. The first group was the SPLOM itself (which then contains the matrix of groups for the plots). I then added a group on the left side for the left axes of each row and a group at the bottom for each column axis.
Within each of these two new groups, I did a data join on the list of columns, and created groups that I transformed to line up with the rows/columns. These groups were then used to place each of the d3.axis objects, which I created in a further pass over these groups.
Feel free to experiment with additional marks such as grid lines. Your final visualization must include, somewhere, an indication of what position is associated with what data value and what data attribute pair is being shown in each matrix.
Part 4: Brushing and linking
Finally, your code should support a selection mechanism that allows the user to brush in one plot, select a set of data points, and then brush in additional plots to refine this selection.
Specifically, when the user drags the mouse on any scatterplot, a rectangular brush is drawn on that scatterplot, indicating the region of interest. All the points with attributes inside the brushed region are considered selected. To link the visualization, selected points should be drawn in a unique stroke color while unselected points should be drawn with no stroke. This change in visual encoding should be reflected in all plots. I recommend using a stroke color that is more saturated to help the selection visually pop out.
Next, if the user brushes on a second scatterplot, the selected points must satisfy the intersection of the selections. This means you should only highlight points that are within both selections. If no brush is active, all points should be drawn in their original style (e.g. if you color by species). When more than two brushes are active, the selected points should be contained within all brushes.
To accomplish this, you will implement one common onBrush() function. When makeScatterplot() is called, it will create a d3.brush object associated with the group and sized appropriately. This brush will be linked to a function updateBrush() that captures and updates the selected region for the brush. I have provided code that configures the brush in makeScatterplot() and sets up the appropriate callbacks for the .on("end",...) events. The callback updateBrush() should store the d3.event.selection object for the brush in a global variable. These objects store the range for each brush in pixel coordinates.
Using the stored selection ranges, onBrush() should create a filter function, isSelected(), that will return true if a data element is contained within the active selection regions of all brushes. This will allow you to use d3’s selection filters. With it, onBrush() can be implemented so that it selects all circles (from all plots) and applies this filter to identify the data elements that are selected. Selected elements should have their borders set to a highlight stroke color, while unselected elements should be reverted to the visual appearance based on their species.
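The intersection logic can be sketched in plain JavaScript as below. This is only one possible arrangement, not the provided template code: the brushes store, its "xAttr,yAttr" keys, the storeBrush() helper, and the linearScale() stand-in for the d3 scales are all assumptions. The selections use the [[x0, y0], [x1, y1]] pixel format that a two-dimensional d3 brush reports.

```javascript
// Stand-in for a d3 linear scale (real code would reuse the d3 scales).
function linearScale([d0, d1], [r0, r1]) {
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

const size = 100; // hypothetical plot size
const xScales = { a: linearScale([0, 10], [0, size]), b: linearScale([0, 10], [0, size]) };
const yScales = { a: linearScale([0, 10], [size, 0]), b: linearScale([0, 10], [size, 0]) };

// Global store of active brushes, keyed by "xAttr,yAttr" (hypothetical layout).
const brushes = {};

// What updateBrush() might do with a selection [[x0, y0], [x1, y1]] or null.
function storeBrush(attrX, attrY, selection) {
  const key = `${attrX},${attrY}`;
  if (selection) brushes[key] = { attrX, attrY, selection };
  else delete brushes[key]; // a cleared brush no longer constrains the result
}

// True when d lies inside every active brush (the intersection).
function isSelected(d) {
  return Object.values(brushes).every(({ attrX, attrY, selection }) => {
    const [[x0, y0], [x1, y1]] = selection;
    const px = xScales[attrX](d[attrX]);
    const py = yScales[attrY](d[attrY]);
    return x0 <= px && px <= x1 && y0 <= py && py <= y1;
  });
}

const data = [{ a: 2, b: 2 }, { a: 5, b: 5 }, { a: 8, b: 8 }];
storeBrush("a", "b", [[40, 40], [100, 100]]); // a in [4, 10], b in [0, 6]
storeBrush("b", "a", [[0, 0], [60, 100]]);    // b in [0, 6], a in [0, 10]

console.log(data.filter(isSelected)); // [{ a: 5, b: 5 }]
```

With no active brushes, this isSelected() returns true for every element, so onBrush() would need to treat that case separately in order to restore the original styling.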
You may want to skim the documentation for d3’s brushes before starting this assignment. Note that although d3’s brushes are available as a separate library, you do not need to include d3-brush explicitly yourself: the brush library is packaged with the version of d3 included in the template repository.
Written Questions (worth 15/80 points)
Each written question should be answered with a brief paragraph (approximately 100 words or less) and appear below your SPLOM in index.html. I will not read extra material if I deem the answer too long.
Scatterplot matrices work well because they follow many of the design principles discussed in class. Are they an instance of small multiples? Explain why or why not.
Is adjusting the stroke color an effective choice for employing popout to highlight the selected points? If so, explain why and if there are any requirements for it to work well. If not, explain why not and what you might choose to do instead to improve it.
Explain the differences between data attributes that are nominal, ordinal, and quantitative. For the datasets provided in this assignment, how would you classify each of the attributes?
You should use git to submit all source code files. The expectation is that your code will be graded by cloning your repo and then executing it within a modern browser (Chrome, Firefox, etc.)
Please provide a README.md file that provides a text description of how to run your program and any parameters that you used. Also document any idiosyncrasies, behaviors, or bugs of note that you want us to be aware of.
To summarize, my expectation is that your repo will contain:
- d3.v5.js (plus any others you require)
- .css files containing style information
|Bugs or syntax errors|Up to -10 per bug, at the grader's discretion to fix|
Point Breakdown of Features
|External documentation (README.md) following the template provided in the base repository|5|
|Consistent modular coding style, indentation, etc.|5|
|Header documentation and internal documentation (block comments for functions and inline descriptive comments), wherever applicable and for all files|5|
|Expected output / behavior based on the assignment specification, including|
Cumulative Relationship to Final Grade
Worth 8% of your final grade
Implementing features above and beyond the specification may result in extra credit; please document these in your README.md. Notably, you may want to consider additional visual elements to improve the utility of your SPLOM as well as additional interactions.