EXPLORN:
DESIGN CONSIDERATIONS PAST AND PRESENT
BY:
DANIEL B. CARR, EDWARD WEGMAN AND QIANG LUO.
Send your comments to
Moustafa

Here is the online documentation for ExplorN.
ExplorN is a software package designed to provide geometrically-based
visualization of multivariate data. The software incorporates concepts
and methods from statistical graphics and computer science.
The design follows guidance from human perception and human-computer
interface communities. The integration and extension of methods in
ExplorN(TM) makes it a unique program for the visual analysis of
multivariate data.
This paper provides design considerations and history behind the
development of ExplorN. It then covers current user options and
indicates future development areas. The authors anticipate that many
future papers will emphasize specific features of ExplorN and address
applications.
For example, a paper in progress (Somogyi et al 1996) investigates
nine variables associated with the timing and expression of genes controlling
the development of the spinal cord in the rat.
Motivated by the visual analysis of proprietary banking data,
Wegman and Luo (1996) have already provided a paper that emphasizes
alpha blending in a parallel coordinates view of multivariate data.
ExplorN is a product of the Center for Computational Statistics.
The development versions run in the Center's virtual reality laboratory,
the Holodeck.
This laboratory provides a modern environment for the further
development of ExplorN as it evolves to provide additional direct
manipulation tools and to add techniques for the visual analysis of
massive data sets.
This paper is part of efforts to provide access to ExplorN.
ExplorN utilizes modern technology. Presentations at conventions are limited to
static views and the impoverished quality of NTSC videos.
To provide hands on experience, Dr. Carr has moved and demonstrated
selected versions of ExplorN at various sites around the world.
A distribution version is available for research purposes, and collaboration
is underway with staff at a few sites such as Oregon State University
and Iowa State University.
A major distribution restriction is that ExplorN only runs on Silicon
Graphics workstations. The useful distribution
of ExplorN is limited because ExplorN depends both on GL,
a proprietary Silicon Graphics language, and on specialized Silicon Graphics
hardware. To provide wider access in the near term, Dr. Wegman developed an
NFS infrastructure to support workshops that will provide hands-on experience
and exchange among researchers. Visitors to the Center for Computational Statistics
are always welcome.
In the longer term researchers can expect the appearance of OpenGL versions of
ExplorN. Since Windows NT and other operating systems support OpenGL, this will
increase distribution. A few hardware features may not port easily and may have
to be emulated in software or omitted in early distribution versions.
Access to ExplorN capabilities will increase over time.
The organization of this paper is as follows. Section 2 provides a discussion
of multivariate graphics.
The multivariate graphics considerations motivated the development of
Explor4 (Carr and Nicholson 1987, 1988) and its sequel, ExplorN.
Section 3 briefly describes features of Explor4.
Section 4 provides an introduction to ExplorN capabilities.
The section serves as a brief user manual.
Section 5 indicates directions for further development.
An appendix credits those who contributed to ExplorN and provides some of
the implementation history that is not mentioned in the above sections.

2. Multivariate Design Considerations.
The world of multivariate graphical representation expands as researchers
attempt to develop better methods.
A few guidelines help characterize the graphics that we find to be preferable.

2.1 Univariate Guidelines.
Cleveland and McGill (1984) discuss perceptual accuracy of extraction and
indicate preferred methods for univariate comparisons.
Their research had subjects judge relative magnitudes of graphically encoded
continuous univariate variables.
Their results ranked the graphical encoding methods into three classes
described here as best, good, and poor.
Encoding that provided comparisons along a common scale and comparisons along
identical non-aligned scales ranked best.
Encoding that used length, angle, and orientation ranked good.
The encodings that used area, volume, point density, and saturation
ranked poor.

2.2 Bivariate Guidelines.
The scatterplot is the standard for representing continuous bivariate data.
The orthogonal axes allow the separate coordinates to be represented (encoded)
as a position along a common scale. There are many enhancement methods for
scatterplots that bring out either functional relationships or densities.
When the y coordinate is considered a function of x, common practice is
to fit a smooth to the data. In the density context, Carr (1986, 1987) and
Scott (1986, 1990) provide a variety of representations.
Parallel coordinate plots (Wegman,) provide an alternative that also uses
position along a common scale for both axes. He notes that one can readily
assess the correlation for adjacent variables and describes the foundations
for interpretation of other features. For two variables Carr and Olsen (1995)
found parallel coordinates particularly useful in map legend design when space
was at a premium. They noted that the parallel coordinate representation
had particular merit in terms of reading paired values.
The scatterplot remains the preferred way of providing the gestalt of
a functional relationship.
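
As a minimal sketch of the parallel coordinate encoding discussed above, the following Python fragment maps each observation to the vertices of a polyline, one vertex per axis, after rescaling every variable to a common 0 to 1 scale. The function name `parallel_coordinate_vertices` and the min-max rescaling are assumptions made for illustration; they are not taken from ExplorN.

```python
def parallel_coordinate_vertices(rows):
    """Return, for each observation, a list of (axis_index, scaled_value) vertices.

    rows: list of equal-length sequences of numbers, one per observation.
    Each variable is rescaled to [0, 1] so all axes share a common scale
    (min-max scaling is an assumption made for this sketch).
    """
    n_vars = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_vars)]
    highs = [max(r[j] for r in rows) for j in range(n_vars)]
    polylines = []
    for r in rows:
        verts = []
        for j in range(n_vars):
            span = highs[j] - lows[j]
            # A constant variable has zero span; place it mid-axis.
            scaled = 0.5 if span == 0 else (r[j] - lows[j]) / span
            verts.append((j, scaled))
        polylines.append(verts)
    return polylines


# Two perfectly correlated variables: every segment between the axes is level.
data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
lines = parallel_coordinate_vertices(data)
```

With strongly positively correlated adjacent variables the segments between the two axes tend not to cross, while strong negative correlation makes them cross near a common point, which is the visual cue for assessing correlation between adjacent axes.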

2.3 Multivariate Guidelines and Distance Judgements.
In the multivariate context it seems obvious that accurate interpoint distance
judgements are crucial to geometrically-based visual interpretation.
In dimensions above one, interpoint distance judgments are no longer
equivalent to judging values along a single identical scale.
The literature in graphical perception accepts the familiar Cartesian
coordinate scatterplot encoding scheme as the preferred method for
representing 2-D data.
Judging the distance between points amounts to judging the length of
line segments.
Given the minor variations in human vertical and horizontal perception
this is a simple task.
Comparing two interpoint distances is closely related to
judging values along identical nonaligned scales.
(This assumes the axes are suitably scaled.)
Assessing the distance between points is more complicated with the parallel
coordinates encoding.
At first thought the 3-D Cartesian coordinate plot seems ideal for representing
3-D data.
Judging the distance between points again amounts to judging the length of
a line segment.
However, 3-D perception is different from 2-D perception.
There is a difference between human depth perception and human judgements
of vertical and horizontal position.
While depth perception relates to horizontal visual acuity (Valysz),
depth perception is largely driven by binocular discrepancies (parallax) and
eye convergence.
In the description of Friedhoff and Benzon ( ) humans have three different
visual processing channels.
The binocular channel involves parallax and eye convergence. Cues from the
high resolution monocular shape channel also provide important depth information.
These cues include interposition or occlusion, shadow, and detail perspective.
The third channel, devoted to color, also provides a little information,
apparently in terms of aerial perspective (blue shift).
Simply put, 3-D distance judgements are not as accurate as 2-D distance judgements.
Despite this fact, we live in a 3-D world and have strong intuitions about
relationships in a 3-D environment. Our depth judgement is calibrated by
much experience.
For this reason we believe 3-D Cartesian coordinate plots to be the best
choice for three continuous coordinates.
There is no natural generalization to 4-D encodings.
Common alternatives use glyphs to encode coordinates, links to join lower
dimensional views of coordinates, or conditioning to show lower-dimensional
relationships for restricted intervals or factor levels of other coordinates.

2.4 Glyphs.
The glyph encodings fall into two classes, those that represent coordinates
using glyph position in the plot and those that don't.
The 3-D stereo plot falls in the first class since the x and y coordinates
determine the glyph position and the glyph consists of two points with a given
parallax.
The two points are routed so that each eye sees one point.
Glyph encodings that do not encode coordinates using position include
Chernoff faces, profile plots, trees and castles, star plots and cone plots.
The opinion of Carr et al (1986) was that glyph encodings that represent
coordinates by their spatial position allow better judgements of
interpoint distances than other glyph encodings.
To represent a 4th coordinate they selected from the Cleveland and
McGill class of good encodings. They chose ray angle over line
length for encoding a 4th coordinate because this creates fewer
ambiguities in overplotting situations.
The ray glyph has many uses. Ray angle may be perceived more accurately
than stereo depth and certainly avoids the complications associated
with stereo production and perception.
For maps Carr (1991) has used a bivariate ray glyph to show two
non-spatial coordinates along with two spatial coordinates.
A ray pointing to the right encodes one variable (for small values the
ray points down) and a ray pointing to the left encodes the other.
Carr, Olsen and White (1992) provide confidence regions for rays using arcs.
In a low overplotting context,
using small reference wheels at the base of the ray provides unobtrusive
identical nonaligned scales for assessing ray angle.
Thus the simple ray plot can show more information more accurately than some
may have realized.
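
The ray glyph encoding can be sketched in a few lines of Python. The fragment below is only an illustration: the function name `ray_endpoint`, the linear mapping from value to angle, and the choice of straight up for the largest value are assumptions. The text above fixes only that small values point down and that the second variable uses a ray drawn to the left.

```python
import math


def ray_endpoint(x, y, value, lo, hi, length=1.0, side="right"):
    """Return the tip of a ray glyph anchored at the plotted point (x, y).

    value is encoded as ray angle; lo and hi are the ends of the data scale.
    """
    # Fraction of the way from lo to hi, clamped to [0, 1].
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    # Assumed convention: -90 degrees (down) for lo, +90 degrees (up) for hi.
    theta = math.radians(-90.0 + 180.0 * t)
    dx = length * math.cos(theta)
    dy = length * math.sin(theta)
    if side == "left":
        dx = -dx  # mirror the ray to encode the second variable
    return (x + dx, y + dy)


tip_small = ray_endpoint(0.0, 0.0, 0.0, 0.0, 10.0)              # points down
tip_mid = ray_endpoint(0.0, 0.0, 5.0, 0.0, 10.0)                # points right
tip_left = ray_endpoint(0.0, 0.0, 5.0, 0.0, 10.0, side="left")  # points left
```

Because the glyph is anchored at the point's spatial position, the two spatial coordinates keep their position-along-a-common-scale encoding and only the extra variables rely on angle.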
For a 5-D symbol Carr et al (1986) use ray angle and ray length.
For a 6-D symbol for continuous data they dip into the class of
poor encodings and suggest using a carefully selected color scale.
They are openly discouraging about 6-D representation and prefer to use
color (different hues) to encode six or fewer levels of a factor.
This adds a dimension but it is not the same as representing a continuous variable.
People do not decode glyph coordinates with uniform accuracy.
Clearly humans interpret angles and lengths differently than spatial position.
This complicates the assessment of interpoint distance.
While training can help people to improve their judgements about interpoint
distances between stereo ray glyphs, both ease and accuracy are lost in the
step from 3-D to higher dimensions.
Further, the information between length and angle does not combine as
readily as the information does from two spatial coordinates.
Carr et al (1986) emphasized the stereo ray glyph as a frontier
where significant progress can be made in assessing geometric relationships
and implicitly suggested that distance judgements rapidly degrade in moving
to 5-D and higher plots.
For 4-D data one of the variables is sometimes a dependent variable.
A natural approach is to use spatial position for the independent
variables and the other glyph encodings for the dependent variable.
Researchers have observed that one could use ray angle to encode two variables.
That is, two variables could equate to two angles in spherical coordinates.
However, human decoding of this representation is problematic.
Rays are typically small, and the ray depth information is totally encoded in
the parallax difference between the two ends of the ray.
There is in fact little resolution available for the second angle.
It is easier to work with rays of different lengths. Carr et al (1986)
use only one angle and adopt the convention that ray angles are in the
plane of the display.
The ability to compare and integrate non-spatial glyph coordinates
across the plot is an important consideration.
Ray angles work well in this regard. Glyphs that do not use positional
encoding have more elements to consider and their binding together
in a single symbol complicates comparisons.
For example it can be difficult to focus attention on the second axis
of all points in a profile plot.
Further the closeness of juxtaposition also makes a difference.
After sampling 800 points from a hyperplane one can immediately see the
existence of the linear constraint using stereo ray glyph plots (see Carr
and Nicholson 1988).
Conventional software using Chernoff faces uses several pages to show
the 800 faces.
Analysts would find it difficult to discover that there is a local
constraint and that the data is not fully four dimensional.
Current versions of Chernoff faces are a failure for representing
information in four dimensions when compared to a stereo ray glyph.
However, humans have special face recognition "hardware"
so there is yet hope for face representations.
Our recognition of faces is better if people are smiling (get the people
in the lineup to smile),
if they have the same ethnic origin as ourselves and if the faces are
turned 15 degrees.
In regard to viewing orientation, studies have shown that the nose is
a feature of low importance in a face on view. Clearly the nose becomes
an important feature in a profile view.
Whether or not optimally encoded faces can compete in low dimensions remains
to be seen.
The answer may well be yes for some tasks such as finding outliers and no
for others such as assessing local dimensionality.

2.5 Linked Plots.
Linking points between plots can occur in several ways.
The ways include linking by lines, colors, names, pointers, and partial
linking by juxtaposition.
The following discussion emphasizes line linking and color linking.
Diaconis and Friedman () proposed linking plots of multivariate coordinates
with lines.
For example they might represent 4-D data using two 2-D scatterplots.
The first plot represented the first two coordinates and the second plot
represented the remaining two coordinates. Lines between the corresponding
points in the two plots indicated the binding of two vectors of
length two into one vector of length four.
Alternatively one could handle 4 coordinates by linking a 1-D plot
to a 2-D plot to a 1-D plot and so on.
The M and N plot paradigm was quite general.
The linked plot paradigm has several weaknesses. First, coordinates are not
treated symmetrically.
That is, relationships within plots are easier to assess
than relationships between plots.
To provide symmetry requires a sequence of plots that covers all
non-equivalent ways of assigning variables to encodings.
Asymmetry can be an advantageous feature when communicating a particular
result, but is often undesirable in exploratory stages of analysis.
The second undesirable feature is the overplotting of linking lines.
This could hide lines and make the plots look complicated or messy.
One option was to thin the line density by connecting coordinates of only
one point of each small cluster of points in the p-D space.
Wegman and Miller () took a different approach by calculating and
representing line density.
The line density plots convey much more information about clusters than
overplotted lines.
The methods extend to large data sets and are used in ExplorN as described
by Wegman and Luo.
Extensions of M and N plots are pretty obvious.
In view of Dr. Wegman's work in parallel coordinate
1-D plots, Dr. Carr developed a stereo parallel planes plot for his
interview at George Mason University.
The connecting lines appeared like pick-up sticks in space and could be
explored in more detail by sectioning at a sliding interval between planes.
So far the only variation on M and N plots that has caught on at all
is the parallel coordinates plot.
Even parallel coordinates plots have been reviewed negatively by some
(Splus manual).
This likely reflects a lack of experience in a dynamic graphics setting.
Carr and Nicholson (1987) found unlinked parallel coordinates plots useful
as a coordinate input device and useful for showing marginal univariate
densities. The line density views of
Wegman and Luo (1996) not only show correlation patterns but also show clusters.
Brushing to pick out clusters is particularly easy in parallel coordinate views.
When applied to categorical data, the density of connecting lines between
factor levels represented on adjacent axes indicates the counts in a very
compact, graphically accessible fashion.
With direct manipulation enabling the reordering of axes, one can gain
insights into parallel coordinate encoded tables.
Views of data take on new meaning in a direct manipulation context.
There is a reason that point-linked plots have not caught on for understanding
higher dimensional geometric structure.
Humans are exceedingly poor at tomographic reconstruction.
Clear structure in 2-D plots can be hard to fathom by looking at
1-D margin plots.
Having many 1-D margin plots produced by projecting from different
viewing angles does not usually help except for some special degenerate
situations.
Only with great effort can humans begin to understand nontrivial
3-D geometric structure by looking at scatterplot matrices of
pairwise coordinates.
The perception of the structure can be trivial in a stereo view.
The many views provided by grand tour sequences tend to overwhelm
the mind rather than assist human attempts at tomographic reconstruction.
Projection methods provide insights when the structure appears in lower
dimensions than the display.
The perception of a 1-D curve can be obvious in a 2-D scatterplot
while often impossible in a 1-D margin plot.
People can readily observe the edges of 2-D bounded objects in a 2-D display.
This suggests using plots of the highest dimension possible to capture
geometric structure that resides in higher dimensions.
Stereo 3-D plots provide the natural environment for viewing surfaces.
4-D plots provide the logical environment for viewing solids, and so on.
? and Buju indicated that degenerate views found by projection pursuit and
exploration methods may be pathological. Local rocking sheds insight.
Color linking via brushing in a scatterplot matrix is one of the most
popular linking techniques. When the user brushes a group of points,
the linking really applies to the subset.
It may be very difficult to identify all the coordinates of one
observation.
Further, responding to more than six groups represented in different
colors gets difficult.
Thus brushing or color linking has limitations.
However, brushing also serves as a conditioning technique that often
lowers dimensionality.
Brushing can serve as a direct manipulation slicing tool.
Thus brushing has more going for it than line linking and should
be considered in a broader context.

2.6 Nested Views.
The classic example of nesting is the casement display (Tukey and Tukey).
As in the M and N plots, early examples focused attention on 4-D plots.
The basic casement display is a matrix of scatter plots each with the
same scale.
The procedure partitions the data into a crossed two-way layout
using two coordinates.
This determines the matrix of cells for plotting.
The remaining two coordinates become the points in the scatterplot
for the given cell.
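
The partitioning step of the basic casement display can be sketched as follows. This is a minimal illustration, not the Tukey and Tukey procedure itself: the function name `casement_cells`, the equal-width intervals, and the shared range `[lo, hi]` for both layout coordinates are assumptions.

```python
from collections import defaultdict


def casement_cells(points, n_levels, lo, hi):
    """Group 4-D points (a, b, c, d) into a casement layout.

    a and b are cut into n_levels equal-width intervals on [lo, hi] and
    select the (row, column) cell; the pair (c, d) is what gets plotted
    inside that cell's scatterplot.
    """
    width = (hi - lo) / n_levels

    def level(v):
        # Clamp so that v == hi falls in the last interval.
        return min(n_levels - 1, max(0, int((v - lo) / width)))

    cells = defaultdict(list)
    for a, b, c, d in points:
        cells[(level(a), level(b))].append((c, d))
    return dict(cells)


pts = [(0.1, 0.9, 1.0, 2.0), (0.15, 0.95, 3.0, 4.0), (0.8, 0.2, 5.0, 6.0)]
layout = casement_cells(pts, n_levels=2, lo=0.0, hi=1.0)
```

The two layout coordinates survive only as interval indices, which is the loss of resolution that makes the display asymmetric in the coordinates.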
The casement display is not symmetric in the coordinates,
sacrificing resolution for the two coordinates defining the layout.
Carr (1995) provides a variant of a casement display for 5-D data.
The data set concerned a dependent variable, protein folding energy,
and four independent variables, folding angles.
The casement display represented the four folding angle variables.
Since the variables had seven levels each, the 7 x 7 casement display
did not lose any resolution. The display represented energy using ray angle.
Nesting of the angle scale provided additional resolution.
A dark ray on a light background indicated that an angle
was to be interpreted on a scale from 10 to 20. A white ray
on a black background was to be interpreted on a scale of 0 to 10.
Values near 0 were of greatest interest.
One could scale the plot for regions with a black background.
Inspection of the ray angles reveals local minima and saddle
point troughs through space.
Recently, Mihalison() has been a strong exponent
of nested graphics and has obtained a patent covering his methodology.
In the nested approach plots appear within plots which appear in plots.
We don't know the extent of the patent but likely it applies beyond
the single layer of nesting provided in the casement display.
With increased nesting the asymmetry noted above increases. Nested graphs
can make it hard to judge interpoint distances.
Still the nested approach has merit because it gets the data in one display
and because humans have the capacity to follow the systematic nesting.
This does not mean people can easily integrate the pieces, but then
nothing works very well in high dimensions.

2.7 Conditioned Views.
Cleveland picks up the idea of casement displays and calls them conditioned
plots or coplots. Conditioned views are different than the nested views
when there are more than two conditioning factors. In Trellis graphics,
Cleveland uses the conditioning factors to lay out low-dimensional conditioned
views in rows and columns across as many pages as necessary.
The conditioned plots are typically 2-D plots. Trellis automates the labeling
of these plots by factor names and can represent the factor
levels graphically.
Cleveland prefers 2-D plots for the conditioned plots for perceptual reasons.
However, the concepts and software are not restrictive.
One could show sequences of stereo ray glyph plots.
Conditioned views do not have to strictly partition the data.
Cleveland introduced the notion of shingles that allow the same
observations to appear in more than one plot.
This is particularly helpful when smoothing a scatterplot because it
increases the number of points in the plots and poor fitting at
the plot edges mostly involves redundant data. The inclusion of points
in more than one plot also provides continuity in moving from plot to plot.
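
The idea of overlapping conditioning intervals can be sketched as follows. This is a simplified illustration using equal-width intervals; it is not Cleveland's algorithm, which is usually described in terms of intervals holding roughly equal numbers of points. The function names and the `overlap` parameter are assumptions.

```python
def shingle_intervals(lo, hi, n_intervals, overlap=0.5):
    """Return n_intervals equal-width intervals covering [lo, hi].

    Adjacent intervals share the given fraction of their width.
    """
    # With overlap fraction f, n intervals of width w satisfy
    # w * (n - (n - 1) * f) = hi - lo.
    width = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = width * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + width) for i in range(n_intervals)]


def assign_to_shingles(values, intervals):
    """List, for each interval, the values it contains (a value may repeat)."""
    return [[v for v in values if a <= v <= b] for a, b in intervals]


intervals = shingle_intervals(0.0, 10.0, n_intervals=3, overlap=0.5)
groups = assign_to_shingles([1.0, 3.0, 6.0, 9.0], intervals)
```

In this small example the values 3.0 and 6.0 each appear in two adjacent groups, which is the redundancy that supports edge fitting and continuity from plot to plot.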
As the dimensionality increases, the number of plots increases.
While well-conditioned views can allow humans to gain insight one dimension higher,
understanding higher-dimensional data in all its glory is not a thing humans
were meant to do.
The combinatorics may prohibit looking at all the conditioned plots.
Researchers have developed algorithms (Carr 1991) to prioritize plots for
review in terms of their potential interest.
Most insights occur concerning local relationships.
Conditioned views provide one of the basic and productive approaches to
obtaining local insights and, as indicated above, brushing is an interactive
conditioning technique.

2.7 Multiple Views.
In most fields of endeavor, one has the obligation to find the obvious.
Failing to find the extremely complex is forgivable in the rare cases that
it is noticed.
In the graphics context, it is important to look at low dimensional
views of the data to see if there are obvious patterns.
Frequently data sets consist of low-dimensional geometric structure
embedded in much higher-dimensional data.
With appropriate graphics we can readily identify the existence of low
dimensional structure.
A systematic approach involves looking at 1-D margin views,
scatterplot matrices, stereo-scatterplot triples, and 4-D plots
such as the stereo ray glyph plots. Carr et al (1987) advocate looking at
low dimensional margin views and this theme appears in ExplorN.
Given several variables, say 10, the combinatorics lead to an increasing
number of plots as viewing dimensionality increases.
For example 10 choose 2 (45) is smaller than 10 choose 3 (120).
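
The plot counts quoted above are binomial coefficients and are easy to tabulate. The short sketch below uses Python's standard `math.comb`; the helper name `plot_counts` is an assumption introduced for illustration.

```python
from math import comb


def plot_counts(n_vars, max_dim=4):
    """Number of distinct k-variable views of n_vars variables, for k = 1..max_dim."""
    return {k: comb(n_vars, k) for k in range(1, max_dim + 1)}


counts = plot_counts(10)
# counts[2] is the number of scatterplots, counts[3] the number of 3-D views.
```

For ten variables this gives 10 one-dimensional views, 45 pairs, 120 triples, and 210 four-variable views.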
The number of variables and the viewing dimension induce
layout and resolution problems.
If restricted to a single screen, the plots in a scatterplot
matrix can get too small to be useful.
For 3-D arrays of 3-D plots sectioning is usually necessary
to reveal the plots a layer of stereo plots at a time.
The space restriction for a matrix of 2-D or stereo scatterplots
motivates adding pan and zoom options in addition to the slicing capability.
ExplorN has not yet implemented this.
Currently there are no plans for 4-D arrays of 4-D plots.
ExplorN allows the user to pick the variables and does not provide
a mass production treatment.
There are many merits to multiple views. Consider first the parallel
coordinates view as shown in Figure 1.
The view makes it very easy to brush on any variable.
In Explor, the parallel coordinates view provides the method of choice
for defining multivariate points. The parallel coordinates view is advantageous
for user interactions with individual coordinate axes.
There are a number of methods of extracting more information: slicing,
animations, density representation, brushing, masking,
touring, projection pursuit, and so on. This paper provides a background
for ExplorN. By 4-D the combinatorics become very bad.
ExplorN focuses attention on one 4-D plot.
Wegman provides guidance concerning multivariate interpretation of
parallel coordinate views.
Some facets of this analysis are based on point-line duality.
Learning is required and most analysts are more comfortable working with
scatterplots.
In terms of assessing interpoint distance we note that
when the difference between multivariate points concentrates in one coordinate,
then the parallel coordinate view will likely prove the superior representation
for comparing the distance between points.

2.8 Density Representations.
To assess multivariate structure in data, it is highly desirable to have a
large number of cases, and a large number of cases motivates the use of
density representations. The NSF proposal emphasized the investigation of
bin-based density representations.
The argument was that if a proposed analysis or visualization
method was slower than O(n)
(or had a large coefficient), it would be faster to bin the data (an O(n)
operation) and then use a count-weighted version of the method.
For example one could bin the data and then use weighted
kernel methods for density estimation. Thus the strategy was
to trade off a controlled amount of resolution to produce a regularity
that would speed processing.
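
The bin-then-weight strategy can be sketched for the 1-D case as follows. This is an illustrative fragment under stated assumptions, not the method from the NSF proposal: the function names, the equal-width bins, and the Gaussian kernel evaluated at bin centers are all choices made here.

```python
import math
from collections import Counter


def bin_counts(values, lo, width):
    """O(n) pass: count observations per equal-width bin index."""
    return Counter(int((v - lo) // width) for v in values)


def binned_kde(x, counts, lo, width, bandwidth, n):
    """Gaussian kernel density estimate at x from count-weighted bin centers."""
    total = 0.0
    for idx, c in counts.items():
        center = lo + (idx + 0.5) * width
        u = (x - center) / bandwidth
        total += c * math.exp(-0.5 * u * u)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


values = [0.1, 0.2, 0.25, 1.1, 1.3, 2.9]
counts = bin_counts(values, lo=0.0, width=0.5)
density = binned_kde(0.25, counts, lo=0.0, width=0.5, bandwidth=0.5, n=len(values))
```

Each evaluation now costs one kernel term per occupied bin rather than one per observation, at the price of moving every observation to its bin center.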
The binning approach breaks down as the dimensions get large.
Binning is advantageous when it results in a compressed representation of
the data.
A sparse bin representation omits empty cells and simply involves a cell id
and a cell count for occupied cells.
Thus a sparse bin representation will be no larger than
2n numbers and ideally is much smaller than n numbers.
(The sparse representation is amenable to further compression.)
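
A sparse bin representation can be sketched with a dictionary keyed by cell id. In this illustration the cell id is a tuple of bin indices, which stands in for the single linearized id number the text counts; the function name `sparse_bin` and the equal-width cells are assumptions.

```python
from collections import Counter


def sparse_bin(points, lo, width):
    """Map each occupied cell id (a tuple of bin indices) to its count.

    Empty cells are never stored.
    """
    return Counter(tuple(int((x - lo) // width) for x in p) for p in points)


points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (2.6, 0.7), (2.9, 0.9)]
cells = sparse_bin(points, lo=0.0, width=1.0)
# One cell id and one count per occupied cell.
stored_numbers = 2 * len(cells)
```

Here five observations collapse into two occupied cells, so four numbers are stored, well under the 2n bound of ten.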
If one maintains good univariate resolution, the number of potential
bin cells grows exponentially as the dimensions increase.
As the number of potential cells surpasses the number of cases,
the chances diminish that the sparse representation will be much smaller
than n.
However, the NSF proposal emphasized low-dimensional margin views and that
is precisely where binning provides a reasonable strategy for coping with
a large number of cases.
A number of papers provide approaches to visualizing densities.
For 2-D domains, Carr et al (1986) represented a density surface using
randomly placed (x and y) stereo line segments.
The depth z-coordinate gave the local density. The line segment orientation
was orthogonal to steepest ascent and related to contouring at
fixed density levels.
Carr et al (1987) demonstrate real time contouring of bivariate densities
using color table methods and discuss graphics hardware based
density-estimation using an add-to-pixel operation.
Graphics workstations of the time period such as the
Amiga often supported a wide variety of logical and bound-based
operations on pixels, but unfortunately not an add-to-pixel operation.
The alpha channel on the SGIs provided the equivalent of an add-to-pixel
operation, and the ExplorN implementation was to exploit this capability along
with other facets of color mixture control.
The comparison of groups is a fundamental graphics task. Carr et al (1986)
discuss the subset painting operations and color mixing control that
led to the discovery of a symmetry in particle physics data.
Carr et al (1987) show a plot representing density difference for two groups
using a binned representation.
The ExplorN implementation was to allow comparison of densities for more
than one group. In 1992 Luo implemented color selection with three colors
and provided a compile time intensity setting for global alpha blending.
In 1993 Dr. Carr tasked Takacs with interactive control of color mixtures.
While this work was not completed it provided general color selection
and interactive control of the global alpha.
Working with color and exploiting the hardware is non-trivial.
One may want to use alpha blending, color table control and Z-buffer
control in concert, but such is not always possible. Further implementation
will help cover what we know how to do and continuing research is needed
to delimit what one can and cannot do utilizing graphics hardware.
For example, the hardware computation of densities for two groups
requires two alphas that are a function of group size.
The current use of a global alpha provides a representation
proportional to counts, not densities.
The convenient implementation of alpha channel-based count
estimation saturates at 255 plots on a pixel for a minimal intensity alpha
and sooner for a larger alpha.
If red, blue and green channels can be used as a single accumulation register,
over 2^24 overplots could be accommodated. It is hard to
distinguish among 256 intensities, let alone handle a density range of over
two million discrete values.
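
The saturation arithmetic can be checked with a small sketch. It assumes a simplified model in which every overplot adds a fixed integer step to an 8-bit channel; the function names and the `alpha_step` parameter are hypothetical and do not describe the SGI hardware interface.

```python
def saturating_count(overplots, alpha_step=1, channel_max=255):
    """Accumulated 8-bit channel value after a number of overplots."""
    return min(channel_max, overplots * alpha_step)


def overplots_to_saturate(alpha_step, channel_max=255):
    """Smallest number of overplots at which the channel reaches its maximum."""
    return -(-channel_max // alpha_step)  # ceiling division


single_channel_limit = overplots_to_saturate(1)   # minimal alpha step
larger_alpha_limit = overplots_to_saturate(4)     # larger step saturates sooner
combined_register_states = 2 ** 24                # three 8-bit channels as one register
```

Under this model a minimal step saturates after 255 overplots, a step of 4 after 64, and a combined 24-bit register distinguishes 16,777,216 values.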
The real time contouring methodology referenced above is particularly
relevant since it allowed focusing on a density interval and scaling
within that interval. ExplorN is not yet complete with respect to
previously demonstrated methodology.
Massive data sets motivate thinking about visualization.
One approach represents pixel counts by color.
Figure 4 is a scatterplot matrix that represents 54 million
observations per plot.
Carr first presented this picture at a session of Massive Data
Sets at the ASA annual meeting in 1995.
The variables (spectral intensities) were discrete with a
possibility of 256 values.
The 2-D binning in a 256 x 256 grid involved no resolution loss.
Some 2-D cells had over 50,000 counts.
Selecting the best color scale for a static plot is a research issue.
The one in the figure is a heat-based encoding adopted directly from Splus
after taking a log of the counts.
Again, manipulation of the colors as a function of counts provides
a way to investigate the density surface.
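
Mapping counts to colors on a log scale can be sketched as follows. This fragment only illustrates the idea; the function name `log_color_index`, the 256-entry color table, and the treatment of empty cells are assumptions, and it does not reproduce the Splus heat encoding.

```python
import math


def log_color_index(count, max_count, n_colors=256):
    """Map a positive cell count to a color-table index on a log scale."""
    if count <= 0:
        return None  # empty cells are left at the background color
    if max_count <= 1:
        return n_colors - 1
    frac = math.log(count) / math.log(max_count)
    return min(n_colors - 1, int(frac * (n_colors - 1)))


idx_one = log_color_index(1, 50000)       # a single observation
idx_max = log_color_index(50000, 50000)   # the fullest cell
idx_mid = log_color_index(224, 50000)     # roughly the geometric midpoint
```

The log transform spends color resolution on the low counts, where a linear scale would map nearly every occupied cell to the first few table entries.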
Other approaches show the 2-D density surfaces as perspective
views or as contours.
Fishnet plots have been popular historically but that in part
reflects accessibility.
Rendered surfaces would seem preferable since they show
the whole surface and not just lines on the surface.
Wegman and Luo point out that specular reflections in surface
renderings can call attention to density details.
In addition, the interactive manipulation
of view position removes occlusion problems often
encountered in static fishnet views.
(Luo has implemented 2-D density surface rendering
in one version of ExplorN.)
Tufte and others show fishnet plots and contour plots
together because they provide complementary information.
With rendered perspective views combining the information is easy.
Colors can accentuate regions between contour lines.
Of course stereo helps provide depth information in ordinary
rendered surface views.
Carr (1986) provides a view that blends surface viewing with contours.
One can show contours in stereo and with depth-cued stereo (see Carr)
analysts can read contour locations directly from x and y axes.
In a point plot variant, Carr draws line segments that are orthogonal
to steepest ascent at data locations.
The z coordinate of the point represents the density at the point.
This relates to later work by Wegman and Luo using ridge traces.
Of greater interest is looking at 3-D densities, although Carr et
al (1986) use univariate contouring for slices of a 3-D estimate
and show the contours in 3-D.
Scott's (1986) methods are similar but slice in more than one direction
and use perspective views.
In subsequent years Scott (1993) produced more elegant views using
3-D rendered surfaces.
Carr also shows coloring for a 2-D domain. Later, Wegman and Luo (1995)
and Hall and () would look at this in a different way and produced ridge
traces based on following steepest ascent.
In the density estimation context, Carr et al. (1986), Carr et al. (1987),
Scott (1986) and Carr (1991) provide many examples in traditional
Cartesian coordinate plots.
A sparse representation stores only the cells in which observations fall.
The idea of binning was to sacrifice resolution for speed.
Binning did not preclude the use of kernel methods.
NSF developments included implementation of the different density
representations.
The early portions of the research developed algorithms and separate
visualization tools for evaluation purposes.
The research evolved more slowly than anticipated.
Only the most primitive form of alpha blending made it into ExplorN.
Representations based on kernel and binned density estimates did not.
The project did produce algorithms and visualization tools as
precursors to ExplorN implementation.
The primary methods that did not make it included contour and median ridge
(skeleton) density representations derived from binned data.

3. Explor4.
Explor4 (Carr and Nicholson, 1987, 1988) was the direct precursor to ExplorN.
Reviewing some of its capabilities shows many of the connections to ExplorN.
While some of the above discussion motivates current and future development of ExplorN,
part provides the direct motivation for the much older Explor4.
Explor4 dates from another era, with even the pull-down menus written in Fortran,
but it included several advanced ideas. In the mid-1980s scatterplot matrices had been
recognized as a powerful plotting paradigm and the grand tour was emerging as a way of looking at
higher dimensional data using scatterplots. The 4-D plotting in Explor4 was
a purposeful effort to get around the limitations of such methodology for
understanding higher dimensional relationships.
Carr and Nicholson created Explor4 because in their experience the four dimensional plots
using the stereo ray glyph remained within the domain of comprehension.
This did not mean that analysts would readily understand solids shown in 4-D plots
or appreciate 3-D relations the way they understand surfaces shown in 3-D plots,
but there was partial understanding.
However, as indicated above, analysts can identify the existence of local
constraints and hence lower dimensionality by looking for smooth variation
of the rays as a function of position in 3-D, or by observing that the rays
fall on a degenerate structure such as a surface.
Explor4 represented the stereo ray glyph using red/cyan color anaglyphs.
The basic view included 1-D density views of each display axis laid out as parallel
coordinates.
The four axes also served to provide coordinate input for four dimensional
points and constraints.

3.1 Explor4 Masking.
Masking was Explor4's primary tool for focusing on data subsets.
Since Explor4 depended on anaglyph stereo for the stereo ray glyphs, color brushing
was not available.
The masking equated the values of a variable to points along a line and included several
ways of computing the variable. These included distance from a user selected hyperplane,
distance from a user defined point in 4-D, and the ordered densities of the 4-D points.
A four-button mouse facilitated partitioning the line into three intervals.
The left button defined and dragged the left boundary. The right button defined
and dragged the right boundary, and the top button dragged both boundaries.
The points whose masking variable values fell between the left and right
boundaries were displayed. If the boundaries were reversed, the left
being greater than the right, points were shown if their masking variable
values were outside the boundaries.
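The interval rule, including the reversed case, can be sketched as follows.
This is an illustration in Python with NumPy and is not taken from the
Explor4 Fortran code. The function name interval_mask is hypothetical, and
the treatment of points lying exactly on a boundary is an assumption.

```python
import numpy as np

def interval_mask(values, left, right):
    """Return True for points that should be displayed.

    values : masking variable, one value per point
    left, right : boundary positions set with the mouse buttons

    Normal order (left <= right) shows points between the boundaries.
    Reversed order (left > right) shows points outside the boundaries.
    Boundary points are counted as inside (an assumption).
    """
    values = np.asarray(values, dtype=float)
    if left <= right:
        return (values >= left) & (values <= right)
    # Reversed boundaries: keep what lies below right or above left.
    return (values < right) | (values > left)
```

For example, interval_mask(v, 1, 3) keeps the middle interval, while
interval_mask(v, 3, 1) keeps the two outer intervals.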
Explor4 supported several methods for computing masking variables.
The three of greatest interest were 4-D point density, distance from
a hyperplane, and distance from a 4-D cursor.
The computation of point density is straightforward using kernel methods.
Explor4 used a heuristically computed bandwidth while more modern interactive
software makes the bandwidth parameters controllable by sliders.
With computed densities, one could immediately mask out low density points to
focus on modes or mask out high density points to look for candidate outliers.
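A kernel estimate of point density can be sketched as follows. This is a
minimal illustration in Python with NumPy. It assumes a Gaussian kernel with
one bandwidth shared by all coordinates, which is a choice made for the
example. The kernel and the bandwidth heuristic used in Explor4 are not
given here, so the bandwidth is left as a parameter. The function name
point_densities is hypothetical.

```python
import numpy as np

def point_densities(points, bandwidth):
    """Gaussian kernel density estimate evaluated at each data point.

    points : array of shape (n, d), for example d = 4
    bandwidth : single smoothing parameter (assumed, user supplied)

    Uses all pairwise differences, so memory grows as n * n * d.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    diff = points[:, None, :] - points[None, :, :]
    sq = (diff ** 2).sum(axis=2) / bandwidth ** 2
    norm = n * (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    return np.exp(-0.5 * sq).sum(axis=1) / norm
```

The resulting densities can serve as the masking variable: a high lower
boundary focuses on modes, and a low upper boundary isolates candidate
outliers.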
Distance-to-hyperplane masking, or slicing, was perhaps more novel.
There are four motivations for masking. First, projection does not reveal
structure internal to the data cloud, such as holes. Slicing provides a way
of seeing inside the data cloud. Second, hyperplane masking is a way of adding
a linear constraint. This lowers the dimensionality of the viewed data
by one and reveals patterns when the viewed data dimension drops below
the plotting dimension. Third, with large data sets, masking provides a way
to control overplotting. Fourth, slicing in depth or in the angle dimension
helps perception since these two coordinates are perceived less accurately
than the x and y coordinates.
Point definition provided control of hyperplane masking and distance
from a point masking.
Explor4 used the parallel axis view to define points in 4-D and to control the cursor
in the 4-D display. A click on each axis defined a point (x,y,z,w).
Dragging with the mouse button down along an axis altered the coordinate of the point
in the 4-D plot. Explor4 stored sets of user defined points (called constraint
point sets) and used them to provide a frame of reference for masking operations.
Hyperplane definition requires only two 4-coordinate objects, a point in the
hyperplane and a normal vector. Two points suffice by using their difference
as the normal vector and the first point as the point in the hyperplane.
A single point sufficed for distance from a point masking.
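The two distance-based masking variables can be sketched as follows. This is
an illustration in Python with NumPy, not the Explor4 implementation. The
function names hyperplane_distance and point_distance are hypothetical.
Returning a signed distance for the hyperplane is a choice made for the
example. The absolute value gives a slice centered on the hyperplane.

```python
import numpy as np

def hyperplane_distance(points, p0, p1):
    """Signed distance of each point from the hyperplane through p0
    whose normal vector is the difference p1 - p0."""
    points = np.asarray(points, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    normal = np.asarray(p1, dtype=float) - p0
    length = np.linalg.norm(normal)
    if length == 0:
        raise ValueError("the two defining points must differ")
    # Project each offset from p0 onto the unit normal.
    return (points - p0) @ (normal / length)

def point_distance(points, center):
    """Euclidean distance of each point from a single reference point."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(points - center, axis=1)
```

Either result can then be partitioned into intervals to decide which points
are displayed.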

3.2 Explor4 Rotation.
For rotation Explor4 provided direct manual control via an Atari tracker ball.
The tracker ball provided rotation in 3-D. Since the display was 4-D the user
fixed a 1-D subspace to be represented by the angle of the stereo ray glyph.
The tracker ball then moved the base of the rays in 3-D. A variant of this
idea appears in ExplorN as described later. The Explor4 display showed line
segments indicating how the four selected variables mapped into the 4-D
display coordinates.
Carr and Nicholson (1988) provide more details. The important fact was that
the linear combinations were visible and that the linear combinations and masking
status could be dumped to a file as part of the history mechanism.
This allowed displays to be reproduced externally.
Many of the above capabilities were unique to Explor4.
Explor4 also contained additional tools that are common today.
For example it provided for data file selection, variable selection and mapping
of data variables into the display variables.

4. ExplorN.
Dr. Carr's 1990 NSF proposal entitled "Statistical Graphics for Binned 2-D,
3-D and 4-D Data" called for the comparison of different multivariate viewing
models: scatterplot matrix, parallel coordinate and stereo ray-glyph viewing
models.
The motivation included assessing the ability of users to judge interpoint
distances and ratios of interpoint distances using the three views.
As might be inferred from the above discussion,
Dr. Carr's perhaps biased conjectures were that the parallel coordinates
would be superior when the difference among points was isolated in one
coordinate, that scatterplots would be superior when the differences were
isolated in two coordinates, and that the stereo ray glyph plots would
be superior in most of the other cases.
In the summer of 1992, the NSF project funded Qiang Luo, a student of
Dr. Wegman, to develop ExplorN on a Silicon Graphics workstation.
The software development goals included adding scatterplot matrix and
parallel coordinate views, color brushing, stereo viewing via the
shuttering glass technology available on the SGI's, density representations
using alpha blending (see Carr et al. 1987) and binning (Carr 1991).
Qiang Luo had access to code produced by Dr. Wegman's students and was
keenly aware of Dr. Wegman's research.
Thus he was able to adapt existing parallel coordinates code.
Further, he added Dr. Wegman's generalized grand tour methodology.
The grand tour capabilities were beyond the scope of the software
development plan.
Thus ExplorN represents a blending of methodology developed by
Dr. Carr and Dr. Wegman and the implementational creativity of Qiang Luo.
Dr. Wegman reclaimed the talents of Qiang Luo at the end of the summer.
Nonetheless Qiang provided implementation guidance to Dr. Carr and
his students as they continued the development efforts.
Dr. Carr's NSF supported students who contributed include Ji Shen
and Barna Takacs. Students in Dr. Carr's visualization class also contributed.
Li Li contributed directly and Barbara Lambird worked on advanced 4-D methods yet
to be integrated.
Dr. Carr's EPA and NASS supported students are all developing
methodology for future implementation: Kwang-Su Yang (local lattice
line smoothing), Cynthia Kriger (projection pursuit methods in 3-D and 4-D)
and Qi Zhu (texture representations of confidence measures).
The development of ExplorN has been and continues to be
a collaborative effort, with the primary developers being
Dr. Carr, Qiang Luo and Dr. Wegman.

4.1 Connections To Other Software.
As indicated above, ExplorN is a sequel to Explor4.
Other software packages also influenced the early development of ExplorN.
It is not practical to list all such software packages, but mention of
a few sources of inspiration is appropriate.
First, Minigraph (Littlefield and Carr 1986) included many capabilities,
such as glyph drawing, anaglyph stereo, pan and zoom, color table animation,
and color mixture control. Second, Dr. Wegman and Masood developed PC software
that implemented the parallel coordinates plot and other more general
data viewing software.
Third, software developed at the University of Washington by Andreas Buja,
John McDonald, Werner Stuetzle and Catherine Hurley contained many features
worth emulation. John McDonald is known for introducing painting (brushing)
to the statistics community.
The collective contribution includes many other tools. For example the
representation of linear combinations in Explor4 was an immediate result
of seeing the University of Washington software.
Fourth, software developed at AT&T Bell Laboratories has provided inspiration.
The team of John Chambers, Rick Becker, Bill Cleveland and Alan Wilks was
behind many software developments.
The color brushing demonstrated on an SGI at an ASA annual meeting by Becker and
Cleveland provided a strong impetus to upgrade ExplorN in an SGI environment.
Finally, XGobi, developed by Andreas Buja, Di Cook and Debbie Swayne,
retains an active influence on ExplorN.
ExplorN is not yet as mature in terms of providing point representations,
archiving of views, projection pursuit methods and so on.
Perhaps some of the ExplorN innovations will also influence XGobi.

4.3 ExplorN Menus.
The right mouse button controls the appearance of the ExplorN basic menu. Dragging down to an
item selects it. Moving right is required for submenus.
In version 3C there are seven basic options. The options are views, masking,
data files, variables, windows, brushing, and exit. The ordering to a large extent
reflects the usage frequency. The first step is to select a data file,
but typically that is only done once during an interactive session.
The most common option is to change views of the data.

4.3.1 Views.

this page will be completed soon