In many situations, an analyst must select a subset of the data before processing it. Examples arise in experimental design, robust estimation of multivariate location and scatter, and density estimation. The problem can be stated as follows: given a data set of n points, select a subset of h points, where h < n, according to a suitable selection criterion. In this talk, I will present several methods for selecting subsets of data and provide supporting theory for the algorithms. These methods optimize the determinant, the trace, or a single eigenvalue of the Fisher Information Matrix. They will be applied to three problems: selecting the subset of data for estimating the Minimum Volume Ellipsoid, determining the number of groups and the initial parameters in finite mixture densities, and finding E-optimal designs.
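The abstract does not spell out the speaker's algorithms. The sketch below is only a generic illustration of the selection problem. It assumes a linear model with unit error variance, so the Fisher information of a subset S is proportional to X_S^T X_S. It then picks h rows greedily under a D (log-determinant), T (trace) or E (smallest eigenvalue) criterion. The helper names criterion and greedy_subset are hypothetical, and so is the small ridge term, which keeps the matrix invertible before p points have been chosen. Greedy forward selection is a standard heuristic and is not necessarily one of the methods presented in the talk.

```python
import numpy as np


def criterion(M, kind):
    """Score an information matrix M; larger is better (hypothetical helper)."""
    if kind == "D":  # log-determinant
        sign, logdet = np.linalg.slogdet(M)
        return logdet if sign > 0 else -np.inf
    if kind == "T":  # trace
        return float(np.trace(M))
    if kind == "E":  # smallest eigenvalue (eigvalsh returns ascending order)
        return float(np.linalg.eigvalsh(M)[0])
    raise ValueError(f"unknown criterion {kind!r}")


def greedy_subset(X, h, kind="D", ridge=1e-8):
    """Greedily choose h of the n rows of X (hypothetical helper).

    Assumes a linear model with unit noise variance, so the information
    matrix of the chosen rows is ridge*I + sum of x x^T over those rows.
    """
    n, p = X.shape
    if not 0 < h < n:
        raise ValueError("need 0 < h < n")
    M = ridge * np.eye(p)  # small ridge keeps M positive definite early on
    chosen = []
    remaining = set(range(n))
    for _ in range(h):
        best_i, best_score = None, -np.inf
        for i in sorted(remaining):
            x = X[i]
            score = criterion(M + np.outer(x, x), kind)
            # Strict '>' keeps the lowest index on ties.
            if best_i is None or score > best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        remaining.remove(best_i)
        M = M + np.outer(X[best_i], X[best_i])
    return sorted(chosen), M


if __name__ == "__main__":
    t = np.linspace(-1.0, 1.0, 11)
    X = np.column_stack([np.ones_like(t), t])  # straight-line model: intercept + slope
    for kind in ("D", "T", "E"):
        idx, _ = greedy_subset(X, 3, kind)
        print(kind, idx)
```

For a straight-line model on a grid in [-1, 1], the D criterion first favors the endpoints of the interval, which matches the familiar result that extreme points carry the most information about the slope. A greedy pass gives no optimality guarantee. Exchange algorithms, which swap selected and unselected points until no swap improves the criterion, are a common refinement.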
This seminar is part of a series of talks by alumni of the GMU CSI Computational Statistics program, based on their dissertation research and subsequent work. Wendy L. Martinez and Jeff Solka share the distinction of being the first Ph.D. students to graduate from the CSI Program.