George Mason University
AES/SCS Statistics Colloquium Series
Seminar Announcement



The Fellegi-Holt Model of Statistical Data Editing:
Computational Algorithms and Research Problems


William E. Winkler

U.S. Bureau of the Census


ABSTRACT

Fellegi and Holt (JASA 1976) provide a model that can be used for production edit/imputation systems. Two advantages of the model are that all edits are contained in easily modified tables and that each edit-failing record can be "corrected" in one pass through the data. Implicit edits (those logically derived from the explicit edits) summarize the information needed to determine the minimum number of fields to change. Algorithms that use implicit edits can be 1000 times as fast as those that do not (Winkler and Chen 2002). If implicit edits are available, heuristic algorithms (Winkler 1995, Draper and Winkler 1997) are more than 100 times as fast as branch-and-bound, with negligible loss in accuracy. Generating the implicit edits is itself expensive: by extrapolation, the set-covering algorithms of Garfinkel, Kunnathur, and Liepins (1986) and IBM-ISTAT (1996) would need more than 800 days to generate them for an Italian labour force survey. A faster heuristic (Winkler 1997, Chen 1998) generates more than 90% of the implicit edits in less than 24 hours. Background is in research report srd99/01 at www.census.gov/srd/www/byyear.html.
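A toy sketch of why implicit edits matter for deciding which fields to change. In the Fellegi-Holt setting, the fields to change must touch every failed edit. Once the implicit edits are included, a minimum set of such fields can always be given consistent new values. The fields, domains, and edits below are hypothetical and for illustration only. The search is plain exhaustive enumeration, not the heuristic, branch-and-bound, or set-covering algorithms cited in the abstract.

```python
from itertools import combinations, product

# Hypothetical categorical fields and their value domains (illustration only).
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["married", "single"],
    "relationship": ["spouse", "child"],
}

# An edit maps each field it involves to the values that jointly make a
# record fail; a record fails the edit when every involved field matches.
EXPLICIT_EDITS = [
    {"age": {"child"}, "marital": {"married"}},
    {"marital": {"single"}, "relationship": {"spouse"}},
]
# Implicit edit derived by eliminating "marital": its value sets {married}
# and {single} cover the whole domain, so child + spouse always fails.
IMPLICIT_EDITS = [
    {"age": {"child"}, "relationship": {"spouse"}},
]

def fails(record, edit):
    return all(record[f] in values for f, values in edit.items())

def failed_edits(record, edits):
    return [e for e in edits if fails(record, e)]

def minimum_covers(record, edits):
    """All minimum-size sets of fields that touch every failed edit."""
    failed = failed_edits(record, edits)
    fields = sorted(DOMAINS)
    for k in range(len(fields) + 1):
        covers = [set(c) for c in combinations(fields, k)
                  if all(set(e) & set(c) for e in failed)]
        if covers:
            return covers
    return []

def can_impute(record, fields, edits):
    """Brute force: can new values for `fields` make the record pass all edits?"""
    fields = sorted(fields)
    for values in product(*(DOMAINS[f] for f in fields)):
        candidate = dict(record, **dict(zip(fields, values)))
        if not failed_edits(candidate, edits):
            return True
    return False

if __name__ == "__main__":
    record = {"age": "child", "marital": "married", "relationship": "spouse"}
    print(minimum_covers(record, EXPLICIT_EDITS))                   # [{'age'}, {'marital'}]
    print(minimum_covers(record, EXPLICIT_EDITS + IMPLICIT_EDITS))  # [{'age'}]
    print(can_impute(record, {"marital"}, EXPLICIT_EDITS))          # False
    print(can_impute(record, {"age"}, EXPLICIT_EDITS))              # True
```

With only the explicit edits, changing "marital" looks sufficient. Every replacement value fails another edit, however. The implicit edit rules that choice out and leaves "age" as the single field to change.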


Friday, May 3, 2002
George W. Johnson Center, Assembly Room B
Seminar at 10:45 a.m.
Refreshments at 10:30 a.m.
For the 2002 Spring Seminar Schedule, go to
www.science.gmu.edu/statseminars