NSF-ITR Synthetic Data Workshop at CISER

The NSF-ITR confidentiality work group will hold a workshop at CISER on September 8, 2006. Participation is by invitation only. Interested participants should contact John Abowd. The workshop is part of, and financed through, NSF Grant #0427889.



Agenda

Sponsored by NSF Grant #0427889.
Hosted by the Cornell Institute for Social and Economic Research.

Workshop Participant List

  • John Abowd, Cornell University and U.S. Census Bureau (LEHD), PI and organizer
  • Lars Vilhuber, Cornell University and U.S. Census Bureau (LEHD), coordinator
  • Fredrik Andersson, Cornell University and U.S. Census Bureau (LEHD)
  • Gary Benedetto, University of Maryland and U.S. Census Bureau (LEHD)
  • Rob Creecy, U.S. Census Bureau (SRD)
  • Josep Domingo-Ferrer, Univ. Rovira i Virgili
  • Lisa Dragoset, Cornell University and U.S. Census Bureau (LEHD)
  • Kaj Gittings, Cornell University and U.S. Census Bureau (LEHD)
  • Sam Hawala, U.S. Census Bureau (SRD)
  • Daniel Kifer, Cornell University
  • Ron Jarmin, U.S. Census Bureau (CES), Co-PI
  • Saki Kinney, Duke University
  • Karen Masken, Internal Revenue Service
  • Kevin McKinney, U.S. Census Bureau (LEHD)
  • Javier Miranda, U.S. Census Bureau (CES)
  • Ashwin Machanavajjhala, Cornell University
  • Kerry Papps, Cornell University
  • Corinne Prost, INSEE and Cornell University
  • Trevillore Raghunathan, University of Michigan, Co-PI
  • Jerry Reiter, Duke University
  • Arnie Reznek, U.S. Census Bureau (CES)
  • Bryan Ricchetti, Cornell University and U.S. Census Bureau (LEHD)
  • Rolando Rodriguez, U.S. Census Bureau (SRD)
  • Stephen Roehrig, Carnegie Mellon University, Co-PI
  • Ian Schmutte, Cornell University
  • Martha Stinson, U.S. Census Bureau (LEHD)
  • Vicenç Torra, University of Barcelona, Artificial Intelligence Research Institute
  • Simon Woodcock, Simon Fraser University

Thursday, September 7, 2006

  • Dinner: 7:30pm at John Abowd’s home (directions sent to all invitees)
  • After-dinner discussion: introductions and where we are
Hotel: Hilton Garden Inn (downtown Ithaca). The group room rate for the NSF-ITR Workshop is $134/night, and a block of rooms is available for September 7th. Reservations can be made through August 7th by calling 1-877-STAY-HGI or 607-277-8900, or online at www.ithaca.stayhgi.com by entering the group/convention code ABOWD.

Friday, September 8, 2006

Location: Ives 109, Distance Learning Room (Cornell campus; a shuttle bus is available from the Hilton Garden Inn)

Breakfast

  • 8:00am, ILR Conference Center, Room 329. The hotel shuttle bus will bring you directly to the ILR Conference Center; go to the third floor. (Food is not allowed in the distance learning room.)

Morning Sessions (simulcast to the Census Bureau, room G-316/Building 3, and Barcelona, Spain)

  • 8:30-9:20 Building synthesizers for different data structures
    • Household data structures (longitudinal: SIPP, HRS; cross-sectional: ACS)
    • Establishment data structures (longitudinal: LBD; cross-sectional: CBP)
    • Job data structures (longitudinal: LEHD)
    • Origin/destination data structures (cross-sectional: OTM)
    • Dynamically linked tabular data (longitudinal: QWI)
    • IPSO synthesizers; probabilistic record linkage; distance record linking
  • 9:30-10:20 Testing the validity of synthetic data
    • Univariate methods (KDE; MI combining formulae)
    • Propensity score methods
    • Other multivariate methods
  • 10:30-11:20 Roundtable discussion of Data Privacy and Confidentiality Protection Technologies
    • (Open session, joint with the Institute for Social Sciences Networks Team)
  • 11:30-12:20 Certifying the degree of protection: Re-identification models and techniques
    • Probabilistic record linking
    • Distance record linking
    • Estimating the probability of re-identification
    • Estimating the PPF for information and protection

Lunch

  • 12:30pm, ILR Conference Center, Room 329 (same as breakfast)

Afternoon Sessions (simulcast to the Census Bureau, room G-316/Building 3)

  • 1:30-2:00 Testing the validity of synthetic data (continued)
    • Univariate methods (KDE; MI combining formulae)
    • Propensity score methods
    • Other multivariate methods
  • 2:00-2:20 Computational issues
    • Basic computational engines (SAS, Java, R)
    • Computational problems for synthesizers (control of multithreading, implementing informative priors)
    • Computational problems for re-identification software (SAS callable, native SAS)
    • Open issues
  • 2:30-3:20 Getting data to the users
    • What should we support on the VRDC?
    • How can we best teach the users the combining formulae for multiply-imputed synthetic data?
  • 3:30-4:00 Wrap-up
    • Progress reports
    • Working papers and publications
    • VRDC files
