Computational Methods for Linking Sets of National Files
- Speaker: Bill Winkler (U.S. Census Bureau)
- Date & Time: Monday 12 September 2016, 13:30 - 14:30
- Venue: Seminar Room 1, Newton Institute
Abstract
A combination of faster hardware and new computational algorithms makes it possible to link two or more national files that share suitable quasi-identifying information (name, address, date of birth, and other non-uniquely identifying fields) far faster than the methods of a decade earlier. The methods (Winkler, Yancey, and Porter 2010) were used during the 2010 U.S. Decennial Census to match approximately 10^17 pairs (300 million x 300 million) in less than 30 hours, using 40 CPUs of an SGI machine with 2006 Itanium chips. The methods are 50 times as fast as the PSwoosh parallel software (Kawai et al. 2006) from Stanford University, and about 10 times as fast as recent parallel software that applies new load-balancing methods (Rahm and Kolb 2013, Yan et al. 2013, Karapiperis and Verykios 2014). This talk will describe how the software bypasses the need for system sorts and provides highly optimized search, retrieval, and comparison for the narrow range of situations that record linkage requires.
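The abstract does not describe the software's internals, so the following is only a generic sketch of one common way to avoid sorting in record linkage: index one file in memory by a blocking key, then stream the other file past that index and compare only records that share a key. It is not the Census Bureau's implementation. The field names, the blocking key (surname initial plus birth year), the exact-agreement score, and the threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical record layout: dicts with id, surname, first_name, dob, address.
COMPARE_FIELDS = ("surname", "first_name", "dob", "address")


def blocking_key(record):
    """Assumed blocking key: surname initial plus year of birth."""
    return (record["surname"][:1].upper(), record["dob"][:4])


def build_index(records):
    """Group records by blocking key in a hash map; no sort is needed."""
    index = defaultdict(list)
    for rec in records:
        index[blocking_key(rec)].append(rec)
    return index


def agreement_score(a, b):
    """Count exactly agreeing fields (a stand-in for a real comparison model)."""
    return sum(1 for field in COMPARE_FIELDS if a[field] == b[field])


def link(file_a, file_b, threshold=3):
    """Index file_a once, then probe it with each record of file_b."""
    index = build_index(file_a)
    pairs = []
    for rec_b in file_b:
        # Only records sharing a blocking key are ever compared.
        for rec_a in index.get(blocking_key(rec_b), []):
            if agreement_score(rec_a, rec_b) >= threshold:
                pairs.append((rec_a["id"], rec_b["id"]))
    return pairs


file_a = [
    {"id": "A1", "surname": "Smith", "first_name": "John",
     "dob": "1970-01-02", "address": "1 Main St"},
    {"id": "A2", "surname": "Jones", "first_name": "Mary",
     "dob": "1985-06-30", "address": "9 Oak Ave"},
]
file_b = [
    {"id": "B1", "surname": "Smith", "first_name": "Jon",
     "dob": "1970-01-02", "address": "1 Main St"},
    {"id": "B2", "surname": "Brown", "first_name": "Mary",
     "dob": "1985-06-30", "address": "9 Oak Ave"},
]

matches = link(file_a, file_b)
```

In this toy data, A2 and B2 agree on three fields but fall into different blocks because their surnames differ, so they are never compared. That is the usual trade-off of blocking: far fewer comparisons than the full cross product, at the risk of missing pairs whose blocking fields disagree.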
Related Links
- https://fcsm.sites.usa.gov/files/2014/05/J1_Winkler_2013FCSM.pdf - describes methods for clean-up of sets of national files
Series
This talk is part of the Isaac Newton Institute Seminar Series.