BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Bayesian Partitioning Approach to Duplicate Detection and Record
  Linkage - Mauricio Sadinle (Duke University)
DTSTART:20160914T103000Z
DTEND:20160914T110000Z
UID:TALK67353@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:Record linkage techniques allow us to combine different source
 s of information from a common population in the absence of unique identi
 fiers. Linking multiple files is an important task in a wide variety of a
 pplications\, since it permits us to gather information that would not be
  otherwise available\, or that would be too expensive to collect. In prac
 tice\, an additional complication appears when the datafiles to be linked
  contain duplicates. Traditional approaches to duplicate detection and re
 cord linkage output independent decisions on the coreference status of ea
 ch pair of records\, which often leads to non-transitive decisions that h
 ave to be reconciled in some ad-hoc fashion. The joint task of linking mu
 ltiple datafiles and finding duplicate records within them can be alterna
 tively posed as partitioning the datafiles into groups of coreferent reco
 rds. We present an approach that targets this partition as the parameter
  of interest\, thereby ensuring transitive decisions. Our Bayesian impleme
 ntation allows us to incorporate prior information on the reliability of
  the fields in the datafiles\, which is especially useful when no training
  data are available\, and it also provides a proper account of the uncert
 ainty in the duplicate detection and record linkage decisions. We show ho
 w this uncertainty can be incorporated in certain models for population s
 ize estimation. Throughout the document we present a case study to detect
  killings that were reported multiple times to organizations recording hu
 man rights violations during the civil war of El Salvador.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
