BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A note on the F-measure for evaluating record linkage algorithms (
 and classification methods and information retrieval systems) - David Hand
  (Imperial College London)\; Peter Christen (Australian National Universit
 y)
DTSTART:20160908T143000Z
DTEND:20160908T153000Z
UID:TALK67286@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:Record linkage is the process of identifying and linking recor
 ds about the same entities from one or more databases. If applied to a
  single database\, the process is known as deduplication. Record linkage
  can be vie
 wed as a classification problem where the aim is to decide if a pair of re
 cords is a match (the two records refer to the same real-world entity) or
  a non-match (the two records refer to two different entities). Various cl
 assification techniques - including supervised\, unsupervised\, sem
 i-supervised and active learning based - have been employed for rec
 ord linkage. If ground truth data in the form of known true matches and no
 n-matches are available\, the quality of classified links can be evaluated
 . Due to the generally high class imbalance in record linkage problems\, s
 tandard accuracy and misclassification rate are not meaningful for assess
 ing the quality of a set of linked records. Instead\, precision and recall
 \, as commonly used in information retrieval\, are used. These are often c
 ombined into the popular F-measure\, which is normally presented as the ha
 rmonic mean of precision and recall. We show that the F-measure can be
  expressed as a weighted sum of precision and recall\, with weights that
  depend on the linkage method being used. This reformulation reveals the
  measure to
  have a major conceptual weakness: the relative importance assigned to pre
 cision and recall should be an aspect of the problem and the user\, but no
 t of the particular instrument being used. We suggest alternative measures
  which do not suffer from this fundamental flaw.
LOCATION:Seminar Room 2\, Newton Institute
END:VEVENT
END:VCALENDAR
