BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Detecting Text Reuse in Large Historical Corpora and Authorship At
 tribution of Premodern Documents - Aleksi Vesanto (University of Turku)
DTSTART:20171003T100000Z
DTEND:20171003T110000Z
UID:TALK86051@talks.cam.ac.uk
CONTACT:Dimitri Kartsaklis
DESCRIPTION:This presentation covers two projects: a method for detecting
  text reuse that can withstand extreme OCR noise\, and real-world
  applications of machine learning in authorship attribution. Detecting
  text reuse in historical documents can shed light on many questions\,
  such as how certain news spread or whether authors have plagiarized
  others. Finding these repeated passages can be fairly hard\, as the
  documents are generally OCR-transcribed and can contain extreme noise\,
  to the point where the text borders on unreadable. Authorship
  attribution is by no means a new field\, yet machine learning has had
  only a limited spotlight in real-world applications. This presentation
  highlights a case where machine learning provides new information that
  contradicts older manual attributions\, and a method for attributing a
  document with multiple possible authors using very little training
  data.
LOCATION:SR-24\, English Faculty Building\, 9 West Road (Sidgwick Site)
END:VEVENT
END:VCALENDAR
