BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Knowledge Representation and Extraction at Scale - Christos Christ
 odoulopoulos\, Amazon
DTSTART:20181109T120000Z
DTEND:20181109T130000Z
UID:TALK111586@talks.cam.ac.uk
CONTACT:Andrew Caines
DESCRIPTION:These days\, most general knowledge question-answering systems
  rely on large-scale knowledge bases comprising billions of facts about mi
 llions of entities. Having a structured source of semantic knowledge means
  that we can answer questions involving single static facts (e.g. "Who was
  the 8th president of the US?") or dynamically generated ones (e.g. "How
 old is Donald Trump?"). More importantly\, we can answer questions involving
  multiple inference steps ("Is the queen older than the president of the U
 S?").\n\nIn this talk\, I will discuss some of the unique challenges invol
 ved in building and maintaining a consistent knowledge base for Alexa\, ex
 tending it with new facts\, and using it to serve ans
 wers in multiple languages. I will focus on three recent projects from our
  group. First\, a way of measuring the completeness of a knowledge base ba
 sed on usage patterns. KB usage is defined in terms of the relation distri
 bution of entities seen in question-answer logs. Instead of directly estim
 ating the relation distribution of ind
 ividual entities\, it is generalized to the "class signature" of each enti
 ty. For example\, users ask for baseball players' height\, age\, and batti
 ng average\, so a knowledge base is complete (with respect to baseball pla
 yers) if every entity has facts for those three relations.\n\nSecond\, an 
 investigation into fact extraction from unstructured text. I will present 
 a method for creating distant (weak) supervision labels for training a lar
 ge-scale relation extraction system. I will also discuss the effectiveness
  of neural network approaches by decoupling the model architecture from th
 e feature design of a state-of-the-art neural network system. Surprisingly
 \, a much simpler classifier trained on similar features performs on par w
 ith the highly complex neural network system (with a 75x reduction in trai
 ning time)\, suggesting that the features are a bigger contributor to the 
 final performance.\n\nFinally\, I will present the Fact Extraction and VER
 ification (FEVER) dataset and challenge. The dataset comprises more than 1
 85\,000 human-generated claims extracted from Wikipedia pages. False claim
 s were generated by mutating true claims in a variety of ways\, some of wh
 ich were meaning-altering. During the verification step\, annotators were 
 required to label a claim for its validity and also supply full-sentence t
 extual evidence from (potentially multiple) Wikipedia articles for the lab
 el. With FEVER\, we aim to help create a new generation of transparent and
  interpretable knowledge extraction systems.
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR
