BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Resilient Machine Learning: A Systems-Security Perspective - Roei 
 Schuster\, Cornell Tech
DTSTART:20211026T130000Z
DTEND:20211026T140000Z
UID:TALK162613@talks.cam.ac.uk
CONTACT:Jack Hughes
DESCRIPTION:The security and privacy of ML-based systems are becoming incr
 easingly difficult to understand and control\, as subtle information-flow 
 dependencies unintentionally introduced by the use of ML expose new attack
  surfaces in software. We will first present select case studies on data l
 eakage and poisoning in NLP models that demonstrate this problem. We will 
 then conclude by arguing that current defenses are insufficient\, and that
  this calls for novel\, interdisciplinary approaches that combine foundati
 onal tools of information security with algorithmic ML-based solutions.\n\
 nWe will discuss leakage in common implementations of nucleus sampling ---
  a popular approach for generating text\, used for applications such as te
 xt autocompletion. We show that the series of nucleus sizes produced by an
  autocompletion language model uniquely identifies its natural-language in
 put. Unwittingly\, common implementations leak nucleus sizes through a sid
 e channel\, thus leaking what text was typed\, and allowing an attacker to
  de-anonymize it.\n\nNext\, we will present data-poisoning attacks on lang
 uage-processing models that must train on "open" corpora originating in ma
 ny untrusted sources (e.g. Common Crawl). We will show how an attacker can
  modify training data to "change word meanings" in pretrained word embeddi
 ngs\, thus controlling outputs of downstream task solvers (e.g. NER or word-
 to-word translation)\, or poison a neural code-autocompletion system\, so 
 that it starts making attacker-chosen insecure suggestions to programmers 
 (e.g. to use insecure encryption modes). This code-autocompletion attack c
 an even target specific developers or organizations\, while leaving others
  unaffected.\n\nFinally\, we will briefly survey existing classes of defen
 ses against such attacks\, and explain that they are critically insufficie
 nt: they provide only partial protection\, and real-world ML practitioners
 lack the tools to tell whether and how to deploy them. This calls for new
  approaches\, guided by fundamental information-security principles\, that
  analyze the security of ML-based systems in an end-to-end fashion\, and
  make the existing defense arsenal practical to deploy.\n\nBio: Roei Schuste
 r is a computer science PhD candidate\, advised by Eran Tromer. For the pa
 st 4 years\, he has been a researcher at Cornell Tech\, where he is hosted
  by Vitaly Shmatikov. Previously\, he completed his B.Sc. in computer scie
 nce at the Technion\, and worked as a researcher in the information securi
 ty industry.\n\nRECORDING: Please note\, this event will be recorded and
  will be available after the event for an indeterminate period under a
  CC BY-NC-ND license. Audience members should bear this in mind before
  joining the webinar or asking questions.
LOCATION:Webinar - link on the talks.cam page after 12 noon on Tuesday
END:VEVENT
END:VCALENDAR
