BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Controlling and Muting Whisper: Universal Acoustic Adversarial Att
 acks on Speech Foundation Models - Vyas Raina
DTSTART:20240819T110000Z
DTEND:20240819T120000Z
UID:TALK219661@talks.cam.ac.uk
CONTACT:Simon Webster McKnight
DESCRIPTION:Speech-enabled foundation models\, such as the OpenAI Whisper 
 model\, are increasingly popular for their ability to perform various task
 s beyond automatic speech recognition (ASR) using appropriate prompts. The
 se models\, including audio-prompted large language models (LLMs)\, offer 
 significant flexibility\, allowing for tasks like speech transcription and
  translation. However\, this flexibility introduces susceptibility to adve
 rsarial attacks that can control the model's behavior by altering the audi
 o input. In our work\, we demonstrate two forms of adversarial control ove
 r Whisper. The first form\, "controlling Whisper\," shows that it is possi
 ble to prepend a short universal adversarial acoustic segment to any input
  speech signal\, overriding the prompt settings of an ASR foundation model
 . Specifically\, we successfully use this segment to force Whisper to alwa
 ys perform speech translation\, even when set to perform speech transcript
 ion. The second form\, "muting Whisper\," exploits Whisper's use of specia
 l tokens in its vocabulary. We propose a method to learn a universal acous
 tic realization of one of these special tokens\, which\, when prepended
  to any
  speech signal\, causes the model to transcribe only the token\, effective
 ly muting the model. Our experiments demonstrate that a universal 0.64-sec
 ond adversarial audio segment can mute a target Whisper ASR model for over
  97% of speech samples and often transfers to new datasets and tasks. Over
 all\, these works highlight the vulnerabilities of multi-tasking speech-en
 abled foundation models to adversarial attacks\, demonstrating significant
  risks and potential implications for real-world applications.
LOCATION:Hybrid: JDB Teaching Room\, Engineering Department or Zoom: https
 ://cam-ac-uk.zoom.us/j/81208506346?pwd=htfSCSr9PDFluWw7fJirGOM6c7EbTK.1
END:VEVENT
END:VCALENDAR
