Controlling and Muting Whisper: Universal Acoustic Adversarial Attacks on Speech Foundation Models

Vyas Raina
Monday 19 August 2024, 12:00-13:00
Hybrid: JDB Teaching Room, Engineering Department or Zoom: https://cam-ac-uk.zoom.us/j/81208506346?pwd=htfSCSr9PDFluWw7fJirGOM6c7EbTK.1

If you have a question about this talk, please contact Simon Webster McKnight.

Speech-enabled foundation models, such as the OpenAI Whisper model, are increasingly popular for their ability to perform various tasks beyond automatic speech recognition (ASR) using appropriate prompts. These models, including audio-prompted large language models (LLMs), offer significant flexibility, allowing for tasks like speech transcription and translation. However, this flexibility introduces susceptibility to adversarial attacks that can control the model’s behavior by altering the audio input. In our work, we demonstrate two forms of adversarial control over Whisper. The first form, “controlling Whisper,” shows that it is possible to prepend a short universal adversarial acoustic segment to any input speech signal, overriding the prompt settings of an ASR foundation model. Specifically, we successfully use this segment to force Whisper to always perform speech translation, even when set to perform speech transcription. The second form, “muting Whisper,” exploits Whisper’s use of special tokens in its vocabulary. We propose a method to learn a universal acoustic realization of Whisper’s special token, which, when prepended to any speech signal, causes the model to transcribe only the token, effectively muting the model. Our experiments demonstrate that a universal 0.64-second adversarial audio segment can mute a target Whisper ASR model for over 97% of speech samples and often transfers to new datasets and tasks. Overall, these works highlight the vulnerabilities of multi-tasking speech-enabled foundation models to adversarial attacks, demonstrating significant risks and potential implications for real-world applications.
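The input-side mechanics described in the abstract (one fixed segment prepended to every utterance) can be illustrated with a minimal sketch. It is not the speaker's implementation: the segment here is random placeholder noise rather than a learned adversarial segment, the 0.02 amplitude bound and the helper names `segment_length` and `prepend_segment` are hypothetical, and the 16 kHz sample rate is assumed because that is the rate Whisper takes as input. Learning the segment itself would require gradient access to the model and is not shown.

```python
import random

SAMPLE_RATE_HZ = 16_000   # assumption: Whisper consumes 16 kHz mono audio
SEGMENT_SECONDS = 0.64    # segment duration quoted in the abstract


def segment_length(sample_rate_hz=SAMPLE_RATE_HZ, seconds=SEGMENT_SECONDS):
    """Number of samples in a segment of the given duration."""
    return round(sample_rate_hz * seconds)


def prepend_segment(segment, speech):
    """Return a new waveform: the fixed segment followed by the speech samples."""
    return list(segment) + list(speech)


# Placeholder segment: low-amplitude random noise standing in for a learned
# segment. The amplitude bound of 0.02 is a hypothetical value for illustration.
rng = random.Random(0)
placeholder_segment = [rng.uniform(-0.02, 0.02) for _ in range(segment_length())]

# "Universal" means the same segment is reused, unchanged, for every utterance.
utterances = [
    [0.0] * SAMPLE_RATE_HZ,        # 1 second of silence as a stand-in utterance
    [0.0] * (2 * SAMPLE_RATE_HZ),  # 2 seconds of silence as a stand-in utterance
]
attacked = [prepend_segment(placeholder_segment, u) for u in utterances]
```

Under these assumptions the segment is 10,240 samples long, so each attacked waveform is only 0.64 seconds longer than the original speech.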

This talk is part of the CUED Speech Group Seminars series.
