SIAM News Blog

Do You Trust Your Speech Recognition System?

Fooling Deep Learning with (Inaudible) Audio Perturbations

By Jon Vadillo, Roberto Santana, and Jose A. Lozano

Ever-increasing advances in artificial intelligence systems have given rise to flourishing technologies, such as automatic speech recognition systems, self-driving vehicles, robotics, and computer-aided diagnosis systems. In addition to the availability of large volumes of data and advances in computing systems, the outstanding success of these technologies is primarily due to deep learning models that are capable of learning to perform highly complex tasks from data.

However, these models are not yet fully reliable or trustworthy. A main limitation is their intriguing vulnerability to adversarial examples: inputs that an adversary purposefully manipulates to change the model’s output classification while simultaneously ensuring that the modifications are imperceptible to humans [1, 5]. Figure 1 illustrates an adversarial attack.

Researchers first examined this vulnerability in the context of image classification problems. These studies revealed a surprising instability in models that otherwise possess a remarkable capacity to generalize in regular (non-adversarial) scenarios — i.e., to accurately classify “new” inputs that were not present during the training process, or inputs with small corruptions (like random noise or rotations). The study of this phenomenon has since extended to many other classification scenarios, domains, and tasks, and consistently shows that current models can easily be fooled by imperceptible changes in input.

An Attack for Every Scenario

Recent extensive research has yielded countless attack methods — i.e., procedures that exploit different model vulnerabilities to generate adversarial examples. This wide repertoire enables the generation of attacks for multiple goals and scenarios.

For instance, one can design adversarial examples that force a classification model to produce a specific incorrect class or, alternatively, merely ensure that the predicted class is incorrect (without favoring any particular incorrect class). Most methods achieve these goals by generating perturbations that are specifically optimized for each individual input, since doing so allows the use of simple and highly efficient procedures. Nevertheless, it is also possible to produce universal perturbations that are capable of fooling the model in an input-agnostic fashion, at the cost of higher computational effort and larger amounts of distortion. Universal perturbations allow for the deployment of attacks in real time, since the perturbation is already precomputed.
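
As a concrete illustration of a per-input attack, the sketch below implements the classic fast gradient sign method, one of the simplest gradient-based procedures. It is a minimal, generic example in PyTorch (not the specific method of any reference cited here) and assumes a differentiable classifier `model` and an \(\ell_\infty\) distortion budget `eps`.

```python
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, x, y, eps):
    """One-step untargeted attack: perturb x within an L-infinity ball of
    radius eps in the direction that increases the loss on the true label y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x + eps * x_adv.grad.sign()).detach()

def fgsm_targeted(model, x, target, eps):
    """One-step targeted variant: decrease the loss on an attacker-chosen class."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), target)
    loss.backward()
    return (x - eps * x_adv.grad.sign()).detach()
```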

Most strategies assume a scenario wherein the adversary can access all of the model’s information—including its parameters and training details—to produce the perturbations. However, adversaries can also generate adversarial examples when they are only able to observe the model’s output (e.g., the confidence that is assigned to each possible class). This fact opens the door to attacks even in highly constrained practical scenarios.
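
When only those output confidences are observable, the gradient information that most attacks rely on can be approximated simply by querying the model. The toy sketch below estimates such a gradient by two-point finite differences; `predict_proba` is a hypothetical function that returns the model’s class probabilities, and practical black-box attacks use far more query-efficient estimators built on the same principle.

```python
import numpy as np

def estimate_gradient(predict_proba, x, true_class, sigma=1e-3, n_queries=100):
    """Estimate the gradient of the true-class confidence with respect to the
    input using only model queries (no access to parameters or internal gradients)."""
    grad = np.zeros_like(x)
    for _ in range(n_queries):
        u = np.random.randn(*x.shape)  # random probe direction
        delta = (predict_proba(x + sigma * u)[true_class]
                 - predict_proba(x - sigma * u)[true_class]) / (2.0 * sigma)
        grad += delta * u
    return grad / n_queries
```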

Despite the multiple types of attack methods, the creation of adversarial examples is more challenging in certain scenarios than in others. Here we overview the challenges that accompany the generation of inaudible adversarial perturbations in the audio domain; such challenges stem from the difficulty that surrounds the mathematical quantification of the extent to which a perturbation is detectable by humans. We also highlight several (more general) challenges for which theoretical justifications remain elusive or incomplete.

Figure 1. Illustration of an adversarial attack. The addition of an adversarial perturbation slightly modifies the initial (clean) input signal, thus forming an adversarial example. The model incorrectly classifies the adversarial example, even though it is perceptually indistinguishable from the clean input to the human ear. Figure courtesy of the authors.

Challenges in Ensuring the Imperceptibility of Audio Perturbations

The overall goal when designing an adversarial example is to produce an incorrect output. At the same time, a fundamental requirement of an adversarial perturbation is that it be imperceptible to humans. It is therefore of paramount importance to be able to quantify a perturbation’s perceptual distortion (i.e., to what extent humans can or cannot detect the distortion).

Researchers can guarantee the imperceptibility of a perturbation in image data with simple metrics like the Euclidean norm \(\ell_2\) or the maximum norm \(\ell_\infty\) by ensuring that the amount of distortion falls below a sufficiently low threshold (according to these metrics). However, the assessment of human perception is more complex for other data types; such is the case with audio signals, for which more specific distortion metrics are necessary to obtain meaningful measurements [6].
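
In practice, enforcing such a threshold amounts to projecting the perturbation back onto an \(\ell_2\) or \(\ell_\infty\) ball of sufficiently small radius. A minimal NumPy sketch of both projections:

```python
import numpy as np

def project_linf(perturbation, eps):
    """Clip every coordinate so that the maximum-norm distortion stays below eps."""
    return np.clip(perturbation, -eps, eps)

def project_l2(perturbation, eps):
    """Rescale the perturbation so that its Euclidean norm stays below eps."""
    norm = np.linalg.norm(perturbation)
    return perturbation if norm <= eps else perturbation * (eps / norm)
```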

Mathematically modeling the perceptual distortion of audio perturbations is particularly challenging due to the multiple psychoacoustic factors that condition the way in which human auditory systems perceive audio signals [8]. For example, the loudness of a signal (i.e., the perception or “subjective impression” of the sound’s intensity) depends on its frequency composition because the human auditory system is more sensitive to certain frequencies than others [3]. The loudness of two audio signals with different frequencies can thus be markedly different even if both signals are reproduced with the same intensity.
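
A-weighting is a standard (if coarse) engineering approximation of this frequency dependence, derived from equal-loudness contours such as those in [3]; it attenuates the frequencies to which the ear is less sensitive. The sketch below evaluates the standard A-weighting curve purely to illustrate that two tones of equal intensity can differ markedly in perceived loudness.

```python
import numpy as np

def a_weighting_db(f):
    """Standard A-weighting gain (in dB) at frequency f (Hz): roughly 0 dB at
    1 kHz and strongly negative at low frequencies, where the ear is less sensitive."""
    f = np.asarray(f, dtype=float)
    ra = (12194.0**2 * f**4) / (
        (f**2 + 20.6**2)
        * np.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
        * (f**2 + 12194.0**2)
    )
    return 20.0 * np.log10(ra) + 2.0

# A 100 Hz tone is attenuated by roughly 19 dB relative to a 1 kHz tone of equal intensity.
print(a_weighting_db([100.0, 1000.0]))
```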

In addition, a sound can mask another (normally audible) sound with similar frequencies, meaning that the latter becomes inaudible in the presence of the former. This phenomenon is called frequency masking; masking can even occur when the two sounds are not produced simultaneously (e.g., certain spoken sounds can mask subsequent ones), an effect that is known as temporal masking [2].

Due to the subjective nature of psychoacoustics and the fact that the aforementioned factors can vary considerably depending on the characteristics of the signals, it is still difficult to mathematically model precise or general perceptual distortion metrics that are suitable for audio adversarial perturbations. Indeed, current work finds that “conventional” metrics (commonly used in the literature) can yield misleading results when applied to speech signals [6]. Recent advances toward overcoming these limitations include more fine-grained evaluation frameworks that better assess the perceptual distortion of audio adversarial perturbations by considering different signal properties or by evaluating different metrics in different parts of the signal. Nevertheless, further investigation is required to establish suitable metrics for other types of audio signals, such as ambient sounds or music, which might require more specific treatment due to their different characteristics.
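
As one example of a conventional metric that such frameworks refine, the signal-to-perturbation ratio can be computed globally or frame by frame. The sketch below reports both; the frame-wise version can expose locally prominent perturbations that a single global number hides, which is one reason global metrics can mislead for speech.

```python
import numpy as np

def snr_db(signal, perturbation, frame_len=None):
    """Signal-to-perturbation ratio in dB, either over the whole waveform or per
    frame of frame_len samples (higher values indicate a quieter perturbation)."""
    def ratio(s, p):
        return 10.0 * np.log10(np.sum(s**2) / (np.sum(p**2) + 1e-12))
    if frame_len is None:
        return ratio(signal, perturbation)
    n_frames = len(signal) // frame_len
    return np.array([ratio(signal[i * frame_len:(i + 1) * frame_len],
                           perturbation[i * frame_len:(i + 1) * frame_len])
                     for i in range(n_frames)])
```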

Over-the-air Audio Attacks

Researchers often study adversarial examples in digital environments, where they assume that the inputs are stored digitally (e.g., in a computer file) and fed to the model “as is.” Nonetheless, one can craft audio signals that are capable of triggering malicious outputs in a model even when they are reproduced and captured “over the air,” i.e., in a real (physical) environment. However, physically reproducing a signal inevitably introduces changes and corruptions that can suppress the adversarial perturbation and make the attack fail. The attacker must therefore ensure that, despite the possible differences between the reproduced and captured signals, the latter can still fool the model.

Researchers have proposed different attack approaches to overcome this loss of effectiveness, including simulating the various transformations and corruptions that might occur when the audio is reproduced in practice (e.g., reverberations or noise). One can mathematically model these transformations with impulse response functions, which are widely studied in signal processing; the adversarial audio is then optimized to maximize the expected probability of fooling the model even after corruption. At the same time, consideration of such scenarios adds another level of difficulty to the assessment of whether a perturbation is detectable by humans.
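
A minimal sketch of this idea: at each optimization step, the candidate adversarial audio is convolved with a randomly drawn room impulse response and corrupted with noise before the attack loss is evaluated, so the perturbation that emerges is effective on average over plausible physical distortions. The `impulse_responses` bank and the `loss` function here are placeholders.

```python
import numpy as np

def simulate_over_the_air(audio, impulse_responses, noise_std=0.01):
    """Apply one randomly sampled room impulse response (reverberation) plus
    additive noise, approximating what a microphone would actually capture."""
    ir = impulse_responses[np.random.randint(len(impulse_responses))]
    reverberant = np.convolve(audio, ir, mode="same")
    return reverberant + noise_std * np.random.randn(len(reverberant))

def expected_attack_loss(loss, adversarial_audio, impulse_responses, n_samples=16):
    """Monte Carlo estimate of the attack loss averaged over simulated
    reproductions; minimizing this favors perturbations that survive playback."""
    return np.mean([loss(simulate_over_the_air(adversarial_audio, impulse_responses))
                    for _ in range(n_samples)])
```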

What Doesn’t Fool You Makes You Stronger

The ultimate goal of exposing adversarial attack strategies or new vulnerabilities is to alert the community to such flaws; doing so enables the analysis of any risks and therefore the more conscious (and safer) use of these technologies. The discovery of new vulnerabilities also promotes the study of defensive methods and vice versa, resulting in an arms race that leads to improvements in model robustness and effectiveness. However, fooling deep learning models is easier than correcting their vulnerabilities or explaining the causes of those vulnerabilities. Advances in defensive methods thus generally occur at considerably slower rates than advances in offensive methods.

Intriguing Properties with Elusive Theoretical Justifications

Ongoing research on adversarial machine learning remains focused on fundamental open questions, such as why adversarial examples exist in the first place. Researchers have proposed a number of promising hypotheses to address these questions, many of which are based on the study of the decision spaces that the models learn.

Since one can view classification models as functions \(f:X\rightarrow Y\) that map instances from an input space \(X \subseteq \mathbb{R}^d\) to an output class in a finite discrete space \(Y\), a model represents a partition of this input space into classification regions (each region corresponds to one class). Researchers can use this perspective to study adversarial examples via the properties of decision regions, such as the geometry of decision boundaries (the surfaces that separate different regions) around inputs; this geometry determines a model’s adversarial robustness. For instance, the robustness of the model increases when the minimum distance between the inputs and those decision boundaries increases. Recent work has shown that decision boundaries tend to have similar geometries around multiple inputs, demonstrating both theoretically and empirically that a single perturbation can fool the model in an input-agnostic fashion [4, 7]. Even so, the large number of parameters of deep learning classifiers and the high dimensionality of their input spaces mean that a complete characterization of their properties is often intractable, thereby slowing advances in the understanding of these phenomena.
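
To make the notion of distance to the decision boundary concrete, the sketch below estimates, by bisection along a fixed direction, how far an input can move before the predicted class changes; `predict` is a placeholder classifier. Averaging such distances over many inputs and directions gives a crude empirical proxy for the robustness described above.

```python
import numpy as np

def boundary_distance(predict, x, direction, max_radius=10.0, tol=1e-3):
    """Bisection estimate of the distance from x to the decision boundary along
    a unit direction; returns max_radius if no class change occurs within the budget."""
    original = predict(x)
    if predict(x + max_radius * direction) == original:
        return max_radius                  # boundary not crossed within the budget
    lo, hi = 0.0, max_radius
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * direction) == original:
            lo = mid                       # still inside the original region
        else:
            hi = mid                       # crossed into another region
    return hi
```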

Finally, several theoretical works claim that deep learning models are inherently vulnerable to adversarial attacks, and that such vulnerability is the price of maximizing classification performance on regular samples. The extent to which defensive mechanisms, more robust model designs, or better training paradigms can reduce this tradeoff is another critical research question.


References
[1] Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., ... Roli, F. (2013). Evasion attacks against machine learning at test time. In H. Blockeel, K. Kersting, S. Nijssen, & F. Železný (Eds.), Machine learning and knowledge discovery in databases (pp. 387-402). Berlin/Heidelberg: Springer.
[2] Bosi, M., & Goldberg, R.E. (2003). Introduction to digital audio coding and standards. In Springer international series in engineering and computer science (Vol. 721). Boston, MA: Springer.
[3] Fletcher, H., & Munson, W.A. (1933). Loudness, its definition, measurement and calculation. Bell Syst. Tech. J., 12(4), 377-430.
[4] Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. In 2017 IEEE conference on computer vision and pattern recognition (pp. 86-94). Institute of Electrical and Electronics Engineers.
[5] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In International conference on learning representations. OpenReview.
[6] Vadillo, J., & Santana, R. (2022). On the human evaluation of universal audio adversarial perturbations. Comput. Secur., 112, 102495.
[7] Vadillo, J., Santana, R., & Lozano, J.A. (2022). Analysis of dominant classes in universal adversarial perturbations. Knowl. Based Syst., 236, 107719.
[8] Zwicker, E., & Fastl, H. (2013). Psychoacoustics: Facts and models. In Springer series in information sciences (Vol. 22). Berlin/Heidelberg: Springer Science & Business Media.

Jon Vadillo is a Ph.D. student in computer science in the Intelligent Systems Group (ISG) at the University of the Basque Country (UPV/EHU) in Spain. His research interests comprise the vulnerabilities of deep learning models in adversarial scenarios, with a particular focus on the audio domain. Roberto Santana is a tenured researcher in the ISG at UPV/EHU. His research interests encompass the use of probabilistic graphical models in evolutionary algorithms and the application of machine learning methods to problems in bioinformatics and neuroinformatics. Jose A. Lozano is a full professor at UPV/EHU—where he currently leads the ISG—as well as the scientific director of the Basque Center for Applied Mathematics. His research interests include the fields of statistical machine learning and combinatorial optimization.
