10th Speech in Noise Workshop, 11-12 January 2018, Glasgow

Effects of global brightness on salience and auditory foreground perception

Francesco Tordini(a)
McGill University, Centre for Interdisciplinary Research in Music Media and Technology, Montreal, Canada

Albert S. Bregman
McGill University, Department of Psychology, Montreal, Canada

Jeremy R. Cooperstock
McGill University, Department of Electrical and Computer Engineering, Montreal, Canada

(a) Presenting

The word salience describes the attention-grabbing quality of a sound. A salient sound stands out in the presence of other competing sounds, which become the background of the scene. The concept of salience is ubiquitous in the auditory sciences.

However, despite the emphasis placed on the topic, computational models of auditory salience are far from mature. Progress in the field is hampered by a lack of agreement on behavioral paradigms that truly probe the involuntary, stimulus-driven nature of salience. As a consequence, few standard datasets are available to the research community. Moreover, current approaches reduce salience modeling to change detection, that is, to matching a human listener's detection responses to brief auditory events.

We propose to distinguish between the salience of sound events and that of streams, and we introduce a paradigm to study the latter using repetitive patterns in a competitive, spatial scenario.

We suggest that global descriptors of perceptual features can be used to characterize streams and predict their salience and, hence, the perceptual organization of the auditory scene into foreground and background.
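The abstract does not state how a global brightness descriptor is computed. A common acoustic correlate of brightness is the spectral centroid (the magnitude-weighted mean frequency of the spectrum), so the following is a minimal sketch, assuming that the brightness of a stream can be summarized as the mean spectral centroid over its non-silent frames. The function names, frame length, hop size, energy floor, and the tone-burst test stimulus are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) of one Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0.0:
        return 0.0  # all-zero frame: centroid undefined, report 0 Hz
    return float((freqs * magnitudes).sum() / total)


def global_brightness(signal: np.ndarray, sample_rate: float,
                      frame_len: int = 1024, hop: int = 512,
                      energy_floor: float = 1e-6) -> float:
    """Hypothetical global descriptor: mean centroid over non-silent frames.

    Frames whose mean power falls below `energy_floor` (e.g. the gaps
    between the events of a repetitive pattern) are skipped so that
    silence does not drag the average toward 0 Hz.
    """
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        if np.mean(frame ** 2) < energy_floor:
            continue
        centroids.append(spectral_centroid(frame, sample_rate))
    return float(np.mean(centroids)) if centroids else 0.0


def tone_pattern(freq: float, sample_rate: int = 16000, tone_s: float = 0.1,
                 gap_s: float = 0.1, repeats: int = 5) -> np.ndarray:
    """Hypothetical stimulus: Hann-shaped tone bursts separated by silence."""
    n_tone = int(tone_s * sample_rate)
    t = np.arange(n_tone) / sample_rate
    burst = 0.5 * np.sin(2 * np.pi * freq * t) * np.hanning(n_tone)
    gap = np.zeros(int(gap_s * sample_rate))
    return np.tile(np.concatenate([burst, gap]), repeats)


if __name__ == "__main__":
    sr = 16000
    # The 4 kHz pattern should receive the larger (brighter) descriptor.
    dull = global_brightness(tone_pattern(500.0, sr), sr)
    bright = global_brightness(tone_pattern(4000.0, sr), sr)
    print(f"500 Hz pattern: {dull:.0f} Hz, 4 kHz pattern: {bright:.0f} Hz")
```

Averaging per-frame centroids is only one way to turn frame-level measurements into a single per-stream value; a median or an energy-weighted mean would be equally reasonable, and the abstract does not say which, if any, was used. The bursts are given smooth envelopes because abrupt onsets spread energy across the spectrum and would bias a magnitude-weighted centroid upward.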

However, many independent features can be used to describe sounds. Our hypothesis, which we consider biologically and computationally plausible, is that only a few of them cause a sound object to stand out from the scene. Adopting a data-driven approach, we select the features that emerge from analyzing data collected with our competitive streaming paradigm, using both feature-rich natural sounds, such as bird chirps and human voices, and synthetic sounds. The result is a parsimonious account of the rules guiding foreground formation: after loudness, tempo and brightness are the dimensions with the highest priority.

Our data show that, under equal-loudness conditions, patterns with a fast tempo and those in certain "preferred" brightness bands tend to emerge from the scene. Moreover, the interaction between tempo and brightness in foreground selection appears to increase with scene complexity or task difficulty.

We propose to use the relations we uncovered as the foundation of a computational model of foreground selection and as a basis for recommendations for sonic information design, particularly the design of auditory warning systems.

Last modified 2017-11-17 15:56:08