[Paper translation] The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss


Original paper: https://arxiv.org/pdf/2310.05799v1


The First Cadenza Signal Processing Challenge: Improving Music for Those With a Hearing Loss

Abstract

The Cadenza project aims to improve the audio quality of music for those who have a hearing loss. This is being done through a series of signal processing challenges, to foster better and more inclusive technologies. In the first round, two common listening scenarios are considered: listening to music over headphones, and with a hearing aid in a car. The first scenario is cast as a demixing-remixing problem, where the music is decomposed into vocals, bass, drums and other components. These can then be intelligently remixed in a personalized way, to increase the audio quality for a person who has a hearing loss. In the second scenario, music is coming from car loudspeakers, and the music has to be enhanced to overcome the masking effect of the car noise. This is done by taking into account the music, the hearing ability of the listener, the hearing aid and the speed of the car. The audio quality of the submissions will be evaluated using the Hearing Aid Audio Quality Index (HAAQI) for objective assessment and by a panel of people with hearing loss for subjective evaluation.

Keywords

hearing loss, hearing aids, inclusive music, music quality, machine learning, signal processing, challenge

1. Introduction

There are many causes of hearing loss, including congenital hearing loss, chronic middle ear infections, noise exposure and age-related hearing loss [1]. The World Health Organization estimates that over 1.5 billion people worldwide have hearing loss. This is projected to rise to 2.5 billion by 2050. In the UK, nearly 12 million people (1 in 5) have hearing loss, with more than 40% of cases affecting people over 50 years old, and this figure rises to more than 70% for people over 70 years old [2].

Hearing loss can have a major impact on a person's quality of life, making it difficult to communicate, participate in social activities, and enjoy music. Despite this, only 40% of people who could benefit from hearing aids actually have them and use them often enough. This is partly because people perceive hearing aids as performing poorly [3, 4, 5, 6] or find little benefit in using them [7, 8]. Historically, hearing aids have focused on speech communication. However, music listening is also important as it benefits health and well-being. Music is a universal human phenomenon that exists in many contexts and has a powerful impact on our emotions [9]. Hearing loss can make it difficult to appreciate music. To take two examples, it can affect the ability of listeners to pick out the lyrics and melody lines, as well as to hear the high frequencies that give the music its richness and detail. As a result, music can sound dull, which can lead to disengagement from music.

There are several spectro-temporal differences between speech and music that make hearing aids optimised for speech perform poorly for music [10]. Although manufacturers have been developing special programs for music listening, the effectiveness has been mixed, with 68% of users reporting difficulty listening to music through their hearing aids [11]. There is a pressing need for better and more inclusive technology to make music more accessible to people with hearing loss.

The Cadenza project aims to improve the audio quality of music for people with hearing loss who wear hearing aids, using signal processing and machine learning challenges. These challenges are designed to bring together various research communities to make music more accessible and inclusive, taking into consideration the diversity of listeners. In the first round (CAD1), we focused on two common scenarios for listening to music: over headphones and in a car in the presence of noise. First, we introduce the general structure and design of the CAD1 challenge. Sections 3 and 4 describe the specifics of Task 1 and Task 2. We conclude in Section 6. More details can be found on the challenge website.

2. Overview of the Challenge Tasks

In the first scenario (Task 1), a person with hearing loss listens to music through headphones without using their hearing aids. In the second scenario (Task 2), the listener is inside a moving car, listening to the music that is coming from the car stereo in the presence of noise, while wearing their hearing aids. Entrants to the challenges are tasked with personalizing the music signals to improve the audio quality.

Figure 1 shows a diagram with the general structure of the challenges. Entrants must develop a Music Enhancer that takes in clean music and listener characteristics as input. The Evaluation Processor then takes the improved music signals, applying any acoustic conditions that are relevant to the task. Finally, the signals are evaluated using the Hearing Aid Audio Quality Index (HAAQI) [12] and a listener panel.
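The flow described above can be sketched as three pluggable stages. This is only an illustration: the function names (`run_pipeline`, `toy_enhancer`, `toy_processor`, `toy_metric`) are hypothetical, signals are plain lists of samples, and the toy metric is a stand-in that has nothing to do with how HAAQI is actually computed.

```python
from typing import Callable, List, Sequence

Signal = List[float]            # mono samples, for illustration only
Thresholds = Sequence[float]    # audiogram thresholds in dB HL


def run_pipeline(
    music: Signal,
    audiogram: Thresholds,
    enhancer: Callable[[Signal, Thresholds], Signal],
    evaluation_processor: Callable[[Signal], Signal],
    metric: Callable[[Signal, Signal, Thresholds], float],
) -> float:
    """Enhance the music, apply task-specific conditions, then score it."""
    enhanced = enhancer(music, audiogram)
    presented = evaluation_processor(enhanced)
    return metric(presented, music, audiogram)


def toy_enhancer(music: Signal, audiogram: Thresholds) -> Signal:
    # Hypothetical personalisation: louder for larger hearing losses.
    gain = 1.0 + max(audiogram) / 100.0
    return [gain * x for x in music]


def toy_processor(signal: Signal) -> Signal:
    # Stand-in for task-specific acoustic conditions: hard clipping.
    return [max(-1.0, min(1.0, x)) for x in signal]


def toy_metric(processed: Signal, reference: Signal, audiogram: Thresholds) -> float:
    # Placeholder intrusive score in (0, 1]; NOT the HAAQI algorithm.
    err = sum(abs(p - r) for p, r in zip(processed, reference)) / len(reference)
    return 1.0 / (1.0 + err)


music = [0.0, 0.5, -0.5]
score_flat = run_pipeline(music, [0] * 8, toy_enhancer, toy_processor, toy_metric)
score_loss = run_pipeline(music, [50] * 8, toy_enhancer, toy_processor, toy_metric)
```

Keeping the enhancer separate from the evaluation processor mirrors the split in the text: entrants control only the first stage, while the acoustic conditions and scoring are fixed by the organisers.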

2.1. Listener characterisation databases

Listeners are characterised by bilateral pure-tone audiograms. These give the audible thresholds at standardised frequencies ([250, 500, 1000, 2000, 3000, 4000, 6000, 8000] Hz) as measured by an audiometer [13]. While a wider frequency range might have been useful for music, we were restricted to this range because these are the standard frequencies that have been tested in the available databases.
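As a rough illustration of what such a listener description might look like in code, the sketch below stores one threshold per standard frequency for each ear. The `Audiogram` class and the example threshold values are made up for this example and are not taken from the challenge data or software.

```python
from dataclasses import dataclass
from typing import Tuple

# Standardised audiogram frequencies listed in the text (Hz).
AUDIOGRAM_FREQS_HZ = (250, 500, 1000, 2000, 3000, 4000, 6000, 8000)


@dataclass(frozen=True)
class Audiogram:
    """Hypothetical container for a bilateral pure-tone audiogram (dB HL)."""

    left_db_hl: Tuple[float, ...]
    right_db_hl: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Each ear needs exactly one threshold per standard frequency.
        for ear in (self.left_db_hl, self.right_db_hl):
            if len(ear) != len(AUDIOGRAM_FREQS_HZ):
                raise ValueError("one threshold per standard frequency is required")

    def threshold(self, ear: str, freq_hz: int) -> float:
        """Return the threshold for 'left' or 'right' at a standard frequency."""
        levels = self.left_db_hl if ear == "left" else self.right_db_hl
        return levels[AUDIOGRAM_FREQS_HZ.index(freq_hz)]


# Invented values, sloping towards the high frequencies.
example = Audiogram(
    left_db_hl=(20, 25, 30, 40, 45, 55, 60, 65),
    right_db_hl=(15, 20, 30, 35, 45, 50, 60, 70),
)
```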


Figure 1: Diagram of the structural design of the first challenge

For the training (train) set, we used the 83 audiograms employed by the 2nd Clarity Enhancement Challenge [14] from the Clarity Project. These correspond to anonymised examples of real audiograms drawn from the dataset of the Scottish Section of Hearing Sciences at the University of Nottingham.

For the development (dev) set, we selected 50 audiograms from [15]. We first filtered the audiograms to those with a better-ear 4-frequency hearing loss between 20 and 75 dB. We then randomly chose the necessary number of audiograms to maintain the same distribution per band as in the original Clarity dataset. This dev set has an equal male-female distribution.
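The filtering step could look roughly like the sketch below. It rests on assumptions not stated in the text: that the 4-frequency hearing loss is the mean threshold at 500, 1000, 2000 and 4000 Hz, that the better ear is the one with the lower mean, and that both limits are inclusive. The function names are hypothetical.

```python
FREQS_HZ = (250, 500, 1000, 2000, 3000, 4000, 6000, 8000)
# Assumption: the four frequencies commonly used for this kind of average.
FOUR_FREQS_HZ = (500, 1000, 2000, 4000)


def four_freq_average(levels_db_hl):
    """Mean threshold (dB HL) over the four assumed frequencies for one ear."""
    picked = [levels_db_hl[FREQS_HZ.index(f)] for f in FOUR_FREQS_HZ]
    return sum(picked) / len(picked)


def better_ear_average(left_db_hl, right_db_hl):
    """The better ear is taken to be the one with the lower average."""
    return min(four_freq_average(left_db_hl), four_freq_average(right_db_hl))


def keep_for_dev(left_db_hl, right_db_hl, low_db=20.0, high_db=75.0):
    """True if the better-ear average lies within [low_db, high_db]."""
    return low_db <= better_ear_average(left_db_hl, right_db_hl) <= high_db


# Invented audiograms for illustration.
left = (20, 25, 30, 40, 45, 55, 60, 65)
right = (15, 20, 30, 35, 45, 50, 60, 70)
near_normal = (10,) * 8
```

The subsequent random draw that matches the per-band distribution of the Clarity dataset is not sketched here, since the text does not define the bands.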

For the evaluation (eval) set, we recruited 52 bilateral hearing aid users, with symmetric or asymmetric hearing loss. The listeners were recruited via the University of Leeds. Hearing loss severity was mild for 15 listeners, moderate for 17 listeners, moderately severe for 18 listeners and severe for 2 listeners. In the train, dev and eval sets, hearing loss levels, at each frequency, were limited to 80 dB Hearing Level (HL).

2.2. Challenge Evaluations

Both scenarios will be subjected to two evaluation processes. The first is an objective evaluation using HAAQI. This is an intrusive metric in which the processed and reference signals are compared. In the evaluation, the HAAQI function is configured so that the reference signal has an amplification applied to it, so that all frequency bands contribute equally to its loudness. This amplification is the NAL-R hearing aid prescription [16]. This prescribes the gain to apply based on the individual’s audiogram thresholds (in dB HL). This linear amplification improves audibility; there is no dynamic range compression.
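To make the idea of a linear, audiogram-based prescription concrete, here is a minimal sketch of the NAL-R rule in its commonly cited form: the gain at each frequency is X + 0.31·H(f) + k(f), where H(f) is the threshold in dB HL and X = 0.05·(H(500) + H(1000) + H(2000)). The constants k(f) below are quoted from memory of the published rule and should be checked against [16]. Clamping negative gains to zero is an assumption of this sketch, 8000 Hz is left out because the classic table does not cover it, and none of this is claimed to match the configuration used inside the challenge's HAAQI code.

```python
# Frequency-dependent constants k(f) in dB (to be verified against [16]).
NALR_K_DB = {
    250: -17.0,
    500: -8.0,
    1000: 1.0,
    2000: -1.0,
    3000: -2.0,
    4000: -2.0,
    6000: -2.0,
}


def nalr_gains(thresholds_db_hl):
    """Return a dict of insertion gains (dB) per frequency for one ear.

    thresholds_db_hl maps frequency in Hz to the threshold in dB HL.
    """
    # X term: 0.05 times the sum of thresholds at 500, 1000 and 2000 Hz.
    x = 0.05 * (
        thresholds_db_hl[500] + thresholds_db_hl[1000] + thresholds_db_hl[2000]
    )
    gains = {}
    for freq_hz, k in NALR_K_DB.items():
        gain = x + 0.31 * thresholds_db_hl[freq_hz] + k
        gains[freq_hz] = max(gain, 0.0)  # assumption: never attenuate
    return gains


# Invented audiogram for one ear.
thresholds = {250: 20, 500: 40, 1000: 40, 2000: 40, 3000: 50, 4000: 60, 6000: 60}
gains = nalr_gains(thresholds)
```

Because the gain depends only on the audiogram and not on the signal level, the same amplification applies to quiet and loud passages, which is what the text means by there being no dynamic range compression.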

The second evaluation consists of a listener panel of 52 listeners (the eval listeners) who will rate the audio quality of the music samples. The panel will use a number of scales: clarity, harshness, distortion, frequency balance, overall audio quality, and liking. Overall audio quality captures whether the audio quality is poor or good, and liking is how much the listener liked the specific piece they just listened to. These dimensions have been developed for this purpose, through a sensory evaluation study [17].

3. Design of Task 1

This is presented as a demixing-remixing problem. The demixing stage follows the same design as previous music separation challenges [18, 19]. The goal is to decompose stereo music into vocal, drums, bass, and other (VDBO). However, unlike past demixing challenges, we use HAAQI for the evaluation instead of the signal-to-dis