As part of the LOCATA Challenge, an extensive data corpus will be released, targeted at sound source localization and tracking in general and at the six challenge tasks in particular. The corpus will be open access, distributed under the Open Data Commons license. It aims to provide a wide range of scenarios encountered in acoustic signal processing, with an emphasis on dynamic scenarios. All recordings contained in the corpus were made in a realistic, reverberant acoustic environment. Ground truth positions, trajectories, and orientations of sources and sensors were obtained by means of an optical tracking system (OptiTrack) that uses 10 infrared cameras to localize and track moving objects. Ground truth positional data will be made available to the participants. Ground truth positions of the sources will be used for the evaluation of the challenge results and released as part of the data corpus after completion of the challenge. Since the OptiTrack system is installed in a single room, all recordings are limited to that room. To ensure different acoustic conditions between recordings, source-sensor distances and angles were varied, resulting in varying Direct-to-Reverberant Ratios (DRRs) across the recordings.
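The DRR quantifies the energy ratio between the direct-path sound and the reverberant tail. The corpus description does not prescribe how the DRR is measured; the following is a minimal sketch, assuming a measured room impulse response and a fixed window around the direct-path peak (the function name and the window length are illustrative, not part of the corpus):

```python
import numpy as np

def drr_db(rir, fs, direct_window_ms=2.5):
    """Direct-to-Reverberant Ratio (dB) of a room impulse response.

    The direct path is taken as a short window centered on the RIR's
    largest peak; everything after that window counts as reverberation.
    """
    peak = int(np.argmax(np.abs(rir)))
    half = int(direct_window_ms * 1e-3 * fs / 2)
    start, end = max(0, peak - half), peak + half + 1
    direct_energy = np.sum(rir[start:end] ** 2)
    reverb_energy = np.sum(rir[end:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)
```

Under this convention, moving a source closer to the array increases the direct-path energy relative to the tail, so the measured DRR rises accordingly.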
Tasks 1 and 2, involving static loudspeakers, are based on the CSTR VCTK database. The VCTK database provides over 400 newspaper sentences spoken by 109 native English talkers, recorded in a semi-anechoic environment at 96 kHz and down-sampled to 48 kHz. Like the challenge corpus, the VCTK database is distributed under the Open Data Commons license and is therefore openly accessible to participants. Tasks 3 to 6 will use speech recordings of live talkers reading randomly selected VCTK sentences. The talkers were equipped with DPA microphones near their mouths to record close-talking speech signals. Participants will be provided with the close-talking speech signals for the development dataset only; the corresponding signals for the evaluation dataset will be released as part of the corpus once the challenge is completed. These recordings are representative of practical challenges, including natural speech inactivity within sentences, sporadic utterances, and dialogues between talkers.
Acoustic Sensor Configurations
The following microphone arrays were used for the recordings:
- Planar microphone array with 15 microphones, comprising several uniform linear sub-arrays
- 32-channel spherical Eigenmike by the manufacturer mh-acoustics
- 12-channel pseudo-spherical microphone array integrated into the prototype head of the humanoid robot NAO
- Binaural recordings from a pair of hearing-aid dummies (Siemens Signia) mounted on a dummy head (HEAD acoustics).
These recordings are representative of practical challenges, including variations in the orientation, position, and speed of both the microphone arrays and the talkers. Measurements of head-related transfer functions (HRTFs) and information about the room acoustics (e.g., T60, DRR) are not provided to the participants, in order to stimulate the development of algorithms that require a minimum of a priori information.
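Because ground truth positions and orientations of both sources and sensors are supplied by the optical tracking system, localization results can be scored against direction-of-arrival angles expressed in the array's local frame. A minimal sketch of that conversion, assuming positions are given as 3-D world coordinates and the array orientation as a 3x3 rotation matrix (the function and variable names are illustrative, not the challenge's evaluation code):

```python
import numpy as np

def ground_truth_doa(src_pos, array_pos, array_rot):
    """Azimuth and elevation (radians) of a source in the array's frame.

    src_pos, array_pos : 3-vectors in world coordinates.
    array_rot          : 3x3 rotation matrix mapping array-frame
                         vectors into the world frame.
    """
    # Express the source direction in the array's local coordinates.
    v = array_rot.T @ (np.asarray(src_pos) - np.asarray(array_pos))
    azimuth = np.arctan2(v[1], v[0])
    elevation = np.arctan2(v[2], np.hypot(v[0], v[1]))
    return azimuth, elevation
```

For moving arrays and talkers, this conversion would be applied per time frame, using the tracked position and orientation at each frame.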