Multi-laboratory evaluation of forensic voice comparison systems

Geoff Morrison and I are running a multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01).

There is increasing pressure on forensic laboratories to validate the performance of forensic analysis systems before they are used to assess strength of evidence for presentation in court. Different forensic voice comparison systems may use different approaches, and even among systems using the same general approach there can be substantial differences in operational details. From case to case, the relevant population, speaking styles, and recording conditions can vary widely, but it is common to have relatively poor recording conditions and mismatches between the known- and questioned-speaker recordings. To validate a system intended for use in casework, a forensic laboratory needs to evaluate the degree of validity and reliability of the system under forensically realistic conditions. We have released a set of training and test data representative of the relevant population and reflecting the conditions of an actual forensic voice comparison case, and operational forensic laboratories and research laboratories are invited to use these data to train and test their systems. The details below include the rules for the evaluation, a description of the data, and a description of the evaluation metrics and graphics. The name of the evaluation is forensic_eval_01.

Papers reporting on the results of the evaluation of each system will be published in a Virtual Special Issue (VSI) of Speech Communication.

Details (draft of introductory paper for the VSI).