In current usage, audio biometrics refers to the verification or identification of a person through characteristics of their voice. [1] The term is treated as synonymous with voice biometrics, voice authentication, or voiceprint technology. It describes a security function.

But the term itself suggests something broader.

Audio · Bio · Metrics
Audio: Latin audire — to hear. Sound.
Bio: Greek bios — life. The living body.
Metrics: Greek metron — a measure. Quantifiable data.

Audio biometrics, taken literally, describes the relationship between sound and biological data. Nothing in the term limits it to identity verification, and nothing requires the relationship to flow in only one direction. Sound can be analysed to extract biological information. Biological data can be used to generate sound. Both are audio biometrics.

Even within the current, narrow definition — audio as a source of biological data — distinct fields have already emerged. Voice-based identity verification and voice-based health diagnostics both analyse audio for biological information, but they serve different purposes, operate in different industries, and draw from different research traditions. One asks who are you. The other asks how are you. The fact that two separate disciplines already exist within what is treated as a single category suggests the need for a broader framework.

Add the inverse direction — biological data generating audio — and a third field appears: generative audiobiometrics, encompassing therapeutic sonification [2], adaptive sound environments, the body as musical instrument, and early-stage research into decoding neural signals into speech.

This article proposes a framework that maps the full scope of the field.

Two directions

The relationship between audio and biological data runs in two directions. Each implies different methods, applications, and communities.

The two directions of audio biometrics

Audio → Data · Analysis
Audio is the input. Sounds produced by the body — voice, heartbeat, respiration, coughing, joint movement — are analysed to extract biological information. This direction encompasses two distinct disciplines: audiobiometric verification and audiobiometric diagnostics.

Data → Audio · Generation
Biological data is the input. Physiological signals — heart rate, respiration, EEG, movement — are translated into sound for therapeutic, functional, or expressive purposes. This is generative audiobiometrics.

The framework

The audio biometrics framework

Verification · Audio → Data · Who are you?
Voice-based identity verification, authentication, and identification. Voiceprint matching, liveness detection, anti-spoofing, deepfake defence.

Diagnostics · Audio → Data · How are you?
Detection and monitoring of health conditions through body-produced sound. Vocal biomarkers, cardiac auscultation, respiratory analysis, cough classification.

Generative · Data → Audio · What can the body become?
Translation of biological data into sound. Therapeutic sonification, adaptive environments, biofeedback, EEG-to-speech, the body as instrument.

Analysis: the body as audio source

Audiobiometric verification

Audiobiometric verification is the most commercially developed area and the one that currently owns the term in industry usage. It treats the voice as a biometric identifier — comparable to a fingerprint or iris pattern — and uses it for verification (confirming a claimed identity) or identification (determining identity from a pool). [1]

Each voice carries a signature determined by physiological factors (vocal tract dimensions, vocal fold characteristics, oral cavity shape) and behavioural factors (accent, rhythm, intonation). [3] Systems extract these features to produce a voiceprint — a mathematical representation, not a recording — which is stored and compared against future samples.
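The enroll-and-compare loop can be sketched in a few lines. This is illustrative only: real systems use MFCCs or learned speaker embeddings rather than the toy features here, and the `features`/`verify` helpers and the 0.95 threshold are assumptions for demonstration, not any vendor's pipeline.

```python
import math

def features(samples, frame=256):
    """Toy 'voiceprint': per-frame energy and zero-crossing rate.
    Real systems use MFCCs or learned speaker embeddings."""
    feats = []
    for i in range(0, len(samples) - frame, frame):
        chunk = samples[i:i + frame]
        feats.append(sum(x * x for x in chunk) / frame)          # frame energy
        feats.append(sum(1 for a, b in zip(chunk, chunk[1:])
                         if a * b < 0) / frame)                  # zero-crossing rate
    return feats

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def verify(enrolled_print, samples, threshold=0.95):
    """Accept the claimed identity when the new sample's features
    are close enough to the stored template (the 'voiceprint')."""
    return cosine(enrolled_print, features(samples)) >= threshold
```

Note that only the output of `features(...)` is ever stored: the template is a vector of numbers, not a recording, which is exactly the property the article describes.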

The primary challenge is the advancement of synthetic voice technology. As voice cloning becomes more accessible, the focus has shifted from verification to liveness detection — distinguishing live speech from recorded or synthesised reproductions. [4] This creates a persistent adversarial dynamic between generation and detection.

Audiobiometric diagnostics

Audiobiometric diagnostics is, in one sense, the oldest discipline in the field. Listening to the body for signs of health or illness predates modern technology by centuries — from a hand placed on a chest to feel a heartbeat, to the invention of the stethoscope in 1816, to percussion diagnostics that interpret the body's resonance. The practice of extracting health data from body-produced sound is foundational to medicine itself.

What has changed is the precision and scale of analysis. Machine learning models can now detect patterns in body-produced audio that fall below the threshold of human perception. The scope extends well beyond the voice: cardiac sounds, breathing patterns, cough characteristics, and even joint acoustics are being investigated as diagnostic signals. [5]

Voice remains the most extensively researched source within this discipline. Vocal changes — shifts in pitch variability, speech timing, articulation precision, harmonic-to-noise ratio, and fundamental frequency — have been identified as potential biomarkers for a range of conditions. [5] Parkinson's disease is the most studied case: between 70 and 90 percent of patients experience voice disorders, and these changes have been observed up to five years before motor symptoms reach the clinical threshold. [6, 7] Depression, Alzheimer's disease, frontotemporal dementia, bipolar disorder, and respiratory conditions have also been investigated through vocal analysis. [5, 8]

But the discipline is not limited to voice. Respiratory sound classification — distinguishing healthy breathing from wheezing, crackling, or stridor — is an active area of research. Cough analysis has been explored as a screening tool for conditions including COVID-19, tuberculosis, and asthma. Cardiac auscultation, one of the oldest diagnostic practices in medicine, is being augmented with machine learning for automated detection of murmurs and valve abnormalities.

The common thread is the use of body-produced sound as a non-invasive diagnostic signal. The voice is the most studied source, but it is one source among several.

What connects verification and diagnostics

Both disciplines sit within audiobiometric analysis — both take audio as input and extract biological data as output. The distinction is in purpose. Verification asks who. Diagnostics asks how. But the techniques overlap. A system analysing vocal characteristics for identity verification is already processing the same features that diagnostics uses to detect health conditions. A voice authentication system that detects signs of cognitive decline in a caller's speech is already operating across both disciplines simultaneously.

Generation: biological data as audio source

Generative audiobiometrics

Generative audiobiometrics inverts the analysis direction. Rather than extracting data from sound, it uses biological data to create sound. Heart rate, respiration, galvanic skin response, EEG, and body movement become inputs for audio systems.

This is the least consolidated area of the field. Its applications span domains that do not yet share a common identity.

Biological language. Research on brain-computer interfaces is exploring the decoding of neural signals into speech — restoring communication to patients who have lost the ability to speak. This is among the most consequential applications of the generative direction: biological data in, language out. It is early-stage, but it is being actively pursued across neuroscience and engineering.

Therapeutic sonification. Biofeedback protocols that use sound to reflect a patient's physiological state back to them. Research indicates that musical biofeedback can modulate physiological arousal more effectively than standard sonification or passive listening alone. [9] Heart rate variability, respiration, and skin conductance have all been used as control signals. [10]
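At the core of any such protocol is a mapping from a physiological control signal to an audible parameter. A minimal sketch, assuming heart rate as the control signal and MIDI pitch as the target; the mapping, note range, and function name are assumptions for illustration, not a published protocol:

```python
def sonify_heart_rate(bpm_series, base_note=60, span=12):
    """Map a heart-rate series onto MIDI note numbers so the
    listener hears rising pitch as arousal rises: the session's
    lowest rate maps to base_note, its highest to base_note + span."""
    lo, hi = min(bpm_series), max(bpm_series)
    if hi == lo:
        return [base_note] * len(bpm_series)   # flat signal, flat tone
    return [base_note + round((bpm - lo) / (hi - lo) * span)
            for bpm in bpm_series]
```

For example, `sonify_heart_rate([60, 72, 84])` spans middle C to the C an octave above, returning `[60, 66, 72]`.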

Adaptive experience. Real-time audio environments that respond to the user's biological state — adaptive game soundtracks driven by player physiology, fitness audio that communicates heart rate zones through musical parameters, generative soundscapes that adjust to biometric input from wearables.
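One way an adaptive fitness system might communicate heart-rate zones is by scaling playback tempo with heart-rate reserve. The linear mapping, parameter names, and defaults below are illustrative assumptions, not any product's behaviour:

```python
def playback_tempo(heart_bpm, resting=60, maximum=190,
                   base_tempo=100, spread=40):
    """Scale musical tempo with heart-rate reserve: play at
    base_tempo at resting heart rate, base_tempo + spread at
    maximum heart rate, clamped outside that range."""
    reserve = (heart_bpm - resting) / (maximum - resting)
    reserve = max(0.0, min(1.0, reserve))   # clamp to [0, 1]
    return base_tempo + reserve * spread
```

A wearable would feed live readings into this mapping, so the listener hears effort as tempo without looking at a screen.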

The body as instrument. Artistic and performative applications where biological signals are mapped to sound generation — installations driven by audience biometrics, performances where physiology becomes the compositional source, work at the intersection of data sonification and composition. [11] This may be the least commercially obvious application, but it serves a critical function: it expands the imagination of the field. Ideas that emerge from artistic experimentation routinely find their way into functional applications.

What connects these applications is not their purpose. Restoring speech, managing anxiety, scoring a game, and composing music are very different goals. What connects them is the direction: biological data in, audio out.

Conclusion

Audio biometrics, as currently used, describes one application within a larger field. The term contains a broader definition that the industry has not yet adopted.

This article proposes that the full scope encompasses two branches: audiobiometric analysis and generative audiobiometrics. Analysis divides further into verification and diagnostics — two established disciplines sharing the same direction but asking different questions. Generative audiobiometrics remains a single, emerging space.

These areas are not as separate as they appear. A voice authentication system (verification) that detects signs of cognitive decline in a caller's speech is already operating in diagnostics. A therapeutic audio environment (generative) that adapts to a patient's vocal biomarkers (diagnostics) is drawing from two branches at once. As deepfake audio improves, verification systems will increasingly need to assess not just identity but biological liveness — the real-time physiological signatures that synthetic voices cannot yet replicate. The branches are converging.

A unified framework does not flatten the differences between these domains. It makes the convergence visible — and creates conditions for exchange between fields that have, until now, developed separately.

References

  1. Phonexia, "Voice Biometrics: The Essential Guide." phonexia.com
  2. Hermann, T., Hunt, A. & Neuhoff, J.G. (Eds.) (2011). The Sonification Handbook. Logos Verlag, Berlin.
  3. Picovoice, "What's Voice Biometrics? How Does It Work?" picovoice.ai
  4. ISO/IEC 30107 — Biometric presentation attack detection. International Organization for Standardization.
  5. Fagherazzi, G. et al. (2025). "Listening to the Mind: Integrating Vocal Biomarkers into Digital Health." PMC. pmc.ncbi.nlm.nih.gov
  6. Singh, S. et al. (2025). "Voice-Based Detection of Parkinson's Disease Using Machine and Deep Learning Approaches: A Systematic Review." Bioengineering, 12(11), 1279.
  7. Hlavnička, J. et al. (2017). "Automated analysis of connected speech reveals early biomarkers of Parkinson's disease." Scientific Reports, 7, 12.
  8. Low, D.M. et al. (2024). "A Review of Studies Using Machine Learning to Detect Voice Biomarkers for Depression." J. Technology in Behavioral Science. springer.com
  9. Bergstrom, I. et al. (2013). "Using music as a signal for biofeedback." Int. J. Psychophysiology. sciencedirect.com
  10. Frontiers in Computer Science (2023). "Digital music interventions for stress with bio-sensing: a survey." frontiersin.org
  11. IntechOpen (2026). "Musical Data Sonification: Expanding the Boundaries of Data Representation." intechopen.com
Definition

Audio biometrics: Sound as both a carrier and product of biological data.