Biometrics: A Concise Tutorial

Voice Recognition

As a biometric modality, voice recognition combines physiological and behavioral traits. It identifies a person from the sound of their voice, and it relies on features influenced by two components:

Physiological Component − Physical shape, size, and health of a person’s vocal cords, lips, teeth, tongue, and mouth cavity.
Behavioral Component − Emotional state of the person while speaking, accent, tone, pitch, pace of talking, mumbling, etc.

Voice Recognition System

Voice recognition is also called speaker recognition. At the time of enrollment, the user speaks a word or phrase into a microphone so that the system can acquire a speech sample of the candidate.

An analog-to-digital converter (ADC) converts the electrical signal from the microphone into a digital signal, which is recorded in computer memory as a digitized sample. The computer then compares the candidate’s input voice with the stored digitized voice samples and, if it finds a match, identifies the candidate.
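The digitize-then-match flow described above can be sketched as follows. This is a minimal illustration and not how a production system works: the names `quantize`, `cosine_similarity`, `identify`, and `tone`, the 16-bit depth, the 8000 Hz rate, and the 0.9 threshold are all assumptions made for the example. It also compares raw waveforms directly, whereas practical systems typically compare features extracted from the audio.

```python
import math

def quantize(samples, bits=16):
    """Map amplitudes in [-1.0, 1.0] to signed integers, roughly as an ADC would."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]

def cosine_similarity(a, b):
    """Similarity of two equal-length sample vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify(probe, enrolled, threshold=0.9):
    """Return the best-matching enrolled name, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, template in enrolled.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

def tone(freq_hz, rate_hz=8000, n=400):
    """Synthetic stand-in for a microphone signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]

# Hypothetical enrolled users, each represented by one stored digitized sample.
enrolled = {"alice": quantize(tone(200)), "bob": quantize(tone(320))}

# A quieter recording of the same 200 Hz tone should still match "alice".
probe = quantize([0.8 * s for s in tone(200)])
print(identify(probe, enrolled))  # prints: alice
```

Cosine similarity is used here because it ignores overall loudness, so the quieter probe still matches. The threshold controls how close a probe must be before the system reports an identity at all.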

Voice Recognition Modalities

There are two variants of voice recognition: speaker dependent and speaker independent.

Speaker dependent voice recognition relies on knowledge of the candidate’s particular voice characteristics. The system learns those characteristics through voice training (or enrollment).

The system needs to be trained on each user, so that it becomes accustomed to that user’s accent and tone, before it is employed to recognize what was said.
It is a good option when only one user is going to use the system.
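One simple way to picture the training (enrollment) step is as building a per-user template from several voice samples. The sketch below assumes each sample has already been reduced to a fixed-length list of numeric features. The function name `enroll` and the averaging strategy are illustrative assumptions, not a method stated in this tutorial.

```python
def enroll(feature_vectors):
    """Average several same-length feature vectors into a single user template."""
    if not feature_vectors:
        raise ValueError("at least one enrollment sample is required")
    length = len(feature_vectors[0])
    if any(len(v) != length for v in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    count = len(feature_vectors)
    # Element-wise mean across all enrollment samples.
    return [sum(v[i] for v in feature_vectors) / count for i in range(length)]

# Three hypothetical enrollment samples from the same user.
samples = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
template = enroll(samples)
print(template)  # prints: [2.0, 2.0, 2.0]
```

Averaging several samples smooths out variation between individual utterances, which is one reason enrollment usually asks the user to speak more than once.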

Speaker independent systems can recognize speech from different users by restricting the context of the speech, for example to a limited set of words and phrases. These systems are used in automated telephone interfaces.

They do not require training the system on each individual user.
They are a good choice when different individuals use the system and it is not required to recognize each candidate’s speech characteristics.

Difference between Voice and Speech Recognition

Speaker recognition and speech recognition are often mistaken for the same thing, but they are different technologies. Let us see how:

Speaker Recognition (Voice Recognition)

Its objective is to recognize WHO is speaking.
It is used to identify a person by analyzing their tone, voice pitch, and accent.

Speech Recognition

It aims at understanding and comprehending WHAT was spoken.
It is used in hands-free computing and in map or menu navigation.

Merits of Voice Recognition

It is easy to implement.

Demerits of Voice Recognition

It is sensitive to microphone quality and background noise.
The inability to control the factors affecting the input can significantly decrease system performance.
Some speaker verification systems are also susceptible to spoofing attacks through recorded voice.
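The effect of noise on the input can be quantified with a signal-to-noise ratio (SNR). The sketch below computes SNR in decibels from two lists of samples. The function name `snr_db` and the idea of measuring the noise on its own (for example, during silence) are assumptions for illustration. The tutorial does not prescribe a particular noise measure.

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, from the mean power of each sample list."""
    if not signal or not noise:
        raise ValueError("signal and noise must be non-empty")
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        return math.inf  # no measurable noise
    if p_signal == 0:
        return -math.inf  # no measurable signal
    return 10 * math.log10(p_signal / p_noise)

# A signal with ten times the amplitude of the noise has 100 times the power.
print(round(snr_db([1.0, -1.0], [0.1, -0.1]), 6))  # prints: 20.0
```

A higher SNR means the voice stands out more clearly from the background. A system could, for instance, ask the user to repeat a phrase when the measured value is low.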

Applications of Voice Recognition

Performing telephone and internet transactions.
Working with Interactive Voice Response (IVR)-based banking and health systems.
Applying audio signatures for digital documents.
In entertainment and emergency services.
In online education systems.