The dynamic password database should contain words that are difficult to synthesize and pronounce. Analysis of speech intelligibility and semantics is also important. The analyzed signal should not be too smooth, nor should it contain unnatural noise or abrupt interruptions and changes in signal level. The user's voiceprint needs to be updated regularly. Identification features should include the emotional state and cepstral characteristics of the voice. In this paper, we analyze existing speech synthesis technologies and the most promising attack detection methods for banking and financial organizations.
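As an illustration of the signal-level check mentioned above, the following is a minimal sketch, not the method used in the paper. It measures the short-time RMS level of non-overlapping frames and flags frame boundaries where the level jumps sharply. The function names, the 25 ms frame length, the 30 dB jump threshold, and the synthetic test signal are all assumptions chosen for the example.

```python
import math

def frame_levels_db(samples, frame_len, floor_db=-100.0):
    """Return the RMS level in dB (relative to amplitude 1.0) of each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Silent frames would give log10(0); clamp them to an assumed floor instead.
        level = 20.0 * math.log10(rms) if rms > 0.0 else floor_db
        levels.append(max(level, floor_db))
    return levels

def abrupt_level_changes(levels, max_jump_db=30.0):
    """Return indices i where the level differs from frame i-1 by more than max_jump_db."""
    return [i for i in range(1, len(levels))
            if abs(levels[i] - levels[i - 1]) > max_jump_db]

# Hypothetical test signal: a 200 Hz tone, a hard cut to silence, then the tone again.
SAMPLE_RATE = 16000          # assumed sampling rate in Hz
FRAME_LEN = 400              # 25 ms at 16 kHz (assumed frame length)
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE)
        for n in range(3 * FRAME_LEN)]
gap = [0.0] * (2 * FRAME_LEN)
signal = tone + gap + tone

levels = frame_levels_db(signal, FRAME_LEN)
print(abrupt_level_changes(levels))  # [3, 5]: start and end of the interruption
```

A real system would need a threshold tuned on genuine recordings, since natural speech also contains pauses and plosives that change the level quickly.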
![register cepstral voices](https://ars.els-cdn.com/content/image/1-s2.0-S1746809421000124-ga1.jpg)
This paper considers methods of countering speech synthesis attacks on voice biometric systems in banking. The security of voice biometrics is a large-scale problem that has grown significantly over the past few years. Automatic speaker verification (ASV) systems are vulnerable to various types of spoofing attacks: impersonation, replay attacks, voice conversion, and speech synthesis attacks. Speech synthesis attacks are the most dangerous because speech synthesis technologies (GAN, unit selection, RNN, etc.) are developing rapidly. Anti-spoofing approaches can be based on searching for phase and tone frequency anomalies that appear during speech synthesis and on prior knowledge of the acoustic differences between specific speech synthesizers. ASV security remains an unsolved problem because there is no universal solution that is independent of the speech synthesis method used by the attacker.
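To make the idea of tracking tone frequency concrete, here is a minimal sketch that estimates the fundamental frequency of a single frame from its autocorrelation peak. It assumes that "tone frequency" refers to the fundamental frequency (pitch). The function name, the 60 to 400 Hz search range, and the synthetic frame are assumptions for the example, and the paper does not state that it uses this estimator.

```python
import math

def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame from its autocorrelation peak."""
    min_lag = int(sample_rate / fmax)                       # shortest period searched
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)  # longest period searched
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # Return 0.0 when no positive peak exists (e.g. silence).
    return sample_rate / best_lag if best_lag else 0.0

# Hypothetical 40 ms frame of a pure 200 Hz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(640)]
pitch = estimate_pitch_hz(frame, SAMPLE_RATE)
print(pitch)
```

Applied frame by frame, such an estimate yields a pitch contour. Deciding what counts as an anomaly in that contour (for example, an unnaturally flat or discontinuous track) would require thresholds learned from genuine and synthesized speech, which this sketch does not attempt.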