This work focuses on applying methods and algorithms from speech processing and recognition, and from machine vision, to the design of a system for automatic audio-visual transcription of broadcasts. The resulting audio-visual system was designed and built mainly for transcribing large video databases of TV recordings.
In this work, a system for converting digits to words in almost all Slavic languages is proposed. The system was developed to improve the text corpora that we use to build the lexicon and to train language and acoustic models for Large Vocabulary Continuous Speech Recognition (LVCSR).
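The abstract does not describe how the conversion works, so the following is only a minimal illustrative sketch of digit-to-word expansion in a text corpus. It assumes Slovak as the target language and handles only cardinals from 0 to 999 in their basic (nominative, masculine) form. The lookup tables and the names `number_to_words` and `expand_digits` are hypothetical and are not taken from the proposed system.

```python
import re

# Hypothetical lookup tables for Slovak cardinals in their basic form
# (nominative, masculine); gender and case agreement are ignored here.
UNITS = ["nula", "jeden", "dva", "tri", "štyri", "päť",
         "šesť", "sedem", "osem", "deväť"]
TEENS = ["desať", "jedenásť", "dvanásť", "trinásť", "štrnásť",
         "pätnásť", "šestnásť", "sedemnásť", "osemnásť", "devätnásť"]
TENS = ["", "", "dvadsať", "tridsať", "štyridsať", "päťdesiat",
        "šesťdesiat", "sedemdesiat", "osemdesiat", "deväťdesiat"]
HUNDREDS = ["", "sto", "dvesto", "tristo", "štyristo", "päťsto",
            "šesťsto", "sedemsto", "osemsto", "deväťsto"]


def number_to_words(n: int) -> str:
    """Convert 0 <= n <= 999 to a Slovak cardinal written as one word."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n == 0:
        return UNITS[0]
    hundreds, rest = divmod(n, 100)
    parts = [HUNDREDS[hundreds]]
    if 10 <= rest <= 19:
        # 10..19 have irregular single-word forms.
        parts.append(TEENS[rest - 10])
    else:
        tens, units = divmod(rest, 10)
        parts.append(TENS[tens])
        if units:
            parts.append(UNITS[units])
    # Slovak writes compound cardinals as one word, e.g. "stodvadsaťjeden".
    return "".join(parts)


def expand_digits(text: str) -> str:
    """Replace every standalone 1-3 digit number in text with its words."""
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)
```

Because the sketch always emits the masculine form, "1 kniha" would become "jeden kniha" instead of the correct feminine "jedna kniha". A converter for Slavic languages therefore also has to choose the gender and case of each numeral from its context, which this sketch does not attempt.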
The main research domain of the SpeechLab is speech recognition. In the 1990s we focused on the easier task: the recognition of isolated words and short phrases. Later our focus moved to the most challenging task, continuous speech recognition. Today our in-house systems can recognize isolated words from large vocabularies (tens of thousands of words) in a speaker-independent way, in real time (under 1 s), with a recognition rate higher than 95 %.