Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Recent advances in deep learning and automatic speech recognition have raised the accuracy of end-to-end speech recognition to a new level. However, recognition of personal content such as contact names remains a challenge. In this work, we present a personalization solution for an end-to-end system based on connectionist temporal classification (CTC). Our solution uses a class-based language model, in which a general language model provides modeling of the context for named entity classes, and personal named entities are compiled in a separate finite state transducer. We further introduce a phoneme-to-wordpiece model to map rare named entities to more frequent homophonic wordpieces, and also wordpiece prior normalization to bias for rare wordpieces, leading to another 48.9% relative improvement in personal named entity accuracy on top of an already personalized baseline. This work allows our systems to match highly competitive personalized hybrid systems on personal named entity recognition.
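
To make the idea of wordpiece prior normalization more concrete, the sketch below shows one common way such a normalization can be done: subtracting a scaled log prior from each frame's log posterior, so that rare wordpieces are penalized less for being rare. This is only an illustration under stated assumptions and not the formulation used in this work. The function name `normalize_by_prior`, the scale `alpha`, the two-wordpiece vocabulary, and all probabilities are hypothetical, and the CTC blank symbol is ignored for simplicity.

```python
import math


def normalize_by_prior(log_posteriors, log_priors, alpha=1.0):
    """Return per-frame scores log p(w | x) - alpha * log p(w).

    log_posteriors: list of frames, each a list of log posteriors per wordpiece.
    log_priors: list of log prior probabilities, one per wordpiece.
    alpha: hypothetical scale; 0.0 leaves the posteriors unchanged.
    """
    return [
        [lp - alpha * prior for lp, prior in zip(frame, log_priors)]
        for frame in log_posteriors
    ]


def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical vocabulary: index 0 is a frequent wordpiece, index 1 a rare one.
log_priors = [math.log(0.9), math.log(0.1)]

# One acoustic frame in which the frequent wordpiece narrowly wins.
log_posteriors = [[math.log(0.6), math.log(0.4)]]

raw_choice = argmax(log_posteriors[0])
normalized = normalize_by_prior(log_posteriors, log_priors, alpha=1.0)
normalized_choice = argmax(normalized[0])
```

In this toy frame the raw posteriors favor the frequent wordpiece, while the normalized scores favor the rare one, because dividing by the prior removes the advantage that the frequent wordpiece gets from frequency alone. In practice a scale such as `alpha` would need tuning so that rare wordpieces are not over-boosted.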
