Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a new level. E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, LM fusion has certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose integrating an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with a drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
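The abstract does not say how the external model's scores are combined with the E2E system's scores. As a rough illustration of the general idea behind fusion, the sketch below assumes a simple log-linear interpolation applied when rescoring an N-best list, in the style commonly called shallow fusion. The function names, the interpolation weight, and the toy scores are all hypothetical and are not taken from the paper.

```python
import math

def fuse_scores(e2e_logp, ext_logp, weight=0.3):
    """Log-linear interpolation: E2E score plus a weighted external-model score.

    Assumption: this is a generic shallow-fusion style combination,
    not necessarily the method used in the paper.
    """
    return e2e_logp + weight * ext_logp

def rescore(hypotheses, weight=0.3):
    """Return the hypothesis text with the highest fused score.

    hypotheses: list of (text, e2e_logp, ext_logp) tuples, where ext_logp
    is the log-probability assigned by the external model (an AM or an LM).
    """
    return max(hypotheses, key=lambda h: fuse_scores(h[1], h[2], weight))[0]

# Toy numbers, invented purely for illustration.
nbest = [
    ("call john smith", math.log(0.40), math.log(0.10)),
    ("call jon smyth", math.log(0.35), math.log(0.60)),
]

print(rescore(nbest, weight=0.0))  # E2E score alone decides
print(rescore(nbest, weight=0.3))  # external model can change the ranking
```

With a weight of zero the ranking is driven by the E2E score alone; a positive weight lets the external model reorder hypotheses. In practice the weight would be tuned on held-out data.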