Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

Recognizing user-defined flexible keywords in real time is challenging because
the keyword is represented in text. In this work, we propose a novel architecture
to efficiently detect flexible keywords based on the following ideas. We construct the representative acoustic embedding of a keyword using grapheme-to-phone conversion. The phone-to-embedding conversion is done by looking up an embedding dictionary, which is built by averaging the corresponding embeddings (from the audio encoder) of each phone during training. The key benefit of our approach is that the text embedding and the audio embedding lie in the same space; hence their comparison is semantically more accurate than when an independent text encoder is employed. Therefore, we adopt nearest neighbor search in the embedding space to find the most likely keyword from the user-defined flexible keyword list.
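
The pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation. The toy grapheme-to-phone table, the 3-dimensional vectors, the mean pooling of phone embeddings into a single keyword embedding, and the use of cosine similarity for the nearest neighbor search are all assumptions made for illustration; the abstract does not specify how phone embeddings are combined or which distance is used.

```python
import math
from collections import defaultdict


def build_phone_dictionary(labeled_embeddings):
    """Average the audio-encoder embeddings observed for each phone label."""
    sums = {}
    counts = defaultdict(int)
    for phone, vec in labeled_embeddings:
        if phone not in sums:
            sums[phone] = [0.0] * len(vec)
        sums[phone] = [s + v for s, v in zip(sums[phone], vec)]
        counts[phone] += 1
    return {p: [s / counts[p] for s in total] for p, total in sums.items()}


def keyword_embedding(phones, phone_dict):
    """Look up each phone and mean-pool (pooling choice is an assumption)."""
    vectors = [phone_dict[p] for p in phones]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_keyword(audio_vec, keyword_vecs):
    """Return the keyword whose embedding is most similar to the audio embedding."""
    return max(keyword_vecs, key=lambda k: cosine(audio_vec, keyword_vecs[k]))


# Hypothetical grapheme-to-phone table standing in for a real G2P model.
TOY_G2P = {"hi": ["HH", "AY"], "go": ["G", "OW"]}

# Hypothetical (phone, audio embedding) pairs collected during training.
TRAINING = [
    ("HH", [1.0, 0.0, 0.0]),
    ("HH", [0.8, 0.2, 0.0]),
    ("AY", [0.0, 1.0, 0.0]),
    ("G", [0.0, 0.0, 1.0]),
    ("OW", [0.5, 0.0, 0.5]),
]

phone_dict = build_phone_dictionary(TRAINING)
keyword_vecs = {w: keyword_embedding(TOY_G2P[w], phone_dict) for w in TOY_G2P}

# Hypothetical audio-encoder output for an incoming utterance.
query = [0.4, 0.6, 0.05]
best = nearest_keyword(query, keyword_vecs)
print(best)
```

Because the keyword vectors are built from the same audio-encoder embeddings as the query, no separate text encoder is needed, and adding a new keyword only requires a dictionary lookup per phone.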
