Monday, February 3, 2025

A quick guide to Amazon’s papers at Interspeech 2023

Amazon’s papers at Interspeech 2023, sorted by research topic.

Automatic speech recognition

A metric-driven approach to conformer layer pruning for efficient ASR inference
Dhanush Bekal, Karthik Gopalakrishnan, Karel Mundnich, Srikanth Ronanki, Sravan Bodapati, Katrin Kirchhoff

Conmer: Streaming Conformer without self-attention for interactive voice assistants
Martin Radfar, Paulina Lyskawa, Brandon Trujillo, Yi Xie, Kai Zhen, Jahn Heymann, Denis Filimonov, Grant Strimel, Nathan Susanj, Athanasios Mouchtaris

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer
Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff

Distillation strategies for discriminative speech recognition rescoring
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yi Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Effective training of attention-based contextual biasing adapters with synthetic audio for personalised ASR
Burin Naowarat, Philip Harding, Pasquale D’Alterio, Sibo Tong, Bashar Awwad Shiekh Hasan

Human transcription quality improvement
Jian Gao, Hanbo Sun, Cheng Cao, Zheng Du

Learning when to trust which teacher for weakly supervised ASR
Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath (Nath) Chennupati, Andreas Stolcke

Model-internal slot-triggered biasing for domain expansion in neural transducer ASR models
Edie Lu, Philip Harding, Kanthashree Mysore Sathyendra, Sibo Tong, Xuandi Fu, Jing Liu, Feng-Ju (Claire) Chang, Simon Wiesler, Grant Strimel

Multi-view frequency-attention alternative to CNN frontends for automatic speech recognition
Belen Alastruey Lasheras, Lukas Drude, Jahn Heymann, Simon Wiesler

Multilingual contextual adapters to improve custom word recognition in low-resource languages
Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati

PATCorrect: Non-autoregressive phoneme-augmented transformer for ASR error correction
Ziji Zhang, Zhehui Wang, Raj Kamma, Sharanya Eswaran, Narayanan Sadagopan

Personalization for BERT-based discriminative speech recognition rescoring
Jari Kolehmainen, Yi Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Personalized predictive ASR for latency reduction in voice assistants
Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

Record deduplication for entity distribution modeling in ASR transcripts
Tianyu Huang, Chung Hoon Hong, Carl Wivagg, Kanna Shimizu

Scaling laws for discriminative speech recognition rescoring models
Yi Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Selective biasing with trie-based contextual adapters for personalised speech recognition using neural transducers
Philip Harding, Sibo Tong, Simon Wiesler

Streaming speech-to-confusion network speech recognition
Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

Data representation

Don’t stop self-supervision: Accent adaptation of speech representations via residual adapters
Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

Dialogue management

Parameter-efficient low-resource dialogue state tracking by prompt tuning
Mingyu Derek Ma, Jiun-Yu Kao, Shuyang Gao, Arpit Gupta, Di Jin, Tagyoung Chung, Violet Peng

Grapheme-to-phoneme conversion

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
Sam Ribeiro, Giulia Comini, Jaime Lorenzo Trueba

Keyword spotting

On-device constrained self-supervised speech representation learning for keyword spotting via knowledge distillation
Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu

Natural-language understanding

Quantization-aware and tensor-compressed training of transformers for natural language understanding
Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

Sampling bias in NLU models: Impact and mitigation
Zefei Li, Anil Ramakrishna, Anna Rumshisky, Andy Rosenbaum, Saleh Soltan, Rahul Gupta

Understanding disrupted sentences using underspecified abstract meaning representation
Angus Addlesee, Marco Damonte

Paralinguistics

Towards paralinguistic-only speech representations for end-to-end speech emotion recognition
George Ioannides, Michael Owen, Andrew Fletcher, Viktor Rozgic, Chao Wang

Utility-preserving privacy-enabled speech embeddings for emotion detection
Chandrashekhar Lavania, Sanjiv Das, Xin Huang, Kyu Han

Question answering

Question-context alignment and answer-context dependencies for effective answer sentence selection
Minh Van Nguyen, Kishan K C, Toan Nguyen, Thien Nguyen, Ankit Chadha, Thuy Vu

Speaker diarization

Lexical speaker error correction: Leveraging language models for speaker diarization error correction
Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Speech translation

Knowledge distillation on joint task end-to-end speech translation
Khandokar Md. Nayem, Ran Xue, Ching-Yun (Frannie) Chang, Akshaya Vishnu Kudlu Shanbhogue

Text-to-speech

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
Guangyang Zhang, Tom Merritt, Sam Ribeiro, Biel Tura Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo Trueba

Cross-lingual prosody transfer for expressive machine dubbing
Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Patrick Tobing, Ravi chander Vipperla, Vincent Pollet

Diffusion-based accent modelling in speech synthesis
Kamil Deja, Georgi Tinchev, Marta Czarnowska, Marius Cotescu, Jasha Droppo

eCat: An end-to-end model for multi-speaker TTS & many-to-many fine-grained prosody transfer
Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman

Expressive machine dubbing through phrase-level cross-lingual prosody transfer
Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Giuseppe Coccia, Patrick Tobing, Ravi chander Vipperla, Viacheslav Klimkov, Vincent Pollet

Multilingual context-based pronunciation learning for text-to-speech
Giulia Comini, Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo Trueba


