Consolidating Speech Tasks with Spoken Language Models
Talk, Australian National University, ANU School of Computing, Canberra ACT, Australia
Talk, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Talk, Nvidia, CA, USA
Abstract: Recent Large Language Models (LLMs) show great improvements in text processing and natural language processing applications. Spoken language modeling, by comparison, is a much more recent research area. Speech, unlike text, carries many additional components: speaker characteristics, emotional cues, pausing, pitch variation, and more. Moreover, speech signals have much longer sequence lengths than text. This talk focuses on two parts. First, I will explore the utility of Spoken Language Models for speech evaluation. Second, I will discuss how to build a multi-modal voice and text language model that consolidates speech recognition and synthesis with text and speech continuation tasks.
Tutorial, 10 Bayfront Avenue, Singapore, Singapore