Talk: Consolidating speech tasks with Spoken Language Models

Date:

Abstract: Recent Large Language Models (LLMs) have shown great improvements in text processing and natural language processing applications. Spoken language modeling, by comparison, is a very recent research area. Speech, in contrast to text, has many additional components: speaker characteristics, emotional cues, pausing, pitch variation, etc. Moreover, speech signals produce longer sequences than the corresponding text. This talk has two parts. First, I will explore the utility of spoken language models for speech evaluation. Second, I will discuss how to build a multi-modal voice and text language model that consolidates speech recognition and synthesis with text and speech continuation tasks.