Multimodal Emotion Recognition: Needs, Challenges, and Opportunities

Emotion recognition plays an important role in diverse real-life applications, including video gaming, medical diagnosis, education, employee safety, patient care, and autonomous driving, among others. A person's emotion can be identified from various sources of information such as speech, transcripts, facial expressions, brain signals (EEG), or a combination of two or more of these. Among these sources, speech is the most common attribute and the easiest to acquire and use. Speech attributes are largely unaffected by side information such as physical movement, visual occlusion, or facial hair, and speech features for emotion recognition are fairly invariant across languages. However, recent research has shown that the accuracy of speech-based emotion recognition systems can still be enhanced using visual cues such as facial expressions. With advances in computing power and the availability of large amounts of data, it is now possible to combine and analyze vast amounts of data using advanced neural networks such as deep networks. In this presentation, we will discuss the fundamental concepts of emotion recognition. We will then review current research using speech and facial expressions separately, before moving to more recent multimodal emotion recognition systems. Finally, we will offer a perspective on future research directions in this area, along with major challenges and potential applications in this era of multimedia and smart living.
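To make the idea of combining speech and facial cues concrete, here is a minimal sketch of decision-level (late) fusion, one common way multimodal systems merge per-modality predictions. The class labels, probability values, and fusion weights below are illustrative assumptions, not outputs of any real model or system described in this talk.

```python
import numpy as np

# Hypothetical softmax outputs for one utterance over four emotion classes.
# These numbers are made up for illustration only.
speech_probs = np.array([0.10, 0.55, 0.25, 0.10])  # speech-only classifier
face_probs = np.array([0.05, 0.70, 0.20, 0.05])    # facial-expression classifier

# Decision-level (late) fusion: form a weighted average of the class
# posteriors, then pick the argmax. The weights encode how much each
# modality is trusted; equal weights are assumed here.
w_speech, w_face = 0.5, 0.5
fused = w_speech * speech_probs + w_face * face_probs
fused /= fused.sum()  # renormalize so the probabilities sum to 1

labels = ["angry", "happy", "neutral", "sad"]
print(labels[int(np.argmax(fused))])  # → happy
```

Late fusion is only one design choice; feature-level (early) fusion instead concatenates modality features before a joint classifier, trading simplicity for the ability to model cross-modal interactions.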