M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

Mittal, Trisha; Bhattacharya, Uttaran; Chandra, Rohan; Bera, Aniket; Manocha, Dinesh

arXiv:1911.05659 (eess)

[Submitted on 9 Nov 2019 (v1), last revised 22 Nov 2019 (this version, v2)]

Abstract: We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is more robust than other methods to sensor noise in any of the individual modalities. M3ER uses a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step that uses Canonical Correlation Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.
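
The abstract does not give the fusion formula, so the following is only an illustrative sketch of the general idea behind multiplicative fusion, not the method used in M3ER. It assumes a simple weighted product of per-modality class probabilities, where each modality gets a reliability weight. The function name `multiplicative_fusion`, the weights, and the example probabilities are all hypothetical; in a learned model the weighting would come from training rather than being set by hand.

```python
import math


def multiplicative_fusion(probs_per_modality, weights):
    """Fuse per-modality class probabilities with a weighted product.

    probs_per_modality: one list of class probabilities per modality.
    weights: one non-negative reliability weight per modality
             (hypothetical; hand-picked here for illustration only).
    """
    num_classes = len(probs_per_modality[0])
    eps = 1e-12  # avoids log(0) when a probability is exactly zero

    # Summing weighted log-probabilities is the same as multiplying p ** w.
    log_scores = [
        sum(w * math.log(p[c] + eps) for p, w in zip(probs_per_modality, weights))
        for c in range(num_classes)
    ]

    # Renormalize with a numerically stable softmax over the log scores.
    peak = max(log_scores)
    exp_scores = [math.exp(s - peak) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Made-up predictions over three emotion classes.
face = [0.70, 0.20, 0.10]
text = [0.60, 0.30, 0.10]
speech = [0.02, 0.03, 0.95]  # assumed to be a noisy, unreliable channel

trusting_all = multiplicative_fusion([face, text, speech], [1.0, 1.0, 1.0])
suppressing_speech = multiplicative_fusion([face, text, speech], [1.0, 1.0, 0.1])

print("equal weights:      ", trusting_all)
print("speech down-weighted:", suppressing_speech)
```

With equal weights, the confident but assumed-noisy speech prediction dominates the product; lowering its weight lets the face and text cues decide the fused class. This mirrors, in a very reduced form, the idea of emphasizing reliable cues and suppressing others on a per-sample basis.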

Subjects:	Signal Processing (eess.SP); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1911.05659 [eess.SP]
	(or arXiv:1911.05659v2 [eess.SP] for this version)
	https://doi.org/10.48550/arXiv.1911.05659

Submission history

From: Trisha Mittal
[v1] Sat, 9 Nov 2019 01:58:03 UTC (7,260 KB)
[v2] Fri, 22 Nov 2019 18:48:47 UTC (9,438 KB)
