The 1st CogMAEC Workshop

Cognition-oriented Multimodal Affective and Empathetic Computing

27-31 October 2025 | Dublin, Ireland | ACM Multimedia 2025

While multimodal systems excel at basic emotion recognition, they struggle to understand why we feel what we feel and how emotions evolve. This workshop pioneers cognitive AI that interprets human affect through multimodal context and causal reasoning. Join us in redefining emotional intelligence for healthcare robots, empathetic chatbots, and beyond.

Stay updated: https://CogMAEC.github.io/MM2025


About the CogMAEC Workshop

Welcome to the 1st CogMAEC Workshop, proudly co-located with ACM Multimedia 2025!

As human-computer interaction evolves, emotional intelligence and empathy are becoming essential capabilities of intelligent systems. The CogMAEC Workshop (Cognition-oriented Multimodal Affective and Empathetic Computing) aims to push the boundaries of traditional affective computing by exploring the next frontier: cognitive emotional understanding.

While previous work in multimodal affective computing has focused on recognizing basic emotions from facial expressions, speech, and text, this workshop sets its sights on deeper challenges — understanding the "why" behind emotions, reasoning over context, and simulating human-like empathetic responses. With the recent advances in Multimodal Large Language Models (MLLMs), the time is ripe to rethink how machines perceive, reason, and respond to human emotions.

CogMAEC'25 brings together researchers and practitioners working on:

  • Traditional Multimodal Affective Computing
  • MLLM-based Multimodal Affective Computing
  • Cognition-oriented Multimodal Affective Computing

The workshop will cover both traditional multimodal emotion recognition techniques and cutting-edge cognition-driven methodologies. We aim to foster meaningful discussion and collaboration at the intersection of affective computing, cognitive modeling, and multimodal AI.

Join us as we collectively reimagine what emotional AI can become — not just smarter, but more human.

All workshop details, schedules, and updates can be found on our website.

Schedule

CogMAEC'25, co-organized with the MuSe workshop, runs in a hybrid format so that onsite and remote participants can engage together. The afternoon program combines invited keynotes with oral and poster presentations to spotlight cognition-oriented affective computing.

Monday, 27 October · 13:30–17:00 · Dublin Royal Convention Centre · Room Higgins 2

All times are in Dublin, Winter Time (UTC+0).

13:30–14:15 · Keynote Talk I: Social Intelligence with LLMs: on Emotion, Mind and Cognition · Prof. Minlie Huang (Tsinghua University)

Bio: Dr. Minlie Huang is a professor of Tsinghua University and the deputy director of its Foundation Model Center. He was supported by the National Distinguished Young Scholar project and has won several awards in Chinese AI and information processing societies, including the Wuwenjun Technical Advancement Award and the Qianweichang Technical Innovation Award. His research fields include large-scale language models, language generation, AI safety and alignment, and social intelligence. He authored the Chinese book "Modern Natural Language Generation," published more than 200 papers in premier venues (ICML, ICLR, NeurIPS, ACL, EMNLP, etc.) with over 29,000 citations, and has been named both an Elsevier China Highly Cited Scholar since 2022 and an AI 2000 influential AI scholar since 2020. He has won several best paper awards or nominations at major international conferences (IJCAI, ACL, SIGDIAL, NLPCC, etc.) and was a key contributor to large foundation models such as ChatGLM, GLM-4.5, GLM4.1v-thinking, and CharacterGLM. He serves as an associate editor for TNNLS, TACL, CL, and TBD, has acted as senior area chair of ACL/EMNLP/IJCAI/AAAI more than ten times, and maintains a homepage at http://coai.cs.tsinghua.edu.cn/hml/.

Abstract: Today’s LLMs are designed as machine tools to facilitate the efficiency, productivity, and creativity of human work. However, social intelligence, a significant feature of human intelligence, has been largely neglected in current research. Future AGI must have not only machine intelligence but also social intelligence. In this talk, the speaker will discuss how to embrace social intelligence with LLMs for emotion understanding, emotional support, behavior simulation, and modeling cognition and theory of mind, and will also present real-world applications for mental health.

14:15–15:00 · Keynote Talk II: 10 Open Challenges Steering the Future of Vision-Language-Action Models · Prof. Soujanya Poria (Nanyang Technological University)

Bio: Dr. Soujanya Poria is an Associate Professor at Nanyang Technological University (NTU), Singapore. His research explores large language models, reasoning, AI safety, embodied AI, multimodal AI, and natural language processing. He completed his Ph.D. in Computer Science at the University of Stirling, UK. Before joining NTU, he worked at the Singapore University of Technology and Design as an Associate Professor and at the Institute of High Performance Computing (IHPC), A*STAR, as a Senior Scientist.

Abstract: Vision-language-action (VLA) models are quickly becoming central to embodied AI, building on the breakthroughs of large language models and vision-language models. Their promise lies in something simple yet profound: the ability to follow natural language instructions and turn them into real-world actions. In this talk, Prof. Poria will walk through ten milestones that mark the progress and challenges ahead for VLA models—ranging from multimodality and reasoning to data, evaluation, generalization across robots, efficiency, whole-body coordination, safety, intelligent agents, and human collaboration. Each of these represents both a technical challenge and a stepping stone toward truly capable embodied systems. He will also highlight emerging trends that are shaping the future: spatial understanding, modeling world dynamics, post-training refinements, and synthetic data generation. Together, these directions point to a roadmap for accelerating VLA models toward real-world deployment and broader societal impact, sparking discussion on how the community can bring VLA models from promising prototypes to widely adopted, trustworthy, and useful embodied intelligence.

15:00–15:30 · Coffee Break & Poster Session

15:30–16:15 · Keynote Talk III: Diffusion Beats Autoregressive in Data-Constrained Settings · Dr. Amir Zadeh (Lambda)

Bio: Dr. Amir Zadeh is a Staff ML Researcher at Lambda. He received his Ph.D. in Artificial Intelligence from Carnegie Mellon University with a focus on multimodal machine learning. Dr. Zadeh has published in top machine learning venues including NeurIPS, ICLR, CVPR, and ACL, and has served as an organizer, senior area chair, and committee member for leading conferences and workshops.

Abstract: Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings, where training involves repeated passes over limited data, and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR’s fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm.

16:15–16:30 · Oral I: Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline · Om Dabral
16:30–16:45 · Oral II: Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation · Joonwoo Kwon
16:45–17:00 · Oral III: Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization · Elisa Ancarani

Accepted Papers

1. PetChat: An Emotion-Aware Pet Communication System Powered by LLMs and Wearable Devices · Ziqiao Zhu, Jiachun Du, Kejun Zhang, Jingyuan Li
2. Disentangled Representation Learning via Transformer with Graph Attention Fusion for Depression Detection · Luntian Mou, Siqi Zhen, Shasha Mao, Nan Ma
3. Commanding the Debate Stage: Multimodal Emotion Analysis of Trump's Storytelling Strategies in the 2016 Presidential Debates · Xiuchuan Ding, Qiqi Gao
4. Emotion Understanding under Naturalistic Stimuli via Neural Encoding and Decoding · Guandong Pan, Shaoting Tang, Zhiming Zheng, Yang Yangqian, Xin Wang, Liu Longzhao, Shi Chen
5. Talk to Me, Like Me: Modular Personalization of Emotional AI via Behavioral Metadata, Fine-Tuning, RAG, Prompts, and Agentic Reasoning · Om Dabral, Jaspreet Singh, Hardik Sharma, Bagesh Kumar
6. Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline · Om Dabral, Swayam Bansal, Mridul Maheshwari, Hardik Sharma, Jaspreet Singh, Bagesh Kumar
7. Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation · Joonwoo Kwon, Heehwan Wang, Jinwoo Lee, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha
8. Fine-grained Structured Multimodal Textural Representation for Natural Human-Computer Conversation · Yansong Liu, Yuxin Lin, Yinglin Zheng, Wangzheng Shi, Mingyi Xu, Yuhang Lin, Xinqi Cai, Dong Chen, Ming Zeng
9. Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization · Elisa Ancarani, Julie Tores, Rémy Sun, Lucile Sassatelli, Hui-Yin Wu, Frederic Precioso
10. Unveiling Genuine Emotions: Integrating Micro-Expressions and Physiological Signals for Enhanced Emotion Recognition · Chuang Ma
11. A Transformer-Based Multimodal Framework for Hidden Emotion Recognition through Micro-Expression and EEG Fusion · Chuang Ma

Call for Papers

We invite contributions in three categories:

1. Novel Position or Perspective Papers (4–8 pages, excl. references, archival). Forward-looking works that propose new ideas, conceptual frameworks, or identify open challenges aligned with the workshop themes. Accepted papers will appear in the CogMAEC 2025 Proceedings (co-located with MM '25).

2. Non-archival Featured Papers (title + abstract + original manuscript). Influential papers already published in top venues, or well-curated summaries of substantial prior work. These submissions are presentation-only and will not be included in the proceedings.

3. Demonstration Papers (≤ 2 pages, excl. references, archival). Short papers describing prototypes, tools, or systems that showcase practical implementations or evaluation methodologies. Accepted demos will be published in the CogMAEC 2025 Proceedings (co-located with MM '25).

Authors of all accepted submissions will be invited to present their work at the workshop.

The workshop welcomes submissions on (but not limited to) the following topics:

1) Traditional Multimodal Affective Computing

  • Facial Expression Recognition
  • Speech Emotion Recognition
  • Audio-visual Emotion Recognition
  • Body Gesture Emotion Detection
  • Micro-expression Recognition
  • Multimodal Sentiment Analysis
  • Multimodal Emotion Recognition in Conversation
  • Multimodal Stance Detection
  • Multimodal Emotion Analysis in Memes
  • Multimodal Sarcasm and Irony Detection
  • Cross-cultural Emotion Recognition
  • Physiological Signal-based Emotion Recognition
  • Emotion-aware Dialogue Generation
  • Emotional Speech Synthesis
  • Multimodal Affective Storytelling
  • Affective Music Generation
  • Affective Facial Animation
  • Emotion-controlled Avatar Generation

2) MLLM-based Multimodal Affective Computing

  • Few-shot Emotion Recognition
  • Multimodal Emotion Reasoning
  • Multimodal Affective Hallucination Mitigation
  • Emotion-aware Self-supervised Representation Learning
  • Multimodal Affective In-context Learning
  • Affective Instruction Tuning for MLLMs
  • Multimodal Feature Extraction and Fusion
  • Cross-modal Affective Alignment
  • Cross-domain Affective Transfer Learning
  • Emotion-aware Visual Question Answering
  • Emotion-guided Text-to-Image/Video Generation
  • Multimodal Empathetic Dialogue Systems
  • Persona-driven Emotion-aware Conversational AI

3) Cognition-oriented Multimodal Affective Computing

  • Multimodal Implicit Sentiment Analysis
  • Multimodal Emotion Cause Analysis in Conversations
  • Multimodal Aspect-based Sentiment Analysis
  • Neuro-symbolic Reasoning for Emotion Understanding
  • Theory of Mind-based Empathy Modeling
  • Cognitive Load and Affect Interaction Modeling
  • Cross-modal Cognitive Bias Detection

Important Dates

  • Website Preparation: March 30, 2025 (AoE)
  • Paper Submission Start: April 15, 2025 (AoE)
  • Paper Submission Deadline: June 30, 2025 (AoE)
  • Paper Notification: August 5, 2025 (AoE)
  • Camera Ready: August 13, 2025 (AoE)
  • Workshop Date: October 27-28, 2025 (AoE)

Submission Guidelines

All submissions must be written in English and follow the current ACM two-column conference format. Page limits are inclusive of all content, including figures and appendices. Submissions must be anonymized by the authors for review.

Authors should use the appropriate ACM templates: the "sigconf" LaTeX template or the Interim Word Template, both available on the ACM Proceedings Template page. Alternatively, authors can prepare their submissions using Overleaf's official ACM templates.

Please use \documentclass[sigconf, screen, review, anonymous]{acmart} when preparing your LaTeX manuscript for submission and review.
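
For reference, a minimal LaTeX skeleton using these options could look like the sketch below; the title, author, affiliation, and bibliography file names are placeholders to be replaced with your own content.

    % A minimal acmart skeleton for an anonymized review submission (placeholders only).
    \documentclass[sigconf, screen, review, anonymous]{acmart}

    \begin{document}

    \title{Your Paper Title}
    \author{Anonymous Author(s)}
    \affiliation{%
      \institution{Anonymous Institution}
      \city{City}
      \country{Country}}

    % In acmart, the abstract is given before \maketitle.
    \begin{abstract}
      A short summary of the submission.
    \end{abstract}

    \maketitle

    \section{Introduction}
    Body text goes here.

    % References in the ACM style; "references.bib" is a placeholder file name.
    \bibliographystyle{ACM-Reference-Format}
    \bibliography{references}

    \end{document}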

Invited Speakers

We have invited the following renowned scholars in the fields of cognition and affective computing.

Prof. Minlie Huang
Tsinghua University

Prof. Soujanya Poria
Nanyang Technological University

Organizers

Hao Fei
National University of Singapore

Bobo Li
National University of Singapore

Meng Luo
National University of Singapore

Qian Liu
University of Auckland

Lizi Liao
Singapore Management University

Fei Li
Wuhan University

Min Zhang
Harbin Institute of Technology (Shenzhen)

Björn W. Schuller
Imperial College London

Mong-Li Lee
National University of Singapore

Erik Cambria
Nanyang Technological University

Contact

For any questions about the workshop, please contact us through:

Email:

Google Group: https://groups.google.com/g/cogmaec