The 1st CogMAEC Workshop

Cognition-oriented Multimodal Affective and Empathetic Computing

27-31 October 2025 | Dublin, Ireland | ACM Multimedia 2025

While multimodal systems excel at basic emotion recognition, they struggle to understand why we feel what we feel and how emotions evolve. This workshop pioneers cognitive AI that interprets human affect through multimodal context and causal reasoning. Join us in redefining emotional intelligence for healthcare robots, empathetic chatbots, and beyond.

Stay updated: https://CogMAEC.github.io/MM2025


About the CogMAEC Workshop

Welcome to the 1st CogMAEC Workshop, proudly co-located with ACM Multimedia 2025!

As human-computer interaction evolves, emotional intelligence and empathy are becoming essential capabilities of intelligent systems. The CogMAEC Workshop (Cognition-oriented Multimodal Affective and Empathetic Computing) aims to push the boundaries of traditional affective computing by exploring the next frontier: cognitive emotional understanding.

While previous work in multimodal affective computing has focused on recognizing basic emotions from facial expressions, speech, and text, this workshop sets its sights on deeper challenges — understanding the "why" behind emotions, reasoning over context, and simulating human-like empathetic responses. With the recent advances in Multimodal Large Language Models (MLLMs), the time is ripe to rethink how machines perceive, reason, and respond to human emotions.

CogMAEC'25 brings together researchers and practitioners working on:

  • Traditional Multimodal Affective Computing
  • MLLM-based Multimodal Affective Computing
  • Cognition-oriented Multimodal Affective Computing

The workshop will cover both traditional multimodal emotion recognition techniques and cutting-edge cognition-driven methodologies. We aim to foster meaningful discussion and collaboration at the intersection of affective computing, cognitive modeling, and multimodal AI.

Join us as we collectively reimagine what emotional AI can become — not just smarter, but more human.

All workshop details, schedules, and updates can be found on our website.

Schedule

CogMAEC'25, co-organized with the MuSe workshop, runs in a hybrid format so that onsite and remote participants can engage together. The afternoon program combines invited keynotes with oral and poster presentations to spotlight cognition-oriented affective computing.

Monday, 27 October · 13:30–17:00 · Dublin Royal Convention Centre · Room Higgins 2

All times are in Dublin, Winter Time (UTC+0).

13:30–14:15 · Keynote Talk I: Social Intelligence with LLMs: on Emotion, Mind and Cognition · Prof. Minlie Huang (Tsinghua University)

Bio: Dr. Minlie Huang is a professor of Tsinghua University and the deputy director of its Foundation Model Center. He was supported by the National Distinguished Young Scholar project and has won several awards in Chinese AI and information processing societies, including the Wuwenjun Technical Advancement Award and the Qianweichang Technical Innovation Award. His research fields include large-scale language models, language generation, AI safety and alignment, and social intelligence. He authored the Chinese book "Modern Natural Language Generation," published more than 200 papers in premier venues (ICML, ICLR, NeurIPS, ACL, EMNLP, etc.) with over 29,000 citations, and has been named both an Elsevier China Highly Cited Scholar since 2022 and an AI 2000 influential AI scholar since 2020. He has won several best paper awards or nominations at major international conferences (IJCAI, ACL, SIGDIAL, NLPCC, etc.) and was a key contributor to large foundation models such as ChatGLM, GLM-4.5, GLM4.1v-thinking, and CharacterGLM. He serves as an associate editor for TNNLS, TACL, CL, and TBD, has acted as senior area chair of ACL/EMNLP/IJCAI/AAAI more than ten times, and maintains a homepage at http://coai.cs.tsinghua.edu.cn/hml/.

Abstract: Today’s LLMs are designed as machine tools to facilitate the efficiency, productivity, and creativity of human work. However, social intelligence, a significant feature of human intelligence, has been largely neglected in current research. Future AGI must have not only machine intelligence but also social intelligence. In this talk, the speaker will discuss how to embrace social intelligence with LLMs for emotion understanding, emotional support, behavior simulation, and modeling cognition and theory of mind, and will also present real-world applications for mental health.

14:15–15:00 · Keynote Talk II: 10 Open Challenges Steering the Future of Vision-Language-Action Models · Prof. Soujanya Poria (Nanyang Technological University)

Bio: Dr. Soujanya Poria is an Associate Professor at Nanyang Technological University (NTU), Singapore. His research explores large language models, reasoning, AI safety, embodied AI, multimodal AI, and natural language processing. He completed his Ph.D. in Computer Science at the University of Stirling, UK. Before joining NTU, he worked at the Singapore University of Technology and Design as an Associate Professor and at the Institute of High Performance Computing (IHPC), A*STAR, as a Senior Scientist.

Abstract: Vision-language-action (VLA) models are quickly becoming central to embodied AI, building on the breakthroughs of large language models and vision-language models. Their promise lies in something simple yet profound: the ability to follow natural language instructions and turn them into real-world actions. In this talk, Prof. Poria will walk through ten milestones that mark the progress and challenges ahead for VLA models—ranging from multimodality and reasoning to data, evaluation, generalization across robots, efficiency, whole-body coordination, safety, intelligent agents, and human collaboration. Each of these represents both a technical challenge and a stepping stone toward truly capable embodied systems. He will also highlight emerging trends that are shaping the future: spatial understanding, modeling world dynamics, post-training refinements, and synthetic data generation. Together, these directions point to a roadmap for accelerating VLA models toward real-world deployment and broader societal impact, sparking discussion on how the community can bring VLA models from promising prototypes to widely adopted, trustworthy, and useful embodied intelligence.

15:00–15:30 · Coffee Break & Poster Session

15:30–16:15 · Keynote Talk III: Diffusion Beats Autoregressive in Data-Constrained Settings · Dr. Amir Zadeh (Lambda)

Bio: Dr. Amir Zadeh is a Staff ML Researcher at Lambda. He received his Ph.D. in Artificial Intelligence from Carnegie Mellon University with a focus on multimodal machine learning. Dr. Zadeh has published in top machine learning venues including NeurIPS, ICLR, CVPR, and ACL, and has served as an organizer, senior area chair, and committee member for leading conferences and workshops.

Abstract: Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings, where training involves repeated passes over limited data, and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR’s fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm.

16:15–16:30 · Oral I: Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline · Om Dabral
16:30–16:45 · Oral II: Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation · Joonwoo Kwon
16:45–17:00 · Oral III: Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization · Elisa Ancarani

Accepted Papers

1. PetChat: An Emotion-Aware Pet Communication System Powered by LLMs and Wearable Devices · Ziqiao Zhu, Jiachun Du, Kejun Zhang, Jingyuan Li
2. Disentangled Representation Learning via Transformer with Graph Attention Fusion for Depression Detection · Luntian Mou, Siqi Zhen, Shasha Mao, Nan Ma
3. Commanding the Debate Stage: Multimodal Emotion Analysis of Trump's Storytelling Strategies in the 2016 Presidential Debates · Xiuchuan Ding, Qiqi Gao
4. Emotion Understanding under Naturalistic Stimuli via Neural Encoding and Decoding · Guandong Pan, Shaoting Tang, Zhiming Zheng, Yang Yangqian, Xin Wang, Liu Longzhao, Shi Chen
5. Talk to Me, Like Me: Modular Personalization of Emotional AI via Behavioral Metadata, Fine-Tuning, RAG, Prompts, and Agentic Reasoning · Om Dabral, Jaspreet Singh, Hardik Sharma, Bagesh Kumar
6. Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline · Om Dabral, Swayam Bansal, Mridul Maheshwari, Hardik Sharma, Jaspreet Singh, Bagesh Kumar
7. Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation · Joonwoo Kwon, Heehwan Wang, Jinwoo Lee, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha
8. Fine-grained Structured Multimodal Textural Representation for Natural Human-Computer Conversation · Yansong Liu, Yuxin Lin, Yinglin Zheng, Wangzheng Shi, Mingyi Xu, Yuhang Lin, Xinqi Cai, Dong Chen, Ming Zeng
9. Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization · Elisa Ancarani, Julie Tores, Rémy Sun, Lucile Sassatelli, Hui-Yin Wu, Frederic Precioso
10. Unveiling Genuine Emotions: Integrating Micro-Expressions and Physiological Signals for Enhanced Emotion Recognition · Chuang Ma
11. A Transformer-Based Multimodal Framework for Hidden Emotion Recognition through Micro-Expression and EEG Fusion · Chuang Ma

Call for Papers

We invite contributions in three categories:

1. Novel Position or Perspective Papers (4–8 pages, excl. references, archival). Forward-looking works that propose new ideas, conceptual frameworks, or identify open challenges aligned with the workshop themes. Accepted papers will appear in the CogMAEC 2025 Proceedings (co-located with MM '25).

2. Non-archival Featured Papers (title + abstract + original manuscript). Influential papers already published in top venues, or well-curated summaries of substantial prior work. These submissions are presentation-only and will not be included in the proceedings.

3. Demonstration Papers (≤ 2 pages, excl. references, archival). Short papers describing prototypes, tools, or systems that showcase practical implementations or evaluation methodologies. Accepted demos will be published in the CogMAEC 2025 Proceedings (co-located with MM '25).

Authors of all accepted submissions will be invited to present their work at the workshop.

The workshop welcomes submissions on (but not limited to) the following topics:

1) Traditional Multimodal Affective Computing

  • Facial Expression Recognition
  • Speech Emotion Recognition
  • Audio-visual Emotion Recognition
  • Body Gesture Emotion Detection
  • Micro-expression Recognition
  • Multimodal Sentiment Analysis
  • Multimodal Emotion Recognition in Conversation
  • Multimodal Stance Detection
  • Multimodal Emotion Analysis in Memes
  • Multimodal Sarcasm and Irony Detection
  • Cross-cultural Emotion Recognition
  • Physiological Signal-based Emotion Recognition
  • Emotion-aware Dialogue Generation
  • Emotional Speech Synthesis
  • Multimodal Affective Storytelling
  • Affective Music Generation
  • Affective Facial Animation
  • Emotion-controlled Avatar Generation

2) MLLM-based Multimodal Affective Computing

  • Few-shot Emotion Recognition
  • Multimodal Emotion Reasoning
  • Multimodal Affective Hallucination Mitigation
  • Emotion-aware Self-supervised Representation Learning
  • Multimodal Affective In-context Learning
  • Affective Instruction Tuning for MLLMs
  • Multimodal Feature Extraction and Fusion
  • Cross-modal Affective Alignment
  • Cross-domain Affective Transfer Learning
  • Emotion-aware Visual Question Answering
  • Emotion-guided Text-to-Image/Video Generation
  • Multimodal Empathetic Dialogue Systems
  • Persona-driven Emotion-aware Conversational AI

3) Cognition-oriented Multimodal Affective Computing

  • Multimodal Implicit Sentiment Analysis
  • Multimodal Emotion Cause Analysis in Conversations
  • Multimodal Aspect-based Sentiment Analysis
  • Neuro-symbolic Reasoning for Emotion Understanding
  • Theory of Mind-based Empathy Modeling
  • Cognitive Load and Affect Interaction Modeling
  • Cross-modal Cognitive Bias Detection

Important Dates

  • Website Preparation: March 30, 2025 (AoE)
  • Paper Submission Start: April 15, 2025 (AoE)
  • Paper Submission Deadline: June 30, 2025 (AoE)
  • Paper Notification: August 5, 2025 (AoE)
  • Camera Ready: August 13, 2025 (AoE)
  • Workshop Date: October 27-28, 2025 (AoE)

Submission Guidelines

All submissions must be written in English and follow the current ACM two-column conference format. Page limits are inclusive of all content, including figures and appendices. Submissions must be anonymized by the authors for review.

Authors should use the appropriate ACM templates: the "sigconf" LaTeX template or the Interim Word Template, both available on the ACM Proceedings Template page. Alternatively, authors can prepare their submissions using Overleaf's official ACM templates.

Please use \documentclass[sigconf, screen, review, anonymous]{acmart} when preparing your LaTeX manuscript for submission and review.
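
For reference, a minimal LaTeX skeleton using these options could look like the sketch below; the title, author, affiliation, and bibliography file names are placeholders to be replaced with your own content.

    % A minimal acmart skeleton for an anonymized review submission (placeholders only).
    \documentclass[sigconf, screen, review, anonymous]{acmart}

    \begin{document}

    \title{Your Paper Title}
    \author{Anonymous Author(s)}
    \affiliation{%
      \institution{Anonymous Institution}
      \city{City}
      \country{Country}}

    % In acmart, the abstract is given before \maketitle.
    \begin{abstract}
      A short summary of the submission.
    \end{abstract}

    \maketitle

    \section{Introduction}
    Body text goes here.

    % References in the ACM style; "references.bib" is a placeholder file name.
    \bibliographystyle{ACM-Reference-Format}
    \bibliography{references}

    \end{document}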

Invited Speakers

We have invited the following renowned scholars in the fields of cognition and affective computing.

Prof. Minlie Huang
Tsinghua University

Prof. Soujanya Poria
Nanyang Technological University

Organizers

Hao Fei
National University of Singapore

Bobo Li
National University of Singapore

Meng Luo
National University of Singapore

Qian Liu
University of Auckland

Lizi Liao
Singapore Management University

Fei Li
Wuhan University

Min Zhang
Harbin Institute of Technology (Shenzhen)

Björn W. Schuller
Imperial College London

Mong-Li Lee
National University of Singapore

Erik Cambria
Nanyang Technological University

Contact

For any questions about the workshop, please contact us through:

Email:

Google Group: https://groups.google.com/g/cogmaec