27-31 October 2025 | Dublin, Ireland | ACM Multimedia 2025
While multimodal systems excel at basic emotion recognition, they struggle to explain why we feel what we feel and how our emotions evolve. This workshop pioneers cognitive AI that interprets human affect through multimodal context and causal reasoning. Join us in redefining emotional intelligence for healthcare robots, empathetic chatbots, and beyond.
Stay updated: https://CogMAEC.github.io/MM2025
Welcome to the 1st CogMAEC Workshop, proudly co-located with ACM Multimedia 2025!
As human-computer interaction evolves, emotional intelligence and empathy are becoming essential capabilities of intelligent systems. The CogMAEC Workshop (Cognition-oriented Multimodal Affective and Empathetic Computing) aims to push the boundaries of traditional affective computing by exploring the next frontier: cognitive emotional understanding.
While previous work in multimodal affective computing has focused on recognizing basic emotions from facial expressions, speech, and text, this workshop sets its sights on deeper challenges — understanding the "why" behind emotions, reasoning over context, and simulating human-like empathetic responses. With the recent advances in Multimodal Large Language Models (MLLMs), the time is ripe to rethink how machines perceive, reason, and respond to human emotions.
CogMAEC'25 brings together researchers and practitioners working on traditional multimodal affective computing, MLLM-based multimodal affective computing, and cognition-oriented multimodal affective computing.
The workshop will cover both traditional multimodal emotion recognition techniques and cutting-edge cognition-driven methodologies. We aim to foster meaningful discussion and collaboration at the intersection of affective computing, cognitive modeling, and multimodal AI.
Join us as we collectively reimagine what emotional AI can become — not just smarter, but more human.
All workshop details, schedules, and updates can be found on our website.
CogMAEC'25, co-organized with the MuSe workshop, runs in a hybrid format so that onsite and remote participants can engage together. The afternoon program combines invited keynotes with oral and poster presentations to spotlight cognition-oriented affective computing.
| Time | Session (all times in Dublin local time, UTC+0) | Presenter |
|---|---|---|
| 13:30–14:15 | Keynote Talk I: Social Intelligence with LLMs: On Emotion, Mind and Cognition<br>Bio: Dr. Minlie Huang is a professor at Tsinghua University and the deputy director of its Foundation Model Center. He was supported by the National Distinguished Young Scholar project and has won several awards from Chinese AI and information processing societies, including the Wuwenjun Technical Advancement Award and the Qianweichang Technical Innovation Award. His research fields include large-scale language models, language generation, AI safety and alignment, and social intelligence. He authored the Chinese book "Modern Natural Language Generation," has published more than 200 papers in premier venues (ICML, ICLR, NeurIPS, ACL, EMNLP, etc.) with over 29,000 citations, and has been recognized as an Elsevier China Highly Cited Scholar since 2022 and as an AI 2000 influential AI scholar since 2020. He has won several best paper awards or nominations at major international conferences (IJCAI, ACL, SIGDIAL, NLPCC, etc.) and was a key contributor to large foundation models such as ChatGLM, GLM-4.5, GLM-4.1V-Thinking, and CharacterGLM. He serves as an associate editor for TNNLS, TACL, CL, and TBD, has acted as senior area chair of ACL/EMNLP/IJCAI/AAAI more than ten times, and maintains a homepage at http://coai.cs.tsinghua.edu.cn/hml/.<br>Abstract: Today's LLMs are designed as tools to enhance the efficiency, productivity, and creativity of human work. However, social intelligence, a significant feature of human intelligence, has been largely neglected in current research. Future AGI must have not only machine intelligence but also social intelligence. In this talk, the speaker will discuss how to embrace social intelligence with LLMs for emotion understanding, emotional support, behavior simulation, and modeling cognition and theory of mind, and he will also present real-world applications for mental health. | Prof. Minlie Huang, Tsinghua University |
| 14:15–15:00 | Keynote Talk II: 10 Open Challenges Steering the Future of Vision-Language-Action Models<br>Bio: Dr. Soujanya Poria is an Associate Professor at Nanyang Technological University (NTU), Singapore. His research explores large language models, reasoning, AI safety, embodied AI, multimodal AI, and natural language processing. He completed his Ph.D. in Computer Science at the University of Stirling, UK. Before joining NTU, he worked at the Singapore University of Technology and Design as an Associate Professor and at the Institute of High Performance Computing (IHPC), A*STAR, as a Senior Scientist.<br>Abstract: Vision-language-action (VLA) models are quickly becoming central to embodied AI, building on the breakthroughs of large language models and vision-language models. Their promise lies in something simple yet profound: the ability to follow natural language instructions and turn them into real-world actions. In this talk, Prof. Poria will walk through ten milestones that mark the progress and challenges ahead for VLA models, ranging from multimodality and reasoning to data, evaluation, generalization across robots, efficiency, whole-body coordination, safety, intelligent agents, and human collaboration. Each of these represents both a technical challenge and a stepping stone toward truly capable embodied systems. He will also highlight emerging trends that are shaping the future: spatial understanding, modeling world dynamics, post-training refinements, and synthetic data generation. Together, these directions point to a roadmap for accelerating VLA models toward real-world deployment and broader societal impact, sparking discussion on how the community can bring VLA models from promising prototypes to widely adopted, trustworthy, and useful embodied intelligence. | Prof. Soujanya Poria, Nanyang Technological University |
| | Coffee Break & Poster Session | |
| 15:30–16:15 | Keynote Talk III: Diffusion Beats Autoregressive in Data-Constrained Settings<br>Bio: Dr. Amir Zadeh is a Staff ML Researcher at Lambda. He received his Ph.D. in Artificial Intelligence from Carnegie Mellon University with a focus on multimodal machine learning. Dr. Zadeh has published in top machine learning venues including NeurIPS, ICLR, CVPR, and ACL, and has served as an organizer, senior area chair, and committee member for leading conferences and workshops.<br>Abstract: Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings, where training involves repeated passes over limited data, and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR's fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. | Dr. Amir Zadeh, Lambda |
| 16:15–16:30 | Oral I: Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline | Om Dabral |
| 16:30–16:45 | Oral II: Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation | Joonwoo Kwon |
| 16:45–17:00 | Oral III: Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization | Elisa Ancarani |

| # | Title | Authors |
|---|---|---|
| 1. | PetChat: An Emotion-Aware Pet Communication System Powered by LLMs and Wearable Devices | Ziqiao Zhu, Jiachun Du, Kejun Zhang, Jingyuan Li |
| 2. | Disentangled Representation Learning via Transformer with Graph Attention Fusion for Depression Detection | Luntian Mou, Siqi Zhen, Shasha Mao, Nan Ma |
| 3. | Commanding the Debate Stage: Multimodal Emotion Analysis of Trump's Storytelling Strategies in the 2016 Presidential Debates | Xiuchuan Ding, Qiqi Gao |
| 4. | Emotion Understanding under Naturalistic Stimuli via Neural Encoding and Decoding | Guandong Pan, Shaoting Tang, Zhiming Zheng, Yang Yangqian, Xin Wang, Liu Longzhao, Shi Chen |
| 5. | Talk to Me, Like Me: Modular Personalization of Emotional AI via Behavioral Metadata, Fine-Tuning, RAG, Prompts, and Agentic Reasoning | Om Dabral, Jaspreet Singh, Hardik Sharma, Bagesh Kumar |
| 6. | Multimodal Trait and Emotion Recognition via Agentic AI: An End-to-End Pipeline | Om Dabral, Swayam Bansal, Mridul Maheshwari, Hardik Sharma, Jaspreet Singh, Bagesh Kumar |
| 7. | Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation | Joonwoo Kwon, Heehwan Wang, Jinwoo Lee, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha |
| 8. | Fine-grained Structured Multimodal Textural Representation for Natural Human-Computer Conversation | Yansong Liu, Yuxin Lin, Yinglin Zheng, Wangzheng Shi, Mingyi Xu, Yuhang Lin, Xinqi Cai, Dong Chen, Ming Zeng |
| 9. | Leveraging Concept Annotations for Trustworthy Multimodal Video Interpretation through Modality Specialization | Elisa Ancarani, Julie Tores, Rémy Sun, Lucile Sassatelli, Hui-Yin Wu, Frederic Precioso |
| 10. | Unveiling Genuine Emotions: Integrating Micro-Expressions and Physiological Signals for Enhanced Emotion Recognition | Chuang Ma |
| 11. | A Transformer-Based Multimodal Framework for Hidden Emotion Recognition through Micro-Expression and EEG Fusion | Chuang Ma |
We invite contributions in three categories:
1. Novel Position or Perspective Papers (4–8 pages, excl. references, archival). Forward-looking works that propose new ideas, conceptual frameworks, or identify open challenges aligned with the workshop themes. Accepted papers will appear in the CogMAEC 2025 Proceedings (co-located with MM '25).
2. Non-archival Featured Papers (title + abstract + original manuscript). Influential papers already published in top venues, or well-curated summaries of substantial prior work. These submissions are presentation-only and will not be included in the proceedings.
3. Demonstration Papers (≤ 2 pages, excl. references, archival). Short papers describing prototypes, tools, or systems that showcase practical implementations or evaluation methodologies. Accepted demos will be published in the CogMAEC 2025 Proceedings (co-located with MM '25).
Authors of all accepted submissions will be invited to present their work at the workshop.
The workshop welcomes submissions on topics including, but not limited to:
1) Traditional Multimodal Affective Computing
2) MLLM-based Multimodal Affective Computing
3) Cognition-oriented Multimodal Affective Computing
October 27-28, 2025 (AoE)
August 13, 2025 (AoE)
August 5, 2025 (AoE)
June 30, 2025 (AoE)
April 15, 2025 (AoE)
March 30, 2025 (AoE)
All submissions must be written in English and follow the current ACM two-column conference format. Page limits are inclusive of all content, including figures and appendices. Submissions must be anonymized by the authors for review.
Authors should use the appropriate ACM templates: the "sigconf" LaTeX template or the Interim Word Template, both available on the ACM Proceedings Template page. Alternatively, authors can prepare their submissions using Overleaf's official ACM templates.
Please use \documentclass[sigconf, screen, review, anonymous]{acmart} when preparing your LaTeX manuscript for submission and review.
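For reference, a minimal sketch of an anonymized submission skeleton is shown below. The title, author placeholders, and section content are illustrative only; please consult the official ACM acmart documentation for the authoritative setup.

```latex
% Minimal sketch of an anonymized CogMAEC'25 submission (illustrative placeholders).
\documentclass[sigconf, screen, review, anonymous]{acmart}

\begin{document}

\title{Your Paper Title Here}

% Author details are suppressed in the compiled PDF while the "anonymous"
% option is active, but must still be completed for the camera-ready version.
\author{Anonymous Author(s)}
\affiliation{%
  \institution{Anonymous Institution}
  \city{City}
  \country{Country}}

% In acmart, the abstract must be declared before \maketitle.
\begin{abstract}
A short abstract describing the contribution.
\end{abstract}

\maketitle

\section{Introduction}
Body text goes here.

\end{document}
```

With the `anonymous` and `review` options enabled, acmart hides author information and adds line numbers for reviewers, which matches the anonymization requirement above.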
We have invited renowned scholars in the fields of cognition and affective computing; their keynote talks are listed in the program above.
For any questions about the workshop, please contact us through:
Email:
Google Group: https://groups.google.com/g/cogmaec