A Unified View on Emotion Representation in Large Language Models

Abstract

Interest in leveraging Large Language Models (LLMs) for emotional support systems motivates the need to understand how these models comprehend and represent emotions internally. While recent works show the presence of emotion concepts in the hidden state representations, it is unclear whether the model has a robust representation that is consistent across different datasets. In this paper, we present a unified view to understand emotion representation in LLMs, experimenting with diverse datasets and prompts. We then evaluate the reasoning ability of the models on a complex emotion identification task. We find that LLMs have a common emotion representation in the later layers of the model, and the vectors capturing the direction of emotions extracted from these representations can be interchanged among datasets with minimal impact on performance. Our analysis of reasoning with Chain of Thought (CoT) prompting shows the limits of emotion comprehension. Therefore, despite LLMs implicitly having emotion representations, they are not equally skilled at reasoning with them in complex scenarios. This motivates further research into approaches that leverage these implicit representations to improve explicit reasoning.

Motivation

Although several works have used interpretability techniques to understand the emotion identification capabilities of LLMs, their findings contradict one another on which layers (early or middle) contain emotion representations. We believe these contradictions arise from the use of different prompts and of datasets with different difficulty levels. Our intent is to analyze the setup using prompts with varying expressivity across datasets of different difficulty levels, to gain an overall understanding of LLMs' ability to identify emotions at the representation level. We then assess the reasoning ability of such models with an emotion comprehension task.

Key Findings

1) Using probing techniques, we show that which layers carry emotion representations depends on the instruction prompt and on the clarity with which emotion is expressed in the input data. [Figure: probing results for Llama-3.1-8B, 6 emotions]
2) There exist intrinsic emotion reading vectors that are similar across datasets (in later layers) and can be used interchangeably, revealing their foundational nature.
3) LLMs' performance on emotion reasoning tasks remains poor. We observe that CoT mostly generates reasoning traces that increase the model's confidence in its original answer, especially when the model is already confident in that answer.
4) This motivates the need for methods that leverage implicit emotion representations to improve LLMs' explicit reasoning capabilities.
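
The interchangeability claim in finding 2 can be illustrated with a minimal sketch. This summary does not state how the reading vectors are extracted, so the sketch assumes a difference-of-class-means direction, which is one common choice and not necessarily the authors' method. The hidden states are synthetic stand-ins generated with NumPy, and every name here (`reading_vector`, `make_synthetic_dataset`, `accuracy_with_vector`) is hypothetical.

```python
import numpy as np


def reading_vector(hidden, labels, emotion):
    """Unit-norm difference-of-means direction for one emotion.

    Assumption: this is one simple way to obtain a direction;
    the paper's actual extraction method may differ.
    """
    hidden = np.asarray(hidden, dtype=float)
    mask = np.asarray(labels) == emotion
    direction = hidden[mask].mean(axis=0) - hidden[~mask].mean(axis=0)
    return direction / np.linalg.norm(direction)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def make_synthetic_dataset(rng, true_direction, n=200, noise=1.0):
    """Toy stand-in for hidden states taken from a single layer.

    Label 1 marks the target emotion, label 0 everything else.
    """
    labels = rng.integers(0, 2, size=n)
    hidden = rng.normal(scale=noise, size=(n, true_direction.size))
    hidden += 4.0 * np.outer(labels, true_direction)  # shift the target class
    return hidden, labels


def accuracy_with_vector(hidden, labels, vector):
    """Classify by projecting onto the vector and thresholding at the mean score."""
    scores = hidden @ vector
    preds = (scores > scores.mean()).astype(int)
    return float((preds == labels).mean())


rng = np.random.default_rng(0)
dim = 64
shared = rng.normal(size=dim)
shared /= np.linalg.norm(shared)  # direction shared by both toy datasets

# Two "datasets" with different noise levels but the same underlying direction.
h_a, y_a = make_synthetic_dataset(rng, shared, noise=1.0)
h_b, y_b = make_synthetic_dataset(rng, shared, noise=1.5)

v_a = reading_vector(h_a, y_a, emotion=1)
v_b = reading_vector(h_b, y_b, emotion=1)

similarity = cosine(v_a, v_b)
acc_own = accuracy_with_vector(h_b, y_b, v_b)      # vector from the same dataset
acc_swapped = accuracy_with_vector(h_b, y_b, v_a)  # vector borrowed from dataset A

print(f"cosine similarity between reading vectors: {similarity:.3f}")
print(f"accuracy on B with its own vector: {acc_own:.3f}")
print(f"accuracy on B with A's vector:     {acc_swapped:.3f}")
```

With real models, `hidden` would hold per-example activations from one layer, and the comparison would be repeated layer by layer. A high cosine similarity together with a small gap between `acc_own` and `acc_swapped` is the pattern that an interchangeable, dataset-independent direction would produce.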