Authors
Vishal Gandhi and Sagar Gandhi, Joyspace AI, USA
Abstract
Advancements in emotion-aware language processing increasingly shape vital NLP applications ranging from conversational AI and affective computing to computational psychology and creative content generation. Existing emotion datasets either lack emotional granularity or fail to capture the necessary stylistic diversity, limiting the advancement of effective emotion-conditioned text generation systems. To bridge this gap between granularity and style diversity, this paper introduces a systematically constructed dataset named ELSA (Emotion and Language Style Alignment Dataset), which leverages fine-grained emotion taxonomies adapted from existing sources (the dair-ai/emotion dataset and the GoEmotions taxonomy). The dataset comprises multiple emotionally nuanced variations of original sentences, regenerated across distinct contextual styles (conversational, formal, poetic, and narrative) using advanced Large Language Models (LLMs). Rigorous computational evaluation using metrics such as perplexity, embedding variance, readability, lexical diversity, and semantic coherence validates the dataset's emotional authenticity, linguistic fluency, and textual diversity. Comprehensive metric analyses affirm its potential to support deeper explorations into emotion-conditioned, style-adaptive text generation. By enabling precision-tuned, emotionally nuanced language modeling, our dataset creates fertile ground for research on fine-grained emotional control, prompt-driven explanation, interpretability, and style-adaptive expressive language generation with LLMs.
Keywords
Emotion-aware language modeling, fine-grained emotion recognition, stylistic variation, emotion-conditioned text generation, large language models (LLMs), text augmentation, emotion and style transfer, affective text generation, emotion-centric NLP, multistyle text synthesis, Natural Language Generation (NLG)