How Well Do Large Language Models Capture Human Personality?

Aanisha Bhattacharyya*, Yaman Kumar Singla*, Rajiv Ratn Shah, Changyou Chen, Jitendra Ajmera

* Equal contribution

Media and Data Science Research (MDSR), Adobe

Get in touch with us at behavior-in-the-wild@googlegroups.com

Motivation

Persona-based simulation with LLMs rests on several foundational assumptions that are rarely questioned but are central to how such systems are designed, interpreted, and deployed:

Expressivity: Increasing the descriptive richness of a persona — by adding more demographic, psychographic, or behavioral attributes — will improve behavioral fidelity and realism.
Attribute Fidelity: All combinations of persona attributes of the same size are equally simulatable; the model should handle any set of three attributes as well as any other set of three.
Specificity: Adding more attributes provides more specific behavioral grounding, improving simulation fidelity rather than degrading it.
Task Generalization: Persona definitions that work for one task or domain should generalize and remain effective across other tasks and domains.

Research Question

Do these core assumptions actually hold in practice? Specifically:

Does increasing persona complexity reliably improve behavioral fidelity and diversity in LLM-based simulation?
Are all attribute combinations of the same size equally simulatable?
Does adding more attributes always produce more specific and faithful behavioral grounding?
Do persona definitions generalize across tasks and domains?

Abstract

Large language models (LLMs) are increasingly used to simulate human populations via persona prompting, often under the assumptions that richer persona descriptions improve behavioral fidelity, similarly sized attribute combinations are equally simulatable, and persona definitions generalize across tasks. In this work, we formalize these assumptions and systematically evaluate them across multiple architectures, scales, and simulation settings. We identify a fundamental limitation we term persona manifold collapse, where increasingly expressive persona specifications lead to systematic contraction of representational and behavioral diversity. Across models, increasing persona complexity consistently reduces inter-persona separation in latent space and weakens behavioral differentiation in downstream simulation tasks. These effects persist across multiple analyses: richer personas fail to preserve human subgroup disagreement, performance varies across attribute combinations of similar size, and adding descriptive detail often degrades rather than improves simulation fidelity. Surprisingly, simple Age–Gender personas consistently outperform richly specified Ideal Customer Profiles (ICPs) across industries, achieving substantially higher downstream prediction accuracy. However, collapse is not uniform across attributes: certain combinations remain behaviorally stable and preserve stronger alignment with human responses, forming localized regions we term alignment bridges. Together, our results provide empirical and conceptual foundations for understanding the limits of persona-conditioned simulation, highlighting the need for representation-aware persona construction rather than increasing persona expressivity alone.

Key Results

Persona Manifold Collapse is model-agnostic: Increasing persona complexity consistently contracts the representation manifold across all tested architectures and scales. The magnitude of collapse ranges from 22.90% on Qwen3-8B-Base to 58.93% on Qwen-72B-Vision-Instruct. On Qwen-72B-Vision-Instruct, mean persona distance drops from 14.38 at Level 1 (Age–Gender) to 5.90 at the richest configuration — a reduction of nearly 60%.
Alignment amplifies collapse: Instruction-tuned models exhibit substantially stronger collapse than their base counterparts, with collapse magnitude rising from ~35% to 59% in Qwen-72B and from ~29% to 55% in LLaMA-3.2-90B.
LLMs fail to preserve human behavioral variation: Human–LLM correlations remain consistently weak or negative across socio-political opinion (OpinionQA), moral reasoning (Moral Machine), and aesthetic preference (Website Likability) tasks. For example, GPT-4o reaches −0.37 on Website Likability, and LLaMA-3.2-90B-Vision-Instruct reaches −0.30 on Moral Machine.
Simple Age–Gender personas outperform rich ICPs: In email click-through rate (CTR) prediction, Age–Gender personas achieve 70.00% accuracy vs. 58.57% for auto-generated ICP agents and 50.74% for expert-defined brand ICPs. This trend holds across all tweet engagement domains (technology, airlines, fashion).
No reliable generalization across tasks: Personas that appear effective in one domain do not consistently transfer to others. Similarly sized attribute combinations can have substantially different simulation fidelity depending on the task, confirming that attribute fidelity and task generalization do not hold as general properties.
Alignment Bridges: Collapse is not uniform. Certain attribute combinations — such as Education + Gender and Gender + Religious — remain behaviorally stable and preserve stronger alignment with human responses. These stable configurations exhibit inter-persona distances of up to 15.78 vs. 5.88 for collapse-prone personas on Qwen-72B-VL.
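
To make the collapse figures above concrete, here is a minimal sketch of how a mean inter-persona distance and its relative contraction could be computed. It rests on assumptions not confirmed by this page: that persona distance is the average pairwise Euclidean distance between persona embeddings, and that collapse magnitude is the percentage drop in that average between the simplest and richest persona levels. The function names and the toy 2-D embeddings are hypothetical; only the distances 14.38 and 5.90 come from the results above.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(embeddings):
    """Average Euclidean distance over all unordered pairs of persona vectors."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two persona embeddings")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def relative_collapse(simple_distance, rich_distance):
    """Percentage drop in mean persona distance from the simple to the rich level."""
    return 100.0 * (simple_distance - rich_distance) / simple_distance


# Hypothetical 2-D embeddings, invented purely for illustration.
simple_personas = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
rich_personas = [(0.0, 0.0), (1.5, 2.0), (3.0, 4.0)]

d_simple = mean_pairwise_distance(simple_personas)
d_rich = mean_pairwise_distance(rich_personas)
toy_collapse = relative_collapse(d_simple, d_rich)

# Rounded distances reported above for Qwen-72B-Vision-Instruct.
reported_collapse = relative_collapse(14.38, 5.90)

print(f"toy collapse: {toy_collapse:.2f}%")
print(f"collapse from reported distances: {reported_collapse:.2f}%")
```

Under this assumed definition, the rounded distances 14.38 and 5.90 give a contraction of roughly 59%, in line with the reported 58.93%; the small difference is plausibly due to rounding of the published distances.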

Takeaway

The foundational assumptions of persona-based LLM simulation do not generally hold. Richer, more complex personas do not reliably yield better behavioral fidelity or diversity — in fact, they often cause the opposite. Effective persona design depends less on maximizing expressivity and more on identifying behaviorally stable attribute combinations that the model can reliably represent. The field should shift toward representation-aware persona construction, focusing on stable, meaningful attribute combinations rather than increasing narrative detail or attribute count.

BibTeX

@article{bhattacharyya2025llmpersonality,
  title={How Well Do Large Language Models Capture Human Personality?},
  author={Bhattacharyya, Aanisha and Singla, Yaman Kumar and Shah, Rajiv Ratn and Chen, Changyou and Ajmera, Jitendra},
  year={2025},
  url={https://www.researchgate.net/publication/405480733_How_Well_Do_Large_Language_Models_Capture_Human_Personality}
}