LCBM

[paper link]

Large Content and Behavior Models to Understand, Simulate, and Optimize Content and Behavior

Shannon and Weaver's seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Large Language Models (LLMs), with their wide generalizability, make some progress toward the second level. However, LLMs and other communication models are not conventionally designed to predict receiver behavior or to optimize communication for desired receiver behaviors and intents. As a result, the effectiveness level remains largely untouched by modern communication systems. In this paper, we introduce receivers' "behavior tokens", such as shares, likes, clicks, purchases, and retweets, into the LLM's training corpora to optimize content for the receivers and predict their behaviors. In addition to performing comparably to LLMs on content understanding tasks, our trained models generalize along the behavior dimension for behavior simulation, content simulation, behavior understanding, and behavior domain adaptation. We demonstrate all of these capabilities using a wide range of tasks on three corpora. We call these models Large Content and Behavior Models (LCBMs). Further, to spur more research on LCBMs, we release our new Content Behavior Corpus (CBC), a repository containing communicator, message, and corresponding receiver behavior.
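To make the idea of behavior tokens more concrete, here is a minimal sketch of how a single communicator/message/behavior record might be turned into text training pairs for the two directions named above: behavior simulation (content to behavior) and content simulation (behavior to content). This is an illustration under stated assumptions, not the LCBM training format: the tag-style verbalization, the prompt wording, and the names `Sample`, `verbalize_behavior`, `behavior_simulation_pair`, and `content_simulation_pair` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical record: communicator, message, and receiver behavior counts."""
    communicator: str
    content: str
    behavior: dict[str, int] = field(default_factory=dict)


def verbalize_behavior(behavior: dict[str, int]) -> str:
    # Assumed tag-style verbalization of behavior tokens (not the paper's format).
    # Keys are sorted so the same behavior always serializes to the same string.
    return " ".join(
        f"<{name}> {count} </{name}>" for name, count in sorted(behavior.items())
    )


def behavior_simulation_pair(s: Sample) -> tuple[str, str]:
    # Behavior simulation: condition on communicator + content, predict behavior.
    prompt = (
        f"Communicator: {s.communicator}\n"
        f"Content: {s.content}\n"
        "Predict receiver behavior:"
    )
    return prompt, verbalize_behavior(s.behavior)


def content_simulation_pair(s: Sample) -> tuple[str, str]:
    # Content simulation: condition on communicator + behavior, generate content.
    prompt = (
        f"Communicator: {s.communicator}\n"
        f"Receiver behavior: {verbalize_behavior(s.behavior)}\n"
        "Generate content:"
    )
    return prompt, s.content


if __name__ == "__main__":
    post = Sample("@brand", "New sneakers drop Friday", {"retweets": 85, "likes": 1200})
    print(behavior_simulation_pair(post)[1])
    # prints: <likes> 1200 </likes> <retweets> 85 </retweets>
```

Using one serialization for both directions means the same record yields a behavior-prediction target and a content-generation target, which is one simple way to place behavior alongside content in an LLM's training text.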

Dataset coming soon...

Contact yamank@iiitd.ac.in for questions and suggestions.