*GitHub; Contributors: @k3nn.eth#0270, @Stanley#8720, @APersonofNote#3896, @cephalopod#8465, @patman needs no robin#0371*

<aside> 💡 L.I.O.N: language informatics on organizational networks

</aside>

Leo

Overview

The utility of network analysis has never been clearer than in web3. The Ethereum blockchain alone processes over a million transactions a day, all of which are public record, allowing us to draw edges connecting an ever-expanding network of nodes.

There is much to learn about the nature of web3 through this lens. For DAOs specifically, transaction data can provide insight into on-chain governance operations and degrees of decentralization.

However, many DAO operations occur off-chain. When it comes to generating insights about community-run, internet-native organizations, there is a particular piece of information that is both deeply insightful and stored off-chain: communication.

Organizational Network Analysis [ONA] looks at both network and language layers of an organization. Natural language processing [NLP] tools can be used to extract psychometric properties like engagement, turnover intent, and cultural fit by analyzing the language layer within each edge of the network. [1]
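
As a rough illustration of the two layers, the sketch below builds a small directed network from messages and attaches a simple language feature to each edge. The message records, member names, and the `ENGAGEMENT_WORDS` lexicon are hypothetical placeholders, and a lexicon hit rate stands in for the psychometric models described in [1]; it is not the method used there.

```python
from collections import defaultdict

# Hypothetical message records: (sender, recipient, text).
MESSAGES = [
    ("alice", "bob", "Excited to ship this proposal together"),
    ("alice", "bob", "Thanks for the great review"),
    ("bob", "carol", "Not sure I have time for this"),
]

# Toy lexicon, assumed purely for illustration.
ENGAGEMENT_WORDS = {"excited", "thanks", "great", "together"}


def build_edges(messages):
    """Network layer: group message texts by directed (sender, recipient) edge."""
    edges = defaultdict(list)
    for sender, recipient, text in messages:
        edges[(sender, recipient)].append(text)
    return dict(edges)


def engagement_score(texts):
    """Language layer: share of tokens on an edge found in the toy lexicon."""
    tokens = [tok.lower() for text in texts for tok in text.split()]
    if not tokens:
        return 0.0
    hits = sum(tok in ENGAGEMENT_WORDS for tok in tokens)
    return hits / len(tokens)


def score_network(messages):
    """Return {edge: score}, one language feature per edge of the network."""
    return {
        edge: engagement_score(texts)
        for edge, texts in build_edges(messages).items()
    }


if __name__ == "__main__":
    for edge, score in score_network(MESSAGES).items():
        print(edge, round(score, 3))
```

In a real pipeline the per-edge score would come from a trained model rather than a word list, but the shape of the data (edges carrying text, text reduced to features) stays the same.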

Project Lion is an experiment in taking this idea further. Using DAO communication data, state-of-the-art NLP tools, and novel methodology, we envision a psychometric system that moves beyond topics, lexicons, and word frequencies toward adaptive, community-trained intelligent agents that simulate the communication patterns of complex human systems.

Project Details

Classical NLP tools [e.g., nltk, spacy, gensim, empath] dominate current ONA work. Project Lion intends to explore the next generation of NLP tools and extend ONA beyond what these libraries can do.

We believe that fine-tuning transformer models like GPT-2, GPT-3 [2], and GPT-J on Discord communication data can reveal a new layer of psychometric insight in the aggregate voice of a community.
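
A first practical step toward fine-tuning is turning raw chat logs into training examples. The sketch below pairs each message with the message that precedes it in the same channel and serializes the pairs as JSON Lines with `prompt` and `completion` fields, a layout used by some fine-tuning pipelines. The record fields (`channel`, `author`, `content`), the sample messages, and the separator strings are all assumptions made for illustration; they are not taken from a real Discord export or from the project's own pipeline.

```python
import json

# Hypothetical, already-anonymized chat records in chronological order.
RECORDS = [
    {"channel": "governance", "author": "u1", "content": "Should we raise the quorum?"},
    {"channel": "governance", "author": "u2", "content": "Yes, turnout has been high."},
    {"channel": "general", "author": "u3", "content": "gm"},
    {"channel": "governance", "author": "u1", "content": "Drafting a proposal now."},
]


def to_training_pairs(records):
    """Pair each message with the previous message in the same channel."""
    last_by_channel = {}
    pairs = []
    for rec in records:
        previous = last_by_channel.get(rec["channel"])
        if previous is not None:
            pairs.append({
                # Separator and leading space are assumed conventions.
                "prompt": previous["content"] + "\n\n###\n\n",
                "completion": " " + rec["content"],
            })
        last_by_channel[rec["channel"]] = rec
    return pairs


def to_jsonl(pairs):
    """Serialize the pairs as one JSON object per line."""
    return "\n".join(json.dumps(pair) for pair in pairs)


if __name__ == "__main__":
    print(to_jsonl(to_training_pairs(RECORDS)))
```

Pairing within a channel keeps unrelated conversations from being stitched together; a fuller version would likely also use reply threads and time gaps to decide which messages belong to the same exchange.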

We refer to this concept as the digital twin: a digital copy of a group’s language patterns summed up into a single AI voice.

We hypothesize that:

[H1] Given enough training data, GPT models can answer questions in a way that reflects the aggregate responses of a community.
[H2] GPT models can uncover language patterns, such as deviations from the community voice, that classical NLP tools do not recognize.
[H3] Using GPT models for classification tasks allows for less rigid, more dynamic topic modeling to supplement ONA.

If these hypotheses can be validated, Project Lion may change how large-group psychometrics are measured in practice. Today, the most common approach is to run organization-wide surveys. If fine-tuned GPT models reflect the aggregate voice of a community as closely as hypothesized, we may be able to administer surveys to AI agents rather than humans while maintaining comparable accuracy.
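
To gauge how closely model answers track human ones, one option is to compare the distribution of human responses with the distribution of model-sampled responses for the same survey item. The sketch below does this with total variation distance on a 5-point Likert item. The response lists are made-up placeholders, and the choice of metric is an assumption rather than part of the project's stated methodology.

```python
from collections import Counter

LIKERT_SCALE = [1, 2, 3, 4, 5]

# Made-up responses to one survey item, for illustration only.
HUMAN_RESPONSES = [4, 5, 4, 3, 4, 2, 5, 4]
MODEL_RESPONSES = [4, 4, 5, 3, 4, 3, 5, 4]


def distribution(responses, scale=LIKERT_SCALE):
    """Convert raw responses into relative frequencies over the scale."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = len(responses)
    return {option: counts[option] / total for option in scale}


def total_variation_distance(p, q):
    """Half the sum of absolute differences: 0 is identical, 1 is disjoint."""
    return 0.5 * sum(abs(p[option] - q[option]) for option in p)


if __name__ == "__main__":
    human = distribution(HUMAN_RESPONSES)
    model = distribution(MODEL_RESPONSES)
    print(total_variation_distance(human, model))
```

A distance near 0 on held-out survey items would support H1; what counts as "close enough" would need to be set against the disagreement between two independent human samples from the same community.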