Google researchers can create an AI that thinks a lot like you after just a two-hour interview
Researchers at Stanford University paid 1,052 people $60 each to read the first two lines of The Great Gatsby to an app. That done, an AI that looked like a 2D sprite from a SNES-era Final Fantasy game asked participants to tell their life story. The scientists took those interviews and built them into AI agents that they say replicated the participants’ behavior with 85% accuracy.
The study, titled “Generative Agent Simulations of 1,000 People,” is a joint venture between Stanford and scientists working at Google’s DeepMind AI research lab. The idea is that creating AI agents based on random people could help politicians and businesspeople better understand the public. Why use focus groups or survey the public when you can talk to them once, spin up an LLM based on that conversation, and then have their thoughts and opinions forever? Or at least as close an approximation of those thoughts and feelings as an LLM can recreate.
“This work provides a foundation for new tools that can help study individual and collective behavior,” the paper’s abstract says.
“How might, for example, a diverse set of people respond to new policies and public health messages, respond to product launches, or respond to major shocks?” the paper continued. “When simulated individuals are combined into collectives, these simulations could help pilot interventions, develop complex theories capturing nuanced causal and contextual interactions, and expand our understanding of structures such as institutions and networks in fields such as economics, sociology, organizations, and political science.”
All of this, based on a two-hour interview, was fed into an LLM that answered questions much like its real-life counterpart would.
Much of the process was automated. The researchers contracted Bovitz, a market research firm, to recruit participants, with the goal of gathering as broad a sample of the US population as a 1,000-person cap allows. To complete the study, participants signed up for an account in a custom-built interface, made a 2D sprite avatar, and began talking to an AI interviewer.
The interview questions and style are a modified version of those used by the American Voices Project, a joint Stanford and Princeton University effort that has interviewed people from all over the country.
Each interview began with the participant reading the first two lines of The Great Gatsby (“In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since. ‘Whenever you feel like criticizing any one,’ he told me, ‘just remember that all the people in this world haven’t had the advantages that you’ve had.’”) as a way to calibrate the audio.
According to the paper, “The interview interface displayed a two-dimensional sprite avatar representing the interviewing agent in the center, with the participant’s avatar shown at the bottom walking toward a door to indicate progress. When the interviewing AI agent spoke, this was signaled by a pulsating animation of the central circle with the interviewer’s avatar.”
The two-hour interviews yielded transcripts averaging 6,491 words in length. The questions covered race, gender, politics, income, social media use, job stress, and family composition. The researchers published the interview script and the questions the AI asked.
These transcripts, each under 10,000 words, were then fed into another LLM, which the researchers used to spin up generative agents designed to replicate the participants. The researchers then put both the participants and the AI clones through more questionnaires and economic games to see how they would compare. “When an agent is queried, the entire interview transcript is injected into the model’s prompt, instructing the model to mimic the person based on the interview data,” the paper said.
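The paper’s description of the technique, injecting the whole transcript into the prompt, can be sketched in a few lines. The sketch below is a minimal illustration using OpenAI’s chat API as a stand-in; the model name, the system wording, and the ask_generative_agent helper are assumptions of mine, not the researchers’ actual code.

```python
# Hypothetical sketch of transcript-injection prompting. Assumes OpenAI's
# chat API as a stand-in; the wording and model choice are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_generative_agent(transcript: str, question: str) -> str:
    """Answer a survey question in the voice of the interviewed participant."""
    system_prompt = (
        "You are simulating a real person based on a two-hour interview. "
        "The full transcript follows. Answer every question the way this "
        "person would, staying consistent with their stated views.\n\n"
        f"--- INTERVIEW TRANSCRIPT ---\n{transcript}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; any capable chat model would do
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Example: pose a GSS-style item to the simulated participant.
# answer = ask_generative_agent(transcript, "Do you favor or oppose the death penalty?")
```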
This part of the process was as close to controlled as possible. The researchers used the General Social Survey (GSS) and the Big Five Personality Inventory (BFI) to test how well the LLMs lived up to their inspiration, then ran both the participants and the agents through five economic games.
The results were mixed. The AI agents answered about 85% of GSS questions the same way as their real-world counterparts, and reached about 80% on the BFI. The numbers plummeted, however, once the agents started playing economic games. The researchers offered real-life participants cash rewards to play games like the Prisoner’s Dilemma and the Dictator Game.
In the Prisoner’s Dilemma, participants can choose to cooperate so that both come out ahead, or betray their partner for a chance at a bigger payoff. In the Dictator Game, one participant chooses how to allocate resources to the others. Real-life subjects earned money on top of the initial $60 for playing these.
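For readers who don’t know these games, the sketch below shows their standard structure. The specific dollar payoffs are illustrative assumptions; the article doesn’t state the stakes the study actually used.

```python
# Illustrative payoffs for the two games; the numbers are assumptions,
# chosen only to satisfy the standard Prisoner's Dilemma ordering.

def prisoners_dilemma(a_cooperates: bool, b_cooperates: bool) -> tuple[float, float]:
    """Return (payoff_a, payoff_b) for one round."""
    if a_cooperates and b_cooperates:
        return (3.0, 3.0)   # mutual cooperation: both do well
    if a_cooperates and not b_cooperates:
        return (0.0, 5.0)   # A is betrayed; B wins big
    if not a_cooperates and b_cooperates:
        return (5.0, 0.0)   # B is betrayed; A wins big
    return (1.0, 1.0)       # mutual betrayal: both do poorly

def dictator_game(pot: float, share_kept: float) -> tuple[float, float]:
    """The dictator keeps share_kept of the pot; the recipient gets the rest."""
    kept = pot * share_kept
    return (kept, pot - kept)

# Example: a dictator keeping 70% of a $10 pot leaves $3 for the other player.
print(prisoners_dilemma(True, False))   # (0.0, 5.0)
print(dictator_game(10.0, 0.7))         # (7.0, 3.0)
```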
Faced with these economic games, the AI clones failed to replicate their real-world counterparts. “On average, generative agents achieved a normalized correlation of 0.66,” or about 66%.
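The article doesn’t define “normalized correlation.” One common reading, sketched below purely as an assumption, is the raw agent-to-human correlation scaled by how consistently the humans replicate their own answers when retested; treat the code as a guess at the metric, not the paper’s definition.

```python
# Hedged sketch of a "normalized correlation": raw agent-human correlation
# divided by the human's own test-retest correlation. This reading of the
# metric is an assumption; consult the paper for the exact definition.
import numpy as np

def normalized_correlation(agent: np.ndarray,
                           human_t1: np.ndarray,
                           human_t2: np.ndarray) -> float:
    """Scale agent accuracy by how well the human replicates themselves."""
    raw = np.corrcoef(agent, human_t1)[0, 1]             # agent vs. human, session 1
    test_retest = np.corrcoef(human_t1, human_t2)[0, 1]  # human vs. themselves later
    return raw / test_retest

# Toy example with made-up game decisions (amounts sent in five rounds):
agent = np.array([5.0, 3.0, 4.0, 2.0, 5.0])
human_week1 = np.array([6.0, 2.0, 4.0, 1.0, 5.0])
human_week2 = np.array([5.0, 3.0, 4.0, 2.0, 6.0])
print(round(normalized_correlation(agent, human_week1, human_week2), 2))
```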
The entire paper is worth reading if you’re interested in how scientists think about AI agents and the public. It didn’t take the researchers long to distill a human being’s personality into an LLM that behaved similarly. Given time and energy, they could probably bring the two closer together.
This is worrying to me. Not because I don’t want to see the ineffable human spirit reduced to a spreadsheet, but because I know this kind of technology will be used for ill. We’ve already seen dumber LLMs trained on public recordings trick grandmothers into giving bank information to an AI relative after a quick phone call. What happens when these machines have a script? What happens when they have access to custom-built personas based on social media activity and other publicly available information?
What happens when a corporation or politician decides that society wants and needs something based not on its stated will but on an approximation of it?