A simulation of me: fine-tuning an LLM on 240k text messages

Because the data consists of real private conversations, the raw .txt file is not publicly available for download; this protects the privacy of the people involved.

The messages were cleaned by removing group chats and unknown contacts, then grouped into "chunks" of 200 tokens to serve as training prompts for the model.
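The cleaning and chunking step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the `Message` record and its fields are hypothetical, an unknown contact is represented as `None`, and tokens are approximated by whitespace splitting rather than by the model's own tokenizer.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Message:
    # Hypothetical record layout; the real export format is not described.
    contact: Optional[str]  # None stands in for an unknown contact
    is_group: bool          # True if the message came from a group chat
    sender: str
    text: str


def clean(messages: Iterable[Message]) -> List[Message]:
    """Drop group-chat messages and messages from unknown contacts."""
    return [m for m in messages if not m.is_group and m.contact is not None]


def chunk(messages: Iterable[Message], max_tokens: int = 200) -> List[str]:
    """Pack consecutive messages into chunks of about max_tokens tokens.

    Token counts are approximated with whitespace splitting (an assumption).
    A single message longer than max_tokens becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for m in messages:
        line = f"{m.sender}: {m.text}"
        n = len(line.split())
        # Start a new chunk when adding this message would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

In practice the messages would be chunked one conversation at a time, in chronological order, so that each chunk reads as a coherent exchange; replacing the whitespace count with a real tokenizer would make the 200-token budget exact.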

The most relevant reference is a project by Edward Donner, who fine-tuned a large language model (LLM) on his own private history of 240,805 text messages to create a digital simulation of himself.

The data consisted of SMS, iMessage, and WhatsApp conversations with 288 people.