BUSINESS SUMMARY

Anyone who has used ChatGPT will notice that the quality of responses from this next-generation chatbot is superior to that of older chatbots. ChatGPT and other similar apps are somehow able to produce responses that are more coherent and well-organized and that go to the heart of the users' instructions. This leap in quality can in large part be explained by the innovation of "self-attention", a mechanism by which a machine learning model can use the context of inputs to extract and apply more information about language, and thus produce higher-quality outputs. In this technology explainer, we describe how the self-attention mechanism works, at a technical level, and highlight its legal implications. With the rise of legal disputes implicating generative AI apps and their outputs, it is imperative that legal practitioners understand the underlying technology in order to make informed assertions and adequately guide clients.

INTRODUCTION

Large language models ("LLMs") are machine learning models designed for natural language processing tasks. Generative LLMs focus on generating new text. One of the most influential and well-known generative LLMs is "GPT", a model developed by OpenAI. GPT utilizes a "transformer" model architecture (the "T" in "GPT"), first defined by researchers at Google in a 2017 paper titled "Attention Is All You Need."¹ As this title implies, the lodestar of the transformer architecture is the concept of "attention", specifically, "self-attention": a way for each element in a sequence to focus on other elements in the sequence and consider their importance adaptively. This mechanism captures contextual relationships between elements of natural language to produce a level of human-like coherence that makes it appear as if the model "understands" natural language.

In this technology explainer, we begin with the assumption that our model has already been trained, and we are examining how the attention mechanism works at "inference time", that is, operating on users' inputs it has not seen before. We will describe how the transformer model, specifically the subtype used by GPT,² uses self-attention to produce a mathematical representation of "context" to generate new text.
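To make the mechanism concrete before the detailed discussion, the short Python sketch below implements the single-head scaled dot-product self-attention defined in the 2017 paper: each token's "query" is compared against every token's "key", the comparison scores are normalized with a softmax, and the output for each token is a context-weighted blend of all the "value" vectors (softmax(QKᵀ/√d_k)·V). This is a minimal illustration, not production code: the token embeddings and projection matrices are random stand-ins rather than trained weights, and it omits refinements such as multiple attention heads and the causal mask that GPT's decoder-only variant applies so that each token attends only to earlier tokens.

```python
# Minimal sketch of scaled dot-product self-attention, following the
# formulation in "Attention Is All You Need" (Vaswani et al., 2017).
# All weights and dimensions below are illustrative stand-ins; a real
# model like GPT learns its projection matrices during training.
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token embeddings.

    X:             (seq_len, d_model) matrix, one embedding row per token.
    W_q, W_k, W_v: projection matrices (random here, learned in practice).
    """
    Q = X @ W_q  # queries: what each token is "looking for"
    K = X @ W_k  # keys:    what each token "offers" to the others
    V = X @ W_v  # values:  the content that actually gets mixed together
    d_k = K.shape[-1]
    # Attention weights: how strongly each token attends to every token,
    # scaled by sqrt(d_k) to keep the dot products in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output row is a context-weighted blend of all value vectors.
    return weights @ V, weights

# Toy example: a 4-token "sentence" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))  # row i shows token i's attention over the sequence
```

Printing the weight matrix shows the intuition directly: each row is one token's distribution of attention over the whole sequence, which is the "mathematical representation of context" the model uses when generating the next token.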