Created at 11am, Apr 17 by benjamin · Artificial Intelligence
Toolformer: Language Models Can Teach Themselves to Use Tools
FvSJMhzR6OCItn-9_cqRWoZUKqnRlapyJN3tXGEy3So
File Type: PDF
Entry Count: 94
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Abstract of the paper: Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.

https://arxiv.org/abs/2302.04761

Table 9: Toolformer results on the T-REx subset of LAMA and on WebQS for different values of k used during decoding. Numbers shown are overall performance (All), performance on the subset where the model decides to make an API call (AC) and all remaining examples (NC), as well as the percentage of examples for which the model decides to call an API (%).

       |      T-REx (LAMA)       |          WebQS
   k   |  All   AC    NC    %    |  All   AC    NC    %
   0   | 34.9   --  34.9   0.0   | 18.9   --  18.9   0.0
   1   | 47.8  53.0 44.3  40.3   | 19.3  17.1 19.9   8.5
   3   | 52.9  58.0 29.0  82.8   | 26.3  26.5  6.6  99.3
  10   | 53.5  54.0 22.5  98.1   | 26.3  26.4   -- 100.0

One exception is the API call asking for the fastest train connection in the fourth example, which does not give any relevant information but still reduces perplexity. However, some amount of noise in the API calls that are not filtered can actually be useful, as it forces the model finetuned on C* to not always blindly follow the results of each call it makes.
id: 5d32a055b24966d3d655bf2b93194d6c - page: 9
Data Quality: We qualitatively analyze some API calls generated with our approach for different APIs. Table 10 shows some examples of texts from CCNet augmented with API calls, as well as the corresponding score L_i^- − L_i^+ that is used as a filtering criterion, and whether the API calls made by the model are intuitively useful in the given context. As can be seen, high values of L_i^- − L_i^+ typically correspond to useful API calls, whereas low values correspond to API calls that do not provide any information that is useful for predicting future tokens. There are some exceptions, e.g., an API call for
id: 848632da16dd79e680bc03b451662a68 - page: 9
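The filtering criterion described above can be sketched in a few lines; a minimal illustration (not the paper's actual code), assuming we already have the language-model loss over the following tokens when the API result is provided (loss_with) and the minimum loss when the call or its result is omitted (loss_without): the call is kept only if the loss reduction is at least a threshold tau.

```python
def keep_api_call(loss_without: float, loss_with: float, tau: float = 1.0) -> bool:
    """Toolformer-style filtering: keep an API call only if providing its
    result reduces the loss over future tokens by at least tau.
    The difference loss_without - loss_with is the score L_i^- - L_i^+."""
    return (loss_without - loss_with) >= tau

# A large positive difference means the API result helped token prediction,
# so the call survives filtering; a small difference means it is discarded.
useful = keep_api_call(loss_without=6.49, loss_with=1.0)      # score 5.49
useless = keep_api_call(loss_without=1.2, loss_with=1.0)      # score 0.2
```

The threshold value tau = 1.0 here is a placeholder; the paper tunes this per API.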
6 Related Work

Language Model Pretraining: There are various approaches that augment language models with some form of additional textual information during pretraining, including various forms of metadata (Keskar et al., 2019), HTML tags (Aghajanyan et al., 2021), Wikipedia markup (Schick et al., 2022), or related texts obtained from an information retrieval system (Guu et al., 2020; Borgeaud et al., 2021; Izacard et al., 2022). For all of these
id: 0614480894882d8e56f6e79b6234ba2c - page: 9
Table 10 examples (text with API call, score L_i^- − L_i^+, useful?):

- "The Flodden Window (a war memorial dedicated to The Middleton Archers), in the Grade I-listed Church of St Leonard in Middleton is said to be the oldest war memorial in the United Kingdom. <API> WikiSearch(War memorial Flodden) → Battle of Flodden > Commemoration > The stained-glass Flodden Window in Middleton Parish Church [. . . ] was constructed by Sir Richard Assheton in memory of the Battle of Flodden and the archers from Middleton who fought in it. </API> Sir Richard Assheton of Middleton (who built St Leonard) was granted knighthood [. . . ]" [5.49 ✓]

- "Note: The WL will be open on Friday, <API> Calendar() → Today is Thursday, March 9, 2017. </API> March 10, and Sunday, March 19 for regular hours." [2.11 ✓]

- "The Nile has an approximate length of <API> QA(What is the approximate length of the Nile?) → 6,853 km </API> 6,853 kilometers, the White Nile being its main source." [2.08 ✓]
id: 5184bfc91199ec84730d93cfed021940 - page: 10
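The <API> ... </API> markup shown in these examples can be pulled apart with a simple regular expression; a hypothetical sketch, assuming calls take the form "<API> ToolName(argument) result </API>" (the exact markup in the paper also uses a separator between call and result):

```python
import re

# Matches "<API> ToolName(args) result </API>"; a simplifying assumption
# about the markup, for illustration only.
API_RE = re.compile(r"<API>\s*(\w+)\((.*?)\)\s*(.*?)\s*</API>", re.DOTALL)

text = ("The Nile has an approximate length of <API> "
        "QA(What is the approximate length of the Nile?) 6,853 km </API> "
        "6,853 kilometers, the White Nile being its main source.")

tool, args, result = API_RE.search(text).groups()
# tool is the API name, args the call argument, result the returned text.
```

A downstream consumer could dispatch on `tool` to re-execute the call or strip the markup before display.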
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "FvSJMhzR6OCItn-9_cqRWoZUKqnRlapyJN3tXGEy3So", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "FvSJMhzR6OCItn-9_cqRWoZUKqnRlapyJN3tXGEy3So", "level": 2}'
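The same query can also be issued from Python; a minimal sketch mirroring the curl command above, with the endpoint, header names, and body fields taken from it (the API key is a placeholder, and the 2-element vector is only illustrative; a real query vector must match the dimensionality of jina_embeddings_v2_base_en):

```python
import json

# Request pieces mirroring the curl example; x-api-key is a placeholder.
URL = "https://search.dria.co/hnsw/query"
headers = {"x-api-key": "<YOUR_API_KEY>", "Content-Type": "application/json"}
payload = {
    "vector": [0.123, 0.5236],  # illustrative only; real vectors are model-sized
    "top_n": 10,
    "contract_id": "FvSJMhzR6OCItn-9_cqRWoZUKqnRlapyJN3tXGEy3So",
    "level": 2,
}

body = json.dumps(payload)

# To actually send it (requires the requests package and a valid key):
# import requests
# response = requests.post(URL, headers=headers, data=body)
# print(response.json())
```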