Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Contract ID: R9jXvPvINWDQY3bB-yd1tqHvqTmAeF6tBLpM6Sc86nQ
File Type: PDF
Entry Count: 128
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Abstract—With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and task planning. In this survey, we provide a comprehensive review of the existing literature in LLM-enhanced RL and summarize its characteristics compared to conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator. Additionally, for each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Lastly, potential applications, prospective opportunities, and challenges of LLM-enhanced RL are discussed.

Index Terms—Reinforcement learning (RL), large language models (LLM), vision-language models (VLM), multi-modal RL, LLM-enhanced RL.

Fig. 5. LLM as a decision-maker. Panel (i), direct decision-maker: the pretrained LLM receives the goal/instruction and action candidates and outputs the next action to the environment; the figure also depicts a reference policy.
id: fcda73c901c4bf0b9d20ffec0a6167ee - page: 9
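As a minimal sketch of the direct decision-maker depicted in Fig. 5 (assuming a hypothetical llm_complete text-generation helper and a Gymnasium-style environment, neither of which is specified in the survey), the pretrained LLM is prompted with the goal/instruction, the current observation, and the action candidates, and its reply is parsed into the next action:

# Minimal sketch: LLM as a direct decision-maker (cf. Fig. 5).
# llm_complete(prompt) -> str is a hypothetical wrapper around any pretrained
# LLM; env is assumed to follow the Gymnasium reset/step API.

def choose_action(llm_complete, goal, observation, action_candidates):
    """Ask the LLM to pick the next action from a fixed candidate set."""
    prompt = (
        f"Goal: {goal}\n"
        f"Observation: {observation}\n"
        f"Action candidates: {', '.join(action_candidates)}\n"
        "Answer with exactly one candidate.\nNext action:"
    )
    reply = llm_complete(prompt).strip().lower()
    # Fall back to the first candidate if the reply matches no valid action.
    matches = [a for a in action_candidates if a.lower() in reply]
    return matches[0] if matches else action_candidates[0]


def rollout(llm_complete, env, goal, action_candidates, max_steps=50):
    """Run one episode with the LLM choosing every action."""
    observation, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(llm_complete, goal, observation, action_candidates)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward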
Results show that with Wikipedia-pre-trained and GPT-2 language models, there are consistent performance gains in terms of reward and convergence speed in various environments. Reference proposes to use a pre-trained LLM as a general scaffold for task-specific model learning across various environments by adding goals along with observations as the input and converting them to sequential data. Experiments demonstrate that language modeling improves combinatorial generalization in policy learning and can substantially improve out-of-distribution performance in new tasks. To unify language reasoning with actions in a single policy, reference generates textual captions interleaved with actions when training the Transformer-based policy. Results show that by using captions describing the next subgoals, the reasoning policy can consistently outperform the caption-free baseline. In-context RL is one approach to g
id: 792fa9867fc0bb242e9dafc652803c94 - page: 9
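A rough sketch of the scaffold idea described above, under the assumption that the goal and observation history are serialized into one text sequence, encoded by an off-the-shelf GPT-2 backbone, and mapped to actions by a small task-specific head; the serialization format and action head are illustrative choices, not details given in the survey:

# Sketch: pre-trained GPT-2 as a policy scaffold. Goal and observations are
# serialized into one token sequence; only a small action head is task-specific.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer


class LMScaffoldPolicy(nn.Module):
    def __init__(self, num_actions, freeze_backbone=True):
        super().__init__()
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        self.backbone = GPT2Model.from_pretrained("gpt2")
        if freeze_backbone:
            for p in self.backbone.parameters():
                p.requires_grad = False  # reuse the pre-trained knowledge as-is
        self.action_head = nn.Linear(self.backbone.config.n_embd, num_actions)

    def forward(self, goal: str, observations: list[str]) -> torch.Tensor:
        # Serialize goal + observation history into sequential text data.
        text = f"Goal: {goal}. " + " ".join(
            f"Obs {t}: {o}." for t, o in enumerate(observations)
        )
        tokens = self.tokenizer(text, return_tensors="pt", truncation=True)
        hidden = self.backbone(**tokens).last_hidden_state  # (1, T, n_embd)
        # Score the discrete actions from the final token's representation.
        return self.action_head(hidden[:, -1, :])  # (1, num_actions) logits


# Usage: action logits for a toy navigation goal.
policy = LMScaffoldPolicy(num_actions=4)
logits = policy("reach the green door", ["at start cell", "wall ahead"])
action = logits.argmax(dim=-1)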
Reference redesigns the offline in-context approach to successfully train long-sequence Transformers over entire rollouts, tackling the challenges of generalization, long-term memory, and meta-learning. To deal with situations where data is scarce and the environment is risky, reference proposes Language Models for Motion Control (LaMo), which employs pre-trained LLMs with the LoRA fine-tuning method to augment the pre-trained knowledge with in-domain knowledge. The experiments indicate that LaMo achieves state-of-the-art performance in sparse-reward tasks and shows performance gains in data-limited scenarios. To integrate multi-modal data, e.g., vision and language, into offline RL, reference co-fine-tunes vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, e.g., visual question answering. In the framework, the actions are incorporated as natural language tokens and trained jointly with the vision and language datasets.
id: 6cac3f347895ff6129c9fb67d11dde62 - page: 9
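The LoRA recipe can be sketched with the Hugging Face peft library as below; the rank, scaling factor, and target modules are illustrative placeholders rather than LaMo's actual hyper-parameters:

# Sketch: LoRA fine-tuning of a pre-trained LM backbone, in the spirit of LaMo.
# The frozen pre-trained weights keep general knowledge, while small low-rank
# adapters absorb in-domain (offline RL) knowledge. Uses Hugging Face peft;
# the values below are illustrative, not LaMo's settings.
from peft import LoraConfig, get_peft_model
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # low-rank adapter dimension
    lora_alpha=16,              # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projections in GPT-2 blocks
)
backbone = get_peft_model(backbone, lora_config)
backbone.print_trainable_parameters()  # only the adapters are trainable

# The adapted backbone would then be trained on offline trajectories
# (e.g., return-to-go / state / action token sequences) with a standard
# supervised sequence-modeling loss.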
Results show that such a co-fine-tuning method can increase generalization performance, and that chain-of-thought reasoning can help the agent perform multi-stage semantic reasoning and solve complex tasks. In indirect decision-making, the LLM guides decision-making by generating action candidates or instructing the policy-updating direction. Challenges and future directions of the two roles are listed below:
id: e7d1680631b2debd2c79a9c6141a71e6 - page: 9
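As a hedged illustration of the indirect route, the sketch below lets the LLM only propose action candidates while a learned critic makes the final choice, and shows a KL-style penalty for the reference-policy variant; llm_propose and q_value are hypothetical stubs, not an interface defined in the survey.

# Sketch: LLM as an indirect decision-maker. Two variants are shown:
# (1) the LLM proposes action candidates and a learned critic picks one;
# (2) the LLM acts as a reference policy that regularizes policy updates.
# llm_propose and q_value are hypothetical stubs for illustration.
import torch
import torch.nn.functional as F


def llm_propose(goal: str, observation: str, k: int = 4) -> list[str]:
    """Ask the LLM for up to k plausible next actions (stub)."""
    raise NotImplementedError("wrap the LLM of your choice here")


def q_value(observation: str, action: str) -> float:
    """Learned action-value estimate from the RL agent (stub)."""
    raise NotImplementedError("plug in the trained critic here")


def act_indirectly(goal: str, observation: str) -> str:
    # Variant 1: the LLM narrows the action space; the critic has the final say.
    candidates = llm_propose(goal, observation)
    return max(candidates, key=lambda a: q_value(observation, a))


def kl_to_llm_reference(policy_logits: torch.Tensor,
                        llm_reference_logits: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    # Variant 2: a penalty that nudges the learned policy toward the LLM
    # reference policy, i.e., "instructing the policy-updating direction".
    policy_log_probs = F.log_softmax(policy_logits, dim=-1)
    reference_probs = F.softmax(llm_reference_logits, dim=-1)
    return beta * F.kl_div(policy_log_probs, reference_probs, reduction="batchmean")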
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "R9jXvPvINWDQY3bB-yd1tqHvqTmAeF6tBLpM6Sc86nQ", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "R9jXvPvINWDQY3bB-yd1tqHvqTmAeF6tBLpM6Sc86nQ", "level": 2}'