Artificial Intelligence
Is ChatGPT a Good Recommender?
9g4SHd8Oh66TfKg-LJaQxoWuHv_1_DGw3eHU7Pq50wk
File Type: PDF
Entry Count: 86
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Recommendation systems have witnessed significant advancements and have been widely used over the past decades. However, most traditional recommendation methods are task-specific and therefore lack efficient generalization ability. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios: rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. Unlike traditional recommendation methods, we do not fine-tune ChatGPT during the entire evaluation process, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore the use of few-shot prompting to inject interaction information that contains users' potential interests to help ChatGPT better understand user needs and interests. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT achieves promising results in certain tasks and is capable of reaching the baseline level in others. We conduct human evaluations on two explainability-oriented tasks to more accurately evaluate the quality of the content generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.
The prompts and code are available at https://github.com/williamliujl/LLMRec.

CIKM '23, October 21–25, 2023, Birmingham, United Kingdom. The Amazon dataset contains customer review text with accompanying metadata on 29 categories of products. This paper focuses on evaluating the Beauty category.
id: 3311b3d0e196a7b34fd8bf0bbd36d550 - page: 6
4.1.2 Metrics. In numerical evaluations, we employ Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) for rating prediction, and we adopt top-k Hit Ratio (HR@k) and top-k Normalized Discounted Cumulative Gain (NDCG@k) for sequential recommendation and direct recommendation, which are widely used in related works [19, 68]. Specifically, we report results on HR@{1,5,10} and NDCG@{5,10} for evaluation. Besides, n-gram Bilingual Evaluation Understudy (BLEU-n) and n-gram Recall-Oriented Understudy for Gisting Evaluation (ROUGE-n) are used to evaluate the explanation generation and review summarization tasks. In human evaluations, we have designed and deployed a crowdsourcing task to assess the quality of the generated explanations and review summaries. Through this task, we aim to accurately evaluate the effectiveness of the content by gathering feedback from a diverse range of human evaluators.
id: fd428e2177b2b298d16dbcf2572a7e9a - page: 6
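As a quick reference, the rating and ranking metrics above can be sketched in a few lines of Python. This is a minimal illustration with our own function names, not the paper's evaluation code; HR@k and NDCG@k are shown for the single-ground-truth-item setting used in these tasks.

```python
import math

def rmse(preds, targets):
    # Root Mean Square Error over paired rating predictions
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))

def mae(preds, targets):
    # Mean Absolute Error over paired rating predictions
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def hit_ratio_at_k(ranked_items, target, k):
    # HR@k: 1 if the ground-truth item appears in the top-k list, else 0
    return int(target in ranked_items[:k])

def ndcg_at_k(ranked_items, target, k):
    # NDCG@k with one relevant item: 1/log2(rank+1) if it is ranked within k, else 0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

Averaging these per-user values over the evaluation set yields the reported numbers.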
4.1.3 Implementation Details. In order to verify that we can directly apply the knowledge learned by ChatGPT to recommendation scenarios without the need for a large amount of task-specific data for training, we apply gpt-3.5-turbo to conduct few-shot and zero-shot experiments for the five tasks mentioned above. We collect n items that users have interacted with and k shots of historical records to enable ChatGPT to learn users' interests implicitly. In this experiment, we use the titles of the items as meta information, and set n = 10 and k = 3 due to the limitation of a maximum context length of 4096 tokens in ChatGPT. We randomly sample 100 records from the test set proposed by P5 for evaluation. For direct recommendation, we set the number of negative samples to 99, thus forming a candidate list of length 100 with one positive item. Also, due to the addition of the candidate pool in the request, we set the number of shots to 1. For sequential recommendation, we input the u
id: 4c1a605a86bf94b1990b9a453ee5fa37 - page: 6
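To make the setup concrete, the sketch below shows how such a few-shot prompt might be assembled from item titles before being sent to gpt-3.5-turbo. The wording and the helper function are illustrative assumptions, not the paper's actual prompt templates (those are in the linked repository).

```python
def build_sequential_prompt(history_titles, candidates=None, shots=()):
    """Assemble a few-shot prompt from item titles (illustrative template)."""
    lines = []
    # Few-shot examples: (history, next_item) pairs the model can imitate.
    for shot_history, next_item in shots:
        lines.append("The user has interacted with: " + ", ".join(shot_history))
        lines.append("Next item: " + next_item)
    # The actual query, built from the user's most recent n item titles.
    lines.append("The user has interacted with: " + ", ".join(history_titles))
    if candidates is not None:
        # Direct recommendation: choose from a candidate pool
        # (1 positive item + 99 sampled negatives in the paper's setup).
        lines.append("Choose the next item from: " + ", ".join(candidates))
    lines.append("Next item:")
    return "\n".join(lines)

prompt = build_sequential_prompt(
    ["Shampoo A", "Conditioner B"],
    shots=[(["Lipstick X"], "Lip Gloss Y")],
)
```

The resulting string would be sent as a single user message in a chat-completion request; no model parameters are updated at any point.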
For human evaluation on explanation generation and review summarization, we sample some results of different methods for each task, and each result is scored and ranked by three human evaluators. After obtaining the manually annotated results, we calculate the average top-1 ratio and average ranking position of different methods to measure their generation performance.
id: dba2990d0b27f39e2c875e7a69b15d13 - page: 6
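The two aggregate scores described above are simple to compute. Here is a minimal sketch, assuming each annotation is a list of method names ordered from best to worst (the representation is our assumption, not the paper's tooling):

```python
from collections import defaultdict

def human_eval_scores(rankings):
    """rankings: list of per-annotation rankings, each a best-first list of
    method names. Returns {method: (top-1 ratio, average rank position)}."""
    top1 = defaultdict(int)
    rank_sum = defaultdict(int)
    for ranking in rankings:
        top1[ranking[0]] += 1                     # count first-place finishes
        for position, method in enumerate(ranking, start=1):
            rank_sum[method] += position          # accumulate 1-based ranks
    n = len(rankings)
    return {m: (top1[m] / n, rank_sum[m] / n) for m in rank_sum}
```

A higher top-1 ratio and a lower average rank both indicate better perceived generation quality.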
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "9g4SHd8Oh66TfKg-LJaQxoWuHv_1_DGw3eHU7Pq50wk", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "9g4SHd8Oh66TfKg-LJaQxoWuHv_1_DGw3eHU7Pq50wk", "level": 2}'
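For non-shell clients, the same search call can be issued from Python with the standard library. The sketch below only constructs the request shown in the curl example above (endpoint, headers, and body fields are taken from it); actually sending it requires a valid x-api-key.

```python
import json
import urllib.request

def search_request(api_key, query, contract_id, top_n=10, rerank=True):
    # Build the same POST request as the curl search example above.
    payload = json.dumps({
        "rerank": rerank,
        "top_n": top_n,
        "contract_id": contract_id,
        "query": query,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://search.dria.co/hnsw/search",
        data=payload,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = search_request(
    "<YOUR_API_KEY>",
    "Which metrics are used for sequential recommendation?",
    "9g4SHd8Oh66TfKg-LJaQxoWuHv_1_DGw3eHU7Pq50wk",
)
# response = urllib.request.urlopen(req)  # uncomment with a real API key
```

The /hnsw/query endpoint differs only in its body: a raw embedding vector (matching jina_embeddings_v2_base_en's dimensionality) and an HNSW search `level` instead of a text query.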