Created at 1pm, Dec 29
benjamin
Artificial Intelligence
RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation
HoYKS_8cc8jN4cp-Fc4GaFgCVc6KxaMACJenAOOoO9I
File Type: PDF
Entry Count: 94
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Abstract—Large language models (LLMs) have demonstrated remarkable capabilities and have been extensively deployed across various domains, including recommender systems. Numerous studies have employed specialized prompts to harness the in-context learning capabilities intrinsic to LLMs. For example, LLMs are prompted to act as zero-shot rankers for listwise ranking, evaluating candidate items generated by a retrieval model for recommendation. Recent research further uses instruction tuning techniques to align LLMs with human preferences for more promising recommendations. Despite its potential, current research overlooks the integration of multiple ranking tasks to enhance model performance. Moreover, the signal from the conventional recommendation model is not integrated into the LLM, limiting current system performance. In this paper, we introduce RecRanker, tailored for instruction tuning an LLM to serve as the Ranker for top-k Recommendations. Specifically, we introduce importance-aware sampling, clustering-based sampling, and a penalty for repetitive sampling, so as to sample high-quality, representative, and diverse users as training data. To enhance the prompt, we introduce a position shifting strategy to mitigate position bias, and we augment the prompt with auxiliary information from conventional recommendation models, thereby enriching the contextual understanding of the LLM. Subsequently, we utilize the sampled data to assemble an instruction-tuning dataset with the augmented prompt, comprising three distinct ranking tasks: pointwise, pairwise, and listwise ranking. We further propose a hybrid ranking method that enhances model performance by ensembling these ranking tasks. Our empirical evaluations demonstrate the effectiveness of our proposed RecRanker in both direct and sequential recommendation scenarios.
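The position shifting strategy mentioned in the abstract can be illustrated with a minimal sketch: present the same candidates to the ranker under several random orderings and aggregate the results, so that no item benefits from always appearing first in the prompt. The function and the Borda-style aggregation below are illustrative assumptions, not the paper's exact procedure; `rank_fn` is a hypothetical stand-in for an LLM ranking call.

```python
import random

def rank_with_position_shifting(items, rank_fn, n_shifts=3, seed=0):
    """Mitigate position bias by ranking several shuffled copies of the
    candidate list and aggregating the results.

    rank_fn(candidates) -> the same candidates in preference order
    (a hypothetical stand-in for an LLM listwise-ranking call).
    """
    rng = random.Random(seed)
    # Borda-style score: an item ranked first among k candidates gets k points.
    scores = {item: 0 for item in items}
    for _ in range(n_shifts):
        shuffled = items[:]
        rng.shuffle(shuffled)                  # shift positions in the prompt
        ranking = rank_fn(shuffled)
        for pos, item in enumerate(ranking):
            scores[item] += len(items) - pos   # earlier rank, more points
    # Final order: aggregate preference across all shifted prompts.
    return sorted(items, key=lambda it: scores[it], reverse=True)
```

With a deterministic `rank_fn` (e.g. plain `sorted`), the aggregated order is independent of how the input list was shuffled, which is exactly the bias the strategy removes.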

Depending on the values of these coefficients, the hybrid ranking can effectively mimic any of the individual ranking methods, thus providing flexibility in the recommendation approach. For the pointwise ranking task, the utility score U_pointwise is initially determined by the relevance score from the LLM prediction. To refine this score and differentiate between items with identical ratings, an additional utility score from the retrieval model is incorporated, denoted as U_retrieval = m/C1. Here, C1 is a constant and m, representing the item's position as determined by the retrieval model, varies from 1 to k (the total number of candidate items). Therefore, the comprehensive utility score for the pointwise ranking task is U_pointwise = U_retrieval + L(P), where L(P) is the LLM-predicted relevance score. In the pairwise ranking scenario, items preferred by the LLM are attributed a utility score U_pairwise = C2, where C2 is a constant. For listwise ranking, the formula U_listwise =
id: 94c4c7ef0cd2a2111553a505f887f926 - page: 7
This formula assigns scores across the list of items, integrating the listwise perspective into the hybrid approach.
id: aecb26631983cf586c0f930d03afaf4f - page: 7
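The hybrid scheme described above can be sketched as a weighted sum of the three per-task utilities. This is a minimal illustration, not the paper's implementation: the coefficients `alpha`, `beta`, `gamma`, the constants `c1`, `c2`, and the rank-based listwise utility are assumptions (the excerpt truncates the listwise formula), and `llm_scores` stands in for the LLM's pointwise predictions L(P).

```python
def hybrid_utility(candidates, llm_scores, pairwise_preferred, listwise_order,
                   alpha=1.0, beta=1.0, gamma=1.0, c1=100.0, c2=1.0):
    """Ensemble pointwise, pairwise, and listwise utilities into one score.

    candidates:         items in retrieval-model order (position m = 1..k)
    llm_scores:         item -> LLM-predicted relevance L(P) (pointwise task)
    pairwise_preferred: items the LLM preferred in pairwise comparisons
    listwise_order:     items in the LLM's listwise ranking
    The rank-based listwise utility is an assumption; the exact formula
    is cut off in this excerpt.
    """
    k = len(candidates)
    scores = {}
    for m, item in enumerate(candidates, start=1):
        u_point = m / c1 + llm_scores.get(item, 0.0)        # U_retrieval + L(P)
        u_pair = c2 if item in pairwise_preferred else 0.0  # U_pairwise = C2
        # assumed listwise utility: items near the top of the list score higher
        u_list = (k - listwise_order.index(item)) / k if item in listwise_order else 0.0
        scores[item] = alpha * u_point + beta * u_pair + gamma * u_list
    # Final top-k order by hybrid utility
    return sorted(candidates, key=lambda it: scores[it], reverse=True)
```

Setting one coefficient to a positive value and the others to zero recovers the corresponding individual ranking method, which is the flexibility the text describes.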
V. EXPERIMENT

TABLE II: Dataset Description.

Dataset        # of User   # of Item   # of Rating   Density
ML-100K        943         1,682       100,000       0.063046
ML-1M          6,040       3,706       1,000,209     0.044683
BookCrossing   77,805      185,973     433,671       0.000030

The primary goal is to investigate the extent to which integrating the introduced model can improve the performance of current recommendation systems. Therefore, we conduct comprehensive experiments to answer the following research questions:

RQ1: Does our proposed RecRanker framework enhance the performance of existing recommendation models?
RQ2: What impact do importance-aware sampling and the enhanced prompt have on the quality of recommendation, respectively?
RQ3: How do various hyper-parameters influence the overall performance of the framework?
RQ4: How does the instruction-tuned model compare to other LLMs, such as GPT?
id: 9da641db56d749bc1018d6d88f6117c9 - page: 7
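The Density column of Table II is simply the fraction of the user-item matrix that is observed, i.e. ratings / (users × items). A quick sketch reproducing the table's figures:

```python
def density(n_ratings, n_users, n_items):
    # Fraction of all possible user-item pairs that carry a rating.
    return n_ratings / (n_users * n_items)

# Figures from Table II
ml100k = density(100_000, 943, 1_682)        # ~0.063046
ml1m = density(1_000_209, 6_040, 3_706)      # ~0.044683
bx = density(433_671, 77_805, 185_973)       # ~0.000030
```

The three-orders-of-magnitude drop in density for BookCrossing explains why it is the most challenging of the three datasets.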
A. Experimental Setup 1) Dataset: Following prior work, we rigorously evaluate the performance of our proposed framework by employing three heterogeneous, real-world datasets. The MovieLens dataset is utilized as a standard benchmark in movie recommendation systems. We explore two subsets of this dataset: MovieLens-100K, containing 100,000 user-item ratings, and MovieLens-1M, which expands to approximately 1 million ratings. The BookCrossing dataset comprises user-submitted book ratings on a 1 to 10 scale and includes metadata such as Book-Author and Book-Title. The key statistics of these datasets are detailed in Table II.
id: 9e0a90b58e7a8fbf3276fc4f2ac9861f - page: 7
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "HoYKS_8cc8jN4cp-Fc4GaFgCVc6KxaMACJenAOOoO9I", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "HoYKS_8cc8jN4cp-Fc4GaFgCVc6KxaMACJenAOOoO9I", "level": 2}'