Created at 6am, Jan 23
benjamin · Artificial Intelligence
D2K: Turning Historical Data into Retrievable Knowledge for Recommender Systems
Contract ID: -CXywEt1uvBpugv3NUTQ_h2GtWLubihV0y4kEXsiyqA
File Type: PDF
Entry Count: 85
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Abstract of the Paper: A vast amount of user behavior data is constantly accumulating on today's large recommendation platforms, recording users' various interests and tastes. Preserving knowledge from the old data while new data continually arrives is a vital problem for recommender systems. Existing approaches generally seek to save the knowledge implicitly in the model parameters. However, such a parameter-centric approach lacks scalability and flexibility: the capacity is hard to scale, and the knowledge is inflexible to utilize. Hence, in this work, we propose a framework that turns massive user behavior data into retrievable knowledge (D2K). It is a data-centric approach that is model-agnostic and easy to scale up. Different from only storing unary knowledge such as user-side or item-side information, D2K proposes to store ternary knowledge for recommendation, which is determined by the complete recommendation factors: user, item, and context. The knowledge retrieved by target samples can be directly used to enhance the performance of any recommendation algorithm. Specifically, we introduce a Transformer-based knowledge encoder to transform the old data into knowledge with the user-item-context cross features. A personalized knowledge adaptation unit is devised to effectively exploit the information from the knowledge base by adapting the retrieved knowledge to the target samples. Extensive experiments on two public datasets show that D2K significantly outperforms existing baselines and is compatible with a major collection of recommendation algorithms.

Original Paper: https://arxiv.org/pdf/2401.11478.pdf
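As a rough illustration of the data-centric idea described above (our own sketch, not the authors' implementation): knowledge entries are keyed by the full user-item-context triple rather than by user or item alone, so a target sample retrieves exactly the entries that share its recommendation factors.

from collections import defaultdict

# Toy ternary knowledge store, illustrative only. A real D2K index stores
# encoded knowledge vectors and retrieves them with approximate
# nearest-neighbour search (e.g. HNSW); an exact dictionary lookup keeps
# this sketch short.
class TernaryKnowledgeStore:
    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, user_id, item_id, context, knowledge_vec):
        # Ternary key: the complete recommendation factors,
        # not just the user or the item.
        self._entries[(user_id, item_id, context)].append(knowledge_vec)

    def retrieve(self, user_id, item_id, context):
        return self._entries.get((user_id, item_id, context), [])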

Jiarui Qin et al. find that the ternary knowledge in D2K provides more useful information to the recommendation models than unary knowledge. The coreset selection baselines Random & SVP-CF are simple yet effective methods to preserve knowledge: in most cases they perform better than Fixed Window (R) on recent data, and on the AD dataset they even produce better results than the other, more complex methods. However, the coreset methods rely heavily on heuristics such as predefined data selection criteria, and they drop a large portion of the original data, which can lead to severe information loss. The time & space overhead of D2K is shown in Appendix A.3, and to further demonstrate the usefulness of D2K, we also test the performance of using only the "direct knowledge" (Definition 1) to produce predictions, without the original input, in Appendix A.4.
id: f14ba3f23e26d6275ffa80f5cebf9c2a - page: 6
3.3 Ablation Study (RQ2)
In the ablation study section, we mainly analyze the personalized knowledge adaptation unit and the different ways of injecting the knowledge into a recommendation model.
id: 84a7cf0adef1ac58667b9bbe1f4e4932 - page: 6
3.3.1 Personalized Knowledge Adaptation Unit.
To verify the effectiveness of the proposed knowledge adaptation method in Section 2.4.3, we develop four different variants of the D2K implementation, as shown in Table 1 and Table 2. By comparing the results of the different variants, we have the following observations: (1) D2K-base does not use the adaptation unit, and thus it performs worse than the other three variants in most cases. This result shows that the adaptation unit is essential to the performance, and we have to make the global knowledge adaptive to the current target sample. (2) D2K-adp-sep utilizes a separate embedding table for the input of the adaptation unit to avoid interference from the original input. D2K-adp-small uses a smaller embedding size than D2K-adp-sep to reduce the number of additional parameters introduced by the adaptation unit. By comparing the results of D2K-adp-sep/D2K-adp-small and D2K-adp-share, we cannot say for sure that incorporating a separate embedding table brings a consistent performance gain.
id: 52eb528dee6ac4e5ee66896bc38d96d1 - page: 6
Even if a separate embedding table were better for performance, it would consume much more GPU memory than the shared-embedding variant. Thus we believe using a shared embedding table is the better practice for the personalized knowledge adaptation unit.
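To make the shared-versus-separate trade-off concrete, here is a minimal PyTorch-style sketch under our own assumptions (the gating mechanism and all names are illustrative, not the paper's code); with shared=True the unit adds only a small gate layer on top of the backbone's existing embeddings, which is the memory argument made above.

import torch
import torch.nn as nn

class AdaptationUnit(nn.Module):
    # Toy personalized adaptation: gate the retrieved knowledge vector by the
    # target sample's features. shared=True reuses the backbone embedding
    # table (no extra embedding memory, in the spirit of D2K-adp-share);
    # shared=False allocates a separate table, like D2K-adp-sep.
    def __init__(self, backbone_emb: nn.Embedding, n_fields: int,
                 knowledge_dim: int, shared: bool = True):
        super().__init__()
        self.emb = backbone_emb if shared else nn.Embedding(
            backbone_emb.num_embeddings, backbone_emb.embedding_dim)
        self.gate = nn.Linear(backbone_emb.embedding_dim * n_fields,
                              knowledge_dim)

    def forward(self, field_ids, knowledge):
        # field_ids: (batch, n_fields) ints; knowledge: (batch, knowledge_dim)
        x = self.emb(field_ids).flatten(start_dim=1)
        return torch.sigmoid(self.gate(x)) * knowledge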
id: a9bce483f6f8538ff4a03f038cbef286 - page: 6
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "-CXywEt1uvBpugv3NUTQ_h2GtWLubihV0y4kEXsiyqA", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "-CXywEt1uvBpugv3NUTQ_h2GtWLubihV0y4kEXsiyqA", "level": 2}'