On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

Contract ID: fkKUf5TcjeysPbqtef0DlXWJ-AXIyXrxAwHkT5pKuVU
File Type: PDF
Entry Count: 150
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating, and translating human language. They learn language patterns by analyzing large amounts of text data, enabling them to perform writing, conversation, summarization, and other language tasks. When LLMs process and generate large amounts of data, there is a risk of leaking sensitive information, which may threaten data privacy. This paper concentrates on elucidating the data privacy concerns associated with LLMs to foster a comprehensive understanding. Specifically, a thorough investigation is undertaken to delineate the spectrum of data privacy threats, encompassing both passive privacy leakage and active privacy attacks within LLMs. Subsequently, we assess the privacy protection mechanisms employed by LLMs at various stages, followed by a detailed examination of their efficacy and constraints. Finally, the discourse extends to the challenges encountered and prospective directions for advancement in the realm of LLM privacy protection.

VII. PRIVACY PROTECTION IN INFERENCE

During the inference process of LLMs, the issue of privacy leakage has garnered widespread attention. To address this issue, researchers have developed numerous strategies to ensure privacy security during the inference phase. In this section, we summarize the privacy protection approaches for the inference stage of LLMs, focusing on three categories: encryption-based approaches, detection-based approaches, and hardware-based approaches.
id: a22898102b027d61e3425c2564a019a8 - page: 10
A. Cryptography-based Approaches

1) Homomorphic Encryption: Homomorphic encryption is a cryptographic technique that allows computations to be performed on ciphertexts, ensuring that the result, when decrypted, is identical to the result of the same operations performed on the plaintext. This encryption method is key in enabling data to be processed while maintaining its encrypted state, adding a new dimension to data privacy and security. Homomorphic encryption is primarily categorized into three types:
- Partial Homomorphic Encryption (PHE): supports one type of operation (usually addition or multiplication) on ciphertexts.
- Somewhat Homomorphic Encryption (SWHE): allows a limited number of operations on ciphertexts.
- Fully Homomorphic Encryption (FHE): the most powerful, supporting an unlimited number of both addition and multiplication operations on ciphertexts.
To better understand homomorphic encryption algorithms, we provide the following definition.
id: f98ec16360c907266ddd85d27f61ab37 - page: 10
Definition 7.1: An encryption scheme is considered homomorphic over an operation ∘ if it satisfies a specific mathematical property. Specifically, it supports the following equation:

E(m1) ∘ E(m2) = E(m1 ∘ m2), ∀ m1, m2 ∈ M    (2)

Here, E represents the encryption algorithm, M denotes the set of all possible messages that can be encrypted, and m1 and m2 are any two messages in M. The operation ∘ can be any binary operation (e.g., addition or multiplication).
id: b563faef28f194327e334f6b3c82471d - page: 10
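Definition 7.1 can be illustrated concretely with unpadded (textbook) RSA, which is partially homomorphic with respect to multiplication: multiplying two ciphertexts yields an encryption of the product of the plaintexts. The following minimal sketch uses deliberately tiny toy parameters; unpadded RSA with such keys is not secure and is shown only to make the homomorphic property tangible.

```python
# Toy textbook RSA demonstrating E(m1) * E(m2) = E(m1 * m2) mod n.
# Tiny parameters for illustration only -- NOT secure cryptography.
p, q = 61, 53
n = p * q                    # RSA modulus
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 11
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts; decrypting the result recovers m1 * m2,
# even though the multiplier never saw the plaintexts.
product_cipher = (c1 * c2) % n
print(decrypt(product_cipher))  # 77
```

Here multiplication plays the role of the operation ∘ in Equation (2). An additively homomorphic scheme such as Paillier satisfies the same definition with addition as the plaintext operation.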
Tech Tips: Homomorphic encryption safeguards privacy during the inference stage by encrypting both the model parameters and input data. With HE, computations can be performed directly on encrypted data, allowing the model to make predictions without decrypting sensitive information. This process ensures that neither the raw data nor the model architecture is exposed in their unencrypted form, preserving privacy throughout the inference process. Decryption of the results is only done by trusted parties possessing the decryption key, maintaining the confidentiality of the information. Additionally, HE facilitates secure outsourcing of computations to untrusted servers, enabling organizations to utilize external resources without compromising data privacy.
id: ac7466a3e35dedd0b967292e190bc7f9 - page: 10
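The encrypt-compute-decrypt workflow described above can be sketched end to end. The scheme below is a toy additive masking construction (not a real homomorphic cryptosystem; production systems use schemes such as Paillier or CKKS), chosen only to show the roles of the three parties: the client encrypts its inputs, an untrusted server aggregates ciphertexts without ever seeing plaintext, and only the key holder can decrypt the result. All function and variable names are illustrative.

```python
# Minimal sketch of privacy-preserving aggregation during inference,
# using additive one-time masking (illustrative only, NOT secure crypto).
import secrets

N = 2**32  # working modulus; all values are integers mod N

def encrypt(m: int, key: int) -> int:
    return (m + key) % N

def decrypt(c: int, key: int) -> int:
    return (c - key) % N

# --- Client: encrypt each input feature with a fresh random key ---
features = [3, 14, 15]
keys = [secrets.randbelow(N) for _ in features]
ciphertexts = [encrypt(m, k) for m, k in zip(features, keys)]

# --- Untrusted server: sums ciphertexts without seeing any plaintext ---
encrypted_sum = sum(ciphertexts) % N

# --- Client: decrypt the aggregate using the sum of its keys ---
total = decrypt(encrypted_sum, sum(keys) % N)
print(total)  # 32, i.e. sum(features)
```

Because the masks are added to the plaintexts, sums of ciphertexts decrypt to sums of plaintexts, which is exactly the additive case of the homomorphic property: the server performs useful computation while only the client, holding the keys, can recover the result.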
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "fkKUf5TcjeysPbqtef0DlXWJ-AXIyXrxAwHkT5pKuVU", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "fkKUf5TcjeysPbqtef0DlXWJ-AXIyXrxAwHkT5pKuVU", "level": 2}'