Created at 12pm, Mar 28
Proactive Artificial Intelligence
MisGUIDE: Defense Against Data-Free Deep Learning Model Extraction
Contract ID: qa043VsRv_haEyexFWGz2gTYdDNBbNAeZWp-5S33_FI
File Type: DOCX
Entry Count: 42
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models are employed for predictive services through APIs, raising concerns about the security and confidentiality of the models due to emerging vulnerabilities in prediction APIs. Of particular concern are model cloning attacks, where individuals with limited data and no knowledge of the training dataset manage to replicate a victim model's functionality through black-box query access. This commonly entails generating adversarial queries to query the victim model, thereby creating a labeled dataset. This paper proposes 'MisGUIDE', a two-step defense framework for deep learning models that disrupts the adversarial sample generation process by providing a probabilistic response when a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to MisGUIDE the attackers. The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments conducted on two benchmark datasets demonstrate that the proposed framework significantly enhances resistance against state-of-the-art data-free model extraction attacks in black-box settings.

Equation 4 presents the joint loss function, designated as $\mathcal{L}_G$, which is the aggregate of the new disagreement loss $\mathcal{L}_D$ and the class diversity loss $\mathcal{L}_{div}$, weighted by a factor $\lambda$:

$$\mathcal{L}_G = \mathcal{L}_D + \lambda\,\mathcal{L}_{div} \quad (4)$$

2.3.4 Defender goal

With regard to model cloning attacks, the defender has to stop an adversary from being able to clone the victim model. The objective of the defender is to reduce the test accuracy of the cloned model while maintaining high accuracy for benign users of the service. The constraint can be expressed by setting a threshold $T$ on model accuracy for in-distribution queries. The defender goal can be formulated as minimizing the accuracy of the clone model $\hat{M}_v(x;\theta)$ on the victim's target distribution $D(X)$, subject to the threshold $T$:

$$\min_{\theta}\ \mathbb{E}_{x \sim D(X)}\big[\mathrm{Acc}(\hat{M}_v(x;\theta))\big] \quad (5)$$

$$\mathbb{E}_{x \sim D(X)}\big[\mathrm{Acc}(M_v(x;\theta_v))\big] \geq T \quad (6)$$

Equations 5 and 6 present a constrained optimization problem for the defender.
id: 595fc6d91c5695edeac05fbdc6ec5042 - page: 6
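The excerpt does not give the exact forms of $\mathcal{L}_D$ and $\mathcal{L}_{div}$, so the following is only a minimal sketch of how a data-free extraction generator objective of this shape is commonly assembled, assuming an L1 disagreement term between victim and clone logits and an entropy-based class diversity term; the function names and the weighting lam are illustrative, not the paper's definitions.

import torch
import torch.nn.functional as F

def disagreement_loss(victim_logits, clone_logits):
    # Assumed L_D: minimizing the negative L1 distance drives the generator
    # toward queries on which the clone disagrees with the victim.
    return -F.l1_loss(clone_logits, victim_logits)

def class_diversity_loss(victim_logits):
    # Assumed L_div: maximize the entropy of the mean predicted class
    # distribution so that generated queries cover many classes.
    mean_probs = F.softmax(victim_logits, dim=1).mean(dim=0)
    return (mean_probs * torch.log(mean_probs + 1e-8)).sum()

def generator_loss(victim_logits, clone_logits, lam=1.0):
    # Joint objective L_G = L_D + lam * L_div, mirroring Equation 4.
    return disagreement_loss(victim_logits, clone_logits) + lam * class_diversity_loss(victim_logits)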
Defending within these constraints, as our suggested defense does, is called accuracy-constrained. These techniques offer enhanced security without compromising classification accuracy.
id: 919fd9d5b69d085b12615a374964c94a - page: 6
3 Proposed Methodology

In this section, the proposed MisGUIDE countermeasure framework for defending against model extraction attacks is described. Subsection 3.1 provides a brief overview of the MisGUIDE framework, while Subsections 3.2 and 3.3 describe the two main components of MisGUIDE: the Vision Transformer acting as an OOD detector and the probabilistic threshold criteria.

3.1 MisGUIDE Framework

The MisGUIDE defense mechanism relies on the insight that contemporary model extraction attacks leverage a generative framework to create new query samples from a distribution using random noise, notably generating numerous OOD samples for victim model queries. The key principle of MisGUIDE is to employ a Vision Transformer as an OOD detector to detect potentially malicious queries from adversaries. In accordance
id: 72f63ee5d15a29cc531ecaaa129023cb - page: 7
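The chunk above names a Vision Transformer as the OOD detector but does not specify its scoring rule, so here is a minimal sketch, assuming the detector simply thresholds the maximum softmax confidence of a pretrained ViT classifier; the class name OODDetector, the threshold value, and the confidence criterion are assumptions for illustration.

import torch
import torch.nn.functional as F

class OODDetector:
    def __init__(self, vit_model, threshold=0.5):
        # vit_model: any pretrained Vision Transformer classifier (assumed).
        self.vit = vit_model.eval()
        self.threshold = threshold  # assumed confidence cut-off for ID vs OOD

    @torch.no_grad()
    def is_ood(self, x):
        # Score each query by its maximum softmax probability; low-confidence
        # queries are flagged as out-of-distribution.
        probs = F.softmax(self.vit(x), dim=1)
        confidence, _ = probs.max(dim=1)
        return confidence < self.threshold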
Figure 1: MisGUIDE framework, highlighting its core components: the Victim Model (M), an OOD Detector, a Misguiding Function introducing controlled randomness, and a Switch Mechanism dynamically deciding between accurate and intentionally incorrect predictions.

with the proposed probabilistic threshold, MisGUIDE deliberately furnishes inaccurate predictions for queries identified as OOD, while preserving accurate predictions for in-distribution (ID) queries. This strategy leads to the mislabeling of a substantial portion of the adversary's dataset. Training a model on this mislabeled dataset yields a clone model with poor generalization accuracy. Figure 1 provides a visual depiction of the proposed MisGUIDE defense framework. The defense comprises four integral components: (1) the victim model M, (2) an OOD detector, (3) a misguiding probabilistic threshold, and (4) a response switching mechanism. The switch mechanism determines whether the system should respond with a correct or an incorrect prediction to the incoming query.
id: 08ee50d49dcce521c4ca381c1534ab53 - page: 7
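The excerpt describes the four components and the switch mechanism but not the exact misguiding rule, so the following is a hedged sketch of the response path, assuming OOD queries receive a uniformly chosen incorrect label with probability p_mislead (standing in for the misguiding probabilistic threshold) while ID queries always receive the victim's true prediction; the helper names and the random-label policy are illustrative, not the paper's exact mechanism.

import torch

def misguide_respond(x, victim_model, ood_detector, num_classes, p_mislead=0.9):
    # Victim prediction for every query in the batch.
    preds = victim_model(x).argmax(dim=1)
    # Switch mechanism: only OOD queries are candidates for misguiding.
    ood_mask = ood_detector.is_ood(x)
    coin = torch.rand(preds.shape, device=preds.device) < p_mislead
    mislead = ood_mask & coin
    # Misguiding function: shift the label by a random non-zero offset so the
    # returned class is guaranteed to differ from the true prediction.
    offset = torch.randint(1, num_classes, preds.shape, device=preds.device)
    wrong = (preds + offset) % num_classes
    return torch.where(mislead, wrong, preds)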
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "qa043VsRv_haEyexFWGz2gTYdDNBbNAeZWp-5S33_FI", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "qa043VsRv_haEyexFWGz2gTYdDNBbNAeZWp-5S33_FI", "level": 2}'