Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
Contract ID: NkdpRPoeS5GRnCtmvH0yXIZkJrGVlJJSgO6cwzhzge8
File Type: PDF
Entry Count: 62
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert
Fundamental Artificial Intelligence Research at Meta

Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins (the HCR bounds). Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.

In all processing, we first scale the pixel values to range from 0 to 1, then subtract the overall mean 0.1307, and finally divide by the standard deviation 0.3081. When displaying images, we reverse all these normalizations. For training, we use random minibatches of 32 examples each, over 6 epochs of optimization (thus sweeping 6 times through all 60,000 examples in the training set of MNIST). We minimize the empirical average cross-entropy loss using AdamW of Loshchilov & Hutter (2019), with a learning rate of 0.001. On the test set of MNIST, the average accuracy for classification without dithering is 97.9% and with dithering is 95.1%.
id: 117299ae91beb188df0f61da39590dcd - page: 7
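The following is a minimal PyTorch sketch of the training recipe just described; the single linear classifier is a placeholder (the paper's architecture is not reproduced here), and the dithering of features is omitted.

import torch
import torchvision
from torch import nn

# Scale pixels to [0, 1] (ToTensor), subtract the mean, divide by the standard deviation.
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
])
train_set = torchvision.datasets.MNIST("data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)  # random minibatches of 32

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder net, not the paper's
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()  # empirical average cross-entropy

for epoch in range(6):  # 6 sweeps through all 60,000 training examples
    for images, labels in loader:
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()
        optimizer.step()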
In Figures 1 and 2, the size of the perturbation (either 1/200 or 1/1000) refers to the Euclidean norm of z in (12). In the limit that the size tends to 0, the HCR bounds would become Cramér-Rao bounds (if the parameterizations of the neural networks were differentiable), as in (18). The results for the two sizes turn out to be reasonably similar.
id: 7122c6b8694542f2a895ac01dcfdc6f8 - page: 7
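As a point of reference, the HCR inequality lower bounds the variance of any unbiased estimator T of a functional g via Var(T) ≥ sup_z (g(θ+z) − g(θ))² / χ²(P_{θ+z} ‖ P_θ). Below is a minimal sketch of the resulting bound when the observation is the feature vector plus isotropic Gaussian noise of standard deviation sigma; the function name and interface are ours, not the paper's, and the chi-squared formula exp(‖f(x+z) − f(x)‖²/σ²) − 1 is the standard one for Gaussians with equal covariances.

import torch

def hcr_lower_bound(f, x, z, sigma):
    # Observations: y = f(x) + N(0, sigma^2 I).  For the shifted input x + z,
    #   chi^2( N(f(x+z), sigma^2 I) || N(f(x), sigma^2 I) )
    #       = exp(||f(x+z) - f(x)||^2 / sigma^2) - 1.
    # HCR then gives, for any unbiased estimator of the k-th input coordinate,
    #   std >= |z_k| / sqrt(chi^2),
    # valid for every coordinate k with z_k != 0.
    with torch.no_grad():
        diff = f(x + z) - f(x)
        chi2 = torch.expm1((diff * diff).sum() / sigma**2)
    return z.abs() / torch.sqrt(chi2)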
Figure 1 histograms (over all examples in the test set) the magnitudes of the HCR lower bounds on the standard deviations of unbiased estimators for the original images' values. The estimates are for the Fourier modes in a discrete cosine transform (DCT) of type II, with the DCT normalized to be an orthogonal linear transformation (meaning real and unitary, or isometric). The modes of the DCT form an orthonormal basis suitable as a system of coordinates; note that these modes are for the normalized input images, standardized such that the standard deviation of the normalized pixel values is 1 and the mean is 0. The histograms in the rightmost column of Figure 1 consider only the 8 × 8 lowest-frequency modes, whereas the histograms in the leftmost column consider all 28 × 28.
id: 78d69ddf42e1fb44b83d3c27f96ec60c - page: 7
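For concreteness, here is a short illustration of the type-II DCT with orthogonal ("ortho") normalization using SciPy; the paper's exact implementation is not specified, so this is only an assumed equivalent.

import numpy as np
from scipy.fft import dctn, idctn

image = np.random.randn(28, 28)             # stand-in for a normalized MNIST image
modes = dctn(image, type=2, norm="ortho")   # orthonormal DCT-II modes as coordinates
low_freq = modes[:8, :8]                    # the 8 x 8 lowest-frequency modes
recovered = idctn(modes, type=2, norm="ortho")
assert np.allclose(recovered, image)        # orthogonal, hence exactly invertible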
Figure 1 shows that the bounds would have been reasonably effective had the pixels of the original images not been mostly almost pure black or pure white; because they are, rounding the estimates to pure black or pure white strips away the noise guaranteed by the bounds, denoising the estimates very effectively.
id: 21a7581e0648be74f92f788b46413883 - page: 7
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "NkdpRPoeS5GRnCtmvH0yXIZkJrGVlJJSgO6cwzhzge8", "query": "What is alexanDRIA library?"}'
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "NkdpRPoeS5GRnCtmvH0yXIZkJrGVlJJSgO6cwzhzge8", "level": 2}'