Created at 12pm, Jan 5
Artificial Intelligence
ImageNet Classification with Deep Convolutional Neural Networks
Contract ID
FnFrfV_5jiMrREQL_uY2ODFRdGn0y8ihWG3_rLm6xC0
File Type
PDF
Entry Count
48
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

This article describes a major advance in computer vision: teaching computers to recognize and classify images. The researchers trained a very large, deep neural network, a type of artificial intelligence loosely modeled on the human brain, to categorize 1.2 million high-resolution images into 1000 different categories.

This network is like a super-charged brain with 60 million adjustable settings (parameters) and 650,000 artificial 'neurons', or processing units. It is composed of several layers, including five convolutional layers specialized for processing images. Imagine these layers as a series of filters that extract various features from an image, such as edges, textures, or specific objects. Some layers are followed by max-pooling, which reduces the amount of information the network has to handle by summarizing the most important features. The network ends with three fully connected layers and a softmax function that assigns a probability to each of the 1000 categories.

To train this network efficiently, the researchers used a couple of smart techniques. They used non-saturating neurons (ReLUs), which let the network learn faster and more effectively, and a highly optimized GPU implementation, which is like giving the neural network a super-fast computer brain to work with.

To avoid overfitting, where the model gets very good at recognizing the images it was trained on but fails to generalize to new images, they introduced a then-recent method of regularization called dropout. Dropout acts like a training discipline: it keeps the network from fixating on the specific details of the training images and pushes it to learn the broader patterns that are useful for recognizing new images.

Their approach achieved impressively low error rates on a test set of images, significantly outperforming previous methods. This work marked a substantial leap forward in the ability of computers to interpret and understand visual information.
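To make the description above concrete, here is a minimal sketch of an AlexNet-style network in PyTorch. This is our illustration, not the authors' code (the original implementation was custom GPU code split across two GPUs, and it also used local response normalization, omitted here); the first layer is padded so the stated 224×224 input yields a 6×6 final feature map.

import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # conv1
            nn.ReLU(inplace=True),                  # non-saturating neuron
            nn.MaxPool2d(kernel_size=3, stride=2),  # max-pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),           # conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),          # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),          # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),                        # the paper's regularizer
            nn.Linear(256 * 6 * 6, 4096),           # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),                  # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),           # fc8; softmax applied by the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
logits = model(torch.randn(1, 3, 224, 224))  # one 224x224 RGB crop

The five convolutional layers, the max-pooling, the dropout, the three fully connected layers, and the roughly 60 million parameters all match the figures quoted above.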

4 Reducing Overfitting

Our neural network architecture has 60 million parameters. Although the 1000 classes of ILSVRC make each training example impose 10 bits of constraint on the mapping from image to label, this turns out to be insufficient to learn so many parameters without considerable overfitting. Below, we describe the two primary ways in which we combat overfitting.

4.1 Data Augmentation

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations (e.g., [25, 4, 5]). We employ two distinct forms of data augmentation, both of which allow transformed images to be produced from the original images with very little computation, so the transformed images do not need to be stored on disk. In our implementation, the transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. So these data augmentation schemes are, in effect, computationally free.
id: 492585586e2e2e52fc862711784af8e8 - page: 5
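The overlap the authors describe, where the CPU prepares the next batch while the GPU trains on the current one, can be sketched with a background thread and a bounded queue. This is an illustration of the idea, not their code; augment_batch and train_step are hypothetical placeholders.

import queue
import threading

def prefetching_batches(raw_batches, augment_batch, capacity=2):
    """Yield augmented batches, preparing the next one on the CPU while
    the consumer (the GPU training step) works on the current one."""
    q = queue.Queue(maxsize=capacity)
    sentinel = object()

    def producer():
        for batch in raw_batches:
            q.put(augment_batch(batch))  # CPU-side transformation
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            return
        yield batch

# Usage sketch:
# for images, labels in prefetching_batches(loader, augment_batch):
#     train_step(images, labels)  # GPU busy while the CPU augments the next batch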
The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224×224 patches (and their horizontal reflections) from the 256×256 images and training our network on these extracted patches⁴. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent. Without this scheme, our network suffers from substantial overfitting, which would have forced us to use much smaller networks. At test time, the network makes a prediction by extracting five 224×224 patches (the four corner patches and the center patch) as well as their horizontal reflections (hence ten patches in all), and averaging the predictions made by the network's softmax layer on the ten patches.
id: 53afac86324fefc01ce0f25a3248caaa - page: 5
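A sketch of this first form in NumPy: random 224×224 crops with random horizontal flips at training time, and the ten-patch extraction (four corners, center, and their reflections) whose softmax predictions are averaged at test time. Images are assumed to be H×W×C arrays; this follows the text above, not the authors' original code.

import numpy as np

def random_crop_and_flip(img, size=224):
    h, w = img.shape[:2]  # expects a 256x256 source image
    y = np.random.randint(0, h - size + 1)
    x = np.random.randint(0, w - size + 1)
    patch = img[y:y + size, x:x + size]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def ten_crop(img, size=224):
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),  # corners
               ((h - size) // 2, (w - size) // 2)]                          # center
    patches = [img[y:y + size, x:x + size] for y, x in offsets]
    patches += [p[:, ::-1] for p in patches]  # reflections: ten patches in all
    return np.stack(patches)

# At test time (pseudocode): probs = softmax(model(ten_crop(img))).mean(axis=0)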
The second form of data augmentation consists of altering the intensities of the RGB channels in training images. Specifically, we perform PCA on the set of RGB pixel values throughout the ImageNet training set. To each training image, we add multiples of the found principal components, with magnitudes proportional to the corresponding eigenvalues times a random variable drawn from a Gaussian with mean zero and standard deviation 0.1. Therefore, to each RGB image pixel $I_{xy} = [I^R_{xy}, I^G_{xy}, I^B_{xy}]^T$ we add the quantity $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3][\alpha_1\lambda_1, \alpha_2\lambda_2, \alpha_3\lambda_3]^T$, where $\mathbf{p}_i$ and $\lambda_i$ are the $i$-th eigenvector and eigenvalue of the $3 \times 3$ covariance matrix of RGB pixel values, and $\alpha_i$ is the aforementioned random variable.

[Footnote 4: This is the reason why the input images in Figure 2 are $224 \times 224 \times 3$-dimensional.]
id: 800fdcbae59ed02df88aaa8fa208d3b6 - page: 5
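The color perturbation just described can be sketched as follows: fit PCA once on RGB pixel values from the training set, then, per image, draw the three Gaussian coefficients and add the resulting 3-vector to every pixel. NumPy only; an illustration of the scheme, not the authors' code (in practice one would subsample pixels when fitting).

import numpy as np

def fit_rgb_pca(images):
    """images: iterable of HxWx3 float arrays."""
    pixels = np.concatenate([im.reshape(-1, 3) for im in images])
    cov = np.cov(pixels, rowvar=False)      # 3x3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)  # lambda_i and p_i (as columns)
    return eigvals, eigvecs

def pca_color_augment(img, eigvals, eigvecs, sigma=0.1):
    alpha = np.random.normal(0.0, sigma, size=3)  # alpha_i, drawn once per image
    shift = eigvecs @ (alpha * eigvals)           # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return img + shift                            # same shift added to every pixel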
Each $\alpha_i$ is drawn only once for all the pixels of a particular training image until that image is used for training again, at which point it is re-drawn. This scheme approximately captures an important property of natural images, namely, that object identity is invariant to changes in the intensity and color of the illumination. This scheme reduces the top-1 error rate by over 1%.

4.2 Dropout
id: dd5c906b255ab6b3c0d48b794ef1a044 - page: 6
How to Retrieve?
# Search: text-query search against this document's index (rerank enabled)

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "FnFrfV_5jiMrREQL_uY2ODFRdGn0y8ihWG3_rLm6xC0", "query": "What is alexanDRIA library?"}'
        
# Query: raw embedding-vector lookup against the HNSW index

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "FnFrfV_5jiMrREQL_uY2ODFRdGn0y8ihWG3_rLm6xC0", "level": 2}'