Created at 1pm, Jan 5
Artificial Intelligence
Dropout: a simple way to prevent neural networks from overfitting
Contract ID
BPVqrvCAnQXxJvcU1uXbx5asuNg8-czaifZb2wj8Gs4
File Type
PDF
Entry Count
104
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

The article discusses a technique called 'Dropout' for improving the performance of deep neural networks. Deep neural networks, which are complex models with many layers and parameters, are extremely powerful in machine learning tasks like image and speech recognition. However, they have a tendency to overfit. Overfitting is like a student who memorizes facts for a test but fails to understand the concepts well enough to apply them to new problems. In the case of neural networks, this means they might perform exceptionally well on the data they were trained on, but poorly on new, unseen data.

Dropout is a surprisingly simple yet effective solution to this problem. Imagine a neural network as a team of neurons working together to solve a problem. With dropout, during the training process, a random selection of these neurons is 'dropped out,' or ignored, at each step. It's like randomly making some team members unavailable during practice sessions. This forces the remaining neurons to learn more robust and independent representations, as they can't rely on specific neurons always being present.

During the network's training phase, dropout effectively creates a variety of different, smaller networks (since different neurons are dropped out each time). At test time, when the network is used on real-world data, dropout approximates the effect of averaging the outputs of all these smaller networks. This is done by using a single, complete network with adjusted weights, resulting in a more generalized and versatile model.

The authors demonstrate that using dropout significantly reduces overfitting, leading to better performance in various machine learning tasks like vision and speech recognition, document classification, and computational biology. The results showed state-of-the-art performance on many benchmark datasets, proving dropout to be a powerful tool in the machine learning toolkit.
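The train-time masking and test-time weight scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code; `p_retain` stands for the paper's retention probability p, and the function names are ours:

```python
import random

random.seed(0)

def dropout_train(activations, p_retain=0.5):
    # Training time: zero each unit independently with probability 1 - p_retain.
    return [a if random.random() < p_retain else 0.0 for a in activations]

def dropout_test(activations, p_retain=0.5):
    # Test time: keep every unit but scale its output by p_retain,
    # so the expected activation matches what the network saw in training.
    return [a * p_retain for a in activations]

acts = [1.0] * 10_000
train_mean = sum(dropout_train(acts)) / len(acts)  # roughly p_retain of the mass survives
test_mean = sum(dropout_test(acts)) / len(acts)    # exactly p_retain of the mass survives
```

In expectation both paths produce the same mean activation, which is why the single scaled network approximates the average over the many thinned networks.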

(b) Dropout with p = 0.5. Figure 8: Effect of dropout on sparsity. ReLUs were used for both models. Left: the histogram of mean activations shows that most units have a mean activation of about 2.0, and the histogram of activations shows a large mode away from zero; clearly, a large fraction of units have high activation. Right: the histogram of mean activations shows that most units have a smaller mean activation of about 0.7, and the histogram of activations shows a sharp peak at zero; very few units have high activation.
id: 9b57b2c95a9922643486918df46a0e99 - page: 16
We found that as a side-effect of doing dropout, the activations of the hidden units become sparse, even when no sparsity-inducing regularizers are present. Thus, dropout automatically leads to sparse representations. To observe this effect, we take the autoencoders trained in the previous section and look at the sparsity of hidden unit activations on a random mini-batch taken from the test set. Figure 8a and Figure 8b compare the sparsity for the two models. In a good sparse model, there should only be a few highly activated units for any data case. Moreover, the average activation of any unit across data cases should be low. To assess both of these qualities, we plot two histograms for each model. For each model, the histogram on the left shows the distribution of mean activations of hidden units across the minibatch. The histogram on the right shows the distribution of activations of the hidden units.
id: 1e3c675765ae7eb9b1d2d1517413c977 - page: 16
Comparing the histograms of activations, we can see that fewer hidden units have high activations in Figure 8b compared to Figure 8a, as seen by the significant mass away from zero for the net that does not use dropout. The mean activations are also smaller for the dropout net. The overall mean activation of hidden units is close to 2.0 for the autoencoder without dropout but drops to around 0.7 when dropout is used.
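The two diagnostics described above (mean activation per unit, and the raw distribution of all activations) can be sketched as follows. The activation matrix here is synthetic; its shape and distribution are illustrative, not taken from the paper:

```python
import random

random.seed(1)

# Hypothetical minibatch of hidden-unit activations:
# rows = data cases, columns = hidden units.
n_cases, n_units = 64, 256
acts = [[random.expovariate(1.5) for _ in range(n_units)] for _ in range(n_cases)]

# Left histogram of Figure 8: mean activation of each unit across the minibatch.
mean_per_unit = [sum(row[j] for row in acts) / n_cases for j in range(n_units)]

# Right histogram of Figure 8: the raw distribution of all activations;
# a sparse model shows a sharp peak at (or near) zero here.
all_acts = [a for row in acts for a in row]
frac_near_zero = sum(a < 0.1 for a in all_acts) / len(all_acts)
```

Plotting `mean_per_unit` and `all_acts` as histograms reproduces the two panels the paper uses to compare the dropout and no-dropout autoencoders.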
id: d5d42e0aeea07aaadd5862f63409fc40 - page: 16
7.3 Effect of Dropout Rate

Dropout has a tunable hyperparameter p (the probability of retaining a unit in the network). In this section, we explore the effect of varying this hyperparameter. The comparison is done in two situations:

1. The number of hidden units is held constant.
2. The number of hidden units is changed so that the expected number of hidden units retained after dropout is held constant.

In the first case, we train the same network architecture with different amounts of dropout. We use a 784-2048-2048-2048-10 architecture. No input dropout was used. Figure 9a shows the test error obtained as a function of p. If the architecture is held constant, having a small p means very few units will turn on during training. It can be seen that this has led to underfitting, since the training error is also high. We see that as p increases, the error goes down. It becomes flat when 0.4 ≤ p ≤ 0.8 and then increases as p becomes close to 1.
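The two comparison regimes above can be made concrete with a small sketch. The 2048-unit layer size comes from the architecture quoted in the text; the function names are ours:

```python
BASE_HIDDEN = 2048  # hidden units per layer in the fixed 784-2048-2048-2048-10 net

def expected_retained_fixed_arch(p):
    # Regime 1: architecture held constant, so the expected number of
    # retained units shrinks linearly with the retention probability p.
    return p * BASE_HIDDEN

def hidden_units_fixed_expectation(p, expected_retained=2048):
    # Regime 2: grow the layer so that p * n (expected retained units)
    # stays constant as p varies.
    return expected_retained / p
```

For example, at p = 0.5 the fixed architecture retains about 1024 units in expectation, while the second regime would double the layer to 4096 units to keep 2048 retained in expectation.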
id: f98e612095edac7aebab9799c67292d2 - page: 17
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "BPVqrvCAnQXxJvcU1uXbx5asuNg8-czaifZb2wj8Gs4", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "BPVqrvCAnQXxJvcU1uXbx5asuNg8-czaifZb2wj8Gs4", "level": 2}'