Single Image Super Resolution Using Deep Residual Learning
Contract ID: x1ipeuah_9BVyhH4XwoRw9MApOzYtJH3rWUIdxiEGf0
File Type: PDF
Entry Count: 64
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Single Image Super Resolution (SISR) is an intriguing research topic in computer vision whose goal is to reconstruct high-resolution images from low-resolution ones. SISR has numerous applications in fields such as medical and satellite imaging, remote target identification, and autonomous vehicles. Compared to traditional interpolation-based approaches, deep learning techniques have recently gained attention in SISR due to their superior performance and computational efficiency. This article proposes an Autoencoder-based deep learning model for SISR. The down-sampling part of the Autoencoder mainly uses 3×3 convolutions and has no subsampling layers; the up-sampling part uses transposed convolutions and residual connections from the down-sampling part. The model is trained on a subset of the ILSVRC ImageNet database as well as the RealSR database. Quantitative metrics such as PSNR and SSIM reach values as high as 76.06 and 0.93, respectively, in our testing. We also used qualitative measures such as perceptual quality.

The Autoencoder model used with the ImageNet dataset consists of 5 convolutional layers that down-sample the input image from 256×256×3 to 8×8×512 after the fifth layer. This is followed by 5 transposed convolutional layers that up-sample the image from 8×8×512 to 256×256×6. Residual connections were used between the convolutional layers and the transposed convolutional layers, as shown in Figure 7. Finally, image reconstruction is achieved by one convolutional layer at the end.
Figure 7. Our network structure: five convolutional layers down-sample the input image, followed by five transposed convolutional layers for image reconstruction. Residual skip connections are not shown in this figure.
id: 5256dafe42d81eb3dfa7f790adfc1bbb - page: 8
All the filters used in the model have the same kernel size of 3×3, except for the last image reconstruction layer, which has a kernel size of 2×2. The number of filters starts at 128 on the first and second convolutional layers and increases to 256, 512, and 512 on the third, fourth, and fifth down-sampling layers, respectively. Using an increasing number of filters enabled us to stack feature maps and avoid losing features between layers during down-sampling. In the decoder part of the model, the number of filters decreases gradually from 512 at the first layer to 256, 128, 128, and 3 on the remaining four layers. The last reconstruction layer consists of only 3 filters.
id: 785ea9e27090d8b32aea1770c6d1d30a - page: 8
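
Below is a minimal TensorFlow/Keras sketch of this encoder-decoder. The stride-2 down-sampling, concatenation-style skip connections, and padding choices are assumptions made to reproduce the tensor shapes quoted above (256×256×3 → 8×8×512 → 256×256×6 before the final layer); the paper's own Appendix A code may differ in these details.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_sisr_autoencoder(input_shape=(256, 256, 3)):
    x_in = layers.Input(shape=input_shape)
    x, skips = x_in, []
    # Encoder: five 3x3 convs, no pooling layers; stride 2 (assumed)
    # halves each spatial dim, so 256x256x3 -> 8x8x512 after layer 5
    for f in [128, 128, 256, 512, 512]:
        skips.append(x)
        x = layers.Conv2D(f, 3, strides=2, padding="same")(x)
        x = layers.LeakyReLU()(x)  # Leaky ReLU throughout, no BatchNorm
    # Decoder: five transposed convs, 8x8x512 -> 256x256x6
    for f, skip in zip([512, 256, 128, 128, 3], reversed(skips)):
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same")(x)
        x = layers.LeakyReLU()(x)
        # Residual skip connection from the matching encoder resolution;
        # concatenating the 3-channel input here yields the 256x256x6 tensor
        x = layers.Concatenate()([x, skip])
    # Final 2x2 reconstruction conv with 3 filters -> 256x256x3
    x_out = layers.Conv2D(3, 2, padding="same")(x)
    return Model(x_in, x_out)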
Removing the Batch Normalization (BN) modules from the model achieved better results with less memory usage. To help tackle the issue of vanishing gradients, the activation function applied through all the layers was the Leaky Rectified Linear Unit (Leaky ReLU). For the images from the RealSR V1 dataset, the number of convolutional filters was increased to 1024, 1024, 512, and 512, starting from the first layer. The models were implemented using the TensorFlow framework and run on Google Colab. The TensorFlow code is given in Appendix A.

5.2. Training

Learning the end-to-end mapping requires estimating the network parameters $W_1, W_2, \ldots, W_{11}$ and $B_1, B_2, \ldots, B_{11}$, where $W_i, B_i$ denote the parameters of the learned filters in the $i$th layer, so as to minimize the loss between the reconstructed image and the ground-truth image. We use the Mean Absolute Error (MAE) as the loss function:

$$\mathrm{Loss} = \frac{1}{n} \sum_{i=1}^{n} \lvert F(Y_i) - X_i \rvert$$
id: 337847030ca43a0d80b8a898ee5ae926 - page: 9
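
The MAE objective above maps directly onto Keras' built-in loss. A brief sketch follows; the optimizer, learning rate, and batch size are assumptions, since this excerpt does not specify them.

import tensorflow as tf

# Assumes build_sisr_autoencoder() from the previous sketch
model = build_sisr_autoencoder()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed hyperparameter
    loss=tf.keras.losses.MeanAbsoluteError(),  # Loss = (1/n) * sum |F(Y_i) - X_i|
)
# low_res / high_res: batches of (256, 256, 3) training pairs (Y_i, X_i)
# model.fit(low_res, high_res, epochs=100, batch_size=16)  # illustrative values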
In super-resolution algorithms, the input image passes through all layers until the output. This requires very long-term memory and also causes the issue of vanishing gradients. Residual learning with skip connections solves this issue: instead of learning the output directly from the input, the network learns the residual image between the output and input at different layers, as shown in Figure 8.
id: 31c5d6886897f45707d3c37f843bb658 - page: 9
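
To make the residual-learning idea concrete, here is a toy sketch (layer sizes and names are illustrative, not from the paper): the network predicts only the residual image, which is added back to the input, so the identity path carries the low-resolution content and gradients flow through it unimpeded.

import tensorflow as tf
from tensorflow.keras import layers, Model

def toy_residual_sr(input_shape=(256, 256, 3)):
    x_in = layers.Input(shape=input_shape)
    # Predict the residual (high-frequency detail) rather than the full image
    r = layers.Conv2D(64, 3, padding="same")(x_in)
    r = layers.LeakyReLU()(r)
    r = layers.Conv2D(3, 3, padding="same")(r)
    # Skip connection: output = input + learned residual
    x_out = layers.Add()([x_in, r])
    return Model(x_in, x_out)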
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "x1ipeuah_9BVyhH4XwoRw9MApOzYtJH3rWUIdxiEGf0", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "x1ipeuah_9BVyhH4XwoRw9MApOzYtJH3rWUIdxiEGf0", "level": 2}'