DarkVisionNet: Low-Light Imaging via RGB-NIR Fusion with Deep Inconsistency Prior
Abstract
RGB-NIR fusion is a promising method for low-light imaging. However, high-intensity noise in low-light images amplifies the effect of structure inconsistency between RGB-NIR images, which causes existing algorithms to fail. To handle this, we propose a new RGB-NIR fusion algorithm called Dark Vision Net (DVN) with two technical novelties: Deep Structure and Deep Inconsistency Prior (DIP). The Deep Structure extracts clear structure details in deep multiscale feature space rather than in the raw input space, which is more robust to noisy inputs. Based on the deep structures from both the RGB and NIR domains, we introduce the DIP to leverage the structure inconsistency to guide the fusion of RGB-NIR. Benefiting from this, the proposed DVN obtains high-quality low-light images without visual artifacts. We also propose a new dataset called the Dark Vision Dataset (DVD), consisting of aligned RGB-NIR image pairs, as the first public RGB-NIR fusion benchmark. Quantitative and qualitative results on the proposed benchmark show that DVN significantly outperforms the comparison algorithms in PSNR and SSIM, especially in extremely low-light conditions.
Introduction
High-quality low-light imaging is a challenging but significant task. On the one hand, it is the cornerstone of many important applications such as 24-hour surveillance and smartphone photography. On the other hand, the massive noise in images captured in extremely dark environments hinders algorithms from restoring low-light images satisfactorily. RGB-NIR fusion techniques provide a new perspective on this challenge: they enhance the noisy low-light color (RGB) image with the rich detail of the corresponding near-infrared (NIR) image (the high quality of NIR images in dark environments comes from an invisible near-infrared flash), which greatly improves the signal-to-noise ratio (SNR) of the restored RGB image. Under the constraints of cost, size and other factors, RGB-NIR fusion has become the most promising technique for restoring the vanished textural and structural details of noisy RGB images taken in extremely low-light environments, as shown in Figure 1(a).

Figure 1: (a) and (b) are fusion examples from DVD. Compared to CU-Net (Deng and Dragotti 2020), DKN (Kim, Ponce, and Ham 2021) and Scale Map (Yan et al. 2013), our method, Dark Vision Net (DVN), effectively handles the structure inconsistency between RGB-NIR images. Regions with inconsistent structures are framed in red.

However, existing RGB-NIR fusion algorithms suffer from structure inconsistency between RGB and NIR images, which results in unnatural appearance and loss of key information and limits the application of RGB-NIR fusion in low-light imaging. Figure 1 illustrates two typical examples of structure inconsistency between RGB and NIR images: Figure 1(b) shows the absence of NIR shadows in the RGB image (grass shadows appear only on the book edge in the NIR image) and the nonexistence of RGB color structure in the NIR image (the text 'complete' almost disappears on the book cover in the NIR image). Fusion algorithms need to tackle these structure inconsistencies to avoid visual artifacts in output images. There are currently two categories of RGB-NIR fusion methods, i.e., traditional methods and neural-network-based methods, and modeling the structure of the paired RGB-NIR images plays an important role in both. Traditional methods, such as Scale Map (Yan et al. 2013), tackle the structure inconsistency problem with manually designed functions. Some neural-network-based methods (Kim, Ponce, and Ham 2021; Li et al. 2019), on the other hand, use deep learning to learn the structure inconsistency automatically from large amounts of data. Both perform well under certain circumstances.

However, when confronted with extremely low-light environments, existing methods fail to maintain satisfactory performance, since the structure inconsistency is dramatically exacerbated by the massive noise in the RGB image. As shown in Figure 1(b), the dense noise in the RGB image makes it difficult for Scale Map to extract structural information, so it fails to distinguish which structures in the NIR image should be eliminated, resulting in unnatural ghosting on the book edge. Deformable Kernel Networks (DKN) (Kim, Ponce, and Ham 2021) falsely weakens gradients of the input RGB image that do not exist in the corresponding NIR image, which blurs the letters on the book cover. Even though these structure inconsistencies between corresponding RGB and NIR images can be spotted by human eyes effortlessly, they still confuse most existing fusion algorithms.

In this paper, we focus on improving RGB-NIR fusion for extremely low-SNR images by tackling the structure inconsistency problem. Based on the above analysis, we argue that the structure inconsistency under extremely low light can be handled well by introducing prior knowledge into deep features. To achieve this, we propose a deep RGB-NIR fusion algorithm called Dark Vision Net (DVN), which explicitly leverages the prior knowledge of structure inconsistency to guide the fusion of RGB-NIR deep features, as shown in Figure 2. With DVN, two technical novelties are introduced: (1) We find a new way, referred to as deep structures, to represent the clear structure information encoded in deep features, extracted by the proposed Deep Structure Extraction Module (DSEM). Even for images with low SNR, the deep structures can still be extracted effectively and represent reliable structural information, which is critical for the introduction of prior knowledge. (2) We propose the Deep Inconsistency Prior (DIP), which indicates the differences between RGB and NIR deep structures. Integrated into the fusion of RGB-NIR deep features, the DIP empowers the network to handle the structure inconsistency. Benefiting from this, the proposed DVN can obtain high-quality low-light images.

Figure 2: Overview of the proposed DVN. In the first stage, the network predicts deep structure maps from the multi-scale feature maps of the restoration network R using the proposed Deep Structure Extraction Module (DSEM), for the noisy RGB and the NIR respectively. In the second stage, taking advantage of the predicted deep structures, the DIP is calculated by the inconsistency function F. In the third stage, the DIP-weighted NIR structures are fused with the RGB features to obtain the final fusion result without obvious structure inconsistency.

In addition, to the best of our knowledge, there is no available benchmark dedicated to the RGB-NIR fusion task so far. The lack of benchmarks for evaluating and training fusion algorithms greatly limits the development of this field. To fill this gap, we propose a dataset named Dark Vision Dataset (DVD) as the first RGB-NIR fusion benchmark. Based on this dataset, we give qualitative and quantitative evaluations to prove the effectiveness of our method. In summary, the main contributions of this paper are as follows:
• We propose a novel RGB-NIR fusion algorithm called Dark Vision Net (DVN) with Deep Inconsistency Prior (DIP). The DIP explicitly integrates the prior of structure inconsistency into the deep features, avoiding over-reliance on NIR features in the feature fusion. Benefiting from this, DVN can obtain high-quality low-light images without visual artifacts.
• We propose a new dataset, the Dark Vision Dataset (DVD), as the first public dataset for training and evaluating RGB-NIR fusion algorithms.
• Quantitative and qualitative results indicate that DVN is significantly better than other compared methods.
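To make the three-stage flow of Figure 2 concrete, a minimal PyTorch-style sketch is given below. The module shapes, names (DSEM head, dvn_fuse) and the additive fusion step are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Minimal sketch of DVN's three-stage flow (Figure 2); shapes, names and the
# additive fusion are assumptions for exposition, not the released code.
import torch
import torch.nn as nn

class DSEM(nn.Module):
    """Assumed form of the Deep Structure Extraction Module: predicts a
    soft structure map in [0, 1] from restoration features."""
    def __init__(self, channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feat):
        return self.head(feat)

def deep_inconsistency_prior(s_rgb, s_nir, lam=0.5):
    # 1 where the two deep structures agree, 0 where they disagree,
    # lam where neither branch contains structure (see Eq. (1) later).
    return lam * (1 - s_rgb) * (1 - s_nir) + s_rgb * s_nir

def dvn_fuse(feat_rgb, feat_nir, dsem_rgb, dsem_nir, lam=0.5):
    s_rgb = dsem_rgb(feat_rgb)                         # stage 1: deep structures
    s_nir = dsem_nir(feat_nir)
    dip = deep_inconsistency_prior(s_rgb, s_nir, lam)  # stage 2: DIP
    return feat_rgb + dip * feat_nir                   # stage 3: fuse DIP-weighted NIR

feat_rgb, feat_nir = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
fused = dvn_fuse(feat_rgb, feat_nir, DSEM(64), DSEM(64))
```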
Related Work

Image Denoising. In recent years, denoising algorithms based on deep neural networks have continually emerged and overcome the drawbacks of analytical methods (Lucas et al. 2018). The image noise model has been gradually improved as well (Wei et al. 2020; Wang et al. 2020). (Mao, Shen, and Yang 2016) applied an encoder-decoder network to suppress noise and recover high-quality images. (Zhang, Zuo, and Zhang 2018) presented a denoising network for blind denoising. (Guo et al. 2019; Cao et al. 2021) attempted to remove the noise from real noisy images. There are also deep denoising algorithms trained without clean-data supervision (Lehtinen et al. 2018; Krull, Buchholz, and Jug 2019; Huang et al. 2021). However, in extremely dark environments, fine texture details damaged by high-intensity noise are very difficult to restore. In that case, denoising algorithms tend to generate over-smoothed outputs, which are unsatisfactory. Low-light image enhancement algorithms (Chen et al. 2018; Lamba and Mitra 2021; Gu et al. 2019), meanwhile, try to directly restore high-quality images in terms of brightness, color, etc. However, these algorithms cannot deal with such high-intensity noise either.

RGB-NIR Fusion. To obtain high-quality low-light images, researchers (Krishnan and Fergus 2009) have tried to fuse NIR images with RGB images. Traditional RGB-NIR fusion algorithms include weighted least squares (Zhuo et al. 2010), Guided Image Filtering (GIF) (He, Sun, and Tang 2012), gradient preserving (Connah, Drew, and Finlayson 2015), and multi-scale decomposition (Son and Zhang 2016). Recently, (Yan et al. 2013) pointed out the gradient inconsistency between RGB-NIR image pairs and proposed Scale Map to address it. Among the neural-network-based methods, Joint Image Filtering with Deep Convolutional Networks (DJFR) (Li et al. 2019) constructs a unified two-stream network model for image fusion, CU-Net (Deng and Dragotti 2020) combines sparse coding with Convolutional Neural Networks (CNNs), and DKN (Kim, Ponce, and Ham 2021) explicitly learns sparse and spatially-variant kernels for image filtering. (Lv et al. 2020) innovatively constructs a network that directly decouples RGB and NIR signals for 24-hour imaging. In general, the above RGB-NIR fusion algorithms have two main problems. One is an insufficient ability to deal with RGB-NIR texture inconsistency, leading to heavy artifacts in the final fusion images. The other is an inadequate noise suppression capability, especially when dealing with high-intensity noise in extremely low-light environments. To handle these problems, this paper proposes DarkVisionNet with a novel DIP mechanism to effectively deal with the inconsistency between RGB-NIR images.

Datasets. Only a small amount of data can be used for RGB-NIR fusion studies because of the difficulty of obtaining aligned RGB-NIR image pairs. Some studies (Foster et al. 2006) focus on obtaining hyperspectral datasets, from which strictly aligned RGB-NIR image pairs can be obtained by integrating the hyperspectral images over the corresponding bands. (Krishnan and Fergus 2009) present a prototype camera to collect RGB-NIR image pairs. However, these datasets are too small to comprehensively measure the performance of fusion algorithms. More importantly, due to the lack of data from actual scenarios, they cannot encourage follow-up researchers to focus on the valuable problems that RGB-NIR fusion will encounter in applications. To fill this gap, we collect a dataset named Dark Vision Dataset (DVD) as the first publicly available RGB-NIR fusion benchmark, which contains noise-free reference image pairs and real noisy low-light image pairs. With the noise-free reference image pairs, the proposed DVD can be used to quantitatively and qualitatively evaluate fusion algorithms. In addition, the real noisy low-light image pairs can be used to qualitatively evaluate the performance of fusion algorithms in real scenes.

Figure 3: Through applying F on edge maps of clean RGB and NIR images, the calculated inconsistency map clearly shows the structure inconsistency between RGB and NIR.

Approach

Prior Knowledge of Structure Inconsistency

As previously described, the network needs to be aware of the inconsistent regions in the two inputs. We design an intuitive function to measure the inconsistency from image features. First, binary edge maps are extracted from each feature channel. Then the inconsistency is defined as

    F(edge_C, edge_N) = λ(1 − edge_C)(1 − edge_N) + edge_C · edge_N,    (1)

where C ∈ R^{H×W} and N ∈ R^{H×W} denote an R/G/B channel of the clean RGB image and the NIR image, and edge_C and edge_N respectively represent the binarized edge maps of C and N, each obtained after Sobel filtering by binarizing with its mean value as the threshold.

As shown in Figure 3, F(·, ·) equals 0 in the regions where edge_C and edge_N show severe inconsistency. On the contrary, F(·, ·) equals 1 in the regions where the structures of RGB and NIR are consistent. In the remaining regions, F(·, ·) is set to a hyperparameter λ (0 < λ < 1), indicating that there is no significant inconsistency. Utilizing the output inconsistency map of F, the inconsistent NIR structures can be easily suppressed by direct multiplication.
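As a concrete illustration of Eq. (1), the following NumPy sketch computes the inconsistency map for one clean channel pair, assuming the Sobel filtering and mean-value thresholding described above; the function names are ours.

```python
# Illustrative sketch of the inconsistency function F in Eq. (1), assuming
# Sobel gradients binarized with their mean value as the threshold.
import numpy as np
from scipy import ndimage

def binary_edges(img):
    """Sobel gradient magnitude, binarized with its mean as the threshold."""
    mag = np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1))
    return (mag > mag.mean()).astype(np.float32)

def inconsistency_map(c, n, lam=0.5):
    """F: lam on flat regions, 1 where edges agree, 0 where they disagree."""
    e_c, e_n = binary_edges(c), binary_edges(n)
    return lam * (1.0 - e_c) * (1.0 - e_n) + e_c * e_n

# Inconsistent NIR structures are then suppressed by direct multiplication,
# e.g. nir_consistent = inconsistency_map(c, n) * binary_edges(n).
```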
Extraction of Deep Structures

Even though the function F subtly describes the inconsistency between RGB and NIR images, it cannot be applied directly in extremely low-light cases. As shown in Figure 4, the calculated inconsistency map contains nothing but non-informative noise when facing an extremely noisy RGB image. To avoid the influence of noise on the structure inconsistency extraction, we propose the Deep Structure Extraction Module (DSEM) and the Deep Inconsistency Prior (DIP), with which we compute the structure inconsistency in feature space. Considering that the processing flows for RGB and NIR are basically the same, we give a unified description here to keep the symbols concise.

For supervision, a ground-truth deep structure map is obtained for each channel of the decoded features by binarizing its Sobel gradient magnitude, with the global mean as the threshold:

    struct^gt_{i,c,j} = 1 if ∇dec_{i,c,j} > m_{∇dec_{i,c}}, and 0 otherwise,

where struct^gt_{i,c,j} is the j-th pixel of struct^gt_{i,c}, ∇ represents the Sobel operator, ∇dec_{i,c,j} is the j-th pixel of ∇dec_{i,c}, and m_{∇dec_{i,c}} is the global average pooling result of ∇dec_{i,c}. The supervision signal obtained by this design effectively trains the DSEM, and clear deep structure maps are predicted, as shown in Figure 4.
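A hedged PyTorch sketch of this per-channel supervision target, with the global average pooling written as a spatial mean, might look as follows (tensor names are illustrative):

```python
# Sketch of the ground-truth deep structure maps: each feature channel is
# Sobel-filtered and binarized with its global mean (global average pooling)
# as the threshold. Tensor names are illustrative.
import torch
import torch.nn.functional as Fn

SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()

def deep_structure_gt(dec):                 # dec: (B, C, H, W) decoded features
    b, c, h, w = dec.shape
    kernel = torch.stack([SOBEL_X, SOBEL_Y]).unsqueeze(1)   # (2, 1, 3, 3)
    grad = Fn.conv2d(dec.reshape(b * c, 1, h, w), kernel, padding=1)
    mag = grad.pow(2).sum(dim=1, keepdim=True).sqrt()       # |grad dec| per channel
    thresh = mag.mean(dim=(2, 3), keepdim=True)             # m via global avg pool
    return (mag > thresh).float().reshape(b, c, h, w)       # struct_gt in {0, 1}
```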
Figure 6: Fusion examples from DVD. The proposed DVN shows great superiority over other algorithms. Images are brightened for visual convenience. See the supplementary material for more examples.

Table 1: Quantitative comparison on the DVD benchmark under different noise intensities (σ = 2, 4, 6, 8).

            GIF     DJFR    DKN     Scale Map  CUNet   NBNet   MPRNet  DVN (Ours)
σ=2  PSNR   22.32   26.28   27.22   21.98      28.62   31.38   31.79   31.50
     SSIM   0.6410  0.8263  0.8902  0.6616     0.9138  0.9477  0.9504  0.9551
σ=4  PSNR   19.15   23.91   24.34   21.02      26.81   29.14   29.37   29.62
     SSIM   0.5033  0.7464  0.8427  0.6225     0.8832  0.9259  0.9276  0.9400
σ=6  PSNR   17.30   22.40   22.78   20.02      25.43   27.27   27.68   28.26
     SSIM   0.4240  0.6802  0.8067  0.5959     0.8510  0.9060  0.9083  0.9273
σ=8  PSNR   15.98   20.72   22.50   19.07      23.75   24.81   26.20   26.98
     SSIM   0.3701  0.6177  0.7799  0.5742     0.8154  0.8822  0.8908  0.9155

Table 2: Performance comparison (PSNR). The conclusions are the same if SSIM is applied as the metric.

PSNR comparison on the public IVRG dataset (σ = 50, input PSNR = 13.44):
DJFR 23.35 | CUNet 24.96 | Scale Map 25.59 | DVN (Ours) 30.43

PSNR comparison with other methods on DVD (σ = 4):
SID (Chen et al. 2018) 25.26 | SGN (Gu et al. 2019) 28.40 | SSN (Dong et al. 2018) 13.72 | DVN (Ours) 29.62

The second step is to add noise to the pseudo-dark raw images, including Gaussian noise with variance equal to σ and Poisson noise with a level proportional to σ.
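A hedged NumPy sketch of this noise synthesis step follows; the proportionality constant for the Poisson level and the [0, 1] value range are our assumptions, as they are not specified here:

```python
# Hedged sketch of the noise synthesis: Gaussian noise with variance sigma
# (assumed on an 8-bit scale) plus signal-dependent Poisson noise whose level
# is proportional to sigma. The scaling constant k is our assumption.
import numpy as np

def add_noise(raw, sigma, k=0.01, seed=0):
    """raw: pseudo-dark raw image in [0, 1] (assumed); sigma: noise level."""
    rng = np.random.default_rng(seed)
    scale = k * sigma                                    # Poisson level ~ sigma
    shot = rng.poisson(raw / scale) * scale              # signal-dependent shot noise
    read = rng.normal(0.0, np.sqrt(sigma) / 255.0, raw.shape)  # Gaussian, var = sigma
    return np.clip(shot + read, 0.0, 1.0)

noisy = add_noise(np.full((8, 8), 0.05), sigma=4)
```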
Performance Comparison

Results on DVD Benchmark. We evaluate and compare DVN with representative methods from related fields, including the single-frame denoising algorithms NBNet (Cheng et al. 2021) and MPRNet (Zamir et al. 2021), the joint image filtering algorithms GIF (He, Sun, and Tang 2012), DJFR (Li et al. 2019), DKN (Kim, Ponce, and Ham 2021) and CUNet (Deng and Dragotti 2020), as well as Scale Map (Yan et al. 2013), which is specially designed for RGB-NIR fusion. All methods are trained from scratch or finetuned on DVD. We use PSNR and SSIM (Wang et al. 2004) for quantitative measurement. The qualitative comparison is shown in Figure 6, and the quantitative comparison under different noise intensity settings (σ = 2, 4, 6, 8; the larger the σ, the heavier the noise) on the DVD benchmark is shown in Table 1.
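For reference, both metrics are available off the shelf, e.g. in scikit-image (0.19+ for the channel_axis argument); the snippet below is illustrative, not the paper's evaluation code:

```python
# Illustrative PSNR/SSIM computation with scikit-image (not the paper's
# evaluation code); images are float arrays in [0, 1].
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, reference):
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0,
                                 channel_axis=-1)  # last axis holds RGB channels
    return psnr, ssim
```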
The qualitative comparison in Figure 6 clearly illustrates the superiority of the proposed DVN in noise removal, detail restoration and visual artifact suppression. In contrast, the image denoising algorithms (i.e., NBNet and MPRNet) cannot restore texture details when the noise intensity becomes high, and their outputs turn into smeared patches even though the noise is effectively suppressed. GIF and DJFR output images with heavy noise, as the 3rd and 4th columns in Figure 6 show, which greatly affects the fusion quality. The fusion results of DKN and CUNet (5th and 6th columns in Figure 6) under mild noise (e.g., σ = 2) are acceptable. But under heavy noise, obvious color deviation appears in the DKN output, and neither of them can deal with structure inconsistency (see the 4th row in Figure 6), resulting in severe artifacts in the fusion images. Scale Map outputs images with rich details. However, it cannot reduce the noise in areas where texture is lacking in the NIR image. In addition, it is hard to achieve a balance between noise suppression and texture migration when applying Scale Map.

Generalization on Real Noisy RGB-NIR. To evaluate the performance of the algorithms on real low-light images, we conduct a qualitative experiment on several pairs of RGB-NIR images captured in real low-light environments. As shown in Figure 7, the outputs of DVN have obviously lower noise and richer details, and are visually more natural when handling RGB-NIR pairs with real noise, even though the network is trained on a synthetic noisy dataset.

Comparison on Public Dataset. So far, there is no high-quality public RGB-NIR dataset like DVD. For example, the RGB-NIR pairs in IVRG (Brown and Süsstrunk 2011) are not well aligned. Even so, we retrained DVN and the other methods on IVRG and give a quantitative comparison in Table 2. It is clear that DVN still performs well.

Comparison with Low-Light Enhancement Methods. We also compare our method with low-light enhancement methods. We retrained SID (Chen et al. 2018) and SGN (Gu et al. 2019); the comparison can be seen in Table 2. It is clear that our proposed DVN still shows great superiority.

Effectiveness of DIP

In this section, we verify that the proposed DIP is effective in handling the aforementioned structure inconsistency. For comparison, we retrain a baseline that is the same as the proposed DVN but without the DIP module. As Figure 8(a) shows, the NIR shadow of the grass still remains in the fusion result without DIP, but not in the fusion result with DIP. This directly proves that DIP can handle the structure inconsistency. Figure 8(b) shows that DIP can also deal, to a certain extent, with serious structure inconsistency caused by misalignment between RGB-NIR images (this example pair cannot be aligned even after image registration). This has practical value because the problem of misalignment frequently occurs in applications. Taking into account the nature of DIP, the remaining artifacts are in line with expectations, since they are concentrated near the pixels with gradients in the RGB image.

In addition, Figure 8 also visualizes the deep structures of RGB, NIR and consistent NIR (DIP-weighted), as well as the DIP maps. It is obvious that even with noisy input, the RGB deep structure still contains clear structures. The visual comparison between the NIR deep structure and the consistent NIR deep structure proves that the introduction of DIP can handle structure inconsistency in deep feature space.

Figure 7: Fusion results on RGB-NIR image pairs with real noise. DVN obviously obtains better results than other algorithms.

Figure 8: Illustration of the effectiveness of DIP. (a) shows a typical case of structure inconsistency caused by NIR shadows and (b) shows a case of misaligned RGB-NIR images. Fusion results and visualizations of deep structures verify the effectiveness of the DIP. Both examples are gathered from real noisy image pairs.
Ablation Study

We evaluate the effectiveness of each component of the proposed algorithm on the DVD benchmark quantitatively in this section. PSNR and SSIM are reported in Table 3. The baseline network directly fuses NIR features with RGB features (row 1 in Table 3).

Intermediate supervision L^Ĉ_rec and L^N_rec effectively improves the performance, as Table 3 (rows 1 and 2) shows. This indicates the necessity of enhancing the noise suppression capability of the network for clean structure extraction.

Applying DSEM to learn deep structures without DIP improves performance as well, as Table 3 (rows 1 and 3) shows. However, since the inconsistent structures are not removed, the benefits are limited, even when we use intermediate supervision and DSEM simultaneously, as row 4 shows.

As Table 3 (row 5) shows, after introducing DIP to deal with the structure inconsistency, the network performance is further improved by a large margin. This demonstrates the effectiveness of our proposed algorithm and the necessity of focusing on the structure inconsistency problem in RGB-NIR fusion.

Table 3: Ablation experiments conducted on DVD to study the effectiveness of each component. σ is set to 4.

row  L^Ĉ_rec + L^N_rec  DSEM  DIP  PSNR   SSIM
1.   –                  –     –    28.87  0.9356
2.   X                  –     –    29.30  0.9375
3.   –                  X     –    29.06  0.9376
4.   X                  X     –    29.36  0.9358
5.   X                  X     X    29.62  0.9400

Conclusion

In this paper, we propose a novel RGB-NIR fusion algorithm called Dark Vision Net (DVN). DVN introduces the Deep Inconsistency Prior (DIP) to integrate structure inconsistency into the deep convolutional features, so that DVN can obtain high-quality fusion results without visual artifacts. In addition, we also propose the first available benchmark, called the Dark Vision Dataset (DVD), for training and evaluating RGB-NIR fusion algorithms. Quantitative and qualitative results prove that DVN is significantly better than other algorithms.
References

Brown, M.; and Süsstrunk, S. 2011. Multi-spectral SIFT for scene category recognition. In CVPR 2011, 177–184. IEEE.

Cao, Y.; Wu, X.; Qi, S.; Liu, X.; Wu, Z.; and Zuo, W. 2021. Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser. arXiv preprint arXiv:2103.10234.

Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; and Barlaud, M. 1994. Two deterministic half-quadratic regularization algorithms for computed imaging. In Proceedings of 1st International Conference on Image Processing, volume 2, 168–172. IEEE.

Chen, C.; Chen, Q.; Xu, J.; and Koltun, V. 2018. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3291–3300.

Cheng, S.; Wang, Y.; Huang, H.; Liu, D.; Fan, H.; and Liu, S. 2021. NBNet: Noise Basis Learning for Image Denoising with Subspace Projection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4896–4906.

Connah, D.; Drew, M. S.; and Finlayson, G. D. 2015. Spectral edge: gradient-preserving spectral mapping for image fusion. JOSA A, 32(12): 2384–2396.

Deng, R.; Shen, C.; Liu, S.; Wang, H.; and Liu, X. 2018. Learning to predict crisp boundaries. In Proceedings of the European Conference on Computer Vision (ECCV), 562–578.

Deng, X.; and Dragotti, P. L. 2020. Deep convolutional neural network for multi-modal image restoration and fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence.

DeTone, D.; Malisiewicz, T.; and Rabinovich, A. 2017. SuperPoint: Self-Supervised Interest Point Detection and Description. CoRR, abs/1712.07629.

Foster, D. H.; Amano, K.; Nascimento, S. M.; and Foster, M. J. 2006. Frequency of metamerism in natural scenes. JOSA A, 23(10): 2359–2372.

Gu, S.; Li, Y.; Gool, L. V.; and Timofte, R. 2019. Self-guided network for fast image denoising. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2511–2520.

Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; and Zhang, L. 2019. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1712–1722.

He, K.; Sun, J.; and Tang, X. 2012. Guided image filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6): 1397–1409.

Hinton, G. E.; and Salakhutdinov, R. R. 2006. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504–507.

Huang, T.; Li, S.; Jia, X.; Lu, H.; and Liu, J. 2021. Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14781–14790.

Jung, C.; Zhou, K.; and Feng, J. 2020. FusionNet: Multispectral fusion of RGB and NIR images using two stage convolutional neural networks. IEEE Access, 8: 23912–23919.

Karaimer, H. C.; and Brown, M. S. 2016. A software platform for manipulating the camera imaging pipeline. In European Conference on Computer Vision, 429–444. Springer.

Kim, B.; Ponce, J.; and Ham, B. 2021. Deformable kernel networks for joint image filtering. International Journal of Computer Vision, 129(2): 579–600.

Krishnan, D.; and Fergus, R. 2009. Dark flash photography. ACM Trans. Graph., 28(3): 96.

Krull, A.; Buchholz, T.-O.; and Jug, F. 2019. Noise2Void - Learning denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2129–2137.

Lamba, M.; and Mitra, K. 2021. Restoring Extremely Dark Images in Real Time. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3487–3497.

Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; and Aila, T. 2018. Noise2Noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189.

Li, Y.; Huang, J.-B.; Ahuja, N.; and Yang, M.-H. 2019. Joint image filtering with deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8): 1909–1923.

Lucas, A.; Iliadis, M.; Molina, R.; and Katsaggelos, A. K. 2018. Using deep neural networks for inverse problems in imaging: beyond analytical methods. IEEE Signal Processing Magazine, 35(1): 20–36.

Lv, F.; Zheng, Y.; Li, Y.; and Lu, F. 2020. An integrated enhancement solution for 24-hour colorful imaging. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 11725–11732.

Mao, X.; Shen, C.; and Yang, Y.-B. 2016. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Advances in Neural Information Processing Systems, 29: 2802–2810.

Son, C.-H.; and Zhang, X.-P. 2016. Layer-based approach for image pair fusion. IEEE Transactions on Image Processing, 25(6): 2866–2881.

Ulyanov, D.; Vedaldi, A.; and Lempitsky, V. 2018. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9446–9454.

Wang, Y.; Huang, H.; Xu, Q.; Liu, J.; Liu, Y.; and Wang, J. 2020. Practical deep raw image denoising on mobile devices. In European Conference on Computer Vision, 1–16. Springer.

Wang, Z.; Bovik, A. C.; Sheikh, H. R.; and Simoncelli, E. P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4): 600–612.

Wei, K.; Fu, Y.; Yang, J.; and Huang, H. 2020. A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2758–2767.

Yan, Q.; Shen, X.; Xu, L.; Zhuo, S.; Zhang, X.; Shen, L.; and Jia, J. 2013. Cross-field joint image restoration via scale map. In Proceedings of the IEEE International Conference on Computer Vision, 1537–1544.

Zamir, S. W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F. S.; Yang, M.-H.; and Shao, L. 2021. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14821–14831.

Zhang, K.; Zuo, W.; and Zhang, L. 2018. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9): 4608–4622.

Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; and Fu, Y. 2018. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), 286–301.

Zhuo, S.; Zhang, X.; Miao, X.; and Sim, T. 2010. Enhancing low light images using near infrared flash images. In 2010 IEEE International Conference on Image Processing, 2537–2540. IEEE.