Domain Randomization-Enhanced Depth Simulation and Restoration For Perceiving and Grasping Specular and Transparent Objects
Qiyu Dai1,*, Jiyao Zhang2,*, Qiwei Li1, Tianhao Wu1, Hao Dong1, Ziyuan Liu3, Ping Tan3,4, and He Wang1,†

1 Peking University  2 Xi'an Jiaotong University  3 Alibaba XR Lab  4 Simon Fraser University

{qiyudai,lqw,hao.dong,hewang}@pku.edu.cn, zhangjiyao@stu.xjtu.edu.cn, thwu@stu.pku.edu.cn, ziyuan-liu@outlook.com, pingtan@sfu.ca

*: equal contributions, †: corresponding author
1 Introduction
With emerging depth-sensing technologies, depth sensors and 3D point cloud data have become increasingly accessible, enabling many applications in VR/AR and robotics. Compared with RGB images, depth images and point clouds capture the true 3D geometry of the underlying scene, so depth cameras
have been widely deployed in many robotic systems, e.g., for object grasping [4,14] and manipulation [37,22,21], that require accurate scene geometry. However, an apparent disadvantage of accessible depth cameras is that they carry non-negligible sensor noise, more significant than the typical noise in color images captured by commercial RGB cameras. An even more drastic failure of depth sensing occurs on objects that are transparent or have highly specular surfaces, where the captured depths are highly erroneous or even missing around the specular or transparent regions. Note that specular and transparent objects are ubiquitous in daily life: most metallic surfaces are specular, and many man-made objects are made of glass or plastic, which can be transparent. The prevalence of specular and transparent objects in real-world scenes thus poses severe challenges to depth-based vision systems and limits their applications to well-controlled scenes and objects made of diffuse materials.
In this work, we devise a two-stream Swin Transformer [18] based RGB-D fusion network, SwinDRNet, for learning to perform depth restoration. However, there is a lack of real data composed of paired sensor depths and perfect depths for training such a network. Previous works on depth completion for transparent objects, such as ClearGrasp [30] and LIDF [43], leverage synthetic perfect depth images for network training. They simply remove the transparent regions from the perfect depth, and their methods then learn to complete the missing depths in a feed-forward way or in combination with depth optimization. We argue that both methods only access incomplete depth images during training and never see depths with realistic sensor noise, leading to suboptimal performance when directly deployed on real sensor depths. Moreover, these two works only consider a small number of similar objects, with little shape variation and all of them transparent, and hence fail to demonstrate their usefulness in scenes with completely novel object instances. Given that material specularity or transparency forms a continuous spectrum, it is further questionable whether their methods can handle objects of intermediate transparency or specularity.
To mitigate the problems in existing works, we propose to synthesize depths with realistic sensor noise patterns by simulating an active stereo depth camera resembling the RealSense D415. Our simulator is built on Blender and leverages ray tracing to mimic the IR stereo patterns and compute depths from them. To facilitate generalization, we further adopt domain randomization techniques that randomize the object textures, object materials (from specular and transparent to diffuse), object layouts, floor textures, illuminations, and camera poses. This domain randomization-enhanced depth simulation method, DREDS for short, yields 130K photorealistic RGB images and their corresponding simulated depths. We further curate a real-world dataset, the STD dataset, that contains 50 objects with specular, transparent, and diffuse materials. Our extensive experiments demonstrate that SwinDRNet trained on the DREDS dataset can handle depth restoration on object instances from both seen and unseen object categories in the STD dataset, and can even seamlessly generalize to the ClearGrasp dataset, beating the previous state-of-the-art method, LIDF [43], trained on the ClearGrasp dataset itself. Additionally, SwinDRNet allows real-time depth restoration (30 FPS). Our further experiments on category-level pose estimation and grasping of specular and transparent objects demonstrate that our depth restoration generalizes well and substantially benefits these downstream tasks.
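To make the randomization concrete, the sketch below samples one scene configuration before rendering. It is only a minimal illustration: the parameter ranges, the material-class list, and the SceneConfig container are assumptions, and the real DREDS pipeline drives Blender with ray tracing rather than a standalone script like this.

```python
import random
import dataclasses

# Minimal sketch of per-scene domain randomization (illustrative ranges only).
MATERIAL_CLASSES = ["specular", "transparent", "diffuse"]

@dataclasses.dataclass
class SceneConfig:
    materials: list            # one material class per object
    roughness: list            # per-object surface roughness
    floor_texture: str
    light_intensity: float
    camera_azimuth_deg: float
    camera_elevation_deg: float

def sample_scene(num_objects: int, texture_pool: list) -> SceneConfig:
    """Randomize object materials, floor texture, illumination, and camera pose."""
    materials = [random.choice(MATERIAL_CLASSES) for _ in range(num_objects)]
    # Lower roughness -> closer to a mirror-like (specular) surface.
    roughness = [random.uniform(0.0, 0.3) if m == "specular" else random.uniform(0.2, 0.8)
                 for m in materials]
    return SceneConfig(
        materials=materials,
        roughness=roughness,
        floor_texture=random.choice(texture_pool),
        light_intensity=random.uniform(100.0, 1000.0),   # arbitrary units
        camera_azimuth_deg=random.uniform(0.0, 360.0),
        camera_elevation_deg=random.uniform(20.0, 70.0),
    )

if __name__ == "__main__":
    cfg = sample_scene(num_objects=5, texture_pool=["wood", "marble", "fabric"])
    print(cfg)
```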
2 Related Work
into this category and outperforms those methods, ensuring fast inference time
and better geometries to improve the performance of downstream tasks.
and error, especially for the hand-scale objects with specular and transparent
materials. The proposed DREDS dataset bridges the sim-to-real domain gap and helps RGB-D algorithms generalize to unseen objects. A comparison of the DREDS dataset with existing specular and transparent depth restoration datasets is summarized in Table 1.
4 STD Dataset
4.1 Real-world Dataset: STD
To further examine the proposed method in real scenes, we curate a real-world dataset composed of Specular, Transparent, and Diffuse objects, which we call the STD dataset.
(Figure: example STD data, showing paired RGB, depth, ground-truth depth, and instance masks.)
5 Method
In this section, we introduce our network for depth restoration in Section 5.1 and then describe the methods used for the downstream tasks, i.e., category-level 6D object pose estimation and robotic grasping, in Section 5.2.
(Figure: SwinDRNet architecture. The input RGB I_c and input depth I_d are partitioned into patch tokens and passed through two SwinT backbones, SwinT_color and SwinT_depth, whose four stages produce multi-scale features {F_c^i} and {F_d^i}. A cross-attention RGB-D fusion module merges the two streams into fused features {H^i}. A depth decoder head D_depth predicts the initial depth Ĩ_d, while a confidence decoder head D_conf predicts a confidence map C of the initial depth, which blends Ĩ_d with the input depth into the restored depth Î_d.)
The confidence decoder head predicts a confidence map to select accurate depth predictions at the noisy and invalid areas of the input depth, while keeping the originally correct areas as much as possible.
SwinT-based Feature Extraction. To accurately restore the noisy and incomplete depth, we need to leverage visual cues from the RGB image that help depth completion, as well as geometric cues from the depth that may save effort in areas where the input depth is already correct. To extract rich features, we propose to utilize SwinT [18] as our backbone, since it is a powerful and efficient network that produces hierarchical feature representations at different resolutions and has linear computational complexity with respect to the input image size. Given that our inputs contain two modalities, RGB and depth, we deploy two separate SwinT networks, SwinT_color and SwinT_depth, to extract features from I_c and I_d, respectively. For each of them, we basically follow the design of SwinT. Taking SwinT_color as an example: we first divide the input RGB image I_c ∈ R^{H×W×3} into non-overlapping patches, also called tokens, T_c ∈ R^{(H/4)×(W/4)×48}; we then pass T_c through the four stages of SwinT to generate the multi-scale features {F_c^i}, which are especially useful for dense depth prediction thanks to the hierarchical structure. The encoder process can thus be summarized as {F_c^i} = SwinT_color(T_c) and, analogously, {F_d^i} = SwinT_depth(T_d).
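To make the token shape concrete, here is a minimal patch-partition sketch in PyTorch; the 224×224 input size and the function name are illustrative assumptions, not part of the paper.

```python
import torch

def patch_partition(image: torch.Tensor, patch: int = 4) -> torch.Tensor:
    """Split a (B, 3, H, W) image into non-overlapping patch tokens of shape
    (B, H/patch, W/patch, patch*patch*3), matching T_c in the text (48 = 4*4*3)."""
    b, c, h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    x = image.reshape(b, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 2, 4, 3, 5, 1)                    # (B, H/4, W/4, 4, 4, 3)
    return x.reshape(b, h // patch, w // patch, patch * patch * c)

tokens = patch_partition(torch.randn(1, 3, 224, 224))  # -> (1, 56, 56, 48)
```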
RGB-D Feature Fusion. At each scale i, a cross-attention transformer T_CA fuses the color and depth features. Given a pair of feature maps F_A and F_B, it first computes the queries, keys, and values,

Q_A = \mathcal{F}_A \cdot W_q, \quad K_B = \mathcal{F}_B \cdot W_k, \quad V_B = \mathcal{F}_B \cdot W_v,   (3)
where the W's are learnable projection matrices, and then computes the cross-attention feature H_{F_A→F_B} from F_A to F_B:
\mathcal{H}_{\mathcal{F}_A \rightarrow \mathcal{F}_B} = T_{CA}(\mathcal{F}_A, \mathcal{F}_B) = \text{softmax}\left(\frac{Q_A \cdot K_B^T}{\sqrt{d_K}}\right) \cdot V_B.   (4)
The fused feature at scale i concatenates the bidirectional cross-attention features with the original color and depth features:

\mathcal{H}^i = \mathcal{H}_{\mathcal{F}_c^i \rightarrow \mathcal{F}_d^i} \oplus \mathcal{H}_{\mathcal{F}_d^i \rightarrow \mathcal{F}_c^i} \oplus \mathcal{F}_c^i \oplus \mathcal{F}_d^i,   (5)

where ⊕ represents concatenation along the channel axis.
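As a concrete reading of Eqs. (3)–(5), the following PyTorch sketch fuses one scale of color and depth tokens with bidirectional cross-attention. It is an illustrative simplification, not the released SwinDRNet code: the single attention head, the shared projections for both directions, and the tensor sizes are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Single-head cross-attention fusion of one scale of color/depth features (sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # W_q
        self.w_k = nn.Linear(dim, dim, bias=False)  # W_k
        self.w_v = nn.Linear(dim, dim, bias=False)  # W_v

    def cross_attend(self, feat_a, feat_b):
        # Eq. (3): queries from A, keys/values from B; Eq. (4): scaled dot-product attention.
        q, k, v = self.w_q(feat_a), self.w_k(feat_b), self.w_v(feat_b)
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v

    def forward(self, f_color, f_depth):
        # f_color, f_depth: (B, N, C) token features of the same scale.
        h_c2d = self.cross_attend(f_color, f_depth)   # H_{F_c -> F_d}
        h_d2c = self.cross_attend(f_depth, f_color)   # H_{F_d -> F_c}
        # Eq. (5): concatenate along the channel axis.
        return torch.cat([h_c2d, h_d2c, f_color, f_depth], dim=-1)

# Usage with assumed first-stage sizes: 56x56 tokens with 96 channels.
fusion = CrossAttentionFusion(dim=96)
f_c = torch.randn(2, 56 * 56, 96)
f_d = torch.randn(2, 56 * 56, 96)
fused = fusion(f_c, f_d)                              # shape: (2, 3136, 384)
```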
Final Depth Prediction via Confidence Interpolation. The credible areas of the input depth map (e.g., the edges of specular or transparent objects in contact with the background or with diffuse objects) play a critical role in providing information about the spatial arrangement. Inspired by previous works [35,13], we make use of a confidence map between the raw and predicted depth maps. However, unlike [35,13], which predict confidence maps across modalities, we focus on preserving the correct original values so as to generate more realistic depth maps with less distortion. The final depth map can be formulated as:
\hat{\mathcal{I}}_d = C \otimes \tilde{\mathcal{I}}_d + (1 - C) \otimes \mathcal{I}_d,   (6)

where ⊗ represents element-wise multiplication, and Î_d and Ĩ_d denote the final restored depth and the output of the depth decoder head, respectively.
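Eq. (6) amounts to a per-pixel linear blend; a minimal sketch, assuming the confidence map C, the predicted depth, and the raw input depth are tensors of the same shape:

```python
import torch

def confidence_interpolate(pred_depth: torch.Tensor,
                           raw_depth: torch.Tensor,
                           confidence: torch.Tensor) -> torch.Tensor:
    """Eq. (6): trust the prediction where confidence is high, keep the raw depth elsewhere."""
    return confidence * pred_depth + (1.0 - confidence) * raw_depth
```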
Loss Functions. For SwinDRNet training, we supervise both the final restored depth Î_d and the output of the depth decoder head Ĩ_d, which is formulated as:

\mathcal{L} = \omega_{\tilde{\mathcal{I}}_d} \mathcal{L}_{\tilde{\mathcal{I}}_d} + \omega_{\hat{\mathcal{I}}_d} \mathcal{L}_{\hat{\mathcal{I}}_d},   (7)

where L_{Î_d} and L_{Ĩ_d} are the losses on Î_d and Ĩ_d, respectively, and ω_{Î_d} and ω_{Ĩ_d} are weighting factors. Each of the two losses is formulated as:

\mathcal{L}_i = \omega_n \mathcal{L}_n + \omega_d \mathcal{L}_d + \omega_g \mathcal{L}_g,   (8)

where L_n, L_d, and L_g are the L1 losses between the predicted and ground-truth surface normal, depth, and depth gradient map, respectively, and ω_n, ω_d, and ω_g are their weights. We further assign a higher weight to the loss within the foreground region, to push the network to concentrate more on the objects.
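A sketch of how such a composite loss could be assembled, assuming ground-truth normals and a foreground mask are available; the specific weights and the finite-difference gradient are illustrative assumptions rather than the paper's exact settings.

```python
import torch

def image_gradients(depth: torch.Tensor):
    """Finite-difference gradients of a (B, 1, H, W) depth map along x and y."""
    dx = depth[..., :, 1:] - depth[..., :, :-1]
    dy = depth[..., 1:, :] - depth[..., :-1, :]
    return dx, dy

def restoration_loss(pred_depth, gt_depth, pred_normal, gt_normal, fg_mask,
                     w_n=1.0, w_d=1.0, w_g=1.0, fg_weight=2.0):
    """L1 losses on depth, surface normal, and depth gradients (cf. Eq. (8)),
    with a higher weight inside the foreground mask (sketch)."""
    pix_w = 1.0 + (fg_weight - 1.0) * fg_mask          # per-pixel weights
    l_d = (pix_w * (pred_depth - gt_depth).abs()).mean()
    l_n = (pix_w * (pred_normal - gt_normal).abs()).mean()
    pdx, pdy = image_gradients(pred_depth)
    gdx, gdy = image_gradients(gt_depth)
    l_g = (pdx - gdx).abs().mean() + (pdy - gdy).abs().mean()
    return w_n * l_n + w_d * l_d + w_g * l_g
```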
Table 4. Quantitative results for domain transfer. "Previous best results" means that the best previous method is trained on ClearGrasp and Omniverse and evaluated on ClearGrasp. "Domain transfer" means that SwinDRNet is trained on DREDS-CatKnown and evaluated on ClearGrasp.

Methods                 IoU25  IoU50  IoU75  5°2cm  5°5cm  10°2cm  10°5cm  10°10cm
DREDS-CatKnown (Sim)
NOCS-only                85.4   61.1   18.3   22.8   27.2   43.4    51.8    52.9
SGPA-only                77.3   63.7   30.0   30.1   33.1   49.9    55.9    56.7
Refined depth + NOCS     85.4   65.9   27.6   32.1   33.5   57.3    60.9    60.9
Refined depth + SGPA     82.1   73.4   45.4   46.5   47.4   67.5    69.4    69.5
Ours-only                94.3   78.8   36.7   34.6   37.8   55.9    62.9    63.5
Refined depth + Ours     95.3   85.0   49.9   49.3   50.3   70.1    72.8    72.8
STD-CatKnown (Real)
NOCS-only                89.1   63.7   17.2   23.0   28.9   42.1    57.4    58.2
SGPA-only                75.2   63.1   30.5   31.9   34.3   50.3    56.0    56.5
Refined depth + NOCS     88.8   71.1   28.7   29.8   31.2   57.4    60.6    60.7
Refined depth + SGPA     77.2   71.6   49.0   51.1   51.5   72.8    73.7    73.7
Ours-only                91.5   81.3   39.3   38.2   42.9   58.3    71.2    71.5
Refined depth + Ours     91.5   85.7   55.7   53.3   54.1   77.6    79.7    79.7
Table 6. Results of real robot experiments. #Objects denotes the total number of objects grasped across all rounds. #Attempts denotes the total number of grasp attempts across all rounds.
7 Conclusions
In this work, we propose a powerful RGB-D fusion network, SwinDRNet, for depth restoration. Our proposed framework, DREDS, synthesizes a large-scale RGB-D dataset with realistic sensor noise, closing the sim-to-real gap for specular and transparent objects. Furthermore, we collect a real-world dataset, STD, for evaluating real-world performance. Evaluations on depth restoration, category-level pose estimation, and object grasping tasks demonstrate the effectiveness of our method.
References
1. Blender. https://www.blender.org/
2. Object Capture API on macOS. https://developer.apple.com/augmented-reality/object-capture/
3. Bartell, F.O., Dereniak, E.L., Wolfe, W.L.: The theory and measurement of bidi-
rectional reflectance distribution function (brdf) and bidirectional transmittance
distribution function (btdf). In: Radiation scattering in optical systems. vol. 257,
pp. 154–160. SPIE (1981)
4. Breyer, M., Chung, J.J., Ott, L., Siegwart, R., Nieto, J.: Volumetric grasping network: Real-time 6 DOF grasp detection in clutter. In: Conference on Robot Learning (2020)
5. Burley, B.: Extending the Disney BRDF to a BSDF with integrated subsurface scattering. In: Physically Based Shading in Theory and Practice, SIGGRAPH Course (2015)
6. Burley, B., Studios, W.D.A.: Physically-based shading at Disney. In: ACM SIGGRAPH Courses, vol. 2012, pp. 1–7 (2012)
7. Calli, B., Singh, A., Bruce, J., Walsman, A., Konolige, K., Srinivasa, S., Abbeel,
P., Dollar, A.M.: Yale-cmu-berkeley dataset for robotic manipulation research. The
International Journal of Robotics Research 36(3), 261–268 (2017)
8. Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z.,
Savarese, S., Savva, M., Song, S., Su, H., et al.: Shapenet: An information-rich
3d model repository. arXiv preprint arXiv:1512.03012 (2015)
9. Chen, K., Dou, Q.: Sgpa: Structure-guided prior adaptation for category-level 6d
object pose estimation. In: Proceedings of the IEEE/CVF International Conference
on Computer Vision. pp. 2773–2782 (2021)
10. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using
a multi-scale deep network. Advances in neural information processing systems 27
(2014)
11. Fang, H.S., Wang, C., Gou, M., Lu, C.: Graspnet-1billion: A large-scale bench-
mark for general object grasping. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition. pp. 11444–11453 (2020)
12. He, Y., Sun, W., Huang, H., Liu, J., Fan, H., Sun, J.: Pvn3d: A deep point-
wise 3d keypoints voting network for 6dof pose estimation. In: Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition. pp. 11632–
11641 (2020)
13. Hu, M., Wang, S., Li, B., Ning, S., Fan, L., Gong, X.: Penet: Towards precise and
efficient image guided depth completion. In: 2021 IEEE International Conference
on Robotics and Automation (ICRA). pp. 13656–13662. IEEE (2021)
14. Jiang, Z., Zhu, Y., Svetlik, M., Fang, K., Zhu, Y.: Synergies between affordance and
geometry: 6-dof grasp detection via implicit representations. In: Robotics: Science
and Systems XVII, Virtual Event, July 12-16, 2021 (2021)
15. Jiao, J., Cao, Y., Song, Y., Lau, R.: Look deeper into depth: Monocular depth
estimation with semantic booster and attention-driven loss. In: Proceedings of the
European conference on computer vision (ECCV). pp. 53–69 (2018)
16. Khirodkar, R., Yoo, D., Kitani, K.: Domain randomization for scene-specific car
detection and pose estimation. In: 2019 IEEE Winter Conference on Applications
of Computer Vision (WACV). pp. 1932–1940. IEEE (2019)
17. Landau, M.J., Choo, B.Y., Beling, P.A.: Simulating kinect infrared and depth
images. IEEE transactions on cybernetics 46(12), 3018–3031 (2015)
18. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin
transformer: Hierarchical vision transformer using shifted windows. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022
(2021)
19. Long, X., Lin, C., Liu, L., Li, W., Theobalt, C., Yang, R., Wang, W.: Adaptive
surface normal constraint for depth estimation. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision. pp. 12849–12858 (2021)
20. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International
Conference on Learning Representations (2018)
21. Mo, K., Guibas, L.J., Mukadam, M., Gupta, A., Tulsiani, S.: Where2act: From
pixels to actions for articulated 3d objects. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision. pp. 6813–6823 (2021)
22. Mu, T., Ling, Z., Xiang, F., Yang, D., Li, X., Tao, S., Huang, Z., Jia, Z.,
Su, H.: ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale
Demonstrations. In: Annual Conference on Neural Information Processing Systems
(NeurIPS) (2021)
23. Park, J., Joo, K., Hu, Z., Liu, C.K., So Kweon, I.: Non-local spatial propagation
network for depth completion. In: European Conference on Computer Vision. pp.
120–136. Springer (2020)
24. Peng, X.B., Andrychowicz, M., Zaremba, W., Abbeel, P.: Sim-to-real transfer of
robotic control with dynamics randomization. In: 2018 IEEE international confer-
ence on robotics and automation (ICRA). pp. 3803–3810. IEEE (2018)
25. Planche, B., Singh, R.V.: Physics-based differentiable depth sensor simulation. In:
Proceedings of the IEEE/CVF International Conference on Computer Vision. pp.
14387–14397 (2021)
26. Planche, B., Wu, Z., Ma, K., Sun, S., Kluckner, S., Lehmann, O., Chen, T., Hutter,
A., Zakharov, S., Kosch, H., et al.: Depthsynth: Real-time realistic synthetic data
generation from CAD models for 2.5D recognition. In: 2017 International Conference
on 3D Vision (3DV). pp. 1–10. IEEE (2017)
27. Prakash, A., Boochoon, S., Brophy, M., Acuna, D., Cameracci, E., State, G.,
Shapira, O., Birchfield, S.: Structured domain randomization: Bridging the re-
ality gap by context-aware synthetic data. In: 2019 International Conference on
Robotics and Automation (ICRA). pp. 7249–7255. IEEE (2019)
28. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learn-
ing on point sets in a metric space. Advances in neural information processing
systems 30 (2017)
29. Qu, C., Liu, W., Taylor, C.J.: Bayesian deep basis fitting for depth completion
with uncertainty. In: Proceedings of the IEEE/CVF International Conference on
Computer Vision. pp. 16147–16157 (2021)
30. Sajjan, S., Moore, M., Pan, M., Nagaraja, G., Lee, J., Zeng, A., Song, S.: Clear
grasp: 3d shape estimation of transparent objects for manipulation. In: 2020 IEEE
International Conference on Robotics and Automation (ICRA). pp. 3634–3642.
IEEE (2020)
31. Schönberger, J.L., Frahm, J.M.: Structure-from-Motion Revisited. In: Conference
on Computer Vision and Pattern Recognition (CVPR) (2016)
32. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain
randomization for transferring deep neural networks from simulation to the real
world. In: 2017 IEEE/RSJ international conference on intelligent robots and sys-
tems (IROS). pp. 23–30. IEEE (2017)
33. Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T.,
Cameracci, E., Boochoon, S., Birchfield, S.: Training deep networks with synthetic
data: Bridging the reality gap by domain randomization. In: Proceedings of the
IEEE conference on computer vision and pattern recognition workshops. pp. 969–
977 (2018)
34. Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity
invariant cnns. In: 2017 international conference on 3D Vision (3DV). pp. 11–20.
IEEE (2017)
35. Van Gansbeke, W., Neven, D., De Brabandere, B., Van Gool, L.: Sparse and noisy
lidar completion with rgb guidance and uncertainty. In: 2019 16th international
conference on machine vision applications (MVA). pp. 1–6. IEEE (2019)
36. Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., Guibas, L.J.: Normal-
ized object coordinate space for category-level 6d object pose and size estimation.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 2642–2651 (2019)
37. Weng, Y., Wang, H., Zhou, Q., Qin, Y., Duan, Y., Fan, Q., Chen, B., Su, H.,
Guibas, L.J.: Captra: Category-level pose tracking for rigid and articulated objects
from point clouds. In: Proceedings of the IEEE/CVF International Conference on
Computer Vision. pp. 13209–13218 (2021)
38. Xiong, X., Xiong, H., Xian, K., Zhao, C., Cao, Z., Li, X.: Sparse-to-dense depth
completion revisited: Sampling strategy and graph construction. In: European Con-
ference on Computer Vision. pp. 682–699. Springer (2020)
39. Xu, H., Wang, Y.R., Eppel, S., Aspuru-Guzik, A., Shkurti, F., Garg, A.: Seeing
glass: Joint point-cloud and depth completion for transparent objects. In: 5th An-
nual Conference on Robot Learning (2021)
40. Yue, X., Zhang, Y., Zhao, S., Sangiovanni-Vincentelli, A., Keutzer, K., Gong, B.:
Domain randomization and pyramid consistency: Simulation-to-real generalization
without accessing target domain data. In: Proceedings of the IEEE/CVF Interna-
tional Conference on Computer Vision. pp. 2100–2110 (2019)
41. Zakharov, S., Kehl, W., Ilic, S.: Deceptionnet: Network-driven domain random-
ization. In: Proceedings of the IEEE/CVF International Conference on Computer
Vision. pp. 532–541 (2019)
42. Zhang, X., Chen, R., Xiang, F., Qin, Y., Gu, J., Ling, Z., Liu, M., Zeng, P., Han,
S., Huang, Z., et al.: Close the visual domain gap by physics-grounded active
stereovision depth sensor simulation. arXiv preprint arXiv:2201.11924 (2022)
43. Zhu, L., Mousavian, A., Xiang, Y., Mazhar, H., van Eenbergen, J., Debnath, S.,
Fox, D.: Rgb-d local implicit function for depth completion of transparent objects.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 4649–4658 (2021)
Fig. 7. The setting of the real robot experiment for specular and transparent object grasping. A Panda arm equipped with a parallel-jaw gripper and a RealSense D415 depth sensor grasps specular and transparent objects placed in a box.
Ablation on the confidence interpolation (✓ = with the confidence-based final depth of Eq. (6)).

Confidence  IoU25  IoU50  IoU75  5°2cm  5°5cm  10°2cm  10°5cm  10°10cm
STD-CatKnown (Real)
–            91.5   85.7   56.2   51.3   52.2   76.6    78.8    78.8
✓            91.5   85.7   55.7   53.3   54.1   77.6    79.7    79.7
Table 12. Ablation study for the scale of training data on depth restora-
tion. SwinDRNet is trained on DREDS-CatKnown and evaluated on the specular and
transparent objects of STD.
(Figure: examples of DREDS-CatKnown and DREDS-CatNovel.)
Fig. 9. CAD models of the STD object set. The 1st to 7th rows show 42 objects
in 7 categories, and the last row shows 8 objects in novel categories.