Abstract
Single-view 3D reconstruction aims to recover the 3D shape of an object from a single image and has attracted increasing attention in recent years. Most previous works learn a direct mapping from 2D to 3D, and the lack of spatial information about the object leads to inaccurate reconstruction of its details. To address this issue, we propose IV-Net, a novel voxel-based network for single-view 3D reconstruction that fuses features of the input image and a recovered volume. A pre-trained baseline first extracts an image feature and a coarse volume from each input image, where the recovered volume carries spatial semantic information. In particular, a multi-scale convolutional block is designed to improve the 2D encoder by extracting multi-scale image information. To recover the shape and details of the object more accurately, an IV refiner then reconstructs the final volume. We conduct experimental evaluations on both the synthetic ShapeNet dataset and the real-world Pix3D dataset, and the comparative results indicate that IV-Net outperforms state-of-the-art approaches in both accuracy and parameter count.
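To make the pipeline described above concrete, the following is a minimal PyTorch sketch of a network in this style: a 2D encoder built from multi-scale convolutional blocks, a decoder that produces a coarse occupancy volume, and a refiner that fuses image features with that recovered volume. All module names, channel widths, and the 32³ output resolution are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of an IV-Net-style pipeline. Channel widths and the 32^3
# output resolution are illustrative assumptions only.
import torch
import torch.nn as nn


class MultiScaleBlock(nn.Module):
    """Convolves the input at several kernel sizes and fuses the branches,
    approximating the multi-scale convolutional block in the 2D encoder."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(3 * branch_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class IVNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # 2D encoder: image -> multi-scale feature map (plays the role of
        # the pre-trained baseline's encoder).
        self.encoder = nn.Sequential(
            MultiScaleBlock(3, 48), nn.ReLU(), nn.MaxPool2d(2),
            MultiScaleBlock(48, 96), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
        )
        # 3D decoder: lifted image features -> coarse 32^3 volume.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(12, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, 1),
        )
        # IV refiner: fuses image features (tiled over the grid) with the
        # coarse volume to predict the refined final volume.
        self.refiner = nn.Sequential(
            nn.Conv3d(1 + 96, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 1),
        )

    def forward(self, img):
        feat = self.encoder(img)                        # (B, 96, 8, 8)
        b = feat.size(0)
        vol_feat = feat.reshape(b, 12, 8, 8, 8)         # lift 2D features to 3D
        coarse = torch.sigmoid(self.decoder(vol_feat))  # (B, 1, 32, 32, 32)
        # Tile a global image descriptor over the grid and fuse it with the
        # recovered volume, so the refiner sees both sources of information.
        g = feat.mean(dim=(2, 3))[:, :, None, None, None].expand(-1, -1, 32, 32, 32)
        refined = torch.sigmoid(self.refiner(torch.cat([coarse, g], dim=1)))
        return coarse, refined


if __name__ == "__main__":
    coarse, refined = IVNetSketch()(torch.rand(1, 3, 224, 224))
    print(coarse.shape, refined.shape)  # both (1, 1, 32, 32, 32)
```

The refiner's input makes the fusion explicit: the coarse volume supplies spatial structure, while the tiled image descriptor reinjects appearance information that the coarse reconstruction may have lost.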







Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 11471093).
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, B., Jiang, P., Kong, D. et al. IV-Net: single-view 3D volume reconstruction by fusing features of image and recovered volume. Vis Comput 39, 6237–6247 (2023). https://doi.org/10.1007/s00371-022-02725-6