IV-Net: single-view 3D volume reconstruction by fusing features of image and recovered volume

  • Original article
  • Published in The Visual Computer

Abstract

Single-view 3D reconstruction aims to recover the 3D shape of an object from a single image and has attracted increasing attention in recent years. Most previous works are devoted to learning a direct mapping from 2D to 3D, and the lack of spatial information about the object causes inaccurate reconstruction of its details. To address this issue, we propose IV-Net, a novel voxel-based network for single-view 3D reconstruction that fuses features of the image and a recovered volume. Using a pre-trained baseline, it obtains an image feature and a coarse volume from each input image, where the recovered volume carries spatial semantic information. Specifically, a multi-scale convolutional block is designed to improve the 2D encoder by extracting multi-scale image information. To recover a more accurate shape and finer details of the object, an IV refiner is then used to reconstruct the final volume. We conduct experimental evaluations on both the synthetic ShapeNet dataset and the real-world Pix3D dataset, and the comparative results indicate that our IV-Net outperforms state-of-the-art approaches in terms of both accuracy and parameter count.
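
The pipeline the abstract describes (a 2D encoder with a multi-scale convolutional block, a pre-trained baseline that recovers a coarse volume, and an IV refiner that fuses image and volume features) can be summarized in code. Below is a minimal PyTorch-style sketch of that idea only, assuming a 32^3 occupancy grid; all class names (IVNetSketch, MultiScaleConvBlock), channel sizes, and layer choices are illustrative assumptions, not the authors' published architecture.

# Hypothetical sketch of the IV-Net pipeline from the abstract.
# Resolutions, channel counts, and module structure are assumptions.
import torch
import torch.nn as nn

class MultiScaleConvBlock(nn.Module):
    """Extracts image features at several kernel sizes and fuses them."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        # Parallel 1x1 / 3x3 / 5x5 branches capture multi-scale context.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(branch_ch * 3, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(self.fuse(x))

class IVNetSketch(nn.Module):
    def __init__(self, feat_ch=64, vox_res=32):
        super().__init__()
        self.vox_res = vox_res
        # 2D encoder built around the multi-scale block (image branch).
        self.encoder2d = nn.Sequential(
            MultiScaleConvBlock(3, feat_ch),
            nn.MaxPool2d(2),
            MultiScaleConvBlock(feat_ch, feat_ch),
            nn.AdaptiveAvgPool2d(4),                      # (B, C, 4, 4)
        )
        # Stand-in for the pre-trained baseline: decodes image features
        # into a coarse voxel volume (the "recovered volume").
        self.baseline = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_ch * 16, vox_res ** 3),
            nn.Sigmoid(),
        )
        # 3D encoder for the coarse volume branch.
        self.encoder3d = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 8, 3, padding=1), nn.ReLU(inplace=True),
        )
        # IV refiner: fuses broadcast image features with volume features
        # and predicts the final occupancy volume.
        self.refiner = nn.Sequential(
            nn.Conv3d(8 + feat_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        b = image.size(0)
        img_feat = self.encoder2d(image)                  # (B, C, 4, 4)
        coarse = self.baseline(img_feat).view(b, 1, *(self.vox_res,) * 3)
        vol_feat = self.encoder3d(coarse)                 # (B, 8, R, R, R)
        # Broadcast the global image feature over the voxel grid.
        g = img_feat.mean(dim=(2, 3))                     # (B, C)
        g = g[:, :, None, None, None].expand(-1, -1, *(self.vox_res,) * 3)
        fused = torch.cat([vol_feat, g], dim=1)
        return self.refiner(fused).squeeze(1)             # (B, R, R, R)

if __name__ == "__main__":
    net = IVNetSketch()
    out = net(torch.randn(2, 3, 128, 128))
    print(out.shape)  # torch.Size([2, 32, 32, 32])

The sketch trains end to end as written; per the abstract, the baseline would instead be pre-trained, so in practice its weights would be loaded (and possibly frozen) before training the refiner.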

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 11471093).

Author information

Corresponding author

Correspondence to Ping Jiang.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, B., Jiang, P., Kong, D. et al. IV-Net: single-view 3D volume reconstruction by fusing features of image and recovered volume. Vis Comput 39, 6237–6247 (2023). https://doi.org/10.1007/s00371-022-02725-6
