Published June 14, 2022 | Version 1.0.0
Dataset Open

MineDojo Internet Knowledge Base (Wiki)

  • 1. NVIDIA
  • 2. Caltech
  • 3. Stanford
  • 4. Columbia
  • 5. SJTU
  • 6. NVIDIA, UT Austin
  • 7. NVIDIA, Caltech

Description

Project website: minedojo.org

Paper: arxiv.org/abs/2206.08853

GitHub: github.com/MineDojo/MineDojo

The Minecraft Wiki pages cover almost every aspect of the game mechanics and supply a rich source of unstructured knowledge in the form of multimodal tables, recipes, illustrations, and step-by-step tutorials. We scrape 6,735 pages that interleave text, images, tables, and diagrams. To preserve layout information, we also save full-page screenshots and extract bounding boxes of the visual elements.

There are two files in our Wiki knowledge base.

  • wiki_samples.zip: A sample version of the full knowledge base (10 pages). 
  • wiki_full.zip: The full knowledge base (6,735 pages). 
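Since the internal layout of the archives is not described here, a reasonable first step after downloading is to inspect one without extracting it. The sketch below is illustrative only: `summarize_archive` is a hypothetical helper name, and it makes no assumption about the directory structure or file types inside the zip; it simply counts members by file extension using Python's standard `zipfile` module.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count archive members by file extension without extracting anything.

    Hypothetical helper: it assumes nothing about how the archive is
    organized and only reads the zip's central directory.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts
```

Calling `summarize_archive("wiki_samples.zip")` on the small sample archive (assuming it was saved to the working directory) and printing the result of `.most_common()` gives a quick picture of what kinds of files to expect before committing to the full download.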

Check out our paper!

@article{fan2022minedojo,
  title = {MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge},
  author = {Linxi Fan and Guanzhi Wang and Yunfan Jiang and Ajay Mandlekar and Yuncong Yang and Haoyi Zhu and Andrew Tang and De-An Huang and Yuke Zhu and Anima Anandkumar},
  year = {2022},
  journal = {arXiv preprint arXiv:2206.08853}
}

Files (32.9 GB)

Name              Size       MD5
logo.png          990.2 kB   7276329efa8fbce442545075fd51b7dc
wiki_full.zip     32.7 GB    5e5a590891072fbd0686fdc2a7883cd2
wiki_samples.zip  128.1 MB   0dcbb67b86360ef777b05516b530ba5c
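Given the size of the largest file, it may be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `verify_md5` are hypothetical, and the file is read in chunks so that a multi-gigabyte archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

To use it, pass the local path of a downloaded file together with the checksum listed for that file (the part after `md5:`); a `False` result suggests a corrupted or incomplete download.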