Evaluating ChatGPT and GPT-4 for Visual Programming

Singla, Adish

Abstract:Generative AI and large language models have the potential to drastically improve the landscape of computing education by automatically generating personalized feedback and content. Recent works have studied the capabilities of these models for different programming education scenarios; however, these works considered only text-based programming, in particular, Python programming. Consequently, they leave open the question of how well these models would perform in visual programming domains popularly used for K-8 programming education. The main research question we study is: Do state-of-the-art generative models show advanced capabilities in visual programming on par with their capabilities in text-based Python programming? In our work, we evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, in visual programming domains for various scenarios and assess performance using expert-based annotations. In particular, we base our evaluation using reference tasks from the domains of Hour of Code: Maze Challenge by Code-dot-org and Karel. Our results show that these models perform poorly and struggle to combine spatial, logical, and programming skills crucial for visual programming. These results also provide exciting directions for future work on developing techniques to improve the performance of generative models in visual programming.

Comments:	This article is a full version of the poster (extended abstract) from ICER'23
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2308.02522 [cs.LG]
	(or arXiv:2308.02522v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.02522

Computer Science > Machine Learning

Title:Evaluating ChatGPT and GPT-4 for Visual Programming

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators