
Undergraduate Topics in Computer Science

Kent D. Lee

Foundations of
Programming
Languages

Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional
content for undergraduates studying in all areas of computing and information science.
From core foundational and theoretical material to final-year topics and applications, UTiCS
books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or
two-semester course. The texts are all authored by established experts in their fields,
reviewed by an international advisory board, and contain numerous examples and problems.
Many include fully worked solutions.

More information about this series at http://www.springer.com/series/7592

Kent D. Lee
Luther College
Decorah, IA
USA

Series editor
Ian Mackie

Advisory Board
Samson Abramsky, University of Oxford, Oxford, UK
Karin Breitman, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
Chris Hankin, Imperial College London, London, UK
Dexter Kozen, Cornell University, Ithaca, USA
Andrew Pitts, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Kongens Lyngby, Denmark
Steven Skiena, Stony Brook University, Stony Brook, USA
Iain Stewart, University of Durham, Durham, UK

ISSN 1863-7310 ISSN 2197-1781 (electronic)


Undergraduate Topics in Computer Science
ISBN 978-3-319-13313-3 ISBN 978-3-319-13314-0 (eBook)
DOI 10.1007/978-3-319-13314-0

Library of Congress Control Number: 2014956497

Springer Cham Heidelberg New York Dordrecht London


© Springer International Publishing Switzerland 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media


(www.springer.com)

Preface

A career in Computer Science is a commitment to a lifetime of learning. You will
not be taught every detail you will need in your career while you are a student. The
goal of a Computer Science education is to give you the tools you need so you can
teach yourself new languages, frameworks, and architectures as they come along.
The creativity encouraged by a lifetime of learning makes Computer Science one of
the most exciting fields today.
There are engineering and theoretical aspects to the field of Computer Science.
Theory often is a part of the development of new programming languages and tools
to make programmers more productive. Computer programming is the process of
building complex systems with those tools. Computer programmers are program
engineers and this process is sometimes called software engineering. No matter
what kind of job you end up doing, understanding the tools of Computer Science,
and specifically the programming languages you use, will help you become a better
programmer.
As programmers it is important that we be able to predict what our programs will
do. Predicting what a program will do is easier if you understand the way the
programming language works. Programs execute according to a computational
model. A model may be implemented in many different ways depending on the
targeted hardware architecture. While there are currently a number of popular
hardware architectures, most can be categorized into one of two main areas:
register-based central processing units and stack-based virtual machines. While these
two types of architectures are different in some ways, they also share a number of
characteristics when used as the target for programming languages. This text
develops a stack-based virtual machine based on the Python virtual machine called
CoCo.
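To make the distinction concrete, here is a minimal sketch of how a stack-based machine executes code. The toy evaluator below is purely illustrative (it is not CoCo, and the PUSH/ADD/MUL instruction names are invented for this example): operands are pushed onto a stack, and each operator pops its arguments from the top and pushes its result.

```python
# A toy stack-based evaluator, sketched for illustration only.
# Each instruction is a tuple: ("PUSH", value), ("ADD",), or ("MUL",).

def run(program):
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":                      # place a constant on the stack
            stack.append(instr[1])
        elif op == "ADD":                     # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":                     # pop two operands, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()                        # the result is left on top

# (2 + 3) * 4 in stack-machine (postfix) order:
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
result = run(program)                         # evaluates to 20
```

A register-based machine would instead name explicit operand locations in each instruction; the stack machine gets by with an implicit top-of-stack, which is one reason stack instruction sets make simple compilation targets.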
Computer scientists differentiate programming languages based on three paradigms
or ways of thinking about programming: object-oriented/imperative programming,
functional programming, and logic programming. This text covers these
three paradigms while using each of them in the implementation of a non-trivial
programming language.


It is expected that most readers of this text will have had some prior experience
with object-oriented languages. CoCo is implemented in C++, providing a chance
to learn C++ in some detail and see it used in a larger software project like the
CoCo implementation. The text proceeds in a bottom-up fashion by implementing
extensions to CoCo using C++. Then a full-featured functional language called
Small is implemented on top of the CoCo virtual machine. The Small language is a
subset of Standard ML. Standard ML is first introduced in this text and then used to
implement the Small subset of the Standard ML language, which really isn’t that
small after all. Finally, late in the text a type inference system for Small is developed
and implemented in Prolog. Prolog is an example of a logic programming language.
The text is meant to be used interactively. You should read a section and, as you
read it, do the practice exercises. Each of the exercises is meant to give you a goal
in reading a section of the text.
The text website http://www.cs.luther.edu/~leekent/PL includes code and other
support files that may be downloaded. These include the CoCo virtual machine and
the MLComp compiler/type inference system.
I hope you enjoy reading the text and working through the exercises and practice
problems. Have fun with it and get creative!

For Teachers

This book was written to fulfill two goals. The first is to introduce students to three
programming paradigms: object-oriented/imperative, functional, and logic
programming. To be ready for the content of this book students should have some
background in an imperative language, probably an object-oriented language like
Python, Java, or C++. They should have had an introductory course and a course in
Data Structures as a minimum. While prepared students will have written
several programs, some of them fairly complex, most probably still struggle with
predicting exactly what their programs will do. It is assumed that ideas like
polymorphism, recursion, and logical implication are relatively new to students
reading this book. The text assumes that students have little or no experience with
the functional and logic programming paradigms.
The Object-Oriented language presented in this book is C++. C++ has many
nuances that are worthy of several chapters, but because of the breadth of
information covered in this text many details of the language must be left out. To
thoroughly cover the whole language, students may be encouraged to pick up an
additional text focusing on just C++. However, significant topics of C++ are
presented in this text. Notably, the pass-by-value and pass-by-reference mechanisms
in C++ create considerable complexity in the language. Polymorphism is another
interesting aspect of Object-Oriented languages that is studied in this text.
The text uses Standard ML as the functional language. ML has a polymorphic
type inference system to statically type programs of the language. In addition, the
type inference system of ML is formally proven sound and complete. This has some
implications in writing programs. While ML’s cryptic compiler error messages are
sometimes hard to understand at first, once a program compiles it will often work
correctly the first time. That is an amazing statement to make if your past experience
is in a dynamically typed language like Lisp, Scheme, Ruby, or Python.
The logic language used in this text is Prolog. While Prolog has traditionally
been an Artificial Intelligence language, it originated as a meta-language for
expressing other languages. The text concentrates on using Prolog to implement a


type inference system. Students learn about logical implication and how a problem
they are familiar with can be re-expressed in a logic programming language.
The second goal of the text is to be interactive. This book is intended to be used
in and outside of class. It is my experience that we almost all learn more by doing
than by seeing. To that end, the text encourages teachers to actively teach. Each
chapter follows a pattern of presenting a topic followed by a practice exercise or
exercises that encourage students to try what they have just read. These exercises
can be used in class to help students check their understanding of a topic. Teachers
are encouraged to take the time to present a topic and then allow students time to
reflect and practice the concept just presented. In this way the text becomes a
lecture resource. Students get two things out of this. It forces them to be
interactively engaged in the lectures, not just passive observers. It also gives them
immediate feedback on key concepts to help them determine if they understand the
material or not. This encourages them to ask questions when they have difficulty
with an exercise. Tell students to bring the book to class along with a pencil and
paper. The practice exercises are easily identified.
The book presents several projects to reinforce topics outside the classroom.
Each chapter of the text suggests several nontrivial programming projects that
accompany the paradigm being covered to drive home the concepts covered in that
chapter. The projects and exercises described in this text have been tested in
practice and documentation and solutions are available upon request.
I have been fortunate to have good teachers throughout high school, college,
and graduate school. Good teachers are a valuable commodity and we need more
of them. Ken Slonneger was my advisor in graduate school and this book came
into being because of him. He inspired me to write a text that supports the same
teaching style he used in his classroom. I would also like to thank Dr. Eric Manley
of Drake University for working with me by trying the projects with his students
and for the valuable feedback he provided to me during the development of this
text. Thanks, Eric!

Contents

1 Introduction
  1.1 Historical Perspective
  1.2 Models of Computation
  1.3 The Origins of a Few Programming Languages
  1.4 Language Implementation
  1.5 Chapter Summary
  1.6 Review Questions
  1.7 Solutions to Practice Problems

2 Syntax
  2.1 Terminology
  2.2 Backus Naur Form (BNF)
  2.3 Context-Free Grammars
  2.4 Derivations
  2.5 Parse Trees
  2.6 Abstract Syntax Trees
  2.7 Lexical Analysis
  2.8 Parsing
  2.9 Top-Down Parsers
  2.10 Bottom-Up Parsers
  2.11 Ambiguity in Grammars
  2.12 Other Forms of Grammars
  2.13 Limitations of Syntactic Definitions
  2.14 Chapter Summary
  2.15 Review Questions
  2.16 Exercises
  2.17 Solutions to Practice Problems

3 Assembly Language
  3.1 Overview of the CoCo VM
  3.2 Getting Started
  3.3 Input/Output
  3.4 If-Then-Else Statements
  3.5 While Loops
  3.6 Exception Handling
  3.7 List Constants
  3.8 Calling a Method
  3.9 Iterating Over a List
  3.10 Range Objects and Lazy Evaluation
  3.11 Functions and Closures
  3.12 Recursion
  3.13 Chapter Summary
  3.14 Review Questions
  3.15 Exercises
  3.16 Solutions to Practice Problems

4 C++
  4.1 C++ Development Fundamentals
  4.2 Overview of the CoCo Virtual Machine
  4.3 Building a Large Project
  4.4 Static Type Checking
  4.5 Declaring Variables
  4.6 Pointers and Arrays
  4.7 Writing Functions
  4.8 Parameter Passing and Return Values
  4.9 C++ References
  4.10 Const in C++
  4.11 Header Files
  4.12 OOP Using C++
  4.13 Pre-defined C++ Classes
  4.14 The Standard Template Library
  4.15 Operator Overloading
  4.16 Classes and Objects
  4.17 Inheritance
  4.18 Constructors and Initialization Lists
  4.19 Polymorphism
  4.20 Abstract Classes
  4.21 Memory Management
  4.22 Writing Templates
  4.23 C++ Error Handling
  4.24 Exception Handling
  4.25 Signals
  4.26 CoCo Components
  4.27 Implementing Dictionaries
  4.28 Chapter Summary
  4.29 Review Questions
  4.30 Exercises
  4.31 Solutions to Practice Problems

5 Standard ML
  5.1 Imperative Versus Functional Programming
  5.2 The Lambda Calculus
  5.3 Getting Started with Standard ML
  5.4 Expressions, Types, Structures, and Functions
  5.5 Recursive Functions
  5.6 Characters, Strings, and Lists
  5.7 Pattern Matching
  5.8 Tuples
  5.9 Let Expressions and Scope
  5.10 Datatypes
  5.11 Parameter Passing in Standard ML
  5.12 Efficiency of Recursion
  5.13 Tail Recursion
  5.14 Currying
  5.15 Anonymous Functions
  5.16 Higher-Order Functions
  5.17 Continuation Passing Style
  5.18 Input and Output
  5.19 Programming with Side-Effects
  5.20 Exception Handling
  5.21 Encapsulation in ML
  5.22 Type Inference
  5.23 Building a Prefix Calculator Interpreter
  5.24 Chapter Summary
  5.25 Exercises
  5.26 Solutions to Practice Problems

6 Compiling Standard ML
  6.1 ML-lex
  6.2 The Small AST Definition
  6.3 Using ML-yacc
  6.4 Compiling and Running the Compiler
  6.5 Function Calls
  6.6 Let Expressions
  6.7 Unary Negation
  6.8 If-Then-Else Expressions
  6.9 Short-Circuit Logic
  6.10 Defining Functions
  6.11 Reference Variables
  6.12 Chapter Summary
  6.13 Review Questions
  6.14 Exercises
  6.15 Solutions to Practice Problems

7 Prolog
  7.1 Getting Started with Prolog
  7.2 Fundamentals
  7.3 The Prolog Program
  7.4 Lists
  7.5 The Accumulator Pattern
  7.6 Built-In Predicates
  7.7 Unification and Arithmetic
  7.8 Input and Output
  7.9 Structures
  7.10 Parsing in Prolog
  7.11 Prolog Grammar Rules
  7.12 Building an AST
  7.13 Attribute Grammars
  7.14 Chapter Summary
  7.15 Review Questions
  7.16 Exercises
  7.17 Solutions to Practice Problems

8 Type Inference
  8.1 Why Static Type Inference?
  8.2 Type Inference Rules
  8.3 Using Prolog
  8.4 The Type Environment
  8.5 Integers, Strings, and Boolean Constants
  8.6 List and Tuple Constants
  8.7 Identifiers
  8.8 Function Application
  8.9 Let Expressions
  8.10 Patterns
  8.11 Matches
  8.12 Anonymous Functions
  8.13 Sequential Execution
  8.14 If-Then and While-Do
  8.15 Exception Handling
  8.16 Chapter Summary
  8.17 Review Questions
  8.18 Exercises
  8.19 Solutions to Practice Problems

9 Appendix A: The CoCo Virtual Machine Specification

10 Appendix B: The Standard ML Basis Library

References
1 Introduction

This text on Programming Languages is intended to introduce you to new ways of
thinking about programming. Typically, computer science students start out learning
to program in an imperative model of programming where variables are created and
updated as a program executes. There are other ways to program. As you learn to
program in these new paradigms you will begin to understand that there are different
ways of thinking about problem solving. Each paradigm is useful in some contexts.
This book is not meant to be a survey of lots of different languages. Rather, its purpose
is to introduce you to the three styles of programming languages by using them to
implement a non-trivial programming language. These three styles of programming are:

• Imperative/Object-Oriented Programming with languages like Java, C++, Python,
and other languages you may have used before.
• Functional Programming with languages like Standard ML, Haskell, Lisp, Scheme,
and others.
• Logic Programming with Prolog.

The book provides an introduction to programming in assembly language, C++,
Standard ML, and Prolog. However, the programming language concepts covered
in this text apply to all languages in use today. The goal of the text is to help you
understand how to use the paradigms and models of computation these languages
represent to solve problems. The text elaborates on when these languages may be
appropriate for a problem by showing you how they can be used to implement a
programming language. Many of the problems solved while implementing a
programming language are similar to other problems in computer science. The text
elaborates on techniques for problem solving that you may be able to apply in the
future. You might be surprised by what you can do and how quickly a program can
come together given the right choice of language.
To begin you should know something about the history of computing, particularly
as it applies to the models of computation that have been used in implementing many
of the programming languages we use today. All of what we know in Computer
Science is built on the shoulders of those who came before us. To understand where
we are, we really should know something about where we came from in terms of
Computer Science. Many great people have been involved in the development of
programming languages and to learn even a little about who these people are is
really fascinating and worthy of an entire book in itself.

1.1 Historical Perspective

Much of what we attribute to Computer Science actually came from Mathematics.
Many mathematicians are programmers who have written their programs, or proofs
in the words of Mathematics, using mathematical notation. In the mid 1800s abstract
algebra and geometry were hot topics of research among mathematicians. In the early
1800s Niels Henrik Abel, a Norwegian mathematician, was interested in solving
a problem called the quintic equation. Eventually he developed a new branch of
mathematics called Group Theory with which he was able to prove there was no
general algebraic solution to the quintic equation. Considering that the proof of this
required a new branch of mathematics, much of Abel’s work involved developing
the mathematical notation or language to describe his work. Unfortunately, Abel
died of tuberculosis at twenty-six years old.
Sophus Lie (pronounced Lee), pictured in Fig. 1.1, was another Norwegian
mathematician who lived from 1842–1899 [20]. He began where Abel’s research ended
and explored the connection of Abstract Algebra and Group Theory with Geometry.
From this work he developed a set of group theories, eventually named Lie Groups.
From this discovery he found ways of solving Ordinary Differential Equations by
exploiting properties of symmetry within the equations [8]. One Lie group, the E8
group, was too complicated to map in Lie’s time. In fact, it wasn’t until 2007 that
the structure of the E8 group could be mapped, because the solution produced sixty
times more data than the human genome project [1].

Fig. 1.1 Sophus Lie [21]



While the techniques Lie and Abel discovered were hard for people to learn and
use at the time, today computer programs capable of symbolic manipulation use
Lie’s techniques to solve these and other equally complicated problems. And, the
solutions of these problems are very relevant in the world today. For example, the
work of Sophus Lie is used in the design of aircraft.
As mathematicians’ problem solving techniques became more sophisticated and
the problems they were solving became more complex, mathematicians became
interested in finding automated ways of solving these problems. Charles Babbage
(1791–1871) saw the
need for a computer to do calculations that were too error-prone for humans to
perform. He designed a difference engine to compute mathematical tables when he
found that human computers weren’t very accurate [26]. However, his computer was
mechanical and couldn’t be built using engineering techniques known at that time.
In fact it wasn’t completed until 1990, but it worked just as he said it would over a
hundred years earlier.
Charles Babbage’s difference engine was an early attempt at automating a
solution to a problem, but others would follow of course. Alan Turing was a British
mathematician and one of the first computer scientists. He lived from 1912–1954. In
1936 he wrote a paper entitled, “On Computable Numbers, with an Application to the
Entscheidungsproblem” [23]. The Entscheidungsproblem, or decision problem, had
been proposed a decade earlier by a German mathematician named David Hilbert.
This problem asks: Can an algorithm be defined that decides if a statement given in
first order logic can be proved from a set of axioms and known truth values? The
problem was later generalized to the following question: Can we come up with a
general set of steps that given any algorithm and its data, will decide if it terminates?
In Alan Turing’s paper, he devised an abstract machine called the Turing Machine.
This Turing Machine was very general and simple. It consisted of a set of states and
a tape. The set of states were decided on by a programmer. The machine starts in
the start state as decided by the programmer. From that state it could read a symbol
from a tape. Based on the symbol it could write a symbol to the tape and move to
the left or right, while transitioning to another state. As the Turing machine ran, the
action that it took was dictated by the current state and the symbol on the tape. The
programmer got to decide how many states were a part of the machine, what each
state should do, and how to move from one state to another. In Turing’s paper he
proved that such a machine could be used to solve any computable function and that
the decision problem was not solvable by this machine. The more general statement
of this problem was named the Halting Problem. This was a very important result in
the field of theoretical Computer Science.
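The machine just described is simple enough to simulate in a few lines of Python, the language whose implementation is covered in Chaps. 3 and 4. The simulator and the example transition table below are our own illustration, not anything from Turing’s paper; the sample machine scans right, flipping each 1 to a 0, and halts when it reads a blank.

```python
# A minimal Turing machine simulator. The transition table maps
# (state, symbol) -> (symbol to write, head movement, next state).
def run_turing_machine(delta, tape, state="start", accept="halt"):
    tape = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    while state != accept:
        symbol = tape.get(head, "_")  # "_" is the blank symbol
        write, move, state = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Example machine: scan right, flipping 1s to 0s, halting at the first blank.
flip = {
    ("start", "1"): ("0", "R", "start"),
    ("start", "0"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_turing_machine(flip, "1011"))  # prints 0000_
```

The important point is how little machinery is needed: a table of (state, symbol) transitions and a tape are enough to express any computable function.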
In 1939 John Atanasoff, at Iowa State University, designed what is arguably the
first computer, the ABC or Atanasoff-Berry Computer [27]. Clifford Berry was one of
his graduate students. The computer had no central processing unit, but it did perform
logical and other mathematical operations. Eckert and Mauchly, at the University of
Pennsylvania, were interested in building a computer during the second world war.
They were funded by the Department of Defense to build a machine to calculate
trajectory tables for launching shells from ships. The computer, called ENIAC for
Electronic Numerical Integrator and Computer, was unveiled in 1946, just after the
war had ended. ENIAC was difficult to program since the program was written by
plugging cables into a switch, similar to an old telephone switchboard.
Around that same time a new computer, called EDVAC, was being designed. In
1945 John von Neumann proposed storing the computer programs on EDVAC in
memory along with the program data [25]. Alan Turing closely followed John von
Neumann’s paper by publishing a paper of his own in 1946 describing a more
complete design for stored-program computers [22]. To this day the computers we build
and use are stored-program computers. The architecture is called the von Neumann
architecture because of John von Neumann’s and Alan Turing’s contributions. While
Turing didn’t get the architecture named after him, he is famous in Computer Science
for other reasons like the Turing machine and the Halting problem.
In the early days of Computer Science, many programmers were interested in
writing tools that made it easier to program computers. Much of the programming
was based on the concept of a stored-program computer and many early programming
languages were extensions of this model of computation. In the stored-program
model the program and data are stored in memory. The program manipulates data
based on some input. It then produces output.
Around 1958, Algol was created and the second revision of this language, called
Algol 60, was the first modern, structured, imperative programming language. While
the language was designed by a committee, a large part of the success of the project
was due to the contributions of John Backus pictured in Fig. 1.2. He described the
structure of the Algol language using a mathematical notation that would later be
called Backus-Naur Form, or BNF. Very little has changed with the underlying
computer architecture over the years. Of course, there have been many changes in
the size, speed, and cost of computers! In addition, the languages we use have become
even more structured over the years. But, the principles that Algol 60 introduced are
still in use today.
Recalling that most early computer scientists were mathematicians, it shouldn’t
be too surprising to learn that there were others that approached the problem of
programming differently. Much of the initial interest in computers was spurred by the

Fig. 1.2 John Backus [3]



invention of the stored-program computer and many of the early languages reflected
this excitement. The imperative style was closely tied to the architecture of a stored
program computer. Data was read from an input device and the program acted on
that data by updating memory as the program executed. There was another approach
developing at the same time. Back in 1936, Alonzo Church, a U.S. mathematician
who lived from 1903–1995, was also interested in the decision problem proposed
by David Hilbert. To try to solve the problem he devised a language called the
lambda calculus, usually written as the λ-calculus. Using his very simple language
he was able to describe computation as symbol manipulation. Alan Turing was a
doctoral student of Church and while they independently came up with two ways to
prove that the decision problem was not solvable, they later proved their two models
of computation, Turing machines and the λ-calculus, were equivalent. Their work
eventually led to a very important result called the Church-Turing Thesis. Informally,
the thesis states that all computable problems can be solved by a Turing Machine or
the λ-calculus. The two models are equivalent in power.
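Python’s lambda expressions can give a small taste of Church’s system. The Church-numeral encoding sketched below is the standard one, though the helper names (zero, successor, add, to_int) are our own; everything is computed purely by applying functions, with no updates to memory.

```python
# Church numerals: the number n is the function that applies f to x n times.
zero = lambda f: lambda x: x
successor = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Count how many times the numeral applies its function.
    return n(lambda k: k + 1)(0)

two = successor(successor(zero))
three = successor(two)
print(to_int(add(two)(three)))  # prints 5
```

Numbers, and even arithmetic, are nothing but symbol manipulation with functions, which is exactly what Church used to describe computation.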
Ideas from the λ-calculus led to the development of Lisp by John McCarthy,
pictured in Fig. 1.3. The λ-calculus and Lisp were not designed based on the principle
of the stored-program computer. In contrast to Algol 60, the focus of these languages
was on functions and what could be computed using functions. Lisp was developed
around 1958, the same time that Algol 60 was being developed.
Logic is important both in Computer Science and Mathematics. Logicians were
also interested in solving problems in the early days of Computer Science. Many
problems in logic are expressed in the languages of propositional or predicate logic.
Of course, the development of logic goes all the way back to ancient Greece. Some
logicians of the 20th century were interested in understanding natural language and
they were looking for a way to use computers to solve at least some of the problems
related to processing natural language statements. The desire to use computers in
solving problems from logic led to the development of Prolog, a powerful
programming language based on predicate logic.

Fig. 1.3 John McCarthy [14]



Practice 1.1 Find the answers to the following questions.

1. What are the origins of the three major computational models that early
computer scientists developed?
2. Who were Alan Turing and Alonzo Church, and what were some of their
contributions to Computer Science?
3. What idea did both John von Neumann and Alan Turing contribute to?
4. What notation did John Backus develop and what was one of its first uses?
5. What year did Alan Turing first propose the Turing machine and why?
6. What year did Alonzo Church first propose the λ-calculus and why?
7. Why are Eckert and Mauchly famous?
8. Why are the history of Mathematics and Computer Science so closely tied
together?

You can check your answer(s) in Section 1.7.1.

1.2 Models of Computation

While there is some controversy about who originally came up with the concept of
a stored program computer, John von Neumann is generally given credit for the idea
of storing a program as a string of 0’s and 1’s in memory along with the data used by
the program. Von Neumann’s architecture had very little structure to it. It consisted
of several registers and memory. The Program Counter (PC) register kept track of
the next instruction to execute. There were other registers that could hold a value or
point to other values stored in memory. This model of computation was useful when
programs were small. However, without additional structure, anything but a small
program would quickly get hard to manage. This was what was driving the need for
better and newer programming languages. Programmers needed tools that let them
organize their code so they could focus on problem solving instead of the details of
the hardware.

1.2.1 The Imperative Model

As programs grew in size it was necessary to provide the means for applying
additional structure to them. In the early days a function was often called a sub-routine.
Functions, procedures, and sub-routines were introduced by languages like Algol
60 so that programs could be decomposed into simpler sub-programs, providing
a way for programmers to organize their code. Terms like top-down or bottom-
up design were used to describe this process of subdividing programs into simpler


Fig. 1.4 Imperative model

sub-programs. This process of subdividing programs was often called structured
programming, referring to the decomposition of programs into simpler, more
manageable pieces. Most modern languages provide the means to decompose problems
into simpler subproblems. We often refer to this structured approach as the imperative
model of programming.
To implement functions and function calls in the von Neumann architecture, it
was necessary to apply some organization to the data of a program. In the imperative
model, memory is divided into regions which hold the program and the data. The
data area is further subdivided into the static or global data area, the run-time stack,
and the heap as pictured in Fig. 1.4.
In the late 1970s and 1980s people like Niklaus Wirth and Bjarne Stroustrup were
interested in developing languages that supported an additional level of organization
called Object-Oriented Programming, often abbreviated OOP. Object-oriented
programming still uses the imperative model of programming. The addition of a means
to describe classes of objects gives programmers another way of organizing their
code into functions that are related to a particular type of object.
When a program executes it uses a special register called the stack pointer (SP)
to point to the top activation record on the run-time stack. The run-time stack
contains one activation record for each function or procedure invocation that is
currently unfinished in the program. The top activation record corresponds to the
current function invocation. When a function call is made an activation record is
pushed onto the run-time stack. When a function returns, the activation record
is popped by decrementing the stack pointer to point to the previous activation
record.
An activation record contains information about the currently executing function.
The local variables of the function are stored there. The program counter’s value
before the function call was made is stored there. This is often called the return
address. Other state information may also be stored there depending on the language
and the details of the underlying von Neumann architecture. For instance, parameters
passed to the function may also be stored there.
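A simplified model of the run-time stack can be sketched in Python. The ActivationRecord class and the call and ret helpers below are illustrative inventions, not how any particular compiler lays out its stack; they track only a function name, a return address, and the local variables.

```python
# Simplified model of a run-time stack of activation records.
class ActivationRecord:
    def __init__(self, function, return_address, locals_):
        self.function = function              # name of the invoked function
        self.return_address = return_address  # saved program counter value
        self.locals = locals_                 # parameters and local variables

stack = []  # the run-time stack; the stack pointer is len(stack) - 1

def call(function, return_address, **locals_):
    # A function call pushes a new activation record...
    stack.append(ActivationRecord(function, return_address, locals_))

def ret():
    # ...and a return pops it, restoring the saved program counter.
    return stack.pop().return_address

call("main", return_address=0)
call("factorial", return_address=42, n=3)
print(stack[-1].function)  # prints factorial
print(ret())               # prints 42
print(stack[-1].function)  # prints main
```

Each unfinished invocation has one record on the stack, and returning is nothing more than popping the top record and jumping back to its saved return address.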

Static or global data refers to data and functions that are accessible globally in
the program. Global data and functions are defined throughout the program. Where
global data is stored depends on the implementation of the compiler or interpreter. It
might be part of the program code in some instances. In any case, this area is where
constants, global variables, and possibly built-in globally accessible functions are
stored.
The heap is an area for dynamic memory allocation. The word dynamic means
that it happens while the program is running. All data that is created at run-time is
located in the heap. The data in the heap has no names associated with the values
stored there. Instead, named variables called pointers or references point to the data
in the heap. In addition, data in the heap may contain pointers that point to other data
in the heap.
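Python itself illustrates this nicely, since its lists live on the heap and variables are references to heap data:

```python
# Two names can refer to the same data on the heap.
a = [1, 2, 3]   # the list is allocated on the heap; "a" refers to it
b = a           # "b" refers to the same heap data -- no copy is made
b.append(4)     # updating the data through one reference...
print(a)        # prints [1, 2, 3, 4] -- ...is visible through the other
```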
Like the original von Neumann architecture, the primary goal of the imperative
model is to get data as input, transform it via updates to memory, and then produce
output based on this imperatively changed data. The imperative model of computation
parallels the underlying von Neumann architecture and is used by many modern
languages. Some variation of this model is used by languages like Algol 60, C++,
C, Java, VB.net, Python, and many other languages.

Practice 1.2 Find the answers to the following questions.

1. What are the three divisions of data memory called?
2. When does an item in the heap get created?
3. What goes in an activation record?
4. When is an activation record created?
5. When is an activation record deleted?
6. What is the primary goal of imperative, object-oriented programming?

You can check your answer(s) in Section 1.7.2.

1.2.2 The Functional Model

In the functional model the goal of a program is slightly different. This slight change
in the way the model works has a big influence on how you program. In the functional
model of computation the focus is on function calls. Functions and parameter passing
are the primary means of accomplishing data transformation.
Data is generally not changed in the functional model. Instead, new values are
constructed from old values. A pure functional model wouldn’t allow any updates
to existing values. However, most functional languages allow limited updates to
memory in the imperative style.
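A short Python example, written two ways, shows the difference. Both functions are illustrative sketches: the first transforms its data by updating memory in place, while the second constructs a new value and leaves the old one intact.

```python
# Imperative style: transform the data by updating memory in place.
def double_in_place(values):
    for i in range(len(values)):
        values[i] = values[i] * 2

# Functional style: construct a new value from the old one.
def doubled(values):
    return [v * 2 for v in values]

original = [1, 2, 3]
copy = doubled(original)
print(copy)      # prints [2, 4, 6]
print(original)  # prints [1, 2, 3] -- the original is unchanged

double_in_place(original)
print(original)  # prints [2, 4, 6] -- the original was updated in place
```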
The conceptual view presented in Fig. 1.4 is similar to the view in the functional
world. However, the difference between program and data is eliminated. A function
is data like any other data element. Integers and functions are both first-class citizens
of the functional world.
The static data area is still present, but takes on a minor role in the functional model.
The run-time stack becomes more important because most work is accomplished
by calling functions. Functional languages are much more careful about how they
allow programmers to access the heap and as a result, you really aren’t aware of
the heap when programming in a functional language. Data is certainly dynamically
allocated, but once data is created on the heap it is not modified in a pure functional
model. Impure models might allow some modification of storage but this is the
influence of imperative languages creeping into the functional model as a way to
deal with performance issues. The result is that you spend less time thinking about
the underlying architecture when programming in a functional language.
Examples of functional languages include Standard ML, which is covered in
this text. Lisp, Scheme, Haskell, Caml, and Scala are all examples of functional
languages. Functional languages may be pure, which means they do not support
variable updates like the imperative model. Haskell is a pure functional language.
Most functional languages are not pure. Scheme, Standard ML, and Lisp are examples
of impure functional languages. Scala is a recent functional language that also supports
object-oriented programming.

Practice 1.3 Answer the following questions.

1. What are some examples of functional languages?
2. What is the primary difference between the functional and imperative
models?
3. Immutable data is data that cannot be changed once created. The presence
of immutable data simplifies the conceptual model of programming. Does
the imperative or functional model emphasize immutable data?

You can check your answer(s) in Section 1.7.3.

1.2.3 The Logic Model

The logic model of computation, pictured in Fig. 1.5, is quite different from either the
imperative or functional model. In the logic model the programmer doesn’t actually
write a program at all. Instead, the programmer provides a database of facts or rules.
From this database, a single program tries to answer questions with a yes or no
answer. In the case of Prolog, the program acts in a predictable manner allowing
the programmer to provide the facts in an order that determines how the program
will work. The actual implementation of this conceptual view is accomplished by a
virtual machine, a technique for implementing languages that is covered later in this
text.
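While Prolog itself is covered later, the flavor of the logic model can be hinted at in Python: the “program” below is only a database of facts plus one rule, and a single generic query function answers yes/no questions against it. The relations and names are invented for illustration.

```python
# The "program" is just a database of facts.
facts = {("parent", "zeus", "ares"), ("parent", "hera", "ares")}

def query(relation, a, b):
    # One generic engine answers yes/no questions against the database.
    if (relation, a, b) in facts:
        return True
    if relation == "child":
        # A rule: X is a child of Y if Y is a parent of X.
        return ("parent", b, a) in facts
    return False

print(query("parent", "zeus", "ares"))  # prints True
print(query("child", "ares", "hera"))   # prints True
print(query("parent", "ares", "zeus"))  # prints False
```

The programmer supplies only the facts and rules; the question-answering machinery is the same for every database.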

Fig. 1.5 Logic model of computation

There is still the concept of a heap in Prolog. One can assert new rules and retract
rules as the program executes. To dynamically add rules or retract them there must
be an underlying heap. In fact, the run-time stack is there too. However, the run-time
stack and heap are so hidden in this view of the world that it is debatable whether
they should appear in the conceptual model at all.

Practice 1.4 Answer these questions on what you just read.

1. How many programs can you write in a logic programming language like
Prolog?
2. What does the programmer do when writing in Prolog?
You can check your answer(s) in Section 1.7.4.

1.3 The Origins of a Few Programming Languages


This book explores language implementation using several small languages and
exercises that illustrate each of these models of computation. In addition, exercises
within the text will require implementation in four different languages: assembly
language, C++, Standard ML, and Prolog. But where did these languages come from
and why are we interested in learning how to use them?

1.3.1 A Brief History of C++

C++ was designed by Bjarne Stroustrup, pictured in Fig. 1.6, between 1980 and
1985 while working at Bell Labs. C++ was designed as a superset of C which was
an immensely popular language in the seventies and eighties and still is today. In
C, the ++ operator is called the increment operator. It adds one to the variable that
precedes it. C++ was the next increment after C.

Fig. 1.6 Bjarne Stroustrup [18]

In 1972, the Unix operating system was written in C, which was the reason the
language was created. Ken Thompson was working on the design of Unix with
Dennis Ritchie. It was their project that encouraged Ritchie to create the C language.
C was more structured than the assembly language most operating systems were
written in at the time and it was portable and could be compiled to efficient machine
code. Thompson and Ritchie wanted an operating system that was portable, small,
and well organized.
While C was efficient, there were other languages that had either been developed or
were being developed that encouraged a more structured approach to programming.
For several years ideas had been circulating about how to write code in
object-oriented form. Simula, created by Ole-Johan Dahl and Kristen Nygaard around 1967,
was an early example of a language that supported Object-Oriented design and
Modula-2, created by Niklaus Wirth around 1978, was also taking advantage of
these ideas. Smalltalk, an interpreted language, was object-oriented and was also
developed in the mid 1970s and released in 1980.
In 1980 Bjarne Stroustrup began working on the design of C++ as a language that
would allow C programmers to keep their old code while allowing new code to be
written using these Object-Oriented concepts. In 1983 he named this new language
C++ and with much anticipation, in 1985 the language was released. About the same
time Dr. Stroustrup released a book called The C++ Programming Language [19],
which described the language. The language is still evolving. For instance, templates,
an important part of C++, were first described by Stroustrup in 1988 [17] and it wasn’t
until 1998 that it was standardized as ANSI C++. Today an ANSI committee oversees
the continued development of C++. The latest C++ standard was released in 2011
as of this writing. The previous standard was released in 1998. C++ is a mature
language, but is still growing and evolving.

1.3.2 A Brief History of Python

Python was designed and implemented by Guido van Rossum, pictured in Fig. 1.7.
He started Python as a hobby project during the winter months of 1989. A more
complete history of this language is available on the web at http://python-history.
blogspot.com. Python is an object-oriented language like C++. Unlike C++, Python
is an interpreted language. Mr. van Rossum designed Python’s interpreter as a virtual

Fig. 1.7 Guido van Rossum [24]

machine. Virtual machines have been around for some time including an operating
system for IBM mainframe computers called VM. Using a virtual machine when
implementing a programming language can make the language and its programs more
portable across platforms. Python runs on many different platforms like Apple’s Mac
OS X, Linux, and Microsoft Windows. Virtual machines can also provide services
that make language implementation easier.
Programmers world-wide have embraced Python and have developed many
libraries for Python and written many programs. Python has gained popularity among
developers because of its portability and the ability to provide libraries to others.
Guido van Rossum states in his history of Python, “A large complex system should
have multiple levels of extensibility. This maximizes the opportunities for users,
sophisticated or not, to help themselves.” Extensibility refers to the ability to define
libraries of classes to solve problems from many different application areas. Python
is used in internet programming, server scripting, computer graphics, visualization,
Mathematics, Computer Science education, and many, many other application areas.
Mr. van Rossum continues, saying “In many ways, the design philosophy I used
when creating Python is probably one of the main reasons for its ultimate success.
Rather than striving for perfection, early adopters found that Python worked “well
enough” for their purposes. As the user-base grew, suggestions for improvement were
gradually incorporated into the language.” Growing the user-base has been key to the
success of Python. As the number of programmers that know Python has increased so
has interest in improving the language. Python now has two major versions, Python 2
and Python 3. Python 3 is not backward compatible with Python 2. This break in
compatibility gave the Python developers an opportunity to make improvements in
the language. Chapters 3 and 4 cover the implementation of the Python programming
language.

1.3.3 A Brief History of Standard ML

Standard ML originated in 1986, but was the follow-on of ML which originated in
1973 [16]. Like many other languages, ML was implemented for a specific purpose.
The ML stands for Meta Language. Meta means above or about. So a metalanguage

Fig. 1.8 Robin Milner [15]

is a language about language. In other words, a language used to describe a language.
ML was originally designed for a theorem proving system. The theorem prover was
called LCF, which stands for Logic for Computable Functions. The LCF theorem
prover was developed to check proofs constructed in a particular type of logic first
proposed by Dana Scott in 1969 and now called Scott Logic. Robin Milner, pictured
in Fig. 1.8, was the principal designer of the LCF system. Milner designed the first
version of LCF while at Stanford University. In 1973, Milner moved to Edinburgh
University and hired Lockwood Morris and Malcolm Newey, followed by Michael
Gordon and Christopher Wadsworth, as research associates to help him build a new
and better version called Edinburgh LCF [9].
For the Edinburgh version of LCF, Dr. Milner and his associates created the ML
programming language to allow proof commands in the new LCF system to be
extended and customized. ML was just one part of the LCF system. However, it
quickly became clear that ML could be useful as a general purpose programming
language. In 1990 Milner, together with Mads Tofte and Robert Harper, published
the first complete formal definition of the language; joined by David MacQueen,
they revised this standard to produce the Standard ML that exists today [16].
ML was influenced by Lisp, Algol, and the Pascal programming languages. In
fact, ML was originally implemented in Lisp. There are now two main versions of
ML: Moscow ML and Standard ML. Today, ML’s main use is in academia in the
research of programming languages. But, it has been used successfully in several
other types of applications including the implementation of the TCP/IP protocol
stack [4] and a web server as part of the Fox Project. A goal of the Fox Project was
the development of system software using advanced programming languages [10].
An important facet of ML is the strong type checking provided by the language.
The type inference system, commonly called Hindley-Milner type inference,
statically checks the types of all expressions in the language. In addition, the type
checking system is polymorphic, meaning that it handles types that may contain
type variables. The polymorphic type checker is sound. It will never say a program
is typed correctly when it is not. Interestingly, the type checker has also been proven
complete, which means that all correctly typed programs will indeed pass the type
checker. No correctly typed program will be rejected by the type checker. We expect
soundness out of type checkers but completeness is much harder to prove and it has
been proven for Standard ML. Important ML features include:

• ML is higher-order, supporting functions as first-class values. This means functions
may be passed as parameters to functions and returned as values from functions.
• The strong type checking means it is pretty infrequent that you need to debug your
code. What a great thing!
• Pattern-matching is used in the specification of functions in ML. Pattern-matching
is convenient for writing recursive functions.
• The exception handling system implemented by Standard ML has been proven type
safe, meaning that the type system encompasses all possible paths of execution in
an ML program.

ML is a very good language to use in learning to implement other languages.
It includes tools for automatically generating parts of a language implementation
including components called a scanner and a parser which are introduced in Chap. 6.
These tools, along with the polymorphic strong type checking provided by Standard
ML, make implementing a compiler or interpreter a much easier task. Much of the
work of implementing a program in Standard ML is spent in making sure all the
types in the program are correct. This strong type checking often means that once a
program is properly typed it will run the first time. This is quite a statement to make,
but nonetheless it is often true.

1.3.4 A Brief History of Prolog

Prolog was developed in 1972 by Alain Colmerauer with Philippe Roussel (Fig. 1.9).
Colmerauer and Roussel and their research group had been working on natural
language processing for the French language and were studying logic and automated
theorem proving [7] to answer simple questions in French. Their research led them
to invite Robert Kowalski, pictured in Fig. 1.10, who was working in the area of logic
programming and had devised an algorithm called SL-Resolution, to work with them
in the summer of 1971 [11, 28]. Colmerauer and Kowalski, while working together
in 1971, discovered a way formal grammars could be written as clauses in predicate
logic. Colmerauer soon devised a way that logic predicates could be used to express

Fig. 1.9 Alain Colmerauer [6]



Fig. 1.10 Robert Kowalski [12]

grammars that would allow automated theorem provers to parse natural language
sentences efficiently. This is covered in some detail in Chap. 7.
In the summer of 1972, Kowalski and Colmerauer worked together again and
Kowalski was able to describe the procedural interpretation of what are known as
Horn Clauses. Much of the debate at the time revolved around whether logic
programming should focus on procedural representations or declarative representations.
The work of Kowalski showed how logic programs could have a dual meaning, both
procedural and declarative.
Colmerauer and Roussel used this idea of logic programs being both declarative
and procedural to devise Prolog in the summer and fall of 1972. The first large
Prolog program, which implemented a question and answering system in the French
language, was written in 1972 as well.
Later, the Prolog language interpreter was rewritten at Edinburgh to compile
programs into DEC-10 machine code. This led to an abstract intermediate form
that is now known as the Warren Abstract Machine or WAM. WAM is a low-level
intermediate representation that is well-suited for representing Prolog programs.
The WAM virtual machine can be (and has been) implemented on a wide variety
of hardware. This means that Prolog implementations exist for most computing
platforms.

Practice 1.5 Answer the following questions.

1. Who invented C++? C? Standard ML? Prolog? Python?
2. What do Standard ML and Prolog’s histories have in common?
3. What do Prolog and Python have in common?
4. What language or languages is Standard ML based on?
You can check your answer(s) in Section 1.7.5.

1.4 Language Implementation

There are three ways that languages can be implemented.

• A language can be interpreted.
• A language can be compiled to a machine language.
• A language can be implemented by some combination of the first two methods.

Computers are only capable of executing machine language. Machine language is
the language of the Central Processing Unit (CPU) and is very simple. For instance,
typical instructions are fetch this value into the CPU, store this value into memory
from the CPU, add these two values together, and compare these two values and if they
are equal, jump here next. The goal of any programming language implementation
is to translate a source program into this simpler machine language so it can be
executed by the CPU. The overall process is pictured in Fig. 1.11.

Fig. 1.11 Language implementation


All language implementations translate a source program to some intermediate
representation before translating the intermediate representation to machine
language. Exactly how these two translations are packaged varies significantly from
one programming language to the next, but luckily most language implementations
follow one of a few methodologies. The following sections will present some case
studies of different languages so you can see how this translation is accomplished
and packaged.

1.4.1 Compilation

The most direct method of translating a program to machine language is called
compilation. The process is shown in Fig. 1.12. A compiler is a program that internally
is composed of several parts. The parser reads a source program and translates it
into an intermediate form called an abstract syntax tree (AST). An AST is a tree-like
data structure that internally represents the source program. We’ll read about

Fig. 1.12 The compilation process



abstract syntax trees in later chapters. The code generator then traverses the AST
and produces another intermediate form called an assembly language program. This
program is not machine language, but it is much closer. Finally, an assembler and
linker translate an assembly language program to machine language making the
program ready to execute.
This whole process is encapsulated by a tool called a compiler. In most instances,
the assembler and linker are separate from the compiler, but normally the com-
piler runs the assembler and linker automatically when a program is compiled so
as programmers we tend to think of a compiler compiling our programs and don’t
necessarily think about the assembly and link phases.
Programming in a compiled language is a three-step process.

• First, you write a source program.
• Then you compile the source program, producing an executable program.
• Then you run the executable program.

When you are done, you have a source program and an executable program that
represent the same computation, one in the source language, the other in machine
language. If you make further changes to the source program, the source program and
the machine language program are not in sync. After making changes to the source
program you must remember to recompile before running the executable program
again.
Machine language is specific to a CPU architecture and operating system.
Compiling a source program on Linux means it will run on most Linux machines
with a similar CPU. However, you cannot take a Linux executable and put it on a
Microsoft Windows machine and expect it to run, even if the two computers have the
same CPU. The Linux and Windows operating systems each have their own format
for executable machine language programs. In addition, compiled programs use
operating system services for printing, reading input, and doing other Input/Output
(I/O) operations. These services are invoked differently between operating systems.
Languages like C++ hide these implementation details from you in the code gener-
ator, but the end result is that a program compiled for one operating system will not
work on another operating system without being recompiled.
C, C++, Pascal, Fortran, COBOL and many others are typically compiled lan-
guages. On the Linux operating system the C compiler is called gcc and the C++
compiler is called g++. The g in both names reflects the fact that both compilers
come out of the GNU project and the Free Software Foundation. Linux, gcc, and
g++ are freely available to anyone who wants to download them. The best way to
get these tools is to download a Linux distribution and install it on a computer. The
gcc and g++ compilers come standard with Linux.
There are implementations of C and C++ for many other platforms. The web site
http://gcc.gnu.org contains links to source code and to prebuilt binaries for the g++
compiler. You can also download C++ compilers from Apple and Microsoft. For
Mac OS X computers you can get C++ by downloading the XCode Developer Tools.
You can also install g++ and gcc for Mac OS X computers using a tool called brew.

If you run Microsoft Windows you can install Visual C++ Express from Microsoft.
It is free for educational use.

1.4.2 Interpretation

An interpreter is a program that is written in some other language and compiled
into machine language. The interpreter itself is the machine language program. The
interpreter is written to read source programs from the interpreted language and in-
terpret them. For instance, Python is an interpreted language. The Python interpreter
is written in C and is compiled for a particular platform like Linux, Mac OS X, or
Microsoft Windows. To run a Python program, you must download and install the
Python interpreter that goes with your operating system and CPU.
When you run an interpreted source program, as depicted in Fig. 1.13, you are
actually running the interpreter. Your program is not running because your program
is never translated to machine language. The interpreter is the machine language

Fig. 1.13 The interpretation process



program that executes all the programs you write in the interpreted language. The
source program you write controls the behavior of the interpreter program.
Programming in an interpreted language is a two step process.

• First, you write a source program.
• Then you execute the source program with the interpreter.

Each time your program is executed it is translated into an AST by a part of
the interpreter called the parser. There may be an additional step that translates the
AST to some lower-level representation, often called bytecode. In an interpreter, this
lower-level representation is still internal to the interpreter program. Then a part of
the interpreter, often called a virtual machine, executes the byte code instructions.
Not every interpreter translates the AST to bytecode. Sometimes the interpreter
directly interprets the AST but it is often convenient to translate the source program’s
AST to some simpler representation before executing it.
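CPython conveniently exposes each stage of this pipeline through its standard library, so you can watch a statement move from source text to AST to bytecode. The sketch below is illustrative only; the exact opcode names it prints vary between Python versions.

```python
import ast
import dis

source = "x = a + b * c"

# Stage 1: the parser turns source text into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree.body[0].value))  # a BinOp node representing a + b * c

# Stage 2: compile() lowers the AST to bytecode, the VM's representation.
code = compile(tree, "<example>", "exec")

# Stage 3: dis lists the bytecode instructions the virtual machine executes.
for instr in dis.Bytecode(code):
    print(instr.opname, instr.argrepr)
```

Running this shows LOAD, BINARY, and STORE style instructions, the same kind of simple operations described for machine language earlier, but targeting the Python virtual machine instead of a CPU.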
Eliminating the compile step has a few implications.

• Since you have one less step in development you may be encouraged to run your
code more frequently during development. This is generally a good thing and can
shorten the development cycle.
• Secondly, because you don’t have an executable version of your code, you don’t
have to manage the two versions. You only have a source code program to keep
track of.
• Finally, because the source code is not platform dependent, you can usually easily
move your program between platforms. The interpreter insulates your program
from platform dependencies.

Of course, source programs for compiled languages are generally platform
independent too. But, they must be recompiled to move the executable program
from one platform to another. The interpreter itself isn’t platform independent. There
must be a version of an interpreter for each platform/language combination. So there
is a Python interpreter for Linux, another for Microsoft Windows, and yet another
for Mac OS X. Thankfully, because the Python interpreter is written in C the same
Python interpreter program can be compiled (with some small differences) for each
platform.
There are many interpreted languages available including Python, Ruby, Standard
ML, Unix scripting languages like Bash and Csh, Prolog, and Lisp. The portability of
interpreted languages has made them very popular among programmers, especially
when writing code that needs to run across multiple platforms.
One huge problem that has driven research into interpreted languages is that
of heap memory management. Recall that the heap is the place where memory is
dynamically allocated. Large C and C++ programs are notorious for having memory
leaks. Every time a C++ programmer reserves some space on the heap he/she must
remember to free that space. If they don’t free the space when they are done with it the
space will never be available again while the program continues to execute. The heap

is a big space, but if a program runs long enough and continues to allocate and not
free space, eventually the heap will fill up and the program will terminate abnormally.
In addition, even if the program doesn’t terminate abnormally, the performance of
the system will degrade as more and more time is spent managing the large heap
space.
Many interpreted languages don’t require programmers to free space on the heap.
Instead, there is a special task or thread that runs periodically as part of the interpreter
to check the heap for space that can be freed. This task is called the garbage collector.
Programmers can allocate space on the heap but don’t have to be worried about
freeing that space. For a garbage collector to work correctly space on the heap has to
be allocated and accessed in the right way. Many interpreted languages are designed
to ensure that a garbage collector will work correctly.
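Python's garbage collector can be observed directly through the standard `gc` module. The sketch below builds a reference cycle, the kind of garbage that reference counting alone cannot reclaim, and then asks the collector to find it; this is a minimal illustration, not a full account of CPython's memory management.

```python
import gc

class Node:
    """A heap-allocated object that can refer to another Node."""
    def __init__(self):
        self.ref = None

# Build a two-object cycle: a refers to b and b refers back to a.
a, b = Node(), Node()
a.ref, b.ref = b, a

# Drop our own references. The cycle keeps both objects reachable
# from each other, so neither can be freed by reference counting alone.
del a, b

# The collector finds the unreachable cycle, frees it, and reports
# how many unreachable objects it found.
collected = gc.collect()
print("unreachable objects collected:", collected)
```

Notice that there is no explicit `free` call anywhere in the program; that is exactly the convenience, and the scheduling cost, described above.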
The disadvantage of an interpreted language is in speed of execution. Interpreted
programs typically run slower than compiled programs. In a compiled program,
parsing and code generation happen once when the program is compiled. When
running an interpreted program, parsing and code generation happen each time the
program is executed. In addition, if an application has real-time dependencies then
having the garbage collector running at more or less random intervals may not be
desirable. As you’ll read in the next section some steps have been taken to reduce
the difference in execution time between compiled and interpreted languages.

1.4.3 Virtual Machines

The advantages of interpretation over compilation are pretty significant. It turns out
that one of the biggest advantages is the portability of programs. It’s nice to know
when you invest the time in writing a program that it will run the same on Linux,
Microsoft Windows, Mac OS X, or some other operating system. This portability
issue has driven a lot of research into making interpreted programs run as fast as
compiled languages.
As discussed earlier in this chapter, the concept of a virtual machine has been
around quite a while. A virtual machine is a program that provides insulation from
the actual hardware and operating system of a machine while supplying a consistent
implementation of a set of low-level instructions, often called bytecode. Figure 1.14
shows how a virtual machine sits on top of the operating system/CPU to act as this
insulator.
There is no one specification for bytecode instructions. They are specific to the
virtual machine being defined. Python has a virtual machine buried within the inter-
preter. Prolog is another interpreter that uses a virtual machine as part of its imple-
mentation. Some languages, like Java have taken this idea a step further. Java has a
virtual machine that executes bytecode instructions as does Python. The creators of
Java separated the virtual machine from the compiler. Instead of storing the bytecode
instructions internally as in an interpreter, the Java compiler, called javac, compiles
a Java source code program to a bytecode file. This file is not machine language
so it cannot be executed directly on the hardware. It is a Java bytecode file which

Fig. 1.14 Virtual machine implementation

is interpreted by the Java virtual machine, called java in the Java set of tools. Java
bytecode files all end with a .class extension. You may have noticed these files at
some point after compiling a Java program.
Programs written using a hybrid language like Java are compiled. However, the
compiled bytecode program is interpreted. Source programs in the language are not
interpreted directly. By adding this intermediate step the interpreter can be smaller
and faster than traditional interpreters. Very little parsing needs to happen to read
the program and executing the program is straightforward because each bytecode
instruction usually has a simple implementation.
Languages that fall into this virtual machine category include Java, ML, Python,
C#, Visual Basic .NET, JScript, and other .NET platform languages. You might notice
that Standard ML and Python were included as examples of interpreted languages.
Both ML and Python include interactive interpreters as well as the ability to compile
and run low-level bytecode programs. Python bytecode files are named with a .pyc
extension. Standard ML compiled files are named with a -platform as the last part of
the compiled file name. In the case of Python and Standard ML the virtual machine

is not a separate program. Both interpreters are written to recognize a bytecode file
and execute it just like a source program.
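You can reproduce this compile step by hand with the standard `py_compile` module. The file names below are made up for the example; the point is just that the result is a `.pyc` bytecode file rather than machine language.

```python
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway Python source program.
    src = pathlib.Path(tmp) / "hello.py"
    src.write_text("print('hello')\n")

    # Compile the source file to a bytecode (.pyc) file.
    pyc_path = py_compile.compile(str(src), cfile=str(src) + "c")

    # The .pyc file holds bytecode for the Python virtual machine,
    # not machine language for the CPU.
    data = pathlib.Path(pyc_path).read_bytes()
    print(pyc_path, "->", len(data), "bytes of bytecode")
```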
Java and the .NET programming environments do not include interactive inter-
preters. The only way to execute programs with these platforms is to compile the
program and then run the compiled program using the virtual machine. Programs
written for the .NET platform run under Microsoft Windows and in some cases
Linux. Microsoft submitted some of the .NET specifications to the ISO to allow
third party software companies to develop support for .NET on other platforms. In
theory all .NET programs are portable like Java, but so far implementations of the
.NET framework are not as generally available as Java. The Java platform has been
implemented and released on all major platforms. In fact, in November 2006 Sun, the
company that created Java, announced they were releasing the Java Virtual Machine
and related software under the GNU Public License to encourage further development
of the language and related tools. Since then the rights to Java have been
purchased by Oracle, where it continues to be supported.
Java and .NET language implementations maintain backwards compatibility of
their virtual machines. This means that a program compiled for an earlier version of
Java or .NET will continue to run on newer implementations of the language’s virtual
machine. In contrast, Python’s virtual machine is regarded as an internal design issue
and does not maintain backwards compatibility. A .pyc file compiled for one version
of Python will not run on a newer version of Python. This distinction makes Python
more of an interpreted language, while Java and .NET languages are truly virtual
machine implementations.
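CPython enforces this version dependence with a magic number: every `.pyc` file begins with a few bytes identifying the bytecode format, and the interpreter rejects files whose magic number does not match its own. The standard library exposes the current value:

```python
import importlib.util

# The magic number stamped at the start of every .pyc file this
# interpreter produces. A different Python release uses different bytes,
# which is why .pyc files are not portable across Python versions.
magic = importlib.util.MAGIC_NUMBER
print("bytecode magic number:", magic)
```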
Maintaining backwards compatibility of the virtual machine means that programmers
can distribute applications for Java and .NET implementations without releasing
their source code. .NET and Java applications can be distributed while maintain-
ing privacy of the source code. Since intellectual property is an important asset of
companies, the ability to distribute programs in binary form is important. The development
of virtual machines made memory management and program portability
much easier in languages like Java, Standard ML, and the various .NET languages
while also providing a means for programmers to distribute programs in binary for-
mat so source code could be kept private.

1.5 Chapter Summary

The history of languages is fascinating and a lot more detail is available than was
covered in this chapter. There are many great resources on the web where you
can get more information. Use Google or Wikipedia and search for “History of
your_favorite_language” as a place to begin. However, be careful. You can’t believe
everything you read on the web and that includes Wikipedia. While the web is a great
source, you should always research your topic enough to independently verify the
information you find there.

While learning new languages and studying programming language implementation
it becomes important to understand models of computation. A compiler translates
a high-level programming language into a lower level computation. These low-level
computations are usually expressed in terms of machine language but not always.
More important than the actual low-level language is the model of computation.
Some models are based on register machines. Some models are based on stack
machines. Still other models may be based on something entirely different. Chapters
3 and 4 explore stack-based virtual machines in much more detail.
The next chapter provides the foundations for understanding how the syntax of
a language is formally defined by a grammar. Then Chap. 3 introduces a Python
Virtual Machine implementation called CoCo. CoCo is an interpreter of Python
bytecode instructions. Chapter 3 introduces assembly language programming using
CoCo, providing some insight into how programming languages are implemented.
Subsequent chapters in the book will again look at language implementation to
better understand the languages you are learning, their strengths and weaknesses.
While learning these languages you will also be implementing a compiler for a high
level functional language called Small which is a robust subset of Standard ML. This
will give you even more insight into language implementation and knowledge of
how to use these languages to solve problems.
Finally, in the last two chapters of this text, you will learn about type checking
and type inference using Prolog, a language that is well-suited to logic problems like
type inference. Learning how to use Prolog and implement a type checker is a great
way to cap off a text on programming languages and language implementation.
A great way to summarize the rest of this text is to see it moving from very
prescriptive approaches to programming to very descriptive approaches to program-
ming. The word prescriptive means that you dwell on details, thinking very carefully
about the details of what you are writing. For instance, in a prescriptive approach
you might ask yourself, how do you set things up to invoke a particular type of
instruction?
Descriptive programming relies on your describing relationships between things.
Prolog is a descriptive programming language. In fact, this entire text moves from
very prescriptive programming to increasingly descriptive approaches. Read on to
begin that journey!

1.6 Review Questions


1. What are the three ways of thinking about programming, often called program-
ming paradigms?
2. Name at least one language for each of the three methods of programming
described in the previous question.
3. Name one person who had a great deal to do with the development of the
imperative programming model. Name another who contributed to the functional
model. Finally, name a person who was responsible for the development of the
logic model of programming.

4. What are the primary characteristics of each of the imperative, functional, and
logic models?
5. Who are the main people involved in each of the four languages this text covers:
C++, Python, Standard ML, and Prolog?
6. Where are the people you mentioned in the previous question today? What do
they do now?
7. Why is compiling a program preferred over interpreting a program?
8. Why is interpreting a program preferred over compiling a program?
9. What benefits do virtual machine languages have over interpreted languages?
10. What is a bytecode program? Name two languages that use bytecode in their
implementation.

1.7 Solutions to Practice Problems

These are solutions to the practice problems. You should only consult these answers
after you have tried each of them for yourself first. Practice problems are meant to
help reinforce the material you have just read so make use of them.

1.7.1 Solution to Practice Problem 1.1

1. The origins of the three models are the Turing Machine, the λ-calculus, and
propositional and predicate logic.
2. Alan Turing was a Ph.D. student of Alonzo Church. Alan Turing developed the
Turing Machine and Alonzo Church developed the λ-calculus, each to prove
there were some things that are not computable. They later proved the two
approaches were equivalent in their power to express computation.
3. Both von Neumann and Turing contributed to the idea of a stored-program com-
puter.
4. Backus developed BNF notation which was used in the development of Algol 60.
5. 1936 was a big year for Computer Science.
6. So was 1946. That was the year ENIAC was unveiled. Eckert and Mauchly
designed and built ENIAC.
7. The problems in Mathematics were growing complex enough that many
mathematicians were developing models and languages for expressing their
algorithms. This was one of the driving factors in the development of computers
and Computer Science as a discipline.

1.7.2 Solution to Practice Problem 1.2

1. The run-time stack, global memory, and the heap are the three divisions of data
memory.

2. Data on the heap is created at run-time.
3. An activation record holds information like local variables, the program counter,
the stack pointer, and other state information necessary for a function invocation.
4. An activation record is created each time a function is called.
5. An activation record is deleted when a function returns.
6. The primary goal of imperative, object-oriented programming is to update mem-
ory by updating variables and/or objects as the program executes. The primary
operation is memory updates.

1.7.3 Solution to Practice Problem 1.3

1. Functional languages include Standard ML, Lisp, Haskell, and Scheme.
2. In the imperative model the primary operation revolves around updating memory
(the assignment statement). In the functional model the primary operation is
function application.
3. The functional model emphasizes immutable data. However, some imperative
languages have some immutable data as well. For instance, Java strings are im-
mutable.

1.7.4 Solution to Practice Problem 1.4

1. You never write a program in Prolog. You write a database of rules in Prolog that
tell the single Prolog program (depth first search) how to proceed.
2. The programmer provides a database of facts and predicates that tell Prolog
about a problem. In Prolog the programmer describes the problem instead of
programming the solution.

1.7.5 Solution to Practice Problem 1.5

1. C++ was invented by Bjarne Stroustrup. C was created by Dennis Ritchie. Standard
ML was primarily designed by Robin Milner. Prolog was designed by Alain
Colmerauer and Philippe Roussel with the assistance of Robert Kowalski. Python
was created by Guido van Rossum.
2. Standard ML and Prolog were both designed as languages for automated theorem
proving first. Then they became general purpose programming languages later.
3. Both Python and Prolog run on virtual machine implementations. Python’s virtual
machine is internal to the interpreter. Prolog’s virtual machine is called WAM
(Warren Abstract Machine).
4. Standard ML is influenced by Lisp, Pascal, and Algol.

2 Syntax

Once you’ve learned to program in one language, learning a similar programming
language isn’t all that hard. But, understanding just how to write in the new language
takes looking at examples or reading documentation to learn its details. In other
words, you need to know the mechanics of putting a program together in the new
language. Are the semicolons in the right places? Do you use begin … end or do
you use curly braces (i.e., { and })? Learning how a program is put together is called
learning the syntax of the language. Syntax refers to the words and symbols of a
language and how to write the symbols down in some meaningful order.
Semantics is the word that is used when deriving meaning from what is written.
The semantics of a program refers to what the program will do when it is executed.
Informally it is much easier to say what a program does than to describe the syntactic
structure of the program. However, syntax is a lot easier to formally describe than
semantics. In either case, if you are learning a new language, you need to learn
something about both the syntax and semantics of the language.

2.1 Terminology

Once again, the syntax of a programming language determines the well-formed or
grammatically correct programs of the language. Semantics describes how or whether
such programs will execute.

• Syntax is how things look
• Semantics is how things work

Many questions we might like to ask about a program either relate to the syntax
of the language or to its semantics. It is not always clear which questions pertain to
syntax and which pertain to semantics. Some questions may concern semantic issues
that can be determined statically, meaning before the program is run. Other semantic
issues may be dynamic issues, meaning they can only be determined at run-time.

© Springer International Publishing Switzerland 2014
K.D. Lee, Foundations of Programming Languages,
Undergraduate Topics in Computer Science, DOI 10.1007/978-3-319-13314-0_2

The difference between static semantic issues and syntactic issues is sometimes a
difficult distinction to make.
The code
a=b+c;

is correct syntax in many languages. But is it a correct C++ statement?

1. Do b and c have values?
2. Have b and c been declared as a type that allows the + operation? Or, do the
values of b and c support the + operation?
3. Is a assignment compatible with the result of the expression b+c?
4. Does the assignment statement have the proper form?

There are lots of questions that need to be answered about this assignment state-
ment. Some questions could be answered sooner than others. When a C++ program
is compiled it is translated from C++ to machine language as described in the pre-
vious chapter. Questions 2 and 3 are issues that can be answered when the C++
program is compiled. However, the answer to the first question might not be known
until the C++ program executes in some cases. The answers to questions 2 and 3
can be answered at compile-time and are called static semantic issues. The answer
to question 1 is a dynamic issue and is probably not determinable until run-time. In
some circumstances, the answer to question 1 might also be a static semantic issue.
Question 4 is definitely a syntactic issue.
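Python, where questions 2 and 3 are also dynamic issues, makes the split between question 4 and question 1 easy to demonstrate: the built-in `compile` function checks only syntax, while the missing values for b and c are not discovered until the code runs. A small sketch:

```python
# Question 4 (does the statement have the proper form?) is settled
# when the code is parsed: compile() accepts it without complaint.
code = compile("a = b + c", "<example>", "exec")  # no error: valid syntax

# Question 1 (do b and c have values?) only surfaces at run-time.
err = None
try:
    exec(code, {})  # run with an empty namespace, so b is undefined
except NameError as e:
    err = e
print("dynamic error:", err)
```

A statically typed language like C++ would also settle questions 2 and 3 at compile time; Python defers them to run-time along with question 1.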
Unlike the dynamic semantic issues, the correct syntax of a program is statically
determinable. Said another way, determining a syntactically valid program can be
accomplished without running the program. The syntax of a programming language
is specified by a grammar. But before discussing grammars, the parts of a grammar
must be defined. A terminal or token is a symbol in the language.

• C++ and Python terminals: while, for, (, ;, 5, b


• Terminal types are keywords, operators, numbers, identifiers, etc.

A syntactic category or nonterminal is a set of phrases, or strings of tokens,
that will be defined in terms of symbols in the language (terminal and nonterminal
symbols).

• C++ nonterminals: statement, expression, if-statement, etc.
• Syntactic categories define parts of a program like statements, expressions, decla-
rations, and so on.

A metalanguage is a higher-level language used to specify, discuss, describe, or
analyze another language. English is used as a metalanguage for describing pro-
gramming languages, but because of the ambiguities in English, more formal meta-
languages have been developed. The next section describes a formal metalanguage
for describing programming language syntax.

2.2 Backus Naur Form (BNF)

Backus Naur Form (i.e., BNF) is a formal metalanguage for describing language
syntax. The word formal is used to indicate that BNF is unambiguous. Unlike English,
the BNF language is not open to our own interpretations. There is only one way to
read a BNF description.
BNF was used by John Backus to describe the syntax of Algol in 1963. In 1960,
John Backus and Peter Naur, a computer magazine writer, had just attended a confer-
ence on Algol. As they returned from the trip it became apparent that they had very
different views of what Algol would look like. As a result of this discussion, John
Backus worked on a method for describing the grammar of a language. Peter Naur
slightly modified it. The notation is called BNF, or Backus Naur Form or sometimes
Backus Normal Form. BNF consists of a set of rules that have this form:

<syntactic category> ::= a string of terminals and nonterminals

The symbol ::= can be read as is composed of and means the syntactic category
is the set of all items that correspond to the right hand side of the rule.
Multiple rules defining the same syntactic category may be abbreviated using the |
character which can be read as “or” and means set union. That is the entire language.
It’s not a very big metalanguage, but it is powerful.

2.2.1 BNF Examples

Here are a couple BNF examples from Java.


<primitive-type> ::= boolean
<primitive-type> ::= char

BNF syntax is often abbreviated when there are multiple similar rules like these
primitive type rules. Whether abbreviated or not, the meaning is the same.
<primitive-type> ::= boolean | char | byte | short | int | long | float | ...
<argument-list> ::= <expression> | <argument-list> , <expression>
<selection-statement> ::=
if ( <expression> ) <statement> |
if ( <expression> ) <statement> else <statement> |
switch ( <expression> ) <block>
<method-declaration> ::=
<modifiers> <type-specifier> <method declarator> <throws-clause> <method-body> |
<modifiers> <type-specifier> <method-declarator> <method-body> |
<type-specifier> <method-declarator> <throws-clause> <method-body> |
<type-specifier> <method-declarator> <method-body>

This description can be described in English: The set of method declarations is
the union of the sets of method declarations that explicitly throw an exception with
those that don’t explicitly throw an exception with or without modifiers attached to
their definitions. The BNF is much easier to understand and is not ambiguous like
this English description.

2.2.2 Extended BNF (EBNF)

Since a BNF description of the syntax of a programming language relies heavily on
recursion to provide lists of items, many definitions use these extensions:

1. item? or [item] means the item is optional.
2. item* or {item} means zero or more occurrences of an item are allowable.
3. item+ means one or more occurrences of an item are allowable.
4. Parentheses may be used for grouping.

2.3 Context-Free Grammars

A BNF is a way of describing the grammar of a language. Most interesting grammars
are context-free, meaning that the contents of any syntactic category in a sentence are
not dependent on the context in which it is used. A context-free grammar is defined
as a four tuple:
G = (N , T , P , S )
where

• N is a set of symbols called nonterminals or syntactic categories.
• T is a set of symbols called terminals or tokens.
• P is a set of productions of the form n → α where n ∈ N and α ∈ {N ∪ T }∗ .
• S ∈ N is a special nonterminal called the start symbol of the grammar.

Informally, a context-free grammar is a set of nonterminals and terminals. For
each nonterminal there are one or more productions with strings of zero or more
nonterminals and terminals on the right hand side as described in the BNF description.
There is one special nonterminal called the start symbol of the grammar.
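The four-tuple definition translates directly into data. Below is one way (an ad hoc encoding chosen only for illustration, not standard notation) to write a tiny grammar G = (N, T, P, S) for prefix additions as plain Python values, together with a check that the four parts fit the definition:

```python
# A toy grammar: E -> + E E | number
N = {"E"}                       # nonterminals
T = {"+", "number"}             # terminals
P = {                           # productions, keyed by left-hand side
    "E": [["+", "E", "E"], ["number"]],
}
S = "E"                         # start symbol

# Check the conditions from the definition of a context-free grammar.
assert S in N
for lhs, bodies in P.items():
    assert lhs in N             # each production rewrites a nonterminal
    for body in bodies:
        # each right-hand side is a string over N union T
        assert all(sym in N or sym in T for sym in body)
print("G is a well-formed context-free grammar")
```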

2.3.1 The Infix Expression Grammar

A context-free grammar for infix expressions can be specified as G = (N, T, P, E)
where

N = {E, T , F}
T = {identifier, number, +, −, ∗, /, (, )}
P is defined by the set of productions
E→E + T |E − T |T
T →T ∗ F |T /F |F
F → ( E ) | identifier | number
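This grammar maps almost mechanically onto a program, with one caveat: left-recursive productions such as E → E + T would loop forever if coded as directly recursive functions, so the sketch below replaces each left-recursive rule with an equivalent loop. It is a hedged illustration that evaluates expressions as it parses them, handling numbers and parentheses only and leaving identifiers out for brevity:

```python
import re

def tokenize(s):
    # The terminal symbols: numbers, operators, and parentheses.
    return re.findall(r"\d+|[-+*/()]", s)

def parse_E(toks):  # E -> T { (+ | -) T }   (left recursion as a loop)
    value = parse_T(toks)
    while toks and toks[0] in "+-":
        op = toks.pop(0)
        rhs = parse_T(toks)
        value = value + rhs if op == "+" else value - rhs
    return value

def parse_T(toks):  # T -> F { (* | /) F }
    value = parse_F(toks)
    while toks and toks[0] in "*/":
        op = toks.pop(0)
        rhs = parse_F(toks)
        value = value * rhs if op == "*" else value / rhs
    return value

def parse_F(toks):  # F -> ( E ) | number
    tok = toks.pop(0)
    if tok == "(":
        value = parse_E(toks)
        toks.pop(0)  # consume the closing ")"
        return value
    return int(tok)

print(parse_E(tokenize("4 + (10 - 2) * 3")))  # 28
```

Notice how the three functions mirror the three nonterminals, and how the two-level E/T split gives * and / higher precedence than + and - without any explicit precedence table.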

2.4 Derivations

A sentence of a grammar is a string of tokens from the grammar. A sentence belongs
to the language of a grammar if it can be derived from the grammar. This process
is called constructing a derivation. A derivation is a sequence of sentential forms
that starts with the start symbol of the grammar and ends with the sentence you are
trying to derive. A sentential form is a string of terminals and nonterminals from
the grammar. In each step in the derivation, one nonterminal of a sentential form,
call it A, is replaced by a string of terminals and nonterminals, β, where A → β
is a production in the grammar. For a grammar, G, the language of G is the set of
sentences that can be derived from G and is usually written as L(G).

2.4.1 A Derivation

Here we prove that the expression (5 ∗ x) + y is a member of the language defined
by the grammar given in Sect. 2.3.1 by constructing a derivation for it. The derivation
begins with the start symbol of the grammar and ends with the sentence.
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ (E) + T ⇒ (T ) + T ⇒ (T ∗ F) + T
⇒ (F ∗ F) + T ⇒ (5 ∗ F) + T ⇒ (5 ∗ x) + T ⇒ (5 ∗ x) + F ⇒ (5 ∗ x) + y
Each step is a sentential form. The underlined nonterminal in each sentential
form is replaced by the right hand side of a production for that nonterminal. The
derivation proceeds from the start symbol, E, to the sentence (5 ∗ x) + y. This proves
that (5 ∗ x) + y is in the language L(G) as G is defined in Sect. 2.3.1.

Practice 2.1 Construct a derivation for the infix expression 4 + (a − b) ∗ x.


You can check your answer(s) in Section 2.17.1.

2.4.2 Types of Derivations

A sentence of a grammar is valid if there exists at least one derivation for it using
the grammar. There are typically many different derivations for a particular sentence
of a grammar. However, there are two derivations that are of some interest to us in
understanding programming languages.

• Left-most derivation—Always replace the left-most nonterminal when going from


one sentential form to the next in a derivation.
• Right-most derivation—Always replace the right-most nonterminal when going
from one sentential form to the next in a derivation.

The derivation of the sentence (5 ∗ x) + y in Sect. 2.4.1 is a left-most derivation.


A right-most derivation for the same sentence is:
E ⇒ E + T ⇒ E + F ⇒ E + y ⇒ T + y ⇒ F + y ⇒ (E) + y ⇒ (T ) + y
⇒ (T ∗ F) + y ⇒ (T ∗ x) + y ⇒ (F ∗ x) + y ⇒ (5 ∗ x) + y

Practice 2.2 Construct a right-most derivation for the expression x ∗ y + z.


You can check your answer(s) in Section 2.17.2.

2.4.3 Prefix Expressions


Infix expressions are expressions where the operator appears between the operands.
Another type of expression is called a prefix expression. In prefix expressions the
operator appears before the operands. The infix expression 4 + (a − b) ∗ x would
be written +4 ∗ −abx as a prefix expression. Prefix expressions are in some sense
simpler than infix expressions because we don’t have to worry about the precedence
of operators. The precedence is instead determined by the order in which the operators
appear in the expression. Because of this, parentheses are not needed in prefix
expressions.
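Because operator order alone determines the order of operations, a prefix expression can be evaluated in a single recursive pass with no parenthesis handling at all. The following sketch illustrates this; it is our own code, not the book's, and uses numbers in place of identifiers:

```python
# eval_prefix is a hypothetical helper: it consumes tokens from the front
# of a list and returns the value of the expression they form.
def eval_prefix(tokens):
    token = tokens.pop(0)
    if token in ("+", "-", "*", "/"):
        left = eval_prefix(tokens)    # the first operand follows the operator
        right = eval_prefix(tokens)   # then the second operand
        if token == "+": return left + right
        if token == "-": return left - right
        if token == "*": return left * right
        return left / right
    return float(token)               # identifiers would be looked up here

# The infix expression 4 + (10 - 2) * 3 is + 4 * - 10 2 3 in prefix form.
print(eval_prefix("+ 4 * - 10 2 3".split()))   # 28.0
```

Each operator consumes exactly the next two complete subexpressions, which is why no grouping symbols are ever needed.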

2.4.4 The Prefix Expression Grammar


A context-free grammar for prefix expressions can be specified as G = (N, T, P, E)
where

N = {E}
T = {identifier, number, +, −, ∗, /}
P is defined by the set of productions
E → + E E | − E E | ∗ E E | / E E | identifier | number

Practice 2.3 Construct a left-most derivation for the prefix expression +4 ∗


−abx.
You can check your answer(s) in Section 2.17.3.

2.5 Parse Trees

A grammar, G, can be used to build a tree representing a sentence of L(G), the


language of the grammar G. This kind of tree is called a parse tree. A parse tree is
another way of representing a sentence of a given language. A parse tree is constructed
with the start symbol of the grammar at the root of the tree. The children of each

Fig. 2.1 A parse tree

node in the tree must appear on the right hand side of a production with the parent
on the left hand side of the same production. A program is syntactically valid if there
is a parse tree for it using the given grammar.
While there are typically many different derivations of a sentence in a language,
there is only one parse tree. This is true as long as the grammar is not ambiguous.
In fact that’s the definition of ambiguity in a grammar. A grammar is ambiguous if
and only if there is a sentence in the language of the grammar that has more than one
parse tree.
The parse tree for the sentence derived in Sect. 2.4.1 is depicted in Fig. 2.1. Notice
the similarities between the derivation and the parse tree.

Practice 2.4 What does the parse tree look like for the right-most derivation
of (5 ∗ x) + y?
You can check your answer(s) in Section 2.17.4.

Practice 2.5 Construct a parse tree for the infix expression 4 + (a − b) ∗ x.


HINT: What has higher precedence, “+” or “∗”? The given grammar auto-
matically makes “∗” have higher precedence. Try it the other way and see
why!
You can check your answer(s) in Section 2.17.5.

Practice 2.6 Construct a parse tree for the prefix expression +4 ∗ −abx.
You can check your answer(s) in Section 2.17.6.

Fig. 2.2 An AST

2.6 Abstract Syntax Trees

There is a lot of information in a parse tree that isn’t really needed to capture the
meaning of the program that it represents. An abstract syntax tree is like a parse tree
except that non-essential information is removed. More specifically,

• Nonterminal nodes in the tree are replaced by nodes that reflect the part of the
sentence they represent.
• Unit productions in the tree are collapsed.

For example, the parse tree from Fig. 2.1 can be represented by the abstract syntax
tree in Fig. 2.2. The abstract syntax tree eliminates all the unnecessary information
and leaves just what is essential for evaluating the expression. Abstract syntax trees,
often abbreviated ASTs, are used by compilers while generating code and may be
used by interpreters when running your program. Abstract syntax trees throw away
superfluous information and retain only what is essential to allow a compiler to
generate code or an interpreter to execute the program.

Practice 2.7 Construct an abstract syntax tree for the expression 4 +


(a − b) ∗ x.
You can check your answer(s) in Section 2.17.7.

2.7 Lexical Analysis


The syntax of modern programming languages is defined via grammars. A grammar,
because it is a well-defined mathematical structure, can be used to construct a program
called a parser. A language implementation, like a compiler or an interpreter, has
a parser that reads the program from the source file. The parser reads the tokens,
or terminals, of a program and uses the language’s grammar to check to see if the
stream of tokens form a syntactically valid program.
For a parser to do its job, it must be able to get the stream of tokens from the
source file. Forming tokens from the individual characters of a source file is the job
of another program often called a tokenizer, or scanner, or lexer. The name comes
from lexis, the Greek word for word. The words of a program are its tokens. In programming language
implementations a little liberty is taken with the definition of word. A word is any
terminal or token of a language. It turns out that the tokens of a language can be
described by another language called the language of regular expressions.

2.7.1 The Language of Regular Expressions

The language of regular expressions is defined by a context-free grammar. The
context-free grammar for regular expressions is RE = (N, T, P, E) where

N = {E, T , K, F}
T = {character, ∗, +, · , (, )}
P is defined by the set of productions
E →E+T |T
T →T · K |K
K → F∗ | F
F → character | ( E )

The + operator is the choice operator, meaning either E or T, but not both. The
dot operator means that T is followed by K. The ∗ operator, called Kleene star
after Stephen Kleene, the mathematician who first defined it, means zero or more
occurrences of F.
The grammar defines the precedence of these operators. Kleene star has the highest
precedence followed by the dot operator, followed by the choice operator. At its most
primitive level, a regular expression may be just a single character.
Frequently, a choice between many different characters may be abbreviated with
some sensible name. For instance, letter may be used to abbreviate A + B + · · · +
Z + a + b + · · · + z and digit may abbreviate 0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9.
Usually these abbreviations are specified explicitly before the regular expression is
given.
The tokens of the infix grammar are identifier, number, +, −, ∗, /, (, and ). For
brevity’s sake, assume that letter and digit have the usual definitions. We’ll also
put each operator character in single quotes so as not to confuse them with the
metalanguage. Then, these tokens might be defined by the regular expression
letter.letter* + digit.digit* + '+' + '−' + '*' + '/' + '(' + ')'

From this regular expression specification a couple of things come to light. Iden-
tifiers must be at least one character long, but can be as long as we wish them to be.
Numbers are only non-negative integers in the infix expression language. Floating
point numbers cannot be specified in the language as the tokens are currently defined.
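For illustration, the token specification above translates almost directly into Python's re module. The group names identifier, number, and operator are our own labels, not part of the book's notation:

```python
import re

# Each alternative in the book's regular expression becomes a named group.
TOKEN_RE = re.compile(r"""
    (?P<identifier>[A-Za-z][A-Za-z]*)   # letter.letter*
  | (?P<number>[0-9][0-9]*)             # digit.digit*
  | (?P<operator>[+\-*/()])             # '+', '-', '*', '/', '(', ')'
  | (?P<skip>\s+)                       # whitespace separates tokens
""", re.VERBOSE)

def tokenize(text):
    """Return (classification, lexeme) pairs, skipping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("(5*x)+y"))
# [('operator', '('), ('number', '5'), ('operator', '*'), ...]
```

Each match reports which alternative matched via lastgroup, which is exactly the token classification a lexer must produce.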

Practice 2.8 Define a regular expression so that negative and non-negative


integers can both be specified as tokens of the infix expression language.
You can check your answer(s) in Section 2.17.8.

2.7.2 Finite State Machines

A finite state machine is a mathematical model that accepts or rejects strings of


characters for some regular expression. A finite state machine is often called a finite
state automaton. The word automaton is just another word for machine. Every regular
expression has at least one finite state machine and vice versa, every finite state
machine has at least one matching regular expression. In fact, there is an algorithm
that given any regular expression can be used to construct a finite state machine for it.
Formally, a finite state automaton is defined as follows.

M = (Σ, S, F, s0, δ) where Σ (pronounced sigma) is the input alphabet (the charac-
ters understood by the machine), S is a set of states, F is the set of final or
accepting states, a subset of S usually written as F ⊆ S, s0 is a special state
called the start state, and δ (pronounced delta) is a function that takes as
input an alphabet symbol and a state and returns a new state. This is usually
written as δ : Σ × S → S.

A finite state machine has a current state which initially is the start state. The
machine starts in the start state and reads characters one at a time. As characters are
read, the finite state machine changes state. Each state has transitions to other states
based on the last character read. Each time the machine transitions to a new state,
another character is read from the stream of characters.
After reading all the characters of a token, if the current state is in the set of
final states, F, then the token is accepted by the finite state machine. Otherwise, it is
rejected. Finite state machines are typically represented graphically by drawing the
states, transitions, start state, and final states. States in a graphical representation are
depicted as nodes in a graph. The start state has an arrow going into it with nothing at
the back side of the arrow. The transitions are represented as arrows going from one
state to another and are labelled with the characters that trigger the given transition.
Finally, final or accepting states are denoted with a double circle.
Figure 2.3 depicts a finite state machine for the language of infix expression tokens.
The start state is 1. Each of states 2 through 9 are accepting states, denoted with a
double circle. State 2 accepts identifier tokens. State 3 accepts number tokens. States
4 to 9 accept operators and the parenthesis tokens. The finite state machine accepts
one token at a time. For each new token, the finite state machine starts over in state 1.
If, while reading a token, an unexpected character is read, then the stream of tokens
is rejected by the finite state machine as invalid. Only valid strings of characters are
accepted as tokens. Characters like spaces, tabs, and newline characters are not
recognized by the finite state machine. The finite state machine only responds with
yes the string of tokens is in the language accepted by the machine or no it is not.
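The identifier and number portion of such a machine can be sketched directly in code, with δ written as a dictionary from (state, character class) pairs to states. The state numbers follow Fig. 2.3's start state 1 and accepting states 2 and 3, but the code itself is our own:

```python
# Classify a character the way the machine's transition labels do.
def char_class(c):
    if c.isalpha(): return "letter"
    if c.isdigit(): return "digit"
    return c  # operators and parentheses are their own class

# delta: (state, character class) -> next state. Missing entries mean
# there is no transition, so the input is rejected.
delta = {
    (1, "letter"): 2, (2, "letter"): 2,   # identifier: letter letter*
    (1, "digit"): 3,  (3, "digit"): 3,    # number: digit digit*
}
accepting = {2, 3}

def accepts(token):
    state = 1                             # start in the start state
    for c in token:
        state = delta.get((state, char_class(c)))
        if state is None:
            return False                  # unexpected character: reject
    return state in accepting

print(accepts("abc"), accepts("4a"))      # True False
```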


Fig. 2.3 A finite state machine

2.7.3 Lexer Generators

It is relatively easy to construct a lexer by writing a regular expression, drawing a


finite state machine, and then writing a program that mimics the finite state machine.
However, this process is largely the same for all programming languages so there
are tools that have been written to do this for us. Typically these tools are called
lexer generators. To use a lexer generator you must write regular expressions for the
tokens of the language and provide these to the lexer generator.
A lexer generator will generate a lexer program that internally uses a finite state
machine like the one pictured in Fig. 2.3, but instead of reporting yes or no, for each
token the lexer will return the string of characters, called the lexeme or word of
the token, along with a classification of the token. So, identifiers are categorized as
identifier tokens while ‘+’ is categorized as an add token.
The lex tool is an example of a lexer generator for the C language. If you are
writing an interpreter or compiler using C as the implementation language, then
you would use lex or a similar tool to generate your lexer. lex was a tool included
with the original Unix operating system. The Linux alternative is called flex. Java,
Python, Standard ML, and most other programming languages have equivalent lexer
generators available.

Fig. 2.4 Parser data flow

2.8 Parsing

Parsing is the process of detecting whether a given string of tokens is a valid


sentence of a grammar. Every time you compile a program or run a program in
an interpreter the program is first parsed using a parser. When a parser isn’t able to
parse a program the programmer is told there is a syntax error in the program. A
parser is a program that given a sentence, checks to see if the sentence is a member
of the language of the given grammar. A parser usually does more than just answer
yes or no. A parser frequently builds an abstract syntax tree representation of the
source program. There are two types of parsers that are commonly constructed.

• A top-down parser starts with the root of the parse tree.


• A bottom-up parser starts with the leaves of the parse tree.

Top-down and bottom-up parsers check to see if a sentence belongs to a grammar
by constructing a derivation for the sentence, using the grammar. A parser either
reports success (and possibly returns an abstract syntax tree) or reports failure (hope-
fully with a nice error message). The flow of data is pictured in Fig. 2.4.

2.9 Top-Down Parsers

Top-down parsers are generally written by hand. They are sometimes called recursive
descent parsers because they can be written as a set of mutually recursive functions.
A top-down parser performs a left-most derivation of the sentence (i.e. source
program).
A top-down parser operates by (possibly) looking at the next token in the source
file and deciding what to do based on the token and where it is in the derivation.
To operate correctly, a top-down parser must be designed using a special kind of
grammar called an LL(1) grammar. An LL(1) grammar is simply a grammar where
the next choice in a left-most derivation can be deterministically chosen based on the
current sentential form and the next token in the input. The first L refers to scanning

the input from left to right. The second L signifies that while performing a left-most
derivation, there is only 1 symbol of lookahead that is needed to make the decision
about which production to choose next in the derivation.

2.9.1 An LL(1) Grammar

The grammar for prefix expressions is LL(1). Examine the prefix expression grammar
G = (N, T, P, E) where

N = {E}
T = {identifier, number, +, −, ∗, /}
P is defined by the set of productions
E → + E E | − E E | ∗ E E | / E E | identifier | number

While constructing any derivation for a sentence of this language, the next pro-
duction chosen in a left-most derivation is going to be obvious because the next token
of the source file must match the first terminal in the chosen production.
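This is what makes the grammar easy to parse by hand. As a sketch (our own code, not the book's), a single recursive function suffices, and the one token of lookahead, tokens[pos], always determines which production to use:

```python
def parse_E(tokens, pos=0):
    """Parse one E starting at pos; return (AST, next position)."""
    token = tokens[pos]
    if token in ("+", "-", "*", "/"):         # E -> op E E
        left, pos = parse_E(tokens, pos + 1)
        right, pos = parse_E(tokens, pos)
        return (token, left, right), pos
    return token, pos + 1                     # E -> identifier | number

ast, end = parse_E("+ 4 * - a b x".split())
print(ast)   # ('+', '4', ('*', ('-', 'a', 'b'), 'x'))
```

Note that the function never backtracks: the single next token deterministically selects the production, which is precisely the LL(1) property.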

2.9.2 A Non-LL(1) Grammar

Some grammars are not LL(1). The grammar for infix expressions is not LL(1).
Examine the infix expression grammar G = (N, T, P, E) where

N = {E, T , F}
T = {identifier, number, +, −, ∗, /, (, )}
P is defined by the set of productions
E→E + T |E − T |T
T →T ∗ F |T /F |F
F → ( E ) | identifier | number

Consider the infix expression 5 ∗ 4. A left-most derivation of this expression


would be
E ⇒ T ⇒ T ∗ F ⇒ F ∗ F ⇒ 5 ∗ F ⇒ 5 ∗ 4
Consider looking at only the 5 in the expression. We have to choose whether to use
the production E → E + T or E → T . We are only allowed to look at the 5 (i.e.
we can’t look beyond the 5 to see the multiplication operator). Which production do
we choose? We can’t decide based on the 5. Therefore the grammar is not LL(1).
Just because this infix expression grammar is not LL(1) does not mean that infix
expressions cannot be parsed using a top-down parser. There are other infix expres-
sion grammars that are LL(1). In general, it is possible to transform any context-free
grammar into an LL(1) grammar. It is possible, but the resulting grammar is not
always easily understandable.

The infix grammar given in Sect. 2.9.2 is left recursive. That is, it contains the
production E → E + T and another similar production for terms in infix expressions.
These rules are left recursive. Left recursive rules are not allowed in LL(1) grammars.
A left recursive rule can be eliminated in a grammar through a straightforward
transformation of its production.
Common prefixes in the right hand side of two productions for the same nontermi-
nal are also not allowed in an LL(1) grammar. The infix grammar given in Sect. 2.9.2
does not contain any common prefixes. Common prefixes can be eliminated by
introducing a new nonterminal to the grammar, replacing all common prefixes with
the new nonterminal, and then defining one new production so the new nonterminal
is composed of the common prefix.

2.9.3 An LL(1) Infix Expression Grammar

The following grammar is an LL(1) grammar for infix expressions: G = (N, T, P, E)


where

N = {E, RestE, T , RestT , F}


T = {identifier, number, +, −, ∗, /, (, )}
P is defined by the set of productions
E → T RestE
RestE → + T RestE | − T RestE | ε
T → F RestT
RestT → ∗ F RestT | / F RestT | ε
F → ( E ) | identifier | number

In this grammar the ε (pronounced epsilon) is a special symbol that denotes an
empty production. An empty production is a production that does not consume any
tokens. Empty productions are sometimes convenient in recursive rules.
Once common prefixes and left recursive rules are eliminated from a context-free
grammar, the grammar will be LL(1). However, this transformation is not usually
performed because there are more convenient ways to build a parser, even for non-
LL(1) grammars.
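As a sketch of how such a grammar turns into a parser, the following code (our own illustration, not the book's) defines one mutually recursive function per nonterminal; each ε production is simply a function returning without consuming a token. A real parser would also build an AST rather than just accept or reject:

```python
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(expected=None):
        nonlocal pos
        token = peek()
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected}, got {token}")
        pos += 1
        return token

    def E():        # E -> T RestE
        T(); RestE()
    def RestE():    # RestE -> + T RestE | - T RestE | epsilon
        if peek() in ("+", "-"):
            eat(); T(); RestE()
    def T():        # T -> F RestT
        F(); RestT()
    def RestT():    # RestT -> * F RestT | / F RestT | epsilon
        if peek() in ("*", "/"):
            eat(); F(); RestT()
    def F():        # F -> ( E ) | identifier | number
        if peek() == "(":
            eat("("); E(); eat(")")
        elif peek() is not None and peek() not in "+-*/()":
            eat()   # an identifier or number token
        else:
            raise SyntaxError(f"unexpected token {peek()}")

    E()
    return pos == len(tokens)   # True when the whole input was consumed

print(parse("4 + ( a - b ) * x".split()))   # True
```

Every decision is made by inspecting the single next token, which is exactly what LL(1) requires.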

Practice 2.9 Construct a left-most derivation for the infix expression 4 +


(a − b) ∗ x using the grammar in Sect. 2.9.3, proving that this infix expression
is in L(G) for the given grammar.
You can check your answer(s) in Section 2.17.9.

2.10 Bottom-Up Parsers

While the original infix expression grammar is not LL(1), it is LALR(1). In fact,
most grammars for programming languages are LALR(1). The LA stands for look
ahead with the 1 meaning just one symbol of look ahead. The LR refers to scanning
the input from left to right while constructing a right-most derivation. A bottom-
up parser constructs a right-most derivation of a source program in reverse. So, an
LALR(1) parser constructs a reverse right-most derivation of a program.
Building a bottom-up parser is a somewhat complex task involving the computa-
tion of item sets, look ahead sets, a finite state machine, and a stack. The finite state
machine and stack together are called a pushdown automaton. The construction of
the pushdown automaton and the look ahead sets are calculated from the grammar.
Bottom-up parsers are not usually written by hand. Instead, a parser generator is used
to generate the parser program from the grammar. A parser generator is a program
that is given a grammar and builds a parser for the language of the grammar by
constructing the pushdown automaton and lookahead sets needed to parse programs
in the language of the grammar.
The original parser generator for Unix was called yacc, which stood for yet another
compiler compiler since it was a compiler for grammars that produced a parser for
a language. Since a parser is part of a compiler, yacc was a compiler compiler. The
Linux version of yacc is called Bison. Hopefully you see the pun that was used
in naming it Bison. The Bison parser generator generates a parser for compilers
implemented in C, C++, or Java. There are versions of yacc for other languages
as well. Standard ML has a version called ml-yacc for compilers implemented in
Standard ML. ML-yacc is introduced and used in Chap. 6.
Parser generators like Bison produce what is called a bottom-up parser because
the right-most derivation is constructed in reverse. In other words, the derivation is
done from the bottom up. Usually, a bottom-up parser is going to return an AST
representing a successfully parsed source program. Figure 2.5 depicts the dataflow
in an interpreter or compiler. The parser generator is given a grammar and runs once
to build the parser. The generated parser runs each time a source program is parsed.

Fig. 2.5 Parser generator data flow



A bottom-up parser parses a program by constructing a reverse right-most deriva-


tion of the source code. As the reverse derivation proceeds the parser shifts tokens
from the input onto the stack of the pushdown automaton. Then at various points
in time it reduces by deciding, based on the look ahead sets, that a reduction is
necessary.

2.10.1 Parsing an Infix Expression

Consider the grammar for infix expressions as G = (N, T, P, E) where

N = {E, T , F}
T = {identifier, number, +, −, ∗, /, (, )}
P is defined by the set of productions
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → number
(6) F → ( E )

Now assume we are parsing the expression 5 ∗ 4 + 3. A right-most derivation for


this expression is as follows.
E ⇒ E + T ⇒ E + F ⇒ E + 3 ⇒ T + 3
⇒ T ∗ F + 3 ⇒ T ∗ 4 + 3 ⇒ F ∗ 4 + 3 ⇒ 5 ∗ 4 + 3
A bottom-up parser does a right-most derivation in reverse using a pushdown
automaton. It can be useful to look at the stack of the pushdown automaton as it
parses the expression as pictured in Fig. 2.6. In step A the parser is beginning. The
dot to the left of the 5 indicates the parser has not yet processed any tokens of the
source program and is looking at the 5. The stack is empty. From step A to step B
one token, the 5 is shifted onto the stack. From step B to C the parser looks at the
multiplication operator and realizes that a reduction using rule 5 of the grammar
must be performed. It is called a reduction because the production is employed in
reverse order. The reduction pops the right hand side of rule 5 from the stack and
replaces it with the nonterminal F. If you look at this derivation in reverse order, the
first step is to replace the number 5 with F.
The rest of the steps of parsing the source program follow the right-most derivation
either shifting tokens onto the stack or reducing using rules of the grammar. In step
O the entire source has been parsed, the stack is empty, and the source program is
accepted as a valid program. The actions taken while parsing include shifting and
reducing. These are the two main actions of any bottom-up parser. In fact, bottom-up
parsers are often called shift-reduce parsers.
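To make the shift and reduce actions concrete, the following sketch (our own code) replays a hand-written action sequence for 5 ∗ 4 + 3 on an explicit stack, checking each reduction against the numbered productions above. A real LALR(1) parser would compute these actions from its pushdown automaton and look ahead sets rather than being given them:

```python
# Productions numbered as in the grammar of Sect. 2.10.1.
productions = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["number"]),
    6: ("F", ["(", "E", ")"]),
}

def replay(tokens, actions):
    stack, pos = [], 0
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos]); pos += 1
        else:                       # ("reduce", rule number)
            lhs, rhs = productions[action[1]]
            assert stack[-len(rhs):] == rhs, "stack must end with the rule's RHS"
            del stack[-len(rhs):]   # pop the right hand side ...
            stack.append(lhs)       # ... and push the left hand side
    return stack

tokens = ["number", "*", "number", "+", "number"]   # 5 * 4 + 3
actions = ["shift", ("reduce", 5), ("reduce", 4),   # 5 => F => T
           "shift", "shift", ("reduce", 5),         # shift *, 4; 4 => F
           ("reduce", 3), ("reduce", 2),            # T * F => T => E
           "shift", "shift", ("reduce", 5),         # shift +, 3; 3 => F
           ("reduce", 4), ("reduce", 1)]            # F => T; E + T => E
print(replay(tokens, actions))                      # ['E']
```

When the whole input has been consumed and only the start symbol remains on the stack, the program is accepted.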

Fig. 2.6 A pushdown automaton stack



Practice 2.10 For each step in Fig. 2.6, is there a shift or reduce operation
being performed? If it is a reduce operation, then what production is being
reduced? If it is a shift operation, what token is being shifted onto the stack?
You can check your answer(s) in Section 2.17.10.

Practice 2.11 Consider the expression (6 + 5) ∗ 4. What are the contents of


the pushdown automaton’s stack as the expression is parsed using a bottom-up
parser? Show the stack after each shift and each reduce operation.
You can check your answer(s) in Section 2.17.11.

2.11 Ambiguity in Grammars

A grammar is ambiguous if there exists more than one parse tree for a given sentence
of the language. In general, ambiguity in a grammar is a bad thing. However, some
ambiguity may be allowed by parser generators for LALR(1) languages.
A classic example of ambiguity in languages arises from nested if-then-else state-
ments. Consider the following Pascal statement:
if a<b then
if b<c then
writeln("a<c")
else
writeln("?")

Which if statement does the else go with? It’s not entirely clear. The BNF for an
if-then-else statement might look something like this.
<statement> ::= if <expression> then <statement> else <statement>
| if <expression> then <statement>
| writeln ( <expression> )

The recursive nature of this rule means that if-then-else statements can be arbitrarily
nested. Because of this recursive definition, the else in this code is dangling.
That is, it is unclear if it goes with the first or second if statement.
When a bottom-up parser is generated using this grammar, the parser generator
will detect that there is an ambiguity in the grammar. The problem manifests itself
as a conflict between a shift and a reduce operation. The first rule says when looking
at an else keyword the parser should shift. The second rule says when the parser is
looking at an else it should reduce. To resolve this conflict there is generally a way

to specify whether the generated parser should shift or reduce. The default action is
usually to shift and that is what makes the most sense in this case. By shifting, the
else would go with the nearest if statement. This is the normal behavior of parsers
when encountering this if-then-else ambiguity.

2.12 Other Forms of Grammars

As a computer programmer you will likely learn at least one new language and
probably a few during your career. New application areas frequently cause new
languages to be developed to make programming applications in that area more
convenient. Java, JavaScript, and ASP.NET are three languages that were created
because of the world wide web. Ruby and Perl are languages that have become
popular development languages for database and server side programming. Objective
C is another language made popular by the rise of iOS App programming for Apple
products. A recent trend in programming languages is to develop domain specific
languages for particular embedded platforms.
Programming language manuals contain some kind of reference that describes
the constructs of the language. Many of these reference manuals give the grammar
of the language using a variation of a context free grammar. Examples include CBL
(Cobol-like) grammars, syntax diagrams, and as we have already seen, BNF and
EBNF. All these syntax metalanguages share the same features as grammars. They
all have some way of defining parts of a program or syntactic categories and they all
have a means of defining a language through recursively defined productions. The
definitions, concepts, and examples provided in this chapter will help you understand
a language reference when the time comes to learn a new language.

2.13 Limitations of Syntactic Definitions

The concrete syntax for a language is almost always an incomplete description. Not
all syntactically valid strings of tokens should be regarded as valid programs. For
instance, consider the expression 5 + 4/0. Syntactically, this is a valid expression,
but of course cannot be evaluated since division by zero is undefined. This is a
semantic issue. The meaning of the expression is undefined because division by zero
is undefined. This is a semantic issue and semantics are not described by a syntactic
definition. All that a grammar can ensure is that the program is syntactically valid.
In fact, there is no BNF or EBNF grammar that generates only legal programs in
any programming language including C++, Java, and Standard ML. A BNF grammar

defines a context-free language: the left-hand side of each rule contains only one
syntactic category. It is replaced by one of its alternative definitions regardless of the
context in which it occurs.
The set of programs in any interesting language is not context-free. For instance,
when the expression a + b is evaluated, are a and b type compatible and defined
over the + operator? This is a context sensitive issue that can’t be specified using a
context-free grammar. Context-sensitive features may be formally described as a set
of restrictions or context conditions. Context-sensitive issues deal mainly with dec-
larations of identifiers and type compatibility. Sometimes, context-sensitive issues
like this are said to be part of the static semantics of the language.
While a grammar describes how tokens are put together to form a valid program
the grammar does not specify the semantics of the language nor does it describe the
static semantics or context-sensitive characteristics of the language. Other means are
necessary to describe these language characteristics. Some methods, like type infer-
ence rules, are formally defined. Most semantic characteristics are defined informally
in some kind of English language description.
The following are all context-sensitive issues.

• In an array declaration in C++, the array size must be a nonnegative value.


• Operands for the && operation must be boolean in Java.
• In a method definition, the return value must be compatible with the return type
in the method declaration.
• When a method is called, the actual parameters must match the formal parameter
types.

2.14 Chapter Summary

This chapter introduced you to programming language syntax and syntactic descrip-
tions. Reading and understanding syntactic descriptions is worthwhile since you will
undoubtedly come across new languages in your career as a computer scientist. There
is certainly more that can be said about the topic of programming language syntax.
Aho et al. [2] have written the widely recognized definitive book on compiler imple-
mentation which includes material on syntax definition and parser implementation.
There are many other good compiler references as well. The Chomsky hierarchy of
languages is also closely tied to grammars and regular expressions. Many books on
Discrete Structures in Computer Science introduce this topic and a few good books
explore the Chomsky hierarchy more deeply including an excellent text by Peter
Linz [13].
In the next chapter you put this knowledge of syntax definition to good use learning
a new language: the CoCo assembly language. CoCo is a virtual machine for inter-
preting Python bytecode instructions. Learning assembly language helps in having a


better understanding of how higher level languages work and Chap. 3 provides many
examples of Python programs and their corresponding CoCo assembly language
programs to show you how a higher level language is implemented.

2.15 Review Questions


1. What does the word syntax refer to? How does it differ from semantics?
2. What is a token?
3. What is a nonterminal?
4. What does BNF stand for? What is its purpose?
5. What kind of derivation does a top-down parser construct?
6. What is another name for a top-down parser?
7. What does the abstract syntax tree for 3 ∗ (4 + 5) look like for infix expressions?
8. What is the prefix equivalent of the infix expression 3 ∗ (4 + 5)? What does the
prefix expression’s abstract syntax tree look like?
9. What is the difference between lex and yacc?
10. Why aren’t all context-free grammars good for top-down parsing?
11. What kind of machine is needed to implement a bottom-up parser?
12. What is a context-sensitive issue in a language? Give an example in Java.
13. What do the terms shift and reduce apply to?

2.16 Exercises

1. Rewrite the BNF in Sect. 2.2.1 using EBNF.


2. Given the grammar in Sect. 2.3.1, derive the sentence 3 ∗ (4 + 5) using a right-
most derivation.
3. Draw a parse tree for the sentence 3 ∗ (4 + 5).
4. Describe how you might evaluate the abstract syntax tree of an expression to get
a result? Write out your algorithm in English that describes how this might be
done.
5. Write a regular expression to describe identifier tokens which must start with a
letter and then can be followed by any number of letters, digits, or underscores.
6. Draw a finite state machine that would accept identifier tokens as specified in the
previous exercise.
7. For the expression 3 ∗ (4 + 5) show the sequence of shift and reduce operations
using the grammar in Sect. 2.10.1. Be sure to say what is shifted and which rule
is being used to reduce at each step. See the solution to practice problem 2.11 for
the proper way to write the solution to this problem.
8. Construct a left-most derivation of 3 ∗ (4 + 5) using the grammar in Sect. 2.9.3.

2.17 Solutions to Practice Problems

These are solutions to the practice problems. You should only consult these answers
after you have tried each of them for yourself first. Practice problems are meant to
help reinforce the material you have just read so make use of them.

2.17.1 Solution to Practice Problem 2.1

This is a left-most derivation of the expression. There are other derivations that would
be correct as well.
E ⇒E+T ⇒T +T ⇒F+T ⇒4+T ⇒4+T ∗F ⇒4+F∗F
⇒ 4 + (E) ∗ F ⇒ 4 + (E − T ) ∗ F ⇒ 4 + (T − T ) ∗ F ⇒ 4 + (F − T ) ∗ F
⇒ 4 + (a − T ) ∗ F ⇒ 4 + (a − F) ∗ F ⇒ 4 + (a − b) ∗ F ⇒ 4 + (a − b) ∗ x

2.17.2 Solution to Practice Problem 2.2

This is a right-most derivation of the expression x ∗ y + z. There is only one correct
right-most derivation.
E ⇒E+T ⇒E+F ⇒E+z ⇒T +z ⇒T ∗F+z ⇒T ∗y+z ⇒F∗y+z
⇒x∗y+z

2.17.3 Solution to Practice Problem 2.3

This is a left-most derivation of the expression +4 ∗ −abx.


E ⇒ +EE ⇒ +4E ⇒ +4 ∗ EE ⇒ +4 ∗ −EEE ⇒ +4 ∗ −aEE ⇒ +4 ∗ −abE
⇒ +4 ∗ −abx

2.17.4 Solution to Practice Problem 2.4

Exactly like the parse tree for any other derivation of (5 ∗ x) + y. There is only one
parse tree for the expression given this grammar.

2.17.5 Solution to Practice Problem 2.5



2.17.6 Solution to Practice Problem 2.6

2.17.7 Solution to Practice Problem 2.7

2.17.8 Solution to Practice Problem 2.8

In order to define both negative and positive numbers, we can use the choice operator.

letter.letter* + digit.digit* + '-'.digit.digit* + '+' + '-' + '*' + '/' + '(' + ')'
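For comparison, the same token classes can be written with Python's re module, where | plays the role the choice operator + plays above and concatenation is implicit. This is only an illustrative sketch, not part of the book's materials:

```python
import re

# A sketch of the token pattern above in Python's re notation.
# '|' is re's choice operator; '*' is repetition, as in the answer.
token_pattern = re.compile(
    r"[a-zA-Z][a-zA-Z]*"      # letter.letter*  (identifiers)
    r"|-?[0-9][0-9]*"         # digit.digit* and '-'.digit.digit*
    r"|[+\-*/()]"             # the operator and parenthesis tokens
)

for tok in ["abc", "42", "-7", "+", "("]:
    assert token_pattern.fullmatch(tok) is not None
```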



2.17.9 Solution to Practice Problem 2.9

E ⇒ T RestE ⇒ F RestT RestE ⇒ 4 RestT RestE ⇒ 4 RestE ⇒ 4 + T RestE


⇒ 4 + F RestT RestE ⇒ 4 + (E) RestT RestE ⇒ 4 + (T RestE)RestT RestE
⇒ 4 + (F RestT RestE) RestT RestE ⇒ 4 + (a RestT RestE)RestT RestE
⇒ 4 + (a RestE) RestT RestE ⇒ 4 + (a − T RestE) RestT RestE
⇒ 4 + (a − F RestE) RestT RestE ⇒ 4 + (a − b RestE) RestT RestE
⇒ 4 + (a − b) RestT RestE ⇒ 4 + (a − b) ∗ F RestT RestE
⇒ 4 + (a − b) ∗ x RestT RestE ⇒ 4 + (a − b) ∗ x RestE ⇒ 4 + (a − b) ∗ x

2.17.10 Solution to Practice Problem 2.10

In the parsing of 5 ∗ 4 + 3 the following shift and reduce operations are performed:
step A initial condition, step B shift, step C reduce by rule 5, step D reduce by rule 4,
step E shift, step F shift, step G reduce by rule 5, step H reduce by rule 3, step I
reduce by rule 2, step J shift, step K shift, step L reduce by rule 5, step M reduce by
rule 4, step N reduce by rule 1, step O finished parsing with the dot on the right side
and E on top of the stack, so pop and complete with success.

2.17.11 Solution to Practice Problem 2.11

To complete this problem it is best to do a right-most derivation of (6 + 5) ∗ 4
first. Once that derivation is complete, you go through the derivation backwards. The
difference in each step of the derivation tells you whether you shift or reduce. Here
is the result.
E ⇒ T ⇒ T ∗ F ⇒ T ∗ 4 ⇒ F ∗ 4 ⇒ (E) ∗ 4 ⇒ (E + T ) ∗ 4
⇒ (E + F) ∗ 4 ⇒ (E + 5) ∗ 4 ⇒ (T + 5) ∗ 4 ⇒ (F + 5) ∗ 4 ⇒ (6 + 5) ∗ 4
We get the following operations from this. Stack contents have the top on the right
up to the dot. Everything after the dot has not been read yet. We shift when we must
move through the tokens to get to the next place we are reducing. Each step in the
reverse derivation provides the reduce operations. Since there are seven tokens there
should be seven shift operations.

1. Initially: . ( 6 + 5 ) ∗ 4
2. Shift: ( . 6 + 5 ) ∗ 4
3. Shift: ( 6 . + 5 ) ∗ 4
4. Reduce by rule 5: ( F . + 5 ) ∗ 4
5. Reduce by rule 4: ( T . + 5 ) ∗ 4
6. Reduce by rule 2: ( E . + 5 ) ∗ 4
7. Shift: ( E + . 5 ) ∗ 4

8. Shift: ( E + 5 . ) ∗ 4
9. Reduce by rule 5: ( E + F . ) ∗ 4
10. Reduce by rule 4: ( E + T . ) ∗ 4
11. Shift: ( E + T ) . ∗ 4
12. Reduce by rule 1: ( E ) . ∗ 4
13. Reduce by rule 6: F . ∗ 4
14. Reduce by rule 4: T . ∗ 4
15. Shift: T ∗ . 4
16. Shift: T ∗ 4 .
17. Reduce by rule 5: T ∗ F .
18. Reduce by rule 3: T .
19. Reduce by rule 2: E .
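A trace like this can be checked mechanically. The sketch below replays the moves and verifies that each reduction's right-hand side sits on top of the stack. The rule bodies are inferred from the trace itself (1: E → E + T, 2: E → T, 3: T → T ∗ F, 4: T → F, 5: F → num, 6: F → (E)); note that since a handle must be on top of the stack, the sketch reduces by rule 1 just before shifting the closing parenthesis:

```python
# Replay the shift/reduce moves for ( 6 + 5 ) * 4 and check each reduction.
rules = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["num"]),
    6: ("F", ["(", "E", ")"]),
}

tokens = ["(", "num", "+", "num", ")", "*", "num"]  # ( 6 + 5 ) * 4
moves = ["S", "S", 5, 4, 2, "S", "S", 5, 4, 1, "S", 6, 4, "S", "S", 5, 3, 2]

stack, rest = [], list(tokens)
for move in moves:
    if move == "S":
        stack.append(rest.pop(0))        # shift the next token
    else:
        lhs, rhs = rules[move]
        assert stack[-len(rhs):] == rhs  # the handle must be on top
        del stack[-len(rhs):]
        stack.append(lhs)                # replace the handle with the LHS

assert stack == ["E"] and rest == []     # accepted: E remains, input consumed
```

As in the trace, there are exactly seven shifts, one per token.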
3 Assembly Language

Python is an object-oriented, interpreted language. Internally to the Python
interpreter, a Python program is converted to bytecode and interpreted using a virtual
machine. Most modern programming languages have support for high-level abstrac-
tions while the instructions of a virtual machine are closer to the machine language
instructions supported by hardware architectures, making the interpretation of byte-
code easier than interpretation of the original source program. The advantage of
virtual machine implementations results from dividing the mapping from high-level
abstractions to low-level machine instructions into two parts: high-level abstractions
to bytecode and bytecode to machine instructions.
While bytecode is a higher level abstraction than machine language, it is not greatly
so. As programmers, if we understand how the underlying machine executes our
programs, we better equip ourselves to make good choices about how we program.
Just as importantly, having an understanding of how programs are executed can help
us diagnose problems when things go wrong.
This chapter introduces assembly language programming in the bytecode lan-
guage of the Python virtual machine. The Python virtual machine is an internal
component of the Python interpreter and is not available to use directly. Instead,
a bytecode interpreter called CoCo has been developed that mimics a subset of the
behavior of the Python 3.2 virtual machine. Instead of writing bytecode files directly,
CoCo supports a Python virtual machine assembly language.
While learning assembly language, we’ll limit ourselves to a subset of Python.
CoCo supports boolean values, integers, strings, floats, tuples, and lists. It supports
function definitions and function calls. It also supports most of the instructions of the
Python virtual machine including support for conditional execution, iteration, and
exception handling. It does not support importing modules or module level code.
CoCo differs from Python by requiring a main function where execution of a CoCo
assembled program begins.
To run an assembly language program it must first be assembled, then it can be exe-
cuted. The CoCo virtual machine includes the assembler so assembly isn’t a separate
step. An assembly language programmer writes a program in the CoCo assembly lan-
guage format, providing it to CoCo, which then assembles and interprets the program.

© Springer International Publishing Switzerland 2014
K.D. Lee, Foundations of Programming Languages,
Undergraduate Topics in Computer Science, DOI 10.1007/978-3-319-13314-0_3

The main difference between CoCo assembly language and bytecode is the
presence of labels in the assembly language format. Labels are the targets of instruc-
tions that change the normal sequence of execution. Instructions like
branch and jump are much easier to decipher if they say “jump to loop1”
rather than “jump to address 63”. Of course, bytecode instructions are encoded as
numbers themselves, so the assembler translates “jump to loop1” to something like
“48 63” which of course would require a manual to decipher.
Learning to program in assembly isn’t all that hard once you learn how constructs
like while loops, for loops, if-then statements, function definitions, and function calls
are implemented in assembly language. String and list manipulation is another skill
that helps if you have examples to follow. A disassembler is a tool that will take a
machine language program and produce an assembly language version of it. Python
includes a module called dis that includes a disassembler. When you write a Python
program it is parsed and converted to bytecode when read by the interpreter. The dis
module disassembler produces an assembly language program from this bytecode.
CoCo includes its own disassembler which uses the Python dis module and produces
output suitable for the CoCo virtual machine.
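For example, any Python installation can disassemble a small function with the dis module. The exact opcode names vary between Python versions, so treat this as a sketch rather than the precise output the CoCo disassembler produces:

```python
import dis

def add_two():
    x = 5
    y = 6
    return x + y

# Print the bytecode; opcode names such as LOAD_CONST and STORE_FAST
# correspond to the instructions discussed in this chapter.
dis.dis(add_two)

# The instructions can also be inspected programmatically.
opnames = [ins.opname for ins in dis.Bytecode(add_two)]
assert any("LOAD" in name for name in opnames)
```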
The existence of the disassembler for CoCo means that learning assembly lan-
guage is as easy as writing a Python program and running it through the disassembler
to see how it is implemented in assembly language. That means you can discover how
Python is implemented while learning assembly language! Because Python’s virtual
machine is not guaranteed to be backwards compatible, you must use Python 3.2 when
disassembling programs, so make sure that version 3.2 is installed on your system. To
test this you can try typing “python3.2” in a terminal window in your favorite operat-
ing system. If it says command not found, you likely don’t have Python 3.2 installed.
In that case you can download it from http://python.org. The rest of this chapter
introduces you to assembly language programming using the CoCo virtual machine.
You can get the CoCo virtual machine by going to http://cs.luther.edu/~leekent/
CoCo. The download link provides binary executables for Microsoft Windows, Apple
Mac OS X, and Linux operating systems. You can also get the source code from
Github and compile it yourself. There are instructions on the web page for down-
loading the source and compiling to build your own executable.

3.1 Overview of the CoCo VM


CoCo, like Python, is a virtual machine, or interpreter, for bytecode instructions.
CoCo is written in C++ using object-oriented principles and does not store its in-
structions in actual bytecode format. Instead, it reads an assembly language file and
assembles it building an internal representation of the program as a sequence of
functions each with their own sequence of bytecode instructions.
A CoCo program, like programs in other programming languages, utilizes a
run-time stack to store information about each function called while the program
is executing. Each function call in a CoCo program results in a new stack frame
object being created and pushed onto the run-time stack. When a function returns,

its corresponding stack frame is popped from the run-time stack and discarded.
Figure 3.1 depicts four active function calls. Function A called function B, which
called function C, which called function D before any of the functions returned. The
top of the stack is at the top of Fig. 3.1. Each stack frame contains all local variables
that are defined in the function. Each stack frame also contains two additional stacks,
an operand stack and a block stack.

Fig. 3.1 The CoCo virtual machine



CoCo, like the Python virtual machine, is a stack based architecture. This means
that operands for instructions are pushed onto an operand stack. Virtual machine
instructions pop their operands from the operand stack, do their intended operation,
and push their results onto the operand stack. Most CPUs are not stack based. Instead
they have general purpose registers for holding intermediate results. Stack based
architectures manage the set of intermediate results as a stack rather than forcing the
programmer to keep track of which registers hold which results. The stack abstraction
makes the life of an assembly language programmer a little easier. The operand
stack is used by the virtual machine to store all intermediate results of instruction
execution. This style of computation has been in use a long time, from Hewlett
Packard mainframe computers of the 1960s through the 1980s to calculators still
made by Hewlett Packard today. The Java Virtual Machine, or JVM, is another
example of a stack machine.
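The operand stack discipline can be illustrated in plain Python, using a list as the stack. Here 3 ∗ (4 + 5) is evaluated the way a stack machine would evaluate it; this is only an illustration, not CoCo itself:

```python
# A list serves as the operand stack: append pushes, pop pops.
operands = []
operands.append(3)               # push 3
operands.append(4)               # push 4
operands.append(5)               # push 5
right, left = operands.pop(), operands.pop()
operands.append(left + right)    # an add pops 4 and 5, pushes 9
right, left = operands.pop(), operands.pop()
operands.append(left * right)    # a multiply pops 3 and 9, pushes 27
assert operands == [27]
```

Every intermediate result lives on the stack; no named registers are needed.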
The other stack utilized by CoCo is a block stack. The block stack keeps track of
exit points for blocks of code within a CoCo function. When a loop is entered, the exit
address of the loop is pushed onto the block stack. The instructions of each function
are at zero-based offsets from the beginning of the function, so we can think of each
function having its own instruction address space starting at 0. By storing each loop’s
exit point address on the block stack, if a break instruction is executed inside a loop,
the exit point of the loop can be found and the execution of the break instruction will
jump to that address. Exception handlers also push the address of the handler onto
the block stack. If an exception occurs, execution jumps to the exception handler by
popping the address from the block stack. When a loop or try block is exited, the
corresponding block stack address is popped from the block stack.
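The block stack's role in handling a break can be modeled in a few lines of Python. The exit address used below is made up for illustration; this is a sketch of the mechanism, not CoCo's implementation:

```python
# Model of the block stack: entering a loop pushes its exit address
# (SETUP_LOOP); a break pops it and jumps there (BREAK_LOOP).
block_stack = []

def setup_loop(exit_address):
    block_stack.append(exit_address)

def break_loop():
    return block_stack.pop()   # the new value of the PC

setup_loop(33)                 # a loop whose first instruction after
                               # the loop body is at address 33
pc = break_loop()              # a break jumps straight to the exit point
assert pc == 33 and block_stack == []
```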
A program counter, or PC, is responsible for holding the address of the next
instruction to be executed. The machine proceeds by fetching an instruction from
the code, incrementing the PC, and executing the fetched instruction. Execution
proceeds this way until a RETURN_VALUE instruction is executed or an exception
occurs. When a function call is executed, the current program counter is stored in
the stack frame until the called function returns, when the PC is restored to the next
instruction in the current stack frame. This is depicted in Fig. 3.1 with the arrows
from the stack frames to the code of their corresponding functions.
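The fetch, increment, execute cycle can be sketched as a tiny Python loop over a made-up instruction list. The opcode names mirror CoCo's, but this miniature interpreter is only an illustration:

```python
# A miniature fetch/execute loop in the style described above.
def run(instructions, constants):
    operands = []
    pc = 0
    while True:
        opcode, arg = instructions[pc]   # fetch
        pc += 1                          # increment the PC
        if opcode == "LOAD_CONST":       # execute the fetched instruction
            operands.append(constants[arg])
        elif opcode == "BINARY_ADD":
            right, left = operands.pop(), operands.pop()
            operands.append(left + right)
        elif opcode == "RETURN_VALUE":
            return operands.pop()        # execution ends on RETURN_VALUE

program = [("LOAD_CONST", 1), ("LOAD_CONST", 2),
           ("BINARY_ADD", None), ("RETURN_VALUE", None)]
result = run(program, [None, 5, 6])
assert result == 11
```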
When an exception occurs, if no matching exception handler is found, execution
of the function terminates and control is passed to the previously called function
where the exception continues to propagate back until a matching exception handler
is found. If no matching handler is found, the complete traceback of the exception is
printed. If no exception occurs during the running of a program, execution terminates
when the main function executes the RETURN_VALUE instruction.
The specification for CoCo, including all instructions, global functions, and the
complete assembly language syntax supported by CoCo can be found in Chap. 9.
The rest of this chapter examines various Python language constructs and the
corresponding assembly language that implement these constructs. CoCo assembly
language can be learned by examining Python code and learning how it is imple-
mented in assembly language. The rest of this chapter proceeds in this fashion.


3.2 Getting Started

CoCo includes a disassembler that works with Python 3.2 to disassemble Python
programs into CoCo assembly language programs, providing a great way to learn
assembly language programming using the CoCo virtual machine. Consider the
following Python program that adds 5 and 6 together and prints the sum to the
screen.
1 from disassembler import *
2 def main():
3 x = 5
4 y = 6
5 z = x + y
6 print(z)
7 #main()
8 disassembler.disassemble(main)

Running this with python3.2 as follows produces this output.


1 MyComputer> python3.2 addtwo.py
2 Function: main/0
3 Constants: None, 5, 6
4 Locals: x, y, z
5 Globals: print
6 BEGIN
7 LOAD_CONST 1
8 STORE_FAST 0
9 LOAD_CONST 2
10 STORE_FAST 1
11 LOAD_FAST 0
12 LOAD_FAST 1
13 BINARY_ADD
14 STORE_FAST 2
15 LOAD_GLOBAL 0
16 LOAD_FAST 2
17 CALL_FUNCTION 1
18 POP_TOP
19 LOAD_CONST 0
20 RETURN_VALUE
21 END
22 MyComputer> python3.2 addtwo.py > addtwo.casm

The disassembler prints the assembly language program to standard output, which
is usually the screen. The second run of the addtwo.py program redirects the standard
output to a file called addtwo.casm. The casm is the extension chosen for CoCo
assembly language files and stands for CoCo Assembly. This CASM file holds all
the lines between the two MyComputer prompts above. To run this program you can
invoke the CoCo virtual machine as shown here.

1 MyComputer> coco addtwo.casm


2 Function: main/0
3 Constants: None, 5, 6
4 Locals: x, y, z
5 Globals: print
6 BEGIN
7 LOAD_CONST 1
8 STORE_FAST 0
9 LOAD_CONST 2
10 STORE_FAST 1
11 LOAD_FAST 0
12 LOAD_FAST 1
13 BINARY_ADD
14 STORE_FAST 2
15 LOAD_GLOBAL 0
16 LOAD_FAST 2
17 CALL_FUNCTION 1
18 POP_TOP
19 LOAD_CONST 0
20 RETURN_VALUE
21 END
22
23 11
24 MyComputer> coco addtwo.casm 2> /dev/null
25 11
26 MyComputer>

The first run invokes coco which assembles the program producing the assembled
output and then runs the program producing the 11 that appears below the assembled
output. The assembled output is printed to a stream called standard error which is
separate from the standard output stream where the 11 is printed. To only print the
exact output of the program, standard error can be redirected to /dev/null which is
an output device that simply throws away anything sent to it. The second run of coco
demonstrates how to throw away the standard error stream. Throwing away standard
error would also throw away any error output the program might produce so is not
recommended in most cases.
In this CoCo program there is one function called main. The assembly indicates
main has 0 formal parameters. Constants that are used in the code include None, 5,
and 6. There are three local variables in the function: x, y, and z. The global print
function is called and so is in the list of globals. Every function in CoCo has these
categories of identifiers and values within each defined function. Sometimes one or
more of these categories may be empty and can be omitted in that case.
The instructions follow the begin keyword and precede the end keyword. LOAD_
CONST loads the constant value at its index (zero-based, 1 in this case) in
the constants list onto the operand stack. CoCo is a stack machine and therefore all
operations are performed with operands pushed and popped from the operand stack.
The STORE_FAST instruction stores a value in the locals list, in this case at
offset 0, the location of x. LOAD_FAST does the opposite of STORE_FAST, push-
ing a value on the operand stack from the locals list of variables. BINARY_ADD
pops two operands from the stack and adds them together, pushing the result.
CALL_FUNCTION pops the number of arguments specified in the instruction (1 in
this case) and then pops the function from the stack. Finally, it calls the popped
function with the popped arguments. The result of the function call is left on the
top of the operand stack. In the case of the print function, None is returned and left
on the stack. The POP_TOP instruction pops the None from the stack and discards
it only to have the main function push a None on the stack just before returning.
RETURN_VALUE pops the top argument from the operand stack and returns that
value to the calling function. Since main was the only function called, returning from
it ends the coco interpretation of the program.
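The effect of this instruction sequence on the constants, locals, and operand stack can be traced with a few lines of Python. This is a model for illustration only (the call to print is omitted); the real machine is the CoCo interpreter:

```python
# Trace the addtwo instructions using plain lists for the constants,
# the locals, and the operand stack.
constants = [None, 5, 6]
locals_list = [None, None, None]   # slots for x, y, z
operands = []

operands.append(constants[1])      # LOAD_CONST 1
locals_list[0] = operands.pop()    # STORE_FAST 0  (x = 5)
operands.append(constants[2])      # LOAD_CONST 2
locals_list[1] = operands.pop()    # STORE_FAST 1  (y = 6)
operands.append(locals_list[0])    # LOAD_FAST 0
operands.append(locals_list[1])    # LOAD_FAST 1
right, left = operands.pop(), operands.pop()
operands.append(left + right)      # BINARY_ADD
locals_list[2] = operands.pop()    # STORE_FAST 2  (z = 11)

assert locals_list[2] == 11        # the value the program prints
```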
To run this code, you must have the coco executable somewhere in your path.
Then you can execute the following code to try it out.
MyComputer> python3.2 addtwo.py > addtwo.casm
MyComputer> coco addtwo.casm

3.3 Input/Output
CoCo provides one built-in function to read input from the keyboard and several
functions for writing output to the screen or standard output. The following program
demonstrates getting input from the keyboard and printing to standard output.
1 import disassembler
2
3 def main():
4 name= input("Enter your name: ")
5 age = int(input("Enter your age: "))
6 print(name + ", a year from now you will be", age+1, "years old.")
7
8 #main()
9 disassembler.disassemble(main)

In the Python code in Sect. 3.3, the input function is called. Calling input requires
a string prompt and returns a string of the input that was entered. Calling the int
function on a string, as is done in the line that gets the age from the user, returns the
integer representation of the string’s value. Finally, the print function takes a variable
number of arguments, converts each to a string using the __str__ magic method,
and prints each string separated by spaces. The first argument to print in the code of
Sect. 3.3 is the result of concatenating name and the string “, a year from now you
will be”. String concatenation was used because there shouldn’t be a space between
the name value and the comma.
The assembly language that implements the program in Sect. 3.3 is given in
Fig. 3.2. Notice that built-in functions like input, int, and print are declared under
the Globals list. The name and age variables are the locals.
Line 9 pushes the input function onto the operand stack. Line 10 pushes the string
prompt for input. Line 11 calls the input function with the one allowed argument
given to it. The 1 in line 11 is the number of arguments. When the input function
returns, it leaves the string entered by the user on the operand stack. Line 12 stores that
string in the name location in the locals.
Line 13 prepares to convert the next input to an integer by first pushing the int
function on the operand stack. Then line 14 loads the input function. Line 15 loads
the prompt like line 10 did previously. Line 16 calls the input function. The result

Fig. 3.2 CoCo I/O

is immediately passed to the int function by calling it on line 17. The int function
leaves an integer on the top of the operand stack and line 18 stores that in the age
variable location.
The next part of the program prints the output. To prepare for calling the print
function, the arguments must be evaluated first, then print can be called. Line 19
pushes the print function onto the stack but does not call print. There are three
arguments to the print function. The first argument is the result of concatenating
two strings together. Line 20 pushes the name variable’s value on the stack. Line
21 pushes the string “, a year from now you will be” onto the stack. Line 22 calls
the __add__ magic method to concatenate the two strings. The BINARY_ADD
instruction pops two operands from the stack and calls the __add__ method on the first
object popped, with the second object as the argument, as described in more
detail in Chap. 9.
Lines 23–25 add together age and 1 to get the correct age value to pass to print.
Line 26 pushes the last string constant on the operand stack and line 27 finally calls
the print function, leaving None on the operand stack afterwards. Line 28 pops the
None value and immediately None is pushed back on the stack in line 29 because
the main function returns None in this case, which is returned in line 30, ending the
iotest.casm program’s execution.
A few important things to learn from this section:

• Getting input and producing output rely on the built-in functions input and print.
• Before a function can be called, it must be pushed on the operand stack. All required
arguments to the function must also be pushed onto the stack on top of the function
to be called.
• Finally, when a function returns, it leaves its return value on the operand stack.

Practice 3.1 The code in Fig. 3.2 is a bit wasteful which often happens when
compiling a program written in a higher level language. Optimize the code in
Fig. 3.2 so it contains fewer instructions.
You can check your answer(s) in Section 3.16.1.

3.4 If-Then-Else Statements

Programming languages must be able to execute code based on conditions, either
externally provided via input or computed from other values as the program executes.
If-then statements are one means of executing code conditionally. The code
provided here isolates just an if-then statement to show how it is implemented in
CoCo assembly.
1 import disassembler
2
3 def main():
4 x = 5
5 y = 6
6 if x > y:
7 z = x
8 else:
9 z = y
10
11 print(z)
12
13 disassembler.disassemble(main)

Disassembling this Python code results in the code in Fig. 3.3. There are new
instructions in Fig. 3.3 that haven’t been encountered until now, but just as impor-
tantly, there are labels in this code. A label provides a symbolic target to jump to
in the code. Labels, like label00 and label01, are defined by writing them before an
instruction and are terminated with a colon. A label to the right of an instruction is a
target for that instruction.

Fig. 3.3 If-Then-Else assembly

Labels are a convenience in all assembly languages. They
let assembly language programmers think of jumping to a target in a program, rather
than changing the contents of the PC register, which is what actually happens. When
a program is executed using CoCo the labels disappear because CoCo assembles the
code, replacing the labels with the actual PC target addresses. The CoCo code in
Fig. 3.4 shows the CoCo code after it has been assembled. The assembled code is
printed by coco when the program is executed.
The first instruction, the LOAD_CONST, is at offset 0 in the code. The instructions
of each function are at zero-based offsets from the beginning of the function, so we
can think of each function as having its own address space starting at zero. In the
code in Figs. 3.3 and 3.4 the line number of the first instruction is 6, so 6 can be
subtracted from the line numbers to determine any instruction’s address within the
function and 6 can be added to any target to determine the line number of the target
location. In Fig. 3.4 the target of line 13 is 11 which corresponds to line 17. Looking
at Fig. 3.3 this corresponds to the line where label00 is defined. Likewise, the target
of the JUMP_FORWARD instruction in Fig. 3.4 is label01 which is defined on line
19. Subtracting 6, we expect to see 13 as the target PC address in the assembled code
of Fig. 3.4.

Fig. 3.4 Assembled code

Consulting the CoCo BNF in Chap. 9, there can be multiple labels on one in-
struction. In addition, instruction addresses have nothing to do with which line they
are on. That only appears to be the case in Fig. 3.4 because the instructions are on
consecutive lines. But, adding blank lines to the program would do nothing to change
the instruction addresses. So, we could have a program like this where one instruc-
tion has two labels. These three instructions would be at three addresses within the
program even though there are four lines in the code.
1 onelabel: LOAD_FAST 1
2 STORE_FAST 2
3 twolabel:
4 threelabel: LOAD_GLOBAL 0

Labels can be composed of any sequence of letters, digits, underscores, or the @
character, but must start with a letter, underscore, or the @ character. They can be
any number of characters long.
In Fig. 3.3, lines 6–11 load the two values to be compared on the stack. The
COMPARE_OP instruction on line 12 has an argument of 4. Consulting the COM-
PARE_OP instruction in Chap. 9 reveals that a 4 corresponds to a greater than com-
parison. The comparison is done by calling the __gt__ magic method on the second
item from the top of the operand stack and passing it the top of the operand stack. The
two operands are popped by the COMPARE_OP instruction and a boolean value,
either True or False, is pushed on the operand stack as the result.
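The equivalence between the greater-than comparison and the __gt__ magic method can be seen directly in Python:

```python
x = 5
y = 6

# COMPARE_OP 4 compares with __gt__: the second item from the top of
# the operand stack is the receiver, the top item is the argument.
assert x.__gt__(y) is False     # 5 > 6 is False
assert (x > y) == x.__gt__(y)   # the > operator calls the same method
```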
The next instruction jumps to the target location if the value left on the operand
stack was False. Either way, the POP_JUMP_IF_FALSE instruction pops the top
value from the operand stack.
Take note of line 16 in Fig. 3.3. In assembly there is nothing like an if-then-
else statement. Line 15 is the end of the code that implements the then part of the
statement. Without line 16, CoCo would continue executing and would go right into
the else part of the statement. The JUMP_FORWARD instruction is necessary to
jump past the else part of the code if the then part was executed. Line 17 begins
the else code and line 18 is the last instruction of the if-then-else statement. The
label definition for label01 is still part of the if-then-else statement, but labels the
instruction immediately following the if-then-else statement.

Practice 3.2 Without touching the code that compares the two values, the
assembly in Fig. 3.4 can be optimized to remove at least three instructions.
Rewrite the code to remove at least three instructions from this code. With a
little more work, five instructions could be removed.
You can check your answer(s) in Section 3.16.2.

3.4.1 If-Then Statements

Frequently if-then statements are written without an else clause. For instance, this
program prints x if x is greater than y. In either case y is printed.
import disassembler

def main():
x = 5
y = 6
if x > y:
print(x)

print(y)

disassembler.disassemble(main)

Disassembling this code produces the program in Fig. 3.5. The code is very similar
to the code presented in Fig. 3.3. Line 13 once again jumps past the then part of the
program. Lines 14–17 contain the then code. Interestingly, line 18 jumps forward to
line 19. Comparing this to the code in Fig. 3.3 where the jump forward jumps past the
else part, the same happens in Fig. 3.5 except that there is no else part of the statement.
Some assembly languages do not have an equivalent to POP_JUMP_IF_FALSE.
Instead, only an equivalent to POP_JUMP_IF_TRUE is available. In that case, the
opposite of the condition can be tested and the jump will be executed if the opposite
is true, skipping over the then part. For instance, if testing for greater than is the
intent of the code, less than or equal to can be tested to jump around the then part
of an if-then-else statement.

Fig. 3.5 If-Then assembly
Whether testing the original condition or the opposite, clearly the JUMP_
FORWARD is not needed in the code in Fig. 3.5. As was seen in Practice 3.1, the
Python compiler generated a wasteful instruction. It isn’t wrong to jump forward,
it’s just not needed. The convenience of writing in a language like Python far out-
weighs the inconvenience of writing in a language like CoCo assembly language, so
an extra instruction now and then is not that big a deal. In this case though, the Python
compiler could be written in such a way as to recognize when the extra instruction
is not needed.

Practice 3.3 Rewrite the code in Fig. 3.5 so it executes with the same result
using POP_JUMP_IF_TRUE instead of the jump if false instruction. Be sure to
optimize your code when you write it so there are no unnecessary instructions.
You can check your answer(s) in Section 3.16.3.

3.5 While Loops

Consider this code which computes the Fibonacci number for the value stored in the
variable f. The sequence of Fibonacci numbers is computed by adding the previous
two numbers in the sequence together to get the next number. The sequence consists
of 1, 1, 2, 3, 5, 8, 13, 21, and so on, the eighth element of the sequence being 21.
1 import disassembler
2
3 def main():
4 f = 8
5 i = 1
6 j = 1
7 n = 1
8 while n < f:
9 n = n + 1
10 tmp = j
11 j = j + i
12 i = tmp
13
14 print("Fibonacci("+str(n)+") is",i)
15
16 disassembler.disassemble(main)

The CoCo assembly for this program implements the while loop of the Python
program using JUMP_ABSOLUTE and POP_JUMP_IF_FALSE instructions. Prior
to the loop, the SETUP_LOOP instruction’s purpose is not readily apparent. In
Python a loop may be exited using a break instruction. Using break inside a loop is not
a recommended programming style. A break is never needed. It is sometimes used as
a convenience. To handle the break instruction when it is executed there must be some
knowledge about where the loop ends. In the code in Fig. 3.6 the first instruction after
the loop is on line 33, where label02 is defined. The SETUP_LOOP instruction pushes
the address of that instruction on the block stack. If a break instruction is executed,
the block stack is popped and the PC is set to the popped instruction address.
Lines 15–18 of Fig. 3.6 implement the comparison of n < f similarly to the way if-
then-else comparisons are performed. The first line of this code is labeled with label00
because the end of the loop jumps back there to see if another iteration should be
performed. A while loop continues executing until the condition evaluates to False so
the POP_JUMP_IF_FALSE instruction jumps to label01 when the loop terminates.
The instruction at label01 labels the POP_BLOCK instruction. This instruction
is needed if the loop exits normally, not as the result of a break statement. The block
stack is popped, removing the loop exit point from it. When exiting as a result of
a break, execution jumps to the instruction at line 33, skipping the POP_BLOCK
instruction since the break statement already popped the block stack.
An important thing to notice is that a while loop and an if-then-else statement
are implemented using the same instructions. There is no special loop instruction
in assembly language. The overall flow of a while loop is a test before the body of
the loop corresponding to the while loop condition. If the loop condition is not met,
execution jumps to the next instruction after the loop. After the body of the loop a
jump returns execution to the while loop condition code to check if another iteration
3.5 While Loops 67

Fig. 3.6 While loop assembly



of the body will be performed. This idiom, or pattern of instructions, is used to im-
plement loops and similar patterns are used for loops in other assembly languages
as well.
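This test-jump idiom can be modeled in ordinary Python by driving an explicit program counter over the condition, body, and jump-back steps. The sketch below mirrors the shape of the Fibonacci loop from the start of this section; the numeric `pc` states stand in for the labels of Fig. 3.6 and are not CoCo instructions.

```python
# The while-loop idiom as explicit jumps: test at the top,
# conditional jump past the loop, body, unconditional jump back.
def fib(f):
    i, j, n = 1, 1, 1
    pc = 0
    while True:
        if pc == 0:                  # label00: evaluate the condition n < f
            pc = 1 if n < f else 3   # POP_JUMP_IF_FALSE -> label01
        elif pc == 1:                # loop body, one iteration
            n, i, j = n + 1, j, j + i
            pc = 2
        elif pc == 2:                # JUMP_ABSOLUTE back to the test
            pc = 0
        else:                        # label01: first statement after the loop
            return i

print(fib(8))                        # 21
```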

Practice 3.4 Write a short program that tests the use of the BREAK_LOOP
instruction. You don’t have to write a while loop to test this. Simply write some
code that uses a BREAK_LOOP and prints something to the screen to verify
that it worked.
You can check your answer(s) in Section 3.16.4.

3.6 Exception Handling

Exception handling occurs in Python within a try-except statement. Statements within
the try block are executed and if an exception occurs execution jumps to the except
block of statements. If main were called on the Python program given here, any error
condition would send it to the except block which simply prints the exception in this
case. The except block is only executed if there is an error in the try block. Errors
that could occur in this program would be a conversion error for either of the two
floating point number conversions or a division by zero error. The code catches an
exception if a zero is entered for the second value.

1 import disassembler
2

3 def main():
4 try:
5 x = float(input("Enter a number: "))
6 y = float(input("Enter a number: "))
7 z = x / y
8 print(x,"/",y,"=",z)
9 except Exception as ex:
10 print(ex)
11

12 disassembler.disassemble(main)

Implementing exception handling in CoCo is similar in some ways to implementing
the BREAK_LOOP instruction. The difference is that it is the exception, rather than
a BREAK_LOOP instruction, that causes the program to jump from one place to
another. Both exception handling and the break statement make use of the block stack.
When a loop is entered, the SETUP_LOOP instruction pushes the exit point of the
loop onto the block stack; the exit point being an integer referring to the address of
the first instruction after the loop.
To distinguish between loop exit points and exception handling, the SETUP_EXCEPT
instruction pushes the negative of the except handler's address (i.e. -1*address). So a
negative number on the block stack refers to an exception handler while a positive
value refers to a loop exit point. In the code in Fig. 3.7 the
exception handler’s code begins at label00.
The try block code begins on line 7 with the SETUP_EXCEPT. This pushes
the handler’s address for label00 on the block stack which corresponds to a −27.
Execution proceeds by getting input from the user, converting the input to floats,
doing the division, and printing the result. The print completes on line 31 where
None, which is returned by print, is popped from the operand stack.
If execution makes it to the end of the try block, then no exception occurred and
line 32 pops the −27 from the block stack, ending the try block. Line 33 jumps past
the end of the except block.
If an exception occurs, three things are pushed onto the operand stack before any
handling of the exception occurs. The traceback is pushed first. The traceback is a
copy of the run-time stack containing each function call and the stored PC of all
pending functions including the current function’s stack frame and PC. Above the
traceback there are two copies of the exception object pushed on the operand stack
when an exception occurs.
If an exception occurs in the try block, CoCo consults the block stack and pops
values until a negative address is found corresponding to some except block. Multiple
try-except statements may be nested, so it is possible that the block stack will contain
more than one negative address. When a negative address is found, the PC is set to
its positive value causing execution to jump to the except block. In Fig. 3.7, that’s
line 34. The traceback and two copies of the exception are pushed onto the stack
prior to line 34 being executed.
Why are three objects pushed on the operand stack when an exception occurs?
Python’s RAISE_VARARGS instruction describes the contents of the operand stack
as TOS2 containing the traceback, TOS1 the parameter, and TOS the exception
object. In the CoCo implementation the parameter to an exception can be retrieved
by converting the exception to a string, so the object at TOS1 is simply the exception
again. For the sake of compatibility with the Python disassembler, CoCo pushes three
operands onto the operand stack when an exception is raised.
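Searching for a handler can be sketched with the same kind of block stack model: pop entries until a negative one is found, then resume at its positive value. The function name and the sample addresses (33 and -27, echoing the figures) are illustrative only.

```python
# A simplified model of how CoCo locates an except block:
# positive entries are loop exit points, negative entries are
# handlers, and unwinding pops until a handler is found.
def find_handler(blocks):
    while blocks:
        entry = blocks.pop()
        if entry < 0:            # a negative entry marks an except block
            return -entry        # resume execution at the handler
    return None                  # no handler here: re-raise to the caller

print(find_handler([-27, 33]))   # 27 (the loop exit point 33 is discarded)
print(find_handler([33]))        # None
```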
Exception handlers in Python may be written to match only certain types of ex-
ceptions. For instance, in Python a division by zero exception is different than a float
conversion error. The CoCo virtual machine currently only has one type of exception,
called Exception. It is possible to extend CoCo to support other types of exceptions,
but currently there is only one type of exception object that can be created. The argu-
ment to the exception object can be anything that is desired. The program in Fig. 3.7
is written to catch any type of exception, but it could be written to catch only a certain
type of exception. Line 34 duplicates the exception object on the top of the operand
stack. Line 35 loads a global Exception object onto the stack. The COMPARE_OP
10 instruction compares the exception using the exception match comparison, which
calls the __excmatch__ magic method to see if there is a match between the thrown
exception and the specified pattern. If there is not a match, line 37 jumps to the end of

Fig. 3.7 Exception handling assembly



the except block. The END_FINALLY instruction on line 54 detects if the exception
was handled and if not, it re-throws the exception for some outer exception handling
block.
If the exception was a match, execution of the handler code commences as it does
on line 38 of the program. The top of the operand stack contains the extra exception
object so it is thrown away by line 38. Line 39 takes the remaining exception object
and makes the ex reference point to it. Line 40 pops the traceback from the operand
stack.
Should an exception occur while executing an exception handler, then CoCo must
clean up from the exception. Line 41 executes the SETUP_FINALLY instruction to
push another block stack record to keep track of the end of the exception handler.
Lines 42–45 print the exception named ex in the code.
Line 46 pops the exit address that was pushed by the SETUP_FINALLY instruc-
tion. The POP_EXCEPT instruction on line 47 then pops the block stack address
for the exception handler exit address. Line 48 pushes a None on the operand stack.
Line 49 is either the next instruction executed or it is jumped to as a result of an
exception while executing the handler code for the previous exception. Either way,
the ex variable is made to refer to None. The DELETE_FAST instruction doesn't
appear to do much in this code. It is generated by the disassembler, but appears only
to delete the None binding, which doesn't seem necessary.
The last instruction of the handler code, the END_FINALLY instruction checks
to see if the exception was handled. In this case, it was handled and the instruction
does nothing. If execution jumps to line 54 then the exception handler did not match
the raised exception and therefore the exception is re-raised. Line 55 wraps up by
setting up to return None from the main function.

Practice 3.5 Write a short program that tests creating an exception, raising
it, and printing the handled exception. Write this as a CoCo program without
using the disassembler.
You can check your answer(s) in Section 3.16.5.

3.7 List Constants


Building a compound value like a list is not too hard. To build a list constant using
CoCo you push the elements of the list on the operand stack in the order you want
them to appear in the list. Then you call the BUILD_LIST instruction. The argument
to the instruction specifies the length of the list. This code builds a list and prints it
to the screen.

Fig. 3.8 Assembly for building a list

1 import disassembler
2

3 def main():
4 lst = ["hello","world"]
5 print(lst)
6

7 disassembler.disassemble(main)

The assembly language program in Fig. 3.8 builds a list with two elements:
[‘hello’, ‘world’]. Lines 6 and 7 push the two strings on the operand stack. Line
8 pops the two operands from the stack, builds the list object, and pushes the resulting
list on the operand stack. Python defines the __str__ magic method for every built-in
type of value, and it is called on the list on line 12.
If you run this program using the CoCo interpreter you will notice that [‘hello’,
‘world’] is not printed to the screen. Instead, [hello, world] is printed. This is because
currently the __str__ method is called on each element of the list to convert it to
a string for printing. This is not the correct method to call. Instead, the __repr__
magic method should be called, which returns a printable representation of the value
retaining any type information. In the next chapter there will be an opportunity to
fix this.
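Python's own list printing shows the intended behavior: converting a list to a string calls __repr__ on each element, so string elements keep their quotes.

```python
# str() on a list uses each element's __repr__, not its __str__,
# so type information such as string quotes is retained.
lst = ["hello", "world"]
print(str(lst))       # ['hello', 'world'] -- elements shown via __repr__
print(str("hello"))   # hello    -- __str__ drops the quotes
print(repr("hello"))  # 'hello'  -- __repr__ keeps them
```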

3.8 Calling a Method

Calling functions like print and input was relatively simple. Push the function name
followed by the arguments to the function on the operand stack. Then, call the
function with the CALL_FUNCTION instruction. But, how about methods? How

Fig. 3.9 Assembly for calling a method

does a method like split get called on a string? Here is a program that demonstrates
how to call split in Python.
1 import disassembler
2

3 def main():
4 s = input("Enter list of integers:")
5 lst = s.split()
6

7 print(lst)
8

9 disassembler.disassemble(main)

Line 6 of the assembly language code in Fig. 3.9 prepares to call the input function
by loading the name input onto the operand stack. Line 7 loads the argument to
input, the prompt string. Line 8 calls the input function leaving the entered text on
the operand stack. Calling split is done similarly.
In this Python code the syntax of calling input and split is quite different. Python
sees the difference and uses the LOAD_ATTR instruction in the assembly language
instructions to get the split attribute of the object referred to by s. Line 10 loads the
object referred to by s on the stack. Then line 11 finds the split attribute of that object.
Each object in CoCo and Python contains a dictionary of all the object’s attributes.
The LOAD_ATTR instruction examines this dictionary using the key found in the
globals list at the operand's index. It then loads that attribute onto the operand stack.
The CALL_FUNCTION instruction then calls the method that was located with the
LOAD_ATTR instruction.
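The effect of the LOAD_ATTR and CALL_FUNCTION pair can be approximated in Python with the built-in getattr, which performs the same lookup in the object's attribute dictionary:

```python
# getattr does what LOAD_ATTR does: look up an attribute by name.
# The bound method it returns is then called like any function,
# which is what CALL_FUNCTION does with it.
s = "1 2 3"
split_method = getattr(s, "split")   # what LOAD_ATTR leaves on the stack
print(split_method())                # ['1', '2', '3']
```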

The STORE_ATTR instruction stores an attribute in an object in much the same
way that an attribute is loaded. CoCo does not presently support the STORE_ATTR
instruction but could with relatively little effort. The ability to load and store object
attributes means that CoCo could be used to implement an object-oriented language.
This makes sense since Python is an object-oriented language.

Practice 3.6 Normally, if you want to add two numbers together in Python,
like 5 and 6, you write 5 + 6. This corresponds to using the BINARY_ADD
instruction in CoCo, which in turn calls the magic method __add__ with the
method call 5.__add__(6). Write a short CoCo program where you add two
integers together without using the BINARY_ADD instruction. Print the result
to the screen.
You can check your answer(s) in Section 3.16.6.

3.9 Iterating Over a List

Iterating through a sequence of any sort in CoCo requires an iterator. There are
iterator objects for every type of sequence: lists, tuples, strings, and other types of
sequences that have yet to be introduced. Here is a Python program that splits a string
into a list of strings and iterates over the list.
1 from disassembler import *
2
3 def main():
4 x = input("Enter a list: ")
5 lst = x.split()
6
7 for b in lst:
8 print(b)
9
10 disassemble(main)

Lines 6–8 of the assembly code in Fig. 3.10 get an input string from the user,
leaving it on the operand stack. Line 9 stores this in the variable x. Lines 10–12 call
the split method on this string, leaving a list object on the top of the operand stack.
The list contains the list of space separated strings from the original string in x. Line
13 stores this list in the variable lst.
Line 14 sets up the exit point of a loop as was covered earlier in this chapter. Line
15 loads the lst variable onto the operand stack. The GET_ITER instruction creates
an iterator with the top of the operand stack. The lst is popped from the operand
stack during this instruction and the resulting iterator is pushed onto the stack.
An iterator has a __next__ magic method that is called by the FOR_ITER
instruction. When FOR_ITER executes, the iterator is popped from the stack, __next__ is
called on it, and the iterator and the next value from the sequence are pushed onto

Fig. 3.10 List iteration assembly

the operand stack. The iterator is left below the next value in the sequence at TOS1.
When __next__ is called on the iterator and there are no more elements left in the
sequence, the PC is set to the label of the FOR_ITER instruction, ending the loop.
When the loop is finished the block stack is popped to clean up from the loop.
Line 25 loads the None on the stack before returning from the main function.
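The protocol that GET_ITER and FOR_ITER rely on can be driven by hand in Python: iter plays the role of GET_ITER, each call to next performs one FOR_ITER step, and StopIteration marks the end of the sequence.

```python
# Manual iteration using the same protocol FOR_ITER uses.
lst = "hello world".split()
it = iter(lst)                   # GET_ITER: replace the list with an iterator
result = []
while True:
    try:
        result.append(next(it))  # FOR_ITER: call __next__ on the iterator
    except StopIteration:        # sequence exhausted: jump past the loop
        break
print(result)                    # ['hello', 'world']
```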

Practice 3.7 Write a CoCo program that gets a string from the user and iterates
over the characters of the string, printing them to the screen.
You can check your answer(s) in Section 3.16.7.

3.10 Range Objects and Lazy Evaluation

Indexing into a sequence is another way to iterate in a program. When you index
into a list, you use a subscript to retrieve an element of the list. Generally, indices
are zero-based. So the first element of a sequence is at index 0, the second at index
1, and so on.

There are two versions of Python in use today. Version 2, while older, is still widely
used because many Python programs were written using it and there is a cost to
converting them to use Python 3. Python 3 was created so new features could
be added that might be incompatible with the older version. One difference was in
the range function. In Python 2, the range function generated a list of integers of the
specified size and values. This is inefficient because some ranges might consist of
millions of integers. A million integers takes up a lot of space in memory and takes
some time to generate. In addition, depending on how code is written, not all the
integers in a range may be needed. These problems are a result of eager evaluation of
the range function. Eager evaluation is when an entire sequence is generated before
any element of the sequence will actually be used. In Python 2 the entire list of
integers is created as soon as the range function is called even though the code can
only use one integer at a time.
Python 3 has dealt with the eager evaluation of the range function by defining a
range object that is lazily evaluated. This means that when you call the range function
to generate a million integers, you don’t get any of them right away. Instead, you get
a range object. From the range object you can access an iterator. When __next__
is called on an iterator you get the next item in the sequence. When __next__ is
called on a range object's iterator you get the next integer in the range's sequence.
Lazy evaluation is when the next value in a sequence is generated only when it is
ready to be used and not before. This code creates a range object. The range object
is designed to provide lazy evaluation of integer sequences.

1 from disassembler import *


2

3 def main():
4 x = input("Enter list: ")
5 lst = x.split()
6

7 for i in range(len(lst)-1,-1,-1):
8 print(lst[i])
9

10 disassemble(main)

This Python code uses indices to iterate backwards through a list. In this case an
iterator over the range object yields a descending list of integers which are the indices
into the list of values entered by the user. If the user enters four space-separated values,
then the range object will yield the sequence [3, 2, 1, 0]. The first argument to range
is the start value, the second is one past the stop value, and the third argument is the
increment. So the sequence in the Python code in Sect. 3.10 is a descending sequence
that goes down one integer at a time from the length of the list minus one to zero.
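The descending sequence can be checked directly in Python. Nothing is generated until the range object is iterated; it stores only its start, stop, and step values.

```python
# range(len(lst)-1, -1, -1): from the last index down to 0.
lst = "a b c d".split()
r = range(len(lst) - 1, -1, -1)   # start, one past the stop value, increment
print(list(r))                    # [3, 2, 1, 0]
print([lst[i] for i in r])        # ['d', 'c', 'b', 'a']
```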
The CoCo assembly code in Fig. 3.11 implements this same program. Lines 15–23
set up for calling the range function with the three integer values. Lines 15–20 call
the len function to get the length of the list and subtract one. Lines 21 and 22 put
two −1 values on the operand stack. Line 23 calls the range function which creates
and pushes a range object onto the operand stack as its result.

Fig. 3.11 Range assembly

Line 24 creates an iterator for the range object. As described in the last section,
the FOR_ITER instruction calls the __next__ magic method on the iterator to get the
next integer in the range’s sequence. The lazy evaluation occurs because an iterator
keeps track of which integer is the next value in the sequence. Line 26 stores the next
integer in the variable i.
The BINARY_SUBSCR instruction has not been encountered
yet in this chapter. Line 28 loads the list called lst onto the operand stack. Line 29
loads the value of i onto the operand stack. The BINARY_SUBSCR instruction indexes
into lst at position i and pushes the value found at that position onto the operand
stack. That value is printed by the print function call on line 31 of the program.

Lazy evaluation is an important programming language concept. If you ever find
yourself writing code that must generate a predictable sequence of values, you
probably want to generate that sequence lazily. Iterators, like range iterators, are the
means by which we can lazily access a sequence of values and range objects define
a sequence of integers without eagerly generating all of them.
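Generator functions give the same kind of lazy evaluation in ordinary Python code: each value is computed only when __next__ asks for it, never before.

```python
# A lazily evaluated descending sequence, written as a generator.
def countdown(n):
    while n >= 0:
        yield n        # suspend here until the next value is requested
        n -= 1

it = countdown(3)
print(next(it))        # 3 -- only one value has been computed so far
print(list(it))        # [2, 1, 0]
```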

3.11 Functions and Closures

Up to this point in the chapter all the example programs have been defined in a
main function. CoCo supports the definition of multiple functions and even nested
functions. Here is a Python program that demonstrates nested functions. The
main function calls the function named f, which returns the function g nested
inside f. The g function returns x. The disassembled version of this program
demonstrates nested functions in CoCo along with how to build a closure.

1 def main():
2 x = 10
3 def f(x):
4 def g():
5 return x
6 return g
7 print(f(3)())
8 #main()
9 disassembler.disassemble(main)

Notice the Python code in Sect. 3.11 calls the disassembler on the top-level func-
tion main. It is not called on f or g because they are nested inside main and the
disassembler automatically disassembles any nested functions of a disassembled
function.
The format of the corresponding CoCo program in Fig. 3.12 is worth noting as
well. The top level main function is defined along the left hand side. Indentation has
no effect on CoCo but visually you see that f is nested inside main. The function g
is nested inside f because it appears immediately after the first line of the definition
of f on line 3. The rest of the definition of f starts again on line 10 and extends to
line 21. The definition of g starts on line 3 and extends to line 9.
The number of arguments for each function is given by the integer after the slash.
The f/1 indicates that f expects one argument. The main and g functions expect zero
arguments. These values are used during a function call to verify that the function is
called with the required number of arguments.
Examine the Python code in Sect. 3.11 carefully. The main function calls the
function f which returns the function g. Notice that f returns g, it does not call g. In the
print statement of main the function f is called, passing 3 to the function that returns
g. The extra set of parens after the function call f (3) calls g. This is a valid Python

Fig. 3.12 Nested functions assembly

program, but not a common one. The question is: What does the program print?
There are two possible choices it seems: either 10 or 3. Which seems more likely?
On the one hand, g is being called from the main function where x is equal to 10. If
the program printed 10, we would say that Python is a dynamically scoped language,
meaning that the function executes in the environment in which it is called. Since g is
called from main the value of x is 10 and in a dynamically scoped language 10 would
be printed. The word dynamic is used because if g were called in another environment

it may return something completely different. We can only determine what g will
return by tracing the execution of the program to the point where g is called.
On the other hand, g was defined in the scope of an x whose value was 3. In that
case, the environment in which g executes is the environment provided by f. If 3
is printed then Python is a statically scoped language meaning that we need only
understand what the environment contained when g was defined, not when it was
called. In a statically scoped language this specific instance of g will return the same
value each and every time it is called, not matter where it is called in the program.
The value of x is determined when g is defined.
Dynamically scoped languages are rare. Lisp, when it was first defined, was dy-
namically scoped. McCarthy quickly corrected that and made Lisp a statically scoped
language. It is interesting to note that Emacs Lisp is dynamically scoped. Python is
statically scoped as are most modern programming languages.
To execute functions in a statically scoped language, two pieces are needed when
a function may return another function. To execute g not only is the code for g re-
quired, but so also is the environment in which this instance of g was defined. A
closure is formed. A closure is the environment in which a function is defined and
the code for the function itself. This closure is what is called when the function g is
finally called in main.
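Running the nested-function program directly in Python confirms the statically scoped answer:

```python
# g closes over the x that was 3 when g was defined inside f,
# so the call prints 3, not the 10 visible at the call site.
def main():
    x = 10
    def f(x):
        def g():
            return x
        return g
    return f(3)()      # the extra parens call the closure returned by f

print(main())          # 3
```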
Take a look at the CoCo code for this program in Fig. 3.12. Line 14 begins creating
a new closure object in the body of function f by loading the cell variable named x
onto the stack. A cell variable is an indirect reference to a value. Figure 3.13 depicts
what is happening in the program just before the x is returned in the function g. A
variable in Python, like Java and many other languages, is actually a reference that
points to a value. Values exist on the heap and are created dynamically as the program
executes. When a variable is assigned a new value, the variable's reference is made
to point to a new value on the heap. The space for values on the heap that are no
longer needed is reclaimed by a garbage collector that frees space on the heap so
it can be re-used. In Fig. 3.13 there are three values on the heap, a 10, a 3, and one
other value called a cell in CoCo and the Python virtual machine.
Because the function g needs access to the variable x outside the function f, the
3 is indirectly referenced through a cell variable. The LOAD_CLOSURE instruction
pushes that cell variable onto the stack to be used in the closure. Since only one
value is needed from the environment, the next instruction on line 15 builds a tuple
of all the values needed from the environment. Line 16 loads the code for g onto the
stack. Line 17 forms the closure by popping the function and the environment from
the stack and building a closure object.
The variable x is a local variable for the function f. But, because x is referenced
in g and g is nested inside f, the variable x is also listed as a cell variable in f. A
cell variable is an indirect reference to a value. This means there is one extra step to
finding the value that x refers to. We must go through the cell to get to the 3.
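The cell indirection is visible from Python itself through a function's __closure__ attribute, whose entries are cell objects holding the captured values:

```python
# The closure built for g carries one cell; its contents are the 3
# that x referred to when g was defined.
def f(x):
    def g():
        return x
    return g

g = f(3)
cell = g.__closure__[0]       # the captured environment, one cell per variable
print(cell.cell_contents)     # 3 -- one extra step to reach the value
print(g())                    # 3
```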
The LOAD_DEREF instruction on line 7 is new. A LOAD_DEREF loads the
value that is referenced by the reference pointed to in the list of cellvars. So, this
instruction pushes the 3 onto the operand stack. Finally, line 35 calls the closure
consisting of the function and its data.

Fig. 3.13 Execution of nested.casm

In the function g the freevars refer to the tuple of references in the closure that was
just called, so the first instruction, the LOAD_DEREF, loads the 3 onto the operand
stack. Figure 3.13 depicts this state right before the RETURN_VALUE instruction is
executed.
To finish up the execution of this program a 3 is returned from the call to g and
its frame is popped from the run-time stack. Control returns to main where the 3 is
printed. After returning from main its frame is also popped from the run-time stack
which ends the program.

Practice 3.8 The program in Fig. 3.12 would work just fine without the cell.
The variable x could refer directly to the 3 in both the f and g functions without
any ramifications. Yet, a cell variable is needed in some circumstances. Can
you come up with an example where a cell variable is absolutely needed?
You can check your answer(s) in Section 3.16.8.

3.12 Recursion

Functions in CoCo can call themselves. A function that calls itself is a recursive func-
tion. Recursive functions are studied in some detail in Chap. 5 of this text. Learning
to write recursive functions well is not hard if you follow some basic rules. The
mechanics of writing a recursive function include providing a base case that comes
first in the function. Then, the solution to the problem you are solving must be solved
by calling the same function on some smaller piece of data while using that result to
construct a solution to the bigger problem.
Consider the factorial definition. Factorial of zero, written 0!, is defined to be
1. This is the base case. For integer n greater than 0, n! = n*(n − 1)!. This is a
recursive definition because factorial is defined in terms of itself. It is called on
something smaller, meaning n−1 which is closer to the base case, and the result is
used in computing n!. Here is a Python program that computes 5!.

1 import disassembler
2

3 def factorial(n):
4 if n==0:
5 return 1
6

7 return n*factorial(n-1)
8

9 def main():
10 print(factorial(5))
11

12 disassembler.disassemble(factorial)
13 disassembler.disassemble(main)

The CoCo implementation of this program is given in Fig. 3.14. The program
begins in main by loading 5 on the operand stack and calling the factorial function.
The result is printed to the screen with the print function.
Calling factorial jumps to the first instruction of the function where n is loaded
onto the operand stack, which in this case is 5. Lines 7–8 compare n to 0 and if the two
values are equal, 1 is returned. Notice that the RETURN_VALUE instruction appears
in the middle of the factorial function in this case. A return instruction doesn’t have
to appear at the end of a function. It can appear anywhere it makes sense and in this
case, it makes sense to return from the base case as soon as possible.
The code from label00 forward is the recursive case since otherwise we would
have returned already. The code subtracts one from n and calls factorial with that
new, smaller value. Notice that the recursive function call is identical to any other
function call. Finally, after the function call the result of the call is on the operand
stack and it is multiplied by n to get n! which is returned.
Because this is a recursive function, the steps in the preceding two paragraphs are
repeated five more times, each time reducing n by 1. The program continues to count down
until 1 is returned for the factorial of 0. At its deepest, there are 7 stack frames on
the run-time stack for this program: one for the main function, and six more for the

Fig. 3.14 Execution of fact.casm

recursive factorial function calls. The run-time stack grows to 7 stack frames deep
when the base case is executed and then shrinks again as the recursion unwinds.
Finally, when the program returns to main, 120 is printed to the screen.
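The depth of the recursion can be observed by instrumenting a Python version of factorial with a counter. The depth-tracking globals below are added for illustration and are not part of the original program.

```python
# factorial(5) recurses through n = 5, 4, 3, 2, 1, 0: six nested
# factorial frames at the deepest point, matching the run-time
# stack described above (plus one frame for the caller).
depth = 0
max_depth = 0

def factorial(n):
    global depth, max_depth
    depth += 1                        # entering one more stack frame
    max_depth = max(max_depth, depth)
    result = 1 if n == 0 else n * factorial(n - 1)
    depth -= 1                        # this frame is popped on return
    return result

print(factorial(5), max_depth)        # 120 6
```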

Practice 3.9 Draw a picture of the run-time stack just before the instruction
on line 11 of Fig. 3.14 is executed. Use Fig. 3.13 as a guide to how you draw
this picture. Be sure to include the code, the values of n, and the PC values.
You can check your answer(s) in Section 3.16.9.

3.13 Chapter Summary

An understanding of assembly language is key to learning how higher level
programming languages work. This chapter introduced assembly language
programming through a series of examples, drawing parallels between Python and Python
virtual machine or CoCo instructions. The use of a disassembler was key to gaining
this insight and is a great tool to be able to use with any platform.
Most of the key constructs of programming languages were presented as both
Python programs and CoCo programs. The obvious omission was in the area of
object-oriented programming and class definitions. Python is an object-oriented lan-
guage but the disassembler does not currently support disassembling object-oriented
Python programs. This would be a great project if someone were interested in ex-
tending the disassembler.
The assembly language covered in this chapter comes up again in Chaps. 4 and 6.
Chapter 4 covers the implementation of the CoCo virtual machine and Chap. 6
implements a high-level functional language compiler that produces CoCo assembly
language programs.
CoCo is an assembler/virtual machine for Python virtual machine instructions.
Of course, there are other assembly languages. MIPS is a CPU architecture that has
wide support for writing assembly language programs including a MIPS simulator
called SPIM. In fact, assemblers are available for pretty much any hardware/operating
system combination in use today. Intel/Linux, Intel/Windows, Intel/Mac OS X all
support assembly language programming. The Java Virtual Machine can be
programmed with the instructions of the JVM using a Java assembler called Jasmin.
Assembly language is the fundamental language that all higher level programming
languages use in their implementations.

3.14 Review Questions

1. How do the Python virtual machine and CoCo differ? Name three differences
between the two implementations.
2. What is a disassembler?
3. What is an assembler?
4. What is a stack frame? Where are they stored? What goes inside a stack frame?
5. What is the purpose of the block stack and where is it stored?
6. What is the purpose of the Program Counter?
7. Name an instruction that is responsible for creating a list object and describe
how it works.
8. Describe the execution of the STORE_FAST and LOAD_FAST instructions.
9. How can CoCo read a line of input from the keyboard?

10. What is the difference between a disassembled Python program and an assembled
CoCo program? Provide a short example and point out the differences.
11. When a Python while loop is implemented in CoCo, what is the last instruction
of the loop and what is its purpose?
12. What do exception handling and loops have in common in the CoCo implemen-
tation?
13. What is lazy evaluation and why is it important to Python and CoCo?
14. What is a closure and why are closures needed?

3.15 Exercises

1. Consulting the CoCo assembly language program in the solution to Practice Problem 3.2, provide the contents of the operand stack after each instruction is executed.
2. Write a CoCo program which reads an integer from the user and then creates a
list of all the even numbers from 0 up to and including that integer. The program
should conclude by printing the list to the screen. Test your program with CoCo to
be sure it works.
3. Add some exception handling to the previous exercise to print “You didn’t enter
an integer!” if the user fails to enter an integer.
4. Using a range object, write a CoCo program that computes the sum of the first n
integers, where the non-negative integer n is read from the user.
5. Write a recursive CoCo program that adds up the first n numbers where n is
read from the user. Remember, the base case must come first in this function,
and the recursive case must call the function on something smaller, whose result
is used in computing the solution to the whole problem.

3.16 Solutions to Practice Problems

These are solutions to the practice problems. You should only consult these answers
after you have tried each of them for yourself first. Practice problems are meant to
help reinforce the material you have just read so make use of them.

3.16.1 Solution to Practice Problem 3.1

The assembly code in Fig. 3.2 blindly pops the None at the end and then pushes
None again before returning from main. This can be eliminated resulting in two
fewer instructions. This would also mean that None is not needed in the constants,
but this was not eliminated below.

Function: main/0
Constants: None,
"Enter your name: ", "Enter your age: ",
", a year from now you will be",
1, "years old."
Locals: name, age
Globals: input, int, print
BEGIN
LOAD_GLOBAL 0
LOAD_CONST 1
CALL_FUNCTION 1
STORE_FAST 0
LOAD_GLOBAL 1
LOAD_GLOBAL 0
LOAD_CONST 2
CALL_FUNCTION 1
CALL_FUNCTION 1
STORE_FAST 1
LOAD_GLOBAL 2
LOAD_FAST 0
LOAD_CONST 3
BINARY_ADD
LOAD_FAST 1
LOAD_CONST 4
BINARY_ADD
LOAD_CONST 5
CALL_FUNCTION 3
RETURN_VALUE
END
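For reference, one plausible Python source for this assembly is sketched below. The `read` and `write` parameters are an addition for testability and are not part of the original program; by default they are simply the built-in `input` and `print`.

```python
def main(read=input, write=print):
    # read/write default to the builtins; the parameters exist only so the
    # function can be exercised without a real keyboard.
    name = read("Enter your name: ")
    age = int(read("Enter your age: "))
    # name + ", a year from now you will be" is the first BINARY_ADD;
    # age + 1 is the second; print is then called with three arguments.
    write(name + ", a year from now you will be", age + 1, "years old.")
```

Tracing this source against the listing above shows where each LOAD, BINARY_ADD, and CALL_FUNCTION comes from.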

3.16.2 Solution to Practice Problem 3.2

As in Practice Problem 3.1, the POP_TOP and LOAD_CONST at the end can be eliminated.
In the if-then-else code, both the then part and the else part execute exactly the same
STORE_FAST instruction. That instruction can be moved after the if-then-else code and written
just once, resulting in one fewer instruction and three fewer overall. Furthermore, if we
move the LOAD_GLOBAL for the call to print before the if-then-else statement, we
can avoid storing the maximum value in z at all and just leave the result on the top
of the operand stack: either x or y. By leaving the bigger of x or y on the top of the
stack, the call to print will print the correct value. This eliminates five instructions
from the original code.
Function: main/0
Constants: None, 5, 6
Locals: x, y
Globals: print
BEGIN
LOAD_CONST 1
STORE_FAST 0
LOAD_CONST 2
STORE_FAST 1
LOAD_GLOBAL 0
LOAD_FAST 0
LOAD_FAST 1
COMPARE_OP 4

POP_JUMP_IF_FALSE label00
LOAD_FAST 0
JUMP_FORWARD label01
label00: LOAD_FAST 1
label01: CALL_FUNCTION 1
RETURN_VALUE
END

It is worth noting that the code above is exactly the disassembled code from this
Python program.
import disassembler

def main():
x = 5
y = 6
print(x if x > y else y)

disassembler.disassemble(main)

When main is called, this code prints the result of a conditional expression. The
if-then-else expression inside the print statement is different from an if-then-else
statement. An if-then-else statement updates a variable or has some other side-effect. An
if-then-else expression, or conditional expression as it is called in the Python
documentation, yields a value: either the then value or the else value. In the assembly
language code we see that the yielded value is passed to the print function as its argument.
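The difference is easy to see side by side. This small sketch assigns the same value both ways: the statement form works through a side effect, while the expression form itself yields a value.

```python
x, y = 5, 6

# Statement form: each branch has a side effect (assigning z).
if x > y:
    z = x
else:
    z = y

# Expression form: the conditional itself yields a value.
biggest = x if x > y else y

print(z, biggest)
```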

3.16.3 Solution to Practice Problem 3.3


Function: main/0
Constants: None, 5, 6
Locals: x, y
Globals: print
BEGIN
LOAD_CONST 1
STORE_FAST 0
LOAD_CONST 2
STORE_FAST 1
LOAD_FAST 0
LOAD_FAST 1
COMPARE_OP 1
POP_JUMP_IF_TRUE label00
LOAD_GLOBAL 0
LOAD_FAST 0
CALL_FUNCTION 1
POP_TOP
label00: LOAD_GLOBAL 0
LOAD_FAST 1
CALL_FUNCTION 1
RETURN_VALUE
END
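A plausible Python source for this solution is sketched below. COMPARE_OP 1 is the <= comparison, so the POP_JUMP_IF_TRUE skips the first print exactly when x <= y; the `write` parameter is an addition for testability, defaulting to the built-in `print`.

```python
def main(write=print):
    x = 5
    y = 6
    # write(x) runs only when the x <= y comparison is false, i.e. when x > y.
    if x > y:
        write(x)
    write(y)

main()
```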

3.16.4 Solution to Practice Problem 3.4

The following code behaves differently if the BREAK_LOOP instruction is removed from the program.
Function: main/0
Constants: None, 7, 6
Locals: x, y
Globals: print
BEGIN
SETUP_LOOP label01
LOAD_CONST 1
STORE_FAST 0
LOAD_CONST 2
STORE_FAST 1
LOAD_FAST 0
LOAD_FAST 1
COMPARE_OP 1
POP_JUMP_IF_TRUE label00
BREAK_LOOP
LOAD_GLOBAL 0
LOAD_FAST 0
CALL_FUNCTION 1
POP_TOP
label00: POP_BLOCK
label01: LOAD_GLOBAL 0
LOAD_FAST 1
CALL_FUNCTION 1
RETURN_VALUE
END
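The control flow can be mirrored in Python with a loop whose body runs once, like the SETUP_LOOP block above; this is a sketch, not the program the assembly was produced from. With the inner `break` (standing in for BREAK_LOOP) only 6 prints; deleting that `break` lets 7 print first, just as removing BREAK_LOOP does.

```python
def main(write=print):
    x, y = 7, 6
    # The for loop runs its body once, like the SETUP_LOOP block above.
    for _ in [0]:
        if x <= y:
            break        # the POP_JUMP_IF_TRUE path to label00 / POP_BLOCK
        break            # BREAK_LOOP: exits before write(x) can run
        write(x)         # unreachable while the break above is present
    write(y)

main()
```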

3.16.5 Solution to Practice Problem 3.5

This is the hello world program with exception handling used to raise and catch
an exception. This solution does not include finally-handling code for the case where an
exception happens while handling the exception. It also assumes the thrown exception will
match the handler, since CoCo only supports one type of exception.
Function: main/0
Constants: None, "Hello World!"
Locals: ex
Globals: Exception, print
BEGIN
SETUP_EXCEPT label00
LOAD_GLOBAL 0
LOAD_CONST 1
CALL_FUNCTION 1
RAISE_VARARGS 1
POP_BLOCK
JUMP_FORWARD label01
label00: LOAD_GLOBAL 1
ROT_TWO
CALL_FUNCTION 1
POP_TOP
POP_EXCEPT
label01: LOAD_CONST 0
RETURN_VALUE
END
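One plausible Python source for this assembly is sketched below; the `write` parameter is an addition for testability, defaulting to the built-in `print`.

```python
def main(write=print):
    try:
        # RAISE_VARARGS 1 raises the Exception object built just above it.
        raise Exception("Hello World!")
    except Exception as ex:
        # The handler at label00 prints the raised exception object.
        write(ex)

main()
```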

3.16.6 Solution to Practice Problem 3.6

This program adds 5 and 6 together using the __add__ magic method associated
with integer objects. First 5 is loaded onto the operand stack. Then LOAD_ATTR
is used to load the __add__ attribute of the 5 object onto the stack. This is the
function. The argument to __add__, which is the 6, is loaded next by the
LOAD_CONST instruction. Then __add__ is called with one argument.
The 11 is left on the operand stack after the function call. It is stored in x, the
print function is loaded, x is loaded onto the operand stack, and print is called to print the
value. Since print leaves None on the stack, that value is returned from the main
function.

Function: main/0
Constants: None, 5, 6
Locals: x
Globals: __add__, print
BEGIN

LOAD_CONST 1
LOAD_ATTR 0
LOAD_CONST 2
CALL_FUNCTION 1
STORE_FAST 0
LOAD_GLOBAL 1
LOAD_FAST 0
CALL_FUNCTION 1
RETURN_VALUE
END
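The same call can be made from Python directly. The parentheses around 5 keep the dot from being read as a decimal point:

```python
# Equivalent to LOAD_CONST 5, LOAD_ATTR __add__, then calling with the 6.
x = (5).__add__(6)
print(x)   # prints 11
```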

3.16.7 Solution to Practice Problem 3.7


Function: main/0
Constants: None, "Enter a string: "
Locals: x, a
Globals: input, print
BEGIN
LOAD_GLOBAL 0
LOAD_CONST 1
CALL_FUNCTION 1
STORE_FAST 0
SETUP_LOOP label02
LOAD_FAST 0
GET_ITER
label00: FOR_ITER label01
STORE_FAST 1
LOAD_GLOBAL 1
LOAD_FAST 1
CALL_FUNCTION 1
POP_TOP
JUMP_ABSOLUTE label00
label01: POP_BLOCK
label02: LOAD_CONST 0
RETURN_VALUE
END
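One plausible Python source for this solution is sketched below; the `read` and `write` parameters are an addition for testability, defaulting to the built-in `input` and `print`.

```python
def main(read=input, write=print):
    x = read("Enter a string: ")
    # GET_ITER builds the iterator and FOR_ITER drives this loop,
    # yielding the characters of the string one at a time.
    for a in x:
        write(a)
```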

3.16.8 Solution to Practice Problem 3.8

A cell variable is needed if an inner function modifies a variable that is located in the
outer function. Consider the CoCo program below. Without the cell the program below
would print 10 to the screen, and with the cell it prints 11. Why is that? Draw the
run-time stack both ways to see what happens with and without the cell variable.

Function: f/1
Function: g/1
Constants: None, 1
Locals: y
FreeVars: x
BEGIN
LOAD_DEREF 0
LOAD_CONST 1
BINARY_ADD
STORE_DEREF 0
LOAD_DEREF 0
LOAD_FAST 0
BINARY_ADD
RETURN_VALUE
END
Constants: None, code(g)
Locals: x, g
CellVars: x
BEGIN
LOAD_CLOSURE 0
BUILD_TUPLE 1
LOAD_CONST 1
MAKE_CLOSURE 0
STORE_FAST 1
LOAD_FAST 1
LOAD_DEREF 0
CALL_FUNCTION 1
LOAD_DEREF 0
BINARY_ADD
RETURN_VALUE
END
Function: main/0
Constants: None, 3
Globals: print, f
BEGIN
LOAD_GLOBAL 0
LOAD_GLOBAL 1
LOAD_CONST 1
CALL_FUNCTION 1
CALL_FUNCTION 1
POP_TOP
LOAD_CONST 0
RETURN_VALUE
END

Interestingly, this program cannot be written directly in Python. The closest Python
equivalent of this program is given below. However, it is not the equivalent of the program
written above. In fact, the program below won’t even execute. There is an error on
the line x = x + 1. The problem is that as soon as Python sees x = in the function g,
it decides there is another x that is a local variable in g. But then x = x + 1 results
in an error because x in g has not yet been assigned a value.
def f(x):
def g(y):
x = x + 1
return x + y

return g(x) + x

def main():
print(f(3))

main()
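Python 3 does, however, provide a `nonlocal` declaration that tells the compiler x in g refers to the cell in f rather than a new local. With that one addition the intended program becomes expressible, and it prints 11, matching the cell-variable behavior described above:

```python
def f(x):
    def g(y):
        nonlocal x      # use the x cell in f instead of creating a local x
        x = x + 1
        return x + y
    # g(x) passes 3 and bumps the shared x to 4; the trailing + x then reads 4.
    return g(x) + x

print(f(3))   # prints 11
```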

Fig. 3.15 Execution of fact.casm



3.16.9 Solution to Practice Problem 3.9

There are a couple of things to notice in Fig. 3.15. The run-time stack contains one stack frame for
every function call to factorial. Each of the stack frames, except the one for the main
function, points at the factorial code. While there is only one copy of each function’s
code, there may be multiple stack frames executing the code. This happens when a
function is recursive. There are also multiple n values, one for each stack frame. Again,
this is expected in a recursive function.
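The recursion pictured in Fig. 3.15 can be seen in the Python source below, a sketch of the kind of function fact.casm plausibly compiles from:

```python
def factorial(n):
    # Base case: this stack frame returns without pushing another frame.
    if n == 0:
        return 1
    # Recursive case: each call pushes a new stack frame with its own n.
    return n * factorial(n - 1)

print(factorial(5))   # prints 120
```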
