Software II: Principles of Programming Languages: Some Basic Definitions
Software II: Principles of Programming Languages: Some Basic Definitions
Programming Languages
Lecture 6 – Data Types
Enumeration Types
• All possible values, which are named
constants, are provided in the definition
• C# example
– enum days {mon, tue, wed, thu, fri, sat, sun};
Enumeration Types
• Design issues
– Is an enumeration constant allowed to appear in
more than one type definition, and if so, how is
the type of an occurrence of that constant
checked?
– Are enumeration values coerced to integer?
– Any other type coerced to an enumeration type?
Design issues
• Is an enumeration constant allowed to
appear in more than one type definition, and
if so, how is the type of an occurrence of
that constant checked?
• Are enumeration values coerced to integer?
• Any other type coerced to an enumeration
type?
Evaluation of Enumerated Type
Subrange Types
• An ordered contiguous subsequence of an
ordinal type
– Example: 12..18 is a subrange of integer type
• Ada’s design
type Days is (mon, tue, wed, thu, fri, sat,
sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekday;
Day2 := Day1;
Subrange Evaluation
• Aid to readability
– Make it clear to the readers that variables of
subrange can store only certain range of values
• Reliability
– Assigning a value to a subrange variable that is
outside the specified range is detected as an
error
Implementation of User-Defined
Ordinal Types
• Enumeration types are implemented as
integers
• Subrange types are implemented like the
parent types with code inserted (by the
compiler) to restrict assignments to
subrange variables
Array Types
• An array is a homogeneous aggregate of
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
Array Indexing
• Indexing (or subscripting) is a mapping
from indices to elements
array_name (index_value_list) → an element
• Index Syntax
– Fortran and Ada use parentheses
• Ada explicitly uses parentheses to show uniformity
between array references and function calls because
both are mappings
– Most other languages use brackets
Arrays Index (Subscript) Types
• FORTRAN, C: integer only
• Ada: integer or enumeration (includes
Boolean and char)
• Java: integer types only
Heterogeneous Arrays
• A heterogeneous array is one in which the
elements need not be of the same type
• Supported by Perl, Python, JavaScript, and
Ruby
Array Initialization
• C-based languages
– int list [] = {1, 3, 5, 7}
– char *names [] = {″Mike″, ″Fred″, ″Mary
Lou″};
• Ada
– List : array (1..5) of Integer :=
(1 => 17, 3 => 34, others => 0);
Array Initialization
• Python
– List comprehensions
list = [x ** 2 for x in range(12) if x % 3 == 0]
puts [0, 9, 36, 81] in list
Arrays Operations
• APL provides the most powerful array processing
operations for vectors and matrixes as well as unary
operators (for example, to reverse column elements)
• Ada allows array assignment but also catenation
• Python’s array assignments, but they are only reference
changes. Python also supports array catenation and
element membership operations
• Ruby also provides array catenation
• Fortran provides elemental operations because they are
between pairs of array elements
– For example, + operator between two arrays results in an array of
the sums of the element pairs of the two arrays
Slice Examples
• Python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Accessing Multi-dimensioned
Arrays
• Two common ways:
– Row major order (by rows) – used in most languages
– Column major order (by columns) – used in Fortran
– A compile-time descriptor
for a multidimensional
array
Locating an Element in a Multi-
dimensioned Array
• General format
– Location (a[I,j]) = address of a [row_lb,col_lb]
+ (((I - row_lb) * n)
+ (j - col_lb)) * element_size
Compile-Time Descriptors
Associative Arrays
• Design issues:
- What is the form of references to elements?
- Is the size static or dynamic?
• Built-in type in Perl, Python, Ruby, and
Lua
– In Lua, they are supported by tables
Associative Arrays in Perl
• Names begin with %; literals are
delimited by parentheses
%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed"
=> 65, …);
Record Types
• A record is a possibly heterogeneous
aggregate of data elements in which the
individual elements are identified by names
• Design issues:
– What is the syntactic form of references to the
field?
– Are elliptical references allowed
Definition of Records in COBOL
• COBOL uses level numbers to show nested
records; others use recursive definition
01 EMP-REC.
02 EMP-NAME.
05 FIRST PIC X(20).
05 MID PIC X(10).
05 LAST PIC X(20).
02 HOURLY-RATE PIC 99V99.
Elliptical References
• Elliptical references allow leaving out record
names as long as the reference is unambiguous,
for example in COBOL
FIRST, FIRST OF EMP-NAME, and FIRST
of EMP-REC are elliptical references to the
employee’s first name
Operations on Records
• Assignment is very common if the types are
identical
• Ada allows record comparison
• Ada records can be initialized with
aggregate literals
• COBOL provides MOVE CORRESPONDING
– Copies a field of the source record to the
corresponding field in the target record
Tuple Types
• A tuple is a data type that is similar to a
record, except that the elements are not
named
Tuple Types
• Used in Python, ML, and F# to allow
functions to return multiple values
– Python
• Closely related to its lists, but immutable
• Create with a tuple literal
myTuple = (3, 5.8, ′apple′)
Referenced with subscripts (begin at 1)
Catenation with + and deleted with del
Tuple Types in F#
let tup = (3, 5, 7)
let a, b, c = tup
This assigns a tuple to a tuple pattern
(a, b, c)
List Types
Lists n F# and ML
• F# Lists
– Like those of ML, except elements are separated by
semicolons and hd and tl are methods of the List class
• Python Lists
– The list data type also serves as Python’s arrays
– Unlike Scheme, Common LISP, ML, and F#, Python’s
lists are mutable
– Elements can be of any type
– Create a list with an assignment
myList = [3, 5.8, "grape"]
Lists in Python
• List elements are referenced with subscripting, with
indices beginning at zero
x = myList[1] Sets x to 5.8
• List elements can be deleted with del
del myList[1]
• List Comprehensions – derived from set notation
[x * x for x in range(6) if x % 3 == 0]
Evaluation of Unions
• Free unions are unsafe
– Do not allow type checking
• Java and C# do not support unions
– Reflective of growing concerns for safety in
programming language
• Ada’s descriminated unions are safe
Pointer and Reference Types
• A pointer type variable has a range of
values that consists of memory addresses
and a special value, nil
• Provide the power of indirect addressing
• Provide a way to manage dynamic memory
• A pointer can be used to access a location in
the area where storage is dynamically
created (usually called a heap)
Pointers in Ada
• Some dangling pointers are disallowed
because dynamic objects can be
automatically deallocated at the end of
pointer's type scope
• The lost heap-dynamic variable problem is
not eliminated by Ada (possible with
UNCHECKED_DEALLOCATION)
Pointers in C and C++
• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of
when or where it was allocated
• Used for dynamic storage management and
addressing
• Pointer arithmetic is possible
• Explicit dereferencing and address-of operators
• Domain type need not be fixed (void *)
void * can point to any type and can be type
checked (cannot be de-referenced)
Evaluation of Pointers
Locks-and-keys
• Locks-and-keys use pointer values that are
represented as (key, address) pairs
• Heap-dynamic variables are represented as
variable plus cell for integer lock value
• When heap-dynamic variable allocated, lock value
is created and placed in lock cell and key cell of
pointer
Heap Management
• A very complex run-time process
• Single-size cells vs. variable-size cells
• Two approaches to reclaim garbage
– Reference counters (eager approach):
reclamation is gradual
– Mark-sweep (lazy approach): reclamation
occurs when the list of variable space becomes
empty
Reference Counter
• Reference counters: maintain a counter in
every cell that store the number of pointers
currently pointing at the cell
– Disadvantages: space required, execution time
required, complications for cells connected
circularly
– Advantage: it is intrinsically incremental, so
significant delays in the application execution
are avoided
Mark-Sweep
• The run-time system allocates storage cells as
requested and disconnects pointers from cells as
necessary; mark-sweep then begins
– Every heap cell has an extra bit used by collection
algorithm
– All cells initially set to garbage
– All pointers traced into heap, and reachable cells
marked as not garbage
– All garbage cells returned to list of available cells
Disadvantages of Mark-Sweep
• In its original form, it was done too
infrequently.
• When done, it caused significant delays in
application execution.
• Contemporary mark-sweep algorithms
avoid this by doing it more often—called
incremental mark-sweep
Marking Algorithm
Variable-Size Cells
• All the difficulties of single-size cells plus more
• Required by most programming languages
• If mark-sweep is used, additional problems occur
• The initial setting of the indicators of all cells in
the heap is difficult
• The marking process in nontrivial
• Maintaining the list of available space is another
source of overhead
Type Checking
• Generalize the concept of operands and operators
to include subprograms and assignments
• Type checking is the activity of ensuring that the
operands of an operator are of compatible types
• A compatible type is one that is either legal for the
operator, or is allowed under language rules to be
implicitly converted, by compiler- generated code,
to a legal type
– This automatic conversion is called a coercion.
• A type error is the application of an operator to an
operand of an inappropriate type
Type Coercion
• Coercion rules strongly affect strong typing-
-they can weaken it considerably (C++
versus Ada)
• Although Java has just half the assignment
coercions of C++, its strong typing is still
far less effective than that of Ada
Name Type Equivalence