A taxonomy of datatypes

Vladimir Shelekhov

1994, ACM SIGPLAN Notices

A taxonomy of datatypes

Brian Meek
King's College London
b.meek@hazel.cc.kcl.ac.uk

Introduction

This is the second article based on language-independent standardisation work being carried out by international standards working group ISO/IEC JTC1/SC22 WG11 (Language Bindings). The first, Programming languages - towards greater commonality [Meek 1994], was an overview, which briefly described the various projects and the relationships between them. This article goes into one of these projects in greater detail. Both articles are based on presentations to the DECUS (UK and Ireland) symposia, this one given in May 1994. A third article, What is a procedure call?, based on a similar presentation in 1993 [Meek 1993], is projected.

What exactly is a datatype?

It is surprisingly difficult to get people to agree on what constitutes a "datatype". Most have a clear idea of what a datatype is - an idea, anyway - but, as with many things in this field (such as what a procedure call is, discussed in the projected third paper), they are often not the same idea. Data being ubiquitous, the datatype concept crops up in a wide range of contexts, and different contexts - databases, communications, many different programming languages - have their own culture and conventions. Such a variety of background is almost certain to lead to different perceptions.

The taxonomy described here is based, though informally and fairly loosely, on that used in Draft International Standard 11404, Language Independent Datatypes (LID); the "language independent" (LI) indicates the attempt to avoid the presuppositions usually present in the culture and conventions of any particular language - or indeed any other use of the concept. Interpretations and details are the author's own, and should not be assumed necessarily to be present in the standard itself.

In constructing any taxonomy it is necessary to decide not only what a datatype is, but what it is not.
Despite the apparent derivation of the word, it is not simply "a type of data", but a concept in its own right. (This is the main reason for removing the space in "data type".) A thing that has a datatype may not itself be an item of data (at least of that datatype; it may be a value of some other datatype). An integer variable is not itself an integer. If that distinction is not clear (and some programmers find it difficult to see), an X channel (for I/O) in occam is not an X value - it is something that transports X values and only X values. Something "of datatype X" has properties (depending on its own nature) that in some way relate to the type of data that X values are. This is a fine distinction, but it emphatically is not nitpicking: it is crucial to any taxonomy, if it is to have any hope of encompassing generically the many different views that there are.

The next thing is to abandon any representational view of a datatype. Data is an abstraction, capable of representation in many forms, and "datatype" is more abstract still; it is disastrous to link the concept to particular representational forms. This applies as much to second-order representational assumptions (e.g. a complex value as an ordered pair of real values, or an array as a contiguous block of values) as it does to first-order assumptions such as bit patterns. Such assumptions may be useful, even essential, in particular contexts, but a generic taxonomy cannot afford such restrictions.

Other things that have to go from the definition of datatype are the operations that can be performed on the data values. Many languages people are startled by this, even horrified. To them, the values and the operations on them are inseparable. In that data in a program is of limited use if you cannot do operations, this is an understandable attitude, but it overlooks two things.
One is that you can distinguish between static and dynamic aspects - programming languages separate static data declaration and dynamic procedural aspects, implicitly if not explicitly. The point of this is that some functionality related to languages is purely static as far as the data is concerned. For example, a data transmission channel operates with the data (dynamic) but not on the data, which is unchanged (if the channel is working correctly!). Operations such as addition and subtraction are an irrelevance here, and can even be a nuisance, for example when talking of conformity to a standard.

159 ACM SIGPLAN Notices, Volume 29, No. 9, September 1994

The other overlooked aspect is that it is not always obvious what the operations should be. Languages do not always agree, and some with "derived datatype" facilities allow default operations to be suppressed or replaced, and others added. An example is that, for a new Integer called year, add and multiply are suppressed (meaningless) and subtraction is replaced; the subtraction is normal but the datatype of the result is not year but a datatype representing number of years.

I am not saying you could not include operations in a taxonomy, but to do it properly needs the full machinery of object-orientation. In standards work you need to set yourselves a specific achievable target, not make an open-ended commitment, so DIS 11404 does not attempt it. A taxonomy based on values alone is infinite, but manageably infinite; allow operations, and it becomes hard to see how or where you would ever stop, quite apart from (in a standard) specifying testable conformity requirements.

The basis of the taxonomy

The taxonomy follows the common practice of starting with a number of primitive datatypes and then using these to construct others. There are three main kinds of constructed datatypes: subtypes, generated datatypes, and aggregates.
(In fact aggregates are technically also generated datatypes, but important enough to deserve separate classification.)

Primitive datatypes

Primitive datatypes are datatypes whose values are regarded as fundamental - not subject to any reduction. They just "are". Those values are what some languages have called "atomic", or "plain values". Many primitive datatypes are also generic, in the sense that they have an unlimited number of values, and hence the datatypes often used in practice are confined to a finite subset of them. The reason that they are used in the taxonomy, rather than "actual" achievable datatypes, is threefold: it is a convenient way to identify a class of datatypes which is infinite in extent; language definitions commonly use them, meaning that they simplify the binding between the LI datatypes and the ones used by specific languages; and it allows for the possibility of actually supporting them if a language is designed to do so (in the sense, for example, that an integer of any arbitrary magnitude can be accommodated, even if in practice at some point you will run out of storage or time).

It is important here to distinguish "a language" from any given implementation of that language. A language definition will usually specify Integer as a datatype, but leave the actual range of integers supported to the implementation. The most they are likely to specify is that the range must be of continuous values and cover at least a specific minimal range; some do not even go that far. However, for the LI datatype taxonomy, each one of all the possible subsets of the integer domain is a distinct datatype. If you are specifying a LI service in terms of LI datatypes, you cannot just bind Integer to integer and leave it at that - you won't always be able to be certain that a service will work. For some services, a mismatch, or certain kinds of mismatch, will not matter, but for some (or for some uses of the service) it can matter.
Programmers have long experienced such problems when transferring applications, even from one implementation of a language to another - and even when both conform to the language standard, if the standard has weak conformity requirements in this area.

The primitive datatypes of the taxonomy are Boolean, State, Enumerated, Character, Ordinal, Date-and-time, Integer, Rational, Scaled, Real, Complex and Void. This is a much longer list than that which most languages designate as "primitive", sometimes because they classify datatypes differently, sometimes because they represent some in terms of others (or assume that the programmer will). A simple example of such a represented datatype is Complex as an ordered pair of real numbers, the "real" and "imaginary" parts. However, LID eschews this form of "representation" as well as the more obvious bit-level form. A LID datatype is just a set of values, and using the cartesian form to identify them is only a convenience for some purposes - think of the polar form, for example. Similarly, Rational is not a directly supported datatype of any well-known language (though Forth goes some way towards it), but its value-space is distinctive, and here again the integer-pair relationship runs into trouble, e.g. through multiple representations (2/4, 3/6, 34/68 etc) and special cases (1/0, etc). It is best treated as primitive in its own right.

Most of these primitive datatypes are generic, the actual specific, usable datatypes being derived by the use of parameters or qualifiers. Boolean and Void are exceptions. (An earlier paper [Meek 1990] discusses the multiplicity of ways two-valued datatypes like Boolean can be - and are - handled.
)

Subtypes

In the taxonomy subtypes are created by modifying the value-space of a "base" datatype in various ways - specifying a range or size; selecting values; excluding values; extending the value-space; or defining explicitly how the value-space is constructed from that of a "base" datatype. Any combination of these is possible too.

These are fairly self-explanatory, even though eyebrows might be raised at the idea of extending, where the subtype ends up with a wider value space than the base datatype! However, in the taxonomy any datatype can be used as the base, not just the primitive ones, and in that context extension is a useful subtype constructor, e.g. you can make a new subtype by extending an existing one.

Generated datatypes

The (non-aggregate) generated datatypes in the taxonomy are Pointer, Procedure, and Choice datatypes. Such datatypes are produced from other datatypes by the methods familiar from languages that include them, e.g. in the case of a procedure datatype it is constructed from the datatypes of the procedure parameters and of the returned result (if any). The primitive datatype Void is useful here for subroutine procedure datatypes that do not return a value. Void is also used in choices, where not making any choice between alternative "proper" datatypes is an allowed option.

Aggregate datatypes

An aggregate datatype is one whose values are made up of a number, in general more than one, of component values, each of which is a value of another datatype. In many ways this is the most complex and interesting subclass of datatypes, simply because of the number of combinations and variations that are possible.

It is important for any taxonomy to get clear what qualifies as an aggregate, what you regard as a component of it, and what it means to talk about a value of such a datatype.
Here, the statement that each value of an aggregate has "in general" more than one component value does not preclude cases where the "aggregate" value has only one component, or even none at all. (As far as the LID standard is concerned, languages may allow such "degenerate" cases in their own right, regard them as equivalent to non-aggregate types, or forbid them altogether.) Next, a component value may itself be of an aggregate datatype. The difference between aggregate and non-aggregate components is immaterial when considering the properties of the composite; while "inside" the aggregate the individual components remain as single entities ("closed boxes"). Similarly, the aggregate as a whole is regarded as having a single value, a closed box, whose datatype is determined by the datatypes of its components and the structure of the aggregate.

The approach adopted here is to start with the most general form of aggregate datatype, capable of containing anything, and then to describe additional properties or constraints used to identify various kinds of aggregate datatype that are encountered computationally. These properties and constraints are not all mutually orthogonal; they may interact with others, in various ways. The particular mix of properties used for a given aggregate datatype will depend on the envisaged computational uses of the datatype and its values. This taxonomy shows a way of building any of the commonly-found forms of aggregate, and how to construct others if needed, by appropriate mixing and matching of a relatively small number of properties. The taxonomy is in fact capable of expressing bizarre conceptual datatypes of no practical interest, including unlimited numbers of components, values not in practice representable, and so on - again sharing this property with many definitions of programming languages.

The main kinds of aggregate in the taxonomy are Bag, Set, Record, Sequence, Array, and Table.
These will be introduced as they appear in the discussion of the various properties aggregates can have. They are technically also datatype generators, though we shall call them datatypes for simplicity - the components will be of one or more base datatypes from which the aggregate datatype values are constructed.

The most general aggregate datatype

The most general aggregate datatype is one whose values are each made up of any number of component values (including zero or one component values), every such component value being of any datatype at all. This kind of aggregate datatype is called a Bag. It is completely unstructured, with no internal relationships at all. It is of limited practical interest but is useful in the taxonomy since all other aggregate datatypes can be expressed in terms of constraints and properties applied to it.

Distinguishing components by component values

Since a Bag is completely unstructured, it is not in general possible to distinguish between different component values of a Bag value. This is because the components may have the same datatype and the same value of that datatype: a Bag value may contain duplicate components. An example can be found in the simple probability exercise: a bag contains four white balls and eight black balls; what is the probability that the first two balls removed from the bag will be of the same colour?

Usually, component values of aggregate values are distinguished by their relationships with the structure of the aggregate datatype as a whole, and relationships which exist between the components within the aggregate as a result of their membership of the aggregate (rather than as values of their own datatype).

A Set, in this taxonomy, is a Bag which has the constraint added that there are no duplicate values, i.e. given two components, either their two datatypes are different, or they have the same datatype but have two different values of that datatype.
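
As a rough illustration of the Bag and Set definitions, here is a minimal Python sketch. It assumes that a component can be modelled as a (datatype name, value) pair and a Bag value as a multiset of such pairs; the names make_bag, is_set and prob_first_two_same are invented for this illustration and do not come from DIS 11404.

```python
from collections import Counter
from fractions import Fraction


def make_bag(components):
    """Model a Bag value as an unordered multiset of (datatype name, value) pairs."""
    return Counter(components)


def is_set(bag):
    """A Bag value is also a Set value when no (datatype, value) pair is repeated."""
    return all(count == 1 for count in bag.values())


def prob_first_two_same(bag):
    """Probability that two components drawn without replacement are duplicates."""
    total = sum(bag.values())
    same = sum(n * (n - 1) for n in bag.values())
    return Fraction(same, total * (total - 1))


# The probability exercise: four white balls and eight black balls.
balls = make_bag([("Colour", "white")] * 4 + [("Colour", "black")] * 8)

# Components of different datatypes are never duplicates of one another,
# so this Bag value satisfies the Set constraint.
mixed = make_bag([("Integer", 1), ("Boolean", True), ("Integer", 2)])

print(is_set(balls), is_set(mixed), prob_first_two_same(balls))
```

Pairing each value with its datatype name is what keeps ("Integer", 1) and ("Boolean", True) apart here; Python on its own would treat 1 and True as equal, which is exactly the kind of language-specific presupposition the taxonomy tries to avoid.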

Homogeneity

Homogeneity is another constraint that can be added, either to Bag or Set. This means that the component values are drawn solely from one datatype, known as the base datatype. If the base datatype is B, the datatypes resulting are Bag of B and Set of B respectively.

Between the total generality of unconstrained Bag and Bag of B there would seem to be a limitless number of intermediate cases where component values may be drawn from a range of different datatypes but not all possible datatypes. In this taxonomy many such possibilities can be accommodated by using Choice.

Size

Returning to the most general Bag, another constraint that can be placed on values is the size of the aggregate value, i.e. the allowed number of components. In fact this consists of two constraints, one being the lower limit on the number of components, which must exist, and the other being the upper limit, which may or may not exist. If these two are the same, every value of the aggregate datatype has the same size; if they are not, then any size between the lower and upper limits is possible. The size constraint on aggregate values should not be confused with the number of different possible such values, which depends on the number of possible combinations of possible values of the components within their own datatypes.

The lower limit on size must be at least zero, the upper limit at least the same as the lower limit. For the purposes of the taxonomy, it is assumed that both limits are actually achieved; if no upper limit exists, this means that values of the aggregate datatype of any arbitrary size are valid. It also means that the aggregate datatype has an unlimited (infinite) set of possible values. In any practical case of course, only a finite (though perhaps unspecified) number of actual aggregate values, each of a finite (though perhaps not predetermined) size, will be used.
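
The two size limits, and the difference between the size of a value and the number of possible values, can be sketched as follows. This is only an illustration under stated assumptions: SizeConstraint and count_bag_values are hypothetical names, an absent upper limit is modelled as None, and the count uses the standard multiset formula C(n + k - 1, k) for a homogeneous Bag of B whose base datatype has n values.

```python
from dataclasses import dataclass
from math import comb
from typing import Optional


@dataclass(frozen=True)
class SizeConstraint:
    """Size limits for an aggregate datatype (illustrative names only)."""
    lower: int = 0                 # the lower limit must always exist
    upper: Optional[int] = None    # None models "no upper limit exists"

    def __post_init__(self):
        if self.lower < 0:
            raise ValueError("lower limit must be at least zero")
        if self.upper is not None and self.upper < self.lower:
            raise ValueError("upper limit must be at least the lower limit")

    def allows(self, size: int) -> bool:
        """True if an aggregate value with this many components is permitted."""
        return size >= self.lower and (self.upper is None or size <= self.upper)

    def is_fixed(self) -> bool:
        """True if every value of the aggregate datatype has the same size."""
        return self.upper == self.lower


def count_bag_values(n_base_values: int, size: SizeConstraint) -> int:
    """Number of distinct values of Bag of B when B has n_base_values values."""
    if n_base_values < 1:
        raise ValueError("this sketch assumes a non-empty base datatype")
    if size.upper is None:
        raise ValueError("no upper limit: the set of values is unlimited")
    return sum(comb(n_base_values + k - 1, k)
               for k in range(size.lower, size.upper + 1))


# Bag of Boolean with exactly three components: the size is fixed at 3,
# yet the datatype has four distinct values (TTT, TTF, TFF, FFF).
print(count_bag_values(2, SizeConstraint(3, 3)))
```

The size constraint (3 components) and the number of values (4) are different numbers answering different questions, which is the distinction drawn in the text.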
More complicated situations are possible where the size of each aggregate value may be any one of two or more possibilities not expressible in terms of lower and upper bounds of a contiguous range: e.g. the size might be defined as a multiple of three (3 components, 6 components, ...). Again, the taxonomy can be extended to cover this by use of Choice.

It is at this point that it becomes evident that there may be interaction between the various constraints. For example, if Set of B is defined where the base datatype B has a finite number n of distinct values, then because of the constraint "every component value is distinct" in the definition of Set, n is the largest number of components that any value of datatype Set of B can possibly have.

In general from now on, additional properties or constraints used to identify various kinds of aggregate datatype may interact with others, in various ways.

The aggregate datatype and its base datatype

It is very important in this taxonomy to distinguish between properties of the aggregate datatype and those that the base datatype has in its own right. In general, any property of the base datatype does not induce the same or a similar property in any aggregate datatype whose values are composed of its values, except perhaps where it interacts with a property of that aggregate datatype. For example, the finiteness of the base datatype in the above example does induce a constraint on the upper limit of size of values of the aggregate datatype, but only through interaction with another constraint on the aggregate datatype. It does not induce a similar constraint for Bag. Similarly, a base datatype may have a specific ordering, but this does not induce a similar ordering of the values of the aggregate, or of the components within any aggregate value.
In fact, as will be seen, either or both such orderings at the aggregate level will normally be different from any ordering of the base datatype values, and indeed can validly exist whether a base datatype ordering is defined or not.

Distinguishing components by tagging

It is possible to distinguish the different components by "tagging" each one; for example, a language may provide a means of defining names (identifiers) for the various components. The term "tag" is used here simply to avoid any implication that the distinguishing syntax is necessarily an alphanumeric identifier - it could, for example, be a numerical label. The essence of tagging is that the tags are not themselves members of a datatype, and have no significance except as a means of referring to the aggregate components concerned. In particular, they do not imply any ordering of the components within the aggregate. (For example, numerical labels when used as tags are not members of an arithmetic datatype, and have no arithmetic properties.) However, the combination of a tag with a value of the relevant aggregate datatype will of course have a datatype, that of the component selected. In this sense the tag does have an associated datatype, that of the aggregate component it tags, but on its own it is not a "value" of anything. Of course, the datatype of any given tagged component can be a Choice datatype, of any required degree of complexity.

Theoretically, tagging can be partial, i.e. only some of the components are given tags. This taxonomy could be extended to include it, but for current purposes the complications this would entail are not justified by the benefits.

Record datatypes

A Record datatype in this taxonomy can be described as a Bag in which each component is tagged.
However, tagging imposes so many constraints on what is in the Bag that the unstructured nature of the unconstrained Bag only remains through the absence of homogeneity (see above) and of ordering (see below). In this taxonomy, tagging determines the number of components (i.e. it fixes the size of the aggregate), and it fixes the datatypes of them. However, as noted earlier, flexibility of component datatype can be achieved through using Choice datatypes for them, and similarly Choice of Record can be used to achieve size variability, and other alternatives such as "variant records" which may be needed whether or not the number of components is variable.

Ordered datatypes

In Bags, Sets and Records, the components of the aggregate are unordered. That means that, given any pair of components, it is meaningless to ask whether either comes before (precedes) or after (succeeds) the other. If ordering is a property which is added to the aggregate, this means that an ordering relationship exists between the various components. This can be done simply by giving mutual ordering properties between the components (e.g. suitable operations on them); by giving them a position (in some sense) in the overall aggregate structure, which induces an ordering; or both. Any precedence relationship between components is independent of any precedence relationship between the values of those components. For example if C1 and C2 are components of an aggregate and C1 comes before C2 in the aggregate, then this says nothing about ordering of the values. If the aggregate is inhomogeneous, C1 and C2 may have different datatypes. If it is homogeneous, C1 and C2 may be members of a datatype which is unordered. If the datatype of C1 and C2 is an ordered datatype, the value of C1 need not precede the value of C2 by the ordering rules of that datatype.

Ordering of components within aggregates does not result in an ordering of values of the aggregate datatype.
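
A small Python sketch may make two of these points concrete: that a Record is a fixed set of tagged components with no ordering, and that the ordering of components in an ordered aggregate is independent of any ordering of their values. Everything named here (publication, is_record_value, values_comparable) is hypothetical, and modelling a Record value as a dict and an ordered aggregate as a tuple is an assumption of the sketch, not something the taxonomy prescribes.

```python
# Hypothetical Record datatype: each tag fixes the datatype of one component.
publication = {"title": str, "year": int}


def is_record_value(value, record_type):
    """True if value has exactly the tagged components the Record requires."""
    return (value.keys() == record_type.keys()
            and all(isinstance(value[tag], record_type[tag]) for tag in record_type))


# Records are unordered: the order in which the tagged components are written
# down does not affect the value.
r1 = {"title": "A taxonomy of datatypes", "year": 1994}
r2 = {"year": 1994, "title": "A taxonomy of datatypes"}

# An ordered aggregate, modelled as a tuple: position gives the component ordering.
pair = ("pear", "apple")       # C1 = "pear" comes before C2 = "apple"
mixed = (1994, "September")    # inhomogeneous: C1 and C2 have different datatypes


def values_comparable(c1, c2):
    """True if Python defines an ordering between the two component values."""
    try:
        c1 < c2
        return True
    except TypeError:
        return False


print(r1 == r2)                    # same Record value either way
print(pair[0] < pair[1])           # C1 precedes C2, but its value does not
print(values_comparable(*mixed))   # components ordered, values not comparable
```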
In this taxonomy, the values of any datatype either are totally ordered, or are unordered. Theoretically datatypes can exist which are partially ordered. Extending the taxonomy to include such a concept is possible but would add considerable complication with little apparent benefit. For the purposes of this taxonomy it is unnecessary to pursue this further.

(Few if any actual programming languages seem to support partially ordered datatypes, and in practical applications needing it, often it is adequate to use total ordering but then either ignore ordering when not required, or check that it is meaningful in a particular case.)

In this taxonomy, Sequence is an aggregate with a strict and unique ordering, but whose components in general are not distinguishable from one another in any other way than by this ordering. By "strict and unique" is meant that every component (except the first) has one and only one immediate predecessor, and (except the last) one and only one immediate successor. A Sequence can be homogeneous (Sequence of B) or not, and its size may be fixed or variable in any of the ways discussed earlier.

Starting from the first component and taking successor after successor until the last component is reached, each component can be related to the Sequence as a whole, in terms of "distance" from one end of the Sequence or the other. This does not imply that any given component can be accessed in any other way than by systematic searching for it through the ordering. There may not even be any direct means of identifying which is "first", though commonly in languages this can be indicated through the lexical ordering of the definition of aggregate values. Notionally components may be distinguished (keyed or indexed, see below) by implicit association with ordinal values first, second, third, ..., but in this taxonomy a Sequence does not have a general such method of picking out any individual component.
Ways may be provided (e.g. special operations on the aggregate as a whole) of finding (say) the first or the last, but not for all. Operations to find the (immediate) successor or predecessor are characteristic of Sequence in this taxonomy, and those do apply to all components. In this taxonomy, therefore, Sequence is distinct from Vector, which has a similar structure but is indexed, as shown below.

The taxonomy can be extended to include cyclic datatypes (where the choice of "first" component is arbitrary and the successor of the "last" is the "first"), and cases with more complicated topology (directed graphs or digraphs). Here too, for current purposes the complications this would entail are not justified by the benefits. If it is really needed in a particular case, it can be done at the operations level, by suitably modifying those for obtaining the predecessor or successor of a component.

Recursive aggregates

A special situation arises when a datatype is defined recursively, i.e. where the base datatype of the components is a Choice which includes, as one of the alternatives, the aggregate datatype itself. (There has to be a Choice, to enable the recursion to terminate.) While in principle this can apply generally, in this taxonomy only one case appears explicitly, namely Tree, identified as a recursively defined Sequence, i.e. Tree of B is Sequence of (Choice of (B, Tree of B)). In some programming languages this kind of aggregate is called a list, but the word "list" is avoided here because it is sometimes taken to mean what in this taxonomy is called a Sequence.

Sequence can be regarded as a "flat" form of Tree, where the recursive property is not used; this is clear from the definition of Tree. In fact, assuming other constraints (such as size constraints) are the same for both, in general the values of the datatype Tree of B will include all of the values of the datatype Sequence of B.
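
The recursive definition, Tree of B is Sequence of (Choice of (B, Tree of B)), can be checked mechanically. The following is a minimal sketch, assuming a Sequence value is modelled as a Python tuple and the base datatype B as a Python type other than tuple; is_sequence_of and is_tree_of are illustrative names only.

```python
def is_sequence_of(value, base):
    """True if value is a Sequence of B: an ordered aggregate of base values."""
    return isinstance(value, tuple) and all(isinstance(c, base) for c in value)


def is_tree_of(value, base):
    """True if value is a Tree of B, i.e. a Sequence of (Choice of (B, Tree of B)).

    Each component is either a base value (one branch of the Choice, where the
    recursion stops) or is itself a Tree of B (the recursive branch).
    """
    return isinstance(value, tuple) and all(
        isinstance(c, base) or is_tree_of(c, base) for c in value
    )


flat = ("a", "b", "c")
nested = ("a", ("b", "c"), ("d", ("e", "f"), ("g", "h")))

print(is_sequence_of(flat, str), is_tree_of(flat, str))      # a Sequence is also a Tree
print(is_sequence_of(nested, str), is_tree_of(nested, str))  # a Tree need not be a Sequence
```

The base-value branch of the Choice is what lets the check terminate, mirroring the remark that there has to be a Choice for the recursion to end.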

Distinguishing components by keying

Tagging was introduced earlier as a means of identifying each individual component of an aggregate, and it was noted that in this taxonomy the tags are not values of a datatype, and have no significance except as names for the components they reference.

Keying is similar to tagging except that keys are values of a datatype; i.e., there is a one-to-one correspondence between values of the key datatype and the components of the aggregate datatype. The key datatype may or may not be ordered, and this ordering may or may not be used in relation to the keying; that is, even though the datatype the keys are taken from is an ordered datatype, this property need not be used by the aggregate datatype being keyed.

Keying may either be internal or external to the aggregate datatype. If the keying is internal, this means that the values of the keys appear themselves explicitly in the aggregate as components in their own right. In general, this results in the aggregate datatype having more than one dimension, so further discussion of internal keying will be deferred until the discussion of dimensionality.

If keying is external, the key values are used more like tags but with the added feature that they have properties of their own. In particular, if the key datatype is ordered, it induces an equivalent ordering on the components of the aggregate. In the case of internal keying, any ordering of the key datatype will not in general induce an ordering of the aggregate components, since that will depend on where the keys appear as components themselves.

In this taxonomy, a keyed aggregate is homogeneous. The base datatype, as usual, can be made a Choice, for greater flexibility.

As with tagging, keying could be partial, but this taxonomy is not extended to include that for the same reasons as before.
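
External keying can be sketched with a Python dict, on the assumption that the dict keys play the part of the key datatype values and the dict values are the (homogeneous) components. The helper names and the sample data are invented for illustration.

```python
def has_ordered_keys(keyed):
    """True if Python can sort the keys, i.e. the key datatype is ordered here."""
    try:
        sorted(keyed)
        return True
    except TypeError:
        return False


def components_in_key_order(keyed):
    """Components in the ordering induced by an ordered key datatype."""
    return [keyed[key] for key in sorted(keyed)]


# Integer keys: an ordered key datatype induces an ordering on the components.
by_year = {1994: "c", 1990: "a", 1993: "b"}

# Complex keys: still values of a datatype, but with no ordering to induce.
by_point = {1 + 2j: "p", 3 - 1j: "q"}

print(components_in_key_order(by_year))
print(has_ordered_keys(by_year), has_ordered_keys(by_point))

# Unlike tags, keys have properties of their own and can be computed with.
print(by_year[1993 + 1])
```

The last line is the practical difference from tagging: a key can be the result of an operation on its own datatype, whereas a tag is only a name.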

Distinguishing components by indexing

Indexing is a special case of keying where the values of the key datatype are an uninterrupted range of successive values from a base datatype from which the keys are drawn. The number and values of the keys are defined in terms of the lower and upper bounds of the range. Indexing is commonly used in languages since it can provide a bridge between mathematical concepts such as indexed variables, which are useful in many applications, and an efficient representational method using contiguous storage locations and simple means of calculating the storage position of an individual indexed component.

A Vector is an aggregate with one index, where indexing is understood in the sense described here. Each value of the index identifies a unique component of the aggregate datatype, and the ordering of the index datatype induces an ordering on those components. A Vector becomes a Sequence if the index datatype values identifying the components are discarded, but the ordering remains, though to be able to extract individual components some means other than indexing of identifying individual components will be needed.

Some languages merge, or regard as interchangeable, the concepts of Sequence and Vector. Some treat all Vectors of the same length as indistinguishable, and some place constraints on the index datatype and/or on the bounds. Examples of constraints are restricting the index datatype to Integer, and fixing the lower bound.

Some languages allow (or assume) that Vectors (and/or Arrays, see below) are of variable bounds or size. In this taxonomy, this can be allowed for by suitable use of Choice. Variable Vectors could equally be provided by extending the taxonomy to allow either or both of the actual bounds of any aggregate value to be one of a range.
It could also be done by "padding" shorter aggregate values with meaningless values for the "missing" components, or by accompanying each aggregate value by other values identifying the meaningful values, but in this taxonomy these are regarded as representational matters.

As with tagging and keying, indexing could be partial, but this taxonomy is not extended to include that for the same reasons as before.

To summarise the differences between tagging, keying and indexing, tags individually identify components but are not values of a datatype, while keys do the same but are values of a datatype. Keys may or may not be ordered, and can be provided externally or internally. Indexes are external keys forming a consecutive range from an ordered datatype. Partial use (some but not all) has already been referred to, and the use of a mixture (e.g. some components are indexed, the rest tagged), though theoretically possible, is left out of this taxonomy for similar reasons. However, occasion may arise where hybrids occur, for example aggregates which are both keyed and tagged, so components can be identified in two different ways. That the same aggregate value can be used either as a Record or as a Vector is not in itself a problem, but in general it will be necessary to determine which of the two is required for an external mapping.

Aggregates with more than one dimension

The discussion so far has been of aggregates either with no structure or where the structure is essentially one-dimensional. This is a consequence of the property of ordering. A Tree might be regarded as an exception, but in this taxonomy it is regarded as one-dimensional even if some components have substructure.
Indeed, since the components which are themselves Trees can be regarded as one-dimensional in the same way, the substructures can be "unpacked" recursively just as they are built recursively, until no substructures remain, in which case a sequence of base datatype components has been produced. This process, sometimes called flattening, is similar to one that can be used in list processing as a simple form of searching, though in that case the substructures are "visited" but not removed. This is often displayed lexically by use of parentheses, where e.g. (a, (b, c), (d, (e, f), (g, h))) is a Tree and (a, b, c, d, e, f, g, h), with the parentheses removed, is the flattened version. Note that this "unpacking" of the Tree components of the original Tree releases their own components to the parent aggregate value, but does not destroy the ordering of them.

In this taxonomy, an aggregate datatype is multidimensional if more than one piece of information is needed to identify each component. For example, one might need two indexes, or two keys, or one key and then an ordering, or one key and one index. Each of these would be two-dimensional. An aggregate needing three indexes, or two indexes and a key, would be three-dimensional, and so on.

Multidimensionality can either be inherent or induced. Multidimensionality is inherent if the aggregate datatype is defined directly in terms of using more than one piece of information to identify each component. The most common example of inherent multidimensionality is the Array datatype. Array (lb1:ub1, lb2:ub2) of B, where B is the base datatype and lb1, ub1, lb2, and ub2 are values of an index datatype, is a two-dimensional aggregate datatype, and two index values, one for each dimension, are needed to identify each component; this can readily be extended to any required number of dimensions. The index datatype is the same for all dimensions.
Aggregate datatypes are possible that are defined similarly but do not have this constraint, but in this taxonomy such a datatype is not called an Array. A Vector is identical to an Array with only one dimension.

With inherent multidimensionality, there is no ordering of dimensions, and hence no general ordering of components, though the components along each dimension, when the indexes for the other dimensions are fixed, are ordered (induced, as usual, by the ordering of the index values). The ordering of definition of the bounds as shown above is purely lexical, and the use of 1 and 2 etc. in the names used for the bounds is purely a linguistic convenience without any implied ordering.

Multidimensionality is induced by defining an aggregate datatype as an aggregate of components which are themselves aggregates, but which are then unpacked, so that their own components are thereafter regarded as components of the larger datatype. Here, unpacked is used in the sense that, though the outermost structure boundaries have been removed, the "exposed" components retain any properties they had as part of that (component) aggregate, and make them available in the overall aggregate. For example, an unpacked Record will retain its keys, a Vector will retain its indexing, and an ordered aggregate will retain the ordering of the components that came from it. Various kinds of two-dimensional and higher-dimensional "tabular" datatypes can be produced in this way. We have already used the concept of unpacking in relation to flattening of a Tree, where it can be noted that the ordering of components within subtrees is retained on unpacking, hence making the flattened Tree ordered and hence a Sequence.

Note that the definitions make it possible to produce a multidimensional aggregate with the same components either by direct definition (inherent) or by unpacking aggregates of aggregates.
In this taxonomy, the two are kept distinct, and in particular properties like ordering may be different between the two. The definition of inherent multidimensionality does not imply an ordering of dimensions or of components, whereas unpacking of aggregates of (ordered) aggregates preserves the ordering within the (internal) aggregates. Thus, for example, a Vector of Vectors is one-dimensional (e.g. [V1, V2]) before unpacking, and two-dimensional and ordered (e.g. [v11, v12, v21, v22]) after unpacking, whereas the equivalent two-dimensional Array is not (totally) ordered. These distinctions are maintained in this taxonomy because it is logically possible and because languages exist that make such distinctions. However, some languages may regard them as equivalent, or even define the multidimensional aggregates as being built up of successive aggregating and unpacking of one-dimensional aggregates.

The definition of unpacking has further consequences when higher dimensionality than two is involved, with various intermediate stages. For example, a three-dimensional Array will have corresponding structures of the kind Vector of Vector of Vector, Vector of two-dimensional Array, and two-dimensional Array of Vector. Each of these will have its own ordering rules.

Finally, a Table is a special form of multidimensional aggregate with internal keying. It is easiest to visualise in its two-dimensional form, with one or more columns (say) holding keys to identify particular rows. The complications possible in building Tables, using similar elaborations to those already discussed, can be left for the reader to imagine!

Subaggregates

As noted, it is possible to obtain an aggregate of lower dimensionality by fixing one or more components in other dimensions. For example a two-dimensional Array can yield "row" Vectors and "column" Vectors, depending on which of the two indices is fixed.
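The derivation of "row" and "column" Vectors from a two-dimensional Array can be sketched in Python. This is only an illustration under stated assumptions, not anything taken from the standard: the Array is modelled as a mapping from index pairs to components, a Vector as a mapping from a single index to a component, and the names make_array, row_vector, column_vector and as_sequence are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumed representation: an Array (lb1:ub1, lb2:ub2) of B is a mapping
# from index pairs to components; a Vector maps one index to a component.
Array2D = Dict[Tuple[int, int], int]
Vector = Dict[int, int]


def make_array(lb1: int, ub1: int, lb2: int, ub2: int,
               fill: Callable[[int, int], int]) -> Array2D:
    """Build a two-dimensional Array with inclusive bounds on each index."""
    return {(i, j): fill(i, j)
            for i in range(lb1, ub1 + 1)
            for j in range(lb2, ub2 + 1)}


def row_vector(array: Array2D, i: int) -> Vector:
    """Fix the first index; the second index remains as the Vector index."""
    return {j: value for (r, j), value in array.items() if r == i}


def column_vector(array: Array2D, j: int) -> Vector:
    """Fix the second index; the first index remains as the Vector index."""
    return {i: value for (i, c), value in array.items() if c == j}


def as_sequence(vector: Vector) -> List[int]:
    """Discard the index values but keep the ordering they induce."""
    return [vector[k] for k in sorted(vector)]


a = make_array(1, 2, 1, 3, lambda i, j: 10 * i + j)
row = row_vector(a, 2)        # components with first index fixed at 2
column = column_vector(a, 3)  # components with second index fixed at 3
```

In this sketch each derived Vector keeps the index values of the dimension that was not fixed, and as_sequence shows the ordering that those index values induce, whereas the mapping used for the Array itself carries no total ordering of its components.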
Other methods of producing subaggregates are selecting subsets or subranges of keys or indexes, which can apply equally to one-dimensional aggregates. Mostly the various possibilities will be clear from the nature of the original datatype. Note that the properties of the subaggregate may not necessarily be the same as those of the original. Clearly a reduction of dimensionality changes that property, but it can change others too; for example, a Vector derived from an Array of higher dimensions will have the induced ordering of the indexing whereas the original was not (totally) ordered. Properties may also be lost, e.g. a subaggregate produced from a Vector by selecting specific and non-consecutive values of its index will no longer be a Vector, and the residual index values effectively form a kind of external keying.

The properties that are preserved, changed, gained or lost can be deduced from consideration of the properties of the original, including index or key datatypes as appropriate, and the method of obtaining the subaggregates.

Derived datatypes and datatype generators

The taxonomy allows for new datatypes to be produced from existing ones (copies, or "clones") and for further datatypes and datatype generators to be derived from the basic primitive and generated ones - Tree (mentioned above) is one example, and CharacterString and BitString are others. Non-aggregate derived datatypes include Bit, Modulo, and TimeInterval. Space precludes detailed discussion of them, but details are in the standard.

Conclusion

The taxonomy has been introduced here to show the kind of way in which the LID and other LI standards can distil the essence of commonality of concept that underlies the often confusing and conflicting versions that exist in programming languages and elsewhere [Meek 1994].
Its utility, and the utility of the forthcoming standard, if properly exploited, is that it will help to bridge the gulf between the different, incompatible approaches of languages and systems that have so bedevilled users over the years.

References

ISO/IEC DIS 11404, Language Independent Datatypes, 1994

B.L. Meek, Two-valued datatypes, Sigplan Notices of the ACM, Vol 25 No 8, pp 75-79, August 1990

B.L. Meek, What is a procedure call?, DECUS Symposium 1993, submission to Sigplan Notices of the ACM in preparation

B.L. Meek, Programming languages - towards greater commonality, Sigplan Notices of the ACM, April 1994 (based on DECUS Symposium 1992 presentation)
