Field (computer science)

From Infogalactic: the planetary knowledge core
Jump to: navigation, search

In computer science, data that has several parts, known as a record, can be divided into fields. Relational databases arrange data as sets of database records, also called rows. Each record consists of several fields; the fields of all records form the columns.

In object-oriented programming, field (also called data member or member variable) is the data encapsulated within a class or object. In the case of a regular field (also called instance variable), for each instance of the object there is an instance variable: for example, an Employee class has a Name field and there is one distinct name per employee. A static field (also called class variable) is one variable, which is shared by all instances.[1]

Fixed length

Fields that contain a fixed number of bits are known as fixed length fields. A four byte field for example may contain a 31 bit binary integer plus a sign bit (32 bits in all). A 30 byte name field may contain a persons name typically padded with blanks at the end. The disadvantage of using fixed length fields is that some part of the field may be wasted but space is still required for the maximum length case. Also, where fields are omitted, padding for the missing fields is still required to maintain fixed start positions within a record for instance.

Variable length

A variable length field is not always the same physical size. Such fields are nearly always used for text fields that can be large, or fields that vary greatly in length. For example, a bibliographical database like PubMed has many small fields such as publication date and author name, but also has abstracts, which vary greatly in length. Reserving a fixed-length field of some length would be inefficient because it would enforce a maximum length on abstracts, and because space would be wasted in most records (particularly if many articles lacked abstracts entirely).

Database implementations commonly store varying-length fields in special ways, in order to make all the records of a given type have a uniform small size. Doing so can help performance. On the other hand, data in serialized forms such as stored in typical file systems, transmitted across networks, and so on usually uses quite different performance strategies. The choice depends on factors such as the total size of records, performance characteristics of the storage medium, and the expected patterns of access.

Database implementations typically store variable length fields in ways such as

  • a sequence of characters or bytes, followed by an end-marker that is prohibited within the string itself. This makes it slower to access later fields in the same record because the later fields are not always at the same physical distance from the start of the record.
  • a pointer to data in some other location, such as a URI, a file offset (and perhaps length), or a key identifying a record in some special place. This typically speeds up processes that don't need the contents of the variable length field(s), but slows processes that do.
  • a length prefix followed by the specified number of characters or bytes. This avoids searches for an end-marker as in the first method, and avoids the loss of locality of reference as in the second method. On the other hand, it imposes a maximum length: the biggest number that can be represented using the (generally fixed length) prefix. In addition, records still vary in length, and must be traversed in order to reach later fields.

If a varying-length field is often empty, additional optimizations come into play.

See also

References

  1. Lua error in package.lua at line 80: module 'strict' not found.