Structured, Semi-Structured and Unstructured Data (M-2)
Structured data is data that conforms to a data model, has a well-defined
structure, follows a consistent order and can be easily accessed and used by a person
or a computer program.
Structured data is usually stored under a well-defined schema, for example in a
relational database. It is generally tabular, with rows and columns that clearly
define its attributes.
SQL (Structured Query Language) is often used to manage structured data stored in
databases.
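As an illustration of the points above, the following is a minimal sketch of structured data managed with SQL, using Python's built-in sqlite3 module. The table name `student`, its columns and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database; table name, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meena", 91)],
)

# Every row has the same attributes, so a query can address them by name.
rows = conn.execute(
    "SELECT name, marks FROM student WHERE marks > ? ORDER BY marks DESC",
    (70,),
).fetchall()
print(rows)
conn.close()
```

Because the schema fixes the attributes of every row in advance, the query can filter and sort on `marks` without inspecting individual records first.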
Characteristics of Structured Data:
Data conforms to a data model and has an easily identifiable structure
Data is stored in the form of rows and columns. Example: a database table
Data is well organised, so the definition, format and meaning of the data are
explicitly known
Data resides in fixed fields within a record or file
Similar entities are grouped together to form relations or classes
Entities in the same group have the same attributes
Data is easy to access and query, so it can be easily used by other programs
Data elements are addressable, so they are efficient to analyse and process
Storage:
Structured data is stored in tabular formats (e.g., Excel sheets or SQL databases) that
typically require less storage space than unstructured data. It can be stored in data
warehouses, which makes it highly scalable.
Semi-structured data
Semi-structured data is data that does not conform to a formal data model but has
some structure. It lacks a fixed or rigid schema. It does not reside in a relational
database, but it has some organisational properties that make it easier to analyse.
With some processing, it can be stored in a relational database.
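The processing step mentioned above can be sketched as follows: a minimal example, assuming the semi-structured input is JSON, that flattens records with differing keys into uniform rows suitable for a relational table. The sample records and the helper `to_row` are hypothetical.

```python
import json

# Two records of the same kind with different attributes (made-up sample data).
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]}
]
"""
records = json.loads(raw)

# The column list is the union of all keys seen in any record.
columns = sorted({key for record in records for key in record})


def to_row(record):
    """Turn one record into a fixed-width tuple, one value per column."""
    row = []
    for col in columns:
        value = record.get(col)  # None when the attribute is missing
        if isinstance(value, (list, dict)):
            value = json.dumps(value)  # keep nested values as JSON text
        row.append(value)
    return tuple(row)


rows = [to_row(r) for r in records]
print(columns)
for row in rows:
    print(row)
```

Missing attributes become `None` (NULL in a database), and nested values are kept as JSON text. This is only one possible design; nested lists could instead be split into a separate child table.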
Characteristics of Semi-structured Data:
Data does not conform to a formal data model but has some structure
Data cannot be stored directly in the form of rows and columns as in databases
Semi-structured data contains tags and elements (metadata) which are used to
group data and describe how the data is stored
Similar entities are grouped together and organised in a hierarchy
Entities in the same group may or may not have the same attributes or
properties
The metadata is often insufficient, which makes automation and
management of the data difficult
Size and type of the same attribute in a group may differ
Due to the lack of a well-defined structure, it cannot be used by computer programs
as easily as structured data
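Several of these characteristics (tags, hierarchy, entities in one group with different attributes) can be seen in a small XML document. The sketch below uses Python's standard xml.etree.ElementTree module; the `library` and `book` elements and their contents are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small hierarchy: both entities are <book>, but their child tags differ.
doc = """
<library>
  <book id="b1">
    <title>Data Models</title>
    <author>A. Writer</author>
  </book>
  <book id="b2">
    <title>Query Languages</title>
    <year>2020</year>
  </book>
</library>
""".strip()

root = ET.fromstring(doc)

# Map each book id to a dictionary of its child tags and their text.
books = {
    book.get("id"): {child.tag: child.text for child in book}
    for book in root.findall("book")
}

for book_id, fields in books.items():
    print(book_id, sorted(fields))
```

The tags make each value self-describing, but a program cannot assume that every `book` has an `author` or a `year`; it has to check each record.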
Unstructured data
Unstructured data is data that does not conform to a data model and has no
easily identifiable structure, so it cannot be used easily by a computer
program. It is not organised in a pre-defined manner and has no pre-defined data
model, so it is not a good fit for a mainstream relational database.
Characteristics of Unstructured Data:
Data neither conforms to a data model nor has any structure
Data cannot be stored in the form of rows and columns as in databases
Data does not follow any semantics or rules
Data lacks any particular format or sequence
Data has no easily identifiable structure
Due to the lack of an identifiable structure, it cannot be used by computer programs
easily
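To illustrate why unstructured data is harder for programs to use, the following minimal sketch treats a free-text note as input. Nothing in the text labels its fields, so the program has to impose its own pattern to recover anything. The sample sentence and the simplified email pattern are assumptions for this example, and the pattern is not a complete email validator.

```python
import re

# Free text with no fields, tags or schema (made-up sample).
text = (
    "Meeting moved to Friday. Contact asha@example.com or "
    "ravi@example.org for the agenda; call if urgent."
)

# Simplified pattern for email-like tokens; an assumption, not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

emails = EMAIL_PATTERN.findall(text)
print(emails)
```

The addresses are found only because the program supplies a guess about what an email looks like. Other facts in the note, such as the new meeting day, would each need their own extraction logic.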