DBDM Lecture Notes
Mapping EER to ODB schema – Object identifier – reference types – row types – UDTs –
subtypes and supertypes – user-defined routines – collection types – Object Query Language;
NoSQL: CAP theorem – Document-based: MongoDB data model and CRUD operations;
Column-based: HBase data model and CRUD operations.
Step 1. Create an ODL class for each EER entity type or subclass. The type of the ODL class
should include all the attributes of the EER class. Multivalued attributes are typically declared
by using the set, bag, or list constructors.
If the values of the multivalued attribute for an object should be ordered, the list constructor
is chosen; if duplicates are allowed, the bag constructor should be chosen; otherwise, the set
constructor is chosen. Composite attributes are mapped into a tuple constructor (by using a
struct declaration in ODL).
Declare an extent for each class, and specify any key attributes as keys of the extent. (This is
possible only if an extent facility and key constraint declarations are available in the ODBMS.)
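A minimal sketch of Step 1 in ODMG ODL (all names are illustrative):

class Employee
( extent employees
  key ssn )
{
   attribute string ssn;
   attribute struct Name { string first; string last; } name;   // composite attribute -> struct
   attribute set<string> phones;                                // multivalued attribute -> set
};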
Step 2. Add relationship properties or reference attributes for each binary relationship into the
ODL classes that participate in the relationship. These may be created in one or both
directions.
If a binary relationship is represented by references in both directions, declare the references
to be relationship properties that are inverses of one another, if such a facility exists. If a binary
relationship is represented by a reference in only one direction, declare the reference to be an
attribute in the referencing class whose type is the referenced class name.
Depending on the cardinality ratio of the binary relationship, the relationship properties or
reference attributes may be single-valued or collection types. They will be single-valued for binary relationships in the 1:1 or N:1 direction; they are collection types (set-valued or list-valued) for relationships in the 1:N or M:N direction. An alternative way to map binary M:N relationships is discussed in step 7.
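For instance, a 1:N relationship between hypothetical Department and Employee classes could be represented in both directions with inverse relationship properties:

class Employee
( extent employees )
{
   attribute string name;
   relationship Department works_for
      inverse Department::has_employees;    // N:1 direction - single-valued
};

class Department
( extent departments )
{
   attribute string dname;
   relationship set<Employee> has_employees
      inverse Employee::works_for;          // 1:N direction - set-valued
};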
If relationship attributes exist, a tuple constructor (struct) can be used to create a structure of the form <reference, relationship attributes>, which may be included instead of the reference attribute. However, this does not allow the use of the inverse constraint. Additionally, if this choice is represented in both directions, the attribute values will be represented twice, creating redundancy.
This implicitly uses a tuple constructor at the top level of the type declaration, but in general,
the tuple constructor is not explicitly shown in the ODL class declarations. Further analysis of
the application domain is needed to decide which constructor to use because this information
is not available from the EER schema.
The ODL standard provides for the explicit definition of inverse relationships. Some ODBMS products may not provide this support; in such cases, programmers must maintain every relationship explicitly by coding the methods that update the objects appropriately. The decision whether to use set or list is not available from the EER schema and must be determined from the application requirements.
Step 3. Include appropriate operations for each class. These are not available from the EER schema and must be added to the database design by referring to the original requirements.
A constructor method should include program code that checks any constraints that must hold
when a new object is created. A destructor method should check any constraints that may be
violated when an object is deleted. Other methods should include any further constraint
checks that are relevant.
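As an illustrative sketch (operation and exception names are hypothetical), such constraint-checking operations could be declared in ODL as follows:

class Department
( extent departments )
{
   attribute string dname;

   void  add_department(in string dname) raises(duplicate_name);   // creation-time constraint check
   void  remove_department() raises(has_employees);                // deletion-time constraint check
   float total_payroll();                                          // further application operation
};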
Step 4. An ODL class that corresponds to a subclass in the EER schema inherits (via extends) the
type and methods of its superclass in the ODL schema. Its specific (noninherited) attributes,
relationship references, and operations are specified, as discussed in steps 1, 2, and 3.
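For example, a hypothetical Secretary subclass of the Employee class sketched earlier:

class Secretary extends Employee
( extent secretaries )
{
   attribute short typing_speed;   // specific (noninherited) attribute
};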
Step 5. Weak entity types can be mapped in the same way as regular entity types.
An alternative mapping is possible for weak entity types that do not participate in any relationships except their identifying relationship; these can be mapped as though they were composite multivalued attributes of the owner entity type, by using the set<struct<...>> or list<struct<...>> constructors. The attributes of the weak entity are included in the struct<...> construct, which corresponds to a tuple constructor. Attributes are mapped as discussed in steps 1 and 2.
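A sketch of this alternative for a hypothetical DEPENDENT weak entity owned by Employee:

class Employee
( extent employees )
{
   attribute string ssn;
   attribute set< struct Dependent {
      string dep_name;
      date   birth_date;
   } > dependents;   // weak entity mapped as a composite multivalued attribute
};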
Step 6. Categories (union types) in an EER schema are difficult to map to ODL. It is possible to create a mapping similar to the EER-to-relational mapping by declaring a class to represent the category and defining a 1:1 relationship between the category and each of its superclasses. Another option is to use a union type, if it is available.
Step 7. An n-ary relationship with degree n > 2 can be mapped into a separate class, with
appropriate references to each participating class.
These references are based on mapping a 1:N relationship from each class that represents a participating entity type to the class that represents the n-ary relationship. An M:N binary relationship, especially if it contains relationship attributes, may also use this mapping option, if desired.
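A sketch for a hypothetical ternary SUPPLY relationship among Supplier, Project, and Part (each inverse assumes a set-valued relationship declared in the participating class):

class Supply
( extent supplies )
{
   relationship Supplier supplier inverse Supplier::supplies;
   relationship Project  project  inverse Project::supplied_by;
   relationship Part     part     inverse Part::used_in;
   attribute    long     quantity;   // relationship attribute
};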
5.2 Object identifier
An object identifier (OID) is an unambiguous, long-term name for any type of object or entity.
The OID mechanism finds application in diverse scenarios, particularly in security, and is
endorsed by the International Telecommunication Union (ITU), the Internet Engineering Task
Force (IETF), and ISO.
What is an OID?
An object identifier (OID) is an extensively used identification mechanism, jointly developed by ITU-T and ISO/IEC, for naming any type of object, concept or "thing" with a globally unambiguous, persistent (long-lived) name. It is not intended to be used for transient naming. OIDs, once allocated, should not be reused for a different object or thing.
It is based on a hierarchical name structure known as the "OID tree". This naming structure uses a sequence of names: the first name identifies a top-level "node" in the OID tree, the next identifies an arc leading to a subnode beneath the top level, and so on to any depth.
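For example, the well-known OID 1.3.6.1 identifies the "internet" node, reached by the path:

iso(1) identified-organization(3) dod(6) internet(1)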
A critical feature of this identification mechanism is that it makes OIDs available to a great many organizations and specifications for their own use (including countries, ITU-T Recommendations, ISO and IEC International Standards, and specifications from national, regional or international organizations).
How are OIDs allocated and what is a registration authority?
At each node, including the root, some organization or standard must be responsible for allocating arcs to subnodes and recording those allocations (together with the organization each subordinate node has been allocated to), not necessarily publicly. The body that performs this activity is called a Registration Authority (RA).
In the OID tree, RAs are generally responsible only for allocation of sub-arcs to other RAs that
then control their own sub-nodes. In general, the RA for a sub-node operates independently in
allocating further sub-arcs to other organizations, but can be constrained by rules imposed by
its superior, should the superior so wish.
The registration tree is indeed managed in a completely decentralized way (a node gives full
power to its children).
The registration tree is defined and managed following the ITU-T X.660 & X.670 Recommendation series (or the ISO/IEC 9834 series of International Standards).
What is an OID repository?
Initially, it was left for each Registration Authority (RA) in the hierarchy to maintain its own
record of allocation beneath that RA, and to keep those allocations private if it so chose. There
was never any policing of this. An RA in the hierarchy was its own master and operated
autonomously.
In the early 1990s, Orange developed software for its internal use which was generic enough to provide a publicly available repository of OID allocations.
Information on OIDs is often buried inside the databases (perhaps sometimes paper records) maintained by an immense number of RAs. The information can be hard to access and is sometimes private. Today this OID repository is regarded as the easiest way to access a large amount of the publicly available information on OIDs: many OIDs are recorded, but it does not contain all existing OIDs.
This OID repository is not an official Registration Authority, so any OID described on this web
site has to be officially allocated by the RA of its parent OID. The accuracy and completeness of
this OID repository rely on crowdsourcing, i.e., each user is welcome to contribute data.
5.3 Reference types
A <reference type> (REF type) is a <data type> whose values reference rows of a typed Table. The required syntax for a <reference type> is as follows.

<reference type> ::=
REF (<UDT name>)
[ SCOPE <Table name> [ <reference scope check> ] ]

<reference scope check> ::=
REFERENCES ARE [NOT] CHECKED
[ ON DELETE NO ACTION ]
Assignment
Within SQL, you can use the <keyword> NULL in an INSERT or an UPDATE statement to indicate that the target should be set to NULL; that is, if your source is NULL, your DBMS will set your target to NULL. Outside of SQL, if your source has an indicator parameter that is set to -1, your DBMS will set your target to NULL (regardless of the value of the source). (An indicator parameter with a value less than -1 will cause an error: your DBMS will return the SQLSTATE error 22010 "data exception-invalid indicator parameter value".) We'll talk more about indicator parameters in our chapters on SQL binding styles.
Comparison
SQL provides only two scalar comparison operators – = and <> – to perform operations on <reference type>s. Both will be familiar; there are equivalent operators in other computer languages. Two REF values are comparable if they're both based on the same UDT. If either comparand is NULL, the result of the operation is UNKNOWN.
Other Operations
With SQL, you have several other operations that you can perform on <reference type>s.
Scalar Functions
SQL provides two scalar functions that operate on or return a <reference type>: the <dereference operation> and the <reference resolution>.
<dereference operation>
The required syntax for a <dereference operation> is as follows.
<dereference operation> ::= reference_argument -> <Attribute name>
The <dereference operation> operates on two operands — the first must evaluate to a
<reference type> that has a non-empty scope and the second must be the name of an
Attribute of the <reference type>’s UDT.
The <dereference operation> allows you to access a Column of the row identified by a REF value; it returns a result whose <data type> is the <data type> of <Attribute name> and whose value is the value of <Attribute name> in the row of the Table in the <reference type>'s scope whose self-referencing (system-generated) Column equals reference_argument. That is, given a REF value, the <dereference operation> returns the value at the site referenced by that REF value. If the REF value doesn't identify a site (perhaps because the site it once identified has been destroyed), the <dereference operation> returns NULL.
If you want to restrict your code to Core SQL, don’t use the <dereference operation>.
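As a sketch, assume a hypothetical table Emps with a REF column dept_ref whose scope is a typed table Depts of a UDT having an attribute dept_name:

SELECT emp_name,
       dept_ref -> dept_name   -- dereference: fetch dept_name from the referenced row
FROM   Emps;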
<reference resolution>
The required syntax for a <reference resolution> is as follows.

<reference resolution> ::= DEREF (reference_argument)

DEREF operates on any expression that evaluates to a <reference type> that has a non-empty scope. It returns the value referenced by a REF value. Your current <AuthorizationID> must have the SELECT WITH HIERARCHY Privilege on reference_argument's scope Table.
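Continuing the hypothetical Emps/Depts example, DEREF returns the whole referenced UDT value rather than a single Attribute:

SELECT DEREF(dept_ref)   -- returns the entire UDT value identified by dept_ref
FROM   Emps;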
If you want to restrict your code to Core SQL, don't use DEREF.
Set Functions
SQL provides two set functions that operate on a <reference type>: COUNT and GROUPING. Since neither of them operates exclusively on REF arguments, we won't discuss them here; look for them in our chapter on set functions.
Predicates
In addition to the comparison operators, SQL provides eight other predicates that operate on
<reference type>s: the <between predicate>, the <in predicate>, the <null predicate>, the
<exists predicate>, the <unique predicate>, the <match predicate>, the <quantified predicate>
and the <distinct predicate>. Each will return a boolean value: either TRUE, FALSE or
UNKNOWN. Since none of them operates strictly on <reference type>s, we won’t discuss them
here. Look for them in our chapters on search conditions.
5.4 Row types
A <row type> is a sequence of one or more Fields; the value of a Field is a value of the Field's <data type>. Each Field in a row must have a unique name.
Example of a <row type> specification:
ROW (field_1 INT, field_2 DATE, field_3 INTERVAL(4) YEAR)
A <Field name> identifies a Field and is either a <regular identifier> or a <delimited identifier>
that is unique (for all Fields and Columns) within the Table it belongs to. You can define a
Field’s <data type> either by putting a <data type> specification after <Field name> or by
putting a <Domain name> after the <Field name>. The <data type> of a Field can be any type
other than a <reference type> – in particular, it can itself be a <row type>.
Here is an example of a <row type> specification that defines a row with one Field (called field_1) whose defined <data type> is DATE:
ROW (field_1 DATE)
[Obscure Rule] If the <data type> of a Field is CHAR, VARCHAR or CLOB, the Character set that
the Field’s values must belong to is determined as follows:
• If the <Field definition> contains a <data type> specification that includes a CHARACTER
SET clause, the Field’s Character set is the Character set named. Your current
<AuthorizationID> must have the USAGE Privilege on that Character set.
• If the <Field definition> does not include a <data type> specification, but the Field is
based on a Domain whose definition includes a CHARACTER SET clause, the Field’s
Character set is the Character set named.
• If the <Field definition> does not include any CHARACTER SET clause at all – either
through a <data type> specification or through a Domain definition – the Field’s
Character set is the Character set named in the DEFAULT CHARACTER SET clause of the
CREATE SCHEMA statement that defines the Schema that the Field belongs to.
For example, the effect of these two SQL statements:
CREATE SCHEMA bob AUTHORIZATION bob
   DEFAULT CHARACTER SET INFORMATION_SCHEMA.LATIN1;

CREATE TABLE Table_1 (
   column_1 ROW (field_1 CHAR(10), field_2 INT));
is to create a Table in Schema bob. The Table has a Column with a ROW <data type>,
containing two Fields.
The character string Field's set of valid values is fixed-length character strings, exactly 10 characters long, all of whose characters must be found in the INFORMATION_SCHEMA.LATIN1 Character set – the Schema's default Character set. The effect of these two SQL statements:
CREATE SCHEMA bob AUTHORIZATION bob
   DEFAULT CHARACTER SET INFORMATION_SCHEMA.LATIN1;

CREATE TABLE Table_1 (
   column_1 ROW (
      field_1 CHAR(10) CHARACTER SET INFORMATION_SCHEMA.SQL_CHARACTER,
      field_2 INT));
is to create the same Table with one difference: this time, the character string Field’s values
must consist only of characters found in the
INFORMATION_SCHEMA.SQL_CHARACTER Character set – the explicit Character set
specification in CREATE TABLE constrains the Field’s set of values. The Schema’s default
Character set does not.
[Obscure Rule] If the <data type> of a Field is CHAR, VARCHAR, CLOB, NCHAR, NCHAR
VARYING or NCLOB, and your <Field definition> does not include a COLLATE clause, the
Field has a coercibility attribute of COERCIBLE – but if your <Field definition> includes a
COLLATE clause, the Field has a coercibility attribute of IMPLICIT. In either case, the Field’s
default Collation is determined as follows:
• If the <Field definition> includes a COLLATE clause, the Field’s default Collation is the
Collation named. Your current <Authorization ID> must have the USAGE Privilege on that
Collation.
• If the <Field definition> does not include a COLLATE clause, but does contain a
<data type> specification that includes a COLLATE clause, the Field’s default Collation is
the Collation named. Your current <Authorization ID> must have the USAGE Privilege on
that Collation.
• If the <Field definition> does not include a COLLATE clause, but the Field is based on a
Domain whose definition includes a COLLATE clause, the Field’s default Collation is the
Collation named.
• If the <Field definition> does not include any COLLATE clause at all – either explicitly,
through a <data type> specification or through a Domain definition – the Field’s default
Collation is the default Collation of the Field’s Character set.
[Obscure Rule] If the <data type> of a Field is REF(UDT), your current <AuthorizationID> must
have the USAGE Privilege on that UDT. If the <data type> of a Field includes REF with a <scope
clause>, your <Field definition> must also include this <reference scope check> clause:
REFERENCES ARE [NOT] CHECKED ON DELETE NO ACTION – to indicate whether references are
to be checked or not. Do not add a <reference scope check> clause under any other
circumstances.
• If a Field is defined with REFERENCES ARE CHECKED, and a <scope clause> is included in the <Field definition>, then there is an implied DEFERRABLE INITIALLY IMMEDIATE Constraint on the Field. This Constraint checks that the Field's values are also found in the corresponding Field of the system-generated Column of the Table named in the <scope clause>.
• If the <data type> of a Field in a row is a UDT, then the current <AuthorizationID> must
have the USAGE Privilege on that UDT.
• A <row type> is a subtype of a <data type> if (a) both are <row type>s with the same degree and (b) for every pair of corresponding <Field definition>s, the <Field name>s are the same and the <data type> of the Field in the first <row type> is a supertype of the <data type> of the Field in the second <row type>.
<row reference>
A <row reference> returns a row. The required syntax for a <row reference> is as follows.
<row reference> ::= ROW { <Table name> | <query name> | <Correlation name> }
A row of data values belonging to a Table (or a query result, which is also a Table) is also
considered to be a <row type>.
In a Table, each Column of a data row corresponds to a Field of the <row type>: the Column
and Field have the same ordinal positions in the Table and <row type>, respectively.
A <row reference> allows you to access a specific row of a Table or a query result. Here is an
example of a <row reference> that would return a row of a Table named TABLE_1:
ROW(Table_1)
<Field reference>
A <Field reference> returns a Field of a row. The required syntax for a <Field reference> is as
follows.
<Field reference> ::= row_argument.<Field name>
A <Field reference> allows you to access a specific Field of a row. It operates on two arguments:
the first must evaluate to a <row type> and the second must be the name of a Field belonging
to that row.
If the value of row_argument is NULL, then the specified Field is also NULL.
If row_argument has a non-null value, the value of the Field reference is the value of the
specified Field in row_argument. Here is an example of a <Field reference> that would return
the value of a Field named FIELD_1 that belongs to a row of TABLE_1:
ROW(Table_1).field_1
<row value constructor>
A <row value constructor> is used to construct a row of data. The required syntax for a <row value constructor> is as follows.

<row value constructor> ::=
element_expression |
[ ROW ] (element_expression [ {,element_expression}... ]) |
( <query expression> )

element_expression ::=
expression |
NULL |
ARRAY[] |
ARRAY??(??) |
DEFAULT
A <row value constructor> allows you to assign values to the Fields of a row, using either a list of element_expressions or the result of a subquery. An element_expression may be any expression
that evaluates to a scalar value with a <data type> that is assignable to the corresponding Field’s
<data type>. A subquery – ( <query expression> ) – is discussed in our chapter on complex
queries.
The result is a row whose n-th Field value is the value of the n-th element_expression
(or whose value is the value of the subquery) you specify. If your element_expression is NULL,
the corresponding Field is assigned the null value. If your element_expression is ARRAY [] or
ARRAY??(??), the corresponding Field is assigned an empty array. If your element_expression is
DEFAULT, the corresponding Field is assigned its default value. Here is an example of a <row
value constructor>:
ROW ('hello', 567, DATE '1994-07-15', NULL, DEFAULT, ARRAY[])
This example constructs a row with six Fields. The first Field has a character string value of
'hello', the second has a numeric value of 567, the third has a date value of '1994-07-15', the
fourth has a null value, the fifth has a value that is the fifth Field’s default value and the sixth
has a value that is an empty array. This <row value constructor> would be valid for this <row
type> specification:
ROW (field_1 CHAR(5),
     field_2 SMALLINT,
     field_3 DATE,
     field_4 BIT(4),
     field_5 domains_1,
     field_6 INT ARRAY[4])
A <row value constructor> serves the same purpose for a row as a <literal> does for a predefined <data type>. It has the same format as the <row type>'s ROW () – but instead of a series of <Field definition>s inside the size delimiters, it contains comma-delimited values of the correct <data type> for each Field. For example, if your <row type> specification is:
ROW (field_1 INT, field_2 CHAR(5), field_3 BIT(4))
a valid <row value constructor> would be:
ROW (20, 'hello', B'1011')
If you construct a row with a subquery, the row takes on the <data type> of the subquery’s
result. An empty subquery result constructs a one-Field row whose value is NULL. A non-empty
subquery result constructs a one-Field row whose value is the subquery result.
If you want to restrict your code to Core SQL, (a) don't use the ROW <data type> or <row reference>s and <Field reference>s and, when using a <row value constructor>, (b) don't use ARRAY[] or ARRAY??(??) as an element_expression, (c) don't construct a row with more than one Field, (d) don't use the ROW <keyword> in front of your element_expression, and (e) don't use a subquery to construct your row.
Row Operations
A row is compatible with, and comparable to, any row with compatible Fields – that is, rows are mutually comparable and mutually assignable only if they have the same number of Fields and each corresponding pair of Fields are mutually comparable and mutually assignable. Rows may not be directly compared with, or directly assigned to, any other <data type> class, though implicit type conversions of their Fields can occur in expressions, SELECTs, INSERTs, DELETEs and UPDATEs. Explicit row type conversions are not possible.
Assignment
In SQL, when a <row type> is assigned to a <row type> target, the assignment is done one Field
at a time – that is, the source’s first Field value is assigned to the target’s first Field, the source’s
second Field value is assigned to the target’s second Field, and so on. Assignment of a <row
type> to another <row type> is possible only if (a) both <row type>s have the same number of
Fields and (b) each corresponding pair of Fields have <data type>s that are mutually assignable.
[Obscure Rule] Since only SQL accepts null values, if your source is NULL, then your target’s
value is not changed. Instead, your DBMS will set its indicator parameter to -1, to indicate that
an assignment of the null value was attempted.
If your target doesn’t have an indicator parameter, the assignment will fail: your DBMS will
return the SQLSTATE error 22002 "data exception-null value, no indicator parameter". Going the
other way, there are two ways to assign a null value to an SQL-data target. Within SQL, you can
use the <keyword> NULL in an INSERT or an UPDATE statement to indicate that the target
should be set to NULL; that is, if your source is NULL, your DBMS will set your target to NULL.
Outside of SQL, if your source has an indicator parameter that is set to -1, your DBMS will
set your target to NULL (regardless of the value of the source). (An indicator parameter with
a value less than -1 will cause an error: your DBMS will return the SQLSTATE error 22010 "data exception-invalid indicator parameter value".) We'll talk more about indicator parameters in our chapters on SQL binding styles.
Comparison
SQL provides the usual scalar comparison operators – = and <> and < and <= and > and
>= – to perform operations on rows. All of them will be familiar; there are equivalent operators
in other computer languages. Two rows are comparable if (a) both have the same number of
Fields and (b) each corresponding pair of Fields have <data type>s that are mutually
comparable.
Comparison is between pairs of Fields in corresponding ordinal positions – that is, the first Field of the first row is compared to the first Field of the second row, the second Field of the first row is compared to the second Field of the second row, and so on. If either comparand is NULL, the result of the operation is UNKNOWN.
The result of a <row type> comparison depends on two things: (a) the comparison operator and
(b) whether any Field is NULL. The order of comparison is:
If the comparison operator is = or <>: first the Field pairs which don't include NULLs, then the pairs which do.
If the comparison operator is anything other than = or <>: Field pairs from left to right.
Comparison stops when the result is unequal or UNKNOWN, or when there are no more Fields.
The result of the row comparison is the result of the last Field pair comparison.
Here are the possibilities.
If the comparison operator is =. The row comparison is (a) TRUE if the comparison is TRUE for
every pair of Fields, (b) FALSE if any non-null pair is not equal, and (c) UNKNOWN if at least one
Field is NULL and all non-null pairs are equal. For example:
ROW (1,1,1) = ROW (1,1,1) -- returns TRUE
ROW (1,1,1) = ROW (1,2,1) -- returns FALSE
ROW (1, NULL,1) = ROW (2,2,1) -- returns FALSE
ROW (1, NULL,1) = ROW (1,2,1) -- returns UNKNOWN
If the comparison operator is <>: the row comparison is (a) TRUE if any non-null pair is not equal, (b) FALSE if the comparison is FALSE for every pair of Fields, and (c) UNKNOWN if at least one Field is NULL and all non-null pairs are equal. For example:
ROW (1,1,1) <> ROW (1,2,1) -- returns TRUE
ROW (1, NULL,2) <> ROW (2,2,1) -- returns TRUE
ROW (2,2,1) <> ROW (2,2,1) -- returns FALSE
ROW (1, NULL,1) <> ROW (1,2,1) -- returns UNKNOWN
If the comparison operator is anything other than = or <>, the row comparison is:
(a) TRUE if the comparison is TRUE for at least one pair of Fields and every pair before the TRUE result is equal,
(b) FALSE if the comparison is FALSE for at least one pair of Fields and every pair before the FALSE result is equal, and
(c) UNKNOWN if the comparison is UNKNOWN for at least one pair of Fields and every pair before the UNKNOWN result is equal.
Comparison stops as soon as any of these results (TRUE, FALSE, or UNKNOWN) is established. For example:
ROW (1,1,1) < ROW (1,2,1) -- returns TRUE
ROW (1, NULL,1) < ROW (2, NULL,0) -- returns TRUE
ROW (1,1,1) < ROW (1,1,1) -- returns FALSE
ROW (3, NULL,1) < ROW (2, NULL,0) -- returns FALSE
ROW (2, NULL,1) < ROW (2,2,0) -- returns UNKNOWN
ROW (NULL,1,1) < ROW (2,1,0) -- returns UNKNOWN
SQL also provides three quantifiers – ALL, SOME, ANY – which you can use along with a comparison operator to compare a row value with the collection of values returned by a <table subquery>. Place the quantifier after the comparison operator, immediately before the <table subquery>. For example:
SELECT row_column
FROM Table_1
WHERE row_column < ALL (
SELECT row_column
FROM Table_2);
ALL returns TRUE either (a) if the collection is an empty set (i.e.: if it contains zero rows) or (b) if
the comparison operator returns TRUE for every value in the collection. ALL returns FALSE if the
comparison operator returns FALSE for at least one value in the collection.
SOME and ANY are synonyms. They return TRUE if the comparison operator returns TRUE for at
least one value in the collection. They return FALSE either (a) if the collection is an empty set or
(b) if the comparison operator returns FALSE for every value in the collection. The search condition = ANY (collection) is equivalent to IN (collection).
5.5 UDTs
A UDT is defined by a descriptor that contains twelve pieces of information:
1. The <UDT name>, qualified by the <Schema name> of the Schema it belongs to.
2. Whether the UDT is ordered.
3. The UDT’s ordering form: either EQUALS, FULL or NONE.
4. The UDT’s ordering category: either RELATIVE, HASH or STATE.
5. The <specific routine designator> that identifies the UDT’s ordering function.
6. If the UDT is a direct subtype of one or more other UDTs, then the names of
those UDTs.
7. If the UDT is a distinct type, then the descriptor of the <data type> it’s based on;
otherwise an Attribute descriptor for each of the UDT’s Attributes.
8. The UDT’s degree: the number of its Attributes.
9. Whether the UDT is instantiable or not instantiable.
10. Whether the UDT is final or not final.
11. The UDT's Transform descriptor.
12. If the UDT's definition includes a method signature list, a descriptor for each method signature named.
To create a UDT, use the CREATE TYPE statement (either as a stand-alone SQL statement or
within a CREATE SCHEMA statement). CREATE TYPE specifies the enclosing Schema, names the
UDT and identifies the UDT’s set of valid values.
To destroy a UDT, use the DROP TYPE statement. None of SQL3’s UDT syntax is Core SQL, so if
you want to restrict your code to Core SQL, don’t use UDTs.
UDT Names
A <UDT name> identifies a UDT. The required syntax for a <UDT name> is:
<UDT name> ::= [ <Schema name>. ] unqualified name
A <UDT name> is a <regular identifier> or a <delimited identifier> that is unique (for all Domains
and UDTs) within the Schema it belongs to. The <Schema name> which qualifies a <UDT name>
names the Schema that the UDT belongs to and can either be explicitly stated, or a default will
be supplied by your DBMS as follows:
• If a <UDT name> in a CREATE SCHEMA statement isn’t qualified, the default qualifier is
the name of the Schema you’re creating.
• If the unqualified <UDT name> is found in any other SQL statement in a Module, the default qualifier is the name of the Schema identified in the SCHEMA clause or AUTHORIZATION clause of the MODULE statement that defines that Module.
UDT Example
Here’s an example of a UDT definition:
CREATE TYPE book_udt AS (                  -- the UDT name will be book_udt
   title CHAR(40),                         -- title is the first attribute
   buying_price DECIMAL(9,2),              -- buying_price is the second attribute
   selling_price DECIMAL(9,2))             -- selling_price is the third attribute
   NOT FINAL                               -- this is a mandatory finality clause
   METHOD profit() RETURNS DECIMAL(9,2);   -- profit is a method, defined later
This CREATE TYPE statement results in a UDT named BOOK_UDT. The components of the UDT
are three attributes (named TITLE, BUYING_PRICE and SELLING_PRICE) and one method
(named PROFIT).
The three name-and-data-type pairs title CHAR (40) and buying_price DECIMAL (9,2) and
selling_price DECIMAL (9,2) are the UDT’s Attribute definitions.
The words NOT FINAL matter only for subtyping, which we’ll get to later. Briefly, though, if a
UDT definition doesn’t include an UNDER clause, the finality clause must specify NOT FINAL.
The clause METHOD profit () RETURNS DECIMAL (9,2) is a teaser. Like an Attribute, a
“method” is a component of a UDT. However, this method – PROFIT – is actually a declaration
that a function named PROFIT exists.
This function isn't defined further in the UDT definition – there is a separate SQL statement for defining functions: CREATE METHOD. All we can see at this stage is that PROFIT has a name and a (predefined) <data type>, just as regular Attributes do. Some people would call PROFIT a "derived Attribute".
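One way the PROFIT method's body might later be supplied is sketched below in SQL:1999 style (the exact CREATE METHOD syntax varies between DBMSs; SELF denotes the UDT instance the method is invoked on):

CREATE METHOD profit() RETURNS DECIMAL(9,2)
FOR book_udt
   RETURN SELF.selling_price - SELF.buying_price;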
And for credit card and check payments, we may need to know which CUSTOMER made the payment, while this is not needed for cash payments.
Should we create a single PAYMENT entity, or three separate entities CASH, CHECK, and CREDIT CARD?
And what happens if in the future we introduce a fourth method of payment?
Subdivide an Entity
Sometimes it makes sense to subdivide an entity into subtypes.
This may be the case when a group of instances has special properties, such as attributes or
relationships that exist only for that group.
In this case, the entity is called a “supertype” and each group is called a “subtype”.
Subtype Characteristics
A subtype:
• Inherits all attributes of the supertype
• Inherits all relationships of the supertype
• Usually has its own attributes or relationships
• Is drawn within the supertype
• Never exists alone
• May have subtypes of its own
Supertype Example
EXAM is a supertype of QUIZ, MIDTERM, and FINAL.
The subtypes have several attributes in common.
These common attributes are listed at the supertype level.
Nested Subtypes
You can nest subtypes.
For ease of reading — “readability” — you would usually show subtypes with only two levels,
but there is no rule that would stop you from going beyond two levels.
User-defined routines (UDRs) are functions that perform specific actions that you can define in your SIL™ programs for later use. These can considerably improve the readability and maintainability of your code.
Syntax
function <name>(<type> param1, <type> param2, ...) {
   instruction1;
   ...
   instructionN;
   return <value>;
}
Example
function zero() {
   return 0;
}

number a = zero();
Parameters
The list of parameters in the definition of a UDR can be of any length (including 0) and their respective types can be any valid SIL™ type.
Example:
function zero() {
   return 0;
}

function doSomething(string s, number n1, number [] n2, boolean flag, string [] oneMore) {
   ....
}
Constant Parameters
Parameters of user-defined routines can be made read-only in the scope of the routine by
adding the keyword "const" before the parameter definition in the signature of the routine.
function f (const string s) {
...
}
Variable visibility
There are three categories of variables that can be used in a UDR:
Local variables
These are the variables you define in the body of the UDR. These can be used throughout the
body of the UDR. On exit, the values of these variables are lost.
function example() {
   number a = 3;
   number b = a + 10;
   // use variables a and b here
}
Parameter variables
These are the values passed to the UDR in the list of parameters. Because SIL™ uses a "pass-by-value" policy, even though you modify the value of these variables in the body of the function, on exit, their original values will be restored.
function increment(number a) {
   a = a + 1; // the value of a is only modified locally
   return a;
}

number b = 0;
number c = increment(b); // the value of b does not change
print(b); // this prints 0
print(c); // this prints 1
Global variables
These are the variables that are already defined and can be used right away (issue fields,
customfields and any variables defined before the routine).
You can use issue fields and custom fields anywhere in your code (including in the UDR body)
without having to declare them.
function printKey() {
   print(key);
}
Return value
Return values can be used to communicate with the context that called the UDR or to halt its execution.
Examples
function isEven(number a) {
   return (a % 2 == 0);
}
function increment(number a) {
   return a + 1;
}

number b = increment(2);
Notice that there is no need to declare the type of the return value; this will be evaluated at
runtime.
Therefore, even though the check on the following program will pass, at runtime the value of d will NOT be modified, because of the incompatibility between date (on the left-hand side) and number (on the right-hand side).
function increment(number a) {
   return a + 1;
}

date d = increment(2);
You can return simply from a routine without specifying a value. However, you should always remember that by design routines return a value, even if it is undefined. The following code is therefore valid:
function f(number a) {
   if (a > 0) {
      print("positive");
      return;
   }
   if (a == 0) {
      print("ZERO");
   }
}

//[...................]

string s = f(4); // s is still undefined, no value was returned
if (isNull(s)) {
   print("S IS NULL!"); // this will be printed
} else {
   print("S IS NOT NULL!");
}
Of course, the above code will print the text 'S IS NULL!' in the log.
Nested Tables
Example
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
+----+----------+-----+-----------+----------+
DECLARE
   CURSOR c_customers IS
      SELECT name FROM customers;
   TYPE c_list IS TABLE OF customers.name%TYPE INDEX BY BINARY_INTEGER;
   name_list c_list;
   counter INTEGER := 0;
BEGIN
   FOR n IN c_customers LOOP
      counter := counter + 1;
      name_list(counter) := n.name;
      dbms_output.put_line('Customer(' || counter || '): ' || name_list(counter));
   END LOOP;
END;
/
When the above code is executed at the SQL prompt, it produces the following result −
Customer (1): Ramesh
Customer (2): Khilan
Customer (3): kaushik
An array is always dense, i.e., it always has consecutive subscripts. A nested table is dense initially, but it can become sparse when elements are deleted from it.
A nested table is created using the following syntax:
TYPE type_name IS TABLE OF element_type [NOT NULL];
table_name type_name;
This declaration is similar to the declaration of an index-by table, but there is no INDEX BY clause.
A nested table can be stored in a database column, as sketched below. It can further be used for simplifying SQL operations where you join a single-column table with a larger table. An associative array cannot be stored in the database.
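A minimal sketch of a nested table stored in a database column (the type, table, and storage-clause names are illustrative):

CREATE TYPE course_list AS TABLE OF VARCHAR2(64);
/
CREATE TABLE department (
   dept_name VARCHAR2(20),
   courses   course_list)
   NESTED TABLE courses STORE AS courses_tab;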
Example
The following example illustrates the use of nested tables:
DECLARE
   TYPE names_table IS TABLE OF VARCHAR2(10);
   TYPE grades IS TABLE OF INTEGER;
   names names_table;
   marks grades;
   total INTEGER;
BEGIN
   names := names_table('Kavita', 'Pritam', 'Ayan', 'Rishav', 'Aziz');
   marks := grades(98, 97, 78, 87, 92);
   total := names.count;
   dbms_output.put_line('Total ' || total || ' Students');
   FOR i IN 1 .. total LOOP
      dbms_output.put_line('Student:' || names(i) || ', Marks:' || marks(i));
   END LOOP;
END;
/
When the above code is executed at the SQL prompt, it produces the following result −
Total 5 Students
Student:Kavita, Marks:98
Student:Pritam, Marks:97
Student:Ayan, Marks:78
Student:Rishav, Marks:87
Student:Aziz, Marks:92
PL/SQL procedure successfully completed.
Example
Elements of a nested table can also be a %ROWTYPE of any database table or %TYPE of any
database table field. The following example illustrates the concept. We will use the
CUSTOMERS table stored in our database as −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
+----+----------+-----+-----------+----------+
DECLARE
   CURSOR c_customers IS
      SELECT name FROM customers;
   TYPE c_list IS TABLE OF customers.name%TYPE;
   name_list c_list := c_list();
   counter INTEGER := 0;
BEGIN
Varrays in PL/SQL
A varray is a collection type; the name stands for variable-size array. We will study other collection types in a later chapter, 'PL/SQL Collections'.
Each element in a varray has an index associated with it. It also has a maximum size that can be changed dynamically.
Creating a Varray Type
A varray type is created with the CREATE TYPE statement. You must specify the maximum size and the type of elements stored in the varray.
The basic syntax for creating a VARRAY type at the schema level is:
CREATE OR REPLACE TYPE varray_type_name IS VARRAY(n) OF <element_type>
where varray_type_name is a valid type name, n is the maximum number of elements in the varray, and element_type is the data type of the elements of the array.
Maximum size of a varray can be changed using the ALTER TYPE statement.
For example:
CREATE OR REPLACE TYPE namearray AS VARRAY(3) OF VARCHAR2(10);
/
Type created.
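As noted above, the maximum size of an existing varray type can later be changed with ALTER TYPE; a sketch (CASCADE propagates the change to dependent objects):

ALTER TYPE namearray MODIFY LIMIT 10 CASCADE;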
The basic syntax for creating a VARRAY type within a PL/SQL block is −
TYPE varray_type_name IS VARRAY(n) OF <element_type>
For example:
TYPE namearray IS VARRAY(5) OF VARCHAR2(10);
TYPE grades IS VARRAY(5) OF INTEGER;
Let us now work out a few examples to understand the concept.
Example 1
The following program illustrates the use of varrays:
DECLARE
   TYPE namesarray IS VARRAY(5) OF VARCHAR2(10);
   TYPE grades IS VARRAY(5) OF INTEGER;
   names namesarray;
   marks grades;
   total INTEGER;
BEGIN
   names := namesarray('Kavita', 'Pritam', 'Ayan', 'Rishav', 'Aziz');
   marks := grades(98, 97, 78, 87, 92);
   total := names.count;
   dbms_output.put_line('Total ' || total || ' Students');
   FOR i IN 1 .. total LOOP
      dbms_output.put_line('Student: ' || names(i) || ' Marks: ' || marks(i));
   END LOOP;
END;
/
Consistency: every read receives the most recent write, so all clients see the same data at the same time.
Availability: every request receives a (non-error) response, even if it may not reflect the most recent write.
Partition tolerance: This is the ability of the system to keep responding to client requests even if there's a communication failure between database partitions. The system will still function even if network communication between partitions is temporarily lost.
Note that the CAP theorem only applies in cases when there’s a connection failure between
partitions in our cluster. The more reliable our network, the lower the probability we will need
to think about this theorem. The CAP theorem helps us understand that once we partition our
data, we must determine which options best match our business requirements: consistency or
availability. Remember: at most two of the aforementioned three desirable properties can be
fulfilled, so we have to select either consistency or availability.
With MongoDB, you may embed related data in a single structure or document. These schemas are generally known as "denormalized" models, and they take advantage of MongoDB's rich documents; a sketch appears after the list below.
Embedded data models allow applications to store related pieces of information in the same
database record. As a result, applications may need to issue fewer queries and updates to
complete common operations. In general, use embedded data models when:
• you have "contains" relationships between entities. See Model One-to-One Relationships with Embedded Documents.
• the "many" or child documents always appear with, or are viewed in the context of, the "one" or parent documents. See Model One-to-Many Relationships with Embedded Documents.
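A sketch of an embedded (denormalized) document, with illustrative field names:

{
   _id: "joe",
   name: "Joe Bookreader",
   address: {                  // related data embedded as a sub-document
      street: "123 Fake Street",
      city: "Faketon",
      zip: "12345"
   }
}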
In general, embedding provides better performance for read operations, as well as the ability to
request and retrieve related data in a single database operation. Embedded data models make it
possible to update related data in a single atomic write operation.
To access data within embedded documents, use dot notation to "reach into" the embedded documents. See query for data in arrays and query data in embedded documents for more examples on accessing data in arrays and embedded documents.
Embedded Data Model and Document Size Limit
Documents in MongoDB must be smaller than the maximum BSON document size (16 megabytes).
For bulk binary data, consider GridFS.
Normalized Data Models
Normalized data models describe relationships using references between documents.
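A sketch of the normalized alternative, where a hypothetical book document references its publisher by _id instead of embedding it:

// publisher document
{ _id: "oreilly", name: "O'Reilly Media" }

// book document holding a reference to the publisher
{ _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" }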
Read Operations: Read operations retrieve documents from a collection, i.e., they query a collection for documents. MongoDB provides the following method to read documents from a collection:
db.collection.find()
You can specify query filters or criteria that identify the documents to return.
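For example, a filter on a hypothetical users collection:

db.users.find( { age: { $gt: 18 } } )   // return documents whose age is greater than 18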
Delete Operations: Delete operations remove documents from a collection. MongoDB provides the following methods to delete documents of a collection:
db.collection.deleteOne() (new in version 3.2)
db.collection.deleteMany() (new in version 3.2)
In MongoDB, delete operations target a single collection. All write operations in MongoDB are
atomic on the level of a single document.
You can specify criteria, or filters, that identify the documents to remove. These filters use the
same syntax as read operations.
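For example, again on a hypothetical users collection:

db.users.deleteOne( { status: "inactive" } )    // removes at most one matching document
db.users.deleteMany( { status: "inactive" } )   // removes every matching document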
Table:
An HBase table is made up of several columns. Tables in HBase are defined up front, at the time of schema specification.
Row:
An HBase row consists of a row key and one or more associated value columns. Row keys are uninterpreted bytes. Rows are ordered lexicographically, with the lowest-ordered row key appearing first in a table; the layout of the row key is therefore very important.
Column:
A column in HBase consists of a column family and a column qualifier, which are separated by a : (colon) character.
Column Family:
Apache HBase columns are grouped into column families. A column family physically colocates a group of columns and their values to improve performance.
Every row in a table has the same column families, but a given column family may hold nothing for a given row.
CRUD Operations
1. Creating Data – HBase
Inserting Data Using HBase Shell
To create data in an HBase table, the following commands and methods are used: the put command, the add() method of the Put class, and the put() method of the HTable class.
As an example, we are going to create the following table in HBase.
Using the put command, you can insert rows into a table. Its syntax is as follows:
put '<table name>', 'row1', '<colfamily:colname>', '<value>'
Inserting the First Row
Let us insert the first-row values into the emp table as shown below.
hbase(main):005:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.6600 seconds
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0410 seconds
hbase(main):007:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0240 seconds
hbase(main):007:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0240 seconds
Insert the remaining rows using the put command in the same way. If you insert the whole table, you will get the following output.

hbase(main):022:0> scan 'emp'
ROW  COLUMN+CELL
1    column=personal data:city, timestamp=1417524216501, value=hyderabad
1    column=personal data:name, timestamp=1417524185058, value=ramu
1    column=professional data:designation, timestamp=1417524232601, value=manager
1    column=professional data:salary, timestamp=1417524244109, value=50000
2    column=personal data:city, timestamp=1417524574905, value=chennai
2    column=personal data:name, timestamp=1417524556125, value=ravi
2    column=professional data:designation, timestamp=1417524592204, value=sr:engg
2    column=professional data:salary, timestamp=1417524604221, value=30000
3    column=personal data:city, timestamp=1417524681780, value=delhi
3    column=personal data:name, timestamp=1417524672067, value=rajesh
3    column=professional data:designation, timestamp=1417524693187, value=jr:engg
3    column=professional data:salary, timestamp=1417524702514, value=25000
Inserting Data Using Java API
You can insert data into HBase using the add() method of the Put class. You can save it using the put() method of the HTable class. These classes belong to the org.apache.hadoop.hbase.client package. Given below are the steps to create data in a table of HBase.
Step 1: Instantiate the Configuration Class
The Configuration class adds HBase configuration files to its object. You can create a configuration object using the create() method of the HBaseConfiguration class as shown below.
Configuration conf = HBaseConfiguration.create();
Step 2: Instantiate the HTable Class
You have a class called HTable, an implementation of Table in HBase. This class is used to communicate with a single HBase table. It accepts the configuration object and the table name as parameters when instantiated. You can instantiate the HTable class as shown below.
HTable hTable = new HTable(conf, tableName);
Step 3: Instantiate the Put Class
To insert data into an HBase table, the add() method and its variants are used. This method belongs to the Put class, therefore instantiate the Put class. This class requires the row name you want to insert the data into, in string format. You can instantiate the Put class as shown below.
Put p = new Put(Bytes.toBytes("row1"));
Step 4: Insert Data
The add() method of the Put class is used to insert data. It requires three byte arrays representing the column family, the column qualifier (column name), and the value to be inserted, respectively. Insert data into the HBase table using the add() method as shown below.
p.add(Bytes.toBytes("column family"), Bytes.toBytes("column name"), Bytes.toBytes("value"));
Step 5: Save the Data in Table
After inserting the required rows, save the changes by passing the put instance to the put() method of the HTable class as shown below.
hTable.put(p);
hTable.close();
}
}
Compile and execute the above program as shown below.
$ javac InsertData.java
$ java InsertData
The following should be the output:
data inserted
2. Updating Data Using HBase Shell
You can update an existing cell value using the put command. To do so, just follow the same syntax and mention your new value as shown below.
put 'table name','row','Column family:column name','new value'
The newly given value replaces the existing value, updating the row.
Example
Suppose there is a table in HBase called emp with the following data.
hbase(main):003:0> scan 'emp'
ROW   COLUMN+CELL
row1  column=personal:name, timestamp=1418051555, value=raju
row1  column=personal:city, timestamp=1418275907, value=Hyderabad
row1  column=professional:designation, timestamp=14180555, value=manager
row1  column=professional:salary, timestamp=1418035791555, value=50000
1 row(s) in 0.0100 seconds
The following command will update the city value of the employee named 'Raju' to Delhi.
hbase(main):002:0> put 'emp','row1','personal:city','Delhi'
0 row(s) in 0.0400 seconds
The updated table looks as follows; you can observe that the city of Raju has been changed to 'Delhi'.
hbase(main):003:0> scan 'emp'
ROW   COLUMN+CELL
row1  column=personal:name, timestamp=1418035791555, value=raju
row1  column=personal:city, timestamp=1418274645907, value=Delhi
row1  column=professional:designation, timestamp=141857555, value=manager
row1  column=professional:salary, timestamp=1418039555, value=50000
1 row(s) in 0.0100 seconds
Updating Data Using Java API
You can update the data in a particular cell using the put() method. Follow the steps given below to update an existing cell value of a table.
Step 1: Instantiate the Configuration Class
The Configuration class adds HBase configuration files to its object. You can create a configuration object using the create() method of the HBaseConfiguration class as shown below.
Configuration conf = HBaseConfiguration.create();
Step 2: Instantiate the HTable Class
You have a class called HTable, an implementation of Table in HBase. This class is used to communicate with a single HBase table. It accepts the configuration object and the table name as parameters when instantiated. You can instantiate the HTable class as shown below.
HTable hTable = new HTable(conf, tableName);
Step 3: Instantiate the Put Class
To insert data into an HBase table, the add() method and its variants are used. This method belongs to the Put class, therefore instantiate the Put class. This class requires the row name you want to insert the data into, in string format. You can instantiate the Put class as shown below.
Put p = new Put(Bytes.toBytes("row1"));
Step 4: Update an Existing Cell
The add() method of the Put class is used to insert data. It requires three byte arrays representing the column family, the column qualifier (column name), and the value to be inserted, respectively. Update the existing cell in the HBase table using the add() method as shown below.
p.add(Bytes.toBytes("personal"), Bytes.toBytes("city"), Bytes.toBytes("Delhi"));
Step 5: Save the Data in Table
After inserting the required rows, save the changes by passing the put instance to the put() method of the HTable class as shown below.
hTable.put(p);
Step 6: Close the HTable Instance
After creating data in the HBase table, close the HTable instance using the close() method as shown below.
hTable.close();
Given below is the complete program to update data in a particular table.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UpdateData {

   public static void main(String[] args) throws IOException {

      // Instantiating Configuration class
      Configuration config = HBaseConfiguration.create();

      // Instantiating HTable class
      HTable hTable = new HTable(config, "emp");

      // Instantiating Put class; accepts a row name
      Put p = new Put(Bytes.toBytes("row1"));

      // Updating a cell value
      p.add(Bytes.toBytes("personal"), Bytes.toBytes("city"), Bytes.toBytes("Delhi"));

      // Saving the put instance to the HTable
      hTable.put(p);
      System.out.println("data updated");

      // Closing the HTable
      hTable.close();
   }
}
personal:name timestamp=1418035791555, value=raju
1 row(s) in 0.0080 seconds
Reading Data Using Java API
To read data from an HBase table, use the get() method of the HTable class. This method requires an instance of the Get class. Follow the steps given below to retrieve data from the HBase table.
byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
byte[] value1 = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("city"));
Given below is the complete program to read values from an HBase table.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RetriveData {

   public static void main(String[] args) throws IOException, Exception {

      // Instantiating Configuration class
      Configuration config = HBaseConfiguration.create();

      // Instantiating HTable class
      HTable table = new HTable(config, "emp");

      // Instantiating Get class
      Get g = new Get(Bytes.toBytes("row1"));

      // Reading the data
      Result result = table.get(g);

      // Reading values from Result class object
      byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
      byte[] value1 = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("city"));

      // Printing the values
      String name = Bytes.toString(value);
      String city = Bytes.toString(value1);
      System.out.println("name: " + name + "  city: " + city);
   }
}
Compile and execute the above program as shown below.
$ javac RetriveData.java
$ java RetriveData
The following should be the output:
name: Raju  city: Delhi
Deleting a Specific Cell in a Table
Using the delete command, you can delete a specific cell in a table. The syntax of the delete command is as follows:
delete '<table name>', '<row>', '<column name>', '<timestamp>'
Example
Here is an example to delete a specific cell. Here we are deleting the city cell.
hbase(main):006:0> delete 'emp', '1', 'personal data:city', 1417521848375
0 row(s) in 0.0060 seconds
Deleting All Cells in a Row
Using the "deleteall" command, you can delete all the cells in a row. Given below is the syntax of the deleteall command.
deleteall '<table name>', '<row>'
Example
Here is an example of the deleteall command, where we are deleting all the cells of row1 of the emp table.
hbase(main):007:0> deleteall 'emp', '1'
0 row(s) in 0.0240 seconds
Verify the table using the scan command. A snapshot of the table after the deletion is given below.
hbase(main):022:0> scan 'emp'
ROW  COLUMN+CELL
2    column=personal data:city, timestamp=1417524574905, value=chennai
2    column=personal data:name, timestamp=1417524556125, value=ravi
2    column=professional data:designation, timestamp=1417524204, value=sr:engg
2    column=professional data:salary, timestamp=1417524604221, value=30000
3    column=personal data:city, timestamp=1417524681780, value=delhi
3    column=personal data:name, timestamp=1417524672067, value=rajesh
3    column=professional data:designation, timestamp=1417523187, value=jr:engg
3    column=professional data:salary, timestamp=1417524702514, value=25000
Instantiate the Delete class by passing the row id of the row that is to be deleted, in byte array format. You can also pass a timestamp and a RowLock to this constructor.
Delete delete = new Delete(Bytes.toBytes("row1"));
Step 4: Select the Data to be Deleted
You can delete the data using the delete methods of the Delete class. This class has various delete methods. Choose the columns or column families to be deleted using those methods. Take a look at the following examples that show the usage of Delete class methods.
delete.deleteColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
delete.deleteFamily(Bytes.toBytes("professional"));
Step 5: Delete the Data
Delete the selected data by passing the delete instance to the delete() method of the HTable class as shown below.
table.delete(delete);
Step 6: Close the HTable Instance
After deleting the data, close the HTable instance.
table.close();
Given below is the complete program to delete data from the HBase table.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteData {

   public static void main(String[] args) throws IOException {

      // Instantiating Configuration class
      Configuration conf = HBaseConfiguration.create();

      // Instantiating HTable class
      HTable table = new HTable(conf, "employee");

      // Instantiating Delete class
      Delete delete = new Delete(Bytes.toBytes("row1"));
      delete.deleteColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
      delete.deleteFamily(Bytes.toBytes("professional"));

      // Deleting the data
      table.delete(delete);

      // Closing the HTable object
      table.close();
      System.out.println("data deleted.....");
   }
}
Compile and execute the above program as shown below.
$ javac DeleteData.java
$ java DeleteData
The following should be the output:
data deleted