5 Data and Knowledge Management
LEARNING OBJECTIVES
5.1 Discuss ways that common challenges in managing data can be
addressed using data governance.
5.2 Identify and assess the advantages and disadvantages of
relational databases.
5.3 Define Big Data and explain its basic characteristics.
5.4 Explain the elements necessary to successfully implement and
maintain data warehouses.
5.5 Describe the benefits and challenges of implementing
knowledge management systems in organizations.
5.6 Understand the processes of querying a relational database,
entity-relationship modeling, and normalization and joins.
Opening Case
Data from a Pacemaker Leads to an Arrest
Questions
1. The Electronic Frontier Foundation released a statement that
Americans should not have to make a choice between health
and privacy (www.eff.org/issues/medical-privacy). Do
you agree with this statement? Why or why not? Support your
answer.
2. As we noted in Chapter 1, you are known as Homo conexus
because you practice continuous computing and are
surrounded with intelligent devices. Look at Chapter 3 on
ethics and privacy, and discuss the privacy implications of
being Homo conexus.
Introduction
Information technologies and systems support organizations in
managing—that is, acquiring, organizing, storing, accessing, analyzing,
and interpreting—data. As noted in Chapter 1, when these data are
managed properly, they become information and then knowledge.
Information and knowledge are invaluable organizational resources
that can provide any organization with a competitive advantage.
So, just how important are data and data management to
organizations? From confidential customer information and intellectual
property to financial transactions and social media posts, organizations
possess massive amounts of data that are critical to their success. Of
course, to benefit from these data, organizations need to manage these
data effectively. This type of management, however, comes at a huge
cost. According to Symantec’s (www.symantec.com) State of
Information Survey, digital information costs organizations worldwide
more than $1 trillion annually. In fact, it makes up roughly half of an
organization’s total value. The survey found that large organizations
spend an average of $40 million annually to maintain and use data,
and small-to-medium-sized businesses spend almost $350,000.
This chapter examines the processes whereby data are transformed
first into information and then into knowledge. Managing data is
critical to all organizations. Few business professionals are
comfortable making or justifying business decisions that are not based
on solid information. This is especially true today, when modern
information systems make access to that information quick and easy.
For example, we have information systems that format data in a way
that managers and analysts can easily understand. Consequently,
these professionals can access these data themselves and then analyze
the data according to their needs. The result is useful information.
Managers can then apply their experience to use this information to
address a business problem, thereby producing knowledge.
Knowledge management, enabled by information technology, captures
and stores knowledge in forms that all organizational employees can
access and apply, thereby creating the flexible, powerful “learning
organization.”
Organizations store data in databases. Recall from Chapter 1 that a
database is a collection of related data files or tables that contain data.
We discuss databases in Section 5.2, focusing on the relational
database model. In Section 5.6, we take a look at the fundamentals of
relational database operations.
Clearly, data and knowledge management are vital to modern
organizations. But, why should you learn about them? The reason is
that you will play an important role in the development of database
applications. The structure and content of your organization’s
database depend on how users (meaning you) define your business
activities. For example, when database developers in the firm’s MIS
group build a database, they use a tool called entity-relationship (ER)
modeling. This tool creates a model of how users view a business
activity. When you understand how to create and interpret an ER
model, then you can evaluate whether the developers have captured
your business activities correctly.
Keep in mind that decisions about data last longer, and have a broader
impact, than decisions about hardware or software. If decisions
concerning hardware are wrong, then the equipment can be replaced
relatively easily. If software decisions turn out to be incorrect, they can
be modified, though not always painlessly or inexpensively. Database
decisions, in contrast, are much harder to undo. Database design
constrains what an organization can do with its data for a long time.
Remember that business users will be stuck with a bad database
design, while the programmers who created the database will quickly
move on to their next projects.
Furthermore, consider that databases typically underlie the enterprise
applications that users access. If there are problems with
organizational databases, then it is unlikely that any applications will
be able to provide the necessary functionality for users. Databases are
difficult to set up properly and to maintain. They are also the
component of an information system that is most likely to receive the
blame when the system performs poorly and the least likely to be
recognized when the system performs well. This is why it is so
important to get database designs right the first time—and you will
play a key role in these designs.
You might also want to create a small, personal database using a
software product such as Microsoft Access. In that case, you will need
to be familiar with at least the basics of the product.
After the data are stored in your organization’s databases, they must
be accessible in a form that helps users to make decisions.
Organizations accomplish this objective by developing data
warehouses. You should become familiar with data warehouses
because they are invaluable decision-making tools. We discuss data
warehouses in Section 5.4.
You will also make extensive use of your organization’s knowledge
base to perform your job. For example, when you are assigned a new
project, you will likely research your firm’s knowledge base to identify
factors that contributed to the success (or failure) of previous, similar
projects. We discuss knowledge management in Section 5.5.
You begin this chapter by examining the multiple challenges involved
in managing data. You then study the database approach that
organizations use to help address these challenges. You turn your
attention to Big Data, which organizations must manage in today’s
business environment. Next, you study data warehouses and data
marts, and you learn how to use them for decision making. You
conclude the chapter by examining knowledge management.
Data Governance
To address the numerous problems associated with managing data,
organizations are turning to data governance. Data governance is
an approach to managing information across an entire organization. It
involves a formal set of business processes and policies that are
designed to ensure that data are handled in a certain, well-defined
fashion. That is, the organization follows unambiguous rules for
creating, collecting, handling, and protecting its information. The
objective is to make information available, transparent, and useful for
the people who are authorized to access it, from the moment it enters
an organization until it becomes outdated and is deleted.
One strategy for implementing data governance is master data
management. Master data management is a process that spans all
of an organization’s business processes and applications. It provides
companies with the ability to store, maintain, exchange, and
synchronize a consistent, accurate, and timely “single version of the
truth” for the company’s master data.
Master data are a set of core data, such as customer, product,
employee, vendor, geographic location, and so on, that span the
enterprise’s information systems. It is important to distinguish
between master data and transactional data. Transactional data,
which are generated and captured by operational systems, describe the
business’s activities, or transactions. In contrast, master data are
applied to multiple transactions, and they are used to categorize,
aggregate, and evaluate the transactional data.
Let’s look at an example of a transaction: You (Mary Jones) purchased
one Samsung 42-inch LCD television, part number 1234, from Bill
Roberts at Best Buy, for $2,000, on April 20, 2017. In this example,
the master data are “product sold,” “vendor,” “salesperson,” “store,”
“part number,” “purchase price,” and “date.” When specific values are
applied to the master data, then a transaction is represented.
Therefore, the transactional data would be, respectively, “42-inch LCD
television,” “Samsung,” “Bill Roberts,” “Best Buy,” “1234,” “$2,000,”
and “April 20, 2017.”
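To make the distinction concrete, here is a minimal SQL sketch of how this purchase might be recorded. The table and attribute names (PRODUCT, SALESPERSON, SALE) are hypothetical illustrations chosen for this example only; they are not drawn from any particular master data management product.

-- Master data: core records that many transactions reference
CREATE TABLE PRODUCT (
    Part_Number  INTEGER PRIMARY KEY,
    Product_Name VARCHAR(100),
    Vendor       VARCHAR(100)
);

CREATE TABLE SALESPERSON (
    Salesperson_ID   INTEGER PRIMARY KEY,
    Salesperson_Name VARCHAR(100),
    Store            VARCHAR(100)
);

-- Transactional data: one row per business event
CREATE TABLE SALE (
    Sale_ID        INTEGER PRIMARY KEY,
    Part_Number    INTEGER REFERENCES PRODUCT (Part_Number),
    Salesperson_ID INTEGER REFERENCES SALESPERSON (Salesperson_ID),
    Purchase_Price DECIMAL(10,2),
    Sale_Date      DATE
);

-- Master rows are entered once and reused across many sales
INSERT INTO PRODUCT VALUES (1234, 'Samsung 42-inch LCD television', 'Samsung');
INSERT INTO SALESPERSON VALUES (1, 'Bill Roberts', 'Best Buy');

-- The transaction above applies specific values to the master data
INSERT INTO SALE VALUES (1, 1234, 1, 2000.00, DATE '2017-04-20');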
An example of master data management is Dallas, Texas, which
implemented a plan for digitizing the city’s public and private records,
such as paper documents, images, drawings, and video and audio
content. The master database can be used by any of the 38 government
departments that have appropriate access. The city is also integrating
its financial and billing processes with its customer relationship
management program. (You will learn about customer relationship
management in Chapter 11.)
How will Dallas use this system? Imagine that the city experiences a
water-main break. Before it implemented the system, repair crews had
to search City Hall for records that were filed haphazardly. Once the
workers found the hard-copy blueprints, they took them to the site
and, after they examined them manually, decided on a plan of action.
In contrast, the new system delivers the blueprints wirelessly to the
laptops of crews in the field, who can magnify or highlight areas of
concern to generate a rapid response. This process reduces the time it
takes to respond to an emergency by several hours.
Along with data governance, organizations use the database approach
to efficiently and effectively manage their data. We discuss the
database approach in Section 5.2.
Before you go on …
1. What are some of the difficulties involved in managing data?
2. Define data governance, master data, and transactional
data.
5.2 The Database Approach
From the mid-1950s, when businesses first adopted computer
applications, until the early 1970s, organizations managed their data
in a file management environment. This environment evolved because
organizations typically automated their functions one application at a
time. Therefore, the various automated systems developed
independently from one another, without any overall planning. Each
application required its own data, which were organized in a data file.
Before you go on …
1. What is a data model?
2. What is a primary key? A secondary key?
3. What is an entity? An attribute? An instance?
4. What are the advantages and disadvantages of relational
databases?
Driver well-being.
Connected cars have functions that involve the driver’s ability and
fitness to drive. These apps include fatigue detection, automatic
detection of inebriated drivers, and the ability of the car to
summon medical assistance if the driver is unable to do so.
Another interesting app involves the use of in-car cameras to
detect distracted driving, such as texting while driving.
Location-based services.
Data from each car can be used to provide location-based offers.
For example, the driver above could receive an offer for a discount
on her favorite cup of coffee as she arrived at the coffee shop.
Questions
1. Describe several other uses (other than the ones discussed in
this case) for data from connected cars.
2. Would data from connected cars be considered Big Data? Why
or why not? Support your answer.
Questions
1. Discuss the advantages of Google Cloud Spanner to
organizations.
2. If your company does not operate globally, would Google
Cloud Spanner still be valuable? Why or why not? Support
your answer.
Microsegmentation of Customers.
Segmentation of a company’s customers means dividing them into
groups that share one or more characteristics. Microsegmentation
simply means dividing customers up into very small groups, or even
down to an individual customer.
For example, Paytronix Systems (www.paytronix.com)
provides loyalty and rewards program software for thousands of
different restaurants. Paytronix gathers restaurant guest data from a
variety of sources beyond loyalty and gift programs, including social
media. Paytronix then analyzes this Big Data to help its restaurant
clients microsegment their guests. Restaurant managers are now able
to more precisely customize their loyalty and gift programs. Since they
have taken these steps, they have noted improved profitability and
customer satisfaction in their restaurants.
Human Resources.
Employee benefits, particularly healthcare, represent a major business
expense. Consequently, some companies have turned to Big Data to
better manage these benefits. Caesars Entertainment
(www.caesars.com), for example, analyzes health-insurance claim
data for its 65,000 employees and their covered family members.
Managers can track thousands of variables that indicate how
employees use medical services, such as the number of their
emergency room visits and whether employees choose a generic or
brand name drug.
Consider the following scenario: Data revealed that too many
employees with medical emergencies were being treated at hospital
emergency rooms rather than at less expensive urgent-care facilities.
The company launched a campaign to remind employees of the high
cost of emergency room visits, and it provided a list of alternative
facilities. Subsequently, 10,000 emergency visits shifted to these less
expensive facilities, for a total savings of $4.5 million.
Big Data is also having an impact on hiring. An example is Catalyte
(www.catalyte.io), a technology outsourcing company that hires
teams for programming jobs. Traditional recruiting is typically too
slow, and hiring managers often subjectively choose candidates who
are not the best fit for the job. Catalyte addresses this problem by
requiring candidates to fill out an online assessment. It then uses the
assessment to collect thousands of data points about each candidate.
In fact, the company collects more data based on how candidates
answer than on what they answer.
For example, the assessment might give a problem requiring calculus
to an applicant who is not expected to know the subject. How the
candidate responds—laboring over an answer, answering quickly, and
then returning later, or skipping the problem entirely—provides
insight into how that candidate might deal with challenges that he or
she will encounter on the job. That is, someone who labors over a
difficult question might be effective in an assignment that requires a
methodical approach to problem solving, whereas an applicant who
takes a more aggressive approach might perform better in a different
job setting.
The benefit of this Big Data approach is that it recognizes that people
bring different skills to the table and there is no one-size-fits-all
person for any job. Analyzing millions of data points can reveal which
attributes candidates bring to specific situations.
As one measure of success, employee turnover at Catalyte averages
about 15 percent per year, compared with more than 30 percent for its
U.S. competitors and more than 20 percent for similar companies
overseas.
Product Development.
Big Data can help capture customer preferences and put that
information to work in designing new products. For example, Ford
Motor Company (www.ford.com) was considering a “three blink”
turn indicator that had been available on its European cars for years.
Unlike the turn signals on its U.S. vehicles, this indicator flashes three
times at the driver’s touch and then automatically shuts off.
Ford decided that conducting a full-scale market research test on this
blinker would be too costly and time consuming. Instead, it examined
auto-enthusiast websites and owner forums to discover what drivers
were saying about turn indicators. Using text-mining algorithms,
researchers culled more than 10,000 mentions and then summarized
the most relevant comments.
The results? Ford introduced the three-blink indicator on the Ford
Fiesta it released in 2010, and by 2013 it was available on most Ford
products. Although some Ford owners complained online that they
have had trouble getting used to the new turn indicator, many others
defended it. Ford managers note that the use of text-mining
algorithms was critical in this effort because they provided the
company with a complete picture that would not have been available
using traditional market research.
Operations.
For years, companies have been using information technology to make
their operations more efficient. Consider United Parcel Service (UPS).
The company has long relied on data to improve its operations.
Specifically, it uses sensors in its delivery vehicles that can, among
other things, capture each truck’s speed and location, the number of
times it is placed in reverse, and whether the driver’s seat belt is
buckled. These data are uploaded at the end of each day to a UPS data
center, where it is analyzed overnight. By combining GPS information
and data from sensors installed on more than 46,000 vehicles, UPS
reduced fuel consumption by 8.4 million gallons, and it cut 85 million
miles off its routes.
Marketing.
Marketing managers have long used data to better understand their
customers and to target their marketing efforts more directly. Today,
Big Data enables marketers to craft much more personalized
messages.
The United Kingdom’s InterContinental Hotels Group (IHG;
www.ihg.com) has gathered details about the members of its
Priority Club rewards program, such as income levels and whether
members prefer family-style or business-traveler accommodations.
The company then consolidated all this information with information
obtained from social media into a single data warehouse. Using its
data warehouse and analytics software, the hotelier launched a new
marketing campaign. Where previous marketing campaigns
generated, on average, between 7 and 15 customized marketing
messages, the new campaign generated more than 1,500. IHG rolled
out these messages in stages to an initial core of 12 customer groups,
each of which was defined by 4,000 attributes. One group, for
example, tended to stay on weekends, redeem reward points for gift
cards, and register through IHG marketing partners. Using this
information, IHG sent these customers a marketing message that
alerted them to local weekend events.
The campaign proved to be highly successful. It generated a 35 percent
higher rate of customer conversions, or acceptances, than previous,
similar campaigns.
Government Operations.
Consider the United Kingdom. According to the INRIX Traffic
Scorecard, although the United States has the worst traffic congestion
on average, London topped the world list for metropolitan areas. In
London, drivers wasted an average of 101 hours per year in gridlock.
Congestion is bad for business. The INRIX study estimated that the
cost to the U.K. economy would be £307 billion ($400 billion)
between 2013 and 2030.
Congestion is also harmful to urban resilience, negatively affecting
both environmental and social sustainability, in terms of emissions,
global warming, air quality, and public health. As for the livability of a
modern city, congestion is an important component of the urban
transport user experience (UX).
Calculating levels of UX satisfaction at any given time involves solving
a complex equation with a range of key variables and factors: total
number of transport assets (road and rail capacity, plus parking
spaces), users (vehicles, pedestrians), incidents (roadwork, accidents,
breakdowns), plus expectations (anticipated journey times and
passenger comfort).
The growing availability of Big Data sources within London—for
example, traffic cameras and sensors on cars and roadways—can help
to create a new era of smart transport. Analyzing this Big Data offers
new ways for traffic analysts in London to “sense the city” and enhance
transport via real-time estimation of traffic patterns and rapid
deployment of traffic management strategies.
Before you go on …
1. Define Big Data.
2. Describe the characteristics of Big Data.
3. Describe how companies can use Big Data to gain a
competitive advantage.
Data Integration.
In addition to storing data in their source systems, organizations need
to extract the data, transform them, and then load them into a data
mart or warehouse. This process is often called ETL, although the term
data integration is increasingly being used to reflect the growing
number of ways that source system data can be handled. For example,
in some cases, data are extracted, loaded into a mart or warehouse,
and then transformed (i.e., ELT rather than ETL).
Data extraction can be performed either by handwritten code such as
SQL queries or by commercial data-integration software. Most
companies employ commercial software. This software makes it
relatively easy to (1) specify the tables and attributes in the source
systems that are to be used; (2) map and schedule the movement of
the data to the target, such as a data mart or warehouse; (3) make the
required transformations; and, ultimately, (4) load the data.
After the data are extracted, they are transformed to make them more
useful. For example, data from different systems may be integrated
around a common key, such as a customer identification number.
Organizations adopt this approach to create a 360-degree view of all of
their interactions with their customers. As an example of this process,
consider a bank. Customers can engage in a variety of interactions:
visiting a branch, banking online, using an ATM, obtaining a car loan,
and more. The systems for these touch points—defined as the
numerous ways that organizations interact with customers, such as e-
mail, the Web, direct contact, and the telephone—are typically
independent of one another. To obtain a holistic picture of how
customers are using the bank, the bank must integrate the data from
the various source systems into a data mart or warehouse.
Other kinds of transformations also take place. For example, format
changes to the data may be required, such as using male and female to
denote gender, as opposed to 0 and 1 or M and F. Aggregations may be
performed, say on sales figures, so that queries can use the summaries
rather than recalculating them each time. Data-cleansing software
may be used to clean up the data; for example, eliminating duplicate
records for the same customer.
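As a simple, hypothetical illustration of such transformations, the following SQL sketch recodes gender values and precomputes daily sales totals while loading warehouse tables. All table and column names are assumptions; commercial data-integration tools typically generate or hide logic of this kind.

-- Format change: recode M/F into descriptive labels during the load
INSERT INTO warehouse_customer (customer_id, gender)
SELECT customer_id,
       CASE gender_code
           WHEN 'M' THEN 'Male'
           WHEN 'F' THEN 'Female'
       END AS gender
FROM source_customer;

-- Aggregation: store daily sales totals so queries can reuse the summary
INSERT INTO warehouse_daily_sales (sale_date, total_sales)
SELECT sale_date, SUM(sale_amount)
FROM source_sales
GROUP BY sale_date;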
Finally, data are loaded into the warehouse or mart during a specified
period known as the “load window.” This window is becoming smaller
as companies seek to store ever-fresher data in their warehouses. For
this reason, many companies have moved to real-time data
warehousing, where data are moved using data-integration processes
from source systems to the data warehouse or mart almost instantly.
For example, within 15 minutes of a purchase at Walmart, the details
of the sale have been loaded into a warehouse and are available for
analysis.
Data Quality.
The quality of the data in the warehouse must meet users’ needs. If it
does not, then users will not trust the data and ultimately will not use
it. Most organizations find that the quality of the data in source
systems is poor and must be improved before the data can be used in
the data warehouse. Some of the data can be improved with data-
cleansing software. The better, long-term solution, however, is to
improve the quality at the source system level. This approach requires
the business owners of the data to assume responsibility for making
any necessary changes to implement this solution.
To illustrate this point, consider the case of a large hotel chain that
wanted to conduct targeted marketing promotions using zip code data
it collected from its guests when they checked in. When the company
analyzed the zip code data, it discovered that many of the zip codes
were 99999. How did this error occur? The answer is that the clerks
were not asking customers for their zip codes, but they needed to enter
something to complete the registration process. A short-term solution
to this problem was to conduct the marketing campaign using city and
state data instead of zip codes. The long-term solution was to make
certain the clerks entered the actual zip codes. The latter solution
required the hotel managers to assume responsibility for making
certain that their clerks entered the correct data.
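A data-quality check of this kind can be expressed as a simple query. The following sketch assumes a hypothetical Guest_Registration table; it merely flags records that carry the 99999 placeholder.

-- Flag registrations whose zip code is the 99999 placeholder
SELECT Guest_ID, Guest_Name, Zip_Code
FROM Guest_Registration
WHERE Zip_Code = '99999';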
Governance.
To ensure that BI is meeting their needs, organizations must
implement governance to plan and control their BI activities.
Governance requires that people, committees, and processes be in
place. Companies that are effective in BI governance often create a
senior-level committee composed of vice presidents and directors who
(1) ensure that the business strategies and BI strategies are in
alignment, (2) prioritize projects, and (3) allocate resources. These
companies also establish a middle management–level committee that
oversees the various projects in the BI portfolio to ensure that these
projects are being completed in accordance with the company’s
objectives. Finally, lower-level operational committees perform tasks
such as creating data definitions and identifying and solving data
problems. All of these committees rely on the collaboration and
contributions of business users and IT personnel.
Users.
Once the data are loaded in a data mart or warehouse, they can be
accessed. At this point, the organization begins to obtain business
value from BI; all of the prior stages constitute creating BI
infrastructure.
There are many potential BI users, including IT developers; frontline
workers; analysts; information workers; managers and executives; and
suppliers, customers, and regulators. Some of these users are
information producers whose primary role is to create information for
other users. IT developers and analysts typically fall into this category.
Other users—including managers and executives—are information
consumers, because they use information created by others.
Companies have reported hundreds of successful data-warehousing
applications. You can read client success stories and case studies at the
websites of vendors such as NCR Corp. (www.ncr.com) and Oracle
(www.oracle.com). For a more detailed discussion, visit the Data
Warehouse Institute (www.tdwi.org). The benefits of data
warehousing include the following:
End users can access needed data quickly and easily through Web
browsers because these data are located in one place.
End users can conduct extensive analysis with data in ways that
were not previously possible.
End users can obtain a consolidated view of organizational data.
These benefits can improve business knowledge, provide competitive
advantage, enhance customer service and satisfaction, facilitate
decision making, and streamline business processes.
Despite their many benefits, data warehouses have some limitations.
IT’s About Business 5.3 points out these limitations and considers an
emerging solution; namely, data lakes.
Before you go on …
1. Differentiate between data warehouses and data marts.
2. Describe the characteristics of a data warehouse.
3. What are three possible architectures for data warehouses and
data marts in an organization?
Knowledge.
In the information technology context, knowledge is distinct from data
and information. As you learned in Chapter 1, data are a collection of
facts, measurements, and statistics; information is organized or
processed data that are timely and accurate. Knowledge is information
that is contextual, relevant, and useful. Simply put, knowledge is
information in action. Intellectual capital (or intellectual assets)
is another term for knowledge.
To illustrate, a bulletin listing all of the courses offered by your
university during one semester would be considered data. When you
register, you process the data from the bulletin to create your schedule
for the semester. Your schedule would be considered information.
Awareness of your work schedule, your major, your desired social
schedule, and characteristics of different faculty members could be
construed as knowledge, because it can affect the way you build your
schedule. You see that this awareness is contextual and relevant (to
developing an optimal schedule of classes) as well as useful (it can lead
to changes in your schedule). The implication is that knowledge has
strong experiential and reflective elements that distinguish it from
information in a given context. Unlike information, knowledge can be
used to solve a problem.
Numerous theories and models classify different types of knowledge.
In the next section, we will focus on the distinction between explicit
knowledge and tacit knowledge.
Query Languages
The most commonly performed database operation is searching for
information. Structured query language (SQL) is the most
popular query language used for interacting with a database. SQL
allows people to perform complicated searches by using relatively
simple statements or key words. Typical key words are SELECT (to
choose a desired attribute), FROM (to specify the table or tables to be
used), and WHERE (to specify conditions to apply in the query).
To understand how SQL works, imagine that a university wants to
know the names of students who will graduate cum laude (but not
magna or summa cum laude) in May 2018. (Refer to Figure 5.3 in this
chapter.) The university IT staff would query the student relational
database with an SQL statement such as the following:
SELECT Student_Name
FROM Student_Database
WHERE Grade_Point_Average >= 3.40
AND Grade_Point_Average < 3.60;
The SQL query would return John Jones and Juan Rodriguez.
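To see such a query end to end, the following sketch creates a simplified student table and two sample rows whose grade point averages were chosen to fall in the cum laude range; the structure and values are illustrative assumptions, not a reproduction of Figure 5.3.

CREATE TABLE Student_Database (
    Student_ID          INTEGER PRIMARY KEY,
    Student_Name        VARCHAR(100),
    Grade_Point_Average DECIMAL(3,2)
);

INSERT INTO Student_Database VALUES (1, 'John Jones', 3.50);
INSERT INTO Student_Database VALUES (2, 'Juan Rodriguez', 3.45);

-- Returns the two cum laude students
SELECT Student_Name
FROM Student_Database
WHERE Grade_Point_Average >= 3.40
  AND Grade_Point_Average < 3.60;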
Another way to find information in a database is to use query by
example (QBE). In QBE, the user fills out a grid or template—also
known as a form—to construct a sample or a description of the data
desired. Users can construct a query quickly and easily by using drag-
and-drop features in a DBMS such as Microsoft Access. Conducting
queries in this manner is simpler than keying in SQL commands.
Entity–Relationship Modeling
Designers plan and create databases through the process of entity–
relationship (ER) modeling, using an entity–relationship
(ER) diagram. There are many approaches to ER diagramming. You
will see one particular approach here, but there are others. The good
news is that if you are familiar with one version of ER diagramming,
then you will be able to easily adapt to any other version.
ER diagrams consist of entities, attributes, and relationships. To
properly identify entities, attributes, and relationships, database
designers first identify the business rules for the particular data model.
Business rules are precise descriptions of policies, procedures, or
principles in any organization that stores and uses data to generate
information. Business rules are derived from a description of an
organization’s operations, and help to create and enforce business
processes in that organization. Keep in mind that you determine these
business rules, not the MIS department.
Entities are pictured in rectangles, and relationships are described on
the line between two entities. The attributes for each entity are listed,
and the primary key is underlined. The data dictionary provides
information on each attribute, such as its name, if it is a key, part of a
key, or a non-key attribute; the type of data expected (alphanumeric,
numeric, dates, etc.); and valid values. Data dictionaries can also
provide information on why the attribute is needed in the database;
which business functions, applications, forms, and reports use the
attribute; and how often the attribute should be updated.
ER modeling is valuable because it allows database designers to
communicate with users throughout the organization to ensure that all
entities and the relationships among the entities are represented. This
process underscores the importance of taking all users into account
when designing organizational databases. Notice that all entities and
relationships in our example are labeled in terms that users can
understand.
Relationships illustrate an association between entities. The degree
of a relationship indicates the number of entities associated with a
relationship. A unary relationship exists when an association is
maintained within a single entity. A binary relationship exists
when two entities are associated. A ternary relationship exists
when three entities are associated. In this chapter, we discuss only
binary relationships because they are the most common. Entity
relationships may be classified as one-to-one, one-to-many, or many-
to-many. The term connectivity describes the relationship
classification.
Connectivity and cardinality are established by the business rules of a
relationship. Cardinality refers to the maximum number of times an
instance of one entity can be associated with an instance in the related
entity. Cardinality can be mandatory single, optional single,
mandatory many, or optional many. Figure 5.9 displays the four
corresponding cardinality symbols.
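To see how connectivity and cardinality carry over into a physical design, consider a hypothetical business rule stating that each account is held at exactly one branch, while a branch may hold many accounts (a one-to-many relationship). A minimal SQL sketch, with assumed table and attribute names, places a foreign key on the “many” side:

CREATE TABLE BRANCH (
    Branch_ID   INTEGER PRIMARY KEY,
    Branch_Name VARCHAR(100)
);

CREATE TABLE ACCOUNT (
    Account_Number INTEGER PRIMARY KEY,
    Balance        DECIMAL(12,2),
    -- The NOT NULL foreign key enforces "mandatory single" on the branch side
    Branch_ID      INTEGER NOT NULL REFERENCES BRANCH (Branch_ID)
);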
FIGURE 5.15 First normal form for data from pizza shop.
Consider the table in Figure 5.15, and notice the very first column
(labeled Order Number). This column contains multiple entries for
each order—three rows for Order Number 1116 and three rows for
Order Number 1117. These multiple rows for an order are called
repeating groups. The table in Figure 5.15 also contains multiple
entities: ORDER, CUSTOMER, and PIZZA. Therefore, we move on to
second normal form.
To produce second normal form, we break the table in Figure 5.15 into
smaller tables to eliminate some of its data redundancy. Second
normal form does not allow partial functional dependencies. That is,
in a table in second normal form, every non-key attribute must be
functionally dependent on the entire primary key of that table. Figure
5.16 shows the data from the pizza shop in second normal form.
FIGURE 5.16 Second normal form for data from pizza
shop.
If you examine Figure 5.16, you will see that second normal form has
not eliminated all the data redundancy. For example, each Order
Number is duplicated three times, as are all customer data. In third
normal form, non-key attributes are not allowed to define other non-
key attributes. That is, third normal form does not allow transitive
dependencies in which one non-key attribute is functionally
dependent on another. In our example, customer information depends
both on Customer ID and Order Number. Figure 5.17 shows the data
from the pizza shop in third normal form. Third normal form structure
has the following important points:
It is completely free of data redundancy.
All foreign keys appear where needed to link related tables.
FIGURE 5.17 Third normal form for data from pizza shop.
Let’s look at the primary and foreign keys for the tables in third
normal form:
The ORDER relation: The primary key is Order Number and the
foreign key is Customer ID.
The CUSTOMER relation: The primary key is Customer ID.
The PIZZA relation: The primary key is Pizza Code.
The ORDER–PIZZA relation: The primary key is a composite key,
consisting of two foreign keys, Order Number and Pizza Code.
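Expressed as SQL table definitions, the four relations in third normal form might look like the following sketch. The data types are assumptions for illustration; ORDER is written in quotation marks because it is a reserved word in SQL, and the dash in ORDER–PIZZA becomes an underscore to form a legal identifier.

CREATE TABLE CUSTOMER (
    Customer_ID      INTEGER PRIMARY KEY,
    Customer_FName   VARCHAR(50),
    Customer_LName   VARCHAR(50),
    Customer_Address VARCHAR(100),
    Zip_Code         VARCHAR(10)
);

CREATE TABLE PIZZA (
    Pizza_Code INTEGER PRIMARY KEY,
    Pizza_Name VARCHAR(50),
    Price      DECIMAL(6,2)
);

CREATE TABLE "ORDER" (
    Order_Number INTEGER PRIMARY KEY,
    Order_Date   DATE,
    Total_Price  DECIMAL(8,2),
    Customer_ID  INTEGER REFERENCES CUSTOMER (Customer_ID)  -- foreign key
);

CREATE TABLE ORDER_PIZZA (
    Order_Number INTEGER REFERENCES "ORDER" (Order_Number),
    Pizza_Code   INTEGER REFERENCES PIZZA (Pizza_Code),
    Quantity     INTEGER,
    PRIMARY KEY (Order_Number, Pizza_Code)  -- composite key of two foreign keys
);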
Now consider an order at the pizza shop. The tables in third normal
form can produce the order in the following manner by using the join
operation (see Figure 5.18). The join operation combines records
from two or more tables in a database to obtain information that is
located in different tables. In our example, the join operation
combines records from the four normalized tables to produce an
ORDER. Here is how the join operation works:
The ORDER relation provides the Order Number (the primary
key), Order Date, and Total Price.
The primary key of the ORDER relation (Order Number) provides
a link to the ORDER–PIZZA relation (the link numbered 1 in
Figure 5.18).
The ORDER–PIZZA relation supplies the Quantity to ORDER.
The primary key of the ORDER–PIZZA relation is a composite
key that consists of Order Number and Pizza Code. Therefore, the
Pizza Code component of the primary key provides a link to the
PIZZA relation (the link numbered 2 in Figure 5.18).
The PIZZA relation supplies the Pizza Name and Price to ORDER.
The Customer ID in ORDER (a foreign key) provides a link to the
CUSTOMER relation (the link numbered 3 in Figure 5.18).
The CUSTOMER relation supplies the Customer FName,
Customer LName, Customer Address, and Zip Code to ORDER.
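Expressed in SQL against the table definitions sketched above, the join that reassembles an order (here, Order Number 1116 from Figure 5.15) might look like this; the numbered comments correspond to the links in Figure 5.18.

SELECT o.Order_Number, o.Order_Date, o.Total_Price,
       c.Customer_FName, c.Customer_LName, c.Customer_Address, c.Zip_Code,
       p.Pizza_Name, p.Price, op.Quantity
FROM "ORDER" o
JOIN ORDER_PIZZA op ON o.Order_Number = op.Order_Number  -- link 1
JOIN PIZZA p        ON op.Pizza_Code  = p.Pizza_Code     -- link 2
JOIN CUSTOMER c     ON o.Customer_ID  = c.Customer_ID    -- link 3
WHERE o.Order_Number = 1116;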
Before you go on …
1. What is structured query language?
2. What is query by example?
3. What is an entity? An attribute? A relationship?
4. Describe one-to-one, one-to-many, and many-to-many
relationships.
5. What is the purpose of normalization?
6. Why do we need the join operation?
Summary
5.1 Discuss ways that common challenges in managing data can be
addressed using data governance.
The following are three common challenges in managing data:
Data are scattered throughout organizations and are collected by
many individuals using various methods and devices. These data
are frequently stored in numerous servers and locations and in
different computing systems, databases, formats, and human and
computer languages.
Data come from multiple sources.
Information systems that support particular business processes
impose unique requirements on data, which results in repetition
and conflicts across an organization.
One strategy for implementing data governance is master data
management. Master data management provides companies with the
ability to store, maintain, exchange, and synchronize a consistent,
accurate, and timely “single version of the truth” for the company’s
core master data. Master data management manages data gathered
from across an organization, manages data from multiple sources, and
manages data across business processes within an organization.
5.2 Discuss the advantages and disadvantages of relational databases.
Relational databases enable people to compare information quickly by
row or column. Users also can easily retrieve items by finding the
point of intersection of a particular row and column. However, large-
scale relational databases can be composed of numerous interrelated
tables, making the overall design complex, with slow search and access
times.
5.3 Define Big Data and explain its basic characteristics.
Big Data is composed of high-volume, high-velocity, and high-variety
information assets that require new forms of processing to enhance
decision making, lead to insights, and optimize business processes. Big
Data has three distinct characteristics that distinguish it from
traditional data: volume, velocity, and variety.
Volume: Big Data consists of vast quantities of data.
Velocity: Big Data flows into an organization at incredible speeds.
Variety: Big Data includes diverse data in differing formats.
5.4 Explain the elements necessary to successfully implement and
maintain data warehouses.
To successfully implement and maintain a data warehouse, an
organization must do the following:
Link source systems that provide data to the warehouse or mart.
Prepare the necessary data for the data warehouse using data
integration technology and processes.
Decide on an appropriate architecture for storing data in the data
warehouse or data mart.
Select the tools and applications for the variety of organizational
users.
Establish appropriate metadata, data quality, and governance
processes to ensure that the data warehouse or mart meets its
purposes.
5.5 Describe the benefits and challenges of implementing knowledge
management systems in organizations.
Organizations can realize many benefits with KMSs, including the
following:
Best practices readily available to a wide range of employees
Improved customer service
More efficient product development
Improved employee morale and retention
Challenges to implementing KMSs include the following:
Employees must be willing to share their personal tacit
knowledge.
Organizations must create a knowledge management culture that
rewards employees who add their expertise to the knowledge
base.
The knowledge base must be continually maintained and updated.
Companies must be willing to invest in the resources needed to
carry out these operations.
5.6 Understand the processes of querying a relational database,
entity-relationship modeling, and normalization and joins.
The most commonly performed database operation is requesting
information. Structured query language is the most popular query
language used for this operation. SQL allows people to perform
complicated searches by using relatively simple statements or key
words. Typical key words are SELECT (to specify a desired attribute),
FROM (to specify the table to be used), and WHERE (to specify
conditions to apply in the query).
Another way to find information in a database is to use query by
example. In QBE, the user fills out a grid or template—also known as a
form—to construct a sample or a description of the data desired. Users
can construct a query quickly and easily by using drag-and-drop
features in a DBMS such as Microsoft Access. Conducting queries in
this manner is simpler than keying in SQL commands.
Designers plan and create databases through the process of entity–
relationship modeling, using an entity–relationship diagram. ER
diagrams consist of entities, attributes, and relationships. Entities are
pictured in rectangles, and relationships are described on the lines
between entities. The attributes for each entity are listed, and the
primary key is underlined.
ER modeling is valuable because it allows database designers to
communicate with users throughout the organization to ensure that all
entities and the relationships among the entities are represented. This
process underscores the importance of taking all users into account
when designing organizational databases. Notice that all entities and
relationships in our example are labeled in terms that users can
understand.
Normalization is a method for analyzing and reducing a relational
database to its most streamlined form to ensure minimum
redundancy, maximum data integrity, and optimal processing
performance. When data are normalized, attributes in each table
depend only on the primary key.
The join operation combines records from two or more tables in a
database to produce information that is located in different tables.
Chapter Glossary
attribute Each characteristic or quality of a particular entity.
best practices The most effective and efficient ways to accomplish
business processes.
Big Data A collection of data so large and complex that it is difficult
to manage using traditional database management systems.
binary relationship A relationship that exists when two entities are
associated.
bit A binary digit—that is, a 0 or a 1.
business rules Precise descriptions of policies, procedures, or
principles in any organization that stores and uses data to generate
information.
byte A group of eight bits that represents a single character.
clickstream data Data collected about user behavior and browsing
patterns by monitoring users’ activities when they visit a website.
connectivity Describes the classification of a relationship: one-to-
one, one-to-many, or many-to-many.
database management system (DBMS) The software program
(or group of programs) that provides access to a database.
data dictionary A collection of definitions of data elements; data
characteristics that use the data elements; and the individuals,
business functions, applications, and reports that use these data
elements.
data file (also table) A collection of logically related records.
data governance An approach to managing information across an
entire organization.
data lake A central repository that stores all of an organization’s data,
regardless of their source or format.
data mart A low-cost, scaled-down version of a data warehouse that
is designed for the end-user needs in a strategic business unit (SBU) or
a department.
data model A diagram that represents entities in the database and
their relationships.
data warehouse A repository of historical data that are organized by
subject to support decision makers in the organization.
entity Any person, place, thing, or event of interest to a user.
entity–relationship (ER) diagram Document that shows data
entities and attributes and relationships among them.
entity–relationship (ER) modeling The process of designing a
database by organizing data entities to be used and identifying the
relationships among them.
explicit knowledge The more objective, rational, and technical
types of knowledge.
field A characteristic of interest that describes an entity.
foreign key A field (or group of fields) in one table that uniquely
identifies a row (or record) of another table.
instance Each row in a relational table, which is a specific, unique
representation of the entity.
intellectual capital (or intellectual assets) Other terms for
knowledge.
join operation A database operation that combines records from two
or more tables in a database.
knowledge management (KM) A process that helps organizations
identify, select, organize, disseminate, transfer, and apply information
and expertise that are part of the organization’s memory and that
typically reside within the organization in an unstructured manner.
knowledge management systems (KMSs) Information
technologies used to systematize, enhance, and expedite intra- and
interfirm knowledge management.
master data A set of core data, such as customer, product, employee,
vendor, geographic location, and so on, that spans an enterprise’s
information systems.
master data management A process that provides companies with
the ability to store, maintain, exchange, and synchronize a consistent,
accurate, and timely “single version of the truth” for the company’s
core master data.
multidimensional structure Storage of data in more than two
dimensions; a common representation is the data cube.
normalization A method for analyzing and reducing a relational
database to its most streamlined form to ensure minimum
redundancy, maximum data integrity, and optimal processing
performance.
primary key A field (or attribute) of a record that uniquely identifies
that record so that it can be retrieved, updated, and sorted.
query by example (QBE) To obtain information from a relational
database, a user fills out a grid or template—also known as a form—to
construct a sample or a description of the data desired.
record A grouping of logically related fields.
relational database model Data model based on the simple
concept of tables in order to capitalize on characteristics of rows and
columns of data.
relationships Associations between two or more entities in an ER
diagram.
secondary key A field that has some identifying information, but
typically does not uniquely identify a record with complete accuracy.
structured data Highly organized data in fixed fields in a data
repository such as a relational database that must be defined in terms
of field name and type (e.g., alphanumeric, numeric, and currency).
structured query language (SQL) The most popular query
language for requesting information from a relational database.
table A grouping of logically related records.
tacit knowledge The cumulative store of subjective or experiential
learning, which is highly personal and hard to formalize.
ternary relationship A relationship that exists when three entities
are associated.
transactional data Data generated and captured by operational
systems that describe the business’s activities, or transactions.
unary relationship A relationship that exists when an association is
maintained within a single entity.
unstructured data Data that do not reside in a traditional relational
database.
Discussion Questions
1. Is Big Data really a problem on its own, or are the use, control, and
security of the data the true problems? Provide specific examples to
support your answer.
2. What are the implications of having incorrect data points in your
Big Data? What are the implications of incorrect or duplicated
customer data? How valuable are decisions that are based on faulty
information derived from incorrect data?
3. Explain the difficulties involved in managing data.
4. What are the problems associated with poor-quality data?
5. What is master data management? What does it have to do with
high-quality data?
6. Explain why master data management is so important in companies
that have multiple data sources.
7. Describe the advantages and disadvantages of relational databases.
8. Explain why it is important to capture and manage knowledge.
9. Compare and contrast tacit knowledge and explicit knowledge.
10. Draw the entity–relationship diagram for a company that has
departments and employees. In this company, a department must
have at least one employee, and company employees may work in only
one department.
11. Draw the entity–relationship diagram for library patrons and the
process of checking out books.
12. You are working at a doctor’s office. You gather data on the
following entities: PATIENT, PHYSICIAN, PATIENT DIAGNOSIS,
and TREATMENT. Develop a table for the entity, PATIENT VISIT.
Decide on the primary keys and/or foreign keys that you want to use
for each entity.
Problem-Solving Activities
1. Access various employment websites (e.g., www.monster.com
and www.dice.com) and find several job descriptions for a database
administrator. Are the job descriptions similar? What are the salaries
offered in these positions?
2. Access the websites of several real estate companies. Find the sites
that do the following: take you through a step-by-step process for
buying a home, provide virtual reality tours of homes in your price
range (say, $200,000 to $250,000) and location, provide mortgage
and interest rate calculators, and offer financing for your home. Do the
sites require that you register to access their services? Can you request
that an e-mail be sent to you when properties in which you might be
interested become available? How do the processes outlined influence
your likelihood of selecting a particular company for your real estate
purchase?
3. It is possible to find many websites that provide demographic
information. Access several of these sites and see what they offer. Do
the sites differ in the types of demographic information they offer? If
so, how? Do the sites require a fee for the information they offer?
Would demographic information be useful to you if you wanted to
start a new business? If so, how and why?
4. Search the Web for uses of Big Data in homeland security.
Specifically, read about the spying by the U.S. National Security
Agency (NSA). What role did technology and Big Data play in this
questionable practice?
5. Search the web for the article “Why Big Data and Privacy Are Often
at Odds.” What points does this article present concerning the delicate
balance between shared data and customer privacy?
6. Access the websites of IBM (www.ibm.com) and Oracle
(www.oracle.com), and trace the capabilities of their latest data
management products, including Web connections.
7. Enter the website of the Gartner Group (www.gartner.com).
Examine the company’s research studies pertaining to data
management. Prepare a report on the state of the art of data
management.
8. Diagram a knowledge management system cycle for a fictional
company that sells customized T-shirts to students.
Chapter Closing Case
Data Enhances Fans’ Experience of the Tour de
France