Untitled
Untitled
Untitled
Many ETL solutions are cloud-based, which accounts for their speed and scalability.
But large enterprises with traditional, on-premise infrastructure and data
management processes often use custom-built scripts to collect and perform data
loading on their own data into storage systems through customized configurations.
This can:
Slow down analysis. Each time a data source is added or changed, the system has to
be reconfigured, which takes time and hampers the ability to make quick decisions.
Increase the likelihood of errors. Changes and reconfigurations open up the door
for human error, duplicate or missing data, and other problems.
Require specialized knowledge. In-house IT teams often lack the skill (and
bandwidth) needed to code and monitor ETL functions themselves.
Require costly equipment. In addition to investment in the right human resources,
organizations have to purchase, house, and maintain hardware and other equipment to
run the process on-site.
Unorganized Data: Loading your data can become unorganized very fast. For ETL
voyagers, common roadblocks that many encounters early on can be resolved with
proper planning and delivery.
Universal formatting: Before you begin loading your data, make sure that you
identify where it is coming from and where you want to go.
Loss of data: Tracking the status of all data is critical for a smooth loading
process.
Speed: Although it's exciting to be closer to your final destination, do not rush
through this phase. Errors are most likely to occur during this time
Full Load: This is where all of your data is selected, moved in bulk, and then
replaced by new data. Although it is not as complex to navigate through, loading
time is much slower. With the overwhelming amount of data being moved at once, it
is much easier for data to get lost within the big move.
-Incremental Load: This is where you are moving new data in intervals. Due to its
intricate nature, delivery time is much faster than its counterpart. However, this
speed comes at a cost. Incremental loads are more likely to encounter problems due
to the nature of having to manage them as individual batches rather than one big
group. Incremental Load Periodically applies ongoing changes as per the
requirement. After the data is loaded into the data warehouse database, verify the
referential integrity between the dimensions and the fact tables to ensure that all
records belong to the appropriate records in the other tables. The DBA must verify
that each record in the fact table is related to one record in each dimension table
that will be used in combination with that fact table.
-Initial Load: For the very first time loading all the data warehouse tables.
Full Refresh: Deleting the contents of a table and reloading it with fresh data.
OLAP Concepts -
What is OLAP? Operations of OLAP?
OLAP (for online analytical processing) is software for performing multidimensional
analysis at high speeds on large volumes of data from a data warehouse, data mart,
or some other unified, centralized data store.
Most business data have multiple dimensions-multiple categories into which the data
are broken down for presentation, tracking, or analysis. For example, sales figures
might have several dimensions related to location (region, country, state/province,
store), time (year, month, week, day), product (clothing, men/women/children,
brand, type), and more.
But in a data warehouse, data sets are stored in tables, each of which can organize
data into just two of these dimensions at a time.
OLAP extracts data from multiple relational data sets and reorganizes it into a
multidimensional format that enables very fast processing and very insightful
analysis.
Drill-down
The drill-down operation converts less-detailed data into more-detailed data
through one of two methods-moving down in the concept hierarchy or adding a new
dimension to the cube. For example, if you view sales data for an organization's
calendar or fiscal quarter, you can drill-down to see sales for each month, moving
down in the concept hierarchy of the "time" dimension.
Roll up
Roll up is the opposite of the drill-down function-it aggregates data on an OLAP
cube by moving up in the concept hierarchy or by reducing the number of dimensions.
For example, you could move up in the concept hierarchy of the "location" dimension
by viewing each country's data, rather than each city.
Pivot
The pivot function rotates the current cube view to display a new representation of
the data-enabling dynamic multidimensional views of data. The OLAP pivot function
is comparable to the pivot table feature in spreadsheet software, such as Microsoft
Excel, but while pivot tables in Excel can be challenging, OLAP pivots are
relatively easier to use (less expertise is required) and have a faster response
time and query performance.
ROLAP
ROLAP stands for Relational OLAP Model, a relational OLAP application.
These are servers that sit between a relational back-end server and the user front-
end tools.
They save and manage warehouse data via a relational or extended-relational
database management system, and they employ OLAP middleware to fill in the gaps.
ROLAP models execute optimization for each DBMS back end, aggregate navigation
logic implementation, and other tools and services.
ROLAP systems mainly use data from relational databases, where the base data and
dimension tables are stored as relational tables. This model also allows for
multidimensional data analysis.
This method works by modifying the data in a relational database to simulate the
slicing and dicing functionality of standard OLAP. Each method of slicing and
dicing is essentially the same as adding a "WHERE" clause to a SQL statement.
Advantages
Can Handle Large Volumes of Information: The data size limitation of ROLAP
technology depends on the data size of the underlying RDBMS. So, ROLAP itself does
not restrict the data amount.
RDBMS already comes with a lot of features. So ROLAP technologies, (which works on
top of the RDBMS) can control these functionalities.
Disadvantages
Performance may be Slow: Each ROLAP report is a SQL query (or multiple SQL queries)
in a relational database, and the query time can be extended if the underlying data
is substantial.
SQL Functions Limit ROLAP Technology: SQL statements are used to query a relational
database, and SQL statements do not meet all needs
MOLAP
MOLAP stands for Multidimensional OLAP Model, an application based on
multidimensional DBMSs.
The foundation of a MOLAP model is a native logical model that allows
multidimensional data and operations directly. Data is physically stored in
multidimensional arrays and accessed using positional algorithms.
The scalability of ROLAP technology is generally higher than that of MOLAP
technology.
One of the key differences between MOLAP and ROLAP is that data is summarised and
stored efficiently in a multidimensional cube rather than a relational database.
Data is formatted into proprietary forms per the client's reporting requirements in
the MOLAP model, with computations pre-generated on the cubes.
Advantages
Excellent Performance:
MOLAP cubes are designed for quick data retrieval and are ideal for slicing and
dicing activities.
Can Conduct Sophisticated Calculations:
When the cube is constructed, all evaluations are pre-generated. As a result, not
only are complex calculations possible, but they also return rapidly.
Disadvantages
It Can Only Handle a Certain Amount of Data:
Because all calculations are done when the cube is produced, a vast quantity of
data cannot be stored in the cube itself.
Additional Investment is Required:
Cube technology is usually proprietary and not already in use within the company.
As a result, additional human and capital resources will likely be required to
implement MOLAP technology