My BA Assignment
RONY K J
Re-Develop Products
Big Data is one of the best ways to collect and use feedback. It helps you understand how
customers perceive your services and products, so you can make the necessary changes and
redevelop your products. Analyzing unstructured social media text allows you to uncover general
feedback from your customers, and you can even break that feedback down by geographical
location and demographic group.
In addition, Big Data allows you to test numerous variations of high-end computer-aided
designs within seconds. For instance, you can gather information about lead times, material
costs, performance and more, which allows you to raise the productivity and efficiency of
various production processes.
Get more data
Data that was previously floating around and perhaps regarded as useless can now be gathered
and used by businesses to improve their productivity.
A vast amount of data is being produced on online platforms such as social media. Big Data
technologies can manage this data efficiently and profitably, and processing it can help
resolve crises by improving response times.
Decision making
With both structured and unstructured data, managers and executives of large and small business
enterprises can gain important insights that enable them to make timely decisions.
Better analytics
Powerful data processing platforms such as Apache Kafka have made the analysis of vital data
nearly instantaneous. A business can gain immediate insight into customer behavioral trends and
react promptly to avoid losses.
Learning
Most learning materials that once existed only in hard copy are now available as soft copies on
various online platforms. Teachers can use applications such as Bubble Score to administer
multiple-choice tests in a more transparent process that relies on mobile device cameras. Such
tools, combined with Big Data, make it possible for teachers to rank books and track student
development from the results they produce. A data-driven classroom makes it easier to determine
what students are learning and which levels of study suit particular subjects. E-learning
digital materials are produced using predictive analytics, which enables producers to match a
lesson plan to particular students' learning needs.
Healthcare
Technological innovations such as Big Data have greatly influenced operations in the
healthcare industry. Analysis of big data has boosted the overall efficiency of healthcare
service delivery. Physicians and patients can now interact and analyze the progress of a
patient's treatment by reviewing the patient's history.
Public Sector
Several departments in the public sector, such as security agencies and municipalities, use Big
Data analytics tools to process vital data. Fraud detection, financial and energy
investigations, environmental research and ecological protection are just some of the areas
where the public sector has used Big Data to restructure operations for efficiency.
Data visualization is the process of displaying data or information in graphical form, such as
charts, figures and bar graphs.
It is used as a means of delivering visual reporting to users on the performance, operations or
general statistics of an application, network, hardware or virtually any IT asset.
Data visualization has become the de facto standard for modern business intelligence (BI). The
success of the two leading vendors in the BI space, Tableau and Qlik -- both of which heavily
emphasize visualization -- has moved other vendors toward a more visual approach in their
software. Virtually all BI software has strong data visualization functionality.
Data visualization tools have been important in democratizing data and analytics and making
data-driven insights available to workers throughout an organization. They are typically easier to
operate than traditional statistical analysis software or earlier versions of BI software. This has
led to a rise in lines of business implementing data visualization tools on their own, without
support from IT.
Data visualization software also plays an important role in big data and advanced analytics
projects. As businesses accumulated massive troves of data during the early years of the big data
trend, they needed a way to quickly and easily get an overview of their data. Visualization tools
were a natural fit.
Visualization is central to advanced analytics for similar reasons. When a data scientist is writing
advanced predictive analytics or machine learning algorithms, it becomes important to visualize
the outputs to monitor results and ensure that models are performing as intended. This is
because visualizations of complex algorithms are generally easier to interpret than numerical
outputs.
Examples of data visualization
Data visualization tools can be used in a variety of ways. The most common use today is as a BI
reporting tool. Users can set up visualization tools to generate automatic dashboards that track
company performance across key performance indicators and visually interpret the results.
Many business departments implement data visualization software to track their own initiatives.
For example, a marketing team might implement the software to monitor the performance of an
email campaign, tracking metrics like open rate, click-through rate and conversion rate.
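As a sketch of the arithmetic behind these campaign metrics, each one is a simple ratio of counts; the figures below are invented purely for illustration:

```python
# Hypothetical email-campaign counts (illustrative only).
sent = 10000
opened = 2400
clicked = 360
converted = 90

open_rate = opened / sent              # fraction of recipients who opened
click_through_rate = clicked / opened  # fraction of openers who clicked
conversion_rate = converted / clicked  # fraction of clickers who converted

print(f"open rate: {open_rate:.1%}")                    # 24.0%
print(f"click-through rate: {click_through_rate:.1%}")  # 15.0%
print(f"conversion rate: {conversion_rate:.1%}")        # 25.0%
```

A dashboard tool would compute exactly these ratios continuously and plot them over time.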
As data visualization vendors extend the functionality of these tools, they are increasingly being
used as front ends for more sophisticated big data environments. In this setting, data
visualization software helps data engineers and scientists keep track of data sources and do basic
exploratory analysis of data sets prior to or after more detailed advanced analyses.
Supply chain analytics is the application of mathematics, statistics, predictive modeling and
machine-learning techniques to find meaningful patterns and knowledge in order, shipment and
transactional and sensor data. An important goal of supply chain analytics is to improve
forecasting and efficiency and be more responsive to customer needs. For example, predictive
analytics on point-of-sale terminal data stored in a demand signal repository can help a business
anticipate consumer demand, which in turn can lead to cost-saving adjustments to inventory and
faster delivery.
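A demand-forecasting step of this kind can be as simple as a moving average over recent point-of-sale history; the weekly sales figures below are made up for illustration:

```python
# Made-up weekly unit sales from a point-of-sale feed.
weekly_sales = [120, 135, 128, 150, 142, 160]

def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

forecast = moving_average_forecast(weekly_sales)
print(forecast)  # (150 + 142 + 160) / 3, roughly 150.7 units
```

Real supply chain analytics would use far richer models, but the principle of turning transaction history into a forward estimate is the same.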
Supply chain analytics software is generally available in two forms: embedded in supply chain
software, or in a separate, dedicated business intelligence and analytics tool that has access to
supply chain data. Most ERP vendors offer supply chain analytics features, as do vendors of
specialized supply chain management software.
Some ERP and SCM vendors have begun applying complex event processing (CEP) to their
platforms for real-time supply chain analytics. Most ERP and SCM vendors have one-to-one
integrations but there is no standard. However, the Supply Chain Operations Reference (SCOR)
model provides standard metrics for comparing supply chain performance to industry
benchmarks.
Ideally, supply chain analytics software would be applied to the entire chain, but in practice, it is
often focused on key operational subcomponents, such as demand planning, manufacturing
production, inventory management or transportation management. For example, supply chain
finance analytics can help identify increased capital costs or opportunities to boost working
capital and procure-to-pay analytics can help identify the best suppliers and provide early
warning of budget overruns in certain expense categories, and transportation analytics software
can predict the impact of weather on shipments.
4. Correlation
When two sets of data are strongly linked together, we say they have a high correlation. The
word correlation is made of co- (meaning "together") and relation. Correlation is positive
when the values increase together, and negative when one value decreases as the other
increases.
Correlation is a statistical measure that indicates the extent to which two or more variables
fluctuate together. A positive correlation indicates the extent to which those variables increase
or decrease in parallel; a negative correlation indicates the extent to which one variable increases
as the other decreases.
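This degree of association is usually quantified with Pearson's correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch, using made-up height/weight and price/demand figures:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

heights = [150, 160, 165, 172, 180]  # cm (illustrative values)
weights = [52, 60, 63, 70, 79]       # kg
print(pearson(heights, weights))     # close to +1: strong positive correlation

prices = [10, 12, 14, 16, 18]
demand = [95, 80, 70, 55, 40]
print(pearson(prices, demand))       # close to -1: strong negative correlation
```

A coefficient near 0 would indicate the "zero correlation" case discussed below.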
There are essentially two types of data you can work with when determining correlation:
Univariate data: In a simple setup, we work with a single variable. We measure central tendency
to find a representative value, dispersion to measure the deviations around the central
tendency, skewness to measure the shape and size of the distribution, and kurtosis to measure
the concentration of the data at the central position. Such data, relating to a single
variable, is called univariate data.
Bivariate data: It often becomes essential in our analysis to study two variables
simultaneously, for example (a) the height and weight of a person, or (b) age and blood
pressure. Statistical data on two characteristics of an individual, measured simultaneously,
is termed bivariate data.
In fact, there may or may not be any association between these bivariate data. Here the word
‘correlation’ means the extent of association between any pair of bivariate data, recorded on any
individual.
Correlation between bivariate data can take four forms:
1. Positive correlation
2. Negative correlation
3. Zero correlation
4. Spurious correlation
Positive correlation: If an increase in one of the two variables is accompanied by an increase
in the other, we say the two are positively correlated.
For example, height and weight of a male or female are positively correlated.
Negative correlation: If an increase in one of the two variables is accompanied by a decrease
in the other, we say the two are negatively correlated.
For example, the price and demand of a commodity are negatively correlated. When the price
increases, the demand generally goes down.
Zero correlation: If there is no clear-cut trend between the two variables, i.e. a change in
one does not guarantee a co-directional change in the other, the two are said to be
non-correlated, or to possess zero correlation.
For example, qualities like affection and kindness are in most cases non-correlated with
academic achievement, just as a person's intellect is non-correlated with complexion.
Spurious correlation: If the correlation is due to the influence of some other, 'third'
variable, the data are said to be spuriously correlated.
For example, "body control problems" and clumsiness in children have been reported as being
associated with adult obesity. One could argue that uncontrolled, clumsy kids participate less
in sports and outdoor activities, and that participation is the 'third' variable here. Most of
the time it is difficult to identify the 'third' variable, and even when that is achieved, it
is harder still to gauge the extent of its influence on the two primary variables.
5. Decision-Making Environments
Decisions are taken in different types of environment, and the type of environment also
influences the way the decision is made.
1. Certainty:
In this type of decision-making environment, only one outcome can occur. Complete certainty is
rare in business decisions, although in many routine decisions it can almost be assumed. Such
decisions generally have very little significance for the success of the business.
2. Uncertainty:
In an environment of uncertainty, more than one event can take place, and the decision maker
is completely in the dark about which event is likely to occur. The decision maker is not even
in a position to assign probabilities to the events.
Such situations generally arise when the outcome is determined by external factors. For
example, demand for the product, competitors' moves and so on are factors that involve
uncertainty.
3. Risk:
Under conditions of risk, more than one possible event can take place. However, the decision
maker has adequate information to assign a probability to the happening or non-happening of
each possible event. Such information is generally based on past experience.
Modern information systems help in applying these techniques to decision making under
conditions of uncertainty and risk.
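Decision making under risk can be illustrated with a short expected-value calculation; the alternatives, payoffs and probabilities below are entirely hypothetical:

```python
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs whose probabilities sum to 1."""
    return sum(p * payoff for p, payoff in outcomes)

# Hypothetical payoffs for launching vs. not launching a product
# under high, medium and low demand.
launch = [(0.3, 100_000), (0.5, 40_000), (0.2, -30_000)]
dont_launch = [(1.0, 0)]

best = max([("launch", launch), ("don't launch", dont_launch)],
           key=lambda alt: expected_value(alt[1]))
print(best[0], expected_value(best[1]))  # picks "launch" (expected payoff around 44,000)
```

Under pure uncertainty the probabilities would be unknown, and criteria such as maximin would be used instead; the expected-value rule only applies when probabilities can be assigned.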
6. Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. Cluster analysis is also called classification analysis or
numerical taxonomy. In cluster analysis, there is no prior information about the group or
cluster membership of any of the objects.
Cluster analysis has been used in marketing for various purposes. For example, consumers can
be segmented on the basis of the benefits they seek from purchasing a product, and the
technique can be used to identify homogeneous groups of buyers.
Cluster analysis involves formulating a problem, selecting a distance measure, selecting a
clustering procedure, deciding the number of clusters, interpreting the profile clusters and finally,
assessing the validity of clustering.
The variables on which the cluster analysis is to be performed should be selected with past
research in mind, as well as theory, the hypotheses being tested and the judgment of the
researcher. An appropriate measure of distance or similarity should then be chosen; the most
commonly used measure is the Euclidean distance or its square.
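As a quick sketch, the Euclidean distance and its square can be computed directly; the two points are arbitrary examples:

```python
def euclidean(a, b):
    """Euclidean (straight-line) distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

p = (1.0, 2.0)
q = (4.0, 6.0)
print(euclidean(p, q))       # 5.0 (a 3-4-5 right triangle)
print(euclidean(p, q) ** 2)  # 25.0 (the squared Euclidean distance)
```

The squared form is often preferred in clustering because it avoids the square root while preserving the ordering of distances.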
The non-hierarchical methods in cluster analysis are frequently referred to as K-means clustering.
The two-step procedure can automatically determine the optimal number of clusters by
comparing the values of model choice criteria across different clustering solutions. The choice of
clustering procedure and the choice of distance measure are interrelated. The relative sizes of
clusters in cluster analysis should be meaningful. The clusters should be interpreted in terms of
cluster centroids.
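To make the K-means idea concrete, here is a minimal sketch in pure Python; a real analysis would normally use a dedicated statistics package, and the data points and the simple first-k initialization are illustrative assumptions:

```python
def kmeans(points, k, iterations=20):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    # Deterministic initialization for reproducibility: the first k points.
    # (Real K-means implementations typically use randomized initialization.)
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids, clusters

# Two visually obvious groups of three points each.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]: the two groups are recovered
```

The interrelation between procedure and distance measure mentioned above shows up directly here: K-means implicitly assumes squared Euclidean distance in both steps.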
Clustering is a fundamental modelling technique centered on grouping. The steps involved in
clustering are common to all techniques.
1. Choose the right variables – identify which attributes are relevant and how much each is
worth. Select variables that may be important for identifying and understanding differences
among groups of observations within the data.
2. Scale the data – samples from different sources may be measured on different scales. For
example, in personal data, age runs from 0 to 100, weight from 40 to 180 kg and height from 1
to 6 feet. When the variables in the analysis vary in range like this, the variable with the
largest range will have the greatest impact on the results, so the data must be rescaled first.
3. Calculate distances – once the variables are on comparable scales, compute the distances
between observations; points that lie close together will be grouped into the same cluster.
A point to note is that each of the attributes may be on a different scale. If we try to come
up with a distance equation, normalization must be considered so that all attributes and
variables are brought onto a common footing. For example, if we analyze weather samples from
India and the US, the scales differ because one country uses the metric system and the other
the US system; our objective is to bring them to the same standard. The basic purpose of
cluster analysis is then to calculate distances.
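One common normalization is min-max scaling, which maps each variable onto the 0-1 range so that no single variable dominates the distance calculation; the age and weight values below are illustrative:

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative personal data on very different scales.
ages = [25, 40, 60, 35]        # years, roughly 0-100 range
weights = [60, 95, 80, 150]    # kg, roughly 40-180 range

print(min_max_scale(ages))     # smallest value maps to 0.0, largest to 1.0
print(min_max_scale(weights))  # now directly comparable with the scaled ages
```

Standardization (subtracting the mean and dividing by the standard deviation) is an equally common alternative when the data contain outliers that would compress a min-max scale.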
Here, one objective can be to group similar points together into one cluster.
1) One way is to take the center of one cluster and the center of the next group, and
calculate the distance between the centers.
2) Alternatively, take the closest points of the two clusters and find the distance between
them.
3) Or take the most distant points and find the distance between them.
Single linkage – the shortest distance between a point in one cluster and a point in the other
cluster; it tends to produce elongated clusters.
Complete linkage – the longest distance between a point in one cluster and a point in the
other cluster.
Average linkage – the average distance between each point in one cluster and each point in the
other cluster.
Centroid – the distance between the centroids (the mean vectors over the variables) of the two
clusters.
Ward – combines the pair of clusters that gives the smallest within-cluster sum of squares
over all variables.
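As a sketch, the first four distance rules can be computed directly for two small example clusters; the points are invented for illustration:

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Two small illustrative clusters on a line for easy checking.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (6, 0)]

pairs = [(p, q) for p in cluster_a for q in cluster_b]

single = min(dist(p, q) for p, q in pairs)                # shortest pairwise distance: 3.0
complete = max(dist(p, q) for p, q in pairs)              # longest pairwise distance: 6.0
average = sum(dist(p, q) for p, q in pairs) / len(pairs)  # mean of all pairs: 4.5

def centroid(cluster):
    """Component-wise mean of a cluster's points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

centroid_dist = dist(centroid(cluster_a), centroid(cluster_b))  # (0.5,0) to (5,0): 4.5

print(single, complete, average, centroid_dist)  # 3.0 6.0 4.5 4.5
```

Ward's method is omitted here because it operates on within-cluster sums of squares after a merge rather than on a single pairwise distance.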