Weka 9
Business intelligence (BI) mainly refers to computer-based techniques used in extracting, identifying, and analyzing business data, such as sales revenue by products and/or departments, or by associated costs and incomes.
BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics. Business intelligence aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS). Though the term business intelligence is sometimes used as a synonym for competitive intelligence, because they both support decision making, BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes and disseminates information with a topical focus on company competitors. Business intelligence understood broadly can include the subset of competitive intelligence.
History
In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence. He defined intelligence as: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal." In 1989, Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems."[2] It was not until the late 1990s that this usage became widespread.
1. "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making."
2. When using this definition, business intelligence also includes technologies such as data integration, data quality, data warehousing, master data management, text and content analytics, and many others that the market sometimes lumps into the Information Management segment. Therefore, Forrester refers to data preparation and data usage as two separate, but closely linked segments of the business intelligence architectural stack.
Future
A 2009 Gartner paper predicted[24] these developments in the business intelligence market:
Because of lack of information, processes, and tools, through 2012, more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.
By 2012, business units will control at least 40 percent of the total budget for business intelligence.
By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.
A 2009 Information Management special report predicted the top BI trends: "green computing, social networking, data visualization, mobile BI, predictive analytics, composite applications, cloud computing and multitouch."[25]
Other business intelligence trends include the following:[26]
Third party SOA-BI products increasingly address ETL issues of volume and throughput.
Cloud computing and Software-as-a-Service (SaaS) are ubiquitous.
Companies embrace in-memory processing, 64-bit processing, and pre-packaged analytic BI applications.
Operational applications have callable BI components, with improvements in response time, scaling, and concurrency.
WEKA
WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. WEKA is free software available under the GNU General Public License.
WEKA GUI
The WEKA Knowledge Explorer is a graphical user interface that allows users to do data mining.
Data Preprocessing Using WEKA
Even though WEKA allows users to load .csv data files, it mainly uses the .arff (Attribute-Relation File Format) file format. WEKA therefore provides functionality to convert .csv files to .arff. There are two ways to convert:
1. When a .csv file is loaded, the WEKA Explorer prompts the user to convert it.
2. Users can run the converter from the WEKA CLI.
Step 1: Using the WEKA CLI to convert .csv -> .arff
java weka.core.converters.CSVLoader bank-data.csv > bank-data.arff
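To make the conversion less of a black box, here is a minimal Python sketch of what a .csv to .arff converter has to do: name the relation, declare each attribute as numeric or nominal, and copy the rows under @data. This is not WEKA's CSVLoader and makes no claim about how WEKA infers types; the function names csv_to_arff and is_number are hypothetical, and the sketch assumes a header row, no missing values, and no quoted fields.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = header) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(value) for value in column):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: list distinct values in order of first appearance.
            values = list(dict.fromkeys(column))
            lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Two rows taken from the bank-data sample, restricted to four columns.
sample_csv = "age,sex,income,pep\n48,FEMALE,17546,YES\n40,MALE,30085.1,NO\n"
arff_text = csv_to_arff(sample_csv, "bank-data")
print(arff_text)
```

For real data files the WEKA converter is the better choice, since it handles cases this sketch ignores.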
ARFF: a text file that describes a list of instances sharing a set of attributes. It has three main parts: @relation, @attribute, and @data. Sample from the bank-data.arff data file that I will be using in this demo:
@relation bank-data

@attribute id {ID12101,ID12102,ID12103,ID12104,ID12105,...ID12700}
@attribute age numeric
@attribute sex {FEMALE,MALE}
@attribute region {INNER_CITY,TOWN,RURAL,SUBURBAN}
@attribute income numeric
@attribute married {NO,YES}
@attribute children numeric
@attribute car {NO,YES}
@attribute save_act {NO,YES}
@attribute current_act {NO,YES}
@attribute mortgage {NO,YES}
@attribute pep {YES,NO}

@data
ID12101,48,FEMALE,INNER_CITY,17546,NO,1,NO,NO,NO,NO,YES
ID12102,40,MALE,TOWN,30085.1,YES,3,YES,NO,YES,YES,NO
ID12103,51,FEMALE,INNER_CITY,16575.4,YES,0,YES,YES,YES,NO,NO
ID12104,23,FEMALE,TOWN,20375.4,YES,3,NO,NO,YES,NO,NO
ID12105,57,FEMALE,RURAL,50576.3,YES,0,NO,YES,NO,NO,NO

Step 2: Loading bank-data.arff into WEKA.
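What loading an .arff file involves can be sketched in a few lines of Python: read the @relation name, collect the @attribute declarations, and treat everything after @data as comma-separated instances. This is only an illustration of the three-part structure, not WEKA's loader; parse_arff is a hypothetical helper, and it assumes dense rows with no quoted values and no spaces inside attribute names.

```python
def parse_arff(text):
    """Split ARFF text into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comment lines
        if in_data:
            rows.append(line.split(","))
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# A cut-down version of the bank-data sample with three attributes.
sample = """@relation bank-data
@attribute age numeric
@attribute sex {FEMALE,MALE}
@attribute pep {YES,NO}
@data
48,FEMALE,YES
40,MALE,NO
"""
relation, attributes, rows = parse_arff(sample)
print(relation, attributes, rows)
```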
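Step 3: Filtering attributes
Unique attributes, such as the id attribute in bank-data.arff, need to be removed before data mining can start, because a value that differs for every instance carries no pattern to learn.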
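The idea behind this filtering step can be sketched in Python as follows. The helper drop_unique_columns is hypothetical and is not a WEKA filter; it simply drops every column in which no value repeats. Note the assumption: this heuristic would also flag a continuous numeric column whose values happen to be all distinct, so it is meant for nominal or string columns such as id.

```python
def drop_unique_columns(header, rows):
    """Remove columns whose values are all distinct (identifier-like)."""
    keep = []
    for index in range(len(header)):
        column = [row[index] for row in rows]
        # Keep a column only if at least one value repeats.
        if len(set(column)) < len(column):
            keep.append(index)
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Three instances from the bank-data sample, restricted to three columns.
header = ["id", "sex", "married"]
rows = [
    ["ID12101", "FEMALE", "NO"],
    ["ID12102", "MALE", "YES"],
    ["ID12103", "FEMALE", "YES"],
]
new_header, new_rows = drop_unique_columns(header, rows)
print(new_header)
print(new_rows)
```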
Step 4: Discretization
This step involves categorizing the data. According to the WEKA documentation, this step is crucial because techniques such as "association rule mining" can only be applied to categorized data.
Discretization must be performed on numeric or continuous attributes. Example: discretization performed on the "age" and "income" attributes.
Research on Stock Market Data.
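As an illustration of what discretizing a numeric attribute means, the Python sketch below applies equal-width binning, one common unsupervised approach, to the five age values from the bank-data sample. The function equal_width_bins and the choice of three bins are assumptions made for this example; they are not taken from the WEKA settings used in the demo.

```python
def equal_width_bins(values, bins):
    """Assign each numeric value to one of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything falls in one bin
        else:
            # The maximum value would land just past the last bin; clamp it.
            index = min(int((value - low) / width), bins - 1)
        labels.append(index)
    return labels


ages = [48, 40, 51, 23, 57]  # age column of the five sample instances
age_bins = equal_width_bins(ages, 3)
print(age_bins)  # [2, 1, 2, 0, 2]
```

Each age is replaced by a bin index (0 = youngest third of the range, 2 = oldest third), which turns the numeric attribute into a categorical one.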
Visualization
Every selected attribute is plotted against every other selected attribute; the result is shown as a set of scatter plots.
Scatter plots: A scatter plot reveals the relationship or association between two variables.
How a scatter plot helps: it helps answer questions about the variables X and Y involved, such as:
Are X and Y related?
Are X and Y positively correlated?
Are X and Y negatively correlated?
Does the variation in Y depend on the variation in X?
Are there outliers?
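The correlation questions above can also be checked numerically. The Python sketch below computes the Pearson correlation coefficient, a standard measure that is positive when Y tends to rise with X, negative when Y tends to fall, and near zero when there is no linear relationship. The function name pearson is hypothetical, and the sketch assumes both sequences have the same length and that neither is constant. It complements the scatter plot rather than replacing it, since outliers are easier to spot visually.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# age and income columns of the five sample instances
ages = [48, 40, 51, 23, 57]
incomes = [17546, 30085.1, 16575.4, 20375.4, 50576.3]
r = pearson(ages, incomes)
print(round(r, 3))
```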