DWM1 Riya
ACTIVITY REPORT
SUBJECT: DATA WAREHOUSING AND MINING
ACTIVITY 1
“DATA MINING TOOL: WEKA”
The foundation of any machine learning application is data: not just a little data, but a
very large amount of it, commonly referred to as Big Data.
Before a machine can be trained to analyze big data, the data must satisfy several
requirements:
• The data must be clean.
• It should not contain null values.
Besides, not all the columns in the data table are useful for the type of analytics
that you are trying to perform. Irrelevant data columns, or ‘features’ in machine
learning terminology, must be removed before the data is fed into a machine
learning algorithm.
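The two requirements above, together with the removal of irrelevant columns, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Weka implements it; the records, the column names, and the helper functions `drop_incomplete` and `drop_columns` are all hypothetical.

```python
# Hypothetical toy records; the column names are illustrative only.
rows = [
    {"id": 1, "glucose": 148, "age": 50, "label": "positive"},
    {"id": 2, "glucose": None, "age": 31, "label": "negative"},
    {"id": 3, "glucose": 183, "age": 32, "label": "positive"},
]


def drop_incomplete(records):
    """Keep only the records in which no value is None (no null values)."""
    return [r for r in records if all(v is not None for v in r.values())]


def drop_columns(records, irrelevant):
    """Remove the named columns (features) from every record."""
    return [{k: v for k, v in r.items() if k not in irrelevant} for r in records]


# Assumption: the "id" column carries no predictive information.
clean = drop_columns(drop_incomplete(rows), irrelevant={"id"})
print(clean)
```

Here the second record is dropped because its glucose value is missing, and the `id` column is removed from the remaining records.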
Features of Weka
1. Preprocess
Preprocessing is a crucial task in data mining. Because most data is raw, it may
contain empty, duplicate, or garbage values, outliers, extra columns, or inconsistent
naming conventions. All of these degrade the results.
2. Classify
3. Cluster
4. Associate
Association rules highlight the associations and correlations between items in a
dataset. In short, an association rule is an if-then statement that expresses how likely
one item is to occur together with another. A classic example is the connection between
the sale of milk and the sale of bread.
5. Select Attributes
Every dataset contains many attributes, but several of them may not be significantly
valuable. Removing the unnecessary attributes and keeping the relevant ones is therefore
very important for building a good model.
6. Visualize
In the visualize tab, different plot matrices and graphs are available to show the trends
and errors identified by the model.
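The milk-and-bread rule mentioned under “Associate” can be made concrete with a small sketch. It computes the two usual measures for a rule “if milk then bread”: support (how often both items occur together) and confidence (how often bread occurs given that milk does). The baskets are made up for illustration, and the helper names are hypothetical, not part of Weka.

```python
# Hypothetical shopping baskets, made up for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"milk", "bread", "eggs"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the whole itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from basket counts."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"milk", "bread"}, transactions)
rule_confidence = confidence({"milk"}, {"bread"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, milk and bread appear together in 3 of 5 baskets, and bread appears in 3 of the 4 baskets that contain milk.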
How to Install Weka and Load a Dataset
2) After the download completes, open the file location and double-click the downloaded
file. The Setup wizard will appear. Click on Next.
3) The License Agreement terms will open. Read them thoroughly and click on “I Agree”.
8) After the installation is complete, the following window will appear. Click on Next.
9) Click on Finish.
16) Then click “Open file” and choose Local Disk (C:). Go to Program Files, choose
Weka-3-8-6, and open the “data” folder. Several default datasets are available there.
Now let us select the “Diabetes” dataset.
21) After preprocessing, click the “Classify” tab. This is the area for running
algorithms against a loaded dataset in Weka. Note that the “ZeroR” algorithm is
selected by default. Click the “Start” button to run this algorithm.
ZeroR is the simplest classification method: it relies only on the target and
ignores all predictors, always predicting the majority class.
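The behaviour of ZeroR can be sketched in plain Python as follows. This is a minimal illustration of the idea for a nominal class, not Weka's own implementation; the class below, its method names, and the training values are hypothetical.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        # The predictors in X are deliberately ignored; only the target is used.
        self.prediction_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Every instance receives the same (majority) label.
        return [self.prediction_ for _ in X]


# Hypothetical training data; the feature values are illustrative only.
X_train = [[148, 50], [85, 31], [183, 32], [89, 21], [116, 30]]
y_train = [
    "tested_positive",
    "tested_negative",
    "tested_positive",
    "tested_negative",
    "tested_negative",
]

model = ZeroR().fit(X_train, y_train)
predictions = model.predict([[120, 40], [95, 25]])
print(predictions)
```

Because it uses no predictors, ZeroR is mainly useful as a baseline: any other classifier should achieve a higher accuracy than it does.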
25) Now choose the “Logistic” classifier and click on Start.
27) Now we can visualize the threshold curve, here for the “tested_negative” class.
29) Now you can see the recall and precision areas.
30) Now choose the “J48” classifier and click on Start. We can then visualize the
resulting tree.
31) Now go to the “Cluster” tab, choose “SimpleKMeans” clustering, and click on Start.
35) Now go to “Associate”, choose “FilteredAssociator”, and click on Start.
38) Now go to “Select attributes”, choose “CfsSubsetEval” as the attribute evaluator
and “BestFirst” as the search method, then click on Start.
40) You can also use the “Visualize” tab, where each plot in the plot matrix can be
viewed individually.
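Step 29 refers to precision and recall for a class. As a rough sketch of what these two numbers mean, the snippet below computes them from a list of actual and predicted labels. The labels and predictions are made up for illustration and are not output from Weka; the function name is hypothetical.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and predictions, made up for illustration.
actual = ["tested_negative", "tested_negative", "tested_positive",
          "tested_negative", "tested_positive", "tested_negative"]
predicted = ["tested_negative", "tested_positive", "tested_positive",
             "tested_negative", "tested_negative", "tested_negative"]

precision, recall = precision_recall(actual, predicted, positive="tested_negative")
print(precision, recall)
```

Precision is the share of instances predicted as the class that really belong to it; recall is the share of instances of the class that the model found.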
Conclusion
In conclusion, we have learned about the Weka tool, an open-source software package
designed for data mining and machine learning. We saw how to install Weka and
explored its various components. Weka is a powerful tool for developing machine
learning models. It provides implementations of several of the most widely used ML
algorithms, and it also allows you to preprocess the data before these algorithms are
applied to your dataset. We learned how datasets are used in Weka for preprocessing,
classification, clustering, etc., and about the algorithms that Weka provides.