Cheat Sheet: Building a KNIME Workflow for Beginners


Getting started with KNIME Analytics Platform

• Read through the installation guide at knime.com/installation
• Check out the 7 things you should do after installing KNIME Analytics Platform at www.knime.com/blog/seven-things-to-do-after-installing-knime
• Take the E-Learning Course at www.knime.com/knime-self-paced-courses

Understanding the traffic light system:

Not configured: Node is not yet configured and cannot be executed with its current settings.

Configured: Node has been correctly configured and may be executed at any time.

Executed: Node has been successfully executed and results can be viewed and used in downstream nodes.

Dynamic ports: Additional input ports can be added by clicking the three dots in the bottom left corner of a node (for example, the Concatenate node).

EXPLORE

(All visualizations are interactive)

Scatter Plot: Represents input data rows as points in a two-dimensional plot. Input dimensions (columns) on the x-y axis plot and graphical properties can be changed in the configuration window or interactively in the node view.

Sunburst Chart: Displays categorical columns through a hierarchy of rings. Each ring is sliced according to the nominal values in the corresponding column and to the selected hierarchy. This is a powerful chart for multivariate analysis.

Stacked Area Chart: Plots multiple numerical data columns on top of each other using the previous line as the base reference. The areas in between lines are colored for easier comparison. This chart is commonly used to visualize trending topics.

Line Plot: Plots numerical values in data columns (y-axis) against values in a reference column (x-axis). Data points are connected via colored lines. If the reference column on the x-axis contains sorted time values, the line plot graphically represents the evolution of a time series.

Color Manager: Assigns a color property to each input row based on the row’s value in a selected column. This color property affects the graphical representation in the upcoming views.

Pie Chart (Pie/Donut Chart node): Visualizes one aggregated metric for different data partitions with colored slices on a circle where the areas are proportional to the metric values. The partitions are defined by a categorical column.

Bar Chart: Visualizes one or more aggregated metrics for different data partitions with rectangular bars where the heights are proportional to the metric values. The partitions are defined by a categorical column.

Data Explorer: Provides an interactive view to summarize the statistics of the input data via statistical measures and histograms - for both numerical and nominal columns.

Box Plot: Visualizes numeric columns using the quartile statistics. Watch out for the points at the end of the whiskers - they might mark outliers!

ANALYZE

Decision Tree: The Learner node trains a C4.5 or a CART decision tree. The configuration window includes options for pruning, early stopping, information measures, splitting values, and more. Both the Learner and the Predictor node provide an interactive view where the decision tree is displayed together with the input data propagation.

k-Means: Implements the k-Means clustering algorithm. Number of clusters must be set prior to node execution. This node builds the clusters. The Cluster Assigner node finds the closest cluster and assigns it to the input data row. Being an unsupervised algorithm, this node pair doesn’t follow the classic Learner - Predictor scheme.

Logistic Regression: The Learner node trains a logistic regression model to predict categorical target values. The configuration window includes options for solver, input feature choice, regularization functions to avoid overfitting, & more.
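The split between building clusters (k-Means) and assigning rows to them (Cluster Assigner) can be sketched outside KNIME in plain Python. This is only an illustration of the general algorithm, not KNIME's implementation: the function names, the choice of the first k rows as starting centroids, and the sample points are all assumptions made for the example.

```python
import math

def assign_cluster(point, centroids):
    """Cluster Assigner step: index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, k, iterations=10):
    """Plain k-Means; k has to be chosen before running, as in the node."""
    # Assumption for this sketch: seed the centroids with the first k rows.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign_cluster(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

# Made-up training rows: two visibly separate groups.
training = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids = k_means(training, k=2)

# New rows are only assigned to the existing clusters; nothing is retrained.
new_rows = [(0.0, 0.0), (10.0, 10.0)]
labels = [assign_cluster(row, centroids) for row in new_rows]
```

Because no labels are involved, there is no "predictor" here in the supervised sense: the second step simply looks up the nearest centroid.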
Scorer: Calculates a number of performance measures such as accuracy, F1-score, or Cohen’s Kappa, to quantify the quality of a classifier.

Learner Nodes: Supervised algorithms in KNIME Analytics Platform have a Learner node to train a model on a previously labelled training set.

Predictor Nodes: Used for applying models. The two inputs are the trained model and the data to process. The output contains the original data and the model predictions.

Numeric Scorer: Calculates a number of numerical error measures, such as root mean squared error, mean absolute error, or R², to quantify the quality of a numerical predictor model.

ROC Curve: Displays the Receiver Operating Characteristic (ROC) curve of a classifier working on a binary class problem. One of the two classes is arbitrarily chosen as the positive class and the ROC curve is built on the probabilities/scores produced for that class on the input data set.

Integrations to many open source data analytics tools are also available. Some use the KNIME node GUI (H2O, Weka, Keras, Spark MLlib). Others offer nodes with a development environment for scripting and debugging (R, Python, Java).

READ

CSV Reader: Reads CSV files. It has an auto-detect function to automatically guess the file structure. As with other reader nodes, clicking the three dots in the lower left corner adds an input port for connecting to external data sources (illustrated with Amazon Authentication and Amazon S3 Connector nodes feeding a CSV Reader).

Model Reader: Reads machine learning models generated with any of the Learner nodes. Models are usually saved after training and reused in deployment.

Table Reader: Reads data from a .table file. .table files are organized using a KNIME proprietary format, including the full file structure, and are optimized for space and speed - providing maximum performance with minimum configuration!

Excel Reader: Reads content from sheets in Excel files (XLS, XLSX). Sheet and cells to be read can be defined in the configuration window.

Google Sheets Reader: Reads data from a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Table Creator: Allows users to manually create a data table in its configuration window as a data sheet. Data cells can be copied and pasted in the sheet. Perfect for generating small data sets.

In reader and writer nodes, the file path is expressed relative to a key location of the current KNIME installation, like workflow, workflow data area, and mountpoint.
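To get a feel for what "auto-detecting the file structure" of a CSV file involves, here is a minimal sketch using `csv.Sniffer` from Python's standard library. It is not how the CSV Reader node is implemented; the helper name `read_csv_autodetect`, the candidate delimiter list, and the sample text are assumptions for illustration.

```python
import csv
import io

def read_csv_autodetect(text):
    """Guess delimiter and header presence from a sample, then parse the text."""
    sample = text[:1024]
    sniffer = csv.Sniffer()
    # Restrict the guess to a few common delimiters (an assumption of this sketch).
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(text), dialect))
    header = rows[0] if has_header else None
    data = rows[1:] if has_header else rows
    return dialect.delimiter, header, data

# Made-up semicolon-separated content with one numeric column.
raw = "name;age;city\nAda;36;London\nLinus;28;Helsinki\nGrace;45;New York\n"
delimiter, header, data = read_csv_autodetect(raw)
```

Guessing of this kind is heuristic, so it is worth checking the detected settings (in KNIME, in the node's configuration window) before relying on them.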

TRANSFORM

GroupBy: Groups the rows of a table by the unique values in selected columns and calculates aggregation and statistical measures for the defined groups. Despite its simple name, it offers powerful functionality and has many unsuspected usages. For example - row deduplication.

Math Formula: Implements a number of math operations across multiple input columns, from simple sum and average, to logarithms and exponentials. All Math Formula operators are also available in the Column Expressions node.

Joiner: Joins rows from two data tables based on common values in one or more key columns. The most common join types are possible: inner join, left outer join, right outer join, and full outer join.

Pivoting: Extends the aggregation functionality of the GroupBy node by creating an output data table with columns and rows for the unique values in selected input columns. Note: the unique values of the grouping column become rows and the unique values of the pivoting column become columns.

String to Date&Time: Converts values in a String column into Date&Time values. The Date&Time format contained in the String values can be manually defined or auto guessed.

Sorter: Sorts the table in ascending or descending order based on the values of a chosen column. In addition, it is possible to sort based on multiple columns.

Rule Engine: Applies a set of rules to each row of the input data table. All Rule Engine operators are also available in the Column Expressions node.

Cell Splitter: Splits values in a selected column into two or more substrings, as defined by a delimiter match. Delimiter is a set character, such as a comma, space, or any other character or character sequence.

Concatenate: Merges two or more data tables vertically by piling up cells in columns with the same name. Cells in non-overlapping columns are filled with missing values.

Partitioning: Splits data into two subsets according to a sampling strategy. This node is generally used to produce a training and a test set to train and evaluate a machine learning model.

Column Filter: Filters columns in or out from the input data table according to a filtering rule. Columns to be retained can be manually picked, selected according to their type, or selected by a regex matching their name.

Missing Value: Defines a strategy to deal with missing values in the input data table - either globally on all columns, or individually for each single column.

Row Filter: Filters rows in or out from the input data table according to a filtering rule. The filtering rule can match a value in a selected column or numbers in a numerical range.

Column Rename: Assigns new names and types to selected columns, as configured in the dialog.

String Manipulation: Performs operations on String values in columns, such as combining two or more Strings together, extracting one or more substrings, trimming blank spaces, and so on. All operators are also available in the Column Expressions node.

DEPLOY

Data to Report: Marks the data table to be exported to BIRT - a partially open source reporting tool integrated within KNIME. When switching from KNIME to BIRT, the marked data sets are imported into BIRT. The Image To Report node marks the input images to be exported to BIRT.

Excel Writer (XLS): Writes the input data table to a sheet in an Excel file (XLS or XLSX).

Table Writer: Writes the input data table to a file using the .table KNIME proprietary format. This format includes the full file structure and is optimized for space and speed. Including the table structure in the file is a great advantage - especially when exchanging data files among users.

CSV Writer: Writes the input data table to a CSV file or to a remote location denoted by a URL.

Google Sheets Writer: Writes the input data table into a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Connectors to Tableau (Send to Tableau Server): Export the input data table into a Tableau file or server for reporting.

Resources

• KNIME Forum: Join our global community and engage in conversations at forum.knime.com
• KNIME Books: More tips, ideas, and lessons from knime.com/knimepress
• KNIME Events: Take a course, attend a workshop, or join a meetup at knime.com/learning/events
• KNIME Blog: Engaging topics, challenges, industry news, and knowledge at knime.com/blog
• KNIME Hub: Browse and share workflows, nodes, and components with the KNIME community. Add ratings or comments to other workflows at hub.knime.com
• More Guides: Still using SAS or Excel? Transition to KNIME Analytics Platform with these handy guides at knime.com/knimepress
• KNIME Server: For team-based collaboration, automation, management, and deployment, check out KNIME Server at knime.com/knime-server
• Beginners Space on KNIME Hub: Find the collection of example workflows using these cheat sheet nodes. tinyurl.com/KNIME-Beginner
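As a rough illustration of what the GroupBy and Pivoting nodes from the TRANSFORM section compute, the following plain-Python sketch aggregates a tiny table both ways. It is not KNIME code: the column names, the sample rows, and the helper functions `group_by_sum` and `pivot_sum` are assumptions invented for the example, and only a sum aggregation is shown.

```python
from collections import defaultdict

# Made-up input table: one dict per row.
rows = [
    {"region": "North", "product": "A", "sales": 10},
    {"region": "North", "product": "B", "sales": 5},
    {"region": "South", "product": "A", "sales": 7},
    {"region": "North", "product": "A", "sales": 3},
]

def group_by_sum(rows, group_col, value_col):
    """GroupBy-style aggregation: one output row per unique group value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)

def pivot_sum(rows, group_col, pivot_col, value_col):
    """Pivoting-style aggregation: group values become rows, pivot values become columns."""
    pivot_values = sorted({row[pivot_col] for row in rows})
    # None marks a missing cell (no input row for that group/pivot combination).
    table = defaultdict(lambda: dict.fromkeys(pivot_values))
    for row in rows:
        cells = table[row[group_col]]
        cells[row[pivot_col]] = (cells[row[pivot_col]] or 0) + row[value_col]
    return dict(table)

by_region = group_by_sum(rows, "region", "sales")
pivoted = pivot_sum(rows, "region", "product", "sales")
```

In the pivoted result, the unique `region` values label the rows and the unique `product` values label the columns, which mirrors the note in the Pivoting entry.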
KNIME Press
Extend your KNIME knowledge with our collection of books from KNIME Press. For beginner and advanced users, through to those interested in specialty topics such as topic detection, data blending, and classic
solutions to common use cases using KNIME Analytics Platform - there’s something for everyone. Available for download at www.knime.com/knimepress.

Featured titles:

• KNIME Beginner's Luck: A Guide to KNIME Analytics Platform for Beginners. Authors: Satoru Hayasaka and Rosaria Silipo
• KNIME for Life Sciences: A Collection of Use Cases
• Data Blending with KNIME (Second Edition). Rosaria Silipo & Lada Rudnitckaia

© 2022 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
