KNIME L3 Study Material
Agenda
§ Session 1
§ KNIME Software Overview
§ Working with KNIME Server
§ Connect to KNIME Server
§ Server-Side Workflow Execution
§ Remote Workflow Editor
§ Permissions & Versioning
§ Session 2
§ Introduction to Components
§ Component Configuration
§ Composite Views
§ WebPortal Applications
§ Session 3
§ KNIME Server REST API
§ Integrated Deployment
§ KNIME Server Administration
Create (KNIME Analytics Platform): Blend & Transform → Model & Visualize
Productionize (KNIME Server): Deploy & Manage → Consume & Interact
→ Great Model or Report

IT stakeholders:
§ IT Operations: centralized resources / strategies; standards and preferred platforms used; infrastructure options; exit strategies
§ IT Security: data, applications
§ Financial / Risk Oversight: costs allocation
§ Compliance officer: data/model governance, traceability, GDPR
Create (KNIME Analytics Platform): Gather & Wrangle → Model & Visualize
Productionize (KNIME Server): Deploy & Manage → Consume & Optimize
Features:
§ Self-documenting
§ No limits: all nodes (DB, Spark, DL, Python, etc.)
§ Task packaging
§ Mix and match
§ Sharable / reusable / instantiated
Workflow → Component
https://www.knime.com/blog/knime-analytics-platform-40-components-are-for-sharing
§ Components can run as a web-based application or within a workflow (manual/automated)
https://www.knime.com/blog/knime-meets-knime-will-they-blend
© 2021 KNIME AG. All rights reserved. 12
Data Science Practice: Multiple Stakeholders’ Needs
§ Data Engineers
§ Data Science “coders” (Python, etc.)
§ Data Science specialists
§ Data Science generalists (visual workflow)
§ Smart business users (more than Excel)
§ Application users – interaction required
§ Application users – made to spec
§ Report consumers
Features:
§ Visualizations
§ Plotly, JavaScript, etc.
§ Report creation with BIRT
§ Integration with
§ Excel – functionality exploitation, not just CSVs
§ Power BI
§ Tableau
§ Qlik
§ Spotfire
§ …
https://www.knime.com/community/continental-nodes-for-knime-xls-formatter
Empower Business Users Appropriately
Features:
§ Workflows and WebPortal nodes build interactive applications & dashboards
§ KNIME WebPortal manages access
https://www.knime.com/blog/principles-of-guided-analytics
Features:
§ Scheduled
§ Triggered
§ Called (REST / SaaS)
§ Call actions based on status
§ Scale and pin execution
§ View, edit, and execute workflows remotely
https://docs.knime.com
Features:
§ Many Techniques available
§ LIME
§ SHAP
§ Shapley
§ Partial Dependence / ICE
§ Binary Classification Inspector
https://hub.knime.com/knime/extensions/org.knime.features.mli/latest
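The Shapley technique listed above attributes a prediction to individual features by treating them as players in a cooperative game. A minimal, self-contained sketch of the exact computation on a toy two-player game; all names and payoff values here are illustrative, not taken from the KNIME nodes:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v,
    given as a dict mapping frozensets of players to payoffs."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = frozenset(subset)
                # weight = |S|! * (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (v[s | {i}] - v[s])  # marginal contribution of i
        phi[i] = total
    return phi

# Toy "game": features 0 and 1 contribute 10 and 20 alone, 50 together,
# so their interaction bonus of 20 is split evenly between them.
v = {frozenset(): 0, frozenset({0}): 10, frozenset({1}): 20, frozenset({0, 1}): 50}
print(shapley_values([0, 1], v))  # {0: 20.0, 1: 30.0}
```

In practice the number of subsets grows exponentially, which is why libraries such as SHAP approximate these values rather than enumerating all coalitions as above.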
Archive – Document – Explore & Analyze
Features:
§ Client Customizations
§ Custom update sites
§ Manage preferences via profiles
§ Node repository & libraries
§ Monitor server activity
§ Running and scheduled jobs
§ Adjust permissions
§ Manage ongoing services
Features:
§ Single sign-on (SSO) to KNIME Server
§ Integrate with multiple identity providers
§ Flexible configuration capabilities to map users and groups
§ Manage all aspects of KNIME usage
(Setup: Client – KNIME Server – Identity Provider)
https://docs.knime.com
https://hub.knime.com/
KNIME Forum
Account Credentials
Set Up a New Mount Point
Components
Executing a Workflow on the Server – Remote Execution
Remote Workflow Editor – 1/3
§ Permissions can be set for all types of items: workflows, workflow groups,
components, and data files
§ Permissions are assigned to either individual people or user groups
§ The user who uploads an item automatically becomes its owner
§ Users with admin rights have no restrictions on permissions
§ The owner, plus everyone with admin rights, can assign and change
permissions for an item
§ It is also possible to set permissions on schedules, such that a schedule can be
maintained/changed by a team member while the owner is e.g. on vacation
Read
§ Workflow: download a workflow job, including data
§ Workflow group: see the content of the workflow group
§ File: download data and execute workflows that use the data
§ Component: use and download
Write
§ Workflow: overwrite, create snapshots, and delete
§ Workflow group: create and upload new items
§ File / Component: overwrite
Highlight differences:
§ Nodes included/excluded
§ Node configurations
Config Details – Access KNIME Server
*for double names and double surnames the whitespace has been removed
§ Configure a mount point for KNIME Server with the details provided in the
Config Details – Access KNIME Server slide at the end of the slide deck
§ Download Server Training Material in your LOCAL workspace (hint: drag
and drop or copy and paste the entire folder)
Metanodes vs. Components
§ WebPortal usage – Metanodes: executed in the background; Components: JavaScript views and Widget nodes inside the component are shown on a WebPortal page
§ Execution mode – Metanodes: normal execution; Components: also allow Simple Streaming execution
§ Recommended uses – Metanodes: workflow cleaning; Components: enabling custom interactions, producing interactive views, sharing functionalities
§ Flow Variables are, by default, only available locally inside the component
§ Configure the component input/output to pass Flow Variables
from/to outside the component
§ A layout can be defined for any Component that contains at least one widget
node or JavaScript-enabled view
§ The layout editor can be accessed from the top toolbar, when inside the
component
§ The “Append the IDs to node names” button on the top bar shows the ID of each
node
§ This is useful to reorder the items in the layout structure for the WebPortal
KNIME Guided Analytics & Automation:
§ Extends data science to the business analysts
§ Incorporates domain experts’ knowledge
§ Amplifies the best data science
Interaction Points
§ If a workflow is selected in the left section, its details page is shown in the
section on the right
Classic CRM Analytics Model
§ Use the Text Output Widget node to write the webpage description
Text for the WebPage (hint: use html as text format):
<h2>Define Cluster Parameters</h2>
<p>Set parameters to be taken into account in the following clustering.</p>
<p>Click 'Next' to start the clustering process.</p>
<p>If you do not know what a clustering process is, check <a href="https://en.wikipedia.org/wiki/Cluster_analysis">Cluster Analysis</a> and specifically the <a href="https://en.wikipedia.org/wiki/K-means_clustering">k-Means algorithm</a>.</p>
§ Encapsulate the 4 created nodes in a component and configure 2 outports: one
for the Integer Widget node and one for the Column Filter Widget node
§ Define the layout of the items in order to have them ordered as shown in the figure
(Figure: WebPortal page layout with the Text Output Widget and the Column Filter Widget)
§ Input data is passed with the request; output is returned in the server response
§ The workflow can also be executed by external tools such as Postman or Curl
for debugging purposes
§ KNIME Server as backend for third party analytical applications
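As a sketch of such an external call, the direct-execution endpoint of a workflow can be addressed with curl. Host, port, credentials, and workflow path below are placeholders to replace with your own server details:

```shell
#!/bin/sh
# Hypothetical server details -- replace with your own.
BASE="https://myserver:8443/knime/rest/v4/repository"
WF="/Examples/REST/Predict_Results_Using_REST_API"
URL="${BASE}${WF}:execution"

# Direct execution of the workflow; valid credentials are required on a real server:
# curl -u user:password -X GET "${URL}"

echo "${URL}"
```

The same URL can be pasted into Postman for interactive debugging of the request and response bodies.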
Issues:
• Development ≠ Deployment
• Needs copy/paste, rewrite
• Transport of models is non-trivial
Development (training script):

# read data
raw_target_data = read_xls_data()
# remove duplicates, handle missing values:
target_data = basic_data_cleanup_with_pandas(raw_target_data)
raw_feature_data = fetch_db_data_using_psycopg2()
# remove duplicates, handle missing values:
feature_data = basic_data_cleanup_with_pandas(raw_feature_data)

# basic feature engineering with sklearn
feature_scaler = sklearn.preprocessing.StandardScaler().fit(feature_data)
standardized_features = feature_scaler.transform(feature_data)
filtered_feature_data = variance_feature_filter_with_sklearn(standardized_features, target_data)

# build model with sklearn
training_feature_data, testing_feature_data, training_target_data, testing_target_data = \
    split_data_with_sklearn(filtered_feature_data, target_data)
RF_params = RF_hyperparameter_search_with_sklearn(training_feature_data, training_target_data)
trained_RF = build_RF_using_params(training_feature_data, training_target_data, RF_params)

# validate model
generate_validation_report(trained_RF, testing_feature_data, testing_target_data)

# save models
save_models_with_joblib((feature_scaler, trained_RF))

→ productionize →

Deployment (predictions, running as a Flask service):

# load saved components
feature_scaler, trained_RF = load_models_with_joblib()

# read and prepare data
raw_prediction_data = get_dataframe_from_request()
prediction_data = basic_data_cleanup_with_pandas(raw_prediction_data)
scaled_data = feature_scaler.transform(prediction_data)

# generate predictions
predictions = trained_RF.predict(scaled_data)
prediction_probs = trained_RF.predict_proba(scaled_data)
predictions = join_tables_with_pandas(predictions, prediction_probs)

# return results from the service:
return_dataframe_to_service(predictions)
KNIME Server Large architecture:
§ Clients send requests to the Tomcat web container
§ The web container manages the workflow repository
§ Requests are forwarded via a message queue to one or more Executors
Features:
§ KNIME Analytics Platform
§ KNIME Server Small & Medium
§ KNIME Server Large BYOL
§ Supports Server Large with multiple Executors
§ Has an embedded Executor, so it can be stand-alone
§ KNIME Executors
§ Multiple Executors that can be used by KNIME Server Large
§ Pay-as-you-go (PAYG) offering supports elastic scaling
§ Bring-your-own-license (BYOL) offering uses cores from your Server license
https://www.knime.com/knime-software-on-amazon-web-services
https://www.knime.com/knime-software-on-microsoft-azure
Features:
§ Supplement traditionally licensed (BYOL) Executors with a Pay-as-you-Go (PAYG) model
§ Meet periodic demand peaks
§ Fulfill the need for specialty hardware (e.g. GPUs)
§ Meet budgeting needs
Set Properties
Features:
§ Match workflow needs to Executor capabilities (CPU, RAM, GPU, …)
§ Partition compute resources by capability, department, usage, …
§ Workflow needs are determined by the workflow publisher
Features:
§ Logical groupings of Executors (e.g. Marketing, Finance, and Engineering groups)
§ Match users/groups to Executor groups
§ Partition compute resources by groups, department, …
§ Partitioning is managed by Server administrators
Features:
§ Easier IT operations
§ Manage Analytics Platform preferences centrally
§ Include dependencies, e.g. driver files
§ Deliver configuration updates automatically
§ Different departments/teams have different requirements
§ Example client-profiles on KNIME Server: Python-Linux, Python-macOS, R-Linux, Databases-Win7, Big Data-Win10, Executor
1. knime.ini
§ Add lines to the knime.ini file (found in the same directory as the KNIME Analytics Platform executable)
§ On startup, KNIME Analytics Platform queries KNIME Server for the specified preference profiles; preferences are applied before startup finishes
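As a sketch, the knime.ini lines point the client at the Server's profile endpoint and name the profiles to fetch. The host, port, and profile name below are placeholders, not values from this deck:

```ini
-profileLocation
https://myserver:8443/knime/rest/v4/profiles/contents
-profileList
Databases-Win7
```

Each argument and its value go on separate lines, matching the one-option-per-line convention of knime.ini.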
Path to the workflow: Examples à REST à Predict Results Using REST API
§ Right click menu à Show API Definition
§ Explore Execution Endpoint: GET Request
§ Try out and execute from browser
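The same GET request can also be scripted instead of run from the browser. A minimal stdlib-only sketch; the server details and credentials are placeholders to adapt:

```python
import base64
import json
from urllib.request import Request, urlopen  # stdlib HTTP client

# Hypothetical server details -- replace with your own.
BASE = "https://myserver:8443/knime/rest/v4/repository"
WORKFLOW = "/Examples/REST/Predict_Results_Using_REST_API"

def execution_url(base, workflow_path):
    """The ':execution' suffix addresses the workflow's direct-execution endpoint."""
    return base + workflow_path + ":execution"

def basic_auth_header(user, password):
    """HTTP Basic auth header built from account credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}

# Uncomment to call a live server:
# req = Request(execution_url(BASE, WORKFLOW), headers=basic_auth_header("user", "password"))
# with urlopen(req) as resp:
#     print(json.load(resp))  # JSON produced by the workflow's output nodes
```

The Show API Definition page lists the exact input/output parameters the endpoint accepts for a given workflow.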
§ Capture workflow parts using Capture Workflow Start & Capture Workflow End
nodes for
§ Data Preparation
§ Scoring
§ Combine captures using Workflow Combiner node
§ Write the combined deployment workflow relative to the modeling workflow
§ We will keep KNIME Server up and running for an additional week to let you
play around a little bit more with it
§ Interested in a trial license? Just send me an email at