Welcome to the Iguazio Data Science Platform

An initial introduction to the Iguazio Data Science Platform and the platform tutorials

Platform Overview

The Iguazio Data Science Platform ("the platform") is a fully integrated and secure data science platform as a service (PaaS), which simplifies development, accelerates performance, facilitates collaboration, and addresses operational challenges. The platform incorporates the following components:

A data science workbench that includes Jupyter Notebook, integrated analytics engines, and Python packages
Model management with experiments tracking and automated pipeline capabilities
Managed data and machine-learning (ML) services over a scalable Kubernetes cluster
A real-time serverless functions framework — Nuclio
An extremely fast and secure data layer that supports SQL, NoSQL, time-series databases, files (simple objects), and streaming
Integration with third-party data sources such as Amazon S3, HDFS, SQL databases, and streaming or messaging protocols
Real-time dashboards based on Grafana

Data Science Workflow

The platform provides a complete data science workflow in a single ready-to-use platform that includes all the required building blocks for creating data science applications from research to production:

Collect, explore, and label data from various real-time or offline sources
Run ML training and validation at scale over multiple CPUs and GPUs
Deploy models and applications into production with serverless functions
Log, monitor, and visualize all your data and services

The Tutorial Notebooks

The platform's running-user home directory (/User/<running user>) contains pre-deployed tutorial Jupyter notebooks with code samples and documentation to assist you in your development. These include a demos directory with end-to-end use-case applications (see End-to-End Use-Case Applications) and a data-ingestion-and-preparation directory with examples of data ingestion and preparation tasks.

Note:

To view and run the tutorials from the platform, you first need to create a Jupyter Notebook service.
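Once the Jupyter Notebook service is running, you can locate the tutorial notebooks from any notebook cell. The following is a minimal sketch, not part of the tutorials. It assumes that /User resolves to your running-user home directory inside the Jupyter service, and `list_notebooks` is a hypothetical helper name. It recursively collects the .ipynb files under a directory and skips Jupyter's automatic .ipynb_checkpoints copies.

```python
from pathlib import Path


def list_notebooks(root):
    """Return the relative paths of all .ipynb files under root, sorted.

    Hypothetical helper; checkpoint copies in .ipynb_checkpoints are skipped.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    found = []
    for path in root.rglob("*.ipynb"):
        relative = path.relative_to(root)
        # Skip the auto-save copies that Jupyter keeps in .ipynb_checkpoints
        if ".ipynb_checkpoints" in relative.parts:
            continue
        found.append(relative.as_posix())
    return sorted(found)
```

For example, running `print("\n".join(list_notebooks("/User")))` in a notebook cell prints one notebook path per line, such as the notebooks under the demos directory.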

The welcome.ipynb notebook and main README.md file provide the same introduction in different formats.

Getting-Started Tutorial

Start out by running the getting-started tutorial to familiarize yourself with the platform and experience firsthand some of its main capabilities.

End-to-End Use-Case Applications

Iguazio provides full end-to-end use-case applications (demos) that demonstrate how to use the platform and related tools to address data science requirements for different industries and implementations.

Pre-Deployed Platform Demos

The platform comes pre-deployed with the following end-to-end use-case demos, which are available in the demos tutorial-notebooks directory (for more details, see demos/README.md, which is also available as a notebook):

Natural language processing (NLP) — processes natural-language textual data and generates a Nuclio serverless function that translates any given text string to another (configurable) language.
Stream enrichment — implements a typical stream-based data-engineering pipeline, including real-time data enrichment using a NoSQL table.
Smart stock trading — reads stock-exchange data from an internet service into a time-series database (TSDB) and performs real-time market-sentiment analysis on specific stocks; the data is saved to a platform NoSQL table, which is used for generating reports and for analyzing and visualizing the data on a Grafana dashboard.
Real-time user segmentation — builds a stream-event processor on a sliding time window for tagging and untagging users based on programmatic rules of user behavior.
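To illustrate the sliding-window idea behind the user-segmentation demo, here is a hedged sketch that is not the demo's code. The class name `SlidingWindowTagger` and its rule are hypothetical. A user is tagged "active" while they have at least `threshold` events in the last `window_seconds` seconds, and is untagged once older events slide out of the window.

```python
from collections import defaultdict, deque


class SlidingWindowTagger:
    """Tag a user while they have >= threshold events in the last window_seconds.

    Hypothetical rule for illustration only; the demo's actual rules may differ.
    """

    def __init__(self, window_seconds, threshold, tag="active"):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.tag = tag
        self.events = defaultdict(deque)  # user ID -> event timestamps, ascending
        self.tags = defaultdict(set)      # user ID -> current tags

    def process(self, user_id, timestamp):
        """Record one event (timestamps must arrive in order); return the user's tags."""
        window = self.events[user_id]
        window.append(timestamp)
        # Evict events that have slid out of the (timestamp - window, timestamp] range
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        # Apply the rule: tag when the threshold is met, untag otherwise
        if len(window) >= self.threshold:
            self.tags[user_id].add(self.tag)
        else:
            self.tags[user_id].discard(self.tag)
        return set(self.tags[user_id])
```

With `SlidingWindowTagger(window_seconds=60, threshold=3)`, events for one user at times 0, 10, and 20 tag the user on the third event, and a fourth event at time 100 untags the user because the earlier events have left the window. In this simple sketch, untagging happens only when the user produces a new event; a real stream processor would typically also expire windows on a timer.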

Additional Demos

You can download additional demos from GitHub — for example:

XGBoost classification — uses XGBoost to perform binary classification on the popular Iris ML data set, and runs model training in parallel across different hyperparameter values.
Image classification — builds and trains an ML model that identifies (recognizes) and classifies (labels) images by using Keras, TensorFlow, and Horovod.

For information on the available demos, see the demo listing.

You can download all the additional demos from the demos repository by executing the following command:

# Get additional demos
!/User/get-additional-demos.sh

Additional Platform Resources

Introduction video
In-depth platform overview with a breakdown of the steps for developing a full data science workflow, from development to production
Introduction to the platform components, services, and development ecosystem
References
nuclio-jupyter SDK for creating and deploying Nuclio functions with Python and Jupyter Notebook

Miscellaneous

Creating Virtual Environments in Jupyter Notebook

A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments make it easy to cleanly separate projects and avoid problems with different dependencies and version requirements across components. See the virtual-env tutorial notebook for step-by-step instructions for using conda to create your own Python virtual environments, which will appear as custom kernels in Jupyter Notebook.
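After you switch a notebook to a custom kernel, you can confirm which environment it actually runs in. This is a minimal sketch using only the Python standard library, and `describe_environment` is a hypothetical helper name. `sys.prefix` points at the active environment's directory. Note that a conda environment is a full Python installation, so the `is_venv` check (prefix differs from base prefix) is meaningful only for venv/virtualenv environments, and `CONDA_DEFAULT_ENV` may be unset when a kernel is launched without activating its environment.

```python
import os
import sys


def describe_environment():
    """Report which Python interpreter and environment the current kernel runs in."""
    return {
        "executable": sys.executable,
        # Root directory of the active environment (e.g., a conda env directory)
        "prefix": sys.prefix,
        # True inside a venv/virtualenv, where prefix differs from the base install
        "is_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # Set by conda activation; may be None for kernels started without activation
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
```

Running `describe_environment()` in a cell and checking that `prefix` points at your environment's directory is a quick way to verify that the custom kernel is in use.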

Updating the Tutorial Notebooks to the Latest Version

You can use the provided igz-tutorials-get.sh script to update the tutorial notebooks to the latest stable version available on GitHub. For details, see the update-tutorials.ipynb notebook.

The v3io Directory

The v3io directory that you see in the file browser of the Jupyter UI displays the contents of the v3io data mount, which you can use to browse the platform's data containers. For information about the predefined data containers and data mounts and how to reference data in these containers, see Platform Data Containers.
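Because the data mount appears as a regular directory, standard file APIs can browse it. This is a minimal sketch that rests on two assumptions: the mount is available at /v3io inside the Jupyter service (check the path in your file browser), and each top-level directory under it is a data container. `list_containers` is a hypothetical helper name.

```python
from pathlib import Path


def list_containers(mount_root="/v3io"):
    """Return the sorted names of the top-level directories under the data mount.

    Assumes each top-level directory is a data container; returns [] if the
    mount path does not exist.
    """
    root = Path(mount_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```

The same `pathlib` approach (for example, `Path("/v3io/<container>/<path>").read_text()`) can then read individual files, subject to the referencing rules described in Platform Data Containers.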

Support

The Iguazio support team will be happy to assist with any questions.
