Statistics with Rust, Second Edition
Statistics with Rust, Second Edition - Keiko Nakamura
Statistics with Rust
Second Edition
Explore rust programming and its powerful crates across data science, machine learning and NLP projects
Keiko Nakamura
Preface
Statistics with Rust, Second Edition is designed to help you learn quickly, focusing on practical statistics using Rust scripts. The book is for readers who know the basics of statistics and machine learning. It gives quick explanations so you can try out concepts with hands-on coding.
The book uses Rust 1.72.0 to help you build safe and reliable statistical and machine learning algorithms. Each chapter is full of useful programs and code examples that walk you through tasks like data manipulation, statistical tests, regression analysis, building machine learning models, and natural language processing.
The Rust crates featured throughout the book include:
ndarray and ndarray-linalg: For efficient handling of multi-dimensional arrays and linear algebra operations.
ndarray-stats: To perform statistical computations on arrays.
rand and rand_distr: For generating random numbers and working with probability distributions.
smartcore: A machine learning library used for implementing algorithms like decision trees and random forests.
linfa: A toolkit providing implementations of Support Vector Machines and other algorithms.
tch: Rust bindings for PyTorch, enabling the creation and training of neural networks.
finalfusion: For working with word embeddings in natural language processing tasks.
rust-stemmers: To perform stemming in text preprocessing.
regex: For pattern matching and text manipulation.
unicode-segmentation: To accurately tokenize Unicode strings.
This second edition brings all chapters up to date with the latest developments in statistics and Rust programming. It focuses on practical application, with a detailed look at advanced algorithms like PCA, SVM, neural networks, and ensemble methods. We've also included natural language processing topics such as text preprocessing, tokenization, and word embeddings.
The book also shows you how to combine Rust's performance and safety with statistical analysis, giving you the tools you need to analyze data efficiently and reliably. Its practical code and easy-to-follow explanations help you build the skills you need to get to grips with data using Rust.
GitforGits
Prerequisites
This book is for every data user, including data scientists, NLP engineers, Rust programmers, data engineers, data analysts, and anyone who knows the basics of statistics and is eager to use Rust for data science and machine learning projects.
Codes Usage
Are you in need of some helpful code examples to assist you in your programming and documentation? Look no further! Our book offers a wealth of supplemental material, including code examples and exercises.
Not only is this book here to help you get your job done, but you also have our permission to use the example code in your programs and documentation. However, if you are reproducing a significant portion of the code, we do require you to contact us for permission.
That said, using several chunks of code from this book in your program, or answering a question by citing our book and quoting example code, does not require permission. If you do choose to give credit, an attribution typically includes the title, author, publisher, and ISBN. For example: Statistics with Rust, Second Edition by Keiko Nakamura.
If you are unsure whether your intended use of the code examples falls under fair use or the permissions outlined above, please do not hesitate to reach out to us at support@gitforgits.com.
We are happy to assist and clarify any concerns.
Prologue
I set out to write Statistics with Rust with one clear vision: to merge the precision of statistical analysis with the performance and safety of Rust programming. This second edition builds upon the foundation of the first. The journey has been both challenging and rewarding, and I am pleased to present the result. Rust has evolved remarkably over the past few years. It is now at version 1.72.0, with enhanced features and a growing ecosystem of libraries. That evolution inspired me to revisit the original content and incorporate the latest advancements in both statistics and Rust. My goal has always been to provide a hands-on experience, and this edition sharpens that focus with more practical programs and real-world examples.
This edition includes a deeper exploration of machine learning and natural language processing, which I am excited to share. I have added new chapters on nonlinear models, multivariate techniques, and text analysis. You will find implementations of algorithms like Support Vector Machines, Neural Networks, and Principal Component Analysis, all using Rust's powerful crates such as smartcore, linfa, and tch. These examples show that Rust is well suited to complex data analysis tasks.
Working with real datasets often presents challenges like handling large volumes of data and ensuring code efficiency. Rust's memory safety and performance make it a strong choice for tackling these issues. The book shows you how to leverage Rust's features to write robust and efficient code. You'll learn how to manipulate data using ndarray, perform statistical computations with ndarray-stats, and process text data with crates like regex and unicode-segmentation.
I have also made practical application a priority. Each chapter includes code snippets and full programs that you can run and modify, along with hands-on projects that reinforce the concepts discussed, whether you're building a regression model, performing hypothesis testing, or working on time series analysis.
I am especially proud of how this book bridges the gap between theory and practice. Statistical concepts can feel abstract, but implementing them in Rust makes them tangible and accessible. I explain not just the how, but also the why behind each method, which helps you develop a deeper understanding of both statistics and programming. Writing this book has been an incredible journey of learning and discovery for me. The Rust community is growing and innovating in ways that continually inspire me. This book will empower you to harness Rust's capabilities in your statistical work and open up new possibilities for analysis and application.
I'm grateful for the opportunity to share this passion with you. I'm confident you'll apply these tools and concepts in your own projects, and I'm certain we'll continue to make advancements together in the fields of statistics and Rust programming.
— Keiko Nakamura
Copyright © 2024 by GitforGits
All rights reserved. This book is protected under copyright laws and no part of it may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without the prior written permission of the publisher. Any unauthorized reproduction, distribution, or transmission of this work may result in civil and criminal penalties and will be dealt with in the appropriate jurisdiction anywhere in India, in accordance with the applicable copyright laws.
Published by: GitforGits
Publisher: Sonal Dhandre
www.gitforgits.com
support@gitforgits.com
Printed in India
First Printing: October 2024
Cover Design by: Kitten Publishing
For permission to use material from this book, please contact GitforGits at support@gitforgits.com.
Contents
Preface
GitforGits
Acknowledgement
Chapter 1: Introduction to Rust for Statisticians
Overview
Why Rust for Data Analysis and Statistics?
High Performance
Memory Safety and Reliability
Safe Concurrency
Interoperability and Ecosystem Integration
WebAssembly and Cross-Platform Capabilities
Modern Language for Modern Challenges
Comparing Rust and Python for Statistics
Performance Considerations
Memory Management and Safety
Concurrency and Parallelism
Ecosystem and Libraries
Development Experience
Scalability and Maintainability
Deployment and Portability
Setting up Rust on Linux
Install Required Dependencies
Install Rustup
Configure Environment
Essential Rust Libraries for Statistics
‘Ndarray’ for N-Dimensional Arrays
‘Polars’ for DataFrames
‘Statrs’ for Statistical Computations
‘Plotters’ for Data Visualization
Setting up Statistical Project
Create a New Cargo Project
Add Dependencies
Build Project
Install System Dependencies
Write Code
Run Project
Understanding Rust's Ownership Model
Ownership Rules
Borrowing
Lifetimes
Summary
Chapter 2: Data Handling and Preprocessing
Introduction
Understanding Data Preprocessing
Why Preprocess Data?
Process of Data Handling and Preprocessing
Sample Dataset Overview
Loading and Parsing Data
Setting up Project
Adding Dependencies
Defining Data Structure
Fetching Dataset
Loading and Parsing CSV Data
Main Function
Exploring Dataset
Viewing Sample Records
Counting Unique Job Titles
Calculating Average Salary
Data Cleaning
Handling Missing Values
Removing Duplicates
Correcting Inconsistencies
Data Transformation
Scaling Numerical Data
Encoding Categorical Variables
Feature Engineering
Data Splitting
Final Code Integration
Summary
Chapter 3: Descriptive Statistics
Introduction
Understanding Descriptive Statistics
Measures of Central Tendency
Calculating Mean
Calculating Median
Calculating Mode
Applying All Measures of Central Tendency
Measures of Dispersion
Calculating Range
Calculating Variance
Calculating Standard Deviation
Applying All Measures of Dispersion
Exploratory Data Analysis (EDA)
EDA Overview
Visualizing Data with Plotters
Analyzing Categorical Variables
Correlation Analysis
Summarizing Data with Descriptive Statistics
Sample Summary Table
Implementing a Summary Function
Sample Program: Analyzing Our Dataset
Summary
Chapter 4: Probability Distributions and Random Variables
Introduction
Understanding Random Variables and Probability Distributions
Random Variables
Probability Distributions
Discrete Probability Distributions
Common Discrete Distributions
Uniform Distribution
Bernoulli Distribution
Binomial Distribution
Poisson Distribution
Geometric Distribution
Continuous Probability Distributions
Uniform Distribution
Normal (Gaussian) Distribution
Exponential Distribution
Beta Distribution
Gamma Distribution
Generating Random Variables
Sampling from Distributions
Sampling Benefits
Sample Program: Sampling from a Normal Distribution
Estimating Distribution Parameters
Method of Moments (MoM)
Maximum Likelihood Estimation (MLE)
Bayesian Estimation
Least Squares
Summary
Chapter 5: Inferential Statistics
Introduction
Fundamentals of Inferential Statistics
Why Inferential Statistics?
Key Concepts
Hypothesis Testing
Hypothesis Testing Process
Sample Program: Comparing Salaries
Performing Hypothesis Testing
Chi-Square Test for Independence
Confidence Intervals
Confidence Interval for Mean
Confidence Interval for Proportion
Parametric Tests
Paired T-Test
One-Way ANOVA
Non-Parametric Tests
Wilcoxon Rank-Sum Test (Mann-Whitney U Test)
Kruskal-Wallis Test
Summary
Chapter 6: Regression Analysis
Overview
Introduction to Regression Analysis
Overview
Applications of Regression Analysis
Types of Regression Analysis
Simple Linear Regression
Understanding Equation
Applying Simple Regression
Multiple Linear Regression
Understanding Equation
Applying Multiple Regression
Polynomial Regression
Understanding the Equation
Applying Polynomial Regression
Logistic Regression
Understanding Equation
Applying Logistic Regression
Summary
Chapter 7: Bayesian Statistics
Overview
Introduction to Bayesian Statistics
Overview
Bayes' Theorem
Advantages of Bayesian Statistics
Bayesian Inference
Understanding Bayesian Inference
Bayesian Inference Procedure
Putting Bayesian Inference into Action
Advanced Markov Chain Monte Carlo Methods
Introduction to Hamiltonian Monte Carlo (HMC)
Implementing HMC
Model Comparison and Selection
Importance of Model Comparison
Model Comparison using DIC
Model Comparison using WAIC
Summary
Chapter 8: Multivariate Statistical Methods
Introduction
Principal Component Analysis (PCA)
Understanding PCA
Implementing PCA
Canonical Correlation Analysis (CCA)
Understanding CCA
Putting CCA into Action
Linear Discriminant Analysis (LDA)
Understanding LDA
Performing LDA Algorithm
Independent Component Analysis (ICA)
Understanding ICA
Applying ICA
Multidimensional Scaling (MDS)
Understanding MDS
Implementing MDS
Summary
Chapter 9: Nonlinear Models and Machine Learning
Overview
Introduction to Nonlinear Models and Machine Learning
Why Nonlinear Models?
Machine Learning Breakthroughs
Decision Trees
Understanding Decision Trees
Key Characteristics
Building a Decision Tree
Implementing Decision Trees
Support Vector Machines (SVM)
Understanding SVM
Implementing SVM
Neural Networks
Understanding Neural Networks
Implementing Neural Networks
Ensemble Methods
Understanding Ensemble Methods
Implementing Random Forests
Summary
Chapter 10: Model Evaluation and Validation
Introduction
The Importance of Model Evaluation and Validation
Why Model Evaluation?
Common Evaluation Techniques
Train-Test Split
Understanding Train-Test Split
Implementing Train-Test Split
Cross-Validation Technique
Understanding Cross-Validation
Implementing K-Fold Cross-Validation
Hyperparameter Tuning
Understanding Hyperparameter Tuning
Implementing Grid Search
Model Selection Techniques
Understanding AIC and BIC
Implementing AIC and BIC
Resampling Methods
Understanding Resampling Methods
Implementing Bootstrapping and Permutation Tests
Summary
Chapter 11: Text and Natural Language Processing
Introduction
Overview of Natural Language Processing
Historical Development of NLP
Adoption and Applications of NLP
Key Processes in Natural Language Processing
Tokenization
Stopword Removal
Stemming and Lemmatization
Part-of-Speech (POS) Tagging
Named Entity Recognition (NER)
Dependency Parsing
Sentiment Analysis
Machine Translation
Text Summarization
Text Preprocessing and Tokenization
Key Preprocessing Techniques
Tokenization Approaches
Implementing Text Preprocessing and Tokenization
Implement Stopword Removal
Stemming and Lemmatization
Understanding Stemming and Lemmatization
Implementing Stemming
Information Retrieval with TF-IDF
Understanding TF-IDF
Implementing TF-IDF
Word Embeddings and Word2Vec
Understanding Word Embeddings
Implementing Word Embeddings
Summary
Index
Epilogue
Chapter 1: Introduction to Rust for Statisticians
Overview
In this first chapter, I will introduce you to the fascinating world of statistics and Rust