
Statistics with Rust, Second Edition
Ebook, 297 pages



Table of Contents

Introduction to Rust for Statisticians

Data Handling and Preprocessing

Descriptive Statistics

Probability Distributions and Random Variables

Inferential Statistics

Regression Analysis

Bayesian Statistics

Multivariate Statistical Methods

Nonlinear Models and Machine Learning

Model Evaluation and Validation

Text and Natural Language Processing

Language: English
Publisher: GitforGits
Release date: Oct 10, 2024
ISBN: 9798230554684

    Statistics with Rust, Second Edition - Keiko Nakamura

    Statistics with Rust

    Second Edition

    Explore Rust programming and its powerful crates across data science, machine learning, and NLP projects

    Keiko Nakamura

    Preface

    Statistics with Rust, Second Edition is designed to help you learn quickly, focusing on practical statistics using Rust scripts. The book is for readers who know the basics of statistics and machine learning. It gives quick explanations so you can try out concepts with hands-on coding.

    The book uses Rust 1.72.0 to help you build secure statistical and machine learning algorithms. Each chapter is full of useful programs and code examples that walk you through tasks such as data manipulation, statistical tests, regression analysis, building machine learning models, and natural language processing.

    The book features the following Rust crates throughout:

    ndarray and ndarray-linalg: For efficient handling of multi-dimensional arrays and linear algebra operations.

    ndarray-stats: To perform statistical computations on arrays.

    rand and rand_distr: For generating random numbers and working with probability distributions.

    smartcore: A machine learning library used for implementing algorithms like decision trees and random forests.

    linfa: A toolkit providing implementations of Support Vector Machines and other algorithms.

    tch: Rust bindings for PyTorch, enabling the creation and training of neural networks.

    finalfusion: For working with word embeddings in natural language processing tasks.

    rust-stemmers: To perform stemming in text preprocessing.

    regex: For pattern matching and text manipulation.

    unicode-segmentation: To accurately tokenize Unicode strings.

    This second edition brings all chapters up to date with the latest developments in statistics and Rust programming. It focuses on practical application, with a detailed look at advanced algorithms such as PCA, SVM, neural networks, and ensemble methods. It also covers natural language processing topics such as text preprocessing, tokenization, and word embeddings.

    The book also shows you how to combine Rust's performance and safety with statistical analysis, giving you the tools you need to analyze data efficiently and reliably. Its practical code and easy-to-follow explanations help you build the skills you need to work with data in Rust.

    GitforGits

    Prerequisites

    This book is intended for data professionals, including data scientists, NLP engineers, Rust programmers, data engineers, and data analysts, as well as anyone who knows the basics of statistics and is eager to use Rust for data science and machine learning projects.

    Code Usage

    Are you in need of some helpful code examples to assist you in your programming and documentation? Look no further! Our book offers a wealth of supplemental material, including code examples and exercises.

    Not only is this book here to aid you in getting your job done, but you have our permission to use the example code in your programs and documentation. However, please note that if you are reproducing a significant portion of the code, we do require you to contact us for permission.

    Using several chunks of code from this book in your program, or answering a question by citing this book and quoting example code, does not require permission. If you choose to give credit, an attribution typically includes the title, author, publisher, and ISBN. For example: Statistics with Rust, Second Edition by Keiko Nakamura (GitforGits, ISBN 9798230554684).

    If you are unsure whether your intended use of the code examples falls under fair use or the permissions outlined above, please do not hesitate to reach out to us at support@gitforgits.com. 

    We are happy to assist and clarify any concerns.

    Prologue

    I set out to write Statistics with Rust with one clear vision: to merge the precision of statistical analysis with the performance and safety of Rust programming. This second edition builds on the foundation of the first, and the journey of revising it has been both challenging and rewarding. Rust has evolved remarkably over the past few years, gaining enhanced features and a growing ecosystem of libraries, and this edition is written for Rust 1.72.0. That evolution inspired me to revisit the original content and incorporate the latest advancements in both statistics and Rust. My goal has always been to provide a hands-on experience, and this edition intensifies that focus with more practical programs and real-world examples.

    This edition includes a deeper exploration of machine learning and natural language processing, which I am excited to share. I have added new chapters on nonlinear models, multivariate techniques, and text analysis. You will find implementations of algorithms such as Support Vector Machines, neural networks, and Principal Component Analysis, built with Rust crates such as smartcore, linfa, and tch. These examples show that Rust is well suited to complex data analysis tasks.

    Working with real datasets often presents challenges such as handling large volumes of data and keeping code efficient. Rust's memory safety and performance make it a strong choice for tackling these issues, and this book shows you how to use its features to write robust, efficient code. You'll learn how to manipulate data using ndarray, perform statistical computations with ndarray-stats, and process text data with crates such as regex and unicode-segmentation. I have also made practical application a priority: each chapter includes code snippets and full programs that you can run and modify, along with hands-on projects that reinforce the concepts discussed, whether you're building a regression model, performing hypothesis testing, or working on time series analysis.

    I am especially proud of how this book bridges the gap between theory and practice. Statistical concepts can feel abstract, but implementing them in Rust makes them tangible and accessible. I explain not just the how but also the why behind each method, which helps you develop a deeper understanding of both statistics and programming. Writing this book has been an incredible journey of learning and discovery for me, and the Rust community continues to grow and innovate in ways that inspire me. This book will empower you to harness Rust's capabilities in your statistical work and open up new possibilities for analysis and application.

    I'm grateful for the opportunity to share this passion with you. I'm confident you'll apply these tools and concepts in your own projects, and I'm certain we'll continue to make advancements together in the fields of statistics and Rust programming.

    — Keiko Nakamura

    Copyright © 2024 by GitforGits

    All rights reserved. This book is protected under copyright laws and no part of it may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without the prior written permission of the publisher. Any unauthorized reproduction, distribution, or transmission of this work may result in civil and criminal penalties and will be dealt with in the relevant jurisdiction anywhere in India, in accordance with the applicable copyright laws.

    Published by: GitforGits

    Publisher: Sonal Dhandre

    www.gitforgits.com

    support@gitforgits.com

    Printed in India

    First Printing: October 2024

    Cover Design by: Kitten Publishing

    For permission to use material from this book, please contact GitforGits at support@gitforgits.com.

    Contents

    Preface

    GitforGits

    Acknowledgement

    Chapter 1: Introduction to Rust for Statisticians

    Overview

    Why Rust for Data Analysis and Statistics?

    High Performance

    Memory Safety and Reliability

    Safe Concurrency

    Interoperability and Ecosystem Integration

    WebAssembly and Cross-Platform Capabilities

    Modern Language for Modern Challenges

    Comparing Rust and Python for Statistics

    Performance Considerations

    Memory Management and Safety

    Concurrency and Parallelism

    Ecosystem and Libraries

    Development Experience

    Scalability and Maintainability

    Deployment and Portability

    Setting up Rust on Linux

    Install Required Dependencies

    Install Rustup

    Configure Environment

    Essential Rust Libraries for Statistics

    ‘ndarray’ for N-Dimensional Arrays

    ‘polars’ for DataFrames

    ‘statrs’ for Statistical Computations

    ‘plotters’ for Data Visualization

    Setting up Statistical Project

    Create a New Cargo Project

    Add Dependencies

    Build Project

    Install System Dependencies

    Write Code

    Run Project

    Understanding Rust's Ownership Model

    Ownership Rules

    Borrowing

    Lifetimes

    Summary

    Chapter 2: Data Handling and Preprocessing

    Introduction

    Understanding Data Preprocessing

    Why Preprocess Data?

    Process of Data Handling and Preprocessing

    Sample Dataset Overview

    Loading and Parsing Data

    Setting up Project

    Adding Dependencies

    Defining Data Structure

    Fetching Dataset

    Loading and Parsing CSV Data

    Main Function

    Exploring Dataset

    Viewing Sample Records

    Counting Unique Job Titles

    Calculating Average Salary

    Data Cleaning

    Handling Missing Values

    Removing Duplicates

    Correcting Inconsistencies

    Data Transformation

    Scaling Numerical Data

    Encoding Categorical Variables

    Feature Engineering

    Data Splitting

    Final Code Integration

    Summary

    Chapter 3: Descriptive Statistics

    Introduction

    Understanding Descriptive Statistics

    Measures of Central Tendency

    Calculating Mean

    Calculating Median

    Calculating Mode

    Applying All Measures of Central Tendency

    Measures of Dispersion

    Calculating Range

    Calculating Variance

    Calculating Standard Deviation

    Applying All Measures of Dispersion

    Exploratory Data Analysis (EDA)

    EDA Overview

    Visualizing Data with Plotters

    Analyzing Categorical Variables

    Correlation Analysis

    Summarizing Data with Descriptive Statistics

    Sample Summary Table

    Implementing a Summary Function

    Sample Program: Analyzing Our Dataset

    Summary

    Chapter 4: Probability Distributions and Random Variables

    Introduction

    Understanding Random Variables and Probability Distributions

    Random Variables

    Probability Distributions

    Discrete Probability Distributions

    Common Discrete Distributions

    Uniform Distribution

    Bernoulli Distribution

    Binomial Distribution

    Poisson Distribution

    Geometric Distribution

    Continuous Probability Distributions

    Uniform Distribution

    Normal (Gaussian) Distribution

    Exponential Distribution

    Beta Distribution

    Gamma Distribution

    Generating Random Variables

    Sampling from Distributions

    Sampling Benefits

    Sample Program: Sampling from a Normal Distribution

    Estimating Distribution Parameters

    Method of Moments (MoM)

    Maximum Likelihood Estimation (MLE)

    Bayesian Estimation

    Least Squares

    Summary

    Chapter 5: Inferential Statistics

    Introduction

    Fundamentals of Inferential Statistics

    Why Inferential Statistics?

    Key Concepts

    Hypothesis Testing

    Hypothesis Testing Process

    Sample Program: Comparing Salaries

    Performing Hypothesis Testing

    Chi-Square Test for Independence

    Confidence Intervals

    Confidence Interval for Mean

    Confidence Interval for Proportion

    Parametric Tests

    Paired T-Test

    One-Way ANOVA

    Non-Parametric Tests

    Wilcoxon Rank-Sum Test (Mann-Whitney U Test)

    Kruskal-Wallis Test

    Summary

    Chapter 6: Regression Analysis

    Overview

    Introduction to Regression Analysis

    Overview

    Applications of Regression Analysis

    Types of Regression Analysis

    Simple Linear Regression

    Understanding Equation

    Applying Simple Regression

    Multiple Linear Regression

    Understanding Equation

    Applying Multiple Regression

    Polynomial Regression

    Understanding the Equation

    Applying Polynomial Regression

    Logistic Regression

    Understanding Equation

    Applying Logistic Regression

    Summary

    Chapter 7: Bayesian Statistics

    Overview

    Introduction to Bayesian Statistics

    Overview

    Bayes' Theorem

    Advantages of Bayesian Statistics

    Bayesian Inference

    Understanding Bayesian Inference

    Bayesian Inference Procedure

    Putting Bayesian Inference into Action

    Advanced Markov Chain Monte Carlo Methods

    Introduction to Hamiltonian Monte Carlo (HMC)

    Implementing HMC

    Model Comparison and Selection

    Importance of Model Comparison

    Model Comparison using DIC

    Model Comparison using WAIC

    Summary

    Chapter 8: Multivariate Statistical Methods

    Introduction

    Principal Component Analysis (PCA)

    Understanding PCA

    Implementing PCA

    Canonical Correlation Analysis (CCA)

    Understanding CCA

    Putting CCA into Action

    Linear Discriminant Analysis (LDA)

    Understanding LDA

    Performing LDA Algorithm

    Independent Component Analysis (ICA)

    Understanding ICA

    Applying ICA

    Multidimensional Scaling (MDS)

    Understanding MDS

    Implementing MDS

    Summary

    Chapter 9: Nonlinear Models and Machine Learning

    Overview

    Introduction to Nonlinear Models and Machine Learning

    Why Nonlinear Models?

    Machine Learning Breakthroughs

    Decision Trees

    Understanding Decision Trees

    Key Characteristics

    Building a Decision Tree

    Implementing Decision Trees

    Support Vector Machines (SVM)

    Understanding SVM

    Implementing SVM

    Neural Networks

    Understanding Neural Networks

    Implementing Neural Networks

    Ensemble Methods

    Understanding Ensemble Methods

    Implementing Random Forests

    Summary

    Chapter 10: Model Evaluation and Validation

    Introduction

    The Importance of Model Evaluation and Validation

    Why Model Evaluation?

    Common Evaluation Techniques

    Train-Test Split

    Understanding Train-Test Split

    Implementing Train-Test Split

    Cross-Validation Technique

    Understanding Cross-Validation

    Implementing K-Fold Cross-Validation

    Hyperparameter Tuning

    Understanding Hyperparameter Tuning

    Implementing Grid Search

    Model Selection Techniques

    Understanding AIC and BIC

    Implementing AIC and BIC

    Resampling Methods

    Understanding Resampling Methods

    Implementing Bootstrapping and Permutation Tests

    Summary

    Chapter 11: Text and Natural Language Processing

    Introduction

    Overview of Natural Language Processing

    Historical Development of NLP

    Adoption and Applications of NLP

    Key Processes in Natural Language Processing

    Tokenization

    Stopword Removal

    Stemming and Lemmatization

    Part-of-Speech (POS) Tagging

    Named Entity Recognition (NER)

    Dependency Parsing

    Sentiment Analysis

    Machine Translation

    Text Summarization

    Text Preprocessing and Tokenization

    Key Preprocessing Techniques

    Tokenization Approaches

    Implementing Text Preprocessing and Tokenization

    Implement Stopword Removal

    Stemming and Lemmatization

    Understanding Stemming and Lemmatization

    Implementing Stemming

    Information Retrieval with TF-IDF

    Understanding TF-IDF

    Implementing TF-IDF

    Word Embeddings and Word2Vec

    Understanding Word Embeddings

    Implementing Word Embeddings

    Summary

    Index

    Epilogue

    Chapter 1: Introduction to Rust for Statisticians

    Overview

    In this first chapter, I will introduce you to the fascinating world of statistics and Rust
