diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 00000000..a807cf4c --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,24 @@ +## PR Description + +_Add your description_ + +## Related Issues + +- Closes # + +### Checklist + +- [ ] I have gone through the [contributing guide](https://github.com/animator/learn-python/blob/main/CONTRIBUTING.md) +- [ ] I have updated my branch and synced it with project `main` branch before making this PR + +## Undertaking + +I declare that: + +1. The content I am submitting is original and has not been plagiarized. +2. No portion of the work has been copied from any other source without proper attribution. +3. The work has been checked for plagiarism, and I assure its authenticity. + +I understand that any violation of this undertaking may have legal consequences that I will bear and could result in the withdrawal of any recognition associated with the work. + +- [ ] I Agree diff --git a/.gitignore b/.gitignore index c2bb62ee..912ae74d 100644 --- a/.gitignore +++ b/.gitignore @@ -11,3 +11,6 @@ book.pdf cover.pdf README-pdf.md style.theme +images/banner.png +images/favicon.ico +cover.png diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..41dd9ebe --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,127 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, religion, or sexual identity +and orientation. + +We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community. + +## Our Standards + +Examples of behavior that contributes to a positive environment for our +community include: + +* Demonstrating empathy and kindness toward other people +* Being respectful of differing opinions, viewpoints, and experiences +* Giving and gracefully accepting constructive feedback +* Accepting responsibility and apologizing to those affected by our mistakes, + and learning from the experience +* Focusing on what is best not just for us as individuals, but for the + overall community + +Examples of unacceptable behavior include: + +* The use of sexualized language or imagery, and sexual attention or + advances of any kind +* Trolling, insulting or derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or email + address, without their explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Enforcement Responsibilities + +Community leaders are responsible for clarifying and enforcing our standards of +acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or harmful. + +Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate. 
+ +## Scope + +This Code of Conduct applies within all community spaces, and also applies when +an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail address, +posting via an official social media account, or acting as an appointed +representative at an online or offline event. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement. +All complaints will be reviewed and investigated promptly and fairly. + +All community leaders are obligated to respect the privacy and security of the +reporter of any incident. + +## Enforcement Guidelines + +Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct: + +### 1. Correction + +**Community Impact**: Use of inappropriate language or other behavior deemed +unprofessional or unwelcome in the community. + +**Consequence**: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested. + +### 2. Warning + +**Community Impact**: A violation through a single incident or series +of actions. + +**Consequence**: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or +permanent ban. + +### 3. Temporary Ban + +**Community Impact**: A serious violation of community standards, including +sustained inappropriate behavior. + +**Consequence**: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban. + +### 4. Permanent Ban + +**Community Impact**: Demonstrating a pattern of violation of community +standards, including sustained inappropriate behavior, harassment of an +individual, or aggression toward or disparagement of classes of individuals. + +**Consequence**: A permanent ban from any sort of public interaction within +the community. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], +version 2.0, available at +https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. + +Community Impact Guidelines were inspired by [Mozilla's code of conduct +enforcement ladder](https://github.com/mozilla/diversity). + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see the FAQ at +https://www.contributor-covenant.org/faq. Translations are available at +https://www.contributor-covenant.org/translations. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d307781b..0a046b41 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,14 +1,55 @@ # Contributing Guidelines -Contributions welcome! +We value your participation in this open source project. This page will give you a quick overview of how to get involved. 
-**Before spending lots of time on something, ask for feedback on your idea first!**
+## Important Note
-Please search issues and pull requests before adding something new to avoid duplicating efforts and conversations.
+- Do not raise issues or send PRs for changing the issue template, adding a header-footer, badges or any buttons.
+- Do not raise issues or send PRs for any website changes.
+- Do not choose an entire topic or a whole bunch of sections. Just choose a single small section for which you will be contributing when you raise a new issue.
-This project welcomes following types of contributions:
+## How can I contribute?
-- **Ideas**: participate in an issue thread or start your own to have your voice heard.
-- **Writing**: contribute your expertise in an area by helping expand the included content.
-- **Copy editing**: fix typos, clarify language, and generally improve the quality of the content.
-- **Formatting**: help keep content easy to read with consistent formatting.
+You can contribute to this project by adding content on a new topic or improving existing content (in the `README.md` file).
+
+The list of topics for which we are looking for content is provided below, along with the location where the content has to be added:
+
+- Advanced Python - [Link](https://github.com/animator/learn-python/tree/main/contrib/advanced-python)
+- Pandas - [Link](https://github.com/animator/learn-python/tree/main/contrib/pandas)
+- NumPy - [Link](https://github.com/animator/learn-python/tree/main/contrib/numpy)
+- SciPy - [Link](https://github.com/animator/learn-python/tree/main/contrib/scipy)
+- Data Science & Machine Learning - [Link](https://github.com/animator/learn-python/tree/main/contrib/machine-learning)
+- Plotting & Visualization - [Link](https://github.com/animator/learn-python/tree/main/contrib/plotting-visualization)
+- Interacting with Databases - [Link](https://github.com/animator/learn-python/tree/main/contrib/database)
+- Web Scraping - [Link](https://github.com/animator/learn-python/tree/main/contrib/web-scrapping)
+- API Development - [Link](https://github.com/animator/learn-python/tree/main/contrib/api-development)
+- Data Structures & Algorithms - [Link](https://github.com/animator/learn-python/tree/main/contrib/ds-algorithms) **(Not accepting)**
+- Python Mini Projects - [Link](https://github.com/animator/learn-python/tree/main/contrib/mini-projects) **(Not accepting)**
+- Python Question Bank - [Link](https://github.com/animator/learn-python/tree/main/contrib/question-bank) **(Not accepting)**
+
+You can check out some content ideas below.
+
+## Process
+
+**Step 1**: Raise a **new issue** stating that you want to "Add content". We will assign the issue to you and label it.
+**Do not choose an entire topic or a whole bunch of sections. Just choose a single small section for which you will be contributing when you raise a new issue.**
+**Step 2**: Star and fork THIS repository.
+**Step 3**: Now, in your fork, go to the correct topic folder as provided in the links above.
+**Step 4**: Edit the `index.md` file to add the title of the content and the corresponding file name where the content will be available in this folder.
+**Step 5**: Add the content in markdown format in the file (extension `.md`).
+**Step 6**: Raise a PR with your changes. Accept the pledge that the content is original and not stolen from any other source.
+**Step 7**: Wait for review and PR merge.
+
+## Some Content Ideas
+
+- **NumPy**: Introduction, Arrays, Indexing and Slicing, Operations on Arrays, Concatenating Arrays, Reshaping Arrays, Splitting Arrays, Statistical Operations on Arrays, Loading Arrays from Files, Saving NumPy Arrays in Files, etc.
+- **Pandas**: Introduction, Importing and Exporting Data, DataFrames, Pandas Series Vs NumPy ndarray, Descriptive Statistics, Data Aggregations, Sorting a DataFrame, Group by Functions, Altering the Index, Other DataFrame Operations, Handling Missing Values, Import and Export of Data between Pandas and MySQL, Pandas plotting, etc.
+- **Plotting & Visualization**: Matplotlib, Seaborn, Customisation of Plots (Marker, Colour, Line Width and Line Style), Line Chart, Bar Chart, Histogram, Scatter Chart, Plotting Quartiles and Box Plot, Pie Chart, etc.
+- **Data Science & Machine Learning**: Scikit-learn, TensorFlow, PyTorch, regression, classification, clustering, ensemble models, deep learning, etc.
+- **API Development**: Building APIs using FastAPI, CRUD API Development, etc.
+- **Web Scraping**: BeautifulSoup, requests, etc.
+- **Advanced Python**: OOP in Python, Generators, List Comprehensions, Lambda Functions, In-depth Function Arguments, Regular Expressions, Exception Handling, Partial Functions, Code Introspection, Closures, Decorators, Map, Filter, Reduce, etc.
+
+## Any doubt?
+
+In case you are new to the open source ecosystem, we would be more than happy to guide you through the entire process. Just join our [Discord server](https://bit.ly/heyfoss) and drop a message in the relevant channel.
diff --git a/README.md b/README.md
index f6ceaa89..5cb7379d 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
+[![Discord Server Invite](https://img.shields.io/badge/DISCORD-JOIN%20SERVER-5663F7?style=for-the-badge&logo=discord&logoColor=white)](https://bit.ly/heyfoss)
+
+Contributors should go through the [Contributing Guide](https://github.com/animator/learn-python/blob/main/CONTRIBUTING.md) to learn how to contribute to the project.
+
 ![Learn Python 3 Logo](images/learn-python.png)
 Learn Python 3
@@ -5,15 +9,13 @@ Learn Python 3
 by **Ankit Mahato** [[About](https://animator.github.io)]
 *Version 2022.10*
-[![Discord Server Invite](https://img.shields.io/badge/DISCORD-JOIN%20SERVER-brightgreen?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/2s49SCNfyJ)
-
 # How to read this book?
 This book can be consumed in 3 ways:
 - A nice web interface - [Link](https://animator.github.io/learn-python/)
 ![Learn Python 3 Website](images/web.png)
-- A Downloadable PDF - [Link](https://github.com/animator/learn-python/blob/main/pdf/learn-python-v2022.10.pdf)
+- A Downloadable PDF - [Link](https://github.com/animator/learn-python/raw/main/pdf/learn-python-v2022.10.pdf)
 ![Learn Python 3 PDF](images/pdf.png)
 - Directly on GitHub - [Link](https://github.com/animator/learn-python)
@@ -211,7 +213,7 @@ A **software** is a collection of programs where each program provides a sequenc
 These instructions have to be provided in **machine language** or **low level language** (0s and 1s) that is difficult to read or write for a human being.
-This led to the invention of **high-level programming languages** in which programs can be easily written and managed. The human-readable programs written using high-level languages are converted into computer-readable machine code or bytecode using **compilers** or **interpreters**.
+This led to the invention of **high-level programming languages** in which programs can be easily written and managed.
The human-readable programs written using high-level languages are converted into computer-readable machine code or byte-code using **compilers** or **interpreters**. There are many high-level programming languages that are currently in wide use. @@ -219,7 +221,7 @@ Some of the popular languages are Java, C, C++, C#, Go, Swift, JavaScript, PHP, ## Introduction to Python -Guido van Rossum started the development of Python in December 1989. He released the first version (0.9.9) of Python for general public on February 20, 1991. +Guido van Rossum started the development of Python in December 1989. He released the first version (0.9.0) of Python for general public on February 20, 1991. The language evolved over the next few decades and so did its definition, the current version of which is stated below: @@ -287,7 +289,7 @@ Python programmers also have at their disposal the vast ecosystem of more than 2 **6. Web Application Development** -Some of the most popular web development frameworks (django, flask, etc.) are written in Python. This coupled with the availablity of packages to connect to any database makes Python a great choice for web application development. +Some of the most popular web development frameworks (django, flask, etc.) are written in Python. This coupled with the availability of packages to connect to any database makes Python a great choice for web application development. ## Installing Python in Windows @@ -658,7 +660,7 @@ The backslash (`\`) character can be used in a string literal to escape characte | `\"` | Double quote (`"`) | | `\a` | ASCII Bell (BEL) | | `\b` | ASCII Backspace (BS) | -| `\f` | ASCII Formfeed (FF) | +| `\f` | ASCII Form-feed (FF) | | `\n` | ASCII Linefeed (LF) | | `\r` | ASCII Carriage Return (CR) | | `\t` | ASCII Horizontal Tab (TAB) | @@ -2300,7 +2302,7 @@ Evaluate the expression **Solution** -Parantesized expression `(...)` has the highest precedence so `+` is evaluated first +Parenthesized expression `(...)` has the highest precedence so `+` is evaluated first `15 - (2 + 4)` = `15 - 6` = `9` @@ -2725,7 +2727,7 @@ Exception handling is the process of properly handling an exception which can po When an error occurs, the program throws an exception. -The runtime system attempts to find an **exception handler**, a block of code that can handle a particular type of error. Once located, the suitable exception handler **catches the exeception** and executes the code block which can attempt to recover from the error. In case the error is unrecoverable, the handler provides a way to gently exit the program. +The runtime system attempts to find an **exception handler**, a block of code that can handle a particular type of error. Once located, the suitable exception handler **catches the exception** and executes the code block which can attempt to recover from the error. In case the error is unrecoverable, the handler provides a way to gently exit the program. The `try` statement in Python specifies the exception handlers and/or cleanup code for a code block. @@ -2734,7 +2736,7 @@ The various parts of a try statement are: - `try` block: The block of statements within which an exception might be thrown. - `except` clause(s): One or more exception handlers. Each `except` clause handles a particular type of exception. In case an exception of a particular type occurs in the `try` block, the corresponding `except` clause code block is executed. - `else` clause: An optional `else` clause can also be included after the last `except` block. 
In case no exception is raised, none of the `except` blocks are executed. In this case, the `else` code block is executed. -- `finally` clause: An optional `finally` clause can be added at the end of the try statement which includes a block of statements that are executed regardless of whether or not any error occured inside the try block. This block is usually setup for code cleanup and closing all open file objects. +- `finally` clause: An optional `finally` clause can be added at the end of the try statement which includes a block of statements that are executed regardless of whether or not any error occurred inside the try block. This block is usually setup for code cleanup and closing all open file objects. Here's the general form of these statements: @@ -3539,7 +3541,7 @@ The backslash (`\`) character can be used in a string to escape characters that | `\"` | Double quote (`"`) | | `\a` | ASCII Bell (BEL) | | `\b` | ASCII Backspace (BS) | -| `\f` | ASCII Formfeed (FF) | +| `\f` | ASCII Form-feed (FF) | | `\n` | ASCII Linefeed (LF) | | `\r` | ASCII Carriage Return (CR) | | `\t` | ASCII Horizontal Tab (TAB) | @@ -3997,7 +3999,7 @@ It has to be noted that the method counts non-overlapping occurrences, so it doe 2 ``` -In the above example, `ala` is counted twice as the first occurence is in `valh"ala"` and the next occurance is in `"ala"la`. Although `ala` can be located again in `al"ala"`, it overlaps with the occurance `"ala"la`, hence it is not counted. +In the above example, `ala` is counted twice as the first occurrence is in `valh"ala"` and the next occurrence is in `"ala"la`. Although `ala` can be located again in `al"ala"`, it overlaps with the occurrence `"ala"la`, hence it is not counted. ### find() @@ -4639,7 +4641,7 @@ If you do not wish to modify the existing list and create a new list with items Python lists have a built-in `sort()` method which sorts the items in-place using `<` comparisons between items. -The method also accepts 2 keyworded arguments: +The method also accepts 2 key-worded arguments: - `key` is used to specify a function which is called on each list element prior to making the comparisons. - `reverse` is a boolean which specifies whether the list is to be sorted in descending order. @@ -5481,7 +5483,7 @@ In case of a tuple, the modified tuple is actually a completely new tuple with c The original tuple is not modified as it is immutable. But, as `t` is no longer pointing to the original tuple, it is freed from memory. -Thus, it is recommended that instead of `+=`, `append()` and `extend()` methods should be employed to add new items programatically as it will raise an error in case the code is trying to modify a tuple. +Thus, it is recommended that instead of `+=`, `append()` and `extend()` methods should be employed to add new items programmatically as it will raise an error in case the code is trying to modify a tuple. ``` python >>> l = ["Hi", "Ed", "Punk"] @@ -5669,7 +5671,7 @@ This method accepts the following: {'yr': 15, 18: True, 'name': 'Ed'} ``` -**3. Keyworded arguments** +**3. Key-worded arguments** ``` python >>> d = {"yr": 20, 18: True} @@ -5811,6 +5813,38 @@ In case of common keys between the two operands (dictionaries), the values of th In the above example, both `d` and `n` shared a common key `yr`. The value corresponding to `yr` in `n` gets priority. +The `|=` (union) augmented assignment operator can be used to update the dictionary with keys and values from another dictionary or an iterable of (key, value) pairs. 
+ +``` python +>>> d = {"book": "Python", "year": 1990} +>>> dnew = {"author": "Guido"} +>>> d |= dnew +>>> d +{'book': 'Python', 'year': 1990, 'author': 'Guido'} +``` + +If both the dictionaries share a common key, then the value corresponding to the key in the right operand is the updated value. + +``` python +>>> d = {"book": "Python", "year": 1990} +>>> dnew = {"author": "Guido", "year": 2000} +>>> d |= dnew +>>> d +{'book': 'Python', 'year': 2000, 'author': 'Guido'} +``` + +In the above example, both `d` and `dnew` shared a common key `year`. The value (`1990`) corresponding to `year` in `d` is updated with the new value. + +The `|=` operator also works in case the right operand is an iterable instead of a dictionary as shown in the example below. + +``` python +>>> d = {"book": "Python", "year": 1990} +>>> inew = [("author", "Guido"), ("year", 2000)] +>>> d |= inew +>>> d +{'book': 'Python', 'year': 2000, 'author': 'Guido'} +``` + ## Traversing a Dictionary Compared to position based indexing of `list` or `tuple`, dictionaries are indexed based on the key. @@ -5936,6 +5970,26 @@ In case of string keys, they return the first and last occurring strings alphabe 3 ``` +### list() + +To get the list of all keys in a dictionary, use the `list()` built-in function. + +``` python +>>> d = {"book": "Python", "year": 1990, "author": "Guido"} +>>> list(d) +['book', 'year', 'author'] +``` + +### sorted() + +To get a sorted list of keys, you can use the `sorted()` built-in function. + +``` python +>>> d = {"book": "Python", "year": 1990, "author": "Guido"} +>>> sorted(d) +['author', 'book', 'year'] +``` + ## Creating a Copy of a Dictionary A new copy of a dictionary can be made using `copy()` method: @@ -6181,7 +6235,7 @@ Python has built-in functions to handle various aspects of datatypes such as che `type()` and `isinstance()` builtin functions are used for checking the data type of objects. -Check out the **Type Checking** section in the chapter **Variable, Objects & Data Types** to learn more about it in detail. +Check out [Type Checking](#type-checking) section in the chapter [Variable, Objects & Data Types](#variables-objects--data-types) to learn more about it in detail. ### Built-in Type Functions @@ -6202,7 +6256,7 @@ The following functions are often used to assign a default value or create an em | `frozenset()` | `frozenset()` | | `dict()` | `{}` | -The **Type Casting** section of the chapter **Variable, Objects & Data Types** covers these functions in detail in case any argument is passed. +The [Type Casting](#type-casting) section of the chapter [Variable, Objects & Data Types](#variables-objects--data-types) covers these functions in detail in case any argument is passed. ## I/O Functions @@ -6323,11 +6377,12 @@ Converts the integer Unicode code point into the corresponding Unicode string. Apart from built-in functions, the **Python Standard Library** also contains a wide range of built-in modules which are a group of functions organized based on functionality. 
-Some of the commonly used modules are mentioned below:
+Some commonly used modules are:
 - `math` - Mathematical functions
 - `random` - Generate pseudo-random numbers
 - `statistics` - Statistical functions
+- `copy` - Create shallow and deep copies of objects
 **Accessing Modules**
@@ -6559,6 +6614,67 @@ If there are multiple modes with the same count, the first occurrence in the seq
 'a'
+## copy Module
+
+### Limitation of Shallow Copy
+
+The `copy()` method does not recurse to create copies of the child objects, so if the child objects are mutable (e.g., a nested list), any modification in a child object will get reflected in both the parent objects.
+
+``` python
+>>> old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+
+# Copying old list into a new list
+>>> new_list = old_list.copy()
+
+# Checking if both lists are pointing to the same object
+>>> id(new_list)==id(old_list)
+False
+
+# Checking if items of both lists are pointing to the same objects
+>>> [id(new_list[idx])==id(old_list[idx]) for idx in range(len(old_list))]
+[True, True, True]
+
+# Modify new list
+>>> new_list[1][1] = 0
+>>> new_list
+[[1, 2, 3], [4, 0, 6], [7, 8, 9]]
+>>> old_list
+[[1, 2, 3], [4, 0, 6], [7, 8, 9]]
+```
+
+As we can see in the output, `new_list[1][1]` was modified, which is reflected in both `new_list` and `old_list`.
+
+The `copy` module provides the `deepcopy()` function which is helpful in mitigating this issue.
+
+### Deep Copy - deepcopy(x[, memo])
+
+Deep copy overcomes the shortcomings of `copy()` and recursively creates copies of the child objects found in the original list. This leads to the creation of an independent copy of the original.
+
+``` python
+>>> import copy
+>>> old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+
+# Copying old list into a new list
+>>> new_list = copy.deepcopy(old_list)
+
+# Checking if both lists are pointing to the same object
+>>> id(new_list)==id(old_list)
+False
+
+# Checking if items of both lists are pointing to the same objects
+>>> [id(new_list[idx])==id(old_list[idx]) for idx in range(len(old_list))]
+[False, False, False]
+
+# Modify new list
+>>> new_list[1][1] = 0
+>>> new_list
+[[1, 2, 3], [4, 0, 6], [7, 8, 9]]
+>>> old_list
+[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+```
+
+Note the change in the `id()` equality of the children from `[True, True, True]` to `[False, False, False]`. As we can see in the output, `new_list[1][1]` was modified, which gets reflected only in `new_list`.
+
 # File Handling
 ## File Handling in Python - Introduction & Overview
@@ -7557,9 +7673,9 @@ print(d)
 Once a function is defined in the Python interpreter, it can be called any number of times. But, these function definitions are lost upon exiting the interpreter.
-To solve this problem we can create a python script with the function definitions at the beginning of teh file, followed by the rest of the code which includes statements invoking the defined functions.
+To solve this problem we can create a python script with the function definitions at the beginning of the file, followed by the rest of the code which includes statements invoking the defined functions.
-But, this process is tedious and not managable as what makes user-defined functions powerful is that the programmer can - **Write once, and use many times**.
+But, this process is tedious and not manageable as what makes user-defined functions powerful is that the programmer can - **Write once, and use many times**.
 Instead of repeating the function definition again and again for each new program, one can put all the function definitions in a file from which the required function can be imported and invoked either in script mode or interactive mode.
@@ -7746,6 +7862,6 @@ restaurant/
 A package is simply the directory containing sub-packages and modules, but when this package or a collection of packages are made available for others to use (eg. via PyPI) it is known as a **library**.
-For example, `restaurant` can be called a library if it provides reusable codes to manage a restaurant and is built using multiple packages which handle the various aspects of a restaurant like human resource management, inventory management, order fulfilment and billing, etc.
+For example, `restaurant` can be called a library if it provides reusable codes to manage a restaurant and is built using multiple packages which handle the various aspects of a restaurant like human resource management, inventory management, order fulfillment and billing, etc.
 One should note that the above definition is not strict and often the terms package and library are used interchangeably.
diff --git a/contrib/advanced-python/asynchronous-context-managers-generators.md b/contrib/advanced-python/asynchronous-context-managers-generators.md
new file mode 100644
index 00000000..00516495
--- /dev/null
+++ b/contrib/advanced-python/asynchronous-context-managers-generators.md
@@ -0,0 +1,110 @@
+## Asynchronous Context Managers and Generators in Python
+Asynchronous programming in Python allows for more efficient use of resources by enabling tasks to run concurrently. Python provides support for asynchronous
+context managers and generators, which help manage resources and perform operations asynchronously.
+
+### Asynchronous Context Managers
+Asynchronous context managers are similar to regular context managers but are designed to work with asynchronous code. They use the `async with` statement and
+typically include the `__aenter__` and `__aexit__` methods.
+
+### Creating an Asynchronous Context Manager
+Here's a simple example of an asynchronous context manager:
+
+```python
+import asyncio
+
+class AsyncContextManager:
+    async def __aenter__(self):
+        print("Entering context")
+        await asyncio.sleep(1)  # Simulate an async operation
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        print("Exiting context")
+        await asyncio.sleep(1)  # Simulate cleanup
+
+async def main():
+    async with AsyncContextManager() as acm:
+        print("Inside context")
+
+asyncio.run(main())
+```
+
+Output:
+
+```
+Entering context
+Inside context
+Exiting context
+```
+
+### Asynchronous Generators
+Asynchronous generators allow you to yield values within an asynchronous function. They use the `async def` syntax along with the `yield` statement and are
+iterated using the `async for` loop.
+
+### Creating an Asynchronous Generator
+Here's a basic example of an asynchronous generator:
+
+```python
+import asyncio
+
+async def async_generator():
+    for i in range(5):
+        await asyncio.sleep(1)  # Simulate an async operation
+        yield i
+
+async def main():
+    async for value in async_generator():
+        print(value)
+
+asyncio.run(main())
+```
+Output:
+```
+0
+1
+2
+3
+4
+```
+### Combining Asynchronous Context Managers and Generators
+You can combine asynchronous context managers and generators to create more complex and efficient asynchronous workflows.
+Example: Fetching Data with an Async Context Manager and Generator
+Consider a scenario where you need to fetch data from an API asynchronously and manage the connection using an asynchronous context manager:
+```python
+import aiohttp
+import asyncio
+
+class AsyncHTTPClient:
+    def __init__(self, url):
+        self.url = url
+
+    async def __aenter__(self):
+        self.session = aiohttp.ClientSession()
+        self.response = await self.session.get(self.url)
+        return self.response
+
+    async def __aexit__(self, exc_type, exc, tb):
+        await self.response.release()
+        await self.session.close()
+
+async def async_fetch(urls):
+    for url in urls:
+        async with AsyncHTTPClient(url) as response:
+            data = await response.text()
+            yield data
+
+async def main():
+    urls = ["http://example.com", "http://example.org", "http://example.net"]
+    async for data in async_fetch(urls):
+        print(data)
+
+asyncio.run(main())
+```
+### Benefits of Asynchronous Context Managers and Generators
+1. Efficient Resource Management: They help manage resources like network connections or file handles more efficiently by releasing them as soon as they are no longer needed.
+2. Concurrency: They enable concurrent operations, improving performance in I/O-bound tasks such as network requests or file I/O.
+3. Readability and Maintainability: They provide a clear and structured way to handle asynchronous operations, making the code easier to read and maintain.
+### Summary
+Asynchronous context managers and generators are powerful tools in Python that enhance the efficiency and readability
+of asynchronous code. By using `async with` for resource management and `async for` for iteration, you can write more performant and maintainable asynchronous
+programs.
diff --git a/contrib/advanced-python/closures.md b/contrib/advanced-python/closures.md
new file mode 100644
index 00000000..e363c15f
--- /dev/null
+++ b/contrib/advanced-python/closures.md
@@ -0,0 +1,101 @@
+# Closures
+In order to have a complete understanding of this topic in Python, one needs to be crystal clear with the concept of functions and two particular kinds of them, namely first-class functions and nested functions.
+
+### First Class Functions
+These are the normal functions used by the programmer in routine, as they can be assigned to variables, passed as arguments and returned from other functions.
+### Nested Functions
+These are the functions defined within other functions and involve thorough usage of **closures**. They are also referred to as **inner functions** by some books. There are times when it is required to prevent a function, or the data it has access to, from being accessed from other parts of the code, and this is where nested functions come into play. Basically, their usage allows the encapsulation of that particular data/function within another function. This enables it to be virtually hidden from the global scope.
+
+## Defining Closures
+In nested functions, if the outer function ends up returning the inner function, the concept of closures comes into play.
+
+A closure is a function object that remembers values in enclosing scopes even if they are no longer present in memory. There are certain necessary conditions required to create a closure in Python:
+1. The inner function must be defined inside the outer function.
+2. The inner function must refer to a value defined in the outer function.
+3. The outer function must return the inner function.
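+
+A minimal sketch illustrating these three conditions (the names here are illustrative, not from any library) - CPython even lets you inspect the captured value through the function's `__closure__` attribute:
+
+```python
+def outer(msg):           # outer function
+    def inner():          # 1. inner function defined inside the outer function
+        return msg        # 2. inner function refers to a value from the outer scope
+    return inner          # 3. outer function returns the inner function
+
+greet = outer("Hello")
+print(greet())                              # Hello
+print(greet.__closure__[0].cell_contents)   # Hello - the value the closure remembers
+```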
+
+## Advantages of Closures
+* Closures make it possible to pass data to inner functions without first passing it to outer functions.
+* Closures can be used to create private variables and functions.
+* They also make it possible to invoke the inner function from outside of the encapsulating outer function.
+* They improve code readability and maintainability.
+
+## Examples implementing Closures
+### Example 1 : Basic Implementation
+```python
+def make_multiplier_of(n):
+    def multiplier(x):
+        return x * n
+    return multiplier
+
+times3 = make_multiplier_of(3)
+times5 = make_multiplier_of(5)
+
+print(times3(9))
+print(times5(3))
+```
+#### Output:
+```
+27
+15
+```
+The `multiplier` function is defined inside the `make_multiplier_of` function. It has access to the `n` variable from the outer scope, even after the `make_multiplier_of` function has returned. This is an example of a closure.
+
+### Example 2 : Implementation with Decorators
+```python
+def decorator_function(original_function):
+    def wrapper_function(*args, **kwargs):
+        print(f"Wrapper executed before {original_function.__name__}")
+        return original_function(*args, **kwargs)
+    return wrapper_function
+
+@decorator_function
+def display():
+    print("Display function executed")
+
+display()
+```
+#### Output:
+```
+Wrapper executed before display
+Display function executed
+```
+The code in the example defines a decorator function `decorator_function` that takes a function as an argument and returns a new function `wrapper_function`. The `wrapper_function` prints a message to the console, which includes the name of the decorated function, before calling the original function.
+
+The `@decorator_function` syntax is used to apply the `decorator_function` decorator to the `display` function. This means that the `display` function is replaced with the result of calling `decorator_function(display)`.
+
+When `display()` is called, the `wrapper_function` is executed instead. The `wrapper_function` prints a message to the console and then calls the original `display` function.
+### Example 3 : Implementation with a for loop
+```python
+def create_closures():
+    closures = []
+    for i in range(5):
+        def closure(i=i):  # Capture current value of i by default argument
+            return i
+        closures.append(closure)
+    return closures
+
+my_closures = create_closures()
+for closure in my_closures:
+    print(closure())
+```
+#### Output:
+```
+0
+1
+2
+3
+4
+```
+The code in the example defines a function `create_closures` that creates a list of closure functions. Each closure function returns the value of the loop variable `i` captured when it was defined.
+
+The `closure` function is defined inside the `create_closures` function. It has access to the `i` variable from the **outer scope**, even after the `create_closures` function has returned. This is an example of a closure.
+
+The `i=i` argument in the `closure` function is used to capture the current value of `i` as a default argument. This is necessary because the `i` variable in the outer scope is a loop variable, and its value changes in each iteration of the loop. By capturing the current value of `i` in the default argument, we ensure that each closure function returns the correct value of `i`. This is what produces the output 0, 1, 2, 3, 4.
+
+For more examples related to closures, [click here](https://dev.to/bshadmehr/understanding-closures-in-python-a-comprehensive-tutorial-11ld).
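+
+One more pattern worth noting (a small illustrative sketch, not part of the examples above): a closure can also *update* its enclosed value using the `nonlocal` keyword, which is what allows closures to act as private, stateful variables:
+
+```python
+def make_counter():
+    count = 0              # private state, hidden from the global scope
+    def counter():
+        nonlocal count     # rebind the enclosing variable instead of creating a local one
+        count += 1
+        return count
+    return counter
+
+c = make_counter()
+print(c(), c(), c())   # 1 2 3
+```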
+
+## Summary
+Closures in Python provide a powerful mechanism for encapsulating state and behavior, enabling more flexible and modular code. Understanding and effectively using closures enables the creation of function factories, allows functions to have state, and facilitates functional programming techniques.
diff --git a/contrib/advanced-python/dates_and_times.md b/contrib/advanced-python/dates_and_times.md
new file mode 100644
index 00000000..983f6b23
--- /dev/null
+++ b/contrib/advanced-python/dates_and_times.md
@@ -0,0 +1,117 @@
+## Working with Dates and Times in Python
+Handling dates and times is an essential aspect of many programming tasks.
+Python provides robust modules to work with dates and times, making it easier to perform operations like formatting, parsing, and arithmetic.
+This guide provides an overview of these modules and their key functionalities.
+
+## 1. The `datetime` Module
+The `datetime` module supplies classes for manipulating dates and times. The main classes in the `datetime` module are:
+
+* `date`: Represents a date (year, month, day).
+* `time`: Represents a time (hour, minute, second, microsecond).
+* `datetime`: Combines date and time information.
+* `timedelta`: Represents the difference between two dates or times.
+* `tzinfo`: Provides time zone information objects.
+
+**Key Concepts:**
+
+* Naive vs. Aware: Naive datetime objects do not contain time zone information, while aware datetime objects do.
+* Immutability: `date` and `time` objects are immutable; once created, they cannot be changed.
+
+Example:
+```python
+import datetime
+# Get the current date and time
+now = datetime.datetime.now()
+print("Current date and time:", now)
+```
+
+## 2. Formatting Dates and Times
+Formatting involves converting datetime objects into human-readable strings. This is achieved using the `strftime` method, which stands for "string format time."
+You can specify various format codes to dictate how the output string should be structured.
+
+**Common Format Codes:**
+
+* `%Y`: Year with century (e.g., 2024)
+* `%m`: Month as a zero-padded decimal number (e.g., 01)
+* `%d`: Day of the month as a zero-padded decimal number (e.g., 15)
+* `%H`: Hour (24-hour clock) as a zero-padded decimal number (e.g., 13)
+* `%M`: Minute as a zero-padded decimal number (e.g., 45)
+* `%S`: Second as a zero-padded decimal number (e.g., 30)
+
+Example:
+```python
+import datetime
+
+now = datetime.datetime.now()
+formatted_now = now.strftime("%Y-%m-%d %H:%M:%S")
+print("Formatted current date and time:", formatted_now)
+```
+
+## 3. Parsing Dates and Times
+Parsing is the process of converting strings representing dates and times into datetime objects. The `strptime` method, which stands for "string parse time,"
+allows you to specify the format of the input string.
+
+Example:
+```python
+import datetime
+
+date_string = "2024-05-15 13:45:30"
+date_object = datetime.datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
+print("Parsed date and time:", date_object)
+```
+
+## 4. Working with Time Differences
+The `timedelta` class is used to represent the difference between two datetime objects. This is useful for calculations involving durations, such as finding the
+number of days between two dates or adding a certain period to a date.
+
+Example:
+```python
+import datetime
+
+date1 = datetime.datetime(2024, 5, 15, 12, 0, 0)
+date2 = datetime.datetime(2024, 5, 20, 14, 30, 0)
+
+difference = date2 - date1
+print("Difference:", difference)
+print("Days:", difference.days)
+print("Total seconds:", difference.total_seconds())
+```
+
+## 5. Time Zones
+Time zone handling in Python is facilitated by the `pytz` library. It allows you to convert naive datetime objects into timezone-aware objects and perform
+operations across different time zones.
+
+**Key Concepts:**
+
+* Timezone-aware: A datetime object that includes time zone information.
+* Localization: The process of associating a naive datetime with a time zone.
+
+Example:
+```python
+import datetime
+import pytz
+
+# Define a timezone
+tz = pytz.timezone('Asia/Kolkata')
+
+# Get the current time in a specific timezone
+now = datetime.datetime.now(tz)
+print("Current time in Asia/Kolkata:", now)
+```
+
+## 6. Date Arithmetic
+Date arithmetic involves performing operations like addition or subtraction on date or datetime objects using `timedelta`. This is useful for calculating future
+or past dates based on a given date.
+
+Example:
+```python
+import datetime
+
+today = datetime.date.today()
+future_date = today + datetime.timedelta(days=10)
+print("Date after 10 days:", future_date)
+```
+
+## Summary
+Python's `datetime` module and the `pytz` library provide comprehensive tools for working with dates, times, and time zones. They enable you to perform a wide range
+of operations, from basic date manipulations to complex time zone conversions.
diff --git a/contrib/advanced-python/decorator-kwargs-args.md b/contrib/advanced-python/decorator-kwargs-args.md
new file mode 100644
index 00000000..63a41b36
--- /dev/null
+++ b/contrib/advanced-python/decorator-kwargs-args.md
@@ -0,0 +1,146 @@
+# Advanced Python
+## Functions as First Class Objects
+Functions in Python are so-called first-class objects, which means they can be treated like variables: they can be assigned to variables, used as arguments, or returned using the `return` keyword.
+
+**Example**
+
+```python
+def func1():
+    def func2():
+        print("Printing from the inner function, func2")
+    return func2
+
+```
+Assigning the function returned by `func1` to the name `function_call`:
+```python
+function_call=func1()
+```
+Calling the function:
+```python
+>>> function_call()
+```
+**Output**
+```
+Printing from the inner function, func2
+```
+Here we have seen the use of a function as a first-class object: `func2` was returned as the result of the execution of the outer function, `func1`.
+
+## *args
+\* is the unpacking operator, used to unpack iterables such as lists, tuples etc.
+**For example**
+```python
+tuple1=(1,2,4,5,6,7)
+print(tuple1)
+print(*tuple1)
+```
+In the above snippet we have defined a tuple called `tuple1` with the items (1,2,4,5,6,7).
+First we print it normally, and the output for that is:
+```
+(1, 2, 4, 5, 6, 7)
+
+```
+Then we print with the \* operator, where we will get the output as:
+```
+1 2 4 5 6 7
+```
+
+Here the \* operator has unpacked the tuple, `tuple1`.
+
+Now that you have understood why \* is used, we can take a look at `*args`. `*args` is used in functions so that positional arguments are stored in the variable `args`. `*args` is just a naming convention; `*anything` can be used instead.
+`*args` makes Python functions flexible enough to handle a dynamic number of arguments.
+```python
+def test1(*args):
+    print(args)
+    print(f"The number of elements in args = {len(args)}")
+a=list(range(0,10))
+test1(*a)
+```
+In the above snippet, we are sending a list of numbers to the `test1` function, which prints the following output:
+```
+(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
+The number of elements in args = 10
+```
+If we do not use \* in the argument passed to `test1`:
+
+```python
+def test1(*args):
+    print(args)
+    print(f"The number of elements in args = {len(args)}")
+a=list(range(0,10))
+test1(a)
+```
+we get the following result - a tuple containing a single list:
+```
+([0, 1, 2, 3, 4, 5, 6, 7, 8, 9],)
+The number of elements in args = 1
+```
+## **kwargs
+**kwargs stands for keyword arguments. This is used for key and value pairs and, similar to `*args`, it makes functions flexible enough to handle dynamic key-value pairs in arguments.
+```python
+def test2(**kwargs):
+    print(kwargs)
+    print(f"The number of elements in kwargs = {len(kwargs)}")
+test2(a=1,b=2,c=3,d=4,e=5)
+```
+The above snippet uses some key-value pairs, and our `test2` function gives the following output:
+```
+{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
+The number of elements in kwargs = 5
+```
+A dictionary with keys and values is obtained.
+
+## Decorators (@decorators)
+Now that we understand what first-class objects, `*args` and `**kwargs` are, we can move on to decorators. Decorators are used to perform a task that needs to be performed for existing functions. If some task has to be performed for each function, we can write a function which will perform the task without us having to make changes in each function.
+
+**Sample Code:**
+```python
+import time
+def multiplication(a,b):
+    start=time.time()
+    c=a*b
+    total=time.time()-start
+    print("Time taken for execution of multiplication",total)
+    return c
+
+def addition(a,b):
+    start=time.time()
+    c=a+b
+    total=time.time()-start
+    print("Time taken for execution of addition ",total)
+    return c
+
+multiplication(4,5)
+addition(4,5)
+```
+
+In the above code, we had to calculate and print the execution time separately for each function, leading to repetition of code. This is where decorators come in handy.
+The same functionality can be achieved with the help of a decorator.
+
+**Here's how:**
+```python
+import time
+def time_find(function):
+    def wrapper(*args, **kwargs):
+        starttime=time.time()
+        function(*args, **kwargs)
+        total=time.time()-starttime
+        print(f"Time Taken by {function.__name__} to run is ",total)
+    return wrapper
+
+@time_find  # To use a decorator, simply use @ above a function.
+def multiply(a, b):
+    print(a*b)
+
+@time_find
+def addition(a,b):
+    print(a+b)
+
+multiply(4,5)
+addition(4,5)
+```
+
+The above method eliminates redundant code and makes the code cleaner. You may have observed that we have used `*args` and `**kwargs` in the wrapper function. This makes the decorator flexible for all types of functions and their parameters: it can find the execution time of any function with as many parameters as needed. We just need to apply our decorator `@time_find`.
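+
+One refinement worth noting (a sketch on top of the code above, not part of the original): because the decorated function is replaced by `wrapper`, metadata such as `multiply.__name__` now reports `"wrapper"`. The standard library's `functools.wraps` fixes this, and the wrapper can also forward the return value:
+
+```python
+import functools
+import time
+
+def time_find(function):
+    @functools.wraps(function)   # copy __name__, __doc__, etc. onto the wrapper
+    def wrapper(*args, **kwargs):
+        starttime = time.time()
+        result = function(*args, **kwargs)
+        print(f"Time Taken by {function.__name__} to run is ", time.time() - starttime)
+        return result            # forward the decorated function's return value
+    return wrapper
+```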
diff --git a/contrib/advanced-python/eval_function.md b/contrib/advanced-python/eval_function.md
new file mode 100644
index 00000000..9e3ec54b
--- /dev/null
+++ b/contrib/advanced-python/eval_function.md
@@ -0,0 +1,75 @@
+# Understanding the `eval` Function in Python
+## Introduction
+
+The `eval` function in Python allows you to execute a string-based Python expression dynamically. This can be useful in various scenarios where you need to evaluate expressions that are not known until runtime.
+
+## Syntax
+```python
+eval(expression, globals=None, locals=None)
+```
+
+### Parameters:
+
+* `expression`: The string that is parsed and evaluated as a Python expression.
+* `globals` [optional]: A dictionary to specify the available global methods and variables.
+* `locals` [optional]: Another dictionary to specify the available local methods and variables.
+
+## Examples
+Example 1:
+```python
+result = eval('2 + 3 * 4')
+print(result)  # Output: 14
+```
+Example 2:
+
+```python
+x = 10
+expression = 'x * 2'
+result = eval(expression, {'x': x})
+print(result)  # Output: 20
+```
+Example 3:
+```python
+x = 10
+def multiply(a, b):
+    return a * b
+expression = 'multiply(x, 5) + 2'
+result = eval(expression)
+print("Result:", result)  # Output: Result: 52
+```
+Example 4:
+```python
+expression = input("Enter a Python expression: ")
+result = eval(expression)
+print("Result:", result)
+# input = "3+2"
+# Output: Result: 5
+```
+
+Example 5:
+```python
+import numpy as np
+a=np.random.randint(1,9)
+b=np.random.randint(1,9)
+operations=["*","-","+"]
+op=np.random.choice(operations)
+
+expression=str(a)+op+str(b)
+correct_answer=eval(expression)
+given_answer=int(input(str(a)+" "+op+" "+str(b)+" = "))
+
+if given_answer==correct_answer:
+    print("Correct")
+else:
+    print("Incorrect")
+    print("correct answer is :", correct_answer)
+
+# Sample run 1 (user answers 8): 2 * 1 = 8 -> Incorrect, correct answer is : 2
+# Sample run 2 (user answers 6): 3 * 2 = 6 -> Correct
+```
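+
+A caution worth adding (an illustrative sketch, not part of the examples above): `eval` executes arbitrary code, so it should never be fed unsanitized user input. Passing restricted `globals`/`locals` dictionaries reduces the attack surface, although it is still not a true sandbox:
+
+```python
+user_expr = "2 ** 8"
+
+# An empty __builtins__ blocks names like open, __import__ and exec
+result = eval(user_expr, {"__builtins__": {}}, {})
+print(result)  # Output: 256
+
+# eval("__import__('os')", {"__builtins__": {}}, {}) would now raise a NameError
+```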
+## Conclusion
+The `eval` function is a powerful tool in Python that allows for dynamic evaluation of expressions.
\ No newline at end of file
diff --git a/contrib/advanced-python/exception-handling.md b/contrib/advanced-python/exception-handling.md
new file mode 100644
index 00000000..3e0c6726
--- /dev/null
+++ b/contrib/advanced-python/exception-handling.md
@@ -0,0 +1,192 @@
+# Exception Handling in Python
+
+Exception handling is a way of managing the errors that may occur during a program's execution. Python's exception handling mechanism has been designed to avoid the unexpected termination of the program, and offers a way to either regain control after an error or display a meaningful message to the user.
+
+- **Error** - An error is a mistake or an incorrect result produced by a program. It can be a syntax error, a logical error, or a runtime error. Errors are typically fatal, meaning they prevent the program from continuing to execute.
+- **Exception** - An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. Exceptions are typically unexpected and can be handled by the program to prevent it from crashing or terminating abnormally. They can be runtime, input/output or system exceptions. Exceptions are designed to be handled by the program, allowing it to recover from the error and continue executing.
+
+## Python Built-in Exceptions
+
+There are plenty of built-in exceptions in Python that are raised when a corresponding error occurs.
+We can view all the built-in exceptions using the built-in `locals()` function as follows:
+
+```python
+print(dir(locals()['__builtins__']))
+```
+
+|**S.No**|**Exception**|**Description**|
+|---|---|---|
+|1|SyntaxError|A syntax error occurs when the code we write violates the grammatical rules, such as misspelled keywords, a missing colon, mismatched parentheses etc.|
+|2|TypeError|A type error occurs when we try to perform an operation or use a function with objects that are of incompatible data types.|
+|3|NameError|A name error occurs when we try to use a variable, function, module or string without quotes that hasn't been defined or isn't used in a valid way.|
+|4|IndexError|An index error occurs when we try to access an element in a sequence (like a list, tuple or string) using an index that's outside the valid range of indices for that sequence.|
+|5|KeyError|A key error occurs when we try to access a key that doesn't exist in a dictionary. Attempting to retrieve a value using a non-existent key results in this error.|
+|6|ValueError|A value error occurs when we provide an argument or value that's inappropriate for a specific operation or function, such as doing mathematical operations with incompatible types (e.g., dividing a string by an integer).|
+|7|AttributeError|An attribute error occurs when we try to access an attribute (like a variable or method) on an object that doesn't possess that attribute.|
+|8|IOError|An IO (Input/Output) error occurs when an operation involving file or device interaction fails. It signifies that there's an issue during communication between your program and the external system.|
+|9|ZeroDivisionError|A ZeroDivisionError occurs when we attempt to divide a number by zero. This operation is mathematically undefined, and Python raises this error to prevent nonsensical results.|
+|10|ImportError|An import error occurs when we try to use a module or library that Python can't find or import successfully.|
+
+## Try and Except Statement - Catching Exceptions
+
+The `try-except` statement allows us to anticipate potential errors during program execution and define what actions to take when those errors occur. This prevents the program from crashing unexpectedly and makes it more robust.
+
+Here's an example to explain this:
+
+```python
+try:
+    # Code that might raise an exception
+    result = 10 / 0
+except:
+    print("An error occurred!")
+```
+
+Output
+
+```markdown
+An error occurred!
+```
+
+In this example, the `try` block contains the code that you suspect might raise an exception. Python attempts to execute the code within this block. If an exception occurs, Python jumps to the `except` block and executes the code within it.
+
+## Specific Exception Handling
+
+You can specify the type of exception you want to catch using the `except` keyword followed by the exception class name. You can also have multiple `except` blocks to handle different exception types.
+
+Here's an example:
+
+```python
+try:
+    # Code that might raise ZeroDivisionError or NameError
+    result = 10 / 0
+    name = undefined_variable
+except ZeroDivisionError:
+    print("Oops! You tried to divide by zero.")
+except NameError:
+    print("There's a variable named 'undefined_variable' that hasn't been defined yet.")
+```
+
+Output
+
+```markdown
+Oops! You tried to divide by zero.
+```
+
+If you comment out the line `result = 10 / 0`, then the output will be:
+
+```markdown
+There's a variable named 'undefined_variable' that hasn't been defined yet.
+``` + +## Important Note + +In this code, the `except` block are specific to each type of expection. If you want to catch both exceptions with a single `except` block, you can use of tuple of exceptions, like this: + +```python +try: + # Code that might raise ZeroDivisionError or NameError + result = 10 / 0 + name = undefined_variable +except (ZeroDivisionError, NameError): + print("An error occured!") +``` + +Output + +```markdown +An error occured! +``` + +## Try with Else Clause + +The `else` clause in a Python `try-except` block provides a way to execute code only when the `try` block succeeds without raising any exceptions. It's like having a section of code that runs exclusively under the condition that no errors occur during the main operation in the `try` block. + +Here's an example to understand this: + +```python +def calculate_average(numbers): + if len(numbers) == 0: # Handle empty list case seperately (optional) + return None + try: + total = sum(numbers) + average = total / len(numbers) + except ZeroDivisionError: + print("Cannot calculate average for a list containing zero.") + else: + print("The average is:", average) + return average #Optionally return the average here + +# Example usage +numbers = [10, 20, 30] +result = calculate_average(numbers) + +if result is not None: # Check if result is available (handles empty list case) + print("Calculation succesfull!") +``` + +Output + +```markdown +The average is: 20.0 +``` + +## Finally Keyword in Python + +The `finally` keyword in Python is used within `try-except` statements to execute a block of code **always**, regardless of whether an exception occurs in the `try` block or not. + +To understand this, let us take an example: + +```python +try: + a = 10 // 0 + print(a) +except ZeroDivisionError: + print("Cannot be divided by zero.") +finally: + print("Program executed!") +``` + +Output + +```markdown +Cannot be divided by zero. +Program executed! +``` + +## Raise Keyword in Python + +In Python, raising an exception allows you to signal that an error condition has occured during your program's execution. The `raise` keyword is used to explicity raise an exception. + +Let us take an example: + +```python +def divide(x, y): + if y == 0: + raise ZeroDivisionError("Can't divide by zero!") # Raise an exception with a message + result = x / y + return result + +try: + division_result = divide(10, 0) + print("Result:", division_result) +except ZeroDivisionError as e: + print("An error occured:", e) # Handle the exception and print the message +``` + +Output + +```markdown +An error occured: Can't divide by zero! +``` + +## Advantages of Exception Handling + +- **Improved Error Handling** - It allows you to gracefully handle unexpected situations that arise during program execution. Instead of crashing abruptly, you can define specific actions to take when exceptions occur, providing a smoother experience. +- **Code Robustness** - Exception Handling helps you to write more resilient programs by anticipating potential issues and providing approriate responses. +- **Enhanced Code Readability** - By seperating error handling logic from the core program flow, your code becomes more readable and easier to understand. The `try-except` blocks clearly indicate where potential errors might occur and how they'll be addressed. + +## Disadvantages of Exception Handling + +- **Hiding Logic Errors** - Relying solely on exception handling might mask underlying logic error in your code. 
+- **Performance Overhead** - In some cases, using `try-except` blocks can introduce a slight performance overhead compared to code without exception handling. However, this is usually negligible for most applications.
+- **Overuse of Exceptions** - Overusing exceptions for common errors or control flow can make code less readable and harder to maintain. It's important to use exceptions judiciously for unexpected situations.
diff --git a/contrib/advanced-python/filter-function.md b/contrib/advanced-python/filter-function.md
new file mode 100644
index 00000000..cbf9463e
--- /dev/null
+++ b/contrib/advanced-python/filter-function.md
@@ -0,0 +1,86 @@
+# Filter Function
+
+## Definition
+The filter function is a built-in Python function used for constructing an iterator from the elements of an iterable for which a function returns `True`.
+
+**Syntax**:
+```python
+filter(function, iterable)
+```
+**Parameters**:
+*function*: A function that tests each element of the iterable, returning `True` or `False`.
+*iterable*: An iterable like sets, lists, tuples, etc., whose elements are to be filtered.
+*Returns*: An iterator over the elements of the iterable for which the function returns `True`.
+
+## Basic Usage
+**Example 1: Filtering a List of Numbers**:
+```python
+# Define a function that returns True for even numbers
+def is_even(n):
+    return n % 2 == 0
+
+numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+even_numbers = filter(is_even, numbers)
+
+# Convert the filter object to a list
+print(list(even_numbers)) # Output: [2, 4, 6, 8, 10]
+```
+
+**Example 2: Filtering with a Lambda Function**:
+```python
+numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+odd_numbers = filter(lambda x: x % 2 != 0, numbers)
+
+print(list(odd_numbers)) # Output: [1, 3, 5, 7, 9]
+```
+
+**Example 3: Filtering Strings**:
+```python
+words = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape", "python"]
+long_words = filter(lambda word: len(word) > 5, words)
+
+print(list(long_words)) # Output: ['banana', 'cherry', 'elderberry', 'python']
+```
+
+## Advanced Usage
+**Example 4: Filtering Objects with Attributes**:
+```python
+class Person:
+    def __init__(self, name, age):
+        self.name = name
+        self.age = age
+
+people = [
+    Person("Alice", 30),
+    Person("Bob", 15),
+    Person("Charlie", 25),
+    Person("David", 35)
+]
+
+adults = filter(lambda person: person.age >= 18, people)
+adult_names = map(lambda person: person.name, adults)
+
+print(list(adult_names)) # Output: ['Alice', 'Charlie', 'David']
+```
+
+**Example 5: Using None as the Function**:
+```python
+numbers = [0, 1, 2, 3, 0, 4, 0, 5]
+non_zero_numbers = filter(None, numbers)
+
+print(list(non_zero_numbers)) # Output: [1, 2, 3, 4, 5]
+```
+**NOTE**: When `None` is passed as the function, `filter` removes all items that are falsy (such as `0`, empty strings and `None`).
+
+## Time Complexity:
+- The time complexity of filter() depends on two factors:
+  1. The time complexity of the filtering function (the one you provide as an argument).
+  2. The size of the iterable being filtered.
+- If the filtering function has a constant time complexity (e.g., O(1)), the overall time complexity of filter() is linear (O(n)), where ‘n’ is the number of elements in the iterable.
+
+## Space Complexity:
+- The space complexity of filter() is also influenced by the filtering function and the size of the iterable.
+- Since filter() returns an iterator, it doesn’t create a new list in memory. Instead, it generates filtered elements on-the-fly as you iterate over it. Therefore, the space complexity is O(1).
+
+## Conclusion:
+Python’s filter() allows you to perform filtering operations on iterables. This kind of operation consists of applying a Boolean function to the items in an iterable and keeping only those values for which the function returns a true result. In general, you can use filter() to process existing iterables and produce new iterables containing the values that you currently need. Both Python 2 and Python 3 support filter(), but Python 3’s approach is more memory-efficient because it returns a lazy iterator instead of a list.
\ No newline at end of file
diff --git a/contrib/advanced-python/generators.md b/contrib/advanced-python/generators.md
new file mode 100644
index 00000000..96287efc
--- /dev/null
+++ b/contrib/advanced-python/generators.md
@@ -0,0 +1,87 @@
+# Generators
+
+## Introduction
+
+Generators in Python are a sophisticated feature that enables the creation of iterators without the need to construct a full list in memory. They allow you to generate values on-the-fly, which is particularly beneficial for working with large datasets or infinite sequences.
We will explore generators in depth, covering their types, mathematical formulation, advantages, disadvantages, and implementation examples. + +## Function Generators + +Function generators are created using the `yield` keyword within a function. When invoked, a function generator returns a generator iterator, allowing you to iterate over the values generated by the function. + +### Mathematical Formulation + +Function generators can be represented mathematically using set-builder notation. The general form is: + +``` +{expression | variable in iterable, condition} +``` + +Where: +- `expression` is the expression to generate values. +- `variable` is the variable used in the expression. +- `iterable` is the sequence of values to iterate over. +- `condition` is an optional condition that filters the values. + +### Advantages of Function Generators + +1. **Memory Efficiency**: Function generators produce values lazily, meaning they generate values only when needed, saving memory compared to constructing an entire sequence upfront. + +2. **Lazy Evaluation**: Values are generated on-the-fly as they are consumed, leading to improved performance and reduced overhead, especially when dealing with large datasets. + +3. **Infinite Sequences**: Function generators can represent infinite sequences, such as the Fibonacci sequence, allowing you to work with data streams of arbitrary length without consuming excessive memory. + +### Disadvantages of Function Generators + +1. **Single Iteration**: Once a function generator is exhausted, it cannot be reused. If you need to iterate over the sequence again, you'll have to create a new generator. + +2. **Limited Random Access**: Function generators do not support random access like lists. They only allow sequential access, which might be a limitation depending on the use case. + +### Implementation Example + +```python +def fibonacci(): + a, b = 0, 1 + while True: + yield a + a, b = b, a + b + +# Usage +fib_gen = fibonacci() +for _ in range(10): + print(next(fib_gen)) +``` + +## Generator Expressions + +Generator expressions are similar to list comprehensions but return a generator object instead of a list. They offer a concise way to create generators without the need for a separate function. + +### Mathematical Formulation + +Generator expressions can also be represented mathematically using set-builder notation. The general form is the same as for function generators. + +### Advantages of Generator Expressions + +1. **Memory Efficiency**: Generator expressions produce values lazily, similar to function generators, resulting in memory savings. + +2. **Lazy Evaluation**: Values are generated on-the-fly as they are consumed, providing improved performance and reduced overhead. + +### Disadvantages of Generator Expressions + +1. **Single Iteration**: Like function generators, once a generator expression is exhausted, it cannot be reused. + +2. **Limited Random Access**: Generator expressions, similar to function generators, do not support random access. + +### Implementation Example + +```python +# Generate squares of numbers from 0 to 9 +square_gen = (x**2 for x in range(10)) + +# Usage +for num in square_gen: + print(num) +``` + +## Conclusion + +Generators offer a powerful mechanism for creating iterators efficiently in Python. By understanding the differences between function generators and generator expressions, along with their mathematical formulation, advantages, and disadvantages, you can leverage them effectively in various scenarios. 
Whether you're dealing with large datasets or need to work with infinite sequences, generators provide a memory-efficient solution with lazy evaluation capabilities, contributing to more elegant and scalable code. diff --git a/contrib/advanced-python/index.md b/contrib/advanced-python/index.md new file mode 100644 index 00000000..81d1832e --- /dev/null +++ b/contrib/advanced-python/index.md @@ -0,0 +1,23 @@ +# List of sections + +- [OOPs](oops.md) +- [Decorators/\*args/**kwargs](decorator-kwargs-args.md) +- ['itertools' module](itertools.md) +- [Type Hinting](type-hinting.md) +- [Lambda Function](lambda-function.md) +- [Working with Dates & Times in Python](dates_and_times.md) +- [Regular Expressions in Python](regular_expressions.md) +- [JSON module](json-module.md) +- [Map Function](map-function.md) +- [Protocols](protocols.md) +- [Exception Handling in Python](exception-handling.md) +- [Generators](generators.md) +- [Match Case Statement](match-case.md) +- [Closures](closures.md) +- [Filter](filter-function.md) +- [Reduce](reduce-function.md) +- [List Comprehension](list-comprehension.md) +- [Eval Function](eval_function.md) +- [Magic Methods](magic-methods.md) +- [Asynchronous Context Managers & Generators](asynchronous-context-managers-generators.md) +- [Threading](threading.md) diff --git a/contrib/advanced-python/itertools.md b/contrib/advanced-python/itertools.md new file mode 100644 index 00000000..501a1274 --- /dev/null +++ b/contrib/advanced-python/itertools.md @@ -0,0 +1,144 @@ +# The 'itertools' Module in Python +The itertools module in Python provides a collection of fast, memory-efficient tools that are useful for creating and working with iterators. These functions +allow you to iterate over data in various ways, often combining, filtering, or extending iterators to generate complex sequences efficiently. + +## Benefits of itertools +1. Efficiency: Functions in itertools are designed to be memory-efficient, often generating elements on the fly and avoiding the need to store large intermediate results. +2. Conciseness: Using itertools can lead to more readable and concise code, reducing the need for complex loops and temporary variables. +3. Composability: Functions from itertools can be easily combined, allowing you to build complex iterator pipelines from simple building blocks. + +## Useful Functions in itertools
+Here are some of the most useful functions in the itertools module, along with examples of how to use them:
+
+1. 'count': Generates an infinite sequence of numbers, starting from a specified value.
+
+```python
+import itertools
+
+counter = itertools.count(start=10, step=2)
+for _ in range(5):
+    print(next(counter))
+# Output: 10, 12, 14, 16, 18
+```
+
+2. 'cycle': Cycles through an iterable indefinitely.
+
+```python
+import itertools
+
+cycler = itertools.cycle(['A', 'B', 'C'])
+for _ in range(6):
+    print(next(cycler))
+# Output: A, B, C, A, B, C
+```
+
+3. 'repeat': Repeats an object a specified number of times or indefinitely.
+
+```python
+import itertools
+
+repeater = itertools.repeat('Hello', 3)
+for item in repeater:
+    print(item)
+# Output: Hello, Hello, Hello
+```
+
+4. 'chain': Combines multiple iterables into a single iterable.
+
+```python
+import itertools
+
+combined = itertools.chain([1, 2, 3], ['a', 'b', 'c'])
+for item in combined:
+    print(item)
+# Output: 1, 2, 3, a, b, c
+```
+
+5. 'islice': Slices an iterator, similar to slicing a list.
+
+```python
+import itertools
+
+sliced = itertools.islice(range(10), 2, 8, 2)
+for item in sliced:
+    print(item)
+# Output: 2, 4, 6
+```
+
+6. 'compress': Filters elements in an iterable based on a corresponding selector iterable.
+
+```python
+import itertools
+
+data = ['A', 'B', 'C', 'D']
+selectors = [1, 0, 1, 0]
+result = itertools.compress(data, selectors)
+for item in result:
+    print(item)
+# Output: A, C
+```
+
+7. 'permutations': Generates all possible permutations of an iterable.
+
+```python
+import itertools
+
+perms = itertools.permutations('ABC', 2)
+for item in perms:
+    print(item)
+# Output: ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')
+```
+
+8. 'combinations': Generates all possible combinations of a specified length from an iterable.
+
+```python
+import itertools
+
+combs = itertools.combinations('ABC', 2)
+for item in combs:
+    print(item)
+# Output: ('A', 'B'), ('A', 'C'), ('B', 'C')
+```
+
+9. 'product': Computes the Cartesian product of input iterables.
+
+```python
+import itertools
+
+prod = itertools.product('AB', '12')
+for item in prod:
+    print(item)
+# Output: ('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')
+```
+
+10. 'groupby': Groups elements of an iterable by a specified key function.
+
+```python
+import itertools
+
+data = [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 25}, {'name': 'Charlie', 'age': 30}]
+sorted_data = sorted(data, key=lambda x: x['age'])
+grouped = itertools.groupby(sorted_data, key=lambda x: x['age'])
+for key, group in grouped:
+    print(key, list(group))
+# Output:
+# 25 [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 25}]
+# 30 [{'name': 'Charlie', 'age': 30}]
+```
+
+11. 'accumulate': Makes an iterator that returns accumulated sums, or accumulated results of other binary functions specified via the optional `func` argument.
+
+```python
+import itertools
+import operator
+
+data = [1, 2, 3, 4, 5]
+acc = itertools.accumulate(data, operator.mul)
+for item in acc:
+    print(item)
+# Output: 1, 2, 6, 24, 120
+```
+
+## Conclusion
+The itertools module is a powerful toolkit for working with iterators in Python. Its functions enable efficient and concise handling of iterable data, allowing you to create complex data processing pipelines with minimal memory overhead.
+By leveraging itertools, you can improve the readability and performance of your code, making it a valuable addition to your Python programming arsenal.
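+
+To illustrate the composability point made earlier, here is a minimal sketch (an illustrative pipeline, not from the original article) that chains three of the functions covered above, 'count', 'islice' and 'accumulate', to compute running totals of the first five even numbers without ever materializing an intermediate list:
+
+```python
+import itertools
+import operator
+
+evens = itertools.count(start=2, step=2)            # 2, 4, 6, 8, ...
+first_five = itertools.islice(evens, 5)             # 2, 4, 6, 8, 10
+running_totals = itertools.accumulate(first_five, operator.add)
+print(list(running_totals))
+# Output: [2, 6, 12, 20, 30]
+```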
diff --git a/contrib/advanced-python/json-module.md b/contrib/advanced-python/json-module.md
new file mode 100644
index 00000000..eed0393d
--- /dev/null
+++ b/contrib/advanced-python/json-module.md
@@ -0,0 +1,289 @@
+# JSON Module
+
+## What is JSON?
+
+- [JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) is a format for structuring data.
+- JSON is a lightweight, text-based data interchange format that is completely language-independent.
+- Similar to XML, JSON is a format for structuring data commonly used by web applications to communicate with each other.
+
+## Why JSON?
+
+- Whenever we declare a variable and assign a value to it, the variable itself doesn't hold the value. Instead, the variable holds an address in memory where the value is stored. For example:
+
+```python
+age = 21
+```
+
+- When we use `age`, it gets replaced with `21`. However, *age doesn't contain 21, it contains the address of the memory location where 21 is stored*.
+
+- While this works locally, transferring this data, such as through an API, poses a challenge. Sending your computer’s entire memory with the addresses is impractical and insecure. This is where JSON comes to the rescue.
+
+### Example JSON
+
+- JSON supports the most widely used data types, including String, Number, Boolean, Null, Array and Object.
+- Here is an example of a JSON file:
+
+```json
+{
+  "name": "John Doe",
+  "age": 21,
+  "isStudent": true,
+  "address": null,
+  "courses": ["Math", "Science", "History"],
+  "grades": {
+    "Math": 95,
+    "Science": 89,
+    "History": 76
+  }
+}
+```
+
+# Python JSON
+
+Python too supports JSON with a built-in package called `json`. This package provides all the necessary tools for working with JSON objects, including `parsing, serializing, deserializing, and many more`.
+
+## 1. Python parse JSON string.
+
+- To parse a JSON string in Python, we first import the `json` module.
+- A JSON string is converted to a Python object using the `json.loads()` method.
+- Example Code:
+
+```python
+# Python program to convert JSON to Python
+import json
+
+# JSON string
+students = '{"id":"01", "name": "Yatharth", "department":"Computer Science Engineering"}'
+
+# Convert string to Python dict
+students_dict = json.loads(students)
+print(students_dict)
+
+print(students_dict['name'])
+
+```
+
+- Output:
+
+```json
+{'id': '01', 'name': 'Yatharth', 'department': 'Computer Science Engineering'}
+Yatharth
+```
+
+## 2. Python load JSON file.
+
+- JSON data can also be fetched directly from a JSON file.
+- Example:
+
+```python
+import json
+# Opening JSON file
+f = open('input.json')
+
+# Returns JSON object as a dictionary
+data = json.load(f)
+
+# Iterating through the json file
+for i in data['students']:
+    print(i)
+
+# Closing file
+f.close()
+```
+
+- JSON file
+
+```json
+{
+    "students": [
+        {
+            "id": "01",
+            "name": "Yatharth",
+            "department": "Computer Science Engineering"
+        },
+        {
+            "id": "02",
+            "name": "Raj",
+            "department": "Mechanical Engineering"
+        }
+    ]
+}
+```
+
+- Output
+
+```json
+{'id': '01', 'name': 'Yatharth', 'department': 'Computer Science Engineering'}
+{'id': '02', 'name': 'Raj', 'department': 'Mechanical Engineering'}
+```
+- `json.load()`: Reads JSON data from a file object and deserializes it into a Python object.
+- `json.loads()`: Deserializes JSON data from a string into a Python object.
+
+
+## Additional Context
+The relation between Python data types and JSON data types is given in the table below.
+
+| Python Object | JSON Object |
+|-----------------|-------------|
+| dict | object |
+| list, tuple | array |
+| str | string |
+| int, float | number |
+| True | true |
+| False | false |
+| None | null |
+
+
+
+## 3. Python Dictionary to JSON String
+- Converting a Python dictionary to a JSON string using `json.dumps()`.
+- Example Code:
+```python
+import json
+
+# Data to be written
+dictionary = {
+    "id": "03",
+    "name": "Suraj",
+    "department": "Civil Engineering"
+}
+
+# Serializing json
+json_object = json.dumps(dictionary, indent = 4)
+print(json_object)
+```
+- Output:
+```json
+{
+    "id": "03",
+    "name": "Suraj",
+    "department": "Civil Engineering"
+}
+```
+## 4. Python Dictionary to JSON file.
+- Writing a Python dictionary to a JSON file using `json.dump()`.
+- Example Code:
+```python
+import json
+
+# Data to be written
+dictionary = {
+    "name": "Satyendra",
+    "rollno": 51,
+    "cgpa": 8.8,
+    "phonenumber": "123456789"
+}
+
+with open("sample.json", "w") as outfile:
+    json.dump(dictionary, outfile)
+
+```
+- Output: `sample.json`
+```json
+{"name": "Satyendra", "rollno": 51, "cgpa": 8.8, "phonenumber": "123456789"}
+```
+## 5. Append Python Dictionary to JSON String.
+- Parse the existing JSON string with `json.loads()`, append to the resulting dictionary with its `update()` method, and serialize it back with `json.dumps()`.
+- Example:
+```python
+import json
+# JSON string:
+x = '{"id": "03", "name": "Suraj"}'
+
+# python object to be appended
+y = {"department": "Civil Engineering"}
+
+# parsing JSON string:
+z = json.loads(x)
+
+# appending the data
+z.update(y)
+
+# the result is a JSON string:
+print(json.dumps(z))
+
+```
+- Output:
+```json
+{"id": "03", "name": "Suraj", "department": "Civil Engineering"}
+```
+
+
+## 6. Append Python Dictionary to JSON File.
+- There is no direct function to append to a JSON file. Instead, we load the file into a Python object, update that object, and write it back in JSON format.
+- `data.json`
+```json
+{
+    "students": [
+        {
+            "id": "01",
+            "name": "Yatharth",
+            "department": "Computer Science Engineering"
+        },
+        {
+            "id": "02",
+            "name": "Raj",
+            "department": "Mechanical Engineering"
+        }
+    ]
+}
+```
+- Example Code:
+```python
+import json
+
+# function to add to JSON
+def write_json(new_data, filename='data.json'):
+    with open(filename, 'r+') as file:
+        # First we load existing data into a dict.
+        file_data = json.load(file)
+        # Join new_data with file_data inside students
+        file_data["students"].append(new_data)
+        # Sets file's current position at offset.
+        file.seek(0)
+        # convert back to json.
+        json.dump(file_data, file, indent = 4)
+
+# python object to be appended
+y = {
+    "id": "03",
+    "name": "Suraj",
+    "department": "Civil Engineering"
+}
+
+write_json(y)
+
+```
+- Output:
+```json
+{
+    "students": [
+        {
+            "id": "01",
+            "name": "Yatharth",
+            "department": "Computer Science Engineering"
+        },
+        {
+            "id": "02",
+            "name": "Raj",
+            "department": "Mechanical Engineering"
+        },
+        {
+            "id": "03",
+            "name": "Suraj",
+            "department": "Civil Engineering"
+        }
+    ]
+}
+```
+
+The Python json module simplifies the handling of JSON data, offering a bridge between Python data structures and JSON representations, vital for data exchange and storage in modern applications.
\ No newline at end of file diff --git a/contrib/advanced-python/lambda-function.md b/contrib/advanced-python/lambda-function.md new file mode 100644 index 00000000..93a53300 --- /dev/null +++ b/contrib/advanced-python/lambda-function.md @@ -0,0 +1,88 @@ +# Lambda Function + +Lambda functions in Python are small, anonymous functions that can be created on-the-fly. They are defined using the `lambda` keyword instead of the `def` keyword used for regular functions. Lambda functions are typically used for simple tasks where a full-blown function definition is not necessary. + +Here's an example of a lambda function that adds two numbers: + +```python +add = lambda x, y: x + y +print(add(3, 5)) # Output: 8 +``` + +The above lambda function is equivalent to the following regular function: + +```python +def add(x, y): + return x + y + +print(add(3, 5)) # Output: 8 +``` + +The difference between a regular function and a lambda function lies mainly in syntax and usage. Here are some key distinctions: + +1. **Syntax**: Lambda functions are defined using the `lambda` keyword, followed by parameters and a colon, while regular functions use the `def` keyword, followed by the function name, parameters, and a colon. + +2. **Name**: Lambda functions are anonymous; they do not have a name like regular functions. Regular functions are defined with a name. + +3. **Complexity**: Lambda functions are suitable for simple, one-liner tasks. They are not meant for complex operations or tasks that require multiple lines of code. Regular functions can handle more complex logic and can contain multiple statements and lines of code. + +4. **Usage**: Lambda functions are often used in situations where a function is needed as an argument to another function (e.g., sorting, filtering, mapping), or when you want to write concise code without defining a separate function. + +Lambda functions are used primarily for convenience and brevity in situations where a full function definition would be overkill or too cumbersome. They are handy for tasks that require a small, one-time function and can improve code readability when used judiciously. + +## Use Cases + +1. **Sorting**: Lambda functions are often used as key functions for sorting lists, dictionaries, or other data structures based on specific criteria. For example: + + ```python + students = [ + {"name": "Alice", "age": 20}, + {"name": "Bob", "age": 18}, + {"name": "Charlie", "age": 22} + ] + sorted_students = sorted(students, key=lambda x: x["age"]) + ``` + +2. **Filtering**: Lambda functions can be used with filter() to selectively include elements from a collection based on a condition. For instance: + + ```python + numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + even_numbers = list(filter(lambda x: x % 2 == 0, numbers)) + ``` + +3. **Mapping**: Lambda functions are useful with map() to apply a transformation to each element of a collection. For example: + + ```python + numbers = [1, 2, 3, 4, 5] + squared_numbers = list(map(lambda x: x**2, numbers)) + ``` + +4. **Event Handling**: In GUI programming or event-driven systems, lambda functions can be used as event handlers to execute specific actions when an event occurs. For instance: + + ```python + button.clicked.connect(lambda: self.on_button_click(argument)) + ``` + +5. **Callback Functions**: Lambda functions can be passed as callback functions to other functions, especially when a simple operation needs to be performed in response to an event. 
For example:
+
+   ```python
+   def process_data(data, callback):
+       # Process data
+       result = ...
+       # Execute callback function
+       callback(result)
+
+   process_data(data, lambda x: print("Result:", x))
+   ```
+
+6. **Anonymous Functions in Higher-Order Functions**: Lambda functions are commonly used with higher-order functions such as reduce(), which applies a rolling computation to sequential pairs of values in a list. For example:
+
+   ```python
+   from functools import reduce
+   numbers = [1, 2, 3, 4, 5]
+   sum_of_numbers = reduce(lambda x, y: x + y, numbers)
+   ```
+
+These are just a few examples of how lambda functions can be applied in Python to simplify code and make it more expressive. They are particularly useful in situations where a small, one-time function is needed and defining a separate named function would be excessive.
+
+In conclusion, **lambda functions** in Python offer a concise and powerful way to handle simple tasks without the need for full function definitions. Their versatility, especially in scenarios like sorting, filtering, and event handling, makes them valuable tools for improving code readability and efficiency. By mastering lambda functions, you can enhance your Python programming skills and tackle various tasks with elegance and brevity.
\ No newline at end of file
diff --git a/contrib/advanced-python/list-comprehension.md b/contrib/advanced-python/list-comprehension.md
new file mode 100644
index 00000000..d9ab589d
--- /dev/null
+++ b/contrib/advanced-python/list-comprehension.md
@@ -0,0 +1,73 @@
+# List Comprehension
+
+List comprehension is a concise, expressive way to create lists in Python. It lets you generate a new list from existing iterables like lists, tuples or strings in a single short expression.
+This boosts the readability of code and reduces the need for explicit looping constructs.
+
+## Syntax:
+
+### Basic syntax
+
+```python
+new_list = [expression for item in iterable]
+```
+- **new_list**: This is the name given to the list that will be created using the list comprehension.
+- **expression**: This is the expression that defines how each element of the new list will be generated or transformed.
+- **item**: This variable represents each individual element from the iterable. It takes on the value of each element in the iterable during each iteration.
+- **iterable**: This is the sequence-like object over which the iteration will take place. It provides the elements that will be processed by the expression.
+
+This list comprehension syntax `[expression for item in iterable]` allows you to generate a new list by applying a specific expression to each element in an iterable.
+
+### Syntax including condition
+
+```python
+new_list = [expression for item in iterable if condition]
+```
+- **new_list**: This is the name given to the list that will be created using the list comprehension.
+- **expression**: This is the expression that defines how each element of the new list will be generated or transformed.
+- **item**: This variable represents each individual element from the iterable. It takes on the value of each element in the iterable during each iteration.
+- **iterable**: This is the sequence-like object over which the iteration will take place. It provides the elements that will be processed by the expression.
+- **if condition**: This is an optional part of the syntax. It allows for conditional filtering of elements from the iterable. Only items that satisfy the condition
+  will be included in the new list.
+
+
+## Examples:
+
+1. Generating a list of squares of numbers from 1 to 5:
+
+```python
+squares = [x ** 2 for x in range(1, 6)]
+print(squares)
+```
+
+- **Output** :
+```python
+[1, 4, 9, 16, 25]
+```
+
+2. Filtering even numbers from a list:
+
+```python
+nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+even = [x for x in nums if x % 2 == 0]
+print(even)
+```
+
+- **Output** :
+```python
+[2, 4, 6, 8, 10]
+```
+
+3. Flattening a list of lists:
+```python
+matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+flat = [x for sublist in matrix for x in sublist]
+print(flat)
+```
+
+- **Output** :
+```python
+[1, 2, 3, 4, 5, 6, 7, 8, 9]
+```
+
+List comprehension is a powerful feature in Python for creating lists based on existing iterables with a concise syntax.
+By mastering list comprehension, developers can write cleaner, more expressive code and leverage Python's functional programming capabilities effectively.
diff --git a/contrib/advanced-python/magic-methods.md b/contrib/advanced-python/magic-methods.md
new file mode 100644
index 00000000..447e36b5
--- /dev/null
+++ b/contrib/advanced-python/magic-methods.md
@@ -0,0 +1,151 @@
+# Magic Methods
+
+Magic methods, also known as dunder (double underscore) methods, are special methods in Python that start and end with double underscores (`__`).
+These methods allow you to define the behavior of objects for built-in operations and functions, enabling you to customize how your objects interact with the
+language's syntax and built-in features. Magic methods make your custom classes integrate seamlessly with Python’s built-in data types and operations.
+
+**Commonly Used Magic Methods**
+
+1. **Initialization and Representation**
+   - `__init__(self, ...)`: Called when an instance of the class is created. Used for initializing the object's attributes.
+   - `__repr__(self)`: Returns a string representation of the object, useful for debugging and logging.
+   - `__str__(self)`: Returns a human-readable string representation of the object.
+
+**Example** :
+
+   ```python
+   class Person:
+       def __init__(self, name, age):
+           self.name = name
+           self.age = age
+
+       def __repr__(self):
+           return f"Person({self.name}, {self.age})"
+
+       def __str__(self):
+           return f"{self.name}, {self.age} years old"
+
+   p = Person("Alice", 30)
+   print(repr(p))
+   print(str(p))
+   ```
+
+**Output** :
+```python
+Person(Alice, 30)
+Alice, 30 years old
+```
+
+2. **Arithmetic Operations**
+   - `__add__(self, other)`: Defines behavior for the `+` operator.
+   - `__sub__(self, other)`: Defines behavior for the `-` operator.
+   - `__mul__(self, other)`: Defines behavior for the `*` operator.
+   - `__truediv__(self, other)`: Defines behavior for the `/` operator.
+
+
+**Example** :
+
+   ```python
+   class Vector:
+       def __init__(self, x, y):
+           self.x = x
+           self.y = y
+
+       def __add__(self, other):
+           return Vector(self.x + other.x, self.y + other.y)
+
+       def __repr__(self):
+           return f"Vector({self.x}, {self.y})"
+
+   v1 = Vector(2, 3)
+   v2 = Vector(1, 1)
+   v3 = v1 + v2
+   print(v3)
+   ```
+
+**Output** :
+
+```python
+Vector(3, 4)
+```
+
+3. **Comparison Operations**
+   - `__eq__(self, other)`: Defines behavior for the `==` operator.
+   - `__lt__(self, other)`: Defines behavior for the `<` operator.
+   - `__le__(self, other)`: Defines behavior for the `<=` operator.
+
+**Example** :
+
+   ```python
+   class Person:
+       def __init__(self, name, age):
+           self.name = name
+           self.age = age
+
+       def __eq__(self, other):
+           return self.age == other.age
+
+       def __lt__(self, other):
+           return self.age < other.age
+
+   p1 = Person("Alice", 30)
+   p2 = Person("Bob", 25)
+   print(p1 == p2)
+   print(p1 < p2)
+   ```
+
+   **Output** :
+
+   ```python
+   False
+   False
+   ```
+
+4. **Container and Sequence Methods**
+
+   - `__len__(self)`: Defines behavior for the `len()` function.
+   - `__getitem__(self, key)`: Defines behavior for indexing (`self[key]`).
+   - `__setitem__(self, key, value)`: Defines behavior for item assignment (`self[key] = value`).
+   - `__delitem__(self, key)`: Defines behavior for item deletion (`del self[key]`).
+
+**Example** :
+
+   ```python
+   class CustomList:
+       def __init__(self, *args):
+           self.items = list(args)
+
+       def __len__(self):
+           return len(self.items)
+
+       def __getitem__(self, index):
+           return self.items[index]
+
+       def __setitem__(self, index, value):
+           self.items[index] = value
+
+       def __delitem__(self, index):
+           del self.items[index]
+
+       def __repr__(self):
+           return f"CustomList({self.items})"
+
+   cl = CustomList(1, 2, 3)
+   print(len(cl))
+   print(cl[1])
+   cl[1] = 5
+   print(cl)
+   del cl[1]
+   print(cl)
+   ```
+
+**Output** :
+```python
+3
+2
+CustomList([1, 5, 3])
+CustomList([1, 3])
+```
+
+Magic methods provide powerful ways to customize the behavior of your objects and make them work seamlessly with Python's syntax and built-in functions.
+Use them judiciously to enhance the functionality and readability of your classes.
diff --git a/contrib/advanced-python/map-function.md b/contrib/advanced-python/map-function.md
new file mode 100644
index 00000000..be035d0c
--- /dev/null
+++ b/contrib/advanced-python/map-function.md
@@ -0,0 +1,54 @@
+# Map Function
+
+The `map()` function in Python is a built-in function used for applying a given function to each item of an iterable (like a list, tuple, or dictionary) and returning a new iterable with the results. It's a powerful tool for transforming data without the need for explicit loops. Let's break down its syntax, explore examples, and discuss various use cases.
+
+### Syntax:
+
+```python
+map(function, iterable1, iterable2, ...)
+```
+
+- `function`: The function to apply to each item in the iterables.
+- `iterable1`, `iterable2`, ...: One or more iterable objects whose items will be passed as arguments to `function`.
+
+### Examples:
+
+#### Example 1: Doubling the values in a list
+
+```python
+# Define the function
+def double(x):
+    return x * 2
+
+# Apply the function to each item in the list using map
+original_list = [1, 2, 3, 4, 5]
+doubled_list = list(map(double, original_list))
+print(doubled_list)  # Output: [2, 4, 6, 8, 10]
+```
+
+#### Example 2: Converting temperatures from Celsius to Fahrenheit
+
+```python
+# Define the function
+def celsius_to_fahrenheit(celsius):
+    return (celsius * 9/5) + 32
+
+# Apply the function to each Celsius temperature using map
+celsius_temperatures = [0, 10, 20, 30, 40]
+fahrenheit_temperatures = list(map(celsius_to_fahrenheit, celsius_temperatures))
+print(fahrenheit_temperatures)  # Output: [32.0, 50.0, 68.0, 86.0, 104.0]
+```
+
+### Use Cases:
+
+1. **Data Transformation**: When you need to apply a function to each item of a collection and obtain the transformed values, `map()` is very handy.
+
+2. **Parallel Processing**: In some cases, `map()` can be utilized in parallel processing scenarios, especially when combined with `multiprocessing` or `concurrent.futures`.
+
+3. **Cleaning and Formatting Data**: It's often used in data processing pipelines for tasks like converting data types, normalizing values, or applying formatting functions.
+
+4. **Functional Programming**: In functional programming paradigms, `map()` is frequently used along with other functional constructs like `filter()` and `reduce()` for concise and expressive code.
+
+5. **Generating Multiple Outputs**: You can use `map()` to generate multiple outputs simultaneously by passing multiple iterables. The function will be applied to corresponding items in the iterables.
+
+6. **Lazy Evaluation**: In Python 3, `map()` returns an iterator rather than a list. This means it's memory efficient and can handle large datasets without loading everything into memory at once.
+
+Remember, while `map()` is powerful, it's essential to balance its use with readability and clarity. Sometimes, a simple loop might be more understandable than a `map()` call.
\ No newline at end of file
diff --git a/contrib/advanced-python/match-case.md b/contrib/advanced-python/match-case.md
new file mode 100644
index 00000000..1b4f0171
--- /dev/null
+++ b/contrib/advanced-python/match-case.md
@@ -0,0 +1,251 @@
+# Match Case Statements
+## Introduction
+Match and case statements were introduced in Python 3.10 for structural pattern matching: patterns with associated actions. They offer more readable and
+cleaner code than the traditional `if-else` statements. They also support destructuring, pattern matching and checks for specific properties, in
+addition to the behaviour of the traditional `switch-case` statements found in other languages, which makes them more versatile.
+
+## Syntax
+```
+match <expression>:
+    case <pattern 1>:
+        <action 1>
+    case <pattern 2>:
+        <action 2>
+    case _:
+        <default action>
+```
+A match statement takes an expression and compares it against the patterns of the successive cases. If a pattern matches, the corresponding task is performed. If no exact match is found, the last case, a wildcard `_`, if provided, will be used as the matching case.
+
+## Pattern Matching
+As discussed earlier, match case statements use pattern matching, where the patterns consist of sequences, mappings, primitive data types as well as class instances. Structural pattern matching uses a declarative approach and explicitly states the conditions for the patterns to match with the data.
+
+### Patterns with a Literal
+#### Generic Case
+`sample text` is passed as a literal in the `match` block. There are two cases and a wildcard case mentioned.
+```python
+match 'sample text':
+    case 'sample text':
+        print('sample text')
+    case 'sample':
+        print('sample')
+    case _:
+        print('None found')
+```
+The `sample text` case is satisfied as it matches with the literal `sample text` described in the `match` block.
+
+O/P:
+```
+sample text
+```
+
+#### Using OR
+Taking another example, `|` can be used as OR to include multiple patterns in a single case statement where the multiple patterns all lead to a similar task.
+
+The below code snippets can be used interchangeably and generate similar output. The latter is more concise and readable.
+```python
+match 'e':
+    case 'a':
+        print('vowel')
+    case 'e':
+        print('vowel')
+    case 'i':
+        print('vowel')
+    case 'o':
+        print('vowel')
+    case 'u':
+        print('vowel')
+    case _:
+        print('consonant')
+```
+```python
+match 'e':
+    case 'a' | 'e' | 'i' | 'o' | 'u':
+        print('vowel')
+    case _:
+        print('consonant')
+```
+O/P:
+```
+vowel
+```
+
+#### Without wildcard
+When there is no wildcard case present in a `match` block, there are two possibilities: either a match is found, or it is not. If no match is found, the behaviour is a no-op.
+```python
+match 'c':
+    case 'a' | 'e' | 'i' | 'o' | 'u':
+        print('vowel')
+```
+The output will be blank as a no-op occurs.
+
+### Patterns with a Literal and a Variable
+Pattern matching can be done by unpacking the assignments and also binding variables with it.
+```python
+def get_names(names: tuple) -> None:
+    match names:
+        case ('Bob', y):
+            print(f'Hello {y}')
+        case (x, 'John'):
+            print(f'Hello {x}')
+        case (x, y):
+            print(f'Hello {x} and {y}')
+        case _:
+            print('Invalid')
+```
+Here, `names` is a tuple that contains two names. The `match` block unpacks the tuple and binds `x` and `y` based on the patterns. A wildcard case prints `Invalid` if the condition is not satisfied.
+
+O/P:
+
+In this example, the above code snippet is called with the parameter `names` as below, with the respective output.
+```
+>>> get_names(('Bob', 'Max'))
+Hello Max
+
+>>> get_names(('Rob', 'John'))
+Hello Rob
+
+>>> get_names(('Rob', 'Max'))
+Hello Rob and Max
+
+>>> get_names(('Rob', 'Max', 'Bob'))
+Invalid
+```
+
+### Patterns with Classes
+Class structures can be used in a `match` block for pattern matching. The class members can also be bound to variables to perform certain operations. For the class structure:
+```python
+class Person:
+    def __init__(self, name, age):
+        self.name = name
+        self.age = age
+```
+The match case example illustrates the generic working as well as the binding of variables with the class members.
+```python
+def get_class(cls: Person) -> None:
+    match cls:
+        case Person(name='Bob', age=18):
+            print('Hello Bob with age 18')
+        case Person(name='Max', age=y):
+            print(f'Age is {y}')
+        case Person(name=x, age=18):
+            print(f'Name is {x}')
+        case Person(name=x, age=y):
+            print(f'Name and age is {x} and {y}')
+        case _:
+            print('Invalid')
+```
+O/P:
+```
+>>> get_class(Person('Bob', 18))
+Hello Bob with age 18
+
+>>> get_class(Person('Max', 21))
+Age is 21
+
+>>> get_class(Person('Rob', 18))
+Name is Rob
+
+>>> get_class(Person('Rob', 21))
+Name and age is Rob and 21
+```
+Now, if a new class is introduced in the above code snippet like below.
+```python
+class Pet:
+    def __init__(self, name, animal):
+        self.name = name
+        self.animal = animal
+```
+The patterns will not match the cases and will trigger the wildcard case for the original code snippet above with the `get_class` function.
+```
+>>> get_class(Pet('Tommy', 'Dog'))
+Invalid
+```
+
+### Nested Patterns
+Patterns can be nested in various ways. A nested pattern can mix the patterns mentioned earlier or be symmetrical across cases. A basic nested pattern, a list combined with Patterns with a Literal and a Variable, is taken here. Classes and iterables can also be included.
+```python
+def get_points(points: list) -> None:
+    match points:
+        case []:
+            print('Empty')
+        case [x]:
+            print(f'One point {x}')
+        case [x, y]:
+            print(f'Two points {x} and {y}')
+        case _:
+            print('More than two points')
+```
+O/P:
+```
+>>> get_points([])
+Empty
+
+>>> get_points([1])
+One point 1
+
+>>> get_points([1, 2])
+Two points 1 and 2
+
+>>> get_points([1, 2, 3])
+More than two points
+```
+
+### Complex Patterns
+Complex patterns are also supported in pattern matching. Complex here does not mean complex numbers, but rather structures whose readability may seem complex.
+
+#### Wildcard
+The wildcard used till now is in the form of `case _`, where the wildcard case is used if no match is found. Furthermore, the wildcard `_` can also be used as a placeholder in complex patterns.
+
+```python
+def wildcard(value: tuple) -> None:
+    match value:
+        case ('Bob', age, 'Mechanic'):
+            print(f'Bob is mechanic of age {age}')
+        case ('Bob', age, _):
+            print(f'Bob is not a mechanic of age {age}')
+```
+O/P:
+
+The value in the above snippet is a tuple with `(Name, Age, Job)`. If the job is Mechanic and the name is Bob, the first case is triggered. But if the job is different and not a mechanic, then the other case is triggered with the wildcard.
+```
+>>> wildcard(('Bob', 18, 'Mechanic'))
+Bob is mechanic of age 18
+
+>>> wildcard(('Bob', 21, 'Engineer'))
+Bob is not a mechanic of age 21
+```
+
+#### Guard
+A `guard` is when an `if` is added to a pattern. The evaluation depends on the truth value of the guard.
+
+`nums` is a tuple which contains two integers. A guard is used in the first case, where it checks whether the first number is greater than or equal to the second number in the tuple. If that is false, it moves to the second case, where it concludes that the first number is smaller than the second number.
+```python
+def guard(nums: tuple) -> None:
+    match nums:
+        case (x, y) if x >= y:
+            print(f'{x} is greater or equal than {y}')
+        case (x, y):
+            print(f'{x} is smaller than {y}')
+        case _:
+            print('Invalid')
+```
+O/P:
+```
+>>> guard((1, 2))
+1 is smaller than 2
+
+>>> guard((2, 1))
+2 is greater or equal than 1
+
+>>> guard((1, 1))
+1 is greater or equal than 1
+```
+
+## Summary
+Match case statements provide an elegant and readable format for performing operations based on pattern matching, as compared to `if-else` statements. They are also more versatile, as they provide additional functionality for pattern matching operations like unpacking, class matching, iterables and iterators. They can also use positional arguments for checking the patterns. They provide a powerful and concise way to handle multiple conditions and perform pattern matching.
+
+## Further Reading
+This article provides a brief introduction to match case statements and an overview of pattern matching operations. To know more, the below articles can be used for an in-depth understanding of the topic.
+
+- [PEP 634 – Structural Pattern Matching: Specification](https://peps.python.org/pep-0634/)
+- [PEP 636 – Structural Pattern Matching: Tutorial](https://peps.python.org/pep-0636/)
diff --git a/contrib/advanced-python/oops.md b/contrib/advanced-python/oops.md
new file mode 100644
index 00000000..0dbd855d
--- /dev/null
+++ b/contrib/advanced-python/oops.md
@@ -0,0 +1,379 @@
+
+In Python, object-oriented programming (OOP) is a programming paradigm
+that uses objects and classes in programming. It aims to implement
+real-world concepts like inheritance, polymorphism, encapsulation, etc.,
+in programming. The main concept of object-oriented programming (OOP)
+in Python is to bind the data and the functions that work together as a
+single unit so that no other part of the code can access this data.
+
+**OOPs Concepts in Python**
+
+1. Class in Python
+
+2. Objects in Python
+
+3. Polymorphism in Python
+
+4. Encapsulation in Python
+
+5. Inheritance in Python
+
+6. Data Abstraction in Python
+
+
+
+**Python Class** A class is a collection of objects. A class contains the
+blueprint or the prototype from which the objects are created. It
+is a logical entity that contains some attributes and methods.
+
+
+
+
+```python
+# Simple Class in Python
+class Dog:
+    pass
+```
+
+
+
+
+**Python Objects** In Python object-oriented programming, an object is
+an entity that has a state and behavior associated with it. It may be
+any real-world object like a mouse, keyboard, chair, table, pen, etc.
+Integers, strings, floating-point numbers, even arrays and
+dictionaries are all objects.
+
+
+```python
+obj = Dog()
+```
+This creates an instance of the class Dog.
+
+**The Python `__init__` Method**
+
+The `__init__` method is similar to constructors in C++ and Java. It is
+run as soon as an object of a class is instantiated. The method is
+useful to do any initialization you want to do with your object.
+
+
+```python
+class Dog:
+
+    # class attribute
+    attr1 = "mammal"
+
+    # Instance attribute
+    def __init__(self, name):
+        self.name = name
+
+# Object instantiation
+Rodger = Dog("Rodger")
+Tommy = Dog("Tommy")
+
+# Accessing class attributes
+print("Rodger is a {}".format(Rodger.__class__.attr1))
+print("Tommy is also a {}".format(Tommy.__class__.attr1))
+
+# Accessing instance attributes
+print("My name is {}".format(Rodger.name))
+print("My name is {}".format(Tommy.name))
+```
+In the above code, the `__init__` method is used to initialize the name.
+
+**Inheritance**
+
+In Python object-oriented programming, inheritance is the capability of
+one class to derive or inherit properties from another class. The
+class that derives properties is called the derived class or child class,
+and the class from which the properties are being derived is called the
+base class or parent class.
+
+Types of Inheritance:
+
+- Single Inheritance
+
+- Multilevel Inheritance
+
+- Multiple Inheritance
+
+- Hierarchical Inheritance
+
+```python
+# Single Inheritance
+# Parent class
+class Animal:
+    def __init__(self, name, sound):
+        self.name = name
+        self.sound = sound
+
+    def make_sound(self):
+        print(f"{self.name} says {self.sound}")
+
+# Child class inheriting from Animal
+class Dog(Animal):
+    def __init__(self, name):
+        # Call the constructor of the parent class
+        super().__init__(name, "Woof")
+
+# Child class inheriting from Animal
+class Cat(Animal):
+    def __init__(self, name):
+        # Call the constructor of the parent class
+        super().__init__(name, "Meow")
+
+# Creating objects of the derived classes
+dog = Dog("Buddy")
+cat = Cat("Whiskers")
+
+# Accessing methods of the parent class
+dog.make_sound()
+cat.make_sound()
+```
+The above code depicts single inheritance. In the case of single inheritance, there's only a single base class and a derived class. Here, Dog and Cat are the derived classes with Animal as the parent class. They can access the methods of the base class or derive their own methods.
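+
+As a quick, illustrative check of this relationship (not part of the original example), `isinstance()` and `issubclass()` confirm that a derived-class instance is also an instance of its base class:
+
+```python
+# Continuing from the Animal/Dog example above
+print(isinstance(dog, Dog))      # True
+print(isinstance(dog, Animal))   # True, because Dog derives from Animal
+print(issubclass(Dog, Animal))   # True
+```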
+
+
+```python
+# Multilevel Inheritance
+# Parent class
+class Animal:
+    def __init__(self, name):
+        self.name = name
+
+    def speak(self):
+        print(f"{self.name} speaks")
+
+# Child class inheriting from Animal
+class Dog(Animal):
+    def bark(self):
+        print(f"{self.name} barks")
+
+# Grandchild class inheriting from Dog
+class GermanShepherd(Dog):
+    def guard(self):
+        print(f"{self.name} guards")
+
+# Creating objects of the derived classes
+german_shepherd = GermanShepherd("Rocky")
+
+# Accessing methods from all levels of inheritance
+german_shepherd.speak()  # Accessing method from the Animal class
+german_shepherd.bark()   # Accessing method from the Dog class
+german_shepherd.guard()  # Accessing method from the GermanShepherd class
+```
+Multilevel inheritance is a concept in object-oriented programming where a class inherits properties and behaviors from another class, which itself may inherit from another class. In other words, it involves a chain of inheritance where a subclass inherits from a superclass, and that subclass can then become a superclass for another subclass. It is similar to grandfather, father and son. In the above code, the Animal class is the superclass, Dog is derived from Animal, and Dog is the parent of GermanShepherd. GermanShepherd is the child class of Dog and can access the methods of both Animal and Dog.
+
+
+```python
+# Hierarchical Inheritance
+# Parent class
+class Animal:
+    def __init__(self, name):
+        self.name = name
+
+    def speak(self):
+        print(f"{self.name} speaks")
+
+# Child class 1 inheriting from Animal
+class Dog(Animal):
+    def bark(self):
+        print(f"{self.name} barks")
+
+# Child class 2 inheriting from Animal
+class Cat(Animal):
+    def meow(self):
+        print(f"{self.name} meows")
+
+# Creating objects of the derived classes
+dog = Dog("Buddy")
+cat = Cat("Whiskers")
+
+# Accessing methods from the parent and child classes
+dog.speak()  # Accessing method from the Animal class
+dog.bark()   # Accessing method from the Dog class
+cat.speak()  # Accessing method from the Animal class
+cat.meow()   # Accessing method from the Cat class
+```
+Hierarchical inheritance is a type of inheritance in object-oriented programming where one class serves as a superclass for multiple subclasses. In this inheritance model, each subclass inherits properties and behaviors from the same superclass, creating a hierarchical tree-like structure.
+
+```python
+# Multiple Inheritance
+# Parent class 1
+class Herbivore:
+    def eat_plants(self):
+        print("Eating plants")
+
+# Parent class 2
+class Carnivore:
+    def eat_meat(self):
+        print("Eating meat")
+
+# Child class inheriting from both Herbivore and Carnivore
+class Omnivore(Herbivore, Carnivore):
+    def eat(self):
+        print("Eating everything")
+
+# Creating an object of the Omnivore class
+omnivore = Omnivore()
+
+# Accessing methods from both parent classes
+omnivore.eat_plants()  # Accessing method from Herbivore
+omnivore.eat_meat()    # Accessing method from Carnivore
+omnivore.eat()         # Accessing method from Omnivore
+```
+Multiple inheritance is a concept in object-oriented programming where a class can inherit properties and behaviors from more than one parent class. This means that a subclass can have multiple immediate parent classes, allowing it to inherit features from each of them.
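+
+When two parent classes define a method with the same name, Python decides which implementation wins using the method resolution order (MRO), which follows the base classes from left to right. Here is a minimal, illustrative sketch (the shared `eat` method is hypothetical, added only to demonstrate the lookup order):
+
+```python
+class Herbivore:
+    def eat(self):
+        print("Eating plants")
+
+class Carnivore:
+    def eat(self):
+        print("Eating meat")
+
+class Omnivore(Herbivore, Carnivore):
+    pass
+
+omnivore = Omnivore()
+omnivore.eat()           # Prints "Eating plants": Herbivore comes first in the MRO
+print(Omnivore.__mro__)  # (Omnivore, Herbivore, Carnivore, object)
+```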
+
+**Polymorphism** In Python object-oriented programming, polymorphism
+simply means having many forms.
+
+```python
+class Bird:
+
+    def intro(self):
+        print("There are many types of birds.")
+
+    def flight(self):
+        print("Most of the birds can fly but some cannot.")
+
+class Sparrow(Bird):
+
+    def flight(self):
+        print("Sparrows can fly.")
+
+class Ostrich(Bird):
+
+    def flight(self):
+        print("Ostriches cannot fly.")
+
+obj_bird = Bird()
+obj_spr = Sparrow()
+obj_ost = Ostrich()
+
+obj_bird.intro()
+obj_bird.flight()
+
+obj_spr.intro()
+obj_spr.flight()
+
+obj_ost.intro()
+obj_ost.flight()
+```
+Poly stands for 'many' and morphism for 'forms'. In the above code, the method flight() has many forms.
+
+**Python Encapsulation**
+
+In Python object-oriented programming, encapsulation is one of the
+fundamental concepts in object-oriented programming (OOP). It describes
+the idea of wrapping data and the methods that work on data within one
+unit. This puts restrictions on accessing variables and methods directly
+and can prevent the accidental modification of data. To prevent
+accidental change, an object's variable can only be changed by an
+object's method. Those types of variables are known as private
+variables.
+
+
+```python
+class Car:
+    def __init__(self, make, model, year):
+        self._make = make    # Encapsulated attribute with single underscore
+        self._model = model  # Encapsulated attribute with single underscore
+        self._year = year    # Encapsulated attribute with single underscore
+        self._odometer_reading = 0  # Encapsulated attribute with single underscore
+
+    def get_make(self):
+        return self._make
+
+    def get_model(self):
+        return self._model
+
+    def get_year(self):
+        return self._year
+
+    def get_odometer_reading(self):
+        return self._odometer_reading
+
+    def update_odometer(self, mileage):
+        if mileage >= self._odometer_reading:
+            self._odometer_reading = mileage
+        else:
+            print("You can't roll back an odometer!")
+
+    def increment_odometer(self, miles):
+        self._odometer_reading += miles
+
+# Creating an instance of the Car class
+my_car = Car("Toyota", "Camry", 2021)
+
+# Accessing encapsulated attributes through methods
+print("Make:", my_car.get_make())
+print("Model:", my_car.get_model())
+print("Year:", my_car.get_year())
+
+# Modifying encapsulated attribute through method
+my_car.update_odometer(100)
+print("Odometer Reading:", my_car.get_odometer_reading())
+
+# Incrementing odometer reading
+my_car.increment_odometer(50)
+print("Odometer Reading after increment:", my_car.get_odometer_reading())
+```
+
+
+**Data Abstraction** It hides unnecessary code details from the user. It
+is also useful when we do not want to give out sensitive parts of our code
+implementation, and this is where data abstraction comes in.
+
+```python
+from abc import ABC, abstractmethod
+
+# Abstract class defining the interface for a Shape
+class Shape(ABC):
+    def __init__(self, name):
+        self.name = name
+
+    @abstractmethod
+    def area(self):
+        pass
+
+    @abstractmethod
+    def perimeter(self):
+        pass
+
+# Concrete class implementing the Shape interface for a Rectangle
+class Rectangle(Shape):
+    def __init__(self, name, length, width):
+        super().__init__(name)
+        self.length = length
+        self.width = width
+
+    def area(self):
+        return self.length * self.width
+
+    def perimeter(self):
+        return 2 * (self.length + self.width)
+
+# Concrete class implementing the Shape interface for a Circle
+class Circle(Shape):
+    def __init__(self, name, radius):
+        super().__init__(name)
+        self.radius = radius
+
+    def area(self):
+        return 3.14 * self.radius * self.radius
+
+    def perimeter(self):
+        return 2 * 3.14 * self.radius
+
+# Creating objects of the derived classes
+rectangle = Rectangle("Rectangle", 5, 4)
+circle = Circle("Circle", 3)
+
+# Accessing methods defined by the Shape interface
+print(f"{rectangle.name}: Area = {rectangle.area()}, Perimeter = {rectangle.perimeter()}")
+print(f"{circle.name}: Area = {circle.area()}, Perimeter = {circle.perimeter()}")
+```
+To implement data abstraction, we have to import `abc`. ABC stands for Abstract Base Class. All classes that want to implement data abstraction have to inherit from ABC.
+`@abstractmethod` is a decorator provided by the `abc` module, which stands for "abstract method". It's used to define abstract methods within abstract base classes (ABCs). An abstract method is a method declared in a class that does not contain an implementation. Instead, it serves as a placeholder, and its concrete implementation must be provided by subclasses.
+Abstract methods can be implemented by the derived classes.
diff --git a/contrib/advanced-python/protocols.md b/contrib/advanced-python/protocols.md
new file mode 100644
index 00000000..9b5e74a3
--- /dev/null
+++ b/contrib/advanced-python/protocols.md
@@ -0,0 +1,243 @@
+# Protocols in Python
+Python can establish informal interfaces using protocols in order to improve code structure, reusability, and type checking. Protocols allow for progressive adoption and are more flexible than standard interfaces in other programming languages like Java, which are tight contracts that specify the methods and attributes a class must implement.
+
+>Before going into the depth of this topic, let's understand another topic which is a prerequisite for it: the typing module.
+
+## Typing Module
+This is a module in Python which:
+1. Provides classes, functions, and type aliases.
+2. Allows adding type annotations to our code.
+3. Enhances code readability.
+4. Helps in catching errors early.
+
+### Type Hints in Python:
+Type hints allow you to specify the expected data types of variables, function parameters, and return values. This can improve code readability and help with debugging.
+
+Here is a simple function that adds two numbers:
+```python
+def add(a, b):
+    return a + b
+print(add(10, 20))
+```
+>Output: 30
+
+While this works fine, adding type hints makes the code more understandable and serves as documentation:
+
+```python
+def add(a: int, b: int) -> int:
+    return a + b
+print(add(1, 10))
+```
+>Output: 11
+
+In this version, `a` and `b` are expected to be integers, and the function is expected to return an integer. This makes the function's purpose and usage clearer.
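+
+One caveat worth noting: Python itself does not enforce these hints when the code runs; they are metadata for readers and for static checkers such as mypy. A minimal, illustrative sketch (reusing the `add` function above):
+
+```python
+def add(a: int, b: int) -> int:
+    return a + b
+
+# The hints are not enforced at runtime: this call still runs and
+# returns 'ab', but a static checker such as mypy would flag it.
+print(add("a", "b"))  # Output: ab
+```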
+
+#### Let's see another example
+
+The function given below takes an iterable (it can be any of list, tuple, dict, set, frozenset, str, etc.) and prints its contents in a single line along with its type.
+
+```python
+from typing import Iterable
+
+def print_all(l: Iterable) -> None:
+    print(type(l), end=' ')
+    for i in l:
+        print(i, end=' ')
+    print()
+
+l = [1, 2, 3, 4, 5]  # type: List[int]
+s = {1, 2, 3, 4, 5}  # type: Set[int]
+t = (1, 2, 3, 4, 5)  # type: Tuple[int]
+
+for iter_obj in [l, s, t]:
+    print_all(iter_obj)
+
+```
+Output:
+> <class 'list'> 1 2 3 4 5
+> <class 'set'> 1 2 3 4 5
+> <class 'tuple'> 1 2 3 4 5
+
+Now let's try calling the function `print_all` using a non-iterable object, an `int`, as the argument.
+
+```python
+a = 10
+print_all(a) # This will raise an error
+```
+Output:
+>TypeError: 'int' object is not iterable
+
+This error occurs because `a` is an `integer`, and the `integer` class does not have any methods or attributes that make it work like an iterable. In other words, the integer class does not conform to the `Iterable` protocol.
+
+**Benefits of Type Hints**
+Using type hints helps in several ways:
+
+1. **Error Detection**: Tools like mypy can catch type-related problems during development, decreasing runtime errors.
+2. **Code Readability**: Type hints serve as documentation, making it easy to comprehend what data types are anticipated and returned.
+3. **Improved Maintenance**: With unambiguous type expectations, maintaining and updating code becomes easier, especially in large codebases.
+
+Now that we have understood type hints and the typing module, let's dive deeper into protocols.
+
+## Understanding Protocols
+
+In Python, protocols define interfaces similar to Java interfaces. They let you specify methods and attributes that an object must implement without requiring inheritance from a base class. Protocols are part of the `typing` module and provide a way to enforce certain structures in your classes, enhancing type safety and code clarity.
+
+### What is a Protocol?
+
+A protocol specifies one or more method signatures that a class must implement to be considered as conforming to the protocol.
+This concept is often referred to as "structural subtyping" or "duck typing," meaning that if an object implements the required methods and attributes, it can be treated as an instance of the protocol.
+
+Let's write our own protocol:
+
+```python
+from typing import Protocol
+
+# Define a Printable protocol
+class Printable(Protocol):
+    def print(self) -> None:
+        """Print the object"""
+        pass
+
+# Book class implements the Printable protocol
+class Book:
+    def __init__(self, title: str):
+        self.title = title
+
+    def print(self) -> None:
+        print(f"Book Title: {self.title}")
+
+# print_object function takes a Printable object and calls its print method
+def print_object(obj: Printable) -> None:
+    obj.print()
+
+book = Book("Python Programming")
+print_object(book)
+```
+Output:
+> Book Title: Python Programming
+
+In this example:
+
+1. **Printable Protocol:** Defines an interface with a single method print.
+2. **Book Class:** Implements the Printable protocol by providing a print method.
+3. **print_object Function:** Accepts any object that conforms to the Printable protocol and calls its print method.
+
+We got our output because the class `Book` conforms to the `Printable` protocol.
+Similarly, when you pass an object to `print_object` that does not conform to the Printable protocol, an error will occur. This is because the object does not implement the required `print` method.
+Let's see an example:
+```python
+class Team:
+    def huddle(self) -> None:
+        print("Team Huddle")
+
+c = Team()
+print_object(c) # This will raise an error
+```
+Output:
+>AttributeError: 'Team' object has no attribute 'print'
+
+In this case:
+- The `Team` class has a `huddle` method but does not have a `print` method.
+- When `print_object` tries to call the `print` method on a `Team` instance, it raises an `AttributeError`.
+
+> This is an important aspect of using protocols: they ensure that objects provide the necessary methods, leading to more predictable and reliable code.
+
+**Ensuring Protocol Conformance**
+To avoid such errors, you need to ensure that any object passed to `print_object` implements the `Printable` protocol. Here's how you can modify the `Team` class to conform to the protocol:
+```python
+class Team:
+    def __init__(self, name: str):
+        self.name = name
+
+    def huddle(self) -> None:
+        print("Team Huddle")
+
+    def print(self) -> None:
+        print(f"Team Name: {self.name}")
+
+c = Team("Dream Team")
+print_object(c)
+```
+Output:
+>Team Name: Dream Team
+
+The `Team` class now implements the `print` method, conforming to the `Printable` protocol, and hence no longer raises an error.
+
+### Protocols and Inheritance:
+Protocols can also be used in combination with inheritance to create more complex interfaces.
+We can do that by following these steps:
+**Step 1 - Base Protocol**: Define a base protocol that specifies a common set of methods and attributes.
+**Step 2 - Derived Protocols**: Create derived protocols that extend the base protocol with additional requirements.
+**Step 3 - Polymorphism**: Objects can then conform to multiple protocols, allowing for polymorphic behavior.
+
+Let's see an example of this as well:
+
+```python
+from typing import Protocol
+
+# Base protocol
+class Printable(Protocol):
+    def print(self) -> None:
+        """Print the object"""
+        pass
+
+# Base protocol 2
+class Serializable(Protocol):
+    def serialize(self) -> str:
+        pass
+
+# Derived protocol (Protocol must be listed again so it remains a protocol)
+class PrintableAndSerializable(Printable, Serializable, Protocol):
+    pass
+
+# Class with an implementation of both Printable and Serializable
+class Book_serialize:
+    def __init__(self, title: str):
+        self.title = title
+
+    def print(self) -> None:
+        print(f"Book Title: {self.title}")
+
+    def serialize(self) -> str:
+        return f"serialize: {self.title}"
+
+# The function accepts any object which implements PrintableAndSerializable
+def test(obj: PrintableAndSerializable):
+    obj.print()
+    print(obj.serialize())
+
+book = Book_serialize("lean-in")
+test(book)
+```
+Output:
+> Book Title: lean-in
+> serialize: lean-in
+
+In this example:
+
+**Printable Protocol:** Specifies a `print` method.
+**Serializable Protocol:** Specifies a `serialize` method.
+**PrintableAndSerializable Protocol:** Combines both `Printable` and `Serializable`.
+**Book_serialize Class**: Implements both `print` and `serialize` methods, conforming to `PrintableAndSerializable`.
+**test Function:** Accepts any object that implements the `PrintableAndSerializable` protocol.
+
+If you try to pass an object that does not conform to the `PrintableAndSerializable` protocol to the test function, it will raise an error. Let's see an example:
+
+```python
+class Team:
+    def huddle(self) -> None:
+        print("Team Huddle")
+
+c = Team()
+test(c) # This will raise an error
+```
+Output:
+> AttributeError: 'Team' object has no attribute 'print'
+
+In this case:
+The `Team` class has a `huddle` method but does not implement the `print` or `serialize` methods.
+When test tries to call `print` and `serialize` on a `Team` instance, it raises an `AttributeError`.
+
+**In Conclusion:**
+>Python protocols offer a versatile and powerful means of defining interfaces, encouraging the decoupling of code, improving readability, and facilitating static type checking. They are particularly handy for scenarios involving file-like objects, bespoke containers, and any case where you wish to enforce certain behaviors without requiring inheritance from a specific base class. Ensuring that classes conform to protocols reduces runtime problems and makes your code more robust and maintainable.
\ No newline at end of file diff --git a/contrib/advanced-python/reduce-function.md b/contrib/advanced-python/reduce-function.md new file mode 100644 index 00000000..5c0c81b4 --- /dev/null +++ b/contrib/advanced-python/reduce-function.md @@ -0,0 +1,72 @@
+# Reduce Function
+
+## Definition:
+The reduce() function is part of the functools module and is used to apply a binary function (a function that takes two arguments) cumulatively to the items of an iterable (e.g., a list, tuple, or string). It reduces the iterable to a single value by successively combining elements.
+
+**Syntax**:
+```python
+from functools import reduce
+reduce(function, iterable[, initial])
+```
+**Parameters**:
+*function*: The binary function to apply. It takes two arguments and returns a single value.
+*iterable*: The sequence of elements to process.
+*initial (optional)*: An initial value. If provided, the function is applied to the initial value and the first element of the iterable. Otherwise, the first two elements are used as the initial values.
+
+## Working:
+- Initially, the first two elements of the iterable are picked and the result is obtained.
+- The same function is then applied to the previously attained result and the element just succeeding the second one, and the result is stored again.
+- This process continues till no more elements are left in the container.
+- The final result is returned and, in the examples below, printed to the console.
+
+## Examples:
+
+**Example 1:**
+```python
+from functools import reduce
+
+numbers = [1, 2, 3, 4, 10]
+total = reduce(lambda x, y: x + y, numbers)
+print(total) # Output: 20
+```
+**Example 2:**
+```python
+from functools import reduce
+
+numbers = [11, 7, 8, 20, 1]
+max_value = reduce(lambda x, y: x if x > y else y, numbers)
+print(max_value) # Output: 20
+```
+**Example 3:**
+```python
+# Importing the reduce function from functools
+from functools import reduce
+
+# Creating a list
+my_list = [10, 20, 30, 40, 50]
+
+# Calculating the product of the numbers in my_list
+# using reduce and lambda functions together
+product = reduce(lambda x, y: x * y, my_list)
+
+# Printing output
+print(f"Product = {product}") # Output : Product = 12000000
+```
+
+## Difference Between reduce() and accumulate():
+- **Behavior:**
+    - reduce() keeps only a running result and returns just the final value.
+    - accumulate() returns an iterator containing all intermediate results. The last value in the iterator equals the final value that reduce() would return.
+
+- **Use Cases:**
+    - Use reduce() when you need a single result (e.g., total sum, product) from the iterable.
+    - Use accumulate() when you want to access intermediate results during the reduction process.
+
+- **Initial Value:**
+    - reduce() allows an optional initial value.
+    - accumulate() also accepts an optional initial value since Python 3.8.
+
+- **Order of Arguments:**
+    - reduce() takes the function first, followed by the iterable.
+    - accumulate() takes the iterable first, followed by the function.
+
+## Conclusion:
+Python's reduce() function enables us to apply reduction operations to iterables using lambdas and other callable functions. It
+reduces the elements of an iterable to a single cumulative value, which solves various straightforward problems such as
+summing or multiplying iterables of numbers.
\ No newline at end of file diff --git a/contrib/advanced-python/regular_expressions.md b/contrib/advanced-python/regular_expressions.md new file mode 100644 index 00000000..81c883ec --- /dev/null +++ b/contrib/advanced-python/regular_expressions.md @@ -0,0 +1,240 @@
+## Regular Expressions in Python
+Regular expressions (regex) are a powerful and versatile tool for pattern matching and text manipulation.
+Python's `re` module provides comprehensive support for regular expressions, enabling efficient text processing and validation.
+
+## 1. Introduction to Regular Expressions
+A regular expression is a sequence of characters defining a search pattern. Common use cases include validating input, searching within text, and extracting
+specific patterns.
+
+## 2. Basic Syntax
+Literal Characters: Match exact characters (e.g., abc matches "abc").
+Metacharacters: Special characters like ., \*, ?, +, ^, $, [ ], and | used to build patterns.
+
+**Common Metacharacters:**
+
+- .: Any character except newline.
+- ^: Start of the string.
+- $: End of the string.
+- *: 0 or more repetitions.
+- +: 1 or more repetitions.
+- ?: 0 or 1 repetition.
+- []: Any one character inside brackets (e.g., [a-z]).
+- |: Either the pattern before or after.
+- \ : Used to drop the special meaning of the character following it.
+- {}: Indicates the number of occurrences of the preceding regex to match.
+- (): Encloses a group of regex.
+
+Examples (the snippets below reuse `import re` from the first one):
+
+1. `.`
+
+```python
+import re
+pattern = r'c.t'
+text = 'cat cot cut cit'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'cot', 'cut', 'cit']
+```
+
+2. `^`
+
+```python
+pattern = r'^Hello'
+text = 'Hello, world!'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: 'Hello'
+```
+
+3. `$`
+
+```python
+pattern = r'world!$'
+text = 'Hello, world!'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: 'world!'
+```
+
+4. `*`
+
+```python
+pattern = r'ab*'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['a', 'ab', 'abb', 'abbb']
+```
+
+5. `+`
+
+```python
+pattern = r'ab+'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['ab', 'abb', 'abbb']
+```
+
+6. `?`
+
+```python
+pattern = r'ab?'
+text = 'a ab abb abbb'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['a', 'ab', 'ab', 'ab']
+```
+
+7. `[]`
+
+```python
+pattern = r'[aeiou]'
+text = 'hello world'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['e', 'o', 'o']
+```
+
+8. `|`
+
+```python
+pattern = r'cat|dog'
+text = 'I have a cat and a dog.'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'dog']
+```
+
+9. `\`
+
+```python
+pattern = r'\$100'
+text = 'The price is $100.'
+match = re.search(pattern, text)
+print(match.group() if match else 'No match')  # Output: '$100'
+```
+
+10. `{}`
+
+```python
+pattern = r'\d{3}'
+text = 'My number is 123456'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['123', '456']
+```
+
+11. `()`
+
+```python
+pattern = r'(cat|dog)'
+text = 'I have a cat and a dog.'
+matches = re.findall(pattern, text)
+print(matches)  # Output: ['cat', 'dog']
+```
+
+## 3. Using the re Module
+
+**Key functions in the re module:**
+
+- re.match(): Checks for a match at the beginning of the string.
+- re.search(): Searches for a match anywhere in the string.
+- re.findall(): Returns a list of all matches.
+- re.sub(): Replaces matches with a specified string.
+- re.split(): Returns a list where the string has been split at each match.
+- re.escape(): Escapes special characters.
+
+Examples:
+
+```python
+import re
+
+# Match at the beginning
+print(re.match(r'\d+', '123abc').group())  # Output: 123
+
+# Search anywhere
+print(re.search(r'\d+', 'abc123').group())  # Output: 123
+
+# Find all matches
+print(re.findall(r'\d+', 'abc123def456'))  # Output: ['123', '456']
+
+# Substitute matches
+print(re.sub(r'\d+', '#', 'abc123def456'))  # Output: abc#def#
+
+# Return a list where the string is split at each match
+txt = 'The Donkey in the Town'
+print(re.split(r'\s', txt))  # Output: ['The', 'Donkey', 'in', 'the', 'Town']
+
+# Escape special characters
+print(re.escape("We are good to go"))  # Output: We\ are\ good\ to\ go
+```
+
+## 4. Compiling Regular Expressions
+
+Compiling regular expressions improves performance for repeated use.
+
+Example:
+
+```python
+import re
+
+pattern = re.compile(r'\d+')
+print(pattern.match('123abc').group())  # Output: 123
+print(pattern.search('abc123').group())  # Output: 123
+print(pattern.findall('abc123def456'))  # Output: ['123', '456']
+
+```
+
+## 5. Groups and Capturing
+
+Parentheses () group and capture parts of the match.
+
+Example:
+
+```python
+import re
+
+match = re.match(r'(\d{3})-(\d{2})-(\d{4})', '123-45-6789')
+if match:
+    print(match.group())   # Output: 123-45-6789
+    print(match.group(1))  # Output: 123
+    print(match.group(2))  # Output: 45
+    print(match.group(3))  # Output: 6789
+```
+
+## 6. Special Sequences
+
+Special sequences are shortcuts for common patterns:
+
+- \A: Returns a match if the specified characters are at the beginning of the string.
+- \b: Returns a match where the specified characters are at the beginning or at the end of a word.
+- \B: Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word.
+- \d: Any digit.
+- \D: Any non-digit.
+- \w: Any word character (letters, digits, and underscore).
+- \W: Any non-word character.
+- \s: Any whitespace character.
+- \S: Any non-whitespace character.
+- \Z: Returns a match if the specified characters are at the end of the string.
+
+Example:
+
+```python
+import re
+
+print(re.search(r'\w+@\w+\.\w+', 'Contact: support@example.com').group())  # Output: support@example.com
+```
+
+## 7. Sets
+
+A set is a group of characters inside a pair of square brackets [] with a special meaning:
+
+- [arn]: Returns a match where one of the specified characters (a, r, or n) is present.
+- [a-n]: Returns a match for any lower case character, alphabetically between a and n.
+- [^arn]: Returns a match for any character EXCEPT a, r, and n.
+- [0123]: Returns a match where any of the specified digits (0, 1, 2, or 3) are present.
+- [0-9]: Returns a match for any digit between 0 and 9.
+- [0-5][0-9]: Returns a match for any two-digit number between 00 and 59.
+- [a-zA-Z]: Returns a match for any character alphabetically between a and z, lower case OR upper case.
+- [+]: In sets, +, \*, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string.
+
+## Summary
+
+Regular expressions (regex) are a powerful tool for text processing in Python, offering a flexible way to match, search, and manipulate text patterns. The re module provides a comprehensive set of functions and metacharacters to tackle complex text processing tasks.
+With regex, you can:
+1. Match patterns: Use metacharacters like ., \*, ?, and {} to match specific patterns in text.
+2. Search text: Employ functions like re.search() and re.match() to find occurrences of patterns in text.
+3. Manipulate text: Utilize functions like re.sub() to replace patterns with new text.
diff --git a/contrib/advanced-python/threading.md b/contrib/advanced-python/threading.md new file mode 100644 index 00000000..fa315335 --- /dev/null +++ b/contrib/advanced-python/threading.md @@ -0,0 +1,198 @@
+# Threading in Python
+A thread is a sequence of instructions in a program that can be executed independently of the remaining process.
+Threads are like lightweight processes that share the same memory space but can execute independently,
+while a process is an executable instance of a computer program.
+This guide provides an overview of the threading module and its key functionalities.
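+
+As an illustrative addition (not part of the original guide), the sketch below shows why threads help with I/O-bound work: three simulated one-second I/O waits overlap, so the whole run takes about one second instead of three. Note that CPython's global interpreter lock means CPU-bound Python code would not speed up this way.
+
+```python
+import threading
+import time
+
+def simulated_io(task_id):
+    time.sleep(1)  # stands in for a blocking I/O call (network, disk, ...)
+    print(f"Task {task_id} done")
+
+start = time.perf_counter()
+threads = [threading.Thread(target=simulated_io, args=(i,)) for i in range(3)]
+for t in threads:
+    t.start()
+for t in threads:
+    t.join()
+print(f"Elapsed: {time.perf_counter() - start:.1f}s")  # ~1.0s, not ~3.0s
+```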
+
+## Key Characteristics of Threads:
+* Shared Memory: All threads within a process share the same memory space, which allows for efficient communication between threads.
+* Independent Execution: Each thread can run independently and concurrently.
+* Context Switching: The operating system can switch between threads, enabling concurrent execution.
+
+## Threading Module
+This module allows you to create and manage threads easily. It includes several functions and classes to work with threads.
+
+**1. Creating a Thread:**
+To create a thread in Python, you can use the Thread class from the threading module.
+
+Example:
+```python
+import threading
+
+# Create a thread (with no target, it performs no task)
+thread = threading.Thread()
+
+# Start the thread
+thread.start()
+
+# Wait for the thread to complete
+thread.join()
+
+print("Thread has finished execution.")
+```
+Output:
+```
+Thread has finished execution.
+```
+**2. Performing a Task with a Thread:**
+We can also have a thread perform a specific task by giving a function as the `target` and its arguments as `args` parameters to the Thread object.
+
+Example:
+
+```python
+import threading
+
+# Define a function that will be executed by the thread
+def print_numbers(arg):
+    for i in range(arg):
+        print(f"Thread: {i}")
+
+# Create a thread
+thread = threading.Thread(target=print_numbers, args=(5,))
+
+# Start the thread
+thread.start()
+
+# Wait for the thread to complete
+thread.join()
+
+print("Thread has finished execution.")
+```
+Output:
+```
+Thread: 0
+Thread: 1
+Thread: 2
+Thread: 3
+Thread: 4
+Thread has finished execution.
+```
+**3. Delaying a Task with the Thread's Timer Function:**
+We can set a delay after which we want a thread to start. The Timer constructor takes four arguments: (interval, function, args, kwargs).
+
+Example:
+```python
+import threading
+
+# Define a function that will be executed by the thread
+def print_numbers(arg):
+    for i in range(arg):
+        print(f"Thread: {i}")
+
+# Create a thread that starts after 3 seconds
+thread = threading.Timer(3, print_numbers, args=(5,))
+
+# Start the thread
+thread.start()
+
+# Wait for the thread to complete
+thread.join()
+
+print("Thread has finished execution.")
+```
+Output:
+```
+# After three seconds, the output is generated
+Thread: 0
+Thread: 1
+Thread: 2
+Thread: 3
+Thread: 4
+Thread has finished execution.
+```
+**4. Creating Multiple Threads**
+We can create and manage multiple threads to achieve concurrent execution.
+
+Example:
+```python
+import threading
+
+def print_numbers(thread_name):
+    for i in range(5):
+        print(f"{thread_name}: {i}")
+
+# Create multiple threads
+thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
+thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
+
+# Start the threads
+thread1.start()
+thread2.start()
+
+# Wait for both threads to complete
+thread1.join()
+thread2.join()
+
+print("Both threads have finished execution.")
+```
+Output:
+```
+Thread 1: 0
+Thread 1: 1
+Thread 2: 0
+Thread 1: 2
+Thread 1: 3
+Thread 2: 1
+Thread 2: 2
+Thread 2: 3
+Thread 2: 4
+Thread 1: 4
+Both threads have finished execution.
+```
+
+**5. Thread Synchronization**
+When we create multiple threads and they access shared resources, there is a risk of race conditions and data corruption. To prevent this, you can use synchronization primitives such as locks.
+A lock is a synchronization primitive that ensures that only one thread can access a shared resource at a time.
+
+Example:
+```python
+import threading
+
+lock = threading.Lock()
+
+def print_numbers(thread_name):
+    for i in range(10):
+        with lock:
+            print(f"{thread_name}: {i}")
+
+# Create multiple threads
+thread1 = threading.Thread(target=print_numbers, args=("Thread 1",))
+thread2 = threading.Thread(target=print_numbers, args=("Thread 2",))
+
+# Start the threads
+thread1.start()
+thread2.start()
+
+# Wait for both threads to complete
+thread1.join()
+thread2.join()
+
+print("Both threads have finished execution.")
+```
+Output:
+```
+Thread 1: 0
+Thread 1: 1
+Thread 1: 2
+Thread 1: 3
+Thread 1: 4
+Thread 1: 5
+Thread 1: 6
+Thread 1: 7
+Thread 1: 8
+Thread 1: 9
+Thread 2: 0
+Thread 2: 1
+Thread 2: 2
+Thread 2: 3
+Thread 2: 4
+Thread 2: 5
+Thread 2: 6
+Thread 2: 7
+Thread 2: 8
+Thread 2: 9
+Both threads have finished execution.
+```
+
+A `lock` object is created using threading.Lock(), and the `with lock` statement ensures that the lock is acquired before printing and released after printing. This prevents other threads from accessing the print statement simultaneously.
+
+## Conclusion
+Threading in Python is a powerful tool for achieving concurrency and improving the performance of I/O-bound tasks. By understanding and implementing threads using the threading module, you can enhance the efficiency of your programs. To prevent race conditions and maintain data integrity, keep in mind that thread synchronization must be properly managed.
diff --git a/contrib/advanced-python/type-hinting.md b/contrib/advanced-python/type-hinting.md new file mode 100644 index 00000000..fcf1e1c0 --- /dev/null +++ b/contrib/advanced-python/type-hinting.md @@ -0,0 +1,106 @@
+# Introduction to Type Hinting in Python
+Type hinting is a feature in Python that allows you to specify the expected data types of variables, function arguments, and return values. It was introduced
+in Python 3.5 via PEP 484 and has since become a standard practice to improve code readability and facilitate static analysis tools.
+
+**Benefits of Type Hinting**
+
+1. Improved Readability: Type hints make it clear what type of data is expected, making the code easier to understand for others and your future self.
+2. Error Detection: Static analysis tools like MyPy can use type hints to detect type errors before runtime, reducing bugs and improving code quality.
+3. Better Tooling Support: Modern IDEs and editors can leverage type hints to provide better autocompletion, refactoring, and error checking features.
+4. Documentation: Type hints serve as a form of documentation, indicating the intended usage of functions and classes.
+
+**Syntax of Type Hinting**
+Type hints can be added to variables, function arguments, and return values using annotations.
+
+1. Variable Annotations:
+
+```python
+age: int = 25
+name: str = "Alice"
+is_student: bool = True
+```
+
+2. Function Annotations:
+
+```python
+def greet(name: str) -> str:
+    return f"Hello, {name}!"
+```
+
+3. Multiple Arguments and Return Types:
+
+```python
+def add(a: int, b: int) -> int:
+    return a + b
+```
+
+4. Optional Types: Use the Optional type from the typing module for values that could be None.
+
+```python
+from typing import Optional
+
+def get_user_name(user_id: int) -> Optional[str]:
+    # Function logic here
+    return None  # Example return value
+```
+
+5. Union Types: Use the Union type when a variable can be of multiple types.
+
+```python
+from typing import Union
+
+def get_value(key: str) -> Union[int, str]:
+    # Function logic here
+    return "value"  # Example return value
+```
+
+6. List and Dictionary Types: Use the List and Dict types from the typing module for collections.
+
+```python
+from typing import List, Dict
+
+def process_data(data: List[int]) -> Dict[str, int]:
+    # Function logic here
+    return {"sum": sum(data)}  # Example return value
+```
+
+7. Type Aliases: Create type aliases for complex types to make the code more readable.
+
+```python
+from typing import List, Tuple
+
+Coordinates = List[Tuple[int, int]]
+
+def draw_shape(points: Coordinates) -> None:
+    # Function logic here
+    pass
+```
+
+**Example of Type Hinting in a Class**
+Here is a more comprehensive example using type hints in a class:
+
+```python
+from typing import List
+
+class Student:
+    def __init__(self, name: str, age: int, grades: List[int]) -> None:
+        self.name = name
+        self.age = age
+        self.grades = grades
+
+    def average_grade(self) -> float:
+        return sum(self.grades) / len(self.grades)
+
+    def add_grade(self, grade: int) -> None:
+        self.grades.append(grade)
+
+# Example usage
+student = Student("Alice", 20, [90, 85, 88])
+print(student.average_grade())  # Output: 87.66666666666667
+student.add_grade(92)
+print(student.average_grade())  # Output: 88.75
+```
+
+### Conclusion
+Type hinting in Python enhances code readability, facilitates error detection through static analysis, and improves tooling support. By adopting
+type hinting, you can write clearer and more maintainable code, reducing the likelihood of bugs and making your codebase easier to navigate for yourself and others.
diff --git a/contrib/api-development/api-methods.md b/contrib/api-development/api-methods.md new file mode 100644 index 00000000..ea419799 --- /dev/null +++ b/contrib/api-development/api-methods.md @@ -0,0 +1,56 @@
+# API Methods
+
+| Method  | Summary                                                  | CRUD   | Accepts Request Body | Idempotent | Safe | Response Body |
+|---------|----------------------------------------------------------|--------|----------------------|------------|------|---------------|
+| GET     | To fetch a single resource or group of resources         | Read   | No                   | Yes        | Yes  | Yes           |
+| PUT     | To update an entire resource in one go                   | Update | Yes                  | Yes        | No   | Yes           |
+| POST    | To create a new resource                                 | Create | Yes                  | No         | No   | Yes           |
+| PATCH   | To partially update a resource                           | Update | Yes                  | No         | No   | Yes           |
+| DELETE  | To delete a resource                                     | Delete | No                   | Yes        | No   | No            |
+| OPTIONS | To get information on permitted operations               | Read   | No                   | Yes        | Yes  | Yes           |
+| HEAD    | To get metadata of the endpoint                          | Read   | No                   | Yes        | Yes  | No            |
+| TRACE   | For diagnosing purposes                                  | Read   | No                   | Yes        | Yes  | No            |
+| CONNECT | To make the two-way connection between the client and the resource | - | No | No | No | No |
+
+## Method Details:
+
+- **GET**:
+  - The GET method is used to retrieve data from a specified resource. It does not typically alter the state of the resource and is safe and idempotent. It can accept query parameters in the URL to filter or sort the results. A response body is returned with the requested data.
+
+- **POST**:
+  - The POST method is used to submit data to be processed to a specified resource. It is commonly used for creating new resources, and it may or may not require a request body containing the data to be created. It is not idempotent nor safe. A response body is returned with the newly created resource's details.
+
+- **PUT**:
+  - The PUT method is used to update a specified resource with new data. It typically requires a request body containing the complete representation of the resource to be updated. It is idempotent and not safe. A response body is returned with the updated resource's details.
+
+- **PATCH**:
+  - The PATCH method is used to apply partial modifications to a resource. It typically requires a request body containing the specific changes to be made. It is not idempotent nor safe. A response body is returned with the updated resource's details.
+
+- **DELETE**:
+  - The DELETE method is used to delete a specified resource. It does not typically require a request body, as the resource to be deleted is identified in the request URI. It is idempotent but not safe.
No response body is returned.
+
+- **OPTIONS**:
+  - The OPTIONS method is used to describe the communication options for the target resource. It does not typically require a request body and is safe and idempotent. A response body is returned with information about the supported HTTP methods and other metadata.
+
+- **HEAD**:
+  - The HEAD method is similar to the GET method, but it only retrieves the headers of the response without the body. It does not typically require a request body and is safe and idempotent.
+
+- **TRACE**:
+  - The TRACE method is used to test the connectivity between the client and the server. It does not typically require a request body and is safe and idempotent.
+
+- **CONNECT**:
+  - The CONNECT method is used to establish a tunnel to the server using a proxy. It does not typically require a request body and is neither safe nor idempotent.
+
+### Definitions:
+
+- **CRUD**:
+  - CRUD stands for Create, Read, Update, and Delete, representing the four basic functions of persistent storage. These operations are commonly used in database and RESTful API designs.
+
+- **Accepts Request Body**:
+  - Indicates whether the HTTP method typically accepts a request body containing data to be processed or modified. If yes, the method may require the client to include data in the request body.
+
+- **Idempotent**:
+  - An idempotent operation means that making the same request multiple times will produce the same result as making it once. In the context of HTTP methods, an idempotent method does not change the server state after multiple identical requests.
+
+- **Safe**:
+  - A safe operation does not modify the state of the server or its resources. It only retrieves data without causing any side effects. Safe methods are typically used for read-only operations.
\ No newline at end of file diff --git a/contrib/api-development/assets/image.png b/contrib/api-development/assets/image.png new file mode 100644 index 00000000..682e9ed4 Binary files /dev/null and b/contrib/api-development/assets/image.png differ diff --git a/contrib/api-development/assets/image2.png b/contrib/api-development/assets/image2.png new file mode 100644 index 00000000..298a726b Binary files /dev/null and b/contrib/api-development/assets/image2.png differ diff --git a/contrib/api-development/fast-api.md b/contrib/api-development/fast-api.md new file mode 100644 index 00000000..b67c56ef --- /dev/null +++ b/contrib/api-development/fast-api.md @@ -0,0 +1,289 @@
+
+# FastAPI
+
+
+## Table of Contents
+
+- [Introduction](#introduction)
+- [Features](#features)
+- [Installation](#installation)
+- [Making First API](#making-first-api)
+  - [GET Method](#get-method)
+  - [Running Server and calling API](#running-server-and-calling-api)
+- [Path Parameters](#path-parameters)
+- [Query Parameters](#query-parameters)
+- [POST Method](#post-method)
+- [PUT Method](#put-method)
+- [Additional Content](#additional-content)
+  - [Swagger UI](#swagger-ui)
+
+## Introduction
+FastAPI is a modern web framework for building APIs with Python.
+It requires Python 3.7+.
+## Features
+
+1. **Speed ⚡:** FastAPI is built on top of Starlette, a lightweight ASGI framework. It's designed for high performance and handles thousands of requests per second.
+2. **Easy to use 😃:** FastAPI is designed to be intuitive and easy to use, especially for developers familiar with Python. It uses standard Python type hints for request and response validation, making it easy to understand and write code.
+3. **Automatic Interactive API Documentation generation 🤩:** FastAPI automatically generates interactive API documentation (Swagger UI or ReDoc) based on your code and type annotations. Swagger UI also allows you to test API endpoints.
+4. **Asynchronous Support 🔁:** FastAPI fully supports asynchronous programming, allowing you to write asynchronous code with async/await syntax. This enables handling high-concurrency scenarios and improves overall performance.
+
+Now, let's get hands-on with FastAPI.
+
+
+## Installation
+
+Make sure that you have Python version 3.7 or greater.
+
+Then, simply open your command shell and give the following command.
+
+```bash
+ pip install fastapi
+```
+After this, you need to install uvicorn, the ASGI server on which we will be running our API.
+
+```bash
+ pip install uvicorn
+```
+
+
+
+## Making First API
+
+After successful installation, we will move on to making an API and seeing how to use it.
+
+The first thing in an API is its root/index page, which is sent as the response when the API is called.
+
+Follow the given steps to make your first FastAPI app 🫨
+
+First, let's import FastAPI to get things started.
+
+```python
+from fastapi import FastAPI
+app = FastAPI()
+```
+Now, we will write the ``GET`` method for the root of the API. As you have already seen, GET is the ``HTTP request`` method used to fetch data from a source. In web development, it is primarily used to *retrieve data* from a server.
+
+The root of the app is ``"/"``. When the API is called, the response will be generated on this URL: ``localhost:8000``
+
+### GET method
+Following is the code for the ``GET`` method of the root endpoint.
+
+When the API is called, the ``read_root()`` function is hit and a JSON response is returned, which will be shown in your web browser.
+
+```python
+@app.get("/")
+def read_root():
+    return {"Hello": "World"}
+
+```
+
+Tadaaa! You have made your first FastAPI app! Now let's run it!
+
+### Running Server and calling API
+
+Open your terminal and give the following command:
+```bash
+uvicorn myapi:app --reload
+```
+Here, ``myapi`` is the name of your API, which is the name of your Python file. ``app`` is the name you have given to your API in the assignment ``app = FastAPI()``.
+
+After running this command, the uvicorn server will be live and you can access your API.
+
+As right now we have only written the root ``GET`` method, only its corresponding response will be displayed.
+
+On running this API, we get the response in JSON form:
+
+```json
+{
+    "Hello": "World"
+}
+```
+## Path Parameters
+Path parameters are a way to send variables to an API endpoint so that an operation may be performed on them.
+
+This feature is particularly useful for defining routes that need to operate on resources identified by unique identifiers, such as user IDs, product IDs, or any other unique value.
+
+### Example
+Let's take an example to make it understandable.
+
+
+Assume that we have some students 🧑‍🎓 in our class and we have saved their data in the form of a dictionary in our API (in practical scenarios they would be saved in a database and the API would query the database).
+So we have a student dictionary that looks something like this:
+
+```python
+students = {
+    1: {
+        "name": "John",
+        "age": 17,
+        "class": "year 12"
+    },
+    2: {
+        "name": "Jane",
+        "age": 16,
+        "class": "year 11"
+    },
+    3: {
+        "name": "Alice",
+        "age": 17,
+        "class": "year 12"
+    }
+}
+```
+Here, the keys are ``student_id``.
+
+Let's say the user wants the data of the student whose ID is 2.
Here, we will take the ID as a **path parameter** from the user and return the data of that ID.
+
+
+Let's see how it is done!
+
+```python
+@app.get("/students/{student_id}")
+def read_student(student_id: int):
+    return students[student_id]
+```
+Here is the explanatory breakdown of the method:
+
+- ``/students`` is the URL of the students endpoint in the API.
+- ``{student_id}`` is the path parameter, which is a dynamic variable the user will give to fetch the record of a particular student.
+- ``def read_student(student_id: int)`` is the signature of the function, which takes the student_id we got from the path parameter. Its type is defined as ``int`` as our ID will be an integer.
+**Note that the parameter is automatically type checked. If it does not match the type defined in the method, an error response ⛔ will be generated.**
+
+- ``return students[student_id]`` will return the data of the required student from the dictionary.
+
+When the user passes the URL ``http://127.0.0.1:8000/students/1``, the data of the student with student_id=1 is fetched and displayed.
+In this case the following output will be displayed:
+
+```json
+{
+    "name": "John",
+    "age": 17,
+    "class": "year 12"
+}
+```
+
+## Query Parameters
+Query parameters in FastAPI allow you to pass data to your API endpoints via the URL's query string. This is useful for filtering, searching, and other operations that do not fit well with path parameters.
+
+Query parameters are specified after the ``?`` symbol in the URL and are typically used for optional parameters.
+
+### Example
+Let's continue the example of students to understand query parameters.
+
+Assume that we want to search students by name. In this case, we will send the data in a query parameter, which will be read by our method, and the respective result will be returned.
+
+Let's see the method:
+
+```python
+@app.get("/get-by-name")
+def read_student(name: str):
+    for student_id in students:
+        if students[student_id]["name"] == name:
+            return students[student_id]
+    return {"Error": "Student not found"}
+```
+Here is the explanatory breakdown of this process:
+
+- ``/get-by-name`` is the URL of the endpoint. After this URL, the client will enter the query parameter(s).
+- ``http://127.0.0.1:8000/get-by-name?name=Jane`` In this URL, ``name=Jane`` is the query parameter. It means that the user needs to search for the student whose name is Jane. When you hit this URL, the ``read_student(name: str)`` method is called and the respective response is returned.
+
+In this case, the output will be:
+```json
+{
+    "name": "Jane",
+    "age": 16,
+    "class": "year 11"
+}
+```
+If we pass a name that doesn't exist in the dictionary, the error response will be returned.
+
+## POST Method
+The ``POST`` method in FastAPI is used to **create resources** or submit data to an API endpoint. This method typically involves sending data in the request body, which the server processes to create or modify resources.
+
+**⛔ In the case of the ``GET`` method, the sent data is part of the URL, but in the case of the ``POST`` method, the sent data is part of the request body.**
+
+### Example
+Continuing with the example of students, let's assume we need to add a student.
Following is the ``POST`` method to do this:
+
+```python
+@app.post("/create-student/{student_id}")
+def create_student(student_id: int, student: dict):
+    if student_id in students:
+        return {"Error": "Student exists"}
+    students[student_id] = student
+    return students
+```
+Here is the explanation of the process:
+
+- ``/create-student/{student_id}`` shows that only the student_id will be part of the URL; the rest of the data will be sent in the request body.
+- Data in the request body will be in JSON format and will be received in ``student: dict``.
+- Data sent in JSON format is given as:
+```json
+{
+"name": "Seerat",
+"age": 22,
+"class": "8 sem"
+
+}
+```
+*Note:* I have used Swagger UI to send data in the request body to test my ``POST`` method, but you may use any other API testing tool like Postman etc.
+
+- This new student will be added to the dictionary, and if the operation is successful, the new dictionary will be returned as the response.
+
+Following is the output of this ``POST`` method call:
+
+```json
+{
+    "1": {
+        "name": "John",
+        "age": 17,
+        "class": "year 12"
+    },
+    "2": {
+        "name": "Jane",
+        "age": 16,
+        "class": "year 11"
+    },
+    "3": {
+        "name": "Alice",
+        "age": 17,
+        "class": "year 12"
+    },
+    "4": {
+        "name": "Seerat",
+        "age": 22,
+        "class": "8 sem"
+    }
+}
+```
+
+## PUT Method
+The ``PUT`` method in FastAPI is used to **update** existing resources or create resources if they do not already exist. It is one of the standard HTTP methods and is idempotent, meaning that multiple identical requests should have the same effect as a single request.
+
+### Example
+Let's update the record of a student.
+
+```python
+@app.put("/update-student/{student_id}")
+def update_student(student_id: int, student: dict):
+    if student_id not in students:
+        return {"Error": "Student does not exist"}
+    students[student_id] = student
+    return students
+```
+The ``PUT`` method is nearly the same as the ``POST`` method, but ``PUT`` is idempotent while ``POST`` is not.
+
+The given method will update an existing student record, and if the student doesn't exist, it'll send an error response.
+
+## Additional Content
+
+### Swagger UI
+
+Swagger UI automatically generates a UI for API testing. Just append ``/docs`` to the URL and the UI mode of Swagger UI will be launched.
+
+The following screenshot shows the Swagger UI:
+![App Screenshot](assets/image.png)
+
+Here is how I tested the ``POST`` method in the UI:
+![Screenshot](assets/image2.png)
+
+That's all for FastAPI for now.... Happy Learning!
diff --git a/contrib/api-development/index.md b/contrib/api-development/index.md new file mode 100644 index 00000000..8d4dc595 --- /dev/null +++ b/contrib/api-development/index.md @@ -0,0 +1,4 @@
+# List of sections
+
+- [API Methods](api-methods.md)
+- [FastAPI](fast-api.md)
diff --git a/contrib/database/index.md b/contrib/database/index.md new file mode 100644 index 00000000..bc3d7e67 --- /dev/null +++ b/contrib/database/index.md @@ -0,0 +1,4 @@
+# List of sections
+
+- [Introduction to MySQL and Queries](intro_mysql_queries.md)
+- [SQLAlchemy and Aggregation Functions](sqlalchemy-aggregation.md)
diff --git a/contrib/database/intro_mysql_queries.md b/contrib/database/intro_mysql_queries.md new file mode 100644 index 00000000..b955ead6 --- /dev/null +++ b/contrib/database/intro_mysql_queries.md @@ -0,0 +1,371 @@
+# Introduction to MySQL Queries
+MySQL is a widely-used open-source relational database management system (RDBMS) that utilizes SQL (Structured Query Language) for managing and querying data.
In Python, the **mysql-connector-python** library allows you to connect to MySQL databases and execute SQL queries, providing a way to interact with the database from within a Python program.
+
+## Prerequisites
+* Python and MySQL Server must be installed and configured.
+* The library **mysql-connector-python** must be installed.
+
+## Establishing a connection with the server
+To establish a connection with the MySQL server, you need to import the **mysql.connector** module and create a connection object using the **connect()** function by providing the server details as shown below.
+
+```python
+import mysql.connector
+
+con = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    passwd="12345"
+)
+
+print(con.is_connected())
+```
+Having established a connection with the server, you get the following output:
+```
+True
+```
+## Creating a Database [CREATE]
+To create a database, you need to execute the **CREATE DATABASE** query. The following code snippet demonstrates how to create a database named **GSSOC**.
+```python
+import mysql.connector
+
+# Establish the connection
+conn = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    password="12345"
+)
+
+# Create a cursor object
+cursor = conn.cursor()
+
+# Execute the query to show databases
+cursor.execute("SHOW DATABASES")
+
+# Fetch and print the databases
+databases = cursor.fetchall()
+for database in databases:
+    print(database[0])
+
+# Execute the query to create database GSSOC
+cursor.execute("CREATE DATABASE GSSOC")
+
+print("\nAfter creation of the database\n")
+
+# Execute the query to show databases
+cursor.execute("SHOW DATABASES")
+# Fetch and print the databases
+databases = cursor.fetchall()
+for database in databases:
+    print(database[0])
+
+cursor.close()
+conn.close()
+```
+You can observe in the output below that, after execution of the query, a new database named **GSSOC** has been created.
+#### Output:
+```
+information_schema
+mysql
+performance_schema
+sakila
+sys
+world
+
+After creation of the database
+
+gssoc
+information_schema
+mysql
+performance_schema
+sakila
+sys
+world
+```
+## Creating a Table in the Database [CREATE]
+Now, we will create a table in the database. We will create a table named **example_table** in the database **GSSOC**.
We will execute **CREATE TABLE** query and provide the fields for the table as mentioned in the code below: +```python +import mysql.connector + +# Establish the connection +conn = mysql.connector.connect( + host="localhost", + user="root", + password="12345" +) +# Create a cursor object +cursor = conn.cursor() + +# Execute the query to show tables +cursor.execute("USE GSSOC") +cursor.execute("SHOW TABLES") + +# Fetch and print the tables +tables = cursor.fetchall() +print("Before creation of table\n") +for table in tables: + print(table[0]) + +create_table_query = """ +CREATE TABLE example_table ( + name VARCHAR(255) NOT NULL, + age INT NOT NULL, + email VARCHAR(255) +) +""" +# Execute the query +cursor.execute(create_table_query) + +# Commit the changes +conn.commit() + +print("\nAfter creation of Table\n") +# Execute the query to show tables in GSSOC +cursor.execute("SHOW TABLES") + +# Fetch and print the tables +tables = cursor.fetchall() +for table in tables: + print(table[0]) + +cursor.close() +conn.close() +``` +#### Output: +``` +Before creation of table + + +After creation of Table + +example_table +``` +## Inserting Data [INSERT] +To insert data in an existing table, the **INSERT INTO** query is used, followed by the name of the table in which the data needs to be inserted. The following code demonstrates the insertion of multiple records in the table by **executemany()**. +```python +import mysql.connector + +# Establish the connection +conn = mysql.connector.connect( + host="localhost", + user="root", + password="12345" +) +# Create a cursor object +cursor = conn.cursor() +cursor.execute("USE GSSOC") +# SQL query to insert data +insert_data_query = """ +INSERT INTO example_table (name, age, email) +VALUES (%s, %s, %s) +""" + +# Data to be inserted +data_to_insert = [ + ("John Doe", 28, "john.doe@example.com"), + ("Jane Smith", 34, "jane.smith@example.com"), + ("Sam Brown", 22, "sam.brown@example.com") +] + +# Execute the query for each data entry +cursor.executemany(insert_data_query, data_to_insert) + +conn.commit() +cursor.close() +conn.close() +``` +## Displaying Data [SELECT] +To display the data from a table, the **SELECT** query is used. The following code demonstrates the display of data from the table. +```python +import mysql.connector + +# Establish the connection +conn = mysql.connector.connect( + host="localhost", + user="root", + password="12345" +) +# Create a cursor object +cursor = conn.cursor() +cursor.execute("USE GSSOC") + +# SQL query to display data +display_data_query = "SELECT * FROM example_table" + +# Execute the query for each data entry +cursor.execute(display_data_query) + +# Fetch all the rows +rows = cursor.fetchall() + +# Print the column names +column_names = [desc[0] for desc in cursor.description] +print(column_names) + +# Print the rows +for row in rows: + print(row) + +cursor.close() +conn.close() +``` +#### Output : +``` +['name', 'age', 'email'] +('John Doe', 28, 'john.doe@example.com') +('Jane Smith', 34, 'jane.smith@example.com') +('Sam Brown', 22, 'sam.brown@example.com') +``` +## Updating Data [UPDATE] +To update data in the table, **UPDATE** query is used. In the following code, we will be updating the email and age of the record where the name is John Doe. 
+```python
+import mysql.connector
+
+# Establish the connection
+conn = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    password="12345"
+)
+# Create a cursor object
+cursor = conn.cursor()
+cursor.execute("USE GSSOC")
+
+# SQL query to display data
+display_data_query = "SELECT * FROM example_table"
+
+# SQL query to update data of John Doe
+update_data_query = """
+UPDATE example_table
+SET age = %s, email = %s
+WHERE name = %s
+"""
+
+# Data to be updated
+data_to_update = (30, "new.email@example.com", "John Doe")
+
+# Execute the query
+cursor.execute(update_data_query, data_to_update)
+
+# Commit the changes
+conn.commit()
+
+# Execute the query for each data entry
+cursor.execute(display_data_query)
+
+# Fetch all the rows
+rows = cursor.fetchall()
+
+# Print the column names
+column_names = [desc[0] for desc in cursor.description]
+print(column_names)
+
+# Print the rows
+for row in rows:
+    print(row)
+
+cursor.close()
+conn.close()
+```
+#### Output:
+```
+['name', 'age', 'email']
+('John Doe', 30, 'new.email@example.com')
+('Jane Smith', 34, 'jane.smith@example.com')
+('Sam Brown', 22, 'sam.brown@example.com')
+```
+
+## Deleting Data [DELETE]
+In this segment, we will delete the record named "John Doe" using the **DELETE** and **WHERE** statements in the query. The following code demonstrates this; observe the change in the output.
+```python
+import mysql.connector
+
+# Establish the connection
+conn = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    password="12345"
+)
+# Create a cursor object
+cursor = conn.cursor()
+cursor.execute("USE GSSOC")
+
+# SQL query to display data
+display_data_query = "SELECT * FROM example_table"
+
+# SQL query to delete data
+delete_data_query = "DELETE FROM example_table WHERE name = %s"
+
+# Data to be deleted
+data_to_delete = ("John Doe",)
+
+# Execute the query
+cursor.execute(delete_data_query, data_to_delete)
+
+# Commit the changes
+conn.commit()
+
+# Execute the query for each data entry
+cursor.execute(display_data_query)
+
+# Fetch all the rows
+rows = cursor.fetchall()
+
+# Print the column names
+column_names = [desc[0] for desc in cursor.description]
+print(column_names)
+
+# Print the rows
+for row in rows:
+    print(row)
+
+cursor.close()
+conn.close()
+```
+#### Output:
+```
+['name', 'age', 'email']
+('Jane Smith', 34, 'jane.smith@example.com')
+('Sam Brown', 22, 'sam.brown@example.com')
+```
+## Deleting the Table/Database [DROP]
+For deleting a table, you can use the **DROP** query in the following manner:
+```python
+import mysql.connector
+
+# Establish the connection
+conn = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    password="12345"
+)
+# Create a cursor object
+cursor = conn.cursor()
+cursor.execute("USE GSSOC")
+
+# SQL query to delete the table
+delete_table_query = "DROP TABLE IF EXISTS example_table"
+
+# Execute the query
+cursor.execute(delete_table_query)
+
+# Verify the table deletion
+cursor.execute("SHOW TABLES LIKE 'example_table'")
+result = cursor.fetchone()
+
+cursor.close()
+conn.close()
+
+if result:
+    print("Table deletion failed.")
+else:
+    print("Table successfully deleted.")
+```
+#### Output:
+```
+Table successfully deleted.
+```
+Similarly, you can delete the database itself by using **DROP** and changing the query accordingly, as sketched below.
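+
+As a minimal sketch (an illustrative addition following the same pattern as the snippets above), dropping the whole **GSSOC** database could look like this:
+
+```python
+import mysql.connector
+
+# Establish the connection
+conn = mysql.connector.connect(
+    host="localhost",
+    user="root",
+    password="12345"
+)
+cursor = conn.cursor()
+
+# DROP DATABASE removes the database and every table inside it
+cursor.execute("DROP DATABASE IF EXISTS GSSOC")
+
+cursor.close()
+conn.close()
+```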
+
+
+
+
diff --git a/contrib/database/sqlalchemy-aggregation.md b/contrib/database/sqlalchemy-aggregation.md new file mode 100644 index 00000000..9fce96c0 --- /dev/null +++ b/contrib/database/sqlalchemy-aggregation.md @@ -0,0 +1,123 @@
+# SQLAlchemy
+SQLAlchemy is a powerful and flexible SQL toolkit and Object-Relational Mapping (ORM) library for Python. It is a versatile library that bridges the gap between Python applications and relational databases.
+
+SQLAlchemy allows the user to write database-agnostic code that can work with a variety of relational databases such as SQLite, MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. The ORM layer in SQLAlchemy allows developers to map Python classes to database tables. This means you can interact with your database using Python objects instead of writing raw SQL queries.
+
+## Setting up the Environment
+* Python and MySQL Server must be installed and configured.
+* The libraries **mysql-connector-python** and **sqlalchemy** must be installed.
+
+```bash
+pip install sqlalchemy mysql-connector-python
+```
+
+* If they are not installed, you can install them using the above command in a terminal.
+
+## Establishing Connection with Database
+
+* Create a connection with the database using the following code snippet:
+```python
+from sqlalchemy import create_engine
+from sqlalchemy.orm import declarative_base
+from sqlalchemy.orm import sessionmaker
+
+DATABASE_URL = 'mysql+mysqlconnector://root:12345@localhost/gssoc'
+
+engine = create_engine(DATABASE_URL)
+Session = sessionmaker(bind=engine)
+session = Session()
+
+Base = declarative_base()
+```
+
+* The connection string **DATABASE_URL** is passed as an argument to the **create_engine** function, which is used to create a connection to the database. This connection string contains the database credentials such as the database type, username, password, and database name.
+* The **sessionmaker** function is used to create a session object which is used to interact with the database.
+* The **declarative_base** function is used to create a base class for all the database models. This base class is used to define the structure of the database tables.
+
+## Creating Tables
+
+* The following code snippet creates a table named **"products"** in the database:
+```python
+from sqlalchemy import Column, Integer, String, Float
+
+class Product(Base):
+    __tablename__ = 'products'
+    id = Column(Integer, primary_key=True)
+    name = Column(String(50))
+    category = Column(String(50))
+    price = Column(Float)
+    quantity = Column(Integer)
+
+Base.metadata.create_all(engine)
+```
+
+* The **Product class** inherits from **Base**, which is a base class for all the database models.
+* The **Base.metadata.create_all(engine)** statement is used to create the table in the database. The engine object is a connection to the database that was created earlier.
+
+## Inserting Data for Aggregation Functions
+
+* The following code snippet inserts data into the **"products"** table:
+```python
+products = [
+    Product(name='Laptop', category='Electronics', price=1000, quantity=50),
+    Product(name='Smartphone', category='Electronics', price=700, quantity=150),
+    Product(name='Tablet', category='Electronics', price=400, quantity=100),
+    Product(name='Headphones', category='Accessories', price=100, quantity=200),
+    Product(name='Charger', category='Accessories', price=20, quantity=300),
+]
+
+session.add_all(products)
+session.commit()
+```
+
+* A list of **Product** objects is created.
Each Product object represents a row in the **products table** in the database. +* The **add_all** method of the session object is used to add all the Product objects to the session. This method takes a **list of objects as an argument** and adds them to the session. +* The **commit** method of the session object is used to commit the changes made to the database. + +## Aggregation Functions + +SQLAlchemy provides functions that correspond to SQL aggregation functions and are available in the **sqlalchemy.func module**. + +### COUNT + +The **COUNT** function returns the number of rows in a result set. It can be demonstrated using the following code snippet: +```python +from sqlalchemy import func + +total_products = session.query(func.count(Product.id)).scalar() +print(f'Total products: {total_products}') +``` + +### SUM + +The **SUM** function returns the sum of all values in a column. It can be demonstrated using the following code snippet: +```python +total_price = session.query(func.sum(Product.price)).scalar() +print(f'Total price of all products: {total_price}') +``` + +### AVG + +The **AVG** function returns the average of all values in a column. It can be demonstrated by the following code snippet: +```python +average_price = session.query(func.avg(Product.price)).scalar() +print(f'Average price of products: {average_price}') +``` + +### MAX + +The **MAX** function returns the maximum value in a column. It can be demonstrated using the following code snippet : +```python +max_price = session.query(func.max(Product.price)).scalar() +print(f'Maximum price of products: {max_price}') +``` + +### MIN + +The **MIN** function returns the minimum value in a column. It can be demonstrated using the following code snippet: +```python +min_price = session.query(func.min(Product.price)).scalar() +print(f'Minimum price of products: {min_price}') +``` + +In general, the aggregation functions can be implemented by utilising the **session** object to execute the desired query on the table present in a database using the **query()** method. The **scalar()** method is called on the query object to execute the query and return a single value diff --git a/contrib/ds-algorithms/avl-trees.md b/contrib/ds-algorithms/avl-trees.md new file mode 100644 index 00000000..b87e82cb --- /dev/null +++ b/contrib/ds-algorithms/avl-trees.md @@ -0,0 +1,185 @@ +# AVL Tree + +In Data Structures and Algorithms, an **AVL Tree** is a self-balancing binary search tree (BST) where the difference between heights of left and right subtrees cannot be more than one for all nodes. It ensures that the tree remains balanced, providing efficient search, insertion, and deletion operations. + +## Points to be Remembered + +- **Balance Factor**: The difference in heights between the left and right subtrees of a node. It should be -1, 0, or +1 for all nodes in an AVL tree. +- **Rotations**: Tree rotations (left, right, left-right, right-left) are used to maintain the balance factor within the allowed range. + +## Real Life Examples of AVL Trees + +- **Databases**: AVL trees can be used to maintain large indexes for database tables, ensuring quick data retrieval. +- **File Systems**: Some file systems use AVL trees to keep track of free and used memory blocks. + +## Applications of AVL Trees + +AVL trees are used in various applications in Computer Science: + +- **Database Indexing** +- **Memory Allocation** +- **Network Routing Algorithms** + +Understanding these applications is essential for Software Development. 
+ +## Operations in AVL Tree + +Key operations include: + +- **INSERT**: Insert a new element into the AVL tree. +- **SEARCH**: Find the position of an element in the AVL tree. +- **DELETE**: Remove an element from the AVL tree. + +## Implementing AVL Tree in Python + +```python +class AVLTreeNode: + def __init__(self, key): + self.key = key + self.left = None + self.right = None + self.height = 1 + +class AVLTree: + def insert(self, root, key): + if not root: + return AVLTreeNode(key) + + if key < root.key: + root.left = self.insert(root.left, key) + else: + root.right = self.insert(root.right, key) + + root.height = 1 + max(self.getHeight(root.left), self.getHeight(root.right)) + balance = self.getBalance(root) + + if balance > 1 and key < root.left.key: + return self.rotateRight(root) + if balance < -1 and key > root.right.key: + return self.rotateLeft(root) + if balance > 1 and key > root.left.key: + root.left = self.rotateLeft(root.left) + return self.rotateRight(root) + if balance < -1 and key < root.right.key: + root.right = self.rotateRight(root.right) + return self.rotateLeft(root) + + return root + + def search(self, root, key): + if not root or root.key == key: + return root + + if key < root.key: + return self.search(root.left, key) + + return self.search(root.right, key) + + def delete(self, root, key): + if not root: + return root + + if key < root.key: + root.left = self.delete(root.left, key) + elif key > root.key: + root.right = self.delete(root.right, key) + else: + if root.left is None: + temp = root.right + root = None + return temp + elif root.right is None: + temp = root.left + root = None + return temp + + temp = self.getMinValueNode(root.right) + root.key = temp.key + root.right = self.delete(root.right, temp.key) + + if root is None: + return root + + root.height = 1 + max(self.getHeight(root.left), self.getHeight(root.right)) + balance = self.getBalance(root) + + if balance > 1 and self.getBalance(root.left) >= 0: + return self.rotateRight(root) + if balance < -1 and self.getBalance(root.right) <= 0: + return self.rotateLeft(root) + if balance > 1 and self.getBalance(root.left) < 0: + root.left = self.rotateLeft(root.left) + return self.rotateRight(root) + if balance < -1 and self.getBalance(root.right) > 0: + root.right = self.rotateRight(root.right) + return self.rotateLeft(root) + + return root + + def rotateLeft(self, z): + y = z.right + T2 = y.left + y.left = z + z.right = T2 + z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right)) + y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right)) + return y + + def rotateRight(self, z): + y = z.left + T3 = y.right + y.right = z + z.left = T3 + z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right)) + y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right)) + return y + + def getHeight(self, root): + if not root: + return 0 + return root.height + + def getBalance(self, root): + if not root: + return 0 + return self.getHeight(root.left) - self.getHeight(root.right) + + def getMinValueNode(self, root): + if root is None or root.left is None: + return root + return self.getMinValueNode(root.left) + + def preOrder(self, root): + if not root: + return + print(root.key, end=' ') + self.preOrder(root.left) + self.preOrder(root.right) + +#Example usage +avl_tree = AVLTree() +root = None + +root = avl_tree.insert(root, 10) +root = avl_tree.insert(root, 20) +root = avl_tree.insert(root, 30) +root = avl_tree.insert(root, 40) +root = avl_tree.insert(root, 50) +root = 
avl_tree.insert(root, 25)
+
+print("Preorder traversal of the AVL tree is:")
+avl_tree.preOrder(root)
+```
+
+## Output
+
+```markdown
+Preorder traversal of the AVL tree is:
+30 20 10 25 40 50
+```
+
+## Complexity Analysis
+
+- **Insertion**: O(log n). Inserting a node involves traversing the height of the tree, which is logarithmic due to the balancing property.
+- **Search**: O(log n). Searching for a node involves traversing the height of the tree.
+- **Deletion**: O(log n). Deleting a node involves traversing and potentially rebalancing the tree, maintaining the logarithmic height.
\ No newline at end of file
diff --git a/contrib/ds-algorithms/binary-tree.md b/contrib/ds-algorithms/binary-tree.md
new file mode 100644
index 00000000..03da2cf8
--- /dev/null
+++ b/contrib/ds-algorithms/binary-tree.md
@@ -0,0 +1,231 @@
+# Binary Tree
+
+A binary tree is a non-linear data structure in which each node can have at most two children, known as the left and the right child. It is a hierarchical data structure represented in the following way:
+
+```
+          A...................Level 0
+         / \
+        B   C.................Level 1
+       / \   \
+      D   E   G...............Level 2
+```
+
+## Basic Terminologies
+
+- **Root node:** The topmost node in a tree is the root node. The root node does not have any parent. In the above example, **A** is the root node.
+- **Parent node:** The predecessor of a node is called the parent of that node. **A** is the parent of **B** and **C**, **B** is the parent of **D** and **E** and **C** is the parent of **G**.
+- **Child node:** The successor of a node is called the child of that node. **B** and **C** are children of **A**, **D** and **E** are children of **B** and **G** is the right child of **C**.
+- **Leaf node:** Nodes without any children are called the leaf nodes. **D**, **E** and **G** are the leaf nodes.
+- **Ancestor node:** Predecessor nodes on the path from the root to that node are called ancestor nodes. **A** and **B** are the ancestors of **E**.
+- **Descendant node:** Nodes on any path from that node down to a leaf are called its descendant nodes. **B** and **E** are descendants of **A**.
+- **Sibling node:** Nodes having the same parent are called sibling nodes. **B** and **C** are sibling nodes and so are **D** and **E**.
+- **Level (Depth) of a node:** Number of edges in the path from the root to that node is the level of that node. The root node is always at level 0. The level of the deepest node is the depth of the tree.
+- **Height of a node:** Number of edges in the path from that node to the deepest leaf is the height of that node. The height of the root is the height of the tree. The height of node **A** is 2, of nodes **B** and **C** is 1, and of nodes **D**, **E** and **G** is 0.
+
+## Types Of Binary Trees
+
+- **Full Binary Tree:** A binary tree where each node has 0 or 2 children is a full binary tree.
+```
+        A
+       / \
+      B   C
+     / \
+    D   E
+```
+- **Complete Binary Tree:** A binary tree in which all levels are completely filled, except possibly the last, is a complete binary tree. In the last level, nodes are filled from the left side.
+```
+          A
+         / \
+        /   \
+       B     C
+      / \   /
+     D   E F
+```
+- **Perfect Binary Tree:** A binary tree in which every internal node has exactly two children and all leaf nodes are at the same level is called a perfect binary tree.
+```
+          A
+         / \
+        /   \
+       B     C
+      / \   / \
+     D   E F   G
+```
+- **Skewed Binary Tree:** A binary tree in which each node has either 0 or 1 child is called a skewed binary tree. It is of two types - left skewed binary tree and right skewed binary tree.
+``` + A A + \ / + B B + \ / + C C + Right skewed binary tree Left skewed binary tree +``` +- **Balanced Binary Tree:** A binary tree in which the height difference between the left and right subtree is not more than one and the subtrees are also balanced is a balanced binary tree. +``` + A + / \ + B C + / \ + D E +``` + +## Real Life Applications Of Binary Tree + +- **File Systems:** File systems employ binary trees to organize the folders and files, facilitating efficient search and access of files. +- **Decision Trees:** Decision tree, a supervised learning algorithm, utilizes binary trees, with each node representing a decision and its edges showing the possible outcomes. +- **Routing Algorithms:** In routing algorithms, binary trees are used to efficiently transfer data packets from the source to destination through a network of nodes. +- **Searching and sorting Algorithms:** Searching algorithms like binary search and sorting algorithms like heapsort heavily rely on binary trees. + +## Implementation of Binary Tree + +```python +from collections import deque + +class Node: + def __init__(self, data): + self.data = data + self.left = None + self.right = None + +class Binary_tree: + @staticmethod + def insert(root, data): + if root is None: + return Node(data) + q = deque() + q.append(root) + while q: + temp = q.popleft() + if temp.left is None: + temp.left = Node(data) + break + else: + q.append(temp.left) + if temp.right is None: + temp.right = Node(data) + break + else: + q.append(temp.right) + return root + + @staticmethod + def inorder(root): + if not root: + return + b.inorder(root.left) + print(root.data, end=" ") + b.inorder(root.right) + + @staticmethod + def preorder(root): + if not root: + return + print(root.data, end=" ") + b.preorder(root.left) + b.preorder(root.right) + + @staticmethod + def postorder(root): + if not root: + return + b.postorder(root.left) + b.postorder(root.right) + print(root.data, end=" ") + + @staticmethod + def levelorder(root): + if not root: + return + q = deque() + q.append(root) + while q: + temp = q.popleft() + print(temp.data, end=" ") + if temp.left is not None: + q.append(temp.left) + if temp.right is not None: + q.append(temp.right) + + @staticmethod + def delete(root, value): + q = deque() + q.append(root) + while q: + temp = q.popleft() + if temp is value: + temp = None + return + if temp.right: + if temp.right is value: + temp.right = None + return + else: + q.append(temp.right) + if temp.left: + if temp.left is value: + temp.left = None + return + else: + q.append(temp.left) + + @staticmethod + def delete_value(root, value): + if root is None: + return None + if root.left is None and root.right is None: + if root.data == value: + return None + else: + return root + x = None + q = deque() + q.append(root) + temp = None + while q: + temp = q.popleft() + if temp.data == value: + x = temp + if temp.left: + q.append(temp.left) + if temp.right: + q.append(temp.right) + if x: + y = temp.data + x.data = y + b.delete(root, temp) + return root + +b = Binary_tree() +root = None +root = b.insert(root, 10) +root = b.insert(root, 20) +root = b.insert(root, 30) +root = b.insert(root, 40) +root = b.insert(root, 50) +root = b.insert(root, 60) + +print("Preorder traversal:", end=" ") +b.preorder(root) + +print("\nInorder traversal:", end=" ") +b.inorder(root) + +print("\nPostorder traversal:", end=" ") +b.postorder(root) + +print("\nLevel order traversal:", end=" ") +b.levelorder(root) + +root = b.delete_value(root, 20) +print("\nLevel order traversal 
after deletion:", end=" ") +b.levelorder(root) +``` + +#### OUTPUT + +``` +Preorder traversal: 10 20 40 50 30 60 +Inorder traversal: 40 20 50 10 60 30 +Postorder traversal: 40 50 20 60 30 10 +Level order traversal: 10 20 30 40 50 60 +Level order traversal after deletion: 10 60 30 40 50 +``` diff --git a/contrib/ds-algorithms/deque.md b/contrib/ds-algorithms/deque.md new file mode 100644 index 00000000..2a5a77d2 --- /dev/null +++ b/contrib/ds-algorithms/deque.md @@ -0,0 +1,216 @@ +# Deque in Python + +## Definition +A deque, short for double-ended queue, is an ordered collection of items that allows rapid insertion and deletion at both ends. + +## Syntax +In Python, deques are implemented in the collections module: + +```py +from collections import deque + +# Creating a deque +d = deque(iterable) # Create deque from iterable (optional) +``` + +## Operations +1. **Appending Elements**: + + - append(x): Adds element x to the right end of the deque. + - appendleft(x): Adds element x to the left end of the deque. + + ### Program + ```py + from collections import deque + + # Initialize a deque + d = deque([1, 2, 3, 4, 5]) + print("Initial deque:", d) + + # Append elements + d.append(6) + print("After append(6):", d) + + # Append left + d.appendleft(0) + print("After appendleft(0):", d) + + ``` + ### Output + ```py + Initial deque: deque([1, 2, 3, 4, 5]) + After append(6): deque([1, 2, 3, 4, 5, 6]) + After appendleft(0): deque([0, 1, 2, 3, 4, 5, 6]) + ``` + +2. **Removing Elements**: + + - pop(): Removes and returns the rightmost element. + - popleft(): Removes and returns the leftmost element. + + ### Program + ```py + from collections import deque + + # Initialize a deque + d = deque([1, 2, 3, 4, 5]) + print("Initial deque:", d) + + # Pop from the right end + rightmost = d.pop() + print("Popped from right end:", rightmost) + print("Deque after pop():", d) + + # Pop from the left end + leftmost = d.popleft() + print("Popped from left end:", leftmost) + print("Deque after popleft():", d) + + ``` + + ### Output + ```py + Initial deque: deque([1, 2, 3, 4, 5]) + Popped from right end: 5 + Deque after pop(): deque([1, 2, 3, 4]) + Popped from left end: 1 + Deque after popleft(): deque([2, 3, 4]) + ``` + +3. **Accessing Elements**: + + - deque[index]: Accesses element at index. + + ### Program + ```py + from collections import deque + + # Initialize a deque + d = deque([1, 2, 3, 4, 5]) + print("Initial deque:", d) + + # Accessing elements + print("Element at index 2:", d[2]) + + ``` + + ### Output + ```py + Initial deque: deque([1, 2, 3, 4, 5]) + Element at index 2: 3 + + ``` + +4. **Other Operations**: + + - extend(iterable): Extends deque by appending elements from iterable. + - extendleft(iterable): Extends deque by appending elements from iterable to the left. + - rotate(n): Rotates deque n steps to the right (negative n rotates left). 
+ + ### Program + ```py + from collections import deque + + # Initialize a deque + d = deque([1, 2, 3, 4, 5]) + print("Initial deque:", d) + + # Extend deque + d.extend([6, 7, 8]) + print("After extend([6, 7, 8]):", d) + + # Extend left + d.extendleft([-1, 0]) + print("After extendleft([-1, 0]):", d) + + # Rotate deque + d.rotate(2) + print("After rotate(2):", d) + + # Rotate left + d.rotate(-3) + print("After rotate(-3):", d) + + ``` + + ### Output + ```py + Initial deque: deque([1, 2, 3, 4, 5]) + After extend([6, 7, 8]): deque([1, 2, 3, 4, 5, 6, 7, 8]) + After extendleft([-1, 0]): deque([0, -1, 1, 2, 3, 4, 5, 6, 7, 8]) + After rotate(2): deque([7, 8, 0, -1, 1, 2, 3, 4, 5, 6]) + After rotate(-3): deque([1, 2, 3, 4, 5, 6, 7, 8, 0, -1]) + + ``` + + +## Example + +### 1. Finding Maximum in Sliding Window +```py +from collections import deque + +def max_sliding_window(nums, k): + if not nums: + return [] + + d = deque() + result = [] + + for i, num in enumerate(nums): + # Remove elements from deque that are out of the current window + if d and d[0] <= i - k: + d.popleft() + + # Remove elements from deque smaller than the current element + while d and nums[d[-1]] <= num: + d.pop() + + d.append(i) + + # Add maximum for current window + if i >= k - 1: + result.append(nums[d[0]]) + + return result + +# Example usage: +nums = [1, 3, -1, -3, 5, 3, 6, 7] +k = 3 +print("Maximums in sliding window of size", k, "are:", max_sliding_window(nums, k)) + +``` + +Output +```py +Maximums in sliding window of size 3 are: [3, 3, 5, 5, 6, 7] +``` + + +## Applications +- **Efficient Queues and Stacks**: Deques allow fast O(1) append and pop operations from both ends, +making them ideal for implementing queues and stacks. +- **Sliding Window Maximum/Minimum**: Used in algorithms that require efficient windowed +computations. + + +## Advantages +- Efficiency: O(1) time complexity for append and pop operations from both ends. +- Versatility: Can function both as a queue and as a stack. +- Flexible: Supports rotation and slicing operations efficiently. + + +## Disadvantages +- Memory Usage: Requires more memory compared to simple lists due to overhead in managing linked +nodes. + +## Conclusion +- Deques in Python, provided by the collections.deque module, offer efficient double-ended queue +operations with O(1) time complexity for append and pop operations on both ends. They are versatile +data structures suitable for implementing queues, stacks, and more complex algorithms requiring +efficient manipulation of elements at both ends. + +- While deques excel in scenarios requiring fast append and pop operations from either end, they do +consume more memory compared to simple lists due to their implementation using doubly-linked lists. +However, their flexibility and efficiency make them invaluable for various programming tasks and +algorithmic solutions. \ No newline at end of file diff --git a/contrib/ds-algorithms/dijkstra.md b/contrib/ds-algorithms/dijkstra.md new file mode 100644 index 00000000..cea6da40 --- /dev/null +++ b/contrib/ds-algorithms/dijkstra.md @@ -0,0 +1,90 @@ + +# Dijkstra's Algorithm +Dijkstra's algorithm is a graph algorithm that gives the shortest distance of each node from the given node in a weighted, undirected graph. It operates by continually choosing the closest unvisited node and determining the distance to all its unvisited neighboring nodes. 
This algorithm is similar to BFS in graphs, with the difference being that it gives priority to nodes with shorter distances by using a priority queue (min-heap) instead of a FIFO queue. Note that Dijkstra's algorithm assumes all edge weights are non-negative; with negative weights it can produce incorrect results. The data structures required would be a distance list (to store the minimum distance of each node), a priority queue or a set, and we assume the adjacency list will be provided.
+
+## Working
+- We will store the minimum distance of each node in the distance list, which has a length equal to the number of nodes in the graph. Thus, the minimum distance of the 2nd node will be stored in the 2nd index of the distance list. We initialize the list with the maximum number possible, say infinity.
+
+- We now start the traversal from the given starting node and mark its distance as 0. We push this node to the priority queue along with its minimum distance, which is 0, so the structure pushed will be (0, node), a tuple.
+
+- Now, with the help of the adjacency list, we will add the neighboring nodes to the priority queue with the distance equal to (edge weight + current node distance), provided this is less than the distance list value. We will also update the distance list in the process.
+
+- When all the nodes are added, we will select the node with the shortest distance and repeat the process.
+
+## Dry Run
+We will now do a manual simulation using the example graph given. First, (0, a) is pushed to the priority queue (pq).
+![Photo 1](images/Dijkstra's_algorithm_photo1.png)
+
+- **Step 1:** The lowest element is popped from the pq, which is (0, a), and all its neighboring nodes are added to the pq while simultaneously checking the distance list. Thus (3, b), (7, c), (1, d) are added to the pq.
+![Photo 2](images/Dijkstra's_algorithm_photo2.png)
+
+- **Step 2:** Again, the lowest element is popped from the pq, which is (1, d). It has two neighboring nodes, a and e, of which (0 + 1, a) will not be added to the pq as dist[a] = 0 is less than 1.
+![Photo 3](images/Dijkstra's_algorithm_photo3.png)
+
+- **Step 3:** Now, the lowest element is popped from the pq, which is (3, b). It has two neighboring nodes, a and c; the path back to a will not be added to the pq, as it cannot improve dist[a] = 0. But the new distance to reach c is 5 (3 + 2), which is less than dist[c] = 7. So (5, c) is added to the pq.
+![Photo 4](images/Dijkstra's_algorithm_photo4.png)
+
+- **Step 4:** The next smallest element is (5, c). It has neighbors a and e. The new distance to reach a will be 5 + 7 = 12, which is more than dist[a], so it will not be considered. Similarly, the new distance for e is 5 + 3 = 8, which again will not be considered. So, no new tuple is added to the pq.
+![Photo 5](images/Dijkstra's_algorithm_photo5.png)
+
+- **Step 5:** Similarly, both the remaining elements of the pq will be popped one by one without any new addition.
+![Photo 6](images/Dijkstra's_algorithm_photo6.png)
+![Photo 7](images/Dijkstra's_algorithm_photo7.png)
+
+- The distance list we get at the end will be our answer.
+- `Output` `dist=[1, 3, 7, 1, 6]` + +## Python Code +```python +import heapq + +def dijkstra(graph, start): + # Create a priority queue + pq = [] + heapq.heappush(pq, (0, start)) + + # Create a dictionary to store distances to each node + dist = {node: float('inf') for node in graph} + dist[start] = 0 + + while pq: + # Get the node with the smallest distance + current_distance, current_node = heapq.heappop(pq) + + # If the current distance is greater than the recorded distance, skip it + if current_distance > dist[current_node]: + continue + + # Update the distances to the neighboring nodes + for neighbor, weight in graph[current_node].items(): + distance = current_distance + weight + # Only consider this new path if it's better + if distance < dist[neighbor]: + dist[neighbor] = distance + heapq.heappush(pq, (distance, neighbor)) + + return dist + +# Example usage: +graph = { + 'A': {'B': 1, 'C': 4}, + 'B': {'A': 1, 'C': 2, 'D': 5}, + 'C': {'A': 4, 'B': 2, 'D': 1}, + 'D': {'B': 5, 'C': 1} +} + +start_node = 'A' +dist = dijkstra(graph, start_node) +print(dist) +``` + +## Complexity Analysis + +- **Time Complexity**: \(O((V + E) log V)\) +- **Space Complexity**: \(O(V + E)\) + + + + diff --git a/contrib/ds-algorithms/divide-and-conquer-algorithm.md b/contrib/ds-algorithms/divide-and-conquer-algorithm.md new file mode 100644 index 00000000..b5a356ea --- /dev/null +++ b/contrib/ds-algorithms/divide-and-conquer-algorithm.md @@ -0,0 +1,54 @@ +# Divide and Conquer Algorithms + +Divide and Conquer is a paradigm for solving problems that involves breaking a problem into smaller sub-problems, solving the sub-problems recursively, and then combining their solutions to solve the original problem. + +## Merge Sort + +Merge Sort is a popular sorting algorithm that follows the divide and conquer strategy. It divides the input array into two halves, recursively sorts the halves, and then merges them. + +**Algorithm Overview:** +- **Divide:** Divide the unsorted list into two sublists of about half the size. +- **Conquer:** Recursively sort each sublist. +- **Combine:** Merge the sorted sublists back into one sorted list. + +```python +def merge_sort(arr): + if len(arr) > 1: + mid = len(arr) // 2 + left_half = arr[:mid] + right_half = arr[mid:] + + merge_sort(left_half) + merge_sort(right_half) + + i = j = k = 0 + + while i < len(left_half) and j < len(right_half): + if left_half[i] < right_half[j]: + arr[k] = left_half[i] + i += 1 + else: + arr[k] = right_half[j] + j += 1 + k += 1 + + while i < len(left_half): + arr[k] = left_half[i] + i += 1 + k += 1 + + while j < len(right_half): + arr[k] = right_half[j] + j += 1 + k += 1 + +arr = [12, 11, 13, 5, 6, 7] +merge_sort(arr) +print("Sorted array:", arr) +``` + +## Complexity Analysis +- **Time Complexity:** O(n log n) in all cases +- **Space Complexity:** O(n) additional space for the merge operation + +--- diff --git a/contrib/ds-algorithms/dynamic-programming.md b/contrib/ds-algorithms/dynamic-programming.md new file mode 100644 index 00000000..f4958689 --- /dev/null +++ b/contrib/ds-algorithms/dynamic-programming.md @@ -0,0 +1,453 @@ +# Dynamic Programming + +Dynamic programming is a method for solving complex problems by breaking them down into simpler subproblems and solving each subproblem only once. It stores the solutions to subproblems to avoid redundant computations, making it particularly useful for optimization problems where the solution can be obtained by combining solutions to smaller subproblems. 
+
+## Real-Life Examples of Dynamic Programming
+- **Fibonacci Sequence:** Computing the nth Fibonacci number efficiently.
+- **Shortest Path:** Finding the shortest path in a graph from a source to a destination.
+- **String Edit Distance:** Calculating the minimum number of operations required to transform one string into another.
+- **Knapsack Problem:** Maximizing the value of items in a knapsack without exceeding its weight capacity.
+
+# Some Common Dynamic Programming Techniques
+
+# 1. Fibonacci Sequence
+
+The Fibonacci sequence is a classic example used to illustrate dynamic programming. It is a series of numbers where each number is the sum of the two preceding ones, usually starting with 0 and 1.
+
+**Algorithm Overview:**
+- **Base Cases:** The first two numbers in the Fibonacci sequence are defined as 0 and 1.
+- **Memoization:** Store the results of previously computed Fibonacci numbers to avoid redundant computations.
+- **Recurrence Relation:** Compute each Fibonacci number by adding the two preceding numbers.
+
+## Fibonacci Sequence Code in Python (Top-Down Approach with Memoization)
+
+```python
+def fibonacci(n, memo={}):
+    if n in memo:
+        return memo[n]
+    if n <= 1:
+        return n
+    memo[n] = fibonacci(n-1, memo) + fibonacci(n-2, memo)
+    return memo[n]
+
+n = 10
+print(f"The {n}th Fibonacci number is: {fibonacci(n)}.")
+```
+
+## Fibonacci Sequence Code in Python (Bottom-Up Approach)
+
+```python
+def fibonacci(n):
+    fib = [0, 1]
+    for i in range(2, n + 1):
+        fib.append(fib[i - 1] + fib[i - 2])
+    return fib[n]
+
+n = 10
+print(f"The {n}th Fibonacci number is: {fibonacci(n)}.")
+```
+
+## Complexity Analysis
+- **Time Complexity**: O(n) for both approaches
+- **Space Complexity**: O(n) for both approaches as written (the memoization table in the top-down version, the `fib` list in the bottom-up version); the bottom-up version can be reduced to O(1) by keeping only the last two values
+
+# 2. Longest Common Subsequence
+
+The longest common subsequence (LCS) problem is to find the longest subsequence common to two sequences. A subsequence is a sequence that appears in the same relative order but is not necessarily contiguous.
+
+**Algorithm Overview:**
+- **Base Cases:** If one of the sequences is empty, the LCS is empty.
+- **Memoization:** Store the results of previously computed LCS lengths to avoid redundant computations.
+- **Recurrence Relation:** Compute the LCS length by comparing characters of the sequences and making decisions based on whether they match.
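+
+One Python caution that applies to the memoized snippets on this page (the Fibonacci function above, and the LCS, knapsack, and edit distance functions below): a mutable default argument such as `memo={}` is created once and then shared across every call of the function. For Fibonacci the shared cache is harmless, but for the two-argument problems a second call with different inputs can silently reuse stale cached values. A safer pattern, shown here as a minimal sketch on the Fibonacci example, is to default the cache to `None`:
+
+```python
+def fibonacci(n, memo=None):
+    # Create a fresh cache per top-level call instead of sharing one
+    # dictionary across all calls (the memo={} default-argument pitfall).
+    if memo is None:
+        memo = {}
+    if n in memo:
+        return memo[n]
+    if n <= 1:
+        return n
+    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
+    return memo[n]
+
+print(fibonacci(10))  # 55
+```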
+ +## Longest Common Subsequence Code in Python (Top-Down Approach with Memoization) + +```python +def longest_common_subsequence(X, Y, m, n, memo={}): + if (m, n) in memo: + return memo[(m, n)] + if m == 0 or n == 0: + return 0 + if X[m - 1] == Y[n - 1]: + memo[(m, n)] = 1 + longest_common_subsequence(X, Y, m - 1, n - 1, memo) + else: + memo[(m, n)] = max(longest_common_subsequence(X, Y, m, n - 1, memo), + longest_common_subsequence(X, Y, m - 1, n, memo)) + return memo[(m, n)] + +X = "AGGTAB" +Y = "GXTXAYB" +print("Length of Longest Common Subsequence:", longest_common_subsequence(X, Y, len(X), len(Y))) +``` + +## Longest Common Subsequence Code in Python (Bottom-Up Approach) + +```python + +def longestCommonSubsequence(X, Y, m, n): + L = [[None]*(n+1) for i in range(m+1)] + for i in range(m+1): + for j in range(n+1): + if i == 0 or j == 0: + L[i][j] = 0 + elif X[i-1] == Y[j-1]: + L[i][j] = L[i-1][j-1]+1 + else: + L[i][j] = max(L[i-1][j], L[i][j-1]) + return L[m][n] + + +S1 = "AGGTAB" +S2 = "GXTXAYB" +m = len(S1) +n = len(S2) +print("Length of LCS is", longestCommonSubsequence(S1, S2, m, n)) +``` + +## Complexity Analysis +- **Time Complexity**: O(m * n) for both approaches, where m and n are the lengths of the input sequences +- **Space Complexity**: O(m * n) for the memoization table + +# 3. 0-1 Knapsack Problem + +The 0-1 knapsack problem is a classic optimization problem where the goal is to maximize the total value of items selected while keeping the total weight within a specified limit. + +**Algorithm Overview:** +- **Base Cases:** If the capacity of the knapsack is 0 or there are no items to select, the total value is 0. +- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations. +- **Recurrence Relation:** Compute the maximum value by considering whether to include the current item or not. + +## 0-1 Knapsack Problem Code in Python (Top-Down Approach with Memoization) + +```python +def knapsack(weights, values, capacity, n, memo={}): + if (capacity, n) in memo: + return memo[(capacity, n)] + if n == 0 or capacity == 0: + return 0 + if weights[n - 1] > capacity: + memo[(capacity, n)] = knapsack(weights, values, capacity, n - 1, memo) + else: + memo[(capacity, n)] = max(values[n - 1] + knapsack(weights, values, capacity - weights[n - 1], n - 1, memo), + knapsack(weights, values, capacity, n - 1, memo)) + return memo[(capacity, n)] + +weights = [10, 20, 30] +values = [60, 100, 120] +capacity = 50 +n = len(weights) +print("Maximum value that can be obtained:", knapsack(weights, values, capacity, n)) +``` + +## 0-1 Knapsack Problem Code in Python (Bottom-up Approach) + +```python +def knapSack(capacity, weights, values, n): + K = [[0 for x in range(capacity + 1)] for x in range(n + 1)] + for i in range(n + 1): + for w in range(capacity + 1): + if i == 0 or w == 0: + K[i][w] = 0 + elif weights[i-1] <= w: + K[i][w] = max(values[i-1] + + K[i-1][w-weights[i-1]], + K[i-1][w]) + else: + K[i][w] = K[i-1][w] + + return K[n][capacity] + +values = [60, 100, 120] +weights = [10, 20, 30] +capacity = 50 +n = len(weights) +print(knapSack(capacity, weights, values, n)) +``` + +## Complexity Analysis +- **Time Complexity**: O(n * W) for both approaches, where n is the number of items and W is the capacity of the knapsack +- **Space Complexity**: O(n * W) for the memoization table + +# 4. 
Longest Increasing Subsequence
+
+The Longest Increasing Subsequence (LIS) problem is to find the longest subsequence that is strictly increasing, meaning each element in the subsequence is greater than the one before it. This subsequence must maintain the order of elements as they appear in the original sequence but does not need to be contiguous. The goal is to identify the subsequence with the maximum possible length.
+
+**Algorithm Overview:**
+- **Base cases:** If the sequence is empty, the LIS length is 0.
+- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations.
+- **Recurrence relation:** Compute the LIS length by comparing elements of the array and deciding, for each element, whether it can extend the increasing subsequence built so far.
+
+## Longest Increasing Subsequence Code in Python (Top-Down Approach using Memoization)
+
+```python
+import sys
+
+def f(idx, prev_idx, n, a, dp):
+    if idx == n:
+        return 0
+
+    if dp[idx][prev_idx + 1] != -1:
+        return dp[idx][prev_idx + 1]
+
+    # Option 1: skip the current element
+    notTake = 0 + f(idx + 1, prev_idx, n, a, dp)
+    # Option 2: take the current element, if it keeps the subsequence increasing
+    take = -sys.maxsize - 1
+    if prev_idx == -1 or a[idx] > a[prev_idx]:
+        take = 1 + f(idx + 1, idx, n, a, dp)
+
+    dp[idx][prev_idx + 1] = max(take, notTake)
+    return dp[idx][prev_idx + 1]
+
+def longestSubsequence(n, a):
+    dp = [[-1 for i in range(n + 1)] for j in range(n + 1)]
+    return f(0, -1, n, a, dp)
+
+a = [3, 10, 2, 1, 20]
+n = len(a)
+
+print("Length of lis is", longestSubsequence(n, a))
+```
+
+## Longest Increasing Subsequence Code in Python (Bottom-Up Approach)
+
+```python
+def lis(arr):
+    n = len(arr)
+    lis = [1]*n
+
+    for i in range(1, n):
+        for j in range(0, i):
+            if arr[i] > arr[j] and lis[i] < lis[j] + 1:
+                lis[i] = lis[j]+1
+
+    maximum = 0
+    for i in range(n):
+        maximum = max(maximum, lis[i])
+
+    return maximum
+
+arr = [10, 22, 9, 33, 21, 50, 41, 60]
+print("Length of lis is", lis(arr))
+```
+
+## Complexity Analysis
+- **Time Complexity**: O(n * n) for both approaches, where n is the length of the array.
+- **Space Complexity**: O(n * n) for the memoization table in the Top-Down Approach, O(n) in the Bottom-Up Approach.
+
+# 5. String Edit Distance
+
+The String Edit Distance algorithm calculates the minimum number of operations (insertions, deletions, or substitutions) required to convert one string into another.
+
+**Algorithm Overview:**
+- **Base Cases:** If one string is empty, the edit distance is the length of the other string.
+- **Memoization:** Store the results of previously computed edit distances to avoid redundant computations.
+- **Recurrence Relation:** Compute the edit distance by considering insertion, deletion, and substitution operations.
+
+## String Edit Distance Code in Python (Top-Down Approach with Memoization)
+```python
+def edit_distance(str1, str2, memo={}):
+    m, n = len(str1), len(str2)
+    if (m, n) in memo:
+        return memo[(m, n)]
+    if m == 0:
+        return n
+    if n == 0:
+        return m
+    if str1[m - 1] == str2[n - 1]:
+        memo[(m, n)] = edit_distance(str1[:m-1], str2[:n-1], memo)
+    else:
+        memo[(m, n)] = 1 + min(edit_distance(str1, str2[:n-1], memo),  # Insert
+                               edit_distance(str1[:m-1], str2, memo),  # Remove
+                               edit_distance(str1[:m-1], str2[:n-1], memo))  # Replace
+    return memo[(m, n)]
+
+str1 = "sunday"
+str2 = "saturday"
+print(f"Edit Distance between '{str1}' and '{str2}' is {edit_distance(str1, str2)}.")
+```
+
+#### Output
+```
+Edit Distance between 'sunday' and 'saturday' is 3.
+```
+
+## String Edit Distance Code in Python (Bottom-Up Approach)
+```python
+def edit_distance(str1, str2):
+    m, n = len(str1), len(str2)
+    dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
+
+    for i in range(m + 1):
+        for j in range(n + 1):
+            if i == 0:
+                dp[i][j] = j
+            elif j == 0:
+                dp[i][j] = i
+            elif str1[i - 1] == str2[j - 1]:
+                dp[i][j] = dp[i - 1][j - 1]
+            else:
+                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
+
+    return dp[m][n]
+
+str1 = "sunday"
+str2 = "saturday"
+print(f"Edit Distance between '{str1}' and '{str2}' is {edit_distance(str1, str2)}.")
+```
+
+#### Output
+```
+Edit Distance between 'sunday' and 'saturday' is 3.
+```
+
+## Complexity Analysis
+- **Time Complexity:** O(m * n), where m and n are the lengths of string 1 and string 2 respectively
+- **Space Complexity:** O(m * n) for both top-down and bottom-up approaches
+
+
+# 6. Matrix Chain Multiplication
+
+Matrix Chain Multiplication finds the optimal way to multiply a sequence of matrices to minimize the number of scalar multiplications.
+
+**Algorithm Overview:**
+- **Base Cases:** The cost of multiplying one matrix is zero.
+- **Memoization:** Store the results of previously computed matrix chain orders to avoid redundant computations.
+- **Recurrence Relation:** Compute the optimal cost by splitting the product at different points and choosing the minimum cost.
+
+## Matrix Chain Multiplication Code in Python (Top-Down Approach with Memoization)
+```python
+def matrix_chain_order(p, memo={}):
+    n = len(p) - 1
+    def compute_cost(i, j):
+        if (i, j) in memo:
+            return memo[(i, j)]
+        if i == j:
+            return 0
+        memo[(i, j)] = float('inf')
+        for k in range(i, j):
+            q = compute_cost(i, k) + compute_cost(k + 1, j) + p[i - 1] * p[k] * p[j]
+            if q < memo[(i, j)]:
+                memo[(i, j)] = q
+        return memo[(i, j)]
+    return compute_cost(1, n)
+
+p = [1, 2, 3, 4]
+print(f"Minimum number of multiplications is {matrix_chain_order(p)}.")
+```
+
+#### Output
+```
+Minimum number of multiplications is 18.
+```
+
+
+## Matrix Chain Multiplication Code in Python (Bottom-Up Approach)
+```python
+def matrix_chain_order(p):
+    n = len(p) - 1
+    m = [[0 for _ in range(n)] for _ in range(n)]
+
+    for L in range(2, n + 1):
+        for i in range(n - L + 1):
+            j = i + L - 1
+            m[i][j] = float('inf')
+            for k in range(i, j):
+                q = m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
+                if q < m[i][j]:
+                    m[i][j] = q
+
+    return m[0][n - 1]
+
+p = [1, 2, 3, 4]
+print(f"Minimum number of multiplications is {matrix_chain_order(p)}.")
+```
+
+#### Output
+```
+Minimum number of multiplications is 18.
+```
+
+## Complexity Analysis
+- **Time Complexity:** O(n^3), where n is the number of matrices in the chain. For an array `p` of dimensions representing the matrices, such that the i-th matrix has dimensions `p[i-1] x p[i]`, n is `len(p) - 1`.
+- **Space Complexity:** O(n^2) for both top-down and bottom-up approaches
+
+# 7. Optimal Binary Search Tree
+
+The Optimal Binary Search Tree problem finds the arrangement of keys in a binary search tree that minimizes the total (frequency-weighted) search cost, given the access frequency of each key.
+
+**Algorithm Overview:**
+- **Base Cases:** The cost of a single key is its frequency.
+- **Memoization:** Store the results of previously computed subproblems to avoid redundant computations.
+- **Recurrence Relation:** Compute the optimal cost by trying each key as the root and choosing the minimum cost.
+
+## Optimal Binary Search Tree Code in Python (Top-Down Approach with Memoization)
+
+```python
+def optimal_bst(keys, freq, memo={}):
+    n = len(keys)
+    def compute_cost(i, j):
+        if (i, j) in memo:
+            return memo[(i, j)]
+        if i > j:
+            return 0
+        if i == j:
+            return freq[i]
+        memo[(i, j)] = float('inf')
+        total_freq = sum(freq[i:j+1])
+        for r in range(i, j + 1):
+            cost = (compute_cost(i, r - 1) +
+                    compute_cost(r + 1, j) +
+                    total_freq)
+            if cost < memo[(i, j)]:
+                memo[(i, j)] = cost
+        return memo[(i, j)]
+    return compute_cost(0, n - 1)
+
+keys = [10, 12, 20]
+freq = [34, 8, 50]
+print(f"Cost of Optimal BST is {optimal_bst(keys, freq)}.")
+```
+
+#### Output
+```
+Cost of Optimal BST is 142.
+```
+
+## Optimal Binary Search Tree Code in Python (Bottom-Up Approach)
+
+```python
+def optimal_bst(keys, freq):
+    n = len(keys)
+    cost = [[0 for x in range(n)] for y in range(n)]
+
+    for i in range(n):
+        cost[i][i] = freq[i]
+
+    for L in range(2, n + 1):
+        for i in range(n - L + 1):
+            j = i + L - 1
+            cost[i][j] = float('inf')
+            total_freq = sum(freq[i:j+1])
+            for r in range(i, j + 1):
+                c = (cost[i][r - 1] if r > i else 0) + \
+                    (cost[r + 1][j] if r < j else 0) + \
+                    total_freq
+                if c < cost[i][j]:
+                    cost[i][j] = c
+
+    return cost[0][n - 1]
+
+keys = [10, 12, 20]
+freq = [34, 8, 50]
+print(f"Cost of Optimal BST is {optimal_bst(keys, freq)}.")
+```
+
+#### Output
+```
+Cost of Optimal BST is 142.
+```
+
+### Complexity Analysis
+- **Time Complexity**: O(n^3), where n is the number of keys in the binary search tree.
+- **Space Complexity**: O(n^2) for both top-down and bottom-up approaches
diff --git a/contrib/ds-algorithms/graph.md b/contrib/ds-algorithms/graph.md
new file mode 100644
index 00000000..517c90d1
--- /dev/null
+++ b/contrib/ds-algorithms/graph.md
@@ -0,0 +1,219 @@
+# Graph Data Structure
+
+A graph is a non-linear data structure consisting of vertices and edges. It is a powerful tool for representing and analyzing complex relationships between objects or entities.
+
+## Components of a Graph
+
+1. **Vertices:** Vertices are the fundamental units of the graph; they are also known as nodes. Every node/vertex can be labeled or unlabeled.
+
+2. **Edges:** Edges connect two nodes of the graph. In a directed graph, an edge is an ordered pair of nodes. Edges can connect any two nodes in any possible way; there are no rules. Every edge can be labelled/unlabelled.
+
+## Basic Operations on Graphs
+- Insertion of Nodes/Edges in the graph
+- Deletion of Nodes/Edges in the graph
+- Searching on Graphs
+- Traversal of Graphs
+
+## Types of Graph
+
+
+**1. Undirected Graph:** In an undirected graph, edges have no direction, and they represent symmetric relationships between nodes. If there is an edge between node A and node B, you can travel from A to B and from B to A.
+
+**2. Directed Graph (Digraph):** In a directed graph, edges have a direction, indicating a one-way relationship between nodes. If there is an edge from node A to node B, you can travel from A to B but not necessarily from B to A.
+
+**3. Weighted Graph:** In a weighted graph, edges have associated weights or costs. These weights can represent various attributes such as distance, cost, or capacity. Weighted graphs are commonly used in applications like route planning or network optimization.
+
+**4. Cyclic Graph:** A cyclic graph contains at least one cycle, which is a path that starts and ends at the same node.
In other words, you can traverse the graph and return to a previously visited node by following the edges.
+
+**5. Acyclic Graph:** An acyclic graph, as the name suggests, does not contain any cycles. This type of graph is often used in scenarios where a cycle would be nonsensical or undesirable, such as representing dependencies between tasks or events.
+
+**6. Tree:** A tree is a special type of acyclic graph where each node has a unique parent except for the root node, which has no parent. Trees have a hierarchical structure and are frequently used in data structures like binary trees or decision trees.
+
+## Representation of Graphs
+There are two ways to store a graph:
+
+1. **Adjacency Matrix:**
+In this method, the graph is stored in the form of a 2D matrix where rows and columns denote vertices. Each entry in the matrix represents the presence (or, in a weighted graph, the weight) of the edge between those vertices.
+
+```python
+def create_adjacency_matrix(graph):
+    # graph is given as a 0/1 matrix; build a symmetric adjacency matrix from it
+    num_vertices = len(graph)
+
+    adj_matrix = [[0] * num_vertices for _ in range(num_vertices)]
+
+    for i in range(num_vertices):
+        for j in range(num_vertices):
+            if graph[i][j] == 1:
+                adj_matrix[i][j] = 1
+                adj_matrix[j][i] = 1
+
+    return adj_matrix
+
+
+graph = [
+    [0, 1, 0, 0],
+    [1, 0, 1, 0],
+    [0, 1, 0, 1],
+    [0, 0, 1, 0]
+]
+
+adj_matrix = create_adjacency_matrix(graph)
+
+for row in adj_matrix:
+    print(' '.join(map(str, row)))
+
+```
+
+2. **Adjacency List:**
+In this method, the graph is represented as an array of lists, where the list at index i stores the vertices adjacent to vertex i.
+
+```python
+def create_adjacency_list(edges, num_vertices):
+    adj_list = [[] for _ in range(num_vertices)]
+
+    for u, v in edges:
+        adj_list[u].append(v)
+        adj_list[v].append(u)
+
+    return adj_list
+
+if __name__ == "__main__":
+    num_vertices = 4
+    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 1)]
+
+    adj_list = create_adjacency_list(edges, num_vertices)
+
+    for i in range(num_vertices):
+        print(f"{i} -> {' '.join(map(str, adj_list[i]))}")
+```
+`Output`
+`0 -> 1 2`
+`1 -> 0 2 3`
+`2 -> 0 1 3`
+`3 -> 2 1 `
+
+
+
+# Traversal Techniques
+
+## Breadth First Search (BFS)
+- It is a graph traversal algorithm that explores all the vertices in a graph at the current depth before moving on to the vertices at the next depth level.
+- It starts at a specified vertex and visits all its neighbors before moving on to the next level of neighbors.
+BFS is commonly used in algorithms for pathfinding, connected components, and shortest path problems in graphs.
+
+**Steps of the BFS algorithm**
+
+
+- **Step 1:** Initially, the queue and visited array are empty.
+- **Step 2:** Push node 0 into the queue and mark it visited.
+- **Step 3:** Remove node 0 from the front of the queue, visit its unvisited neighbours, and push them into the queue.
+- **Step 4:** Remove node 1 from the front of the queue, visit its unvisited neighbours, and push them into the queue.
+- **Step 5:** Remove node 2 from the front of the queue, visit its unvisited neighbours, and push them into the queue.
+- **Step 6:** Remove node 3 from the front of the queue, visit its unvisited neighbours, and push them into the queue.
+- **Step 7:** Remove node 4 from the front of the queue, visit its unvisited neighbours, and push them into the queue.
+
+```python
+
+from collections import deque
+
+def bfs(adjList, startNode, visited):
+    q = deque()
+
+    visited[startNode] = True
+    q.append(startNode)
+
+    while q:
+        currentNode = q.popleft()
+        print(currentNode, end=" ")
+
+        for neighbor in adjList[currentNode]:
+            if not visited[neighbor]:
+                visited[neighbor] = True
+                q.append(neighbor)
+
+def addEdge(adjList, u, v):
+    adjList[u].append(v)
+
+def main():
+    vertices = 5
+
+    adjList = [[] for _ in range(vertices)]
+
+    addEdge(adjList, 0, 1)
+    addEdge(adjList, 0, 2)
+    addEdge(adjList, 1, 3)
+    addEdge(adjList, 1, 4)
+    addEdge(adjList, 2, 4)
+
+    visited = [False] * vertices
+
+    print("Breadth First Traversal", end=" ")
+    bfs(adjList, 0, visited)
+
+if __name__ == "__main__":  # Output: Breadth First Traversal 0 1 2 3 4
+    main()
+
+```
+
+- **Time Complexity:** `O(V+E)`, where V is the number of nodes and E is the number of edges.
+- **Auxiliary Space:** `O(V)`
+
+
+## Depth-first search
+
+Depth-first search is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible along each branch before backtracking.
+
+**Steps of the DFS algorithm**
+
+- **Step 1:** Initially, the stack and visited array are empty.
+- **Step 2:** Visit 0 and put its adjacent nodes which are not yet visited into the stack.
+- **Step 3:** Now, node 1 is at the top of the stack, so visit node 1, pop it from the stack, and put all of its unvisited adjacent nodes into the stack.
+- **Step 4:** Now, node 2 is at the top of the stack, so visit node 2, pop it from the stack, and put all of its unvisited adjacent nodes (i.e., 3, 4) into the stack.
+- **Step 5:** Now, node 4 is at the top of the stack, so visit node 4, pop it from the stack, and put all of its unvisited adjacent nodes into the stack.
+- **Step 6:** Now, node 3 is at the top of the stack, so visit node 3, pop it from the stack, and put all of its unvisited adjacent nodes into the stack.
+
+
+
+```python
+from collections import defaultdict
+
+class Graph:
+
+    def __init__(self):
+        self.graph = defaultdict(list)
+
+    def addEdge(self, u, v):
+        self.graph[u].append(v)
+
+    def DFSUtil(self, v, visited):
+        visited.add(v)
+        print(v, end=' ')
+
+        for neighbour in self.graph[v]:
+            if neighbour not in visited:
+                self.DFSUtil(neighbour, visited)
+
+    def DFS(self, v):
+        visited = set()
+        self.DFSUtil(v, visited)
+
+if __name__ == "__main__":
+    g = Graph()
+    g.addEdge(0, 1)
+    g.addEdge(0, 2)
+    g.addEdge(1, 2)
+    g.addEdge(2, 0)
+    g.addEdge(2, 3)
+    g.addEdge(3, 3)
+
+    print("Depth First Traversal (starting from vertex 2):", end=" ")
+    g.DFS(2)
+
+```
+`Output: Depth First Traversal (starting from vertex 2): 2 0 1 3 `
+
+- **Time complexity:** `O(V + E)`, where V is the number of vertices and E is the number of edges in the graph.
+- **Auxiliary Space:** `O(V)`, since an extra visited set of size V is required, along with the recursion stack for the DFS calls.
+
+
+ + diff --git a/contrib/ds-algorithms/greedy-algorithms.md b/contrib/ds-algorithms/greedy-algorithms.md new file mode 100644 index 00000000..c79ee991 --- /dev/null +++ b/contrib/ds-algorithms/greedy-algorithms.md @@ -0,0 +1,135 @@ +# Greedy Algorithms + +Greedy algorithms are simple, intuitive algorithms that make a sequence of choices at each step with the hope of finding a global optimum. They are called "greedy" because at each step, they choose the most advantageous option without considering the future consequences. Despite their simplicity, greedy algorithms are powerful tools for solving optimization problems, especially when the problem exhibits the greedy-choice property. + +## Real-Life Examples of Greedy Algorithms +- **Coin Change:** Finding the minimum number of coins to make a certain amount of change. +- **Job Scheduling:** Assigning tasks to machines to minimize completion time. +- **Huffman Coding:** Constructing an optimal prefix-free binary code for data compression. +- **Fractional Knapsack:** Selecting items to maximize the value within a weight limit. + +# Some Common Greedy Algorithms + +# 1. Coin Change Problem + +The coin change problem is a classic example of a greedy algorithm. Given a set of coin denominations and a target amount, the objective is to find the minimum number of coins required to make up that amount. + +**Algorithm Overview:** +- **Greedy Strategy:** At each step, the algorithm selects the largest denomination coin that is less than or equal to the remaining amount. +- **Repeat Until Amount is Zero:** The process continues until the remaining amount becomes zero. + +## Coin Change Code in Python + +```python +def coin_change(coins, amount): + coins.sort(reverse=True) + num_coins = 0 + for coin in coins: + num_coins += amount // coin + amount %= coin + if amount == 0: + return num_coins + else: + return -1 + +coins = [1, 5, 10, 25] +amount = 63 +result = coin_change(coins, amount) +if result != -1: + print(f"Minimum number of coins required: {result}.") +else: + print("It is not possible to make the amount with the given denominations.") +``` + +## Complexity Analysis +- **Time Complexity**: O(n log n) for sorting (if not pre-sorted), O(n) for iteration +- **Space Complexity**: O(1) + +
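+
+One caveat worth noting: the greedy strategy above is guaranteed to be optimal only for canonical coin systems such as the [1, 5, 10, 25] set used here. For arbitrary denominations it can return a suboptimal count. A minimal illustration, reusing the `coin_change` function defined above:
+
+```python
+# With coins [1, 3, 4] and amount 6, the greedy choice picks 4 first,
+# leaving 2 to be made as 1 + 1, i.e. three coins in total,
+# while the optimal answer is 3 + 3, i.e. two coins.
+print(coin_change([1, 3, 4], 6))  # prints 3, but the optimum is 2
+```
+
+For such denominations, a dynamic programming solution is needed instead.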
+
+
+
+# 2. Activity Selection Problem
+
+The activity selection problem involves selecting the maximum number of mutually compatible activities that can be performed by a single person or machine, assuming that a person can only work on one activity at a time.
+
+**Algorithm Overview:**
+- **Greedy Strategy:** Sort the activities based on their finish times (the function below assumes the input is already sorted this way, as in the example).
+- **Selecting Activities:** Iterate through the sorted activities, selecting each activity if it doesn't conflict with the previously selected ones.
+
+## Activity Selection Code in Python
+
+```python
+def activity_selection(start, finish):
+    # Assumes the activities are already sorted by finish time
+    n = len(start)
+    activities = []
+    i = 0
+    activities.append(i)
+    for j in range(1, n):
+        if start[j] >= finish[i]:
+            activities.append(j)
+            i = j
+    return activities
+
+start = [1, 3, 0, 5, 8, 5]
+finish = [2, 4, 6, 7, 9, 9]
+selected_activities = activity_selection(start, finish)
+print("Selected activities:", selected_activities)
+```
+
+## Complexity Analysis
+- **Time Complexity**: O(n log n) for sorting (if not pre-sorted), O(n) for iteration
+- **Space Complexity**: O(1)
+
+
+
+
+ +# 3. Huffman Coding + +Huffman coding is a method of lossless data compression that efficiently represents characters or symbols in a file. It uses variable-length codes to represent characters, with shorter codes assigned to more frequent characters. + +**Algorithm Overview:** +- **Frequency Analysis:** Determine the frequency of each character in the input data. +- **Building the Huffman Tree:** Construct a binary tree where each leaf node represents a character and the path to the leaf node determines its code. +- **Assigning Codes:** Traverse the Huffman tree to assign codes to each character, with shorter codes for more frequent characters. + +## Huffman Coding Code in Python + +```python +from heapq import heappush, heappop, heapify +from collections import defaultdict + +def huffman_coding(data): + frequency = defaultdict(int) + for char in data: + frequency[char] += 1 + + heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()] + heapify(heap) + + while len(heap) > 1: + lo = heappop(heap) + hi = heappop(heap) + for pair in lo[1:]: + pair[1] = '0' + pair[1] + for pair in hi[1:]: + pair[1] = '1' + pair[1] + heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:]) + + return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p)) + +data = "Huffman coding is a greedy algorithm" +encoded_data = huffman_coding(data) +print("Huffman Codes:") +for symbol, code in encoded_data: + print(f"{symbol}: {code}") +``` + +## Complexity Analysis +- **Time Complexity**: O(n log n) for heap operations, where n is the number of unique characters +- **Space Complexity**: O(n) for the heap + +
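+
+As a short usage sketch (reusing the `huffman_coding` function and the `data` string above), the returned (symbol, code) pairs can be turned into a lookup table to encode the input:
+
+```python
+codes = dict(huffman_coding(data))           # symbol -> code lookup table
+encoded = ''.join(codes[ch] for ch in data)  # concatenated bit string
+print(f"Encoded size: {len(encoded)} bits (vs {len(data) * 8} bits uncompressed)")
+```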
+
+
+
diff --git a/contrib/ds-algorithms/hash-tables.md b/contrib/ds-algorithms/hash-tables.md
new file mode 100644
index 00000000..f03b7c5e
--- /dev/null
+++ b/contrib/ds-algorithms/hash-tables.md
@@ -0,0 +1,212 @@
+# Data Structures: Hash Tables, Hash Sets, and Hash Maps
+
+## Table of Contents
+- [Introduction](#introduction)
+- [Hash Tables](#hash-tables)
+  - [Overview](#overview)
+  - [Operations](#operations)
+- [Hash Sets](#hash-sets)
+  - [Overview](#overview-1)
+  - [Operations](#operations-1)
+- [Hash Maps](#hash-maps)
+  - [Overview](#overview-2)
+  - [Operations](#operations-2)
+- [Conclusion](#conclusion)
+
+## Introduction
+This document provides an overview of three fundamental data structures in computer science: hash tables, hash sets, and hash maps. These structures are widely used for efficient data storage and retrieval operations.
+
+## Hash Tables
+
+### Overview
+A **hash table** is a data structure that stores key-value pairs. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
+
+### Operations
+1. **Insertion**: Add a new key-value pair to the hash table.
+2. **Deletion**: Remove a key-value pair from the hash table.
+3. **Search**: Find the value associated with a given key.
+4. **Update**: Modify the value associated with a given key.
+
+**Example Code (Python):**
+```python
+class Node:
+    def __init__(self, key, value):
+        self.key = key
+        self.value = value
+        self.next = None
+
+
+class HashTable:
+    def __init__(self, capacity):
+        self.capacity = capacity
+        self.size = 0
+        self.table = [None] * capacity
+
+    def _hash(self, key):
+        return hash(key) % self.capacity
+
+    def insert(self, key, value):
+        index = self._hash(key)
+
+        if self.table[index] is None:
+            self.table[index] = Node(key, value)
+            self.size += 1
+        else:
+            current = self.table[index]
+            while current:
+                if current.key == key:
+                    current.value = value
+                    return
+                current = current.next
+            new_node = Node(key, value)
+            new_node.next = self.table[index]
+            self.table[index] = new_node
+            self.size += 1
+
+    def search(self, key):
+        index = self._hash(key)
+
+        current = self.table[index]
+        while current:
+            if current.key == key:
+                return current.value
+            current = current.next
+
+        raise KeyError(key)
+
+    def remove(self, key):
+        index = self._hash(key)
+
+        previous = None
+        current = self.table[index]
+
+        while current:
+            if current.key == key:
+                if previous:
+                    previous.next = current.next
+                else:
+                    self.table[index] = current.next
+                self.size -= 1
+                return
+            previous = current
+            current = current.next
+
+        raise KeyError(key)
+
+    def __len__(self):
+        return self.size
+
+    def __contains__(self, key):
+        try:
+            self.search(key)
+            return True
+        except KeyError:
+            return False
+
+
+# Driver code
+if __name__ == '__main__':
+
+    ht = HashTable(5)
+
+    ht.insert("apple", 3)
+    ht.insert("banana", 2)
+    ht.insert("cherry", 5)
+
+    print("apple" in ht)   # True
+    print("durian" in ht)  # False
+
+    print(ht.search("banana"))  # 2
+
+    ht.insert("banana", 4)
+    print(ht.search("banana"))  # 4
+
+    ht.remove("apple")
+
+    print(len(ht))  # 2
+```
+
+## Hash Sets
+
+### Overview
+A **hash set** is a collection of unique elements. It is implemented using a hash table where each bucket can store only one element.
+
+### Operations
+1.
**Insertion**: Add a new element to the set. +2. **Deletion**: Remove an element from the set. +3. **Search**: Check if an element exists in the set. +4. **Union**: Combine two sets to form a new set with elements from both. +5. **Intersection**: Find common elements between two sets. +6. **Difference**: Find elements present in one set but not in the other. + +**Example Code (Python):** +```python +# Create a hash set +hash_set = set() + +# Insert elements +hash_set.add("element1") +hash_set.add("element2") + +# Search for an element +exists = "element1" in hash_set + +# Delete an element +hash_set.remove("element2") + +# Union of sets +another_set = {"element3", "element4"} +union_set = hash_set.union(another_set) + +# Intersection of sets +intersection_set = hash_set.intersection(another_set) + +# Difference of sets +difference_set = hash_set.difference(another_set) +``` +## Hash Maps + +### Overview +A **hash map** is similar to a hash table but often provides additional functionalities and more user-friendly interfaces for developers. It is a collection of key-value pairs where each key is unique. + +### Operations +1. **Insertion**: Add a new key-value pair to the hash map. +2. **Deletion**: Remove a key-value pair from the hash map. +3. **Search**: Retrieve the value associated with a given key. +4. **Update**: Change the value associated with a given key. + +**Example Code (Python):** +```python +# Create a hash map +hash_map = {} + +# Insert elements +hash_map["key1"] = "value1" +hash_map["key2"] = "value2" + +# Search for an element +value = hash_map.get("key1") + +# Delete an element +del hash_map["key2"] + +# Update an element +hash_map["key1"] = "new_value1" + +``` +## Conclusion +Hash tables, hash sets, and hash maps are powerful data structures that provide efficient means of storing and retrieving data. Understanding these structures and their operations is crucial for developing optimized algorithms and applications. \ No newline at end of file diff --git a/contrib/ds-algorithms/hashing-chaining.md b/contrib/ds-algorithms/hashing-chaining.md new file mode 100644 index 00000000..34086b5e --- /dev/null +++ b/contrib/ds-algorithms/hashing-chaining.md @@ -0,0 +1,153 @@ +# Hashing with Chaining + +In Data Structures and Algorithms, hashing is used to map data of arbitrary size to fixed-size values. A common approach to handle collisions in hashing is **chaining**. In chaining, each slot of the hash table contains a linked list, and all elements that hash to the same slot are stored in that list. + +## Points to be Remembered + +- **Hash Function**: A function that converts an input (or 'key') into an index in a hash table. +- **Collision**: When two keys hash to the same index. +- **Chaining**: A method to resolve collisions by maintaining a linked list for each hash table slot. + +## Real Life Examples of Hashing with Chaining + +- **Phone Directory**: Contacts are stored in a hash table where the contact's name is hashed to an index. If multiple names hash to the same index, they are stored in a linked list at that index. +- **Library Catalog**: Books are indexed by their titles. If multiple books have titles that hash to the same index, they are stored in a linked list at that index. 
+ +## Applications of Hashing + +Hashing is widely used in Computer Science: + +- **Database Indexing** +- **Caches** (like CPU caches, web caches) +- **Associative Arrays** (or dictionaries in Python) +- **Sets** (unordered collections of unique elements) + +Understanding these applications is essential for Software Development. + +## Operations in Hash Table with Chaining + +Key operations include: + +- **INSERT**: Insert a new element into the hash table. +- **SEARCH**: Find the position of an element in the hash table. +- **DELETE**: Remove an element from the hash table. + +## Implementing Hash Table with Chaining in Python + +```python +class Node: + def __init__(self, key, value): + self.key = key + self.value = value + self.next = None + +class HashTable: + def __init__(self, size): + self.size = size + self.table = [None] * size + + def hash_function(self, key): + return key % self.size + + def insert(self, key, value): + hash_index = self.hash_function(key) + new_node = Node(key, value) + + if self.table[hash_index] is None: + self.table[hash_index] = new_node + else: + current = self.table[hash_index] + while current.next is not None: + current = current.next + current.next = new_node + + def search(self, key): + hash_index = self.hash_function(key) + current = self.table[hash_index] + + while current is not None: + if current.key == key: + return current.value + current = current.next + + return None + + def delete(self, key): + hash_index = self.hash_function(key) + current = self.table[hash_index] + prev = None + + while current is not None: + if current.key == key: + if prev is None: + self.table[hash_index] = current.next + else: + prev.next = current.next + return True + prev = current + current = current.next + + return False + + def display(self): + for index, item in enumerate(self.table): + print(f"Index {index}:", end=" ") + current = item + while current is not None: + print(f"({current.key}, {current.value})", end=" -> ") + current = current.next + print("None") + +# Example usage +hash_table = HashTable(10) + +hash_table.insert(1, 'A') +hash_table.insert(11, 'B') +hash_table.insert(21, 'C') + +print("Hash Table after Insert operations:") +hash_table.display() + +print("Search operation for key 11:", hash_table.search(11)) + +hash_table.delete(11) + +print("Hash Table after Delete operation:") +hash_table.display() +``` + +## Output + +```markdown +Hash Table after Insert operations: +Index 0: None +Index 1: (1, 'A') -> (11, 'B') -> (21, 'C') -> None +Index 2: None +Index 3: None +Index 4: None +Index 5: None +Index 6: None +Index 7: None +Index 8: None +Index 9: None + +Search operation for key 11: B + +Hash Table after Delete operation: +Index 0: None +Index 1: (1, 'A') -> (21, 'C') -> None +Index 2: None +Index 3: None +Index 4: None +Index 5: None +Index 6: None +Index 7: None +Index 8: None +Index 9: None +``` + +## Complexity Analysis + +- **Insertion**: Average case O(1), Worst case O(n) when many elements hash to the same slot. +- **Search**: Average case O(1), Worst case O(n) when many elements hash to the same slot. +- **Deletion**: Average case O(1), Worst case O(n) when many elements hash to the same slot. 
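To see where the O(n) worst case comes from, here is a small sketch (an addition, not from the original file) in which every key lands in the same bucket, so the chain grows linearly and a lookup must scan it end to end:

```python
table_size = 10
buckets = [[] for _ in range(table_size)]  # chaining with plain Python lists

# Keys 1, 11, 21, 31, 41 all satisfy key % 10 == 1, so they share bucket 1.
for key in [1, 11, 21, 31, 41]:
    buckets[key % table_size].append(key)

print(buckets[1])        # [1, 11, 21, 31, 41] -- one long chain
print(41 in buckets[1])  # True, but only after comparing every element
```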
diff --git a/contrib/ds-algorithms/hashing-linear-probing.md b/contrib/ds-algorithms/hashing-linear-probing.md
new file mode 100644
index 00000000..0d27db47
--- /dev/null
+++ b/contrib/ds-algorithms/hashing-linear-probing.md
@@ -0,0 +1,139 @@
# Hashing with Linear Probing

In Data Structures and Algorithms, hashing is used to map data of arbitrary size to fixed-size values. A common approach to handle collisions in hashing is **linear probing**. In linear probing, if a collision occurs (i.e., the hash value points to an already occupied slot), we probe linearly through the table to find the next available slot. As long as the table has at least one free slot, every element can be inserted and later found.

## Points to be Remembered

- **Hash Function**: A function that converts an input (or 'key') into an index in a hash table.
- **Collision**: When two keys hash to the same index.
- **Linear Probing**: A method to resolve collisions by checking the next slot (i.e., index + 1, wrapping around to the start of the table) until an empty slot is found.

## Real Life Examples of Hashing with Linear Probing

- **Student Record System**: Each student record is stored in a table where the student's ID number is hashed to an index. If two students have the same hash index, linear probing finds the next available slot.
- **Library System**: Books are indexed by their ISBN numbers. If two books hash to the same slot, linear probing helps find another spot for the book in the catalog.

## Applications of Hashing

Hashing is widely used in computer science:

- **Database Indexing**
- **Caches** (like CPU caches, web caches)
- **Associative Arrays** (or dictionaries in Python)
- **Sets** (unordered collections of unique elements)

Understanding these applications is essential for software development.

## Operations in Hash Table with Linear Probing

Key operations include:

- **INSERT**: Insert a new element into the hash table.
- **SEARCH**: Find the position of an element in the hash table.
- **DELETE**: Remove an element from the hash table.
## Implementing Hash Table with Linear Probing in Python

```python
class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return key % self.size

    def insert(self, key, value):
        hash_index = self.hash_function(key)

        # Probe forward (wrapping around) until a free slot is found.
        # This assumes the table is not full; a full table would loop forever.
        while self.table[hash_index] is not None:
            hash_index = (hash_index + 1) % self.size
        self.table[hash_index] = (key, value)

    def search(self, key):
        hash_index = self.hash_function(key)

        while self.table[hash_index] is not None:
            if self.table[hash_index][0] == key:
                return self.table[hash_index][1]
            hash_index = (hash_index + 1) % self.size

        return None

    def delete(self, key):
        hash_index = self.hash_function(key)

        while self.table[hash_index] is not None:
            if self.table[hash_index][0] == key:
                # Caution: simply emptying the slot can break probe chains
                # for keys that were placed after a collision (see below).
                self.table[hash_index] = None
                return True
            hash_index = (hash_index + 1) % self.size

        return False

    def display(self):
        for index, item in enumerate(self.table):
            print(f"Index {index}: {item}")

# Example usage
hash_table = HashTable(10)

hash_table.insert(1, 'A')
hash_table.insert(11, 'B')
hash_table.insert(21, 'C')

print("Hash Table after Insert operations:")
hash_table.display()

print("Search operation for key 11:", hash_table.search(11))

hash_table.delete(11)

print("Hash Table after Delete operation:")
hash_table.display()
```

## Output

```
Hash Table after Insert operations:
Index 0: None
Index 1: (1, 'A')
Index 2: (11, 'B')
Index 3: (21, 'C')
Index 4: None
Index 5: None
Index 6: None
Index 7: None
Index 8: None
Index 9: None

Search operation for key 11: B

Hash Table after Delete operation:
Index 0: None
Index 1: (1, 'A')
Index 2: None
Index 3: (21, 'C')
Index 4: None
Index 5: None
Index 6: None
Index 7: None
Index 8: None
Index 9: None
```

Keys 11 and 21 both hash to index 1 (`11 % 10 == 21 % 10 == 1`), so linear probing stores them in the next free slots, indices 2 and 3. Note the pitfall of the naive deletion above: after deleting key 11, the empty slot at index 2 interrupts the probe chain, so a subsequent `search(21)` would stop there and miss key 21 at index 3. Practical open-addressing tables mark deleted slots with a special "tombstone" value instead.

## Complexity Analysis

- **Insertion**: Average case O(1), Worst case O(n) when many collisions occur.
- **Search**: Average case O(1), Worst case O(n) when many collisions occur.
- **Deletion**: Average case O(1), Worst case O(n) when many collisions occur.
\ No newline at end of file
diff --git a/contrib/ds-algorithms/heaps.md b/contrib/ds-algorithms/heaps.md
new file mode 100644
index 00000000..6a9ba71a
--- /dev/null
+++ b/contrib/ds-algorithms/heaps.md
@@ -0,0 +1,169 @@
# Heaps

## Definition:
Heaps are a crucial data structure that support efficient priority queue operations. They come in two main types: min heaps and max heaps. Python's heapq module provides a robust implementation for min heaps, and with some minor adjustments, it can also be used to implement max heaps.

## Overview:
A heap is a specialized binary tree-based data structure that satisfies the heap property:

- **Min Heap:** The key at the root must be the minimum among all keys present in the Binary Heap. This property must be recursively true for all nodes in the Binary Tree.

- **Max Heap:** The key at the root must be the maximum among all keys present in the Binary Heap. This property must be recursively true for all nodes in the Binary Tree.

## Python heapq Module:
The heapq module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.

- **Min Heap:** In a min heap, the smallest element is always at the root.
Here's how to use heapq to create and manipulate a min heap:

```python
import heapq

# Create an empty heap
min_heap = []

# Adding elements to the heap
heapq.heappush(min_heap, 10)
heapq.heappush(min_heap, 5)
heapq.heappush(min_heap, 3)
heapq.heappush(min_heap, 12)
print("Min Heap:", min_heap)

# Pop the smallest element
smallest = heapq.heappop(min_heap)
print("Smallest element:", smallest)
print("Min Heap after pop:", min_heap)
```

**Output:**

```
Min Heap: [3, 10, 5, 12]
Smallest element: 3
Min Heap after pop: [5, 10, 12]
```

Note that the printed list is the heap's internal array: only `min_heap[0]` is guaranteed to be the smallest element; the rest of the list is in heap order, not sorted order.

- **Max Heap:** To create a max heap, we can store negative values.

```python
import heapq

# Create an empty heap
max_heap = []

# Adding elements to the heap by pushing negative values
heapq.heappush(max_heap, -10)
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -12)

# Convert back to positive values for display
print("Max Heap:", [-x for x in max_heap])

# Pop the largest element
largest = -heapq.heappop(max_heap)
print("Largest element:", largest)
print("Max Heap after pop:", [-x for x in max_heap])
```

**Output:**

```
Max Heap: [12, 10, 3, 5]
Largest element: 12
Max Heap after pop: [10, 5, 3]
```

## Heap Operations:
1. **Push Operation:** Adds an element to the heap, maintaining the heap property.
```python
heapq.heappush(heap, item)
```
2. **Pop Operation:** Removes and returns the smallest element from the heap.
```python
smallest = heapq.heappop(heap)
```
3. **Heapify Operation:** Converts a list into a heap in-place.
```python
heapq.heapify(lst)
```
4. **Peek Operation:** Gets the smallest element without popping it (not directly available as a function, but can be done by accessing the first element).
```python
smallest = heap[0]
```

## Example:
```python
# importing "heapq" to implement heap queue
import heapq

# initializing list
li = [15, 77, 90, 1, 3]

# using heapify to convert list into heap
heapq.heapify(li)

# printing created heap
print("The created heap is : ", end="")
print(list(li))

# using heappush() to push elements into heap
# pushes 4
heapq.heappush(li, 4)

# printing modified heap
print("The modified heap after push is : ", end="")
print(list(li))

# using heappop() to pop smallest element
print("The popped and smallest element is : ", end="")
print(heapq.heappop(li))
```

Output:
```
The created heap is : [1, 3, 90, 77, 15]
The modified heap after push is : [1, 3, 4, 77, 15, 90]
The popped and smallest element is : 1
```

Again, the heap invariant only guarantees that `li[0]` is the minimum; the remaining elements are not fully sorted.

## Advantages and Disadvantages of Heaps:

## Advantages:

**Efficient:** Heap queues, implemented in Python's heapq module, offer remarkable efficiency in managing priority queues and heaps. With logarithmic time complexity for key operations, they are widely favored in various applications for their performance.

**Space-efficient:** Leveraging an array-based representation, heap queues optimize memory usage compared to node-based structures like linked lists. This design minimizes overhead, enhancing efficiency in memory management.

**Ease of Use:** Python's heap queues boast a user-friendly API, simplifying fundamental operations such as insertion, deletion, and retrieval. This simplicity contributes to rapid development and code maintenance.

**Flexibility:** Beyond their primary use in priority queues and heaps, Python's heap queues lend themselves to diverse applications.
They can be adapted to implement various data structures, including binary trees, showcasing their versatility and broad utility across different domains. + +## Disadvantages: + +**Limited functionality:** Heap queues are primarily designed for managing priority queues and heaps, and may not be suitable for more complex data structures and algorithms. + +**No random access:** Heap queues do not support random access to elements, making it difficult to access elements in the middle of the heap or modify elements that are not at the top of the heap. + +**No sorting:** Heap queues do not support sorting, so if you need to sort elements in a specific order, you will need to use a different data structure or algorithm. + +**Not thread-safe:** Heap queues are not thread-safe, meaning that they may not be suitable for use in multi-threaded applications where data synchronization is critical. + +## Real-Life Examples of Heaps: + +1. **Priority Queues:** +Heaps are commonly used to implement priority queues, which are used in various algorithms like Dijkstra's shortest path algorithm and Prim's minimum spanning tree algorithm. + +2. **Scheduling Algorithms:** +Heaps are used in job scheduling algorithms where tasks with the highest priority need to be processed first. + +3. **Merge K Sorted Lists:** +Heaps can be used to efficiently merge multiple sorted lists into a single sorted list. + +4. **Real-Time Event Simulation:** +Heaps are used in event-driven simulators to manage events scheduled to occur at future times. + +5. **Median Finding Algorithm:** +Heaps can be used to maintain a dynamic set of numbers to find the median efficiently. diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo1.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo1.png new file mode 100644 index 00000000..b937f046 Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo1.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo2.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo2.png new file mode 100644 index 00000000..e1cacef1 Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo2.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo3.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo3.png new file mode 100644 index 00000000..a5b69f9d Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo3.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo4.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo4.png new file mode 100644 index 00000000..54d1889c Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo4.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo5.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo5.png new file mode 100644 index 00000000..a3a6d508 Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo5.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo6.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo6.png new file mode 100644 index 00000000..db7d948c Binary files /dev/null and b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo6.png differ diff --git a/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo7.png b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo7.png new file mode 100644 index 00000000..0b4eaf89 Binary files /dev/null and 
b/contrib/ds-algorithms/images/Dijkstra's_algorithm_photo7.png differ
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png
new file mode 100644
index 00000000..f7480941
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOh.png differ
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png
new file mode 100644
index 00000000..b4faba16
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigOmega.png differ
diff --git a/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png
new file mode 100644
index 00000000..74349064
Binary files /dev/null and b/contrib/ds-algorithms/images/Time-And-Space-Complexity-BigTheta.png differ
diff --git a/contrib/ds-algorithms/images/binarytree.png b/contrib/ds-algorithms/images/binarytree.png
new file mode 100644
index 00000000..4137cdfc
Binary files /dev/null and b/contrib/ds-algorithms/images/binarytree.png differ
diff --git a/contrib/ds-algorithms/images/inorder-traversal.png b/contrib/ds-algorithms/images/inorder-traversal.png
new file mode 100644
index 00000000..61b32c70
Binary files /dev/null and b/contrib/ds-algorithms/images/inorder-traversal.png differ
diff --git a/contrib/ds-algorithms/images/postorder-traversal.png b/contrib/ds-algorithms/images/postorder-traversal.png
new file mode 100644
index 00000000..69ac6590
Binary files /dev/null and b/contrib/ds-algorithms/images/postorder-traversal.png differ
diff --git a/contrib/ds-algorithms/images/preorder-traversal.png b/contrib/ds-algorithms/images/preorder-traversal.png
new file mode 100644
index 00000000..e85a70d8
Binary files /dev/null and b/contrib/ds-algorithms/images/preorder-traversal.png differ
diff --git a/contrib/ds-algorithms/images/traversal.png b/contrib/ds-algorithms/images/traversal.png
new file mode 100644
index 00000000..556f1775
Binary files /dev/null and b/contrib/ds-algorithms/images/traversal.png differ
diff --git a/contrib/ds-algorithms/index.md b/contrib/ds-algorithms/index.md
new file mode 100644
index 00000000..0c29ec75
--- /dev/null
+++ b/contrib/ds-algorithms/index.md
@@ -0,0 +1,26 @@
# List of sections

- [Time & Space Complexity](time-space-complexity.md)
- [Queues in Python](queues.md)
- [Graphs](graph.md)
- [Sorting Algorithms](sorting-algorithms.md)
- [Recursion and Backtracking](recursion.md)
- [Divide and Conquer Algorithm](divide-and-conquer-algorithm.md)
- [Searching Algorithms](searching-algorithms.md)
- [Greedy Algorithms](greedy-algorithms.md)
- [Dynamic Programming](dynamic-programming.md)
- [Linked List](linked-list.md)
- [Stacks in Python](stacks.md)
- [Sliding Window Technique](sliding-window.md)
- [Trie](trie.md)
- [Two Pointer Technique](two-pointer-technique.md)
- [Hashing through Linear Probing](hashing-linear-probing.md)
- [Hashing through Chaining](hashing-chaining.md)
- [Heaps](heaps.md)
- [Hash Tables, Sets, Maps](hash-tables.md)
- [Binary Tree](binary-tree.md)
- [AVL Trees](avl-trees.md)
- [Splay Trees](splay-trees.md)
- [Dijkstra's Algorithm](dijkstra.md)
- [Deque](deque.md)
- [Tree Traversals](tree-traversal.md)
diff --git a/contrib/ds-algorithms/linked-list.md b/contrib/ds-algorithms/linked-list.md
new file mode 100644
index 00000000..59631e7c
--- /dev/null
+++ b/contrib/ds-algorithms/linked-list.md
@@ -0,0 +1,240 @@
# Linked List Data Structure

A linked list is a linear data structure defined as a collection of objects called nodes that can be stored anywhere in memory, rather than in contiguous locations.
Each node holds two fields: the data stored at that position, and a pointer containing the address of the next node in memory.

The last node in a linked list holds a null pointer.

## Why use a linked list over an array?

Arrays are the usual way to organize a group of elements stored individually in memory.
However, arrays have advantages and disadvantages that should be weighed when deciding which data structure to use in a program.

Limitations of arrays:

1. Before an array can be utilized in a program, its size must be established in advance.
2. Expanding an array's size is a lengthy process and is almost impossible to achieve during runtime.
3. Array elements must be stored in contiguous memory locations. To insert an element, all subsequent elements must be shifted.

Linked lists overcome these limitations:

1. Dynamic Memory Management: Linked lists allocate memory dynamically, meaning nodes can be located anywhere in memory and are connected through pointers, rather than being stored contiguously.
2. Adaptive Sizing: There is no need to predefine the size of a linked list. It can expand or contract during runtime, adapting to the program's requirements within the constraints of the available memory.

Let's code something.

The smallest unit: the Node

```python
class Node:
    def __init__(self, data):
        self.data = data  # Assigns the given data to the node
        self.next = None  # Initialize the next attribute to null
```

Now, we will see the types of linked list.

There are mainly four types of linked list:
1. Singly linked list
2. Doubly linked list
3. Circular linked list
4. Doubly circular linked list


## 1. Singly linked list

Think of it as a chain of nodes in which each node remembers (contains) the address of its next node.

### Creating a linked list class
```python
class LinkedList:
    def __init__(self):
        self.head = None  # Initialize head as None
```

### Inserting a new node at the beginning of a linked list

```python
    def insertAtBeginning(self, new_data):
        new_node = Node(new_data)  # Create a new node
        new_node.next = self.head  # Next for the new node becomes the current head
        self.head = new_node       # Head now points to the new node
```

### Inserting a new node at the end of a linked list

```python
    def insertAtEnd(self, new_data):
        new_node = Node(new_data)  # Create a new node
        if self.head is None:
            self.head = new_node   # If the list is empty, make the new node the head
            return
        last = self.head
        while last.next:           # Otherwise, traverse the list to find the last node
            last = last.next
        last.next = new_node       # Make the new node the next node of the last node
```

### Inserting a new node at the middle of a linked list

```python
    def insertAtPosition(self, data, position):
        new_node = Node(data)
        if position <= 0:  # Check that the position is valid
            print("Position should be greater than 0")
            return
        if position == 1:
            new_node.next = self.head
            self.head = new_node
            return
        current_node = self.head
        current_position = 1
        while current_node and current_position < position - 1:  # Iterate to the node just before the target position
            current_node = current_node.next
            current_position += 1
        if not current_node:  # Check whether the position is out of bounds
            print("Position is out of bounds")
            return
        new_node.next = current_node.next  # Connect the new node to the following node
        current_node.next = new_node
```

### Printing the linked list

```python
    def printList(self):
        temp = self.head  # Start from the head of the list
        while temp:
            print(temp.data, end=' ')  # Print the data in the current node
            temp = temp.next           # Move to the next node
        print()  # Ensures the output is followed by a new line
```

Let's complete the code and create a linked list.

Connect all the code:

```python
    if __name__ == '__main__':
        llist = LinkedList()

        # Insert numbers at the beginning
        llist.insertAtBeginning(4)  # <4>
        llist.insertAtBeginning(3)  # <3> 4
        llist.insertAtBeginning(2)  # <2> 3 4
        llist.insertAtBeginning(1)  # <1> 2 3 4

        # Insert numbers at the end
        llist.insertAtEnd(10)  # 1 2 3 4 <10>
        llist.insertAtEnd(7)   # 1 2 3 4 10 <7>

        # Insert at a chosen position
        llist.insertAtPosition(9, 4)  # 1 2 3 <9> 4 10 7

        # Print the list
        llist.printList()
```

## Output:

1 2 3 9 4 10 7


### Deleting a node from the beginning of a linked list
Check whether the list is empty; otherwise shift the head to the next node.
```python
    def deleteFromBeginning(self):
        if self.head is None:
            return "The list is empty"  # If the list is empty, return this string
        self.head = self.head.next      # Otherwise, remove the head by making the next node the new head
```

### Deleting a node from the end of a linked list

```python
    def deleteFromEnd(self):
        if self.head is None:
            return "The list is empty"
        if self.head.next is None:
            self.head = None  # If there's only one node, remove the head by making it None
            return
        temp = self.head
        while temp.next.next:  # Otherwise, go to the second-last node
            temp = temp.next
        temp.next = None  # Remove the last node by clearing the second-last node's next pointer
```

### Reversing the linked list
```python
    def reverseList(self):
        prev = None
        temp = self.head
        while temp:
            nextNode = temp.next  # Store the next node
            temp.next = prev      # Reverse the pointer of the current node
            prev = temp           # Move prev one step forward
            temp = nextNode       # Move temp one step forward
        self.head = prev          # Update the head pointer to the last node
```

### Searching in a linked list
```python
    def search(self, value):
        current = self.head  # Start with the head of the list
        position = 0         # Counter to keep track of the position
        while current:       # Traverse the list
            if current.data == value:  # Compare the node's data to the search value
                return f"Value '{value}' found at position {position}"
            current = current.next
            position += 1
        return f"Value '{value}' not found in the list"
```

Connect all the code:
```python
    if __name__ == '__main__':
        llist = LinkedList()

        # Insert numbers at the beginning
        llist.insertAtBeginning(4)  # <4>
        llist.insertAtBeginning(3)  # <3> 4
        llist.insertAtBeginning(2)  # <2> 3 4
        llist.insertAtBeginning(1)  # <1> 2 3 4

        # Insert numbers at the end
        llist.insertAtEnd(10)  # 1 2 3 4 <10>
        llist.insertAtEnd(7)   # 1 2 3 4 10 <7>

        # Insert at chosen positions
        llist.insertAtPosition(9, 4)   # 1 2 3 <9> 4 10 7
        llist.insertAtPosition(56, 4)  # 1 2 3 <56> 9 4 10 7

        # Delete at the beginning
        llist.deleteFromBeginning()  # 2 3 56 9 4 10 7

        # Delete at the end
        llist.deleteFromEnd()  # 2 3 56 9 4 10

        # Print the original list
        llist.printList()

        llist.reverseList()  # 10 4 9 56 3 2

        # Print the reversed list
        llist.printList()
```
## Output:

2 3 56 9 4 10

10 4 9 56 3 2


## Real Life uses of Linked List


Here are a few practical applications of linked lists in various fields:

1. **Music Player**: In a music player, songs are often linked to the previous and next tracks. This allows for seamless navigation between songs, enabling you to play tracks either from the beginning or the end of the playlist. This is akin to a doubly linked list where each song node points to both the previous and the next song, enhancing the flexibility of song selection.

2. **GPS Navigation Systems**: Linked lists can be highly effective for managing lists of locations and routes in GPS navigation systems. Each location or waypoint can be represented as a node, making it easy to add or remove destinations and to navigate smoothly from one location to another. This is similar to how you might plan a road trip, plotting stops along the way in a flexible, dynamic manner.

3. **Task Scheduling**: Operating systems utilize linked lists to manage task scheduling. Each process waiting to be executed is represented as a node in a linked list. This organization allows the system to efficiently keep track of which processes need to be run, enabling fair and systematic scheduling of tasks. Think of it like a to-do list where each task is a node, and the system executes tasks in a structured order.

4. **Speech Recognition**: Speech recognition software uses linked lists to represent possible phonetic pronunciations of words. Each potential pronunciation is a node, allowing the software to dynamically explore different pronunciation paths as it processes spoken input. This method helps in accurately recognizing and understanding speech by considering multiple possibilities in a flexible manner, much like evaluating various potential meanings in a conversation.

These examples illustrate how linked lists provide a flexible, dynamic data structure that can be adapted to a wide range of practical applications, making them a valuable tool in both software development and real-world problem-solving.
diff --git a/contrib/ds-algorithms/queues.md b/contrib/ds-algorithms/queues.md
new file mode 100644
index 00000000..2c5c0f0f
--- /dev/null
+++ b/contrib/ds-algorithms/queues.md
@@ -0,0 +1,130 @@
# Queues in Python

A queue is a linear data structure where elements are added at the back (enqueue) and removed from the front (dequeue). Imagine a line at a coffee shop: the first in line (front) gets served first, and new customers join at the back. This FIFO approach ensures order and fairness in processing elements.

Queues offer efficient implementations for various scenarios.
They are often used in:
- **Task Scheduling** - Operating systems utilize queues to manage processes waiting for CPU time.
- **Breadth-first search algorithms** - Traversing a tree or graph involves exploring neighbouring nodes level by level, often achieved using a queue.
- **Message passing** - Communication protocols leverage queues to buffer messages between applications for reliable delivery.

## Types of Queue

A queue can be classified into 4 types -

- **Simple Queue** - Elements can only be inserted at the back and removed from the front; this type of queue follows the FIFO principle.
- **Double-Ended Queue (Deque)** - In this type of queue, insertions and deletions of elements can be performed from both ends of the queue.
+Double-ended queues can be classified into 2 types -> + - **Input-Restricted Queue** + - **Output-Restricted Queue** +- **Circular Queue** - It is a special type of queue where the back is connected to the front, where the operations follow the FIFO principle. +- **Priority Queue** - In this type of queue, elements are accessed based on their priority in the queue.
Priority queues are of 2 types ->
  - **Ascending Priority Queue**
  - **Descending Priority Queue**

## Real Life Examples of Queues
- **Customer Service** - Consider how a customer service phone line works. Customers calling are put into a queue. The first customer to call is the first one to be served (FIFO). As more customers call, they are added to the end of the queue, and as customers are served, they are removed from the front. The entire process follows the queue data structure.

- **Printers** - Printers operate using a queue to manage print jobs. When a user sends a document to the printer, the job is added to the queue (enqueue). Once a job completes printing, it's removed from the queue (dequeue), and the next job in line starts. This sequential order of handling tasks perfectly exhibits the queue data structure.

- **CPU Instruction Queues** - Processors use queue-like buffers to hold and process instructions. For example, in a CPU's instruction prefetch queue, the first instruction fetched is the first one to be decoded and executed, while newly fetched instructions are added to the rear.
+ +# Important Terminologies in Queues + +Understanding these terms is crucial for working with queues: + +- **Enqueue** - Adding an element to the back of the queue. +- **Dequeue** - Removing the element at the front of the queue. +- **Front** - The first element in the queue, to be removed next. +- **Rear/Back** - The last element in the queue, where new elements are added. +- **Empty Queue** - A queue with no elements. +- **Overflow** - Attempting to enqueue an element when the queue is full. +- **Underflow** - Attempting to dequeue an element from an empty queue. + +## Operations on a Queue + +There are some key operations in a queue that include - + +- **isFULL** - This operation checks if a queue is full. +- **isEMPTY** - This operation checks if a queue is empty. +- **Display** - This operation displays the queue elements. +- **Peek** - This operation is the process of getting the front value of a queue, without removing it. (i.e., Value at the front). + +
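One practical note before the list-based implementation below (this is an addition, not from the original text): Python's `collections.deque` provides O(1) appends and pops at both ends, avoiding the O(n) cost of `list.pop(0)`, so it is the usual choice for queues in real code. A minimal sketch:

```python
from collections import deque

q = deque()

q.append(1)  # enqueue at the rear
q.append(2)
q.append(3)

print(q[0])         # 1 -- peek at the front without removing it
print(q.popleft())  # 1 -- dequeue from the front in O(1)
print(list(q))      # [2, 3]
```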
# Implementation of Queue

```python
def isEmpty(Qu):
    return Qu == []

def Enqueue(Qu, item):
    Qu.append(item)  # new elements join at the back
    print(item, "enqueued to queue")

def Dequeue(Qu):
    if isEmpty(Qu):
        print("Underflow")
    else:
        item = Qu.pop(0)  # remove from the front; O(n) for a Python list
        print(item, "dequeued from queue")

def Peek(Qu):
    if isEmpty(Qu):
        print("Underflow")
    else:
        print("Frontmost item is :", Qu[0])

def Display(Qu):
    if isEmpty(Qu):
        print("Queue Empty!")
    elif len(Qu) == 1:
        print(Qu[0], "<== front, rear")
    else:
        front = 0
        rear = len(Qu) - 1
        print(Qu[front], "<-front")
        for a in range(1, rear):
            print(Qu[a])
        print(Qu[rear], "<-rear")

queue = []  # initially the queue is empty

# Example Usage
Enqueue(queue, 1)
Enqueue(queue, 2)
Enqueue(queue, 3)
Dequeue(queue)
Peek(queue)
Display(queue)
```

## Output

```
1 enqueued to queue
2 enqueued to queue
3 enqueued to queue
1 dequeued from queue
Frontmost item is : 2
2 <-front
3 <-rear
```

## Complexity Analysis

- **Enqueue**: `O(1)`, since `list.append` runs in amortized constant time.
- **Dequeue**: `O(n)`, because `list.pop(0)` shifts every remaining element one position forward.
- **Peek**: `O(1)`.
- **Display**: `O(n)`, as it visits every element once.
diff --git a/contrib/ds-algorithms/recursion.md b/contrib/ds-algorithms/recursion.md
new file mode 100644
index 00000000..4233242a
--- /dev/null
+++ b/contrib/ds-algorithms/recursion.md
@@ -0,0 +1,127 @@
# Introduction to Recursion

Recursion is when a function calls itself to solve smaller instances of the same problem until a specified condition is fulfilled. It is used for tasks that can be divided into smaller sub-tasks.

## How Recursion Works

To solve a problem using recursion we must define:
- Base condition: the condition under which recursion ends.
- Recursive case: the part of the function which calls itself to solve a smaller instance of the problem.

Steps of Recursion

When a recursive function is called, the following sequence of events occurs:
- Function Call: The function is invoked with a specific argument.
- Base Condition Check: The function checks if the argument satisfies the base case.
- Recursive Call: If the base case is not met, the function performs some operations and makes a recursive call with a modified argument.
- Stack Management: Each recursive call is placed on the call stack. The stack keeps track of each function call, its argument, and the point to return to once the call completes.
- Unwinding the Stack: When the base case is eventually met, the function returns a value, and the stack starts unwinding, returning values to previous function calls until the initial call is resolved.

## Python Code: Factorial using Recursion

```python
def fact(n):
    if n == 0 or n == 1:
        return 1
    return n * fact(n - 1)

if __name__ == "__main__":
    n = int(input("Enter a positive number: "))
    print("Factorial of", n, "is", fact(n))
```

### Explanation

This Python script calculates the factorial of a given number using recursion.

- **Function `fact(n)`:**
  - The function takes an integer `n` as input and calculates its factorial.
  - It checks if `n` is 0 or 1. If so, it returns 1 (since the factorial of 0 and 1 is 1).
+ - Otherwise, it returns `n * fact(n - 1)`, which means it recursively calls itself with `n - 1` until it reaches either 0 or 1. + +- **Main Section:** + - The main section prompts the user to enter a positive number. + - It then calls the `fact` function with the input number and prints the result. + +#### Example : Let n = 4 + +The recursion unfolds as follows: +1. When `fact(4)` is called, it computes `4 * fact(3)`. +2. Inside `fact(3)`, it computes `3 * fact(2)`. +3. Inside `fact(2)`, it computes `2 * fact(1)`. +4. `fact(1)` returns 1 (`if` statement executes), which is received by `fact(2)`, resulting in `2 * 1` i.e. `2`. +5. Back to `fact(3)`, it receives the value from `fact(2)`, giving `3 * 2` i.e. `6`. +6. `fact(4)` receives the value from `fact(3)`, resulting in `4 * 6` i.e. `24`. +7. Finally, `fact(4)` returns 24 to the main function. + +#### So, the result is 24. + +#### What is Stack Overflow in Recursion? + +Stack overflow is an error that occurs when the call stack memory limit is exceeded. During execution of recursion calls they are simultaneously stored in a recursion stack waiting for the recursive function to be completed. Without a base case, the function would call itself indefinitely, leading to a stack overflow. + +## What is Backtracking + +Backtracking is a recursive algorithmic technique used to solve problems by exploring all possible solutions and discarding those that do not meet the problem's constraints. It is particularly useful for problems involving combinations, permutations, and finding paths in a grid. + +## How Backtracking Works + +- Incremental Solution Building: Solutions are built one step at a time. +- Feasibility Check: At each step, a check is made to see if the current partial solution is valid. +- Backtracking: If a partial solution is found to be invalid, the algorithm backtracks by removing the last added part of the solution and trying the next possibility. +- Exploration of All Possibilities: The process continues recursively, exploring all possible paths, until a solution is found or all possibilities are exhausted. + +## Example: Word Search + +Given a 2D grid of characters and a word, determine if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once. + +Algorithm for Solving the Word Search Problem with Backtracking: +- Start at each cell: Attempt to find the word starting from each cell. +- Check all Directions: From each cell, try all four possible directions (up, down, left, right). +- Mark Visited Cells: Use a temporary marker to indicate cells that are part of the current path to avoid revisiting. +- Backtrack: If a path does not lead to a solution, backtrack by unmarking the visited cell and trying the next possibility. 
+ +```python +def exist(board, word): + rows, cols = len(board), len(board[0]) + + def backtrack(r, c, suffix): + if not suffix: + return True + + if r < 0 or r >= rows or c < 0 or c >= cols or board[r][c] != suffix[0]: + return False + + # Mark the cell as visited by replacing its character with a placeholder + ret = False + board[r][c], temp = '#', board[r][c] + + # Explore the four possible directions + for row_offset, col_offset in [(0, 1), (1, 0), (0, -1), (-1, 0)]: + ret = backtrack(r + row_offset, c + col_offset, suffix[1:]) + if ret: + break + + # Restore the cell's original value + board[r][c] = temp + return ret + + for row in range(rows): + for col in range(cols): + if backtrack(row, col, word): + return True + + return False + +# Test case +board = [ + ['A','B','C','E'], + ['S','F','C','S'], + ['A','D','E','E'] +] +word = "ABCES" +print(exist(board, word)) # Output: True +``` + + + diff --git a/contrib/ds-algorithms/searching-algorithms.md b/contrib/ds-algorithms/searching-algorithms.md new file mode 100644 index 00000000..78b86d14 --- /dev/null +++ b/contrib/ds-algorithms/searching-algorithms.md @@ -0,0 +1,161 @@ +# Searching Algorithms + +Searching algorithms are techniques used to locate specific items within a collection of data. These algorithms are fundamental in computer science and are employed in various applications, from databases to web search engines. + +## Real Life Example of Searching +- Searching for a word in a dictionary +- Searching for a specific book in a library +- Searching for a contact in your phone's address book +- Searching for a file on your computer, etc. + +# Some common searching techniques + +# 1. Linear Search + +Linear search, also known as sequential search, is a straightforward searching algorithm that checks each element in a collection until the target element is found or the entire collection has been traversed. It is simple to implement but becomes inefficient for large datasets. + +**Algorithm Overview:** +- **Sequential Checking:** The algorithm iterates through each element in the collection, starting from the first element. +- **Comparing Elements:** At each iteration, it compares the current element with the target element. +- **Finding the Target:** If the current element matches the target, the search terminates, and the index of the element is returned. +- **Completing the Search:** If the entire collection is traversed without finding the target, the algorithm indicates that the element is not present. + +## Linear Search Code in Python + +```python +def linear_search(arr, target): + for i in range(len(arr)): + if arr[i] == target: + return i + return -1 + +arr = [5, 3, 8, 1, 2] +target = 8 +result = linear_search(arr, target) +if result != -1: + print(f"Element {target} found at index {result}.") +else: + print(f"Element {target} not found.") +``` + +## Complexity Analysis +- **Time Complexity**: O(n) +- **Space Complexity**: O(1) + +
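As a side note (an addition, not from the original text): Python's built-in sequence operations perform the same left-to-right scan, so for simple membership and index lookups you rarely need to hand-roll a linear search:

```python
arr = [5, 3, 8, 1, 2]

print(arr.index(8))  # 2 -- list.index scans left to right, like linear_search above
print(8 in arr)      # True -- the `in` operator also performs a linear scan
```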
+
+
+ +# 2. Binary Search + +Binary search is an efficient searching algorithm that works on sorted collections. It repeatedly divides the search interval in half until the target element is found or the interval is empty. Binary search is significantly faster than linear search but requires the collection to be sorted beforehand. + +**Algorithm Overview:** +- **Initial State:** Binary search starts with the entire collection as the search interval. +- **Divide and Conquer:** At each step, it calculates the middle element of the current interval and compares it with the target. +- **Narrowing Down the Interval:** If the middle element is equal to the target, the search terminates successfully. Otherwise, it discards half of the search interval based on the comparison result. +- **Repeating the Process:** The algorithm repeats this process on the remaining half of the interval until the target is found or the interval is empty. + +## Binary Search Code in Python (Iterative) + +```python +def binary_search(arr, target): + low = 0 + high = len(arr) - 1 + while low <= high: + mid = (low + high) // 2 + if arr[mid] == target: + return mid + elif arr[mid] < target: + low = mid + 1 + else: + high = mid - 1 + return -1 + +arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] +target = 13 +result = binary_search(arr, target) +if result != -1: + print(f"Element {target} found at index {result}.") +else: + print(f"Element {target} not found.") +``` + +## Binary Search Code in Python (Recursive) + +```python +def binary_search_recursive(arr, target, low, high): + if low <= high: + mid = (low + high) // 2 + if arr[mid] == target: + return mid + elif arr[mid] < target: + return binary_search_recursive(arr, target, mid + 1, high) + else: + return binary_search_recursive(arr, target, low, mid - 1) + else: + return -1 + +arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] +target = 13 +result = binary_search_recursive(arr, target, 0, len(arr) - 1) +if result != -1: + print(f"Element {target} found at index {result}.") +else: + print(f"Element {target} not found.") +``` + +## Complexity Analysis +- **Time Complexity**: O(log n) +- **Space Complexity**: O(1) (Iterative), O(log n) (Recursive) + +
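A practical follow-up (not part of the original text): Python's standard library implements binary search on sorted sequences in the `bisect` module, so in real code you can often use it instead of writing the loop yourself:

```python
import bisect

arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
target = 13

# bisect_left returns the leftmost insertion point that keeps arr sorted;
# if the value at that index equals the target, the target is present.
i = bisect.bisect_left(arr, target)
if i < len(arr) and arr[i] == target:
    print(f"Element {target} found at index {i}.")  # index 6
else:
    print(f"Element {target} not found.")
```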
+
+
+ +# 3. Interpolation Search + +Interpolation search is an improved version of binary search, especially useful when the elements in the collection are uniformly distributed. Instead of always dividing the search interval in half, interpolation search estimates the position of the target element based on its value and the values of the endpoints of the search interval. + +**Algorithm Overview:** +- **Estimating Position:** Interpolation search calculates an approximate position of the target element within the search interval based on its value and the values of the endpoints. +- **Refining the Estimate:** It adjusts the estimated position based on whether the target value is likely to be closer to the beginning or end of the search interval. +- **Updating the Interval:** Using the refined estimate, it narrows down the search interval iteratively until the target is found or the interval becomes empty. + +## Interpolation Search Code in Python + +```python +def interpolation_search(arr, target): + low = 0 + high = len(arr) - 1 + while low <= high and arr[low] <= target <= arr[high]: + if low == high: + if arr[low] == target: + return low + return -1 + pos = low + ((target - arr[low]) * (high - low)) // (arr[high] - arr[low]) + if arr[pos] == target: + return pos + elif arr[pos] < target: + low = pos + 1 + else: + high = pos - 1 + return -1 + +arr = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] +target = 60 +result = interpolation_search(arr, target) +if result != -1: + print(f"Element {target} found at index {result}.") +else: + print(f"Element {target} not found.") +``` + +## Complexity Analysis +- **Time Complexity**: O(log log n) (Average) +- **Space Complexity**: O(1) + +
+
+
diff --git a/contrib/ds-algorithms/sliding-window.md b/contrib/ds-algorithms/sliding-window.md
new file mode 100644
index 00000000..72aa1915
--- /dev/null
+++ b/contrib/ds-algorithms/sliding-window.md
@@ -0,0 +1,249 @@
# Sliding Window Technique

The sliding window technique is a fundamental approach used to solve problems involving arrays, lists, or sequences. It's particularly useful when you need to calculate something over a subarray or sublist of fixed size that slides over the entire array.

In simple terms, it turns a nested-loop solution into a single loop.

## Concept

The sliding window technique involves creating a window (a subarray or sublist) that moves or "slides" across the entire array. This window can either be fixed in size or dynamically resized. By maintaining and updating this window as it moves, you can optimize certain computations, reducing time complexity.

## Types of Sliding Windows

1. **Fixed Size Window**: The window size remains constant as it slides from the start to the end of the array.
2. **Variable Size Window**: The window size can change based on certain conditions, such as the sum of elements within the window meeting a specified target.

## Steps to Implement a Sliding Window

1. **Initialize the Window**: Set the initial position of the window and any required variables (like sum, count, etc.).
2. **Expand the Window**: Add the next element to the window and update the relevant variables.
3. **Shrink the Window**: If needed, remove elements from the start of the window and update the variables.
4. **Slide the Window**: Move the window one position to the right by including the next element and possibly excluding the first element.
5. **Repeat**: Continue expanding, shrinking, and sliding the window until you reach the end of the array.

## Example Problems

### 1. Maximum Sum Subarray of Fixed Size K

Given an array of integers and an integer k, find the maximum sum of a subarray of size k.

**Steps:**

1. Initialize the sum of the first k elements.
2. Slide the window from the start of the array to the end, updating the sum by subtracting the element that is left behind and adding the new element.
3. Track the maximum sum encountered.

**Python Code:**

```python
def max_sum_subarray(arr, k):
    n = len(arr)
    if n < k:
        return None

    # Compute the sum of the first window
    window_sum = sum(arr[:k])
    max_sum = window_sum

    # Slide the window from start to end
    for i in range(n - k):
        window_sum = window_sum - arr[i] + arr[i + k]
        max_sum = max(max_sum, window_sum)

    return max_sum

# Example usage:
arr = [1, 3, 2, 5, 1, 1, 6, 2, 8, 5]
k = 3
print(max_sum_subarray(arr, k))  # Output: 16
```

### 2. Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters.

**Steps:**

1. Use two pointers to represent the current window.
2. Use a set to track characters in the current window.
3. Expand the window by moving the right pointer.
4. If a duplicate character is found, shrink the window by moving the left pointer until the duplicate is removed.
+ +**Python Code:** + +```python +def longest_unique_substring(s): + n = len(s) + char_set = set() + left = 0 + max_length = 0 + + for right in range(n): + while s[right] in char_set: + char_set.remove(s[left]) + left += 1 + char_set.add(s[right]) + max_length = max(max_length, right - left + 1) + + return max_length + +# Example usage: +s = "abcabcbb" +print(longest_unique_substring(s)) # Output: 3 +``` +## 3. Minimum Size Subarray Sum + +Given an array of positive integers and a positive integer `s`, find the minimal length of a contiguous subarray of which the sum is at least `s`. If there isn't one, return 0 instead. + +### Steps: +1. Use two pointers, `left` and `right`, to define the current window. +2. Expand the window by moving `right` and adding `arr[right]` to `current_sum`. +3. If `current_sum` is greater than or equal to `s`, update `min_length` and shrink the window from the left by moving `left` and subtracting `arr[left]` from `current_sum`. +4. Repeat until `right` has traversed the array. + +### Python Code: +```python +def min_subarray_len(s, arr): + n = len(arr) + left = 0 + current_sum = 0 + min_length = float('inf') + + for right in range(n): + current_sum += arr[right] + + while current_sum >= s: + min_length = min(min_length, right - left + 1) + current_sum -= arr[left] + left += 1 + + return min_length if min_length != float('inf') else 0 + +# Example usage: +arr = [2, 3, 1, 2, 4, 3] +s = 7 +print(min_subarray_len(s, arr)) # Output: 2 (subarray [4, 3]) +``` + +## 4. Longest Substring with At Most K Distinct Characters + +Given a string `s` and an integer `k`, find the length of the longest substring that contains at most `k` distinct characters. + +### Steps: +1. Use two pointers, `left` and `right`, to define the current window. +2. Use a dictionary `char_count` to count characters in the window. +3. Expand the window by moving `right` and updating `char_count`. +4. If `char_count` has more than `k` distinct characters, shrink the window from the left by moving `left` and updating `char_count`. +5. Keep track of the maximum length of the window with at most `k` distinct characters. + +### Python Code: +```python +def longest_substring_k_distinct(s, k): + n = len(s) + char_count = {} + left = 0 + max_length = 0 + + for right in range(n): + char_count[s[right]] = char_count.get(s[right], 0) + 1 + + while len(char_count) > k: + char_count[s[left]] -= 1 + if char_count[s[left]] == 0: + del char_count[s[left]] + left += 1 + + max_length = max(max_length, right - left + 1) + + return max_length + +# Example usage: +s = "eceba" +k = 2 +print(longest_substring_k_distinct(s, k)) # Output: 3 (substring "ece") +``` + +## 5. Maximum Number of Vowels in a Substring of Given Length + +Given a string `s` and an integer `k`, return the maximum number of vowel letters in any substring of `s` with length `k`. + +### Steps: +1. Use a sliding window of size `k`. +2. Keep track of the number of vowels in the current window. +3. Expand the window by adding the next character and update the count if it's a vowel. +4. If the window size exceeds `k`, remove the leftmost character and update the count if it's a vowel. +5. Track the maximum number of vowels found in any window of size `k`. 
+ +### Python Code: +```python +def max_vowels(s, k): + vowels = set('aeiou') + max_vowel_count = 0 + current_vowel_count = 0 + + for i in range(len(s)): + if s[i] in vowels: + current_vowel_count += 1 + if i >= k: + if s[i - k] in vowels: + current_vowel_count -= 1 + max_vowel_count = max(max_vowel_count, current_vowel_count) + + return max_vowel_count + +# Example usage: +s = "abciiidef" +k = 3 +print(max_vowels(s, k)) # Output: 3 (substring "iii") +``` + +## 6. Subarray Product Less Than K + +Given an array of positive integers `nums` and an integer `k`, return the number of contiguous subarrays where the product of all the elements in the subarray is less than `k`. + +### Steps: +1. Use two pointers, `left` and `right`, to define the current window. +2. Expand the window by moving `right` and multiplying `product` by `nums[right]`. +3. If `product` is greater than or equal to `k`, shrink the window from the left by moving `left` and dividing `product` by `nums[left]`. +4. For each position of `right`, the number of valid subarray ending at `right` is `right - left + 1`. +5. Sum these counts to get the total number of subarray with product less than `k`. + +### Python Code: +```python +def num_subarray_product_less_than_k(nums, k): + if k <= 1: + return 0 + + product = 1 + left = 0 + count = 0 + + for right in range(len(nums)): + product *= nums[right] + + while product >= k: + product /= nums[left] + left += 1 + + count += right - left + 1 + + return count + +# Example usage: +nums = [10, 5, 2, 6] +k = 100 +print(num_subarray_product_less_than_k(nums, k)) # Output: 8 +``` + +## Advantages + +- **Efficiency**: Reduces the time complexity from O(n^2) to O(n) for many problems. +- **Simplicity**: Provides a straightforward way to manage subarrays/substrings with overlapping elements. + +## Applications + +- Finding the maximum or minimum sum of subarrays of fixed size. +- Detecting unique elements in a sequence. +- Solving problems related to dynamic programming with fixed constraints. +- Efficiently managing and processing streaming data or real-time analytics. + +By using the sliding window technique, you can tackle a wide range of problems in a more efficient manner. diff --git a/contrib/ds-algorithms/sorting-algorithms.md b/contrib/ds-algorithms/sorting-algorithms.md new file mode 100644 index 00000000..a3cd72e2 --- /dev/null +++ b/contrib/ds-algorithms/sorting-algorithms.md @@ -0,0 +1,618 @@ +# Sorting Algorithms + +In computer science, a sorting algorithm takes a collection of items and arranges them in a specific order. This order is usually determined by comparing the items using a defined rule. + +Real Life Example of Sorting + +- Sorting a deck of cards +- Sorting names in alphabetical order +- Sorting a list of items, etc. + +Some common sorting techniques: + +## 1. Bubble Sort + +Bubble sort is a basic sorting technique that iteratively steps through a list, comparing neighboring elements. If elements are out of order, it swaps them. While easy to understand, bubble sort becomes inefficient for large datasets due to its slow execution time. + +**Algorithm Overview:** +- **Pass by Pass:** During each pass, the algorithm iterates through the list. +- **Comparing Neighbors:** In each iteration, it compares adjacent elements in the list. +- **Swapping for Order:** If the elements are in the wrong order (typically, the first being larger than the second), it swaps their positions. 
- **Bubbling Up the Largest:** This swapping process effectively pushes the largest element encountered in a pass towards the end of the list, like a bubble rising in water.
- **Repeating Until Sorted:** The algorithm continues making passes through the list until no more swaps are needed. This indicates the entire list is sorted.


### Bubble Sort Code in Python

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [5, 3, 8, 1, 2]
bubble_sort(arr)
print("Sorted array:", arr)  # Output: [1, 2, 3, 5, 8]
```

### Example with Visualization

Let's sort the list `[5, 3, 8, 1, 2]` using bubble sort.

1. **Pass 1:**
   - Compare and swap neighbors: `[5, 3, 8, 1, 2]` → `[3, 5, 8, 1, 2]` → `[3, 5, 1, 8, 2]` → `[3, 5, 1, 2, 8]`
   - Result: `[3, 5, 1, 2, 8]` (the largest element, `8`, has bubbled to the end)

2. **Pass 2:**
   - Compare and swap neighbors: `[3, 5, 1, 2, 8]` → `[3, 1, 5, 2, 8]` → `[3, 1, 2, 5, 8]`
   - Result: `[3, 1, 2, 5, 8]`

3. **Pass 3:**
   - Compare and swap neighbors: `[3, 1, 2, 5, 8]` → `[1, 3, 2, 5, 8]` → `[1, 2, 3, 5, 8]`
   - Result: `[1, 2, 3, 5, 8]`

4. **Pass 4:**
   - No swapping needed; the list is already sorted.

### Complexity Analysis

- **Worst Case:** `O(n^2)` comparisons and swaps. This happens when the list is in reverse order and the maximum number of swaps is needed.
- **Best Case:** an already sorted list needs no swaps, but this implementation still performs `O(n^2)` comparisons because the nested loops always run in full (adding an early-exit flag would bring the best case down to `O(n)`).
- **Average Case:** `O(n^2)` comparisons and swaps. This is the expected number of comparisons and swaps over all possible input sequences.

## 2. Selection Sort

Selection sort is a simple sorting algorithm that divides the input list into two parts: a sorted sublist and an unsorted sublist. The algorithm repeatedly finds the smallest (or largest, depending on sorting order) element from the unsorted sublist and moves it to the sorted sublist. It's not efficient for large datasets but performs better than bubble sort due to fewer swaps.

**Algorithm Overview:**
- **Initial State:** The entire list is considered unsorted initially.
- **Selecting the Minimum:** The algorithm repeatedly selects the smallest element from the unsorted sublist and moves it to the sorted sublist.
- **Expanding the Sorted Sublist:** As elements are moved to the sorted sublist, it expands until all elements are sorted.
- **Repeating Until Sorted:** The process continues until the entire list is sorted.

### Example with Visualization

Let's sort the list `[5, 3, 8, 1, 2]` using selection sort.

1. **Pass 1:**
   - Initial list: `[5, 3, 8, 1, 2]`
   - Find the minimum: `1`
   - Swap with the first element: `[1, 3, 8, 5, 2]`

2. **Pass 2:**
   - Initial list: `[1, 3, 8, 5, 2]`
   - Find the minimum: `2`
   - Swap with the second element: `[1, 2, 8, 5, 3]`

3. **Pass 3:**
   - Initial list: `[1, 2, 8, 5, 3]`
   - Find the minimum: `3`
   - Swap with the third element: `[1, 2, 3, 5, 8]`

4. **Pass 4:**
   - Initial list: `[1, 2, 3, 5, 8]`
   - Find the minimum: `5`
   - No swapping needed, the list is already sorted.

### Selection Sort Code in Python

```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_index = i
        for j in range(i+1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]

arr = [5, 3, 8, 1, 2]
selection_sort(arr)
print("Sorted array:", arr)  # Output: [1, 2, 3, 5, 8]
```

### Complexity Analysis

- **Worst Case**: `O(n^2)` comparisons and O(n) swaps.
These bounds hold for every input: selection sort always scans the entire unsorted sublist, so it performs `O(n^2)` comparisons and at most `n - 1` swaps regardless of the initial order.
+- **Best Case**: `O(n^2)` comparisons and `O(n)` swaps. Even when the list is already sorted, the algorithm still scans the whole unsorted sublist on every pass to find the minimum.
+- **Average Case**: `O(n^2)` comparisons and `O(n)` swaps. This is the expected number of comparisons and swaps over all possible input sequences.
+
+## 3. Quick Sort
+Quick sort is a popular divide-and-conquer sorting algorithm known for its efficiency on average. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted.
+
+**Algorithm Overview:**
+- **Pivot Selection:** Choose a pivot element from the array. Common strategies include selecting the first, last, middle, or a randomly chosen element.
+- **Partitioning:** Rearrange the array so that all elements less than the pivot are on its left, and all elements greater than the pivot are on its right. This step ensures that the pivot element is placed in its correct sorted position.
+- **Recursion:** Apply the above steps recursively to the sub-arrays formed by partitioning until the base case is reached.
+- **Base Case:** If the sub-array size becomes 0 or 1, it is already sorted.
+
+### Example with Visualization
+
+Let's sort the list `[5, 3, 8, 1, 2]` using quick sort.
+
+1. **Initial Array:** `[5, 3, 8, 1, 2]`
+2. **Choose Pivot:** Let's choose the last element, `2`, as the pivot.
+3. **Partitioning:**
+   - We'll partition the array around the pivot `2`. All elements less than `2` will be placed to its left, and all elements greater than `2` will be placed to its right.
+   - After partitioning, the array becomes `[1, 2, 5, 3, 8]`. The pivot element, `2`, is now in its correct sorted position.
+4. **Recursion:**
+   - Now, we recursively sort the sub-arrays `[1]` and `[5, 3, 8]`.
+   - For the sub-array `[5, 3, 8]`, we choose `8` as the pivot and partition it, which leaves `[5, 3]` to the left of `8`; recursively sorting that sub-array gives `[3, 5, 8]`.
+5. **Concatenation:**
+   - Concatenating the sorted sub-arrays `[1]`, `[2]`, and `[3, 5, 8]`, we get the final sorted array `[1, 2, 3, 5, 8]`.
+
+### Quick Sort Code in Python (Iterative)
+
+```python
+def partition(arr, low, high):
+    pivot = arr[high]
+    i = low - 1
+    for j in range(low, high):
+        if arr[j] < pivot:
+            i += 1
+            arr[i], arr[j] = arr[j], arr[i]
+    arr[i + 1], arr[high] = arr[high], arr[i + 1]
+    return i + 1
+
+def quick_sort_iterative(arr):
+    stack = [(0, len(arr) - 1)]
+    while stack:
+        low, high = stack.pop()
+        if low < high:
+            pi = partition(arr, low, high)
+            stack.append((low, pi - 1))
+            stack.append((pi + 1, high))
+
+# Example usage:
+arr = [38, 27, 43, 3, 9, 82, 10]
+quick_sort_iterative(arr)
+print("Sorted array:", arr)  # Output: [3, 9, 10, 27, 38, 43, 82]
+```
+
+### Quick Sort Code in Python (Recursive)
+
+```python
+def quick_sort(arr):
+    if len(arr) <= 1:
+        return arr
+    else:
+        pivot = arr[-1]
+        left = [x for x in arr[:-1] if x < pivot]
+        right = [x for x in arr[:-1] if x >= pivot]
+        return quick_sort(left) + [pivot] + quick_sort(right)
+
+arr = [5, 3, 8, 1, 2]
+sorted_arr = quick_sort(arr)
+print("Sorted array:", sorted_arr)  # Output: [1, 2, 3, 5, 8]
+```
+
+### Complexity Analysis
+
+- **Worst Case**: The worst-case time complexity of quick sort is `O(n^2)`. This occurs when the pivot selection consistently results in unbalanced partitioning, such as choosing the smallest or largest element as the pivot.
+- **Best Case**: The best-case time complexity is `O(n log n)`. This happens when the pivot selection leads to well-balanced partitioning, halving the array size in each recursive call.
+- **Average Case**: The average-case time complexity is `O(n log n)`. This is the expected time complexity when the pivot selection results in reasonably balanced partitioning across recursive calls.
+- **Space Complexity**: In-place quick sort has an `O(log n)` average space complexity for the recursion stack (or the explicit stack in the iterative version). Note that the simple recursive version shown above also builds new `left` and `right` lists, so it uses `O(n)` additional space.
+
+## 4. Merge Sort
+
+Merge sort is a divide-and-conquer algorithm that recursively divides the input list into smaller sublists until each sublist contains only one element. Then, it repeatedly merges adjacent sublists while maintaining the sorted order until there is only one sublist remaining, which represents the sorted list.
+
+**Algorithm Overview:**
+- **Divide:** Split the input list into smaller sublists recursively until each sublist contains only one element.
+- **Merge:** Repeatedly merge adjacent sublists while maintaining the sorted order until there is only one sublist remaining, which represents the sorted list.
+
+### Example with Visualization
+
+Let's sort the list `[38, 27, 43, 3, 9, 82, 10]` using merge sort.
+
+1. **Initial Division:**
+   - Divide the list into single-element sublists:
+   `[38], [27], [43], [3], [9], [82], [10]`
+
+2. **Merge Passes:**
+   - Merge adjacent sublists while maintaining sorted order:
+     - Pass 1: `[27, 38]`, `[3, 43]`, `[9, 82]`, `[10]`
+     - Pass 2: `[3, 27, 38, 43]`, `[9, 10, 82]`
+     - Pass 3: `[3, 9, 10, 27, 38, 43, 82]`
+
+
+3.
**Final Sorted List:** + - `[3, 9, 10, 27, 38, 43, 82]` + +### Merge Sort Code in Python (Iterative) + +```python +def merge_sort_iterative(arr): + n = len(arr) + curr_size = 1 + while curr_size < n: + left = 0 + while left < n - 1: + mid = min(left + curr_size - 1, n - 1) + right = min(left + 2 * curr_size - 1, n - 1) + merge(arr, left, mid, right) + left += 2 * curr_size + curr_size *= 2 + +def merge(arr, left, mid, right): + n1 = mid - left + 1 + n2 = right - mid + L = [0] * n1 + R = [0] * n2 + for i in range(n1): + L[i] = arr[left + i] + for j in range(n2): + R[j] = arr[mid + 1 + j] + i = j = 0 + k = left + while i < n1 and j < n2: + if L[i] <= R[j]: + arr[k] = L[i] + i += 1 + else: + arr[k] = R[j] + j += 1 + k += 1 + while i < n1: + arr[k] = L[i] + i += 1 + k += 1 + while j < n2: + arr[k] = R[j] + j += 1 + k += 1 + +arr = [38, 27, 43, 3, 9, 82, 10] +merge_sort_iterative(arr) +print("Sorted array:", arr) # Output: [3, 9, 10, 27, 38, 43, 82] +``` + +### Merge Sort Code in Python (Recursive) + +```python +def merge_sort(arr): + if len(arr) > 1: + mid = len(arr) // 2 + left_half = arr[:mid] + right_half = arr[mid:] + + merge_sort(left_half) + merge_sort(right_half) + + i = j = k = 0 + + # Merge the two sorted halves + while i < len(left_half) and j < len(right_half): + if left_half[i] < right_half[j]: + arr[k] = left_half[i] + i += 1 + else: + arr[k] = right_half[j] + j += 1 + k += 1 + + # Check if any elements are remaining in the left half + while i < len(left_half): + arr[k] = left_half[i] + i += 1 + k += 1 + + # Check if any elements are remaining in the right half + while j < len(right_half): + arr[k] = right_half[j] + j += 1 + k += 1 + +# Example usage: +arr = [38, 27, 43, 3, 9, 82, 10] +merge_sort(arr) +print("Sorted array:", arr) # Output: [3, 9, 10, 27, 38, 43, 82] +``` + +### Complexity Analysis +- **Time Complexity**: `O(n log n)` for all cases. Merge sort always divides the list into halves until each sublist contains only one element, and then merges them back together, resulting in O(n log n) time complexity. +- **Space Complexity**: `O(n)` auxiliary space. In the iterative version, merge sort uses additional space for creating temporary sublists during merging operations. + +## 5. Insertion Sort + +Insertion sort is a straightforward and efficient sorting algorithm for small datasets. It builds the final sorted array one element at a time. It is much like sorting playing cards in your hands: you take one card at a time and insert it into its correct position among the already sorted cards. + +**Algorithm Overview:** +- **Start from the Second Element:** Begin with the second element, assuming the first element is already sorted. +- **Compare with Sorted Subarray:** Take the current element and compare it with elements in the sorted subarray (the part of the array before the current element). +- **Insert in Correct Position:** Shift all elements in the sorted subarray that are greater than the current element to one position ahead. Insert the current element into its correct position. +- **Repeat Until End:** Repeat this process for all elements in the array. + +### Example with Visualization +Let's sort the list `[5, 3, 8, 1, 2]` using insertion sort. + +**Step-by-Step Visualization:** +**Initial List:** `[5, 3, 8, 1, 2]` + +1. **Pass 1:** + - Current element: 3 + - Compare 3 with 5, move 5 to the right: `[5, 5, 8, 1, 2]` + - Insert 3 in its correct position: `[3, 5, 8, 1, 2]` + +2. 
**Pass 2:** + - Current element: 8 + - 8 is already in the correct position: `[3, 5, 8, 1, 2]` + +3. **Pass 3:** + - Current element: 1 + - Compare 1 with 8, move 8 to the right: `[3, 5, 8, 8, 2]` + - Compare 1 with 5, move 5 to the right: `[3, 5, 5, 8, 2]` + - Compare 1 with 3, move 3 to the right: `[3, 3, 5, 8, 2]` + - Insert 1 in its correct position: `[1, 3, 5, 8, 2]` + +4. **Pass 4:** + - Current element: 2 + - Compare 2 with 8, move 8 to the right: `[1, 3, 5, 8, 8]` + - Compare 2 with 5, move 5 to the right: `[1, 3, 5, 5, 8]` + - Compare 2 with 3, move 3 to the right: `[1, 3, 3, 5, 8]` + - Insert 2 in its correct position: `[1, 2, 3, 5, 8]` + +### Insertion Sort Code in Python + + +```python + +def insertion_sort(arr): + # Traverse from 1 to len(arr) + for i in range(1, len(arr)): + key = arr[i] + # Move elements of arr[0..i-1], that are greater than key, + # to one position ahead of their current position + j = i - 1 + while j >= 0 and key < arr[j]: + arr[j + 1] = arr[j] + j -= 1 + arr[j + 1] = key + +# Example usage +arr = [5, 3, 8, 1, 2] +insertion_sort(arr) +print("Sorted array:", arr) # Output: [1, 2, 3, 5, 8] +``` + +### Complexity Analysis + - **Worst Case:** `𝑂(𝑛^2)` comparisons and swaps. This occurs when the array is in reverse order. + - **Best Case:** `𝑂(𝑛)` comparisons and `𝑂(1)` swaps. This happens when the array is already sorted. + - **Average Case:** `𝑂(𝑛^2)` comparisons and swaps. This is the expected number of comparisons and swaps over all possible input sequences. + +## 6. Heap Sort + +Heap Sort is an efficient comparison-based sorting algorithm that uses a binary heap data structure. It divides its input into a sorted and an unsorted region and iteratively shrinks the unsorted region by extracting the largest (or smallest) element and moving it to the sorted region. + +**Algorithm Overview:** +- **Build a Max Heap:** Convert the array into a max heap, a complete binary tree where the value of each node is greater than or equal to the values of its children. +- **Heapify:** Ensure that the subtree rooted at each node satisfies the max heap property. This process is called heapify. +- **Extract Maximum:** Swap the root (the maximum element) with the last element of the heap and reduce the heap size by one. Restore the max heap property by heapifying the root. +- **Repeat:** Continue extracting the maximum element and heapifying until the entire array is sorted. + +### Example with Visualization + +Let's sort the list `[5, 3, 8, 1, 2]` using heap sort. + +1. **Build Max Heap:** + - Initial array: `[5, 3, 8, 1, 2]` + - Start heapifying from the last non-leaf node. + - Heapify at index 1: `[5, 3, 8, 1, 2]` (no change, children are already less than the parent) + - Heapify at index 0: `[8, 3, 5, 1, 2]` (swap 5 and 8 to make 8 the root) + +2. **Heapify Process:** + - Heapify at index 0: `[8, 3, 5, 1, 2]` (no change needed, already a max heap) + +3. **Extract Maximum:** + - Swap root with the last element: `[2, 3, 5, 1, 8]` + - Heapify at index 0: `[5, 3, 2, 1, 8]` (swap 2 and 5) + +4. **Repeat Extraction:** + - Swap root with the second last element: `[1, 3, 2, 5, 8]` + - Heapify at index 0: `[3, 1, 2, 5, 8]` (swap 1 and 3) + - Swap root with the third last element: `[2, 1, 3, 5, 8]` + - Heapify at index 0: `[2, 1, 3, 5, 8]` (no change needed) + - Swap root with the fourth last element: `[1, 2, 3, 5, 8]` + +After all extractions, the array is sorted: `[1, 2, 3, 5, 8]`. 
+ +### Heap Sort Code in Python + +```python +def heapify(arr, n, i): + largest = i # Initialize largest as root + left = 2 * i + 1 # left child index + right = 2 * i + 2 # right child index + + # See if left child of root exists and is greater than root + if left < n and arr[largest] < arr[left]: + largest = left + + # See if right child of root exists and is greater than root + if right < n and arr[largest] < arr[right]: + largest = right + + # Change root, if needed + if largest != i: + arr[i], arr[largest] = arr[largest], arr[i] # swap + + # Heapify the root. + heapify(arr, n, largest) + +def heap_sort(arr): + n = len(arr) + + # Build a maxheap. + for i in range(n // 2 - 1, -1, -1): + heapify(arr, n, i) + + # One by one extract elements + for i in range(n - 1, 0, -1): + arr[i], arr[0] = arr[0], arr[i] # swap + heapify(arr, i, 0) + +# Example usage +arr = [5, 3, 8, 1, 2] +heap_sort(arr) +print("Sorted array:", arr) # Output: [1, 2, 3, 5, 8] +``` + +### Complexity Analysis + - **Worst Case:** `𝑂(𝑛log𝑛)`. Building the heap takes `𝑂(𝑛)` time, and each of the 𝑛 element extractions takes `𝑂(log𝑛)` time. + - **Best Case:** `𝑂(𝑛log𝑛)`. Even if the array is already sorted, heap sort will still build the heap and perform the extractions. + - **Average Case:** `𝑂(𝑛log𝑛)`. Similar to the worst-case, the overall complexity remains `𝑂(𝑛log𝑛)` because each insertion and deletion in a heap takes `𝑂(log𝑛)` time, and these operations are performed 𝑛 times. + + ## 7. Radix Sort +Radix Sort is a non-comparative integer sorting algorithm that sorts numbers by processing individual digits. It processes digits from the least significant digit (LSD) to the most significant digit (MSD) or vice versa. This algorithm is efficient for sorting numbers with a fixed number of digits. + +**Algorithm Overview:** +- **Digit by Digit sorting:** Radix sort processes the digits of the numbers starting from either the least significant digit (LSD) or the most significant digit (MSD). Typically, LSD is used. +- **Stable Sort:** A stable sorting algorithm like Counting Sort or Bucket Sort is used as an intermediate sorting technique. Radix Sort relies on this stability to maintain the relative order of numbers with the same digit value. +- **Multiple passes:** The algorithm performs multiple passes over the numbers, one for each digit, from the least significant to the most significant. + +### Radix Sort Code in Python + +```python +def counting_sort(arr, exp): + n = len(arr) + output = [0] * n + count = [0] * 10 + + for i in range(n): + index = arr[i] // exp + count[index % 10] += 1 + + for i in range(1, 10): + count[i] += count[i - 1] + + i = n - 1 + while i >= 0: + index = arr[i] // exp + output[count[index % 10] - 1] = arr[i] + count[index % 10] -= 1 + i -= 1 + + for i in range(n): + arr[i] = output[i] + +def radix_sort(arr): + max_num = max(arr) + exp = 1 + while max_num // exp > 0: + counting_sort(arr, exp) + exp *= 10 + +# Example usage +arr = [170, 45, 75, 90] +print("Original array:", arr) +radix_sort(arr) +print("Sorted array:", arr) +``` + +### Complexity Analysis + - **Time Complexity:** O(d * (n + k)) for all cases. Radix Sort always processes each digit of every number in the array. + - **Space Complexity:** O(n + k). This is due to the space required for: +- The output array used in Counting Sort, which is of size n. +- The count array used in Counting Sort, which is of size k. + +## 8. Counting Sort +Counting sort is a sorting technique based on keys between a specific range. 
It works by counting the number of occurrences of each distinct key value (a form of hashing), and then doing some arithmetic to calculate the position of each object in the output sequence.
+
+**Algorithm Overview:**
+- Convert the input string into a list of characters.
+- Count the occurrence of each character in the list using the `collections.Counter()` method.
+- Sort the keys of the resulting Counter object to get the unique characters in the list in sorted order.
+- For each character in the sorted list of keys, create a list of repeated characters using the corresponding count from the Counter object.
+- Concatenate the lists of repeated characters to form the sorted output list.
+
+
+### Counting Sort Code in Python using the Counter method
+
+```python
+from collections import Counter
+
+def counting_sort(arr):
+    count = Counter(arr)
+    output = []
+    for c in sorted(count.keys()):
+        output += [c] * count[c]
+    return output
+
+arr = "geeksforgeeks"
+arr = list(arr)
+arr = counting_sort(arr)
+output = ''.join(arr)
+print("Sorted character array is", output)
+```
+
+### Counting Sort Code in Python using sorted() and reduce()
+
+```python
+from functools import reduce
+string = "geeksforgeeks"
+sorted_str = reduce(lambda x, y: x+y, sorted(string))
+print("Sorted string:", sorted_str)
+```
+
+### Complexity Analysis
+ - **Time Complexity:** `O(n + k)` for all cases, where `n` is the number of elements and `k` is the range of key values. No matter how the elements are arranged, the algorithm makes one pass over the input and one pass over the range of keys.
+ - **Space Complexity:** `O(k)`, where `k` is the range of key values. The larger the range of elements, the larger the space required.
+
+
+## 9. Cyclic Sort
+
+### Theory
+Cyclic Sort is an in-place sorting algorithm that is useful for sorting arrays where the elements are in a known range (e.g., 1 to N). The key idea behind the algorithm is that each number should be placed at its correct index. If we find a number that is not at its correct index, we swap it with the number at its correct index. This process is repeated until every number is at its correct index.
+
+### Algorithm
+- Iterate over the array from the start to the end.
+- For each element, check if it is at its correct index.
+- If it is not at its correct index, swap it with the element at its correct index.
+- Continue this process until the element at the current index is in its correct position. Move to the next index and repeat the process until the end of the array is reached.
+
+### Steps
+- Start with the first element.
+- Check if it is at the correct index (i.e., if `arr[i] == i + 1`).
+- If it is not, swap it with the element at the index `arr[i] - 1`.
+- Repeat step 2 for the current element until it is at the correct index.
+- Move to the next element and repeat the process.
+
+### Code
+
+```python
+def cyclic_sort(nums):
+    i = 0
+    while i < len(nums):
+        correct_index = nums[i] - 1
+        if nums[i] != nums[correct_index]:
+            nums[i], nums[correct_index] = nums[correct_index], nums[i]  # Swap
+        else:
+            i += 1
+    return nums
+```
+
+### Example
+```python
+arr = [3, 1, 5, 4, 2]
+sorted_arr = cyclic_sort(arr)
+print(sorted_arr)
+```
+### Output
+```
+[1, 2, 3, 4, 5]
+```
+
+### Complexity Analysis
+**Time Complexity:**
+
+The time complexity of Cyclic Sort is **O(n)**.
+This is because in each cycle, each element is either placed in its correct position or a swap is made. Since each element is swapped at most once, the total number of swaps (and hence the total number of operations) is linear in the number of elements.
+
+**Space Complexity:**
+
+The space complexity of Cyclic Sort is **O(1)**.
+This is because the algorithm only requires a constant amount of additional space beyond the input array. \ No newline at end of file diff --git a/contrib/ds-algorithms/splay-trees.md b/contrib/ds-algorithms/splay-trees.md new file mode 100644 index 00000000..ee900ed8 --- /dev/null +++ b/contrib/ds-algorithms/splay-trees.md @@ -0,0 +1,162 @@ +# Splay Tree + +In Data Structures and Algorithms, a **Splay Tree** is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, search, and deletion in O(log n) amortized time. This is achieved by a process called **splaying**, where the accessed node is moved to the root through a series of tree rotations. + +## Points to be Remembered + +- **Splaying**: Moving the accessed node to the root using rotations. +- **Rotations**: Tree rotations (left and right) are used to balance the tree during splaying. +- **Self-adjusting**: The tree adjusts itself with each access, keeping frequently accessed nodes near the root. + +## Real Life Examples of Splay Trees + +- **Cache Implementation**: Frequently accessed data is kept near the top of the tree, making repeated accesses faster. +- **Networking**: Routing tables in network switches can use splay trees to prioritize frequently accessed routes. + +## Applications of Splay Trees + +Splay trees are used in various applications in Computer Science: + +- **Cache Implementations** +- **Garbage Collection Algorithms** +- **Data Compression Algorithms (e.g., LZ78)** + +Understanding these applications is essential for Software Development. + +## Operations in Splay Tree + +Key operations include: + +- **INSERT**: Insert a new element into the splay tree. +- **SEARCH**: Find the position of an element in the splay tree. +- **DELETE**: Remove an element from the splay tree. 
+
+## Implementing Splay Tree in Python
+
+```python
+class SplayTreeNode:
+    def __init__(self, key):
+        self.key = key
+        self.left = None
+        self.right = None
+
+class SplayTree:
+    def __init__(self):
+        self.root = None
+
+    def insert(self, key):
+        self.root = self.splay_insert(self.root, key)
+
+    def search(self, key):
+        self.root = self.splay_search(self.root, key)
+        return self.root
+
+    def splay(self, root, key):
+        if not root or root.key == key:
+            return root
+
+        if root.key > key:
+            if not root.left:
+                return root
+            if root.left.key > key:
+                # Zig-Zig (left-left): recurse, then rotate right
+                root.left.left = self.splay(root.left.left, key)
+                root = self.rotateRight(root)
+            elif root.left.key < key:
+                # Zig-Zag (left-right): recurse, rotate left, then right
+                root.left.right = self.splay(root.left.right, key)
+                if root.left.right:
+                    root.left = self.rotateLeft(root.left)
+            return root if not root.left else self.rotateRight(root)
+
+        else:
+            if not root.right:
+                return root
+            if root.right.key > key:
+                # Zag-Zig (right-left): recurse, rotate right, then left
+                root.right.left = self.splay(root.right.left, key)
+                if root.right.left:
+                    root.right = self.rotateRight(root.right)
+            elif root.right.key < key:
+                # Zag-Zag (right-right): recurse, then rotate left
+                root.right.right = self.splay(root.right.right, key)
+                root = self.rotateLeft(root)
+            return root if not root.right else self.rotateLeft(root)
+
+    def splay_insert(self, root, key):
+        if not root:
+            return SplayTreeNode(key)
+
+        root = self.splay(root, key)
+
+        if root.key == key:
+            return root
+
+        new_node = SplayTreeNode(key)
+
+        if root.key > key:
+            new_node.right = root
+            new_node.left = root.left
+            root.left = None
+        else:
+            new_node.left = root
+            new_node.right = root.right
+            root.right = None
+
+        return new_node
+
+    def splay_search(self, root, key):
+        return self.splay(root, key)
+
+    def rotateRight(self, node):
+        temp = node.left
+        node.left = temp.right
+        temp.right = node
+        return temp
+
+    def rotateLeft(self, node):
+        temp = node.right
+        node.right = temp.left
+        temp.left = node
+        return temp
+
+    def preOrder(self, root):
+        if root:
+            print(root.key, end=' ')
+            self.preOrder(root.left)
+            self.preOrder(root.right)
+
+# Example usage:
+splay_tree = SplayTree()
+splay_tree.insert(50)
+splay_tree.insert(30)
+splay_tree.insert(20)
+splay_tree.insert(40)
+splay_tree.insert(70)
+splay_tree.insert(60)
+splay_tree.insert(80)
+
+print("Preorder traversal of the Splay tree is:")
+splay_tree.preOrder(splay_tree.root)
+
+splay_tree.search(60)
+
+print("\nSplay tree after search operation for key 60:")
+splay_tree.preOrder(splay_tree.root)
+```
+
+## Output
+
+Because every insertion splays the new key to the root, the most recently inserted key (80) ends up at the root, and the search for 60 then splays 60 back to the root:
+
+```markdown
+Preorder traversal of the Splay tree is:
+80 70 60 50 40 30 20
+
+Splay tree after search operation for key 60:
+60 50 40 30 20 70 80
+```
+
+## Complexity Analysis
+
+The worst-case time complexities of the main operations in a Splay Tree are as follows:
+
+- **Insertion**: `O(n)`. In the worst case, insertion may take linear time if the tree is highly unbalanced.
+- **Search**: `O(n)`. In the worst case, searching for a node may take linear time if the tree is highly unbalanced.
+- **Deletion**: `O(n)`. In the worst case, deleting a node may take linear time if the tree is highly unbalanced.
+
+While these operations can take linear time in the worst case, the splay operation ensures that the tree remains balanced over a sequence of operations, leading to better average-case performance.
\ No newline at end of file
diff --git a/contrib/ds-algorithms/stacks.md b/contrib/ds-algorithms/stacks.md
new file mode 100644
index 00000000..428a1938
--- /dev/null
+++ b/contrib/ds-algorithms/stacks.md
@@ -0,0 +1,116 @@
+# Stacks in Python
+
+In Data Structures and Algorithms, a stack is a linear data structure that complies with the Last In, First Out (LIFO) rule. It works by use of two fundamental techniques: **PUSH**, which inserts an element on top of the stack, and **POP**, which takes out the topmost element. This concept is similar to a stack of plates in a cafeteria. Stacks are usually used for handling function calls, expression evaluation, and parsing in programming. Indeed, they are efficient in managing memory as well as tracking program state.
+
+## Points to be Remembered
+
+- A stack is a collection of data items that can be accessed at only one end, called **TOP**.
+- Items can be inserted and deleted in a stack only at the TOP.
+- The last item inserted in a stack is the first one to be deleted.
+- Therefore, a stack is called a **Last-In-First-Out (LIFO)** data structure.
+
+## Real Life Examples of Stacks
+
+- **PILE OF BOOKS** - Suppose a set of books are placed one over the other in a pile. When you remove books from the pile, the topmost book will be removed first. Similarly, when you have to add a book to the pile, the book will be placed at the top of the pile.
+
+- **PILE OF PLATES** - The first plate begins the pile. The second plate is placed on the top of the first plate and the third plate is placed on the top of the second plate, and so on. In general, if you want to add a plate to the pile, you can keep it on the top of the pile. Similarly, if you want to remove a plate, you can remove the plate from the top of the pile.
+
+- **BANGLES IN A HAND** - When a person wears bangles, the last bangle worn is the first one to be removed.
+
+## Applications of Stacks
+
+Stacks are widely used in Computer Science:
+
+- Function call management
+- Maintaining the UNDO list for the application
+- Web browser *history management*
+- Evaluating expressions
+- Checking the nesting of parentheses in an expression (see the sketch after the operations list below)
+- Backtracking algorithms (Recursion)
+
+Understanding these applications is essential for Software Development.
+
+## Operations on a Stack
+
+Key operations on a stack include:
+
+- **PUSH** - It is the process of inserting a new element on the top of a stack.
+- **OVERFLOW** - A situation when we are pushing an item into a stack that is full.
+- **POP** - It is the process of deleting an element from the top of a stack.
+- **UNDERFLOW** - A situation when we are popping an item from an empty stack.
+- **PEEK** - It is the process of getting the most recent value of a stack *(i.e. the value at the top of the stack)*.
+- **isEMPTY** - It is the function which returns true if the stack is empty, else false.
+- **SHOW** - Displaying stack items.
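+
+As a quick illustration of the parentheses-checking application mentioned above, here is a minimal sketch that treats a plain Python list as a stack. The function name `is_balanced` is our own choice for this example; the general-purpose implementation follows in the next section.
+
+```python
+def is_balanced(expression):
+    pairs = {')': '(', ']': '[', '}': '{'}
+    stack = []
+    for ch in expression:
+        if ch in '([{':
+            stack.append(ch)                           # PUSH an opening bracket
+        elif ch in pairs:
+            if not stack or stack.pop() != pairs[ch]:  # POP and match
+                return False
+    return not stack  # balanced only if nothing is left on the stack
+
+# Example usage:
+print(is_balanced("(a + b) * [c - d]"))  # True
+print(is_balanced("(a + b]"))            # False
+```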
+
+## Implementing Stacks in Python
+
+```python
+def isEmpty(S):
+    if len(S) == 0:
+        return True
+    else:
+        return False
+
+def Push(S, item):
+    S.append(item)
+
+def Pop(S):
+    if isEmpty(S):
+        return "Underflow"
+    else:
+        val = S.pop()
+        return val
+
+def Peek(S):
+    if isEmpty(S):
+        return "Underflow"
+    else:
+        top = len(S) - 1
+        return S[top]
+
+def Show(S):
+    if isEmpty(S):
+        print("Sorry, No items in Stack")
+    else:
+        print("(Top)", end=' ')
+        t = len(S) - 1
+        while t >= 0:
+            print(S[t], "<", end=' ')
+            t -= 1
+        print()
+
+stack = []  # initially stack is empty
+
+Push(stack, 5)
+Push(stack, 10)
+Push(stack, 15)
+
+print("Stack after Push operations:")
+Show(stack)
+print("Peek operation:", Peek(stack))
+print("Pop operation:", Pop(stack))
+print("Stack after Pop operation:")
+Show(stack)
+```
+
+## Output
+
+```markdown
+Stack after Push operations:
+(Top) 15 < 10 < 5 <
+Peek operation: 15
+Pop operation: 15
+Stack after Pop operation:
+(Top) 10 < 5 <
+```
+
+## Complexity Analysis
+
+- **isEmpty, Push, Pop, Peek**: `O(1)`. Each of these operations touches only the top of the stack, so they take constant time.
+- **Show**: `O(n)`. Displaying the stack traverses all `n` items, so it dominates the cost whenever it is used.
diff --git a/contrib/ds-algorithms/time-space-complexity.md b/contrib/ds-algorithms/time-space-complexity.md
new file mode 100644
index 00000000..eeadd649
--- /dev/null
+++ b/contrib/ds-algorithms/time-space-complexity.md
@@ -0,0 +1,243 @@
+# Time and Space Complexity
+
+We can solve a problem using one or more algorithms. It's essential to learn how to compare the performance of different algorithms and select the best one for a specific task.
+
+Therefore, we need a method of comparing solutions in order to judge which one is more optimal.
+
+The method must be:
+
+- Independent of the system (and its settings) on which the algorithm is executed.
+- Directly related to the quantity of inputs.
+- Able to discriminate between two methods with clarity and precision.
+
+Two such methods used to analyze algorithms are `time complexity` and `space complexity`.
+
+## What is Time Complexity?
+
+The _number of operations an algorithm performs in proportion to the quantity of the input_ is measured by time complexity. It facilitates our investigation of how the performance of the algorithm scales with increasing input size. But in real life, **_time complexity does not refer to the time taken by the machine to execute a particular code_**.
+
+## Order of Growth and Asymptotic Notations
+
+The order of growth describes how an algorithm's running time or space requirement expands as the input size grows. This growth is described via asymptotic notation, such as Big O notation, which concentrates on the dominating term as the input size approaches infinity and is independent of lower-order terms and machine-specific constants.
+
+### Common Asymptotic Notation
+
+1. `Big Oh (O)`: Provides the worst-case scenario for describing the upper bound of an algorithm's execution time.
+2. `Big Omega (Ω)`: Provides the best-case scenario and describes the lower bound.
+3. `Big Theta (Θ)`: Gives a tight constraint on the running time by describing both the upper and lower bounds.
+
+### 1.
Big Oh (O) Notation + +Big O notation describes how an algorithm behaves as the input size gets closer to infinity and provides an upper bound on the time or space complexity of the method. It helps developers and computer scientists to evaluate the effectiveness of various algorithms without regard to the software or hardware environment. + +To denote asymptotic upper bound, we use O-notation. For a given function `g(n)`, we denote by `O(g(n))` (pronounced "big-oh of g of n") the set of functions: + +$$ +O(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq f(n) \leq c \cdot g(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Oh: + +![BigOh Notation Graph](images/Time-And-Space-Complexity-BigOh.png) + +### 2. Big Omega (Ω) Notation + +Big Omega (Ω) notation is used to describe the lower bound of an algorithm's running time. It provides a way to express the minimum time complexity that an algorithm will take to complete. In other words, Big Omega gives us a guarantee that the algorithm will take at least a certain amount of time to run, regardless of other factors. + +To denote asymptotic lower bound, we use Omega-notation. For a given function `g(n)`, we denote by `Ω(g(n))` (pronounced "big-omega of g of n") the set of functions: + +$$ +\Omega(g(n)) = \{ f(n) : \exists \text{ positive constants } c \text{ and } n_0 \text{ such that } 0 \leq c \cdot g(n) \leq f(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Omega: + +![BigOmega Notation Graph](images/Time-And-Space-Complexity-BigOmega.png) + +### 3. Big Theta (Θ) Notation + +Big Theta (Θ) notation provides a way to describe the asymptotic tight bound of an algorithm's running time. It offers a precise measure of the time complexity by establishing both an upper and lower bound, indicating that the running time of an algorithm grows at the same rate as a given function, up to constant factors. + +To denote asymptotic tight bound, we use Theta-notation. For a given function `g(n)`, we denote by `Θ(g(n))` (pronounced "big-theta of g of n") the set of functions: + +$$ +\Theta(g(n)) = \{ f(n) : \exists \text{ positive constants } c_1, c_2, \text{ and } n_0 \text{ such that } 0 \leq c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \text{ for all } n \geq n_0 \} +$$ + +Graphical representation of Big Theta: + +![Big Theta Notation Graph](images/Time-And-Space-Complexity-BigTheta.png) + +## Best Case, Worst Case and Average Case + +### 1. Best-Case Scenario: + +The best-case scenario refers to the situation where an algorithm performs optimally, achieving the lowest possible time or space complexity. It represents the most favorable conditions under which an algorithm operates. + +#### Characteristics: + +- Represents the minimum time or space required by an algorithm to solve a problem. +- Occurs when the input data is structured in such a way that the algorithm can exploit its strengths fully. +- Often used to analyze the lower bound of an algorithm's performance. + +#### Example: + +Consider the `linear search algorithm` where we're searching for a `target element` in an array. The best-case scenario occurs when the target element is found `at the very beginning of the array`. In this case, the algorithm would only need to make one comparison, resulting in a time complexity of `O(1)`. + +### 2. 
Worst-Case Scenario:
+
+The worst-case scenario refers to the situation where an algorithm performs at its poorest, achieving the highest possible time or space complexity. It represents the most unfavorable conditions under which an algorithm operates.
+
+#### Characteristics:
+
+- Represents the maximum time or space required by an algorithm to solve a problem.
+- Occurs when the input data is structured in such a way that the algorithm encounters the most challenging conditions.
+- Often used to analyze the upper bound of an algorithm's performance.
+
+#### Example:
+
+Continuing with the `linear search algorithm`, the worst-case scenario occurs when the `target element` is either not present in the array or located `at the very end`. In this case, the algorithm would need to iterate through the entire array, resulting in a time complexity of `O(n)`, where `n` is the size of the array.
+
+### 3. Average-Case Scenario:
+
+The average-case scenario refers to the expected performance of an algorithm over all possible inputs, typically calculated as the arithmetic mean of the time or space complexity.
+
+#### Characteristics:
+
+- Represents the typical performance of an algorithm across a range of input data.
+- Takes into account the distribution of inputs and their likelihood of occurrence.
+- Provides a more realistic measure of an algorithm's performance compared to the best-case or worst-case scenarios.
+
+#### Example:
+
+For the `linear search algorithm`, the average-case scenario considers the probability distribution of the target element's position within the array. If the `target element is equally likely to be found at any position in the array`, the algorithm makes about `n/2` comparisons on average; dropping the constant factor, the average-case time complexity is `O(n)`.
+
+## Space Complexity
+
+The memory space that a code utilizes as it is being run is often referred to as space complexity. Space complexity also depends on the machine, so rather than using the typical memory units like MB, GB, etc., we express space complexity using Big O notation.
+
+#### Examples of Space Complexity
+
+1. `Constant Space Complexity (O(1))`: Algorithms that operate on a fixed-size array or use a constant number of variables have O(1) space complexity.
+2. `Linear Space Complexity (O(n))`: Algorithms that store each element of the input array in a separate variable or data structure have O(n) space complexity.
+3. `Quadratic Space Complexity (O(n^2))`: Algorithms that create a two-dimensional array or matrix with dimensions based on the input size have O(n^2) space complexity.
+
+#### Analyzing Space Complexity
+
+To analyze space complexity:
+
+- Identify the variables, data structures, and recursive calls used by the algorithm.
+- Determine how the space requirements scale with the input size.
+- Express the space complexity using Big O notation, considering the dominant terms that contribute most to the overall space usage.
+
+## Examples to calculate time and space complexity
+
+#### 1. Print all elements of a given array
+
+Assume each statement takes one unit of time to run. Then simply iterating over an array to print all its elements takes `O(n)` time, where `n` is the size of the array.
+
+Code:
+
+```python
+arr = [1,2,3,4]  #1
+for x in arr:    #2
+    print(x)     #3
+```
+
+Here, the 1st statement executes only once, so it takes one unit of time to run. The for loop consisting of the 2nd and 3rd statements executes 4 times. Also, as the code doesn't take any additional space except the input `arr`, its space complexity is `O(1)` (constant).
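+
+For contrast with the linear example above, here is a sketch of a quadratic-time routine: counting pairs of equal elements with two nested loops. This example is our own addition, meant to illustrate how nested iteration leads to `O(n^2)` time while still using `O(1)` extra space.
+
+```python
+def count_equal_pairs(arr):
+    n = len(arr)
+    count = 0
+    for i in range(n):              # runs n times
+        for j in range(i + 1, n):   # runs up to n-1 times per outer pass
+            if arr[i] == arr[j]:
+                count += 1
+    return count
+
+# Roughly n*(n-1)/2 comparisons, i.e. O(n^2) time and O(1) extra space
+print(count_equal_pairs([1, 2, 1, 3, 2]))  # Output: 2
+```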
+
+#### 2. Linear Search
+
+Linear search is a simple algorithm for finding an element in an array by sequentially checking each element until a match is found or the end of the array is reached. Here's an example of calculating the time and space complexity of linear search:
+
+```python
+def linear_search(arr, target):
+    for x in arr:        # n iterations in the worst case
+        if x == target:  # 1
+            return True  # 1
+    return False         # if the element is not found
+
+# Example usage
+arr = [1, 3, 5, 7, 9]
+target = 5
+print(linear_search(arr, target))
+```
+
+**Time Complexity Analysis**
+
+The for loop iterates through the entire array, which takes O(n) time in the worst case, where n is the size of the array.
+Inside the loop, each operation takes constant time (O(1)).
+Therefore, the time complexity of linear search is `O(n)`.
+
+**Space Complexity Analysis**
+
+The space complexity of linear search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
+
+
+#### 3. Binary Search
+
+Binary search is an efficient algorithm for finding an element in a sorted array by repeatedly dividing the search interval in half. Here's an example of calculating the time and space complexity of binary search:
+
+```python
+def binary_search(arr, target):
+    left = 0                       # 1
+    right = len(arr) - 1           # 1
+
+    while left <= right:           # log(n) iterations in the worst case
+        mid = (left + right) // 2  # 1 per iteration
+
+        if arr[mid] == target:     # 1
+            return mid             # 1
+        elif arr[mid] < target:    # 1
+            left = mid + 1         # 1
+        else:
+            right = mid - 1        # 1
+
+    return -1                      # if the element is not found
+
+# Example usage
+arr = [1, 3, 5, 7, 9]
+target = 5
+print(binary_search(arr, target))
+```
+
+**Time Complexity Analysis**
+
+The initialization of left and right takes constant time (O(1)).
+The while loop runs for log(n) iterations in the worst case, where n is the size of the array.
+Inside the loop, each operation takes constant time (O(1)).
+Therefore, the time complexity of binary search is `O(log n)`.
+
+**Space Complexity Analysis**
+
+The space complexity of binary search is `O(1)` since it only uses a constant amount of additional space for variables regardless of the input size.
+
+#### 4. Fibonacci Sequence
+
+Let's consider an example of a function that generates Fibonacci numbers up to a given index and stores them in a list. In this case, the space complexity will not be constant because the size of the list grows with the Fibonacci sequence.
+
+```python
+def fibonacci_sequence(n):
+    fib_list = [0, 1]  # initial Fibonacci sequence with the first two numbers
+
+    while len(fib_list) < n:                    # O(n) iterations in the worst case
+        next_fib = fib_list[-1] + fib_list[-2]  # calculating the next Fibonacci number
+        fib_list.append(next_fib)               # appending it to the list
+
+    return fib_list
+
+# Example usage
+n = 10
+fib_sequence = fibonacci_sequence(n)
+print(fib_sequence)
+```
+
+**Time Complexity Analysis**
+
+The while loop iterates until the length of the Fibonacci sequence list reaches n, so it takes `O(n)` iterations in the worst case. Inside the loop, each operation takes constant time (O(1)).
+
+**Space Complexity Analysis**
+
+The space complexity of this function is not constant because it creates and stores a list of Fibonacci numbers.
+As n grows, the size of the list also grows, so the space complexity is O(n), where n is the index of the last Fibonacci number generated.
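+
+For comparison, here is a sketch of the same computation done with only two variables. This variant is our own addition, showing how the space complexity drops to `O(1)` when the full list is not stored; the time complexity stays `O(n)`.
+
+```python
+def nth_fibonacci(n):
+    a, b = 0, 1
+    for _ in range(n):   # O(n) iterations
+        a, b = b, a + b  # keep only the last two values: O(1) space
+    return a
+
+# Example usage
+print(nth_fibonacci(9))  # Output: 34
+```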
diff --git a/contrib/ds-algorithms/tree-traversal.md b/contrib/ds-algorithms/tree-traversal.md
new file mode 100644
index 00000000..4ec72ee8
--- /dev/null
+++ b/contrib/ds-algorithms/tree-traversal.md
@@ -0,0 +1,195 @@
+# Tree Traversal Algorithms
+
+Tree traversal refers to the process of visiting or accessing each node of the tree exactly once in a certain order. Tree traversal algorithms help us to visit and process all the nodes of the tree. Since a tree is not a linear data structure, there are multiple nodes which we can visit after visiting a certain node. There are multiple tree traversal techniques which decide the order in which the nodes of the tree are to be visited.
+
+
+A Tree Data Structure can be traversed in the following ways:
+ - **Level Order Traversal or Breadth First Search or BFS**
+ - **Depth First Search or DFS**
+    - Inorder Traversal
+    - Preorder Traversal
+    - Postorder Traversal
+
+ ![Tree Traversal](images/traversal.png)
+
+
+
+## Binary Tree Structure
+ Before diving into traversal techniques, let's define a simple binary tree node structure:
+
+![Binary Tree](images/binarytree.png)
+
+ ```python
+class Node:
+    def __init__(self, key):
+        self.left = None
+        self.right = None
+        self.val = key
+
+# Main class
+if __name__ == "__main__":
+    root = Node(1)
+    root.left = Node(2)
+    root.right = Node(3)
+    root.left.left = Node(4)
+    root.left.right = Node(5)
+    root.right.left = Node(6)
+    root.right.right = Node(7)
+```
+
+## Level Order Traversal
+When the nodes of the tree are visited level by level from left to right, this is called level order traversal. We can use a queue data structure to execute a level order traversal.
+
+### Algorithm
+ - Create an empty queue Q
+ - Enqueue the root node of the tree to Q
+ - Loop while Q is not empty
+    - Dequeue a node from Q and visit it
+    - Enqueue the left child of the dequeued node if it exists
+    - Enqueue the right child of the dequeued node if it exists
+
+### code for level order traversal in python
+```python
+def printLevelOrder(root):
+    if root is None:
+        return
+
+    # Create an empty queue
+    queue = []
+
+    # Enqueue the root node
+    queue.append(root)
+
+    while(len(queue) > 0):
+
+        # Dequeue the front of the queue and print it
+        node = queue.pop(0)
+        print(node.val, end=" ")
+
+        # Enqueue the left child
+        if node.left is not None:
+            queue.append(node.left)
+
+        # Enqueue the right child
+        if node.right is not None:
+            queue.append(node.right)
+```
+
+**output**
+
+`Level order traversal of binary tree is :
+1 2 3 4 5 6 7`
+
+
+
+## Depth First Search
+When we do a depth-first traversal, we travel in one direction up to the bottom first, then turn around and go the other way. There are three kinds of depth-first traversals.
+
+## 1. Inorder Traversal
+
+In this traversal method, the left subtree is visited first, then the root and later the right subtree. We should always remember that every node may represent a subtree itself.
+
+`Note :` If a binary search tree is traversed in-order, the output will produce sorted key values in ascending order.
+
+![Inorder](images/inorder-traversal.png)
+
+**The order:** Left -> Root -> Right
+
+### Algorithm
+ - Traverse the left subtree.
+ - Visit the root node.
+ - Traverse the right subtree.
+
+### code for inorder traversal in python
+```python
+def printInorder(root):
+    if root:
+        # First recur on the left child
+        printInorder(root.left)
+
+        # Then print the data of the node
+        print(root.val, end=" ")
+
+        # Now recur on the right child
+        printInorder(root.right)
+```
+
+**output**
+
+`Inorder traversal of binary tree is :
+4 2 5 1 6 3 7`
+
+
+## 2. Preorder Traversal
+
+In this traversal method, the root node is visited first, then the left subtree and finally the right subtree.
+
+![preorder](images/preorder-traversal.png)
+
+**The order:** Root -> Left -> Right
+
+### Algorithm
+ - Visit the root node.
+ - Traverse the left subtree.
+ - Traverse the right subtree.
+
+### code for preorder traversal in python
+```python
+def printPreorder(root):
+    if root:
+        # First print the data of the node
+        print(root.val, end=" ")
+
+        # Then recur on the left child
+        printPreorder(root.left)
+
+        # Finally recur on the right child
+        printPreorder(root.right)
+```
+
+**output**
+
+`Preorder traversal of binary tree is :
+1 2 4 5 3 6 7`
+
+## 3. Postorder Traversal
+
+In this traversal method, the root node is visited last, hence the name. First we traverse the left subtree, then the right subtree and finally the root node.
+
+![postorder](images/postorder-traversal.png)
+
+**The order:** Left -> Right -> Root
+
+### Algorithm
+ - Traverse the left subtree.
+ - Traverse the right subtree.
+ - Visit the root node.
+
+### code for postorder traversal in python
+```python
+def printPostorder(root):
+    if root:
+        # First recur on the left child
+        printPostorder(root.left)
+
+        # Then recur on the right child
+        printPostorder(root.right)
+
+        # Now print the data of the node
+        print(root.val, end=" ")
+```
+
+**output**
+
+`Postorder traversal of binary tree is :
+4 5 2 6 7 3 1`
+
+
+## Complexity Analysis
+ - **Time Complexity:** All three tree traversal methods (Inorder, Preorder, and Postorder) have a time complexity of `O(n)`, where `n` is the number of nodes in the tree.
+ - **Space Complexity:** The space complexity is influenced by the recursion stack. In the worst case, the depth of the recursion stack can go up to `O(h)`, where `h` is the height of the tree.
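+
+Recursion is not the only option: the recursion stack can be replaced with an explicit stack. The sketch below is our own addition, showing an iterative inorder traversal of the same `Node` structure defined above; it visits nodes in the same `4 2 5 1 6 3 7` order while making the `O(h)` stack usage explicit.
+
+```python
+def inorder_iterative(root):
+    stack = []
+    node = root
+    while stack or node:
+        # Walk as far left as possible, remembering the path
+        while node:
+            stack.append(node)
+            node = node.left
+        # Visit the node on top of the stack, then go right
+        node = stack.pop()
+        print(node.val, end=" ")
+        node = node.right
+```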
diff --git a/contrib/ds-algorithms/trie.md b/contrib/ds-algorithms/trie.md
new file mode 100644
index 00000000..0ccfbaad
--- /dev/null
+++ b/contrib/ds-algorithms/trie.md
@@ -0,0 +1,152 @@
+# Trie
+
+A Trie is a tree-like data structure used for storing a dynamic set of strings, where the keys are usually strings. It is also known as a prefix tree or digital tree.
+
+>A Trie is a type of search tree, where each node represents a single character of a string.
+
+>Nodes are linked in such a way that they form a tree, where each path from the root to a leaf node represents a unique string stored in the Trie.
+
+## Characteristics of Trie
+- **Prefix Matching**: Tries are particularly useful for prefix matching operations. Any node in the Trie represents a common prefix of all strings below it.
+- **Space Efficiency**: Tries can be more space-efficient than other data structures like hash tables for storing large sets of strings with common prefixes.
+- **Time Complexity**: Insertion, deletion, and search operations in a Trie have a time complexity of `O(m)`, where `m` is the length of the string. This makes Tries very efficient for these operations.
+
+## Structure of Trie
+
+A Trie mainly consists of three parts:
+- **Root**: The root of a Trie is an empty node that does not contain any character.
+- **Edges**: Each edge in the Trie represents a character in the alphabet of the stored strings.
+- **Nodes**: Each node contains a character and possibly additional information, such as a boolean flag indicating if the node represents the end of a valid string.
+
+To implement the nodes of a Trie, we use classes in Python. Each node is an object of the `Node` class.
+
+The Node class has mainly two components:
+- *Array of size 26*: It represents the 26 lowercase English letters. Initially all entries are `None`. As words are inserted, the array is filled with child `Node` objects.
+- *End of word*: It is used to mark the end of a word during insertion.
+
+Code block of the Node class:
+
+```python
+class Node:
+    def __init__(self):
+        self.alphabets = [None] * 26
+        self.end_of_word = 0
+```
+
+Now we need to implement the Trie itself. We create another class named `Trie` with methods for insertion, searching, and deletion.
+
+**Initialization:** In this, we initialize the Trie with a `root` node.
+
+Code implementation of initialization:
+
+```python
+class Trie:
+    def __init__(self):
+        self.root = Node()
+```
+
+## Operations on Trie
+
+1. **Insertion**: Inserts the word into the Trie. This method takes `word` as a parameter. For each character in the word, it checks if there is a corresponding child node. If not, it creates a new `Node`. After processing all the characters in the word, it increments the `end_of_word` value of the last node.
+
+Code implementation of insertion:
+```python
+def insert(self, word):
+    node = self.root
+    for char in word:
+        index = ord(char) - ord('a')
+        if not node.alphabets[index]:
+            node.alphabets[index] = Node()
+        node = node.alphabets[index]
+    node.end_of_word += 1
+```
+
+2. **Searching**: Searches for `word` in the Trie. The search starts from the `root` node, and each character of `word` is processed in turn. After traversing the whole word in the Trie, it returns the count of words.
+
+There are two cases in searching:
+- *Word not found*: This happens when the word we search for is not present in the Trie. This case occurs if the value of the `alphabets` array at some character is `None`, or if the `end_of_word` value of the node reached after traversing the whole word is `0`.
+- *Word found*: This happens when the search word is present in the Trie. This case occurs when the `end_of_word` value of the node reached after traversing the whole word is greater than `0`.
+
+Code implementation of searching:
+```python
+def Search(self, word):
+    node = self.root
+    for char in word:
+        index = ord(char) - ord('a')
+        if not node.alphabets[index]:
+            return 0
+        node = node.alphabets[index]
+    return node.end_of_word
+```
+
+3. **Deletion**: To delete a string, follow the path of the string. If the end node is reached and `end_of_word` is greater than `0`, then decrement the value.
+ +Code Implementation of Deletion: + +```python +def delete(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + node = node.alphabets[index] + if node.end_of_word: + node.end_of_word-=1 +``` + +Python Code to implement Trie: + +```python +class Node: + def __init__(self): + self.alphabets = [None] * 26 + self.end_of_word = 0 + +class Trie: + def __init__(self): + self.root = Node() + + def insert(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + node.alphabets[index] = Node() + node = node.alphabets[index] + node.end_of_word += 1 + + def Search(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + if not node.alphabets[index]: + return 0 + node = node.alphabets[index] + return node.end_of_word + + def delete(self, word): + node = self.root + for char in word: + index = ord(char) - ord('a') + node = node.alphabets[index] + if node.end_of_word: + node.end_of_word-=1 + +if __name__ == "__main__": + trie = Trie() + + word1 = "apple" + word2 = "app" + word3 = "bat" + + trie.insert(word1) + trie.insert(word2) + trie.insert(word3) + + print(trie.Search(word1)) + print(trie.Search(word2)) + print(trie.Search(word3)) + + trie.delete(word2) + print(trie.Search(word2)) +``` diff --git a/contrib/ds-algorithms/two-pointer-technique.md b/contrib/ds-algorithms/two-pointer-technique.md new file mode 100644 index 00000000..6b8720a5 --- /dev/null +++ b/contrib/ds-algorithms/two-pointer-technique.md @@ -0,0 +1,132 @@ +# Two-Pointer Technique + +--- + +- The two-pointer technique is a popular algorithmic strategy used to solve various problems efficiently. This technique involves using two pointers (or indices) to traverse through data structures such as arrays or linked lists. +- The pointers can move in different directions, allowing for efficient processing of elements to achieve the desired results. + +## Common Use Cases + +1. **Finding pairs in a sorted array that sum to a target**: One pointer starts at the beginning and the other at the end. +2. **Reversing a linked list**: One pointer starts at the head, and the other at the next node, progressing through the list. +3. **Removing duplicates from a sorted array**: One pointer keeps track of the unique elements, and the other traverses the array. +4. **Merging two sorted arrays**: Two pointers are used to iterate through the arrays and merge them. + +## Example 1: Finding Pairs with a Given Sum + +### Problem Statement + +Given a sorted array of integers and a target sum, find all pairs in the array that sum up to the target. + +### Approach + +1. Initialize two pointers: one at the beginning (`left`) and one at the end (`right`) of the array. +2. Calculate the sum of the elements at the `left` and `right` pointers. +3. If the sum is equal to the target, record the pair and move both pointers inward. +4. If the sum is less than the target, move the `left` pointer to the right to increase the sum. +5. If the sum is greater than the target, move the `right` pointer to the left to decrease the sum. +6. Repeat the process until the `left` pointer is not less than the `right` pointer. 
+ +### Example Code + +```python +def find_pairs_with_sum(arr, target): + left = 0 + right = len(arr) - 1 + pairs = [] + + while left < right: + current_sum = arr[left] + arr[right] + + if current_sum == target: + pairs.append((arr[left], arr[right])) + left += 1 + right -= 1 + elif current_sum < target: + left += 1 + else: + right -= 1 + + return pairs + +# Example usage +arr = [1, 2, 3, 4, 5, 6, 7, 8, 9] +target = 10 +result = find_pairs_with_sum(arr, target) +print("Pairs with sum", target, "are:", result) + ``` + +## Example 2: Removing Duplicates from a Sorted Array + +### Problem Statement +Given a sorted array, remove the duplicates in place such that each element appears only once and return the new length of the array. + +### Approach +1. If the array is empty, return 0. +2. Initialize a slow pointer at the beginning of the array. +3. Use a fast pointer to traverse through the array. +4. Whenever the element at the fast pointer is different from the element at the slow pointer, increment the slow pointer and update the element at the slow pointer with the element at the fast pointer. +5. Continue this process until the fast pointer reaches the end of the array. +6. The slow pointer will indicate the position of the last unique element. + +### Example Code + +```python +def remove_duplicates(arr): + if not arr: + return 0 + + slow = 0 + + for fast in range(1, len(arr)): + if arr[fast] != arr[slow]: + slow += 1 + arr[slow] = arr[fast] + + return slow + 1 + +# Example usage +arr = [1, 1, 2, 2, 3, 4, 4, 5] +new_length = remove_duplicates(arr) +print("Array after removing duplicates:", arr[:new_length]) +print("New length of array:", new_length) +``` +# Advantages of the Two-Pointer Technique + +Here are some key benefits of using the two-pointer technique: + +## 1. **Improved Time Complexity** + +It often reduces the time complexity from O(n^2) to O(n), making it significantly faster for many problems. + +### Example +- **Finding pairs with a given sum**: Efficiently finds pairs in O(n) time. + +## 2. **Simplicity** + +The implementation is straightforward, using basic operations like incrementing or decrementing pointers. + +### Example +- **Removing duplicates from a sorted array**: Easy to implement and understand. + +## 3. **In-Place Solutions** + +Many problems can be solved in place, requiring no extra space beyond the input data. + +### Example +- **Reversing a linked list**: Adjusts pointers within the existing nodes. + +## 4. **Versatility** + +Applicable to a wide range of problems, from arrays and strings to linked lists. + +### Example +- **Merging two sorted arrays**: Efficiently merges using two pointers. + +## 5. **Efficiency** + +Minimizes redundant operations and enhances performance, especially with large data sets. + +### Example +- **Partitioning problems**: Efficiently partitions elements with minimal operations. + diff --git a/contrib/machine-learning/ann.md b/contrib/machine-learning/ann.md new file mode 100644 index 00000000..c577c945 --- /dev/null +++ b/contrib/machine-learning/ann.md @@ -0,0 +1,153 @@ +# Understanding the Neural Network + +## Table of Contents +
+<details>
+<summary>Click to expand</summary>
+
+- [Introduction](#introduction)
+- [Neuron to Perceptron](#neuron-to-perceptron)
+- [Key concepts](#key-concepts)
+  - [Layers](#layers)
+  - [Weights and Biases](#weights-and-biases)
+  - [Activation Function](#activation-functions)
+  - [Forward and Backward Pass](#forward-and-backward-propagation)
+- [Implementation](#building-from-scratch)
+
+</details>
+
+## Introduction
+
+This guide will walk you through a fundamental neural network implementation in Python. We'll build a `Neural Network` from scratch, allowing you to grasp the core concepts of how neural networks learn and make predictions.
+
+### Let's Start by Understanding the Basic Architecture of Neural Nets
+
+## Neuron to Perceptron
+
+| `Neuron` cells forming the human nervous system | `Perceptron` inspired by the human brain |
+| :----------------------------------------------- | -------------------------------------: |
+| Neurons are nerve cells that send messages all over your body to allow you to do everything from breathing to talking, eating, walking, and thinking. | The perceptron is a mathematical model of a biological neuron, performing heavy computations to mimic human thinking. |
+| A neuron collects signals from dendrites. | The first layer, known as the Input Layer, acts like dendrites to receive the input signal. |
+| Synapses are the connections between neurons where signals are transmitted. | Weights represent synapses. |
+| The axon terminal releases neurotransmitters to transmit the signal to other neurons. | The output is the final result, between 0 and 1, representing a classification or prediction. |
+---
+> The human brain is a network of about 86 billion neurons with more than 100 trillion synaptic connections!
+
+
+## **Key Concepts**
+
+Artificial neurons are the fundamental processing units in an ANN. They receive inputs, multiply them by weights (representing the strength of connections), sum those weighted inputs, and then apply an activation function to produce an output.
+
+### Layers
+Neurons in ANNs are organized into layers:
+* **Input Layer:** Receives the raw data.
+* **(n) Hidden Layers:** (Optional) Intermediate layers where complex transformations occur. They learn to detect patterns and features in the data.
+* **Output Layer:** Produces the final result (prediction or classification).
+
+### Weights and Biases
+- Each input $(x_i)$ has an associated weight $(w_i)$. Weights, multiplied with input units $(w_i \cdot x_i)$, determine the influence of one neuron's output on another.
+- A bias $(b_i)$ is added to help influence the end product, giving the equation $(w_i \cdot x_i + b_i)$.
+- During training, the network adjusts these weights and biases to minimize errors and improve its predictions.
+
+### Activation Functions
+- An activation function is applied to the result to introduce non-linearity into the model, allowing ANNs to learn more complex relationships from the data.
+- The resulting equation, $y = f(g(x))$, determines whether the neuron will "fire" or not, i.e., whether its output will be used as input for the next neuron.
+- Common activation functions include the sigmoid function, tanh (hyperbolic tangent), and ReLU (Rectified Linear Unit).
+
+### Forward and Backward Propagation
+- **Flow of Information:** All the above steps are part of forward propagation. Together they give the output equation $y = f\left(\sum_{i=1}^n w_i x_i + b_i\right)$
+- **Error Correction:** Backpropagation is the algorithm used to train ANNs by calculating the gradient of the error at the output layer and then propagating this error backward through the network. This allows the network to adjust its weights and biases in the direction that reduces the error.
+- The chain rule of calculus is the foundational concept used to compute the gradient of the error:
+  $$
+  \delta_{ij}(E) = \frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial \hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \theta_j} \cdot \frac{\partial \theta_j}{\partial w_{ij}}
+  $$
+  where $E$ is the error, $\hat{y}_j$ is the predicted output, $\theta_j$ is the input to the activation function of the $j^{th}$ neuron, and $w_{ij}$ is the weight from neuron $i$ to neuron $j$.
+
+
+## Building From Scratch
+
+```python
+# Import required libraries
+import numpy as np
+import matplotlib.pyplot as plt
+
+class SimpleNeuralNetwork:
+    def __init__(self, input_size, hidden_size, output_size):
+        self.input_size = input_size
+        self.hidden_size = hidden_size
+        self.output_size = output_size
+
+        # Initialize weights and biases
+        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
+        self.bias_hidden = np.random.randn(hidden_size)
+        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
+        self.bias_output = np.random.randn(output_size)
+
+    def sigmoid(self, x):
+        return 1 / (1 + np.exp(-x))
+
+    def sigmoid_derivative(self, x):
+        # Note: x is expected to already be a sigmoid output
+        return x * (1 - x)
+
+    def forward(self, X):
+        self.hidden_layer_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
+        self.hidden_layer_output = self.sigmoid(self.hidden_layer_input)
+
+        self.output_layer_input = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_output
+        self.output = self.sigmoid(self.output_layer_input)
+
+        return self.output
+
+    def backward(self, X, y, learning_rate):
+        output_error = y - self.output
+        output_delta = output_error * self.sigmoid_derivative(self.output)
+
+        hidden_error = output_delta.dot(self.weights_hidden_output.T)
+        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)
+
+        self.weights_hidden_output += self.hidden_layer_output.T.dot(output_delta) * learning_rate
+        self.bias_output += np.sum(output_delta, axis=0) * learning_rate
+        self.weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
+        self.bias_hidden += np.sum(hidden_delta, axis=0) * learning_rate
+
+    def train(self, X, y, epochs, learning_rate):
+        self.losses = []
+        for epoch in range(epochs):
+            self.forward(X)
+            self.backward(X, y, learning_rate)
+            loss = np.mean(np.square(y - self.output))
+            self.losses.append(loss)
+            if epoch % 1000 == 0:
+                print(f"Epoch {epoch}, Loss: {loss}")
+
+    def plot_loss(self):
+        plt.plot(self.losses)
+        plt.xlabel('Epochs')
+        plt.ylabel('Loss')
+        plt.title('Training Loss Over Epochs')
+        plt.show()
+```
+
+### Creating the Input & Output Array
+Let's create a dummy input and output dataset. Here, the output is simply the logical OR of the two input columns.
+```python
+X = np.array([[0,0], [0,1], [1,0], [1,1]])
+y = np.array([[0], [1], [1], [1]])
+```
+
+### Defining the Neural Network
+With our input and output data ready, we'll define a simple neural network with one hidden layer containing two neurons.
+```python
+# neural network architecture
+input_size = 2
+hidden_layers = 1
+hidden_neurons = [2]
+output_size = 1
+```
+
+### Visualizing the Training Loss
+To understand how well our model is learning, let's train it and visualize the training loss over epochs.
+```python
+# Instantiate the network defined above and train it;
+# hidden_neurons[0] is the size of the single hidden layer
+model = SimpleNeuralNetwork(input_size, hidden_neurons[0], output_size)
+model.train(X, y, epochs=10000, learning_rate=0.1)
+model.plot_loss()
+```
diff --git a/contrib/machine-learning/assets/XG_1.webp b/contrib/machine-learning/assets/XG_1.webp
new file mode 100644
index 00000000..c693d3da
Binary files /dev/null and b/contrib/machine-learning/assets/XG_1.webp differ
diff --git a/contrib/machine-learning/assets/cnn-dropout.png b/contrib/machine-learning/assets/cnn-dropout.png
new file mode 100644
index 00000000..9cb18f95
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-dropout.png differ
diff --git a/contrib/machine-learning/assets/cnn-filters.png b/contrib/machine-learning/assets/cnn-filters.png
new file mode 100644
index 00000000..463ca600
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-filters.png differ
diff --git a/contrib/machine-learning/assets/cnn-flattened.png b/contrib/machine-learning/assets/cnn-flattened.png
new file mode 100644
index 00000000..2d1ca6f2
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-flattened.png differ
diff --git a/contrib/machine-learning/assets/cnn-input_shape.png b/contrib/machine-learning/assets/cnn-input_shape.png
new file mode 100644
index 00000000..34379f1d
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-input_shape.png differ
diff --git a/contrib/machine-learning/assets/cnn-ouputs.png b/contrib/machine-learning/assets/cnn-ouputs.png
new file mode 100644
index 00000000..27972265
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-ouputs.png differ
diff --git a/contrib/machine-learning/assets/cnn-padding.png b/contrib/machine-learning/assets/cnn-padding.png
new file mode 100644
index 00000000..a441b2b9
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-padding.png differ
diff --git a/contrib/machine-learning/assets/cnn-pooling.png b/contrib/machine-learning/assets/cnn-pooling.png
new file mode 100644
index 00000000..c3ada5cf
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-pooling.png differ
diff --git a/contrib/machine-learning/assets/cnn-strides.png b/contrib/machine-learning/assets/cnn-strides.png
new file mode 100644
index 00000000..26339a9f
Binary files /dev/null and b/contrib/machine-learning/assets/cnn-strides.png differ
diff --git a/contrib/machine-learning/assets/eda/bi-variate-analysis.png b/contrib/machine-learning/assets/eda/bi-variate-analysis.png
new file mode 100644
index 00000000..076cc505
Binary files /dev/null and b/contrib/machine-learning/assets/eda/bi-variate-analysis.png differ
diff --git a/contrib/machine-learning/assets/eda/correlation-analysis.png b/contrib/machine-learning/assets/eda/correlation-analysis.png
new file mode 100644
index 00000000..e6f3ee60
Binary files /dev/null and b/contrib/machine-learning/assets/eda/correlation-analysis.png differ
diff --git a/contrib/machine-learning/assets/eda/multi-variate-analysis.png b/contrib/machine-learning/assets/eda/multi-variate-analysis.png
new file mode 100644
index 00000000..5dc042b9
Binary files /dev/null and b/contrib/machine-learning/assets/eda/multi-variate-analysis.png differ
diff --git a/contrib/machine-learning/assets/eda/uni-variate-analysis1.png b/contrib/machine-learning/assets/eda/uni-variate-analysis1.png
new file mode 100644
index 00000000..b4905dcf
Binary files /dev/null and b/contrib/machine-learning/assets/eda/uni-variate-analysis1.png differ
diff --git a/contrib/machine-learning/assets/eda/uni-variate-analysis2.png b/contrib/machine-learning/assets/eda/uni-variate-analysis2.png
new
file mode 100644 index 00000000..cf56c70a Binary files /dev/null and b/contrib/machine-learning/assets/eda/uni-variate-analysis2.png differ diff --git a/contrib/machine-learning/assets/km_.png b/contrib/machine-learning/assets/km_.png new file mode 100644 index 00000000..3f674126 Binary files /dev/null and b/contrib/machine-learning/assets/km_.png differ diff --git a/contrib/machine-learning/assets/km_2.png b/contrib/machine-learning/assets/km_2.png new file mode 100644 index 00000000..cf786cf2 Binary files /dev/null and b/contrib/machine-learning/assets/km_2.png differ diff --git a/contrib/machine-learning/assets/km_3.png b/contrib/machine-learning/assets/km_3.png new file mode 100644 index 00000000..ecd34ff5 Binary files /dev/null and b/contrib/machine-learning/assets/km_3.png differ diff --git a/contrib/machine-learning/assets/knm.png b/contrib/machine-learning/assets/knm.png new file mode 100644 index 00000000..4b7a2190 Binary files /dev/null and b/contrib/machine-learning/assets/knm.png differ diff --git a/contrib/machine-learning/assets/transformer-architecture.png b/contrib/machine-learning/assets/transformer-architecture.png new file mode 100644 index 00000000..2854ab0a Binary files /dev/null and b/contrib/machine-learning/assets/transformer-architecture.png differ diff --git a/contrib/machine-learning/binomial-distribution.md b/contrib/machine-learning/binomial-distribution.md new file mode 100644 index 00000000..0d1d3280 --- /dev/null +++ b/contrib/machine-learning/binomial-distribution.md @@ -0,0 +1,123 @@ +# Binomial Distribution + +## Introduction + +The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. It is commonly used in statistics and probability theory. + +### Key Characteristics + +- **Number of trials (n):** The number of independent experiments or trials. +- **Probability of success (p):** The probability of success on an individual trial. +- **Number of successes (k):** The number of successful outcomes in n trials. + +The binomial distribution is defined by the probability mass function (PMF): + +P(X = k) = (n choose k) p^k (1 - p)^(n - k) + +where: +- (n choose k) is the binomial coefficient, calculated as n! / (k!(n-k)!). + +## Properties of Binomial Distribution + +- **Mean:** μ = np +- **Variance:** σ² = np(1 - p) +- **Standard Deviation:** σ = √(np(1 - p)) + +## Python Implementation + +Let's implement the binomial distribution using Python. We'll use the `scipy.stats` library to compute the binomial PMF and CDF, and `matplotlib` to visualize it. + +### Step-by-Step Implementation + +1. **Import necessary libraries:** + + ```python + import numpy as np + import matplotlib.pyplot as plt + from scipy.stats import binom + ``` + +2. **Define parameters:** + + ```python + # Number of trials + n = 10 + # Probability of success + p = 0.5 + # Number of successes + k = np.arange(0, n + 1) + ``` + +3. **Compute the PMF:** + + ```python + pmf = binom.pmf(k, n, p) + ``` + +4. **Plot the PMF:** + + ```python + plt.bar(k, pmf, color='blue') + plt.xlabel('Number of Successes') + plt.ylabel('Probability') + plt.title('Binomial Distribution PMF') + plt.show() + ``` + +5. **Compute the CDF:** + + ```python + cdf = binom.cdf(k, n, p) + ``` + +6. 
**Plot the CDF:**
+
+   ```python
+   plt.plot(k, cdf, marker='o', linestyle='--', color='blue')
+   plt.xlabel('Number of Successes')
+   plt.ylabel('Cumulative Probability')
+   plt.title('Binomial Distribution CDF')
+   plt.grid(True)
+   plt.show()
+   ```
+
+### Complete Code
+
+Here is the complete code for the binomial distribution implementation:
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.stats import binom
+
+# Parameters
+n = 10  # Number of trials
+p = 0.5  # Probability of success
+
+# Number of successes
+k = np.arange(0, n + 1)
+
+# Compute PMF
+pmf = binom.pmf(k, n, p)
+
+# Plot PMF
+plt.figure(figsize=(12, 6))
+plt.subplot(1, 2, 1)
+plt.bar(k, pmf, color='blue')
+plt.xlabel('Number of Successes')
+plt.ylabel('Probability')
+plt.title('Binomial Distribution PMF')
+
+# Compute CDF
+cdf = binom.cdf(k, n, p)
+
+# Plot CDF
+plt.subplot(1, 2, 2)
+plt.plot(k, cdf, marker='o', linestyle='--', color='blue')
+plt.xlabel('Number of Successes')
+plt.ylabel('Cumulative Probability')
+plt.title('Binomial Distribution CDF')
+plt.grid(True)
+
+plt.tight_layout()
+plt.show()
+```
diff --git a/contrib/machine-learning/clustering.md b/contrib/machine-learning/clustering.md
new file mode 100644
index 00000000..bc02d374
--- /dev/null
+++ b/contrib/machine-learning/clustering.md
@@ -0,0 +1,96 @@
+# Clustering
+
+Clustering is an unsupervised machine learning technique that groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). This README provides an overview of clustering, including its fundamental concepts, types, algorithms, and how to implement it using Python.
+
+## Introduction
+
+Clustering is a technique used to find inherent groupings within data without pre-labeled targets. It is widely used in exploratory data analysis, pattern recognition, image analysis, information retrieval, and bioinformatics.
+
+## Concepts
+
+### Centroid
+
+A centroid is the center of a cluster. In the k-means clustering algorithm, for example, each cluster is represented by its centroid, which is the mean of all the data points in the cluster.
+
+### Distance Measure
+
+Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity.
+
+### Inertia
+
+Inertia is a metric used to assess the quality of the clusters formed. It is the sum of squared distances of samples to their nearest cluster center.
+
+## Types of Clustering
+
+1. **Hard Clustering**: Each data point either belongs to a cluster completely or not at all.
+2. **Soft Clustering (Fuzzy Clustering)**: Each data point can belong to multiple clusters with varying degrees of membership.
+
+## Clustering Algorithms
+
+### K-Means Clustering
+
+K-Means is a popular clustering algorithm that partitions the data into k clusters, where each data point belongs to the cluster with the nearest mean. The algorithm follows these steps:
+1. Initialize k centroids randomly.
+2. Assign each data point to the nearest centroid.
+3. Recalculate the centroids as the mean of all data points assigned to each cluster.
+4. Repeat steps 2 and 3 until convergence.
+
+### Hierarchical Clustering
+
+Hierarchical clustering builds a tree of clusters. There are two types:
+- **Agglomerative (bottom-up)**: Starts with each data point as a separate cluster and merges the closest pairs of clusters iteratively.
+- **Divisive (top-down)**: Starts with all data points in one cluster and splits the cluster iteratively into smaller clusters. + +### DBSCAN (Density-Based Spatial Clustering of Applications with Noise) + +DBSCAN groups together points that are close to each other based on a distance measurement and a minimum number of points. It can find arbitrarily shaped clusters and is robust to noise. + +## Implementation + +### Using Scikit-learn + +Scikit-learn is a popular machine learning library in Python that provides tools for clustering. + +### Code Example + +```python +import numpy as np +import pandas as pd +from sklearn.cluster import KMeans +from sklearn.preprocessing import StandardScaler +from sklearn.metrics import silhouette_score + +# Load dataset +data = pd.read_csv('path/to/your/dataset.csv') + +# Preprocess the data +scaler = StandardScaler() +data_scaled = scaler.fit_transform(data) + +# Initialize and fit KMeans model +kmeans = KMeans(n_clusters=3, random_state=42) +kmeans.fit(data_scaled) + +# Get cluster labels +labels = kmeans.labels_ + +# Calculate silhouette score +silhouette_avg = silhouette_score(data_scaled, labels) +print("Silhouette Score:", silhouette_avg) + +# Add cluster labels to the original data +data['Cluster'] = labels + +print(data.head()) +``` + +## Evaluation Metrics + +- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters. +- **Inertia (Within-cluster Sum of Squares)**: Measures the compactness of the clusters. +- **Davies-Bouldin Index**: Measures the average similarity ratio of each cluster with the cluster that is most similar to it. +- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. + +## Conclusion + +Clustering is a powerful technique for discovering structure in data. Understanding different clustering algorithms and their evaluation metrics is crucial for selecting the appropriate method for a given problem. diff --git a/contrib/machine-learning/confusion-matrix.md b/contrib/machine-learning/confusion-matrix.md new file mode 100644 index 00000000..4bedf667 --- /dev/null +++ b/contrib/machine-learning/confusion-matrix.md @@ -0,0 +1,70 @@ +## Confusion Matrix + +A confusion matrix is a fundamental performance evaluation tool used in machine learning to assess the accuracy of a classification model. It is an N x N matrix, where N represents the number of target classes. + +For binary classification, it results in a 2 x 2 matrix that outlines four key parameters: +1. True Positive (TP) - The predicted value matches the actual value, or the predicted class matches the actual class. +For example - the actual value was positive, and the model predicted a positive value. +2. True Negative (TN) - The predicted value matches the actual value, or the predicted class matches the actual class. +For example - the actual value was negative, and the model predicted a negative value. +3. False Positive (FP)/Type I Error - The predicted value was falsely predicted. +For example - the actual value was negative, but the model predicted a positive value. +4. False Negative (FN)/Type II Error - The predicted value was falsely predicted. +For example - the actual value was positive, but the model predicted a negative value. + +The confusion matrix enables the calculation of various metrics like accuracy, precision, recall, F1-Score and specificity. +1. 
Accuracy - It represents the proportion of correctly classified instances out of the total number of instances in the dataset. +2. Precision - It quantifies the accuracy of positive predictions made by the model. +3. Recall - It quantifies the ability of a model to correctly identify all positive instances in the dataset and is also known as sensitivity or true positive rate. +4. F1-Score - It is a single measure that combines precision and recall, offering a balanced evaluation of a classification model's effectiveness. + +To implement the confusion matrix in Python, we can use the confusion_matrix() function from the sklearn.metrics module of the scikit-learn library. +The function returns a 2D array that represents the confusion matrix. +We can also visualize the confusion matrix using a heatmap. + +```python +# Import necessary libraries +import numpy as np +from sklearn.metrics import confusion_matrix, classification_report +import seaborn as sns +import matplotlib.pyplot as plt + +# Create the NumPy array for actual and predicted labels +actual = np.array(['Apple', 'Apple', 'Apple', 'Not Apple', 'Apple', + 'Not Apple', 'Apple', 'Apple', 'Not Apple', 'Not Apple']) +predicted = np.array(['Apple', 'Not Apple', 'Apple', 'Not Apple', 'Apple', + 'Apple', 'Apple', 'Apple', 'Not Apple', 'Not Apple']) + +# Compute the confusion matrix +cm = confusion_matrix(actual,predicted) + +# Plot the confusion matrix with the help of the seaborn heatmap +sns.heatmap(cm, + annot=True, + fmt='g', + xticklabels=['Apple', 'Not Apple'], + yticklabels=['Apple', 'Not Apple']) +plt.xlabel('Prediction', fontsize=13) +plt.ylabel('Actual', fontsize=13) +plt.title('Confusion Matrix', fontsize=17) +plt.show() + +# Classifications Report based on Confusion Metrics +print(classification_report(actual, predicted)) +``` + +### Results + +``` +1. Confusion Matrix: +[[5 1] +[1 3]] +2. Classification Report: + precision recall f1-score support +Apple 0.83 0.83 0.83 6 +Not Apple 0.75 0.75 0.75 4 + +accuracy 0.80 10 +macro avg 0.79 0.79 0.79 10 +weighted avg 0.80 0.80 0.80 10 +``` diff --git a/contrib/machine-learning/cost-functions.md b/contrib/machine-learning/cost-functions.md new file mode 100644 index 00000000..c1fe2170 --- /dev/null +++ b/contrib/machine-learning/cost-functions.md @@ -0,0 +1,235 @@ + +# Cost Functions in Machine Learning + +Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data. + +## Types of Cost Functions + +### 1. Mean Squared Error (MSE) + +**Explanation:** +MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values. + +**Mathematical Formulation:** +The MSE is defined as: +$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$ +Where: +- `n` is the number of samples. +- $y_i$ is the actual value. +- $\hat{y}_i$ is the predicted value. + +**Advantages:** +- Sensitive to large errors due to squaring. +- Differentiable and convex, facilitating optimization. + +**Disadvantages:** +- Sensitive to outliers, as the squared term amplifies their impact. 
+ +**Python Implementation:** +```python +import numpy as np + +def mean_squared_error(y_true, y_pred): + n = len(y_true) + return np.mean((y_true - y_pred) ** 2) +``` + +### 2. Mean Absolute Error (MAE) + +**Explanation:** +MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values. + +**Mathematical Formulation:** +The MAE is defined as: +$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$ +Where: +- `n` is the number of samples. +- $y_i$ is the actual value. +- $\hat{y}_i$ is the predicted value. + +**Advantages:** +- Less sensitive to outliers compared to MSE. +- Provides a linear error term, which can be easier to interpret. + + +**Disadvantages:** +- Not differentiable at zero, which can complicate optimization. + +**Python Implementation:** +```python +import numpy as np + +def mean_absolute_error(y_true, y_pred): + n = len(y_true) + return np.mean(np.abs(y_true - y_pred)) +``` + +### 3. Cross-Entropy Loss (Binary) + +**Explanation:** +Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions. + +**Mathematical Formulation:** + +For binary classification, the cross-entropy loss is defined as: + +$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$ + +Where: +- `n` is the number of samples. +- $y_i$ is the actual class label (0 or 1). +- $\hat{y}_i$ is the predicted probability of the positive class. + + +**Advantages:** +- Penalizes confident wrong predictions heavily. +- Suitable for probabilistic outputs. + +**Disadvantages:** +- Sensitive to class imbalance. + +**Python Implementation:** +```python +import numpy as np + +def binary_cross_entropy(y_true, y_pred): + n = len(y_true) + return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)) +``` + +### 4. Cross-Entropy Loss (Multiclass) + +**Explanation:** +For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes. + +**Mathematical Formulation:** + +The multiclass cross-entropy loss is defined as: + +$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$ + +Where: +- `n` is the number of samples. +- `C` is the number of classes. +- $y_{i,c}$ is the indicator function for the true class of sample `i`. +- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`. + +**Advantages:** +- Handles multiple classes effectively. +- Encourages the model to assign high probabilities to the correct classes. + +**Disadvantages:** +- Requires one-hot encoding for class labels, which can increase computational complexity. + +**Python Implementation:** +```python +import numpy as np + +def categorical_cross_entropy(y_true, y_pred): + n = len(y_true) + return -np.mean(np.sum(y_true * np.log(y_pred), axis=1)) +``` + +### 5. Hinge Loss (SVM) + +**Explanation:** +Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes misclassifications by a linear margin. + +**Mathematical Formulation:** + +For binary classification, the hinge loss is defined as: + +$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$ + +Where: +- `n` is the number of samples. +- $y_i$ is the actual class label (-1 or 1). +- $\hat{y}_i$ is the predicted score for sample \( i \). 
+
+**Advantages:**
+- Encourages margin maximization in SVMs.
+- Robust to outliers due to the linear penalty.
+
+**Disadvantages:**
+- Not differentiable at the margin, which can complicate optimization.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def hinge_loss(y_true, y_pred):
+    n = len(y_true)
+    loss = np.maximum(0, 1 - y_true * y_pred)
+    return np.mean(loss)
+```
+
+### 6. Huber Loss
+
+**Explanation:**
+Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors.
+
+**Mathematical Formulation:**
+
+The Huber loss is defined as:
+
+$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
+\begin{array}{ll}
+\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
+\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
+\end{array}
+\right.$$
+
+Where:
+- `n` is the number of samples.
+- $\delta$ is a threshold parameter.
+
+**Advantages:**
+- Provides a smooth loss function.
+- Less sensitive to outliers than MSE.
+
+**Disadvantages:**
+- Requires tuning of the threshold parameter.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def huber_loss(y_true, y_pred, delta):
+    error = y_true - y_pred
+    loss = np.where(np.abs(error) <= delta, 0.5 * error ** 2, delta * (np.abs(error) - 0.5 * delta))
+    return np.mean(loss)
+```
+
+### 7. Log-Cosh Loss
+
+**Explanation:**
+Log-Cosh loss is a smooth approximation of the MAE and is less sensitive to outliers than MSE. It provides a smooth transition from quadratic for small errors to linear for large errors.
+
+**Mathematical Formulation:**
+
+The Log-Cosh loss is defined as:
+
+$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$
+
+Where:
+- `n` is the number of samples.
+
+**Advantages:**
+- Smooth and differentiable everywhere.
+- Less sensitive to outliers.
+
+**Disadvantages:**
+- Computationally more expensive than simple losses like MSE.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def logcosh_loss(y_true, y_pred):
+    error = y_true - y_pred
+    loss = np.log(np.cosh(error))
+    return np.mean(loss)
+```
+
+These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.
diff --git a/contrib/machine-learning/decision-tree.md b/contrib/machine-learning/decision-tree.md
new file mode 100644
index 00000000..8159bcf2
--- /dev/null
+++ b/contrib/machine-learning/decision-tree.md
@@ -0,0 +1,257 @@
+# Decision Trees
+Decision trees are a type of supervised machine learning algorithm that is mostly used in classification problems. They work for both categorical and continuous input and output variables.
+
+A decision tree can also be interpreted as an acyclic graph that is utilized for decision-making. Every branching node in the graph examines a particular feature (j) of the feature vector. The left branch is taken when the feature's value is less than a certain threshold; the right branch is taken when it is higher. The class to which the example belongs is decided upon as soon as a leaf node is reached.
+
+## Key Components of a Decision Tree
+**Root Node:** This is the decision tree's first node, and it symbolizes the whole population or sample.
+
+**Internal Nodes:** These are the decision-making nodes; each one tests a particular characteristic or feature.
+
+**Leaf Nodes:** These are the terminal nodes; they represent the final outcomes or class labels.
+
+**Branches:** These are the lines that connect the nodes, and they show how the choice was made depending on the feature value.
+
+### Example: Predicting Loan Approval
+
+In this example, we will use a decision tree to forecast the approval or denial of a loan application based on a number of features, including job status, credit score, and income.
+
+```
+                  Root Node
+             (All Applications)
+              /              \
+     Internal Node         Internal Node
+     (Credit Score)     (Employment Status)
+      /        \             /        \
+ Leaf Node   Leaf Node  Leaf Node   Leaf Node
+(Approve Loan) (Deny Loan) (Approve Loan) (Deny Loan)
+```
+> There are various formulations of the decision tree learning algorithm. Here, we consider just one, called ID3.
+
+## Appropriate Problems For Decision Tree Learning
+In general, decision tree learning works best on issues that have the following characteristics:
+1. ***Instances*** are represented by ***attribute-value pairs***
+2. The ***output values of the target function are discrete***. Each sample is given a Boolean categorization (yes or no) by the decision tree. Learning functions with multiple possible output values can also be integrated into decision tree approaches without much effort.
+3. ***Disjunctive descriptions may be required***
+4. The ***training data may contain errors*** – ***decision tree learning methods are robust to errors,*** both errors in the classifications of the training examples and errors in the attribute values that describe these examples.
+5. ***Missing attribute values could be present in the training data.*** Using decision tree approaches is possible even in cases where some training examples have missing values.
+
+## Decision Tree Algorithm
+The decision tree method classifies the data according to a tree structure. It all begins at the root node, which holds the complete dataset. The algorithm then determines which feature, according to a certain criterion like information gain or Gini impurity, is most appropriate for splitting the dataset. Subsets of the dataset are then created according to the values of the chosen feature. This procedure is repeated recursively for every subset until a halting condition is satisfied, for example reaching a minimal number of samples per leaf node or a maximum tree depth.
+
+### Which Attribute Is the Best Classifier?
+- The ID3 algorithm's primary task is to choose which characteristic to test at each tree node.
+- It uses information gain, a statistical measure that quantifies how well a certain attribute divides the training samples into groups based on the target classification.
+- When building the tree, ID3 chooses the candidate attribute with the highest information gain.
+
+## Entropy & Information
+
+**Entropy** is a metric that quantifies the level of impurity or uncertainty present in a given dataset. When it comes to decision trees, entropy measures how similar the target variable is within a specific node or subset of the data. It is utilized for assessing the quality of potential splits during the tree construction process.
+
+The entropy of a node is calculated as:
+__Entropy = -Σ(p_i * log2(p_i))__
+
+where `p_i` is the proportion of instances belonging to class `i` in the current node. The entropy is at its maximum when all classes are equally represented in the node, indicating maximum impurity or uncertainty.
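+
+To make the formula concrete, here is a minimal sketch of the entropy calculation in Python (the `entropy` helper below is our own illustration, not part of any library):
+
+```python
+import math
+
+def entropy(class_counts):
+    """Entropy of a node, given the number of instances in each class."""
+    total = sum(class_counts)
+    result = 0.0
+    for count in class_counts:
+        if count:  # a class with zero instances contributes nothing (0 * log2(0) -> 0)
+            p = count / total
+            result -= p * math.log2(p)
+    return result
+
+# A node with 9 instances of one class and 5 of the other
+print(round(entropy([9, 5]), 3))  # 0.94
+```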
+
+**Information Gain** is a measure used to estimate the possible reduction in entropy achieved by splitting the data according to a certain attribute. It quantifies the projected decrease in impurity or uncertainty after the split.
+
+The information gain for a feature `A` is calculated as:
+__Information Gain = Entropy(parent) - Σ(weight(child) * Entropy(child))__
+
+### Example of a Decision Tree
+Let us look at a basic decision tree example that predicts a person's likelihood of playing tennis based on weather conditions.
+
+**Data Set:**
+---
+| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
+|-----|---------|-------------|----------|------|------------|
+| D1 | Sunny | Hot | High | Weak | No |
+| D2 | Sunny | Hot | High | Strong | No |
+| D3 | Overcast| Hot | High | Weak | Yes |
+| D4 | Rain | Mild | High | Weak | Yes |
+| D5 | Rain | Cool | Normal | Weak | Yes |
+| D6 | Rain | Cool | Normal | Strong | No |
+| D7 | Overcast| Cool | Normal | Strong | Yes |
+| D8 | Sunny | Mild | High | Weak | No |
+| D9 | Sunny | Cool | Normal | Weak | Yes |
+| D10 | Rain | Mild | Normal | Weak | Yes |
+| D11 | Sunny | Mild | Normal | Strong | Yes |
+| D12 | Overcast| Mild | High | Strong | Yes |
+| D13 | Overcast| Hot | Normal | Weak | Yes |
+| D14 | Rain | Mild | High | Strong | No |
+---
+
+
+1. Calculate the entropy of the entire dataset.
+2. For each feature, calculate the information gain by splitting the data based on that feature.
+3. Select the feature with the highest information gain to create the root node.
+4. Repeat steps 1-3 for each child node until a stopping criterion is met (e.g., all instances in a node belong to the same class, or the maximum depth is reached).
+
+Let's start with calculating the entropy of the entire dataset:
+Total instances: 14
+No instances: 5
+Yes instances: 9
+
+**Entropy** = -((5/14) * log2(5/14) + (9/14) * log2(9/14)) = 0.940
+
+Now, we'll calculate the information gain for each feature:
+
+**Outlook**:
+- Sunny: 3 No, 2 Yes (Entropy = 0.971)
+- Overcast: 0 No, 4 Yes (Entropy = 0)
+- Rain: 2 No, 3 Yes (Entropy = 0.971)
+
+Information Gain = 0.940 - ((5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971) = 0.246
+
+**Temperature**:
+- Hot: 2 No, 2 Yes (Entropy = 1)
+- Mild: 2 No, 4 Yes (Entropy = 0.918)
+- Cool: 1 No, 3 Yes (Entropy = 0.811)
+
+Information Gain = 0.940 - ((4/14) * 1 + (6/14) * 0.918 + (4/14) * 0.811) = 0.029
+
+**Humidity**:
+- High: 4 No, 3 Yes (Entropy = 0.985)
+- Normal: 1 No, 6 Yes (Entropy = 0.592)
+
+Information Gain = 0.940 - ((7/14) * 0.985 + (7/14) * 0.592) = 0.151
+
+**Wind**:
+- Weak: 2 No, 6 Yes (Entropy = 0.811)
+- Strong: 3 No, 3 Yes (Entropy = 1)
+
+Information Gain = 0.940 - ((8/14) * 0.811 + (6/14) * 1) = 0.048
+
+The feature with the highest information gain is Outlook, so we'll create the root node based on that.
+
+**Step 1: Root Node (Outlook)**
+```
+            Root Node (Outlook)
+           /        |         \
+      Sunny     Overcast      Rain
+ Entropy: 0.971  Entropy: 0   Entropy: 0.971
+ 5 instances    4 instances   5 instances
+
+```
+
+Now, we'll continue building the tree by recursively splitting the child nodes based on the feature with the highest information gain within each subset.
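+
+As a quick check, the root-node calculation above can be reproduced with a short, illustrative (non-library) helper that weighs the entropy of each child by its share of the instances:
+
+```python
+import math
+
+def entropy(class_counts):
+    total = sum(class_counts)
+    return -sum((c / total) * math.log2(c / total) for c in class_counts if c)
+
+def information_gain(parent_counts, child_counts_list):
+    """Information gain of a split, given [No, Yes] counts for the parent and each child."""
+    parent_total = sum(parent_counts)
+    weighted = sum((sum(child) / parent_total) * entropy(child)
+                   for child in child_counts_list)
+    return entropy(parent_counts) - weighted
+
+# Outlook split: Sunny (3 No, 2 Yes), Overcast (0 No, 4 Yes), Rain (2 No, 3 Yes)
+print(round(information_gain([5, 9], [[3, 2], [0, 4], [2, 3]]), 3))  # 0.247 (0.246 above, with rounded entropies)
+```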
+
+**Step 2: Splitting the Sunny and Rain Nodes**
+
+For the Sunny node (3 No, 2 Yes, Entropy = 0.971):
+- Temperature:
+  - Hot: 2 No, 0 Yes (Entropy = 0)
+  - Mild: 1 No, 1 Yes (Entropy = 1)
+  - Cool: 0 No, 1 Yes (Entropy = 0)
+  Information Gain = 0.971 - ((2/5) * 0 + (2/5) * 1 + (1/5) * 0) = 0.571
+
+- Humidity:
+  - High: 3 No, 0 Yes (Entropy = 0)
+  - Normal: 0 No, 2 Yes (Entropy = 0)
+  Information Gain = 0.971 - 0 = 0.971
+
+- Wind:
+  - Weak: 2 No, 1 Yes (Entropy = 0.918)
+  - Strong: 1 No, 1 Yes (Entropy = 1)
+  Information Gain = 0.971 - ((3/5) * 0.918 + (2/5) * 1) = 0.020
+
+The highest information gain is achieved by splitting on Humidity, which separates the Sunny instances into two pure subsets, so we'll create child nodes for Sunny based on Humidity.
+
+For the Rain node (2 No, 3 Yes, Entropy = 0.971):
+- Wind:
+  - Weak: 0 No, 3 Yes (Entropy = 0)
+  - Strong: 2 No, 0 Yes (Entropy = 0)
+  Information Gain = 0.971 - 0 = 0.971
+
+Wind separates the Rain instances perfectly (Temperature and Humidity each give a gain of only 0.020 here), so we'll create child nodes for Rain based on Wind.
+
+**Step 3: Updated Decision Tree**
+```
+                Root Node (Outlook)
+               /         |         \
+          Sunny       Overcast      Rain
+       (Humidity)    Entropy: 0    (Wind)
+        /      \     4 instances   /     \
+     High     Normal            Weak    Strong
+ Entropy: 0  Entropy: 0    Entropy: 0  Entropy: 0
+ 3 instances 2 instances   3 instances 2 instances
+    (No)       (Yes)          (Yes)      (No)
+```
+At this point, all leaf nodes are pure (entropy = 0), i.e., every instance in each leaf belongs to a single class. Therefore, we can stop the tree construction process.
+
+**Step 4: Pruning the Decision Tree**
+
+The decision tree we constructed in the previous steps is a complete tree that perfectly classifies the training data. However, this can lead to overfitting, meaning the tree may perform poorly on new, unseen data due to its complexity and memorization of noise in the training set.
+
+To address this, we can prune the tree by removing some of the leaf nodes or branches that contribute little to the overall classification accuracy. Pruning helps to generalize the tree and improve its performance on unseen data.
+
+There are various pruning techniques, such as:
+
+1. **Pre-pruning**: Stopping the tree growth based on a pre-defined criterion (e.g., maximum depth, minimum instances in a node, etc.).
+2. **Post-pruning**: Growing the tree to its full depth and then removing subtrees or branches based on a pruning criterion.
+
+>In this small example every leaf is already pure and no branch is empty, so there is nothing useful to prune: the pruned tree is identical to the tree above. On larger, noisier datasets, pruning typically removes branches that fit noise rather than signal.
+
+**Step 5: Pruned Decision Tree**
+```
+                Root Node (Outlook)
+               /         |         \
+          Sunny       Overcast      Rain
+       (Humidity)       Yes        (Wind)
+        /      \                   /     \
+     High     Normal            Weak    Strong
+      No       Yes               Yes      No
+```
+
+**Step 6: Visualizing the Decision Tree**
+
+Decision trees can be visualized graphically to provide a clear representation of the hierarchical structure and the decision rules. This visualization can aid in understanding the tree's logic and interpreting the results.
+
+There are various tools and libraries available for visualizing decision trees. One popular library in Python is `graphviz`, which can create tree-like diagrams and visualizations.
+
+Here's an example of how to visualize our decision tree using `graphviz` in Python:
+
+```python
+import graphviz
+from sklearn import tree
+
+# Create a decision tree classifier
+decision_tree_classifier = tree.DecisionTreeClassifier()
+
+# Train the classifier; X and y are assumed to hold the numerically
+# encoded features and labels of the PlayTennis dataset above
+decision_tree_classifier.fit(X, y)
+
+# Export the tree structure in DOT format
+tree_dot_data = tree.export_graphviz(decision_tree_classifier, out_file=None,
+                                     feature_names=['Outlook', 'Temperature', 'Humidity', 'Wind'],
+                                     class_names=['No', 'Yes'], filled=True, rounded=True, special_characters=True)
+
+# Create a graph from the DOT data
+graph = graphviz.Source(tree_dot_data)
+
+# Render and save the decision tree as an image file
+graph.render("decision_tree")
+```
+```
+           Outlook
+          /   |   \
+     Sunny Overcast Rain
+      /       |       \
+  Humidity   Yes      Wind
+   /   \             /    \
+ High  Normal      Weak  Strong
+  No    Yes         Yes    No
+```
+
+The final decision tree classifies instances based on the following rules:
+
+- If Outlook is Overcast, PlayTennis is Yes
+- If Outlook is Sunny and Humidity is High, PlayTennis is No
+- If Outlook is Sunny and Humidity is Normal, PlayTennis is Yes
+- If Outlook is Rain and Wind is Weak, PlayTennis is Yes
+- If Outlook is Rain and Wind is Strong, PlayTennis is No
+
+> Note that the calculated entropies and information gains may vary slightly depending on the specific implementation and rounding methods used.
diff --git a/contrib/machine-learning/eda.md b/contrib/machine-learning/eda.md
new file mode 100644
index 00000000..1559a099
--- /dev/null
+++ b/contrib/machine-learning/eda.md
@@ -0,0 +1,184 @@
+# Exploratory Data Analysis
+
+Exploratory Data Analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. EDA is used to understand the data, get a sense of the data, and to identify relationships between variables. EDA is a crucial step in the data analysis process and should be done before building a model.
+
+## Why is EDA important?
+
+1. **Understand the data**: EDA helps to understand the data, its structure, and its characteristics.
+
+2. **Identify patterns and relationships**: EDA helps to identify patterns and relationships between variables.
+
+3. **Detect outliers and anomalies**: EDA helps to detect outliers and anomalies in the data.
+
+4. **Prepare data for modeling**: EDA helps to prepare the data for modeling by identifying and handling missing values and transforming variables.
+
+## Steps in EDA
+
+1. **Data Collection**: Collect the data from various sources.
+
+2. **Data Cleaning**: Clean the data by handling missing values, removing duplicates, and transforming variables.
+
+3. **Data Exploration**: Explore the data by visualizing the data, summarizing the data, and identifying patterns and relationships.
+
+4. **Data Analysis**: Analyze the data by performing statistical analysis, hypothesis testing, and building models.
+
+5. **Data Visualization**: Visualize the data using various plots and charts to understand the data better.
+
+## Tools for EDA
+
+1. **Python**: Python is a popular programming language for data analysis and has many libraries for EDA, such as Pandas, NumPy, Matplotlib, Seaborn, and Plotly.
+
+2. **Jupyter Notebook**: Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
+
+## Techniques for EDA
+
+1. **Descriptive Statistics**: Descriptive statistics summarize the main characteristics of a data set, such as mean, median, mode, standard deviation, and variance.
+
+2. **Data Visualization**: Data visualization is the graphical representation of data to understand the data better, using plots such as histograms, scatter plots, box plots, and heat maps.
+
+3. **Correlation Analysis**: Correlation analysis is used to measure the strength and direction of the relationship between two variables.
+
+4. **Hypothesis Testing**: Hypothesis testing is used to test a hypothesis about a population parameter based on sample data.
+
+5. **Dimensionality Reduction**: Dimensionality reduction is the process of reducing the number of variables in a data set while retaining as much information as possible.
+
+6. **Clustering Analysis**: Clustering analysis is used to group similar data points together based on their characteristics.
+
+## Commonly Used Techniques in EDA
+
+1. **Uni-variate Analysis**: Uni-variate analysis is the simplest form of data analysis that involves analyzing a single variable at a time.
+
+2. **Bi-variate Analysis**: Bi-variate analysis involves analyzing two variables at a time to understand the relationship between them.
+
+3. **Multi-variate Analysis**: Multi-variate analysis involves analyzing more than two variables at a time to understand the relationship between them.
+
+## Understand with an Example
+
+Let's understand EDA with an example. Here we use the famous Iris dataset.
+
+The dataset consists of 150 samples of iris flowers, where each sample represents measurements of four features (variables) for three species of iris flowers.
+
+The four features measured are: sepal length (in cm), sepal width (in cm), petal length (in cm), and petal width (in cm).
+
+The three species of iris flowers included in the dataset are:
+**Setosa**, **Versicolor**, **Virginica**
+
+```python
+# Import libraries
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+from sklearn import datasets
+
+# Load the Iris dataset
+iris = datasets.load_iris()
+df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
+# Add the species name of each sample so it can be used for grouping below
+df['species'] = iris.target_names[iris.target]
+df.head()
+```
+
+| Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) |
+|-------------------|------------------|-------------------|------------------|
+| 5.1 | 3.5 | 1.4 | 0.2 |
+| 4.9 | 3.0 | 1.4 | 0.2 |
+| 4.7 | 3.2 | 1.3 | 0.2 |
+| 4.6 | 3.1 | 1.5 | 0.2 |
+| 5.0 | 3.6 | 1.4 | 0.2 |
+
+
+### Uni-variate Analysis
+
+```python
+# Uni-variate Analysis
+df_setosa = df.loc[df['species'] == 'setosa']
+df_virginica = df.loc[df['species'] == 'virginica']
+df_versicolor = df.loc[df['species'] == 'versicolor']
+
+plt.plot(df_setosa['sepal_length'])
+plt.plot(df_virginica['sepal_length'])
+plt.plot(df_versicolor['sepal_length'])
+plt.xlabel('sepal length')
+plt.show()
+```
+![Uni-variate Analysis](assets/eda/uni-variate-analysis1.png)
+
+```python
+plt.hist(df_setosa['petal_length'])
+plt.hist(df_virginica['petal_length'])
+plt.hist(df_versicolor['petal_length'])
+plt.xlabel('petal length')
+plt.show()
+```
+![Uni-variate Analysis](assets/eda/uni-variate-analysis2.png)
+
+### Bi-variate Analysis
+
+```python
+# Bi-variate Analysis
+sns.FacetGrid(df, hue="species", height=5).map(plt.scatter, "petal_length", "sepal_width").add_legend()
+plt.show()
+```
+![Bi-variate Analysis](assets/eda/bi-variate-analysis.png)
+
+### Multi-variate Analysis
+
+```python
+# Multi-variate Analysis
+sns.pairplot(df, hue="species", height=3)
+```
+![Multi-variate Analysis](assets/eda/multi-variate-analysis.png)
+
+### Correlation Analysis
+
+```python
+# Correlation Analysis (numeric columns only)
+corr_matrix = df.drop(columns='species').corr()
+sns.heatmap(corr_matrix)
+```
+| | sepal_length | sepal_width | petal_length | petal_width |
+|-------------|--------------|-------------|--------------|-------------|
+| sepal_length| 1.000000 | -0.109369 | 0.871754 | 0.817954 |
+| sepal_width | -0.109369 | 1.000000 | -0.420516 | -0.356544 |
+| petal_length| 0.871754 | -0.420516 | 1.000000 | 0.962757 |
+| petal_width | 0.817954 | -0.356544 | 0.962757 | 1.000000 |
+
+![Correlation Analysis](assets/eda/correlation-analysis.png)
+
+## Exploratory Data Analysis (EDA) Report on Iris Dataset
+
+### Introduction
+The Iris dataset consists of 150 samples of iris flowers, each characterized by four features: Sepal Length, Sepal Width, Petal Length, and Petal Width. These samples belong to three species of iris flowers: Setosa, Versicolor, and Virginica. In this EDA report, we explore the dataset to gain insights into the characteristics and relationships among the features and species.
+
+### Uni-variate Analysis
+Uni-variate analysis examines each variable individually.
+- Sepal Length: The distribution of Sepal Length varies among the different species, with Setosa generally having shorter sepals compared to Versicolor and Virginica.
+- Petal Length: Setosa tends to have shorter petal lengths, while Versicolor and Virginica have relatively longer petal lengths.
+
+### Bi-variate Analysis
+Bi-variate analysis explores the relationship between two variables.
+- Petal Length vs. Sepal Width: There is a noticeable separation between species, especially Setosa, which typically has shorter and wider sepals compared to Versicolor and Virginica.
+- This analysis suggests potential patterns distinguishing the species based on these two features. + +### Multi-variate Analysis +Multi-variate analysis considers interactions among multiple variables simultaneously. +- Pairplot: The pairplot reveals distinctive clusters for each species, particularly in the combinations of Petal Length and Petal Width, indicating clear separation among species based on these features. + +### Correlation Analysis +Correlation analysis examines the relationship between variables. +- Correlation Heatmap: There are strong positive correlations between Petal Length and Petal Width, as well as between Petal Length and Sepal Length. Sepal Width shows a weaker negative correlation with Petal Length and Petal Width. + +### Insights +1. Petal dimensions (length and width) exhibit strong correlations, suggesting that they may collectively contribute more significantly to distinguishing between iris species. +2. Setosa tends to have shorter and wider sepals compared to Versicolor and Virginica. +3. The combination of Petal Length and Petal Width appears to be a more effective discriminator among iris species, as indicated by the distinct clusters observed in multi-variate analysis. + +### Conclusion +Through comprehensive exploratory data analysis, we have gained valuable insights into the Iris dataset, highlighting key characteristics and relationships among features and species. Further analysis and modeling could leverage these insights to develop robust classification models for predicting iris species based on their measurements. + +## Conclusion + +Exploratory Data Analysis (EDA) is a critical step in the data analysis process that helps to understand the data, identify patterns and relationships, detect outliers, and prepare the data for modeling. By using various techniques and tools, such as descriptive statistics, data visualization, correlation analysis, and hypothesis testing, EDA provides valuable insights into the data, enabling data scientists to make informed decisions and build accurate models. + + + diff --git a/contrib/machine-learning/ensemble-learning.md b/contrib/machine-learning/ensemble-learning.md new file mode 100644 index 00000000..508f45e7 --- /dev/null +++ b/contrib/machine-learning/ensemble-learning.md @@ -0,0 +1,140 @@ +# Ensemble Learning + +Ensemble Learning is a powerful machine learning paradigm that combines multiple models to achieve better performance than any individual model. The idea is to leverage the strengths of different models to improve overall accuracy, robustness, and generalization. + + + +## Introduction + +Ensemble Learning is a technique that combines the predictions from multiple machine learning models to make more accurate and robust predictions than a single model. It leverages the diversity of different models to reduce errors and improve performance. + +## Types of Ensemble Learning + +### Bagging + +Bagging, or Bootstrap Aggregating, involves training multiple versions of the same model on different subsets of the training data and averaging their predictions. The most common example of bagging is the `RandomForest` algorithm. + +### Boosting + +Boosting focuses on training models sequentially, where each new model corrects the errors made by the previous ones. This way, the ensemble learns from its mistakes, leading to improved performance. `AdaBoost` and `Gradient Boosting` are popular examples of boosting algorithms. 
+ +### Stacking + +Stacking involves training multiple models (the base learners) and a meta-model that combines their predictions. The base learners are trained on the original dataset, while the meta-model is trained on the outputs of the base learners. This approach allows leveraging the strengths of different models. + +## Advantages and Disadvantages + +### Advantages + +- **Improved Accuracy**: Combines the strengths of multiple models. +- **Robustness**: Reduces the risk of overfitting and model bias. +- **Versatility**: Can be applied to various machine learning tasks, including classification and regression. + +### Disadvantages + +- **Complexity**: More complex than individual models, making interpretation harder. +- **Computational Cost**: Requires more computational resources and training time. +- **Implementation**: Can be challenging to implement and tune effectively. + +## Key Concepts + +- **Diversity**: The models in the ensemble should be diverse to benefit from their different strengths. +- **Voting/Averaging**: For classification, majority voting is used to combine predictions. For regression, averaging is used. +- **Weighting**: In some ensembles, models are weighted based on their accuracy or other metrics. + +## Code Examples + +### Bagging with Random Forest + +Below is an example of using Random Forest for classification on the Iris dataset. + +```python +import numpy as np +import pandas as pd +from sklearn.datasets import load_iris +from sklearn.ensemble import RandomForestClassifier +from sklearn.model_selection import train_test_split +from sklearn.metrics import accuracy_score, classification_report + +# Load dataset +iris = load_iris() +X, y = iris.data, iris.target + +# Split dataset +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) + +# Initialize Random Forest model +clf = RandomForestClassifier(n_estimators=100, random_state=42) + +# Train the model +clf.fit(X_train, y_train) + +# Make predictions +y_pred = clf.predict(X_test) + +# Evaluate the model +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +print("Classification Report:\n", classification_report(y_test, y_pred)) +``` + +### Boosting with AdaBoost +Below is an example of using AdaBoost for classification on the Iris dataset. + +``` +from sklearn.ensemble import AdaBoostClassifier +from sklearn.tree import DecisionTreeClassifier + +# Initialize base model +base_model = DecisionTreeClassifier(max_depth=1) + +# Initialize AdaBoost model +ada_clf = AdaBoostClassifier(base_estimator=base_model, n_estimators=50, random_state=42) + +# Train the model +ada_clf.fit(X_train, y_train) + +# Make predictions +y_pred = ada_clf.predict(X_test) + +# Evaluate the model +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +print("Classification Report:\n", classification_report(y_test, y_pred)) +``` + +### Stacking with Multiple Models +Below is an example of using stacking with multiple models for classification on the Iris dataset. 
+ +``` +from sklearn.linear_model import LogisticRegression +from sklearn.neighbors import KNeighborsClassifier +from sklearn.svm import SVC +from sklearn.ensemble import StackingClassifier + +# Define base models +base_models = [ + ('knn', KNeighborsClassifier(n_neighbors=5)), + ('svc', SVC(kernel='linear', probability=True)) +] + +# Define meta-model +meta_model = LogisticRegression() + +# Initialize Stacking model +stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5) + +# Train the model +stacking_clf.fit(X_train, y_train) + +# Make predictions +y_pred = stacking_clf.predict(X_test) + +# Evaluate the model +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +print("Classification Report:\n", classification_report(y_test, y_pred)) +``` + +## Conclusion +Ensemble Learning is a powerful technique that combines multiple models to improve overall performance. By leveraging the strengths of different models, it provides better accuracy, robustness, and generalization. However, it comes with increased complexity and computational cost. Understanding and implementing ensemble methods can significantly enhance machine learning solutions. diff --git a/contrib/machine-learning/grid-search.md b/contrib/machine-learning/grid-search.md new file mode 100644 index 00000000..ae44412f --- /dev/null +++ b/contrib/machine-learning/grid-search.md @@ -0,0 +1,71 @@ +# Grid Search + +Grid Search is a hyperparameter tuning technique in Machine Learning that helps to find the best combination of hyperparameters for a given model. It works by defining a grid of hyperparameters and then training the model with all the possible combinations of hyperparameters to find the best performing set. + +The Grid Search Method considers some hyperparameter combinations and selects the one returning a lower error score. This method is specifically useful when there are only some hyperparameters in order to optimize. However, it is outperformed by other weighted-random search methods when the Machine Learning model grows in complexity. + +## Implementation + +Before applying Grid Searching on any algorithm, data is divided into training and validation set, a validation set is used to validate the models. A model with all possible combinations of hyperparameters is tested on the validation set to choose the best combination. + +Grid Searching can be applied to any hyperparameters algorithm whose performance can be improved by tuning hyperparameter. For example, we can apply grid searching on K-Nearest Neighbors by validating its performance on a set of values of K in it. Same thing we can do with Logistic Regression by using a set of values of learning rate to find the best learning rate at which Logistic Regression achieves the best accuracy. + +Let us consider that the model accepts the below three parameters in the form of input: +1. Number of hidden layers `[2, 4]` +2. Number of neurons in every layer `[5, 10]` +3. Number of epochs `[10, 50]` + +If we want to try out two options for every parameter input (as specified in square brackets above), it estimates different combinations. For instance, one possible combination can be `[2, 5, 10]`. Finding such combinations manually would be a headache. + +Now, suppose that we had ten different parameters as input, and we would like to try out five possible values for each and every parameter. 
Checking every combination manually would require altering a parameter value, re-executing the code, and recording the output, for every one of the nearly ten million possible combinations.
+
+Grid Search automates this process: it accepts a set of possible values for every parameter, runs the code for each combination, and reports the combination with the best accuracy.
+
+In the code below, the hyperparameter being tuned is the regularization strength `C` of Logistic Regression. Higher values of `C` tell the model that the training data resembles real-world information and to place greater weight on it, while lower values of `C` do the opposite.
+
+## Explanation of the Code
+
+The code provided performs hyperparameter tuning for a Logistic Regression model using a manual grid search approach. It evaluates the model's performance for different values of the regularization strength hyperparameter `C` on the Iris dataset.
+1. `datasets` from `sklearn` is imported to load the Iris dataset.
+2. `LogisticRegression` from `sklearn.linear_model` is imported to create and fit the logistic regression model.
+3. The Iris dataset is loaded, with `X` containing the features and `y` containing the target labels.
+4. A `LogisticRegression` model is instantiated with `max_iter=10000` to ensure convergence during the fitting process, as the default maximum of 100 iterations might not be sufficient.
+5. A list of different values for the regularization strength `C` is defined. The hyperparameter `C` controls the regularization strength, with smaller values specifying stronger regularization.
+6. An empty list `scores` is initialized to store the model's performance scores for different values of `C`.
+7. A `for` loop iterates over each value in the `C` list:
+8. `logit.set_params(C=choice)` sets the `C` parameter of the logistic regression model to the current value in the loop.
+9. `logit.fit(X, y)` fits the logistic regression model to the entire Iris dataset (in a real scenario this is typically done on training data only, not the entire dataset).
+10. `logit.score(X, y)` calculates the accuracy of the fitted model on the dataset and appends this score to the `scores` list.
+11. After the loop, the `scores` list is printed, showing the accuracy for each value of `C`.
+
+### Python Code
+
+```python
+from sklearn import datasets
+from sklearn.linear_model import LogisticRegression
+
+iris = datasets.load_iris()
+X = iris['data']
+y = iris['target']
+
+logit = LogisticRegression(max_iter=10000)
+
+C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
+
+scores = []
+for choice in C:
+    logit.set_params(C=choice)
+    logit.fit(X, y)
+    scores.append(logit.score(X, y))
+print(scores)
+```
+
+#### Results
+
+```
+[0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
+```
+
+We can see that the lower values of `C` performed worse than the default value of `1`. However, as we increased `C` to `1.75`, the model's accuracy improved.
+
+Increasing `C` beyond this amount does not appear to increase model accuracy any further.
diff --git a/contrib/machine-learning/hierarchical-clustering.md b/contrib/machine-learning/hierarchical-clustering.md
new file mode 100644
index 00000000..93822703
--- /dev/null
+++ b/contrib/machine-learning/hierarchical-clustering.md
@@ -0,0 +1,99 @@
+# Hierarchical Clustering
+
+Hierarchical Clustering is a method of cluster analysis that seeks to build a hierarchy of clusters.
This README provides an overview of the hierarchical clustering algorithm, including its fundamental concepts, types, steps, and how to implement it using Python. + +## Introduction + +Hierarchical Clustering is an unsupervised learning method used to group similar objects into clusters. Unlike other clustering techniques, hierarchical clustering does not require the number of clusters to be specified beforehand. It produces a tree-like structure called a dendrogram, which displays the arrangement of the clusters and their sub-clusters. + +## Concepts + +### Dendrogram + +A dendrogram is a tree-like diagram that records the sequences of merges or splits. It is a useful tool for visualizing the process of hierarchical clustering. + +### Distance Measure + +Distance measures are used to quantify the similarity or dissimilarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity. + +### Linkage Criteria + +Linkage criteria determine how the distance between clusters is calculated. Different linkage criteria include single linkage, complete linkage, average linkage, and Ward's linkage. + +## Types of Hierarchical Clustering + +1. **Agglomerative Clustering (Bottom-Up Approach)**: + - Starts with each data point as a separate cluster. + - Repeatedly merges the closest pairs of clusters until only one cluster remains or a stopping criterion is met. + +2. **Divisive Clustering (Top-Down Approach)**: + - Starts with all data points in a single cluster. + - Repeatedly splits clusters into smaller clusters until each data point is its own cluster or a stopping criterion is met. + +## Steps in Hierarchical Clustering + +1. **Calculate Distance Matrix**: Compute the distance between each pair of data points. +2. **Create Clusters**: Treat each data point as a single cluster. +3. **Merge Closest Clusters**: Find the two clusters that are closest to each other and merge them into a single cluster. +4. **Update Distance Matrix**: Update the distance matrix to reflect the distance between the new cluster and the remaining clusters. +5. **Repeat**: Repeat steps 3 and 4 until all data points are merged into a single cluster or the desired number of clusters is achieved. + +## Linkage Criteria + +1. **Single Linkage (Minimum Linkage)**: The distance between two clusters is defined as the minimum distance between any single data point in the first cluster and any single data point in the second cluster. +2. **Complete Linkage (Maximum Linkage)**: The distance between two clusters is defined as the maximum distance between any single data point in the first cluster and any single data point in the second cluster. +3. **Average Linkage**: The distance between two clusters is defined as the average distance between all pairs of data points, one from each cluster. +4. **Ward's Linkage**: The distance between two clusters is defined as the increase in the sum of squared deviations from the mean when the two clusters are merged. + +## Implementation + +### Using Scikit-learn + +Scikit-learn is a popular machine learning library in Python that provides tools for hierarchical clustering. 
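+
+As a quick, self-contained sketch before the full example (using synthetic data from `make_blobs`, since the example below assumes a CSV file), a hierarchy can be built and cut into flat clusters with SciPy's hierarchy functions:
+
+```python
+from scipy.cluster.hierarchy import linkage, fcluster
+from sklearn.datasets import make_blobs
+
+# Generate 150 points around 3 synthetic centers
+X, _ = make_blobs(n_samples=150, centers=3, random_state=42)
+
+# Build the merge hierarchy with Ward's linkage
+Z = linkage(X, method='ward')
+
+# Cut the dendrogram into 3 flat clusters (labels run from 1 to 3)
+labels = fcluster(Z, t=3, criterion='maxclust')
+print(labels[:10])
+```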
+
+### Code Example
+
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+from scipy.cluster.hierarchy import dendrogram, linkage
+from sklearn.cluster import AgglomerativeClustering
+from sklearn.preprocessing import StandardScaler
+
+# Load dataset
+data = pd.read_csv('path/to/your/dataset.csv')
+
+# Preprocess the data
+scaler = StandardScaler()
+data_scaled = scaler.fit_transform(data)
+
+# Perform hierarchical clustering
+Z = linkage(data_scaled, method='ward')
+
+# Plot the dendrogram
+plt.figure(figsize=(10, 7))
+dendrogram(Z)
+plt.title('Dendrogram')
+plt.xlabel('Data Points')
+plt.ylabel('Distance')
+plt.show()
+
+# Perform Agglomerative Clustering
+# (the metric parameter was called affinity in scikit-learn < 1.2)
+agg_clustering = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')
+labels = agg_clustering.fit_predict(data_scaled)
+
+# Add cluster labels to the original data
+data['Cluster'] = labels
+print(data.head())
+```
+
+## Evaluation Metrics
+
+- **Silhouette Score**: Measures how similar a data point is to its own cluster compared to other clusters.
+- **Cophenetic Correlation Coefficient**: Measures how faithfully a dendrogram preserves the pairwise distances between the original data points.
+- **Dunn Index**: Ratio of the minimum inter-cluster distance to the maximum intra-cluster distance.
+
+## Conclusion
+
+Hierarchical clustering is a versatile and intuitive method for clustering data. It is particularly useful when the number of clusters is not known beforehand. By understanding the different linkage criteria and evaluation metrics, one can effectively apply hierarchical clustering to various types of data.
diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md
new file mode 100644
index 00000000..7ee61ebe
--- /dev/null
+++ b/contrib/machine-learning/index.md
@@ -0,0 +1,30 @@
+# List of sections
+
+- [Introduction to scikit-learn](sklearn-introduction.md)
+- [Binomial Distribution](binomial-distribution.md)
+- [Naive Bayes](naive-bayes.md)
+- [Regression in Machine Learning](regression.md)
+- [Polynomial Regression](polynomial-regression.md)
+- [Confusion Matrix](confusion-matrix.md)
+- [Decision Tree Learning](decision-tree.md)
+- [Random Forest](random-forest.md)
+- [Support Vector Machine Algorithm](support-vector-machine.md)
+- [Ensemble Learning](ensemble-learning.md)
+- [Types of optimizers](types-of-optimizers.md)
+- [Logistic Regression](logistic-regression.md)
+- [Types of Cost Functions](cost-functions.md)
+- [Clustering](clustering.md)
+- [Hierarchical Clustering](hierarchical-clustering.md)
+- [Grid Search](grid-search.md)
+- [K-Means](kmeans.md)
+- [K-nearest neighbor (KNN)](knn.md)
+- [XGBoost](xgboost.md)
+- [Artificial Neural Network from the Ground Up](ann.md)
+- [Introduction To Convolutional Neural Networks (CNNs)](intro-to-cnn.md)
+- [TensorFlow](tensorflow.md)
+- [PyTorch](pytorch.md)
+- [PyTorch Fundamentals](pytorch-fundamentals.md)
+- [Transformers](transformers.md)
+- [Reinforcement Learning](reinforcement-learning.md)
+- [Neural network regression](neural-network-regression.md)
+- [Exploratory Data Analysis](eda.md)
diff --git a/contrib/machine-learning/intro-to-cnn.md b/contrib/machine-learning/intro-to-cnn.md
new file mode 100644
index 00000000..0221ca10
--- /dev/null
+++ b/contrib/machine-learning/intro-to-cnn.md
@@ -0,0 +1,225 @@
+# Understanding Convolutional Neural Networks (CNN)
+
+## Introduction
+Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed primarily for
processing structured grid data like images. CNNs are particularly powerful for tasks involving image recognition, classification, and computer vision. They have revolutionized these fields, outperforming traditional neural networks by leveraging their unique architecture to capture spatial hierarchies in images.
+
+### Why CNNs are Superior to Traditional Neural Networks
+1. **Localized Receptive Fields**: CNNs use convolutional layers that apply filters to local regions of the input image. This localized connectivity ensures that the network learns spatial hierarchies and patterns, such as edges and textures, which are essential for image recognition tasks.
+2. **Parameter Sharing**: In CNNs, the same filter (set of weights) is used across different parts of the input, significantly reducing the number of parameters compared to fully connected layers in traditional neural networks. This not only lowers the computational cost but also mitigates the risk of overfitting.
+3. **Translation Invariance**: Due to the shared weights and pooling operations, CNNs are inherently invariant to translations of the input image. This means that they can recognize objects even when they appear in different locations within the image.
+4. **Hierarchical Feature Learning**: CNNs automatically learn a hierarchy of features, from low-level features like edges to high-level features like shapes and objects. Traditional neural networks, on the other hand, require manual feature extraction, which is less effective and more time-consuming.
+
+### Use Cases of CNNs
+- **Image Classification**: Identifying objects within an image (e.g., classifying a picture as containing a cat or a dog).
+- **Object Detection**: Detecting and locating objects within an image (e.g., finding faces in a photo).
+- **Image Segmentation**: Partitioning an image into segments or regions (e.g., dividing an image into different objects and background).
+- **Medical Imaging**: Analyzing medical scans like MRI, CT, and X-rays for diagnosis.
+
+> This guide will walk you through the fundamentals of CNNs and their implementation in Python. We'll build a simple CNN from scratch, explaining each component to help you understand how CNNs process images and extract features.
+
+### Let's start by understanding the basic architecture of CNNs.
+
+## CNN Architecture
+Convolution layers, pooling layers, and fully connected layers are just a few of the many building blocks that CNNs use to automatically and adaptively learn spatial hierarchies of information through backpropagation.
+
+### Convolutional Layer
+The convolutional layer is the core building block of a CNN. The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field but extend through the full depth of the input volume.
+
+#### Input Shape
+The dimensions of the input image, including the number of channels (e.g., 3 for RGB images and 1 for grayscale images).
+![image](assets/cnn-input_shape.png)
+
+- The input matrix is a binary image of handwritten digits,
+where '1' marks the pixels containing the digit (ink/grayscale area) and '0' marks the background pixels (empty space).
+- The first matrix shows the representation of 1 and 0, which can be depicted as a vertical line and a closed loop.
+- The second matrix represents 9, combining the loop and line.
+
+#### Strides
+The step size with which the filter moves across the input image.
+![image](assets/cnn-strides.png)
+
+- This visualization will help you understand how the filter (kernel) moves across the input matrix with stride values of (3,3) and (2,2).
+- A stride of 1 means the filter moves one step at a time, ensuring it covers the entire input matrix.
+- However, with larger strides (like 3 or 2 in this example), the filter may not cover all elements, potentially missing some information.
+- While this might seem like a drawback, higher strides are often used to reduce computational cost and decrease the output size, which can be beneficial in speeding up the training process and preventing overfitting.
+
+#### Padding
+Determines whether the output size is the same as the input size ('same') or reduced ('valid').
+![image](assets/cnn-padding.png)
+
+- `Same` padding is preferred in earlier layers to preserve spatial and edge information, as it can help the network learn more detailed features.
+- Choose `valid` padding when focusing on the central input region or requiring specific output dimensions.
+- The padding value can be determined by $\frac{f - 1}{2}$, where $f$ is the filter size.
+
+#### Filters
+Small matrices that slide over the input data to extract features.
+![image](assets/cnn-filters.png)
+
+- The first filter aims to detect closed loops within the input image, being highly relevant for recognizing digits with circular or oval shapes, such as '0', '6', '8', or '9'.
+- The next filter helps in detecting vertical lines, crucial for identifying digits like '1', '4', '7', and parts of other digits that contain vertical strokes.
+- The last filter shows how to detect diagonal lines in the input image, useful for identifying the slashes present in digits like '1', '7', or parts of '4' and '9'.
+
+#### Output
+A set of feature maps that represent the presence of different features in the input.
+![image](assets/cnn-ouputs.png)
+
+- With no padding and a stride of 1, the 3x3 filter moves one step at a time across the 7x5 input matrix. The filter can only move within the original boundaries of the input, resulting in a smaller 5x3 output matrix. This configuration is useful when you want to reduce the spatial dimensions of the feature map while preserving the exact spatial relationships between features.
+- By adding zero padding to the input matrix, it is expanded to 9x7, allowing the 3x3 filter to "fit" fully on the edges and corners. With a stride of 1, the filter still moves one step at a time, but now the output matrix is the same size (7x5) as the original input. Same padding is often preferred in early layers of a CNN to preserve spatial information and avoid rapid feature map shrinkage.
+- Without padding, the 3x3 filter operates within the original input matrix boundaries, but now it moves two steps at a time (stride 2). This significantly reduces the output matrix size to 3x2. Larger strides are employed to decrease computational cost and the output size, which can be beneficial in speeding up the training process and preventing overfitting. However, they might miss some finer details due to the larger jumps.
+- The output dimension of a convolutional layer is given by
+
+$$ n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 $$
+
+where $n_{in}$ is the number of input features, $p$ the padding, $k$ the kernel size, and $s$ the stride.
+
+- Also, the number of trainable parameters for each layer is given by $(n_c \cdot k \cdot k \cdot f) + f$, where $n_c$ is the number of input channels, $k \times k$ the kernel size, and $f$ the number of filters; the additional $f$ accounts for one bias per filter.
+
+### Pooling Layer
+Pooling layers reduce the dimensionality of each feature map while retaining the most critical information. The most common form of pooling is max pooling.
+- **Input Shape:** The dimensions of the feature map from the convolutional layer.
+- **Pooling Size:** The size of the pooling window (e.g., 2x2).
+- **Strides:** The step size for the pooling operation.
+- **Output:** A reduced feature map highlighting the most important features.
+ +
+
+- The high values (8) indicate that the "closed loop" filter found a strong match in those regions.
+- The first matrix, of size 6x4, represents a downsampled version of the input.
+- The second matrix, of size 3x2, results from more aggressive downsampling.
+
+### Flatten Layer
+The flatten layer converts the 2D matrix data to a 1D vector, which can be fed into a fully connected (dense) layer.
+- **Input Shape:** The 2D feature maps from the previous layer.
+- **Output:** A 1D vector that represents the same data in a flattened format.
+![image](assets/cnn-flattened.png)
+
+### Dropout Layer
+Dropout is a regularization technique to prevent overfitting in neural networks by randomly setting a fraction of input units to zero at each update during training time.
+- **Input Shape:** The data from the previous layer.
+- **Dropout Rate:** The fraction of units to drop (e.g., 0.5 for 50% dropout).
+- **Output:** The same shape as the input, with some units set to zero.
+![image](assets/cnn-dropout.png)
+
+- The updated 0 values represent the dropped units.
+
+## Implementation
+
+Below is the implementation of a simple CNN in Python. Each function within the `CNN` class corresponds to a layer in the network.
+
+```python
+import numpy as np
+
+class CNN:
+    def __init__(self):
+        pass
+
+    def convLayer(self, input_shape, channels, strides, padding, filter_size):
+        height, width = input_shape
+        input_shape_with_channels = (height, width, channels)
+        print("Input Shape (with channels):", input_shape_with_channels)
+
+        # Generate random input and filter matrices
+        input_matrix = np.random.randint(0, 10, size=input_shape_with_channels)
+        filter_matrix = np.random.randint(0, 5, size=(filter_size[0], filter_size[1], channels))
+
+        print("\nInput Matrix:\n", input_matrix[:, :, 0])
+        print("\nFilter Matrix:\n", filter_matrix[:, :, 0])
+
+        padding = padding.lower()
+
+        if padding == 'same':
+            # Calculate padding needed for each dimension
+            pad_height = filter_size[0] // 2
+            pad_width = filter_size[1] // 2
+
+            # Apply padding to the input matrix
+            input_matrix = np.pad(input_matrix, ((pad_height, pad_height), (pad_width, pad_width), (0, 0)), mode='constant')
+
+            # Adjust height and width to consider the padding
+            height += 2 * pad_height
+            width += 2 * pad_width
+
+        elif padding == 'valid':
+            pass
+
+        else:
+            raise ValueError("Invalid padding! Use 'same' or 'valid'.")
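+
+        # Note: after 'same' padding, height and width already include the
+        # pad, so the size computation below matches the formula given
+        # earlier: n_out = (n_in + 2p - k) // s + 1.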
+
+        # Output dimensions
+        conv_height = (height - filter_size[0]) // strides[0] + 1
+        conv_width = (width - filter_size[1]) // strides[1] + 1
+        output_matrix = np.zeros((conv_height, conv_width, channels))
+
+        # Convolution Operation
+        for i in range(0, height - filter_size[0] + 1, strides[0]):
+            for j in range(0, width - filter_size[1] + 1, strides[1]):
+                receptive_field = input_matrix[i:i + filter_size[0], j:j + filter_size[1], :]
+                output_matrix[i // strides[0], j // strides[1], :] = np.sum(receptive_field * filter_matrix, axis=(0, 1, 2))
+
+        return output_matrix
+
+    def maxPooling(self, input_matrix, pool_size=(2, 2), strides_pooling=(2, 2)):
+        input_height, input_width, input_channels = input_matrix.shape
+        pool_height, pool_width = pool_size
+        stride_height, stride_width = strides_pooling
+
+        # Calculate output dimensions
+        pooled_height = (input_height - pool_height) // stride_height + 1
+        pooled_width = (input_width - pool_width) // stride_width + 1
+
+        # Initialize output
+        pooled_matrix = np.zeros((pooled_height, pooled_width, input_channels))
+
+        # Perform max pooling
+        for c in range(input_channels):
+            for i in range(0, input_height - pool_height + 1, stride_height):
+                for j in range(0, input_width - pool_width + 1, stride_width):
+                    patch = input_matrix[i:i + pool_height, j:j + pool_width, c]
+                    pooled_matrix[i // stride_height, j // stride_width, c] = np.max(patch)
+
+        return pooled_matrix
+
+    def flatten(self, input_matrix):
+        return input_matrix.flatten()
+
+    def dropout(self, input_matrix, dropout_rate=0.5):
+        assert 0 <= dropout_rate < 1, "Dropout rate must be in [0, 1)."
+        dropout_mask = np.random.binomial(1, 1 - dropout_rate, size=input_matrix.shape)
+        return input_matrix * dropout_mask
+```
+
+Run the code below to generate outputs from random input and filter matrices of the given sizes.
+
+```python
+input_shape = (5, 5)
+channels = 1
+strides = (1, 1)
+padding = 'valid'
+filter_size = (3, 3)
+
+cnn_model = CNN()
+
+conv_output = cnn_model.convLayer(input_shape, channels, strides, padding, filter_size)
+print("\nConvolution Output:\n", conv_output[:, :, 0])
+
+pool_size = (2, 2)
+strides_pooling = (1, 1)
+
+maxPool_output = cnn_model.maxPooling(conv_output, pool_size, strides_pooling)
+print("\nMax Pooling Output:\n", maxPool_output[:, :, 0])
+
+flattened_output = cnn_model.flatten(maxPool_output)
+print("\nFlattened Output:\n", flattened_output)
+
+dropout_output = cnn_model.dropout(flattened_output, dropout_rate=0.3)
+print("\nDropout Output:\n", dropout_output)
+```
+
+Feel free to play around with the parameters!
diff --git a/contrib/machine-learning/kmeans.md b/contrib/machine-learning/kmeans.md
new file mode 100644
index 00000000..52db92e5
--- /dev/null
+++ b/contrib/machine-learning/kmeans.md
@@ -0,0 +1,92 @@
+# K-Means Clustering
+Unsupervised Learning Algorithm for Grouping Similar Data.
+
+## Introduction
+K-means clustering is a fundamental unsupervised machine learning algorithm that excels at grouping similar data points together. It's a popular choice due to its simplicity and efficiency in uncovering hidden patterns within unlabeled datasets.
+
+## Unsupervised Learning
+Unlike supervised learning algorithms that rely on labeled data for training, unsupervised algorithms, like K-means, operate solely on input data (without predefined categories). Their objective is to discover inherent structures or groupings within the data.
+
+## The K-Means Objective
+Organize similar data points into clusters to unveil underlying patterns.
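+Formally, the main objective is to minimize the total intra-cluster variance, i.e. the within-cluster sum of squares (a standard formulation):
+
+$$ J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$
+
+where $C_i$ is the $i$-th cluster and $\mu_i$ is its centroid.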
Minimizing this squared-error function is what drives every step of the algorithm.
+
+![image](assets/knm.png)
+
+## Clusters and Centroids
+A cluster represents a collection of data points that share similar characteristics. K-means identifies a pre-determined number (k) of clusters within the dataset. Each cluster is represented by a centroid, which acts as its central point (imaginary or real).
+
+## Minimizing In-Cluster Variation
+The K-means algorithm strategically assigns each data point to a cluster such that the total variation within each cluster (measured by the sum of squared distances between points and their centroid) is minimized. In simpler terms, K-means strives to create clusters where data points are close to their respective centroids.
+
+## The Meaning Behind "K-Means"
+The "means" in K-means refers to the averaging process used to compute the centroids, essentially finding the center of each cluster.
+
+## K-Means Algorithm in Action
+![image](assets/km_.png)
+The K-means algorithm follows an iterative approach to optimize cluster formation:
+
+1. **Initial Centroid Placement:** The process begins with randomly selecting k centroids to serve as initial reference points for each cluster.
+2. **Data Point Assignment:** Each data point is assigned to the closest centroid, effectively creating a preliminary clustering.
+3. **Centroid Repositioning:** Once data points are assigned, the centroids are recalculated by averaging the positions of the points within their respective clusters. These new centroids represent the refined centers of the clusters.
+4. **Iteration Until Convergence:** Steps 2 and 3 are repeated iteratively until a stopping criterion is met. This criterion can be either:
+   - **Centroid Stability:** No significant change occurs in the centroids' positions, indicating successful clustering.
+   - **Reaching Maximum Iterations:** A predefined number of iterations is completed.
+
+## Code
+Following is a simple implementation of K-Means.
+
+```python
+# Generate and Visualize Sample Data
+# Import the necessary libraries
+
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Create data points for cluster 1 and cluster 2
+X = -2 * np.random.rand(100, 2)
+X1 = 1 + 2 * np.random.rand(50, 2)
+
+# Combine data points from both clusters
+X[50:100, :] = X1
+
+# Plot data points and display the plot
+plt.scatter(X[:, 0], X[:, 1], s=50, c='b')
+plt.show()
+
+# K-Means Model Creation and Training
+from sklearn.cluster import KMeans
+
+# Create KMeans object with 2 clusters
+kmeans = KMeans(n_clusters=2)
+kmeans.fit(X)  # Train the model on the data
+
+# Visualize Data Points with Centroids
+centroids = kmeans.cluster_centers_  # Get centroids (cluster centers)
+
+plt.scatter(X[:, 0], X[:, 1], s=50, c='b')  # Plot data points again
+plt.scatter(centroids[0, 0], centroids[0, 1], s=200, c='g', marker='s')  # Plot centroid 1
+plt.scatter(centroids[1, 0], centroids[1, 1], s=200, c='r', marker='s')  # Plot centroid 2
+plt.show()  # Display the plot with centroids
+
+# Predict Cluster Label for New Data Point
+new_data = np.array([-3.0, -3.0])
+new_data_reshaped = new_data.reshape(1, -1)
+predicted_cluster = kmeans.predict(new_data_reshaped)
+print("Predicted cluster for new data:", predicted_cluster)
+```
+
+### Output:
+Before Implementing K-Means Clustering
+![Before Implementing K-Means Clustering](assets/km_2.png)
+
+After Implementing K-Means Clustering
+![After Implementing K-Means Clustering](assets/km_3.png)
+
+Predicted cluster for new data: `[0]`
+
+## Conclusion
+**K-Means** works best on numeric, continuous data with a relatively small number of dimensions, and it can be used to find groups that have not been explicitly labeled in the data. As an example, it can be used for document classification, delivery store optimization, or customer segmentation.
+
+## References
+
+- [Survey of Machine Learning and Data Mining Techniques used in Multimedia System](https://www.researchgate.net/publication/333457161_Survey_of_Machine_Learning_and_Data_Mining_Techniques_used_in_Multimedia_System?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)
+- [A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database](https://www.researchgate.net/publication/339267868_A_Clustering_Approach_for_Outliers_Detection_in_a_Big_Point-of-Sales_Database?_tp=eyJjb250ZXh0Ijp7ImZpcnN0UGFnZSI6Il9kaXJlY3QiLCJwYWdlIjoiX2RpcmVjdCJ9fQ)
diff --git a/contrib/machine-learning/knn.md b/contrib/machine-learning/knn.md
new file mode 100644
index 00000000..85578f3f
--- /dev/null
+++ b/contrib/machine-learning/knn.md
@@ -0,0 +1,122 @@
+# K-Nearest Neighbors (KNN) Machine Learning Algorithm in Python
+
+## Introduction
+K-Nearest Neighbors (KNN) is a simple, yet powerful, supervised machine learning algorithm used for both classification and regression tasks. It assumes that similar things exist in close proximity. In other words, similar data points are near to each other.
+
+## How KNN Works
+KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in classification) or averaging the labels (in regression).
+
+### Steps:
+1. **Choose the number K of neighbors**
+2. **Calculate the distance** between the query-instance and all the training samples
+3. **Sort the distances** and determine the nearest neighbors based on the K-th minimum distance
+4. 
**Gather the labels** of the nearest neighbors
+5. **Vote for the most frequent label** (in case of classification) or **average the labels** (in case of regression)
+
+## When to Use KNN
+### Advantages:
+- **Simple and easy to understand:** KNN is intuitive and easy to implement.
+- **No training phase:** KNN is a lazy learner, meaning there is no explicit training phase.
+- **Effective with a small dataset:** KNN performs well with a small number of input variables.
+
+### Disadvantages:
+- **Computationally expensive:** The algorithm becomes significantly slower as the number of examples and/or predictors/independent variables increases.
+- **Sensitive to irrelevant features:** All features contribute to the distance equally.
+- **Memory-intensive:** Storing all the training data can be costly.
+
+### Use Cases:
+- **Recommender Systems:** Suggest items based on similarity to user preferences.
+- **Image Recognition:** Classify images by comparing new images to the training set.
+- **Finance:** Predict credit risk or fraud detection based on historical data.
+
+## KNN in Python
+
+### Required Libraries
+To implement KNN, we need the following Python libraries:
+- `numpy`
+- `pandas`
+- `scikit-learn`
+- `matplotlib` (for visualization)
+
+### Installation
+```bash
+pip install numpy pandas scikit-learn matplotlib
+```
+
+### Example Code
+Let's implement a simple KNN classifier using the Iris dataset.
+
+#### Step 1: Import Libraries
+```python
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import train_test_split
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.metrics import accuracy_score
+import matplotlib.pyplot as plt
+```
+
+#### Step 2: Load Dataset
+```python
+from sklearn.datasets import load_iris
+iris = load_iris()
+X = iris.data
+y = iris.target
+```
+
+#### Step 3: Split Dataset
+```python
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+```
+
+#### Step 4: Train KNN Model
+```python
+knn = KNeighborsClassifier(n_neighbors=3)
+knn.fit(X_train, y_train)
+```
+
+#### Step 5: Make Predictions
+```python
+y_pred = knn.predict(X_test)
+```
+
+#### Step 6: Evaluate the Model
+```python
+accuracy = accuracy_score(y_test, y_pred)
+print(f'Accuracy: {accuracy}')
+```
+
+### Visualization (Optional)
+```python
+# Plotting the decision boundary for visualization (for 2D data)
+h = 0.02  # step size in the mesh
+# Create color maps
+cmap_light = plt.cm.RdYlBu
+cmap_bold = plt.cm.RdYlBu
+
+# For simplicity, we take only the first two features of the dataset
+# and fit a separate 2D model for plotting
+X_plot = X[:, :2]
+knn_2d = KNeighborsClassifier(n_neighbors=3)
+knn_2d.fit(X_plot, y)
+
+x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
+y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
+xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
+                     np.arange(y_min, y_max, h))
+
+Z = knn_2d.predict(np.c_[xx.ravel(), yy.ravel()])
+Z = Z.reshape(xx.shape)
+plt.figure()
+plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
+
+# Plot also the training points
+plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, edgecolor='k', cmap=cmap_bold)
+plt.xlim(xx.min(), xx.max())
+plt.ylim(yy.min(), yy.max())
+plt.title("3-Class classification (k = 3)")
+plt.show()
+```
+
+## Generalization and Considerations
+- **Choosing K:** The choice of K is critical. Smaller values of K can lead to noisy models, while larger values make the algorithm computationally expensive and might oversimplify the model.
+- **Feature Scaling:** Since KNN relies on distance calculations, features should be scaled (standardized or normalized) to ensure that all features contribute equally to the distance computation.
+- **Distance Metrics:** The choice of distance metric (Euclidean, Manhattan, etc.) can affect the performance of the algorithm.
+
+In conclusion, KNN is a versatile and easy-to-implement algorithm suitable for various classification and regression tasks, particularly when working with small datasets and well-defined features. However, careful consideration should be given to the choice of K, feature scaling, and distance metrics to optimize its performance.
diff --git a/contrib/machine-learning/logistic-regression.md b/contrib/machine-learning/logistic-regression.md
new file mode 100644
index 00000000..2e45e984
--- /dev/null
+++ b/contrib/machine-learning/logistic-regression.md
@@ -0,0 +1,115 @@
+# Logistic Regression
+
+Logistic Regression is a statistical method used for binary classification problems. It is a type of regression analysis where the dependent variable is categorical. This README provides an overview of logistic regression, including its fundamental concepts, assumptions, and how to implement it using Python.
+
+## Table of Contents
+
+1. [Introduction](#introduction)
+2. [Concepts](#concepts)
+3. [Assumptions](#assumptions)
+4. [Implementation](#implementation)
+   - [Using Scikit-learn](#using-scikit-learn)
+   - [Code Example](#code-example)
+5. [Evaluation Metrics](#evaluation-metrics)
+6. [Conclusion](#conclusion)
+
+## Introduction
+
+Logistic Regression is used to model the probability of a binary outcome based on one or more predictor variables (features). It is widely used in fields such as medical research, social sciences, and machine learning, for tasks like spam detection, fraud detection, and predicting user behavior.
+
+## Concepts
+
+### Sigmoid Function
+
+The logistic regression model uses the sigmoid function to map predicted values to probabilities. The sigmoid function is defined as:
+
+$$
+\sigma(z) = \frac{1}{1 + e^{-z}}
+$$
+
+where $z$ is a linear combination of the input features.
+
+### Odds and Log-Odds
+
+- **Odds**: The odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
+
+$$\text{Odds} = \frac{P(Y=1)}{P(Y=0)}$$
+
+- **Log-Odds**: The log-odds is the natural logarithm of the odds.
+
+  $$\text{Log-Odds} = \log \left( \frac{P(Y=1)}{P(Y=0)} \right)$$
+
+Logistic regression models the log-odds as a linear combination of the input features.
+
+### Model Equation
+
+The logistic regression model equation is:
+
+$$
+\log \left( \frac{P(Y=1)}{P(Y=0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n
+$$
+
+Where:
+- $\beta_0$ is the intercept.
+- $\beta_i$ are the coefficients for the predictor variables $X_i$.
+
+## Assumptions
+
+1. **Linearity**: The log-odds of the response variable are a linear combination of the predictor variables.
+2. **Independence**: Observations should be independent of each other.
+3. **No Multicollinearity**: Predictor variables should not be highly correlated with each other.
+4. **Large Sample Size**: Logistic regression requires a large sample size to provide reliable results.
+
+## Implementation
+
+### Using Scikit-learn
+
+Scikit-learn is a popular machine learning library in Python that provides tools for logistic regression.
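+
+As a quick illustration of the sigmoid mapping described above (a minimal, self-contained sketch with made-up coefficients, separate from the full example below):
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    # Maps any real-valued score to a probability in (0, 1)
+    return 1.0 / (1.0 + np.exp(-z))
+
+# Linear score beta0 + beta1 * x for illustrative coefficients
+beta0, beta1 = -1.0, 2.0
+x = np.array([-2.0, 0.0, 2.0])
+print(sigmoid(beta0 + beta1 * x))  # P(Y=1) for each x
+```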
+
+### Code Example
+
+```python
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import train_test_split
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
+
+# Load dataset
+data = pd.read_csv('path/to/your/dataset.csv')
+
+# Define features and target variable
+X = data[['feature1', 'feature2', 'feature3']]
+y = data['target']
+
+# Split data into training and testing sets
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+
+# Initialize and train logistic regression model
+model = LogisticRegression()
+model.fit(X_train, y_train)
+
+# Make predictions
+y_pred = model.predict(X_test)
+
+# Evaluate the model
+accuracy = accuracy_score(y_test, y_pred)
+conf_matrix = confusion_matrix(y_test, y_pred)
+class_report = classification_report(y_test, y_pred)
+
+print("Accuracy:", accuracy)
+print("Confusion Matrix:\n", conf_matrix)
+print("Classification Report:\n", class_report)
+```
+
+## Evaluation Metrics
+
+- **Accuracy**: The proportion of correctly classified instances among all instances.
+- **Confusion Matrix**: A table showing the number of true positives, true negatives, false positives, and false negatives.
+- **Precision, Recall, and F1-Score**: Metrics to evaluate the performance of the classification model.
+
+## Conclusion
+
+Logistic regression is a fundamental classification technique that is easy to implement and interpret. It is a powerful tool for binary classification problems and provides a probabilistic framework for predicting binary outcomes.
diff --git a/contrib/machine-learning/naive-bayes.md b/contrib/machine-learning/naive-bayes.md
new file mode 100644
index 00000000..4bf0f04c
--- /dev/null
+++ b/contrib/machine-learning/naive-bayes.md
@@ -0,0 +1,328 @@
+# Naive Bayes
+
+## Introduction
+
+The Naive Bayes model uses probabilities to predict an outcome. It is a supervised machine learning technique, i.e. it requires labelled data for training. It is used for classification and is based on Bayes' Theorem. The basic assumption of this model is independence among the features, i.e. a feature is unaffected by any other feature.
+
+## Bayes' Theorem
+
+Bayes' theorem is given by:
+
+$$
+P(a|b) = \frac{P(b|a) \cdot P(a)}{P(b)}
+$$
+
+where:
+- $P(a|b)$ is the posterior probability, i.e. the probability of 'a' given that 'b' is true,
+- $P(b|a)$ is the likelihood, i.e. the probability of 'b' given that 'a' is true,
+- $P(a)$ and $P(b)$ are the probabilities of 'a' and 'b' respectively, independent of each other.
+
+## Applications
+
+The Naive Bayes classifier has numerous applications, including:
+ 1. Text classification.
+ 2. Sentiment analysis.
+ 3. Spam filtering.
+ 4. Multiclass classification (e.g. weather prediction).
+ 5. Recommendation systems.
+ 6. Healthcare sector.
+ 7. Document categorization.
+
+## Advantages
+
+ 1. Easy to implement.
+ 2. Useful even if the training dataset is limited (where a decision tree would not be recommended).
+ 3. Supports multiclass classification natively, which some algorithms like SVM and logistic regression handle only through extensions such as one-vs-rest.
+ 4. Scalable, fast and efficient.
+
+## Disadvantages
+
+ 1. Assumes the features to be independent, which may not be true in certain scenarios.
+ 2. Zero probability error (explained below).
+ 3. Sensitive to noise.
+
+## Zero Probability Error
+
+ A zero probability error occurs when the number of occurrences of an event given another event is zero, i.e. some feature value never appears together with some class in the training data.
+ To handle the zero probability error, Laplace's correction is used: a small constant is added to every count.
+
+**Example:**
+
+Given the data below, find whether tennis can be played if (outlook = Overcast, wind = Weak).
+
+**Data**
+
+---
+| SNo | Outlook (A)  | Wind (B)   | PlayTennis (R)    |
+|-----|--------------|------------|-------------------|
+| 1   | Rain         | Weak       | No                |
+| 2   | Rain         | Strong     | No                |
+| 3   | Overcast     | Weak       | Yes               |
+| 4   | Rain         | Weak       | Yes               |
+| 5   | Overcast     | Weak       | Yes               |
+| 6   | Rain         | Strong     | No                |
+| 7   | Overcast     | Strong     | Yes               |
+| 8   | Rain         | Weak       | No                |
+| 9   | Overcast     | Weak       | Yes               |
+| 10  | Rain         | Weak       | Yes               |
+---
+
+- **Calculate prior probabilities**
+
+$$
+ P(Yes) = \frac{6}{10} = 0.6
+$$
+$$
+ P(No) = \frac{4}{10} = 0.4
+$$
+
+- **Calculate likelihoods**
+
+ 1. **Outlook (A):**
+
+ ---
+ | A\R       | Yes   | No  |
+ |-----------|-------|-----|
+ | Rain      | 2     | 4   |
+ | Overcast  | 4     | 0   |
+ | Total     | 6     | 4   |
+ ---
+
+- Rain:
+
+$$P(Rain|Yes) = \frac{2}{6}$$
+
+$$P(Rain|No) = \frac{4}{4}$$
+
+- Overcast:
+
+$$
+ P(Overcast|Yes) = \frac{4}{6}
+$$
+$$
+ P(Overcast|No) = \frac{0}{4}
+$$
+
+Here, we can see that $P(Overcast|No) = 0$.
+This is a zero probability error!
+
+Since this likelihood is 0, the whole product of probabilities becomes 0 and the Naive Bayes model fails to predict.
+
+ **Applying Laplace's correction:**
+
+ In Laplace's correction, we scale the counts up to 1000 instances.
+ - **Calculate prior probabilities**
+
+ $$P(Yes) = \frac{600}{1002}$$
+
+ $$P(No) = \frac{402}{1002}$$
+
+- **Calculate likelihoods**
+
+ 1. **Outlook (A):**
+
+ (Converted to 1000 instances)
+
+ We add 1 instance each to the (PlayTennis|No) column (Laplace's correction):
+
+ ---
+ | A\R       | Yes   | No            |
+ |-----------|-------|---------------|
+ | Rain      | 200   | (400+1)=401   |
+ | Overcast  | 400   | (0+1)=1       |
+ | Total     | 600   | 402           |
+ ---
+
+ - **Rain:**
+
+ $$P(Rain|Yes) = \frac{200}{600}$$
+ $$P(Rain|No) = \frac{401}{402}$$
+
+ - **Overcast:**
+
+ $$P(Overcast|Yes) = \frac{400}{600}$$
+ $$P(Overcast|No) = \frac{1}{402}$$
+
+ 2. **Wind (B):**
+
+ ---
+ | B\R       | Yes     | No    |
+ |-----------|---------|-------|
+ | Weak      | 500     | 200   |
+ | Strong    | 100     | 200   |
+ | Total     | 600     | 400   |
+ ---
+
+ - **Weak:**
+
+ $$P(Weak|Yes) = \frac{500}{600}$$
+ $$P(Weak|No) = \frac{200}{400}$$
+
+ - **Strong:**
+
+ $$P(Strong|Yes) = \frac{100}{600}$$
+ $$P(Strong|No) = \frac{200}{400}$$
+
+ - **Calculating the (unnormalized) posterior probabilities:**
+
+ $$P(Yes|Overcast, Weak) \propto P(Yes) \cdot P(Overcast|Yes) \cdot P(Weak|Yes)$$
+ $$= \frac{600}{1002} \cdot \frac{400}{600} \cdot \frac{500}{600}$$
+ $$= 0.3326$$
+
+ $$P(No|Overcast, Weak) \propto P(No) \cdot P(Overcast|No) \cdot P(Weak|No)$$
+ $$= \frac{402}{1002} \cdot \frac{1}{402} \cdot \frac{200}{400}$$
+ $$= 0.000499 = 0.0005$$
+
+Since
+$$P(Yes|Overcast, Weak) > P(No|Overcast, Weak)$$
+we can conclude that tennis can be played if the outlook is overcast and the wind is weak.
+
+## Types of Naive Bayes Classifier
+
+### Gaussian Naive Bayes
+
+ It is used when the dataset has **continuous data**. It assumes that the data is normally distributed (also known as a Gaussian distribution). A Gaussian distribution can be characterized by a bell-shaped curve.
+
+ **Continuous data features:** Features which can take any real value within a certain range. These features have an infinite number of possible values. They are generally measured, not counted.
+ + **Code** + + ```python + +#import libraries +import pandas as pd +from sklearn.model_selection import train_test_split +from sklearn.naive_bayes import GaussianNB +from sklearn import metrics +from sklearn.metrics import confusion_matrix + +#read data +d=pd.read_csv("data.csv") +df=pd.DataFrame(d) + +X = df.iloc[:,1:7:1] +y = df.iloc[:,7:8:1] + +# splitting X and y into training and testing sets +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42) + + +# training the model on training set +obj = GaussianNB() +obj.fit(X_train, y_train) + +#making predictions on the testing set +y_pred = obj.predict(X_train) + +#comparing y_test and y_pred +print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred)) +print("Confusion matrix: \n",confusion_matrix(y_train,y_pred)) + + ``` + + +## Multinomial Naive Bayes + + Appropriate when the features are categorical or countable. It models the likelihood of each feature as a multinomial distribution. + Multinomial distribution is used to find probabilities of each category, given multiple categories (eg. Text classification). + + **Code** + + ```python + +#import libraries +import pandas as pd +from sklearn.model_selection import train_test_split +from sklearn.naive_bayes import MultinomialNB +from sklearn import metrics +from sklearn.metrics import confusion_matrix + +#read data +d=pd.read_csv("data.csv") +df=pd.DataFrame(d) + +X = df.iloc[:,1:7:1] +y = df.iloc[:,7:8:1] + +# splitting X and y into training and testing sets +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42) + + +# training the model on training set +obj = MultinomialNB() +obj.fit(X_train, y_train) + +#making predictions on the testing set +y_pred = obj.predict(X_train) + +#comparing y_test and y_pred +print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred)) +print("Confusion matrix: \n",confusion_matrix(y_train,y_pred)) + + + ``` + +## Bernoulli Naive Bayes + + It is specifically designed for binary features (eg. Yes or No). It models the likelihood of each feature as a Bernoulli distribution. + Bernoulli distribution is used when there are only two possible outcomes (eg. success or failure of an event). + + **Code** + + ```python + +#import libraries +import pandas as pd +from sklearn.model_selection import train_test_split +from sklearn.naive_bayes import BernoulliNB +from sklearn import metrics +from sklearn.metrics import confusion_matrix + +#read data +d=pd.read_csv("data.csv") +df=pd.DataFrame(d) + +X = df.iloc[:,1:7:1] +y = df.iloc[:,7:8:1] + +# splitting X and y into training and testing sets +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42) + + +# training the model on training set +obj = BernoulliNB() +obj.fit(X_train, y_train) + +#making predictions on the testing set +y_pred = obj.predict(X_train) + +#comparing y_test and y_pred +print("Gaussian Naive Bayes model accuracy:", metrics.accuracy_score(y_train, y_pred)) +print("Confusion matrix: \n",confusion_matrix(y_train,y_pred)) + + ``` + + +## Evaluation + + 1. Confusion matrix. + 2. Accuracy. + 3. ROC curve. + + +## Conclusion + + We can conclude that naive bayes may limit in some cases due to the assumption that the features are independent of each other but still reliable in many cases. Naive Bayes is an efficient classifier and works even on small datasets. 
+ diff --git a/contrib/machine-learning/neural-network-regression.md b/contrib/machine-learning/neural-network-regression.md new file mode 100644 index 00000000..aa16bc6c --- /dev/null +++ b/contrib/machine-learning/neural-network-regression.md @@ -0,0 +1,84 @@ +# Neural Network Regression in Python using Scikit-learn + +## Overview + +Neural Network Regression is used to predict continuous values based on input features. Scikit-learn provides an easy-to-use interface for implementing neural network models, specifically through the `MLPRegressor` class, which stands for Multi-Layer Perceptron Regressor. + +## When to Use Neural Network Regression + +### Suitable Scenarios + +1. **Complex Relationships**: Ideal when the relationship between features and the target variable is complex and non-linear. +2. **Sufficient Data**: Works well with large datasets that can support training deep learning models. +3. **Feature Extraction**: Useful in cases where the neural network's feature extraction capabilities can be leveraged, such as with image or text data. + +### Unsuitable Scenarios + +1. **Small Datasets**: Less effective with small datasets due to overfitting and inability to learn complex patterns. +2. **Low-latency Predictions**: Might not be suitable for real-time applications with strict latency requirements. +3. **Interpretability**: Not ideal when model interpretability is crucial, as neural networks are often seen as "black-box" models. + +## Implementing Neural Network Regression in Python with Scikit-learn + +### Step-by-Step Implementation + +1. **Import Libraries** + +```python +import numpy as np +import pandas as pd +from sklearn.model_selection import train_test_split +from sklearn.preprocessing import StandardScaler +from sklearn.neural_network import MLPRegressor +from sklearn.metrics import mean_absolute_error +``` + +2. **Load and Prepare Data** + +For illustration, let's use a synthetic dataset. + +```python +# Generate synthetic data +np.random.seed(42) +X = np.random.rand(1000, 3) +y = X[:, 0] * 3 + X[:, 1] * -2 + X[:, 2] * 0.5 + np.random.randn(1000) * 0.1 + +# Split the data +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) + +# Standardize the data +scaler = StandardScaler() +X_train = scaler.fit_transform(X_train) +X_test = scaler.transform(X_test) +``` + +3. **Build and Train the Neural Network Model** + +```python +# Create the MLPRegressor model +mlp = MLPRegressor(hidden_layer_sizes=(64, 64), activation='relu', solver='adam', max_iter=500, random_state=42) + +# Train the model +mlp.fit(X_train, y_train) +``` + +4. **Evaluate the Model** + +```python +# Make predictions +y_pred = mlp.predict(X_test) + +# Calculate the Mean Absolute Error +mae = mean_absolute_error(y_test, y_pred) +print(f"Test Mean Absolute Error: {mae}") +``` + +### Explanation + +- **Data Generation and Preparation**: Synthetic data is created, split into training and test sets, and standardized to improve the efficiency of the neural network training process. +- **Model Construction and Training**: An `MLPRegressor` is created with two hidden layers, each containing 64 neurons and ReLU activation functions. The model is trained using the Adam optimizer for a maximum of 500 iterations. +- **Evaluation**: The model's performance is evaluated on the test set using Mean Absolute Error (MAE) as the performance metric. 
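+
+To apply the trained model to new observations, the new samples must go through the same scaler that was fitted on the training data (a short sketch continuing the example above; the feature values are made up):
+
+```python
+# New samples are scaled with the already-fitted scaler, then predicted
+X_new = np.array([[0.2, 0.7, 0.1],
+                  [0.9, 0.3, 0.5]])
+print(mlp.predict(scaler.transform(X_new)))
+```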
+
+## Conclusion
+
+Neural Network Regression with Scikit-learn's `MLPRegressor` is a powerful method for predicting continuous values in complex, non-linear scenarios. However, it's essential to ensure that you have enough data to train the model effectively and consider the computational resources required. Simpler models may be more appropriate for small datasets or when model interpretability is necessary. By following the steps outlined, you can build, train, and evaluate a neural network for regression tasks in Python using Scikit-learn.
diff --git a/contrib/machine-learning/polynomial-regression.md b/contrib/machine-learning/polynomial-regression.md
new file mode 100644
index 00000000..d00ede3b
--- /dev/null
+++ b/contrib/machine-learning/polynomial-regression.md
@@ -0,0 +1,102 @@
+# Polynomial Regression
+
+Polynomial Regression is a form of regression analysis in which the relationship between the independent variable $x$ and the dependent variable $y$ is modeled as an $n$-th degree polynomial. This guide provides an overview of polynomial regression, including its fundamental concepts, assumptions, and how to implement it using Python.
+
+## Introduction
+
+Polynomial Regression is used when the data shows a non-linear relationship between the independent variable $x$ and the dependent variable $y$. It extends the simple linear regression model by considering polynomial terms of the independent variable, allowing for a more flexible fit to the data.
+
+## Concepts
+
+### Polynomial Equation
+
+The polynomial regression model is based on the following polynomial equation:
+
+$$
+y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \epsilon
+$$
+
+Where:
+- $y$ is the dependent variable.
+- $x$ is the independent variable.
+- $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients of the polynomial.
+- $\epsilon$ is the error term.
+
+### Degree of Polynomial
+
+The degree of the polynomial ($n$) determines the flexibility of the model. A higher degree allows the model to fit more complex, non-linear relationships, but it also increases the risk of overfitting.
+
+### Overfitting and Underfitting
+
+- **Overfitting**: When the model fits the noise in the training data too closely, resulting in poor generalization to new data.
+- **Underfitting**: When the model is too simple to capture the underlying pattern in the data.
+
+## Assumptions
+
+1. **Independence**: Observations are independent of each other.
+2. **Homoscedasticity**: The variance of the residuals (errors) is constant across all levels of the independent variable.
+3. **Normality**: The residuals of the model are normally distributed.
+4. **No Multicollinearity**: The predictor variables are not highly correlated with each other.
+
+## Implementation
+
+### Using Scikit-learn
+
+Scikit-learn is a popular machine learning library in Python that provides tools for polynomial regression.
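+
+Before the full example, it may help to see what `PolynomialFeatures` actually produces (a minimal sketch):
+
+```python
+import numpy as np
+from sklearn.preprocessing import PolynomialFeatures
+
+X = np.array([[2.0], [3.0]])
+poly = PolynomialFeatures(degree=3)
+# Each value x is expanded into [1, x, x^2, x^3]
+print(poly.fit_transform(X))
+# [[ 1.  2.  4.  8.]
+#  [ 1.  3.  9. 27.]]
+```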
+
+### Code Example
+
+```python
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+from sklearn.preprocessing import PolynomialFeatures
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import mean_squared_error, r2_score
+
+# Load dataset
+data = pd.read_csv('path/to/your/dataset.csv')
+
+# Define features and target variable
+X = data[['feature']]
+y = data['target']
+
+# Transform features to polynomial features
+poly = PolynomialFeatures(degree=3)
+X_poly = poly.fit_transform(X)
+
+# Initialize and train polynomial regression model
+model = LinearRegression()
+model.fit(X_poly, y)
+
+# Make predictions
+y_pred = model.predict(X_poly)
+
+# Evaluate the model
+mse = mean_squared_error(y, y_pred)
+r2 = r2_score(y, y_pred)
+print("Mean Squared Error:", mse)
+print("R^2 Score:", r2)
+
+# Visualize the results (sort by feature value so the curve plots cleanly)
+order = np.argsort(X['feature'].values)
+plt.scatter(X, y, color='blue')
+plt.plot(X['feature'].values[order], y_pred[order], color='red')
+plt.xlabel('Feature')
+plt.ylabel('Target')
+plt.title('Polynomial Regression')
+plt.show()
+```
+
+## Evaluation Metrics
+
+- **Mean Squared Error (MSE)**: The average of the squared differences between actual and predicted values.
+- **R-squared (R²) Score**: A statistical measure that represents the proportion of the variance for the dependent variable that is explained by the independent variables in the model.
+
+## Conclusion
+
+Polynomial Regression is a powerful tool for modeling non-linear relationships between variables. It is important to choose the degree of the polynomial carefully to balance between underfitting and overfitting. Understanding and properly evaluating the model using appropriate metrics ensures its effectiveness.
+
+## References
+
+- [Scikit-learn Documentation](https://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression)
+- [Wikipedia: Polynomial Regression](https://en.wikipedia.org/wiki/Polynomial_regression)
diff --git a/contrib/machine-learning/pytorch-fundamentals.md b/contrib/machine-learning/pytorch-fundamentals.md
new file mode 100644
index 00000000..b244ec1f
--- /dev/null
+++ b/contrib/machine-learning/pytorch-fundamentals.md
@@ -0,0 +1,469 @@
+# PyTorch Fundamentals
+
+```python
+# Import PyTorch into our codespace
+import torch
+print(torch.__version__)
+```
+
+#### Output
+```
+2.3.0+cu121
+```
+
+`2.3.0` is the PyTorch version, and `cu121` means it was built against CUDA 12.1.
+
+You have already seen how to create a tensor in PyTorch. This section shows the operations that can be applied to a tensor, starting with a quick revision.
+
+## 1. Creating tensors
+
+Scalar tensor (a zero-dimension tensor):
+
+```python
+scalar = torch.tensor(7)
+print(scalar)
+```
+
+#### Output
+```
+tensor(7)
+```
+
+Check the dimension of the above tensor:
+
+```python
+print(scalar.ndim)
+```
+
+#### Output
+```
+0
+```
+
+To retrieve the number from the tensor we use `item()`:
+
+```python
+print(scalar.item())
+```
+
+#### Output
+```
+7
+```
+
+Vector (a single-dimension tensor that can contain many numbers):
+
+```python
+vector = torch.tensor([1,2])
+print(vector)
+```
+
+#### Output
+```
+tensor([1, 2])
+```
+
+Check the dimensions:
+
+```python
+print(vector.ndim)
+```
+
+#### Output
+```
+1
+```
+
+Check the shape of the vector:
+
+```python
+print(vector.shape)
+```
+
+#### Output
+```
+torch.Size([2])
+```
+
+The above returns `torch.Size([2])`, which means our vector has a shape of `[2]`. This is because of the two elements we placed inside the square brackets (`[1,2]`).
+
+Note:
+I'll let you in on a trick.
+
+You can tell the number of dimensions a tensor in PyTorch has by the number of square brackets on the outside (`[`), and you only need to count one side.
+
+```python
+# Let's create a matrix
+MATRIX = torch.tensor([[1,2],
+                       [4,5]])
+print(MATRIX)
+```
+
+#### Output
+```
+tensor([[1, 2],
+        [4, 5]])
+```
+
+There are two square brackets, so it must be 2 dimensions. Let's check:
+
+```python
+print(MATRIX.ndim)
+```
+
+#### Output
+```
+2
+```
+
+```python
+# Shape
+print(MATRIX.shape)
+```
+
+#### Output
+```
+torch.Size([2, 2])
+```
+
+It means `MATRIX` has 2 rows and 2 columns.
+
+Let's create a TENSOR:
+
+```python
+TENSOR = torch.tensor([[[1,2,3],
+                        [4,5,6],
+                        [7,8,9]]])
+print(TENSOR)
+```
+
+#### Output
+```
+tensor([[[1, 2, 3],
+         [4, 5, 6],
+         [7, 8, 9]]])
+```
+
+Let's check the dimensions:
+```python
+print(TENSOR.ndim)
+```
+
+#### Output
+```
+3
+```
+
+Shape:
+```python
+print(TENSOR.shape)
+```
+
+#### Output
+```
+torch.Size([1, 3, 3])
+```
+
+The dimensions go outer to inner.
+
+That means there's 1 dimension of 3 by 3.
+
+##### Let's summarise
+
+* scalar -> a single number, having 0 dimensions.
+* vector -> many numbers, having 1 dimension.
+* matrix -> an array of numbers, having 2 dimensions.
+* tensor -> an array of numbers, having n dimensions.
+
+### Random Tensors
+
+We can create them using `torch.rand()` and passing in the `size` parameter.
+
+Creating a random tensor of size (3,4):
+```python
+rand_tensor = torch.rand(size = (3,4))
+print(rand_tensor)
+```
+
+#### Output
+```
+tensor([[0.7462, 0.4950, 0.7851, 0.8277],
+        [0.6112, 0.5159, 0.1728, 0.6847],
+        [0.4472, 0.1612, 0.6481, 0.3236]])
+```
+
+Check the dimensions:
+
+```python
+print(rand_tensor.ndim)
+```
+
+#### Output
```
+2
+```
+
+Shape:
+```python
+print(rand_tensor.shape)
+```
+
+#### Output
+```
+torch.Size([3, 4])
+```
+
+Datatype:
+```python
+print(rand_tensor.dtype)
+```
+
+#### Output
+```
+torch.float32
+```
+
+### Zeros and ones
+
+Here we will create tensors of any shape filled with zeros and ones.
+
+```python
+# Create a tensor of all zeros
+zeros = torch.zeros(size = (3,4))
+print(zeros)
+```
+
+#### Output
+```
+tensor([[0., 0., 0., 0.],
+        [0., 0., 0., 0.],
+        [0., 0., 0., 0.]])
+```
+
+Create a tensor of ones:
+```python
+ones = torch.ones(size = (3,4))
+print(ones)
+```
+
+#### Output
+```
+tensor([[1., 1., 1., 1.],
+        [1., 1., 1., 1.],
+        [1., 1., 1., 1.]])
+```
+
+### Create a tensor having a range of numbers
+
+You can use `torch.arange(start, end, step)` to do so.
+
+Where:
+
+* start = start of range (e.g. 0)
+* end = end of range (e.g. 10)
+* step = how many steps in between each value (e.g. 1)
+
+> Note: In Python, you can use `range()` to create a range. However, in PyTorch, `torch.range()` is deprecated and may show an error, so use `torch.arange()`.
+
+```python
+zero_to_ten = torch.arange(start = 0,
+                           end = 10,
+                           step = 1)
+print(zero_to_ten)
+```
+
+#### Output
+```
+tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
+```
+
+## 2. Manipulating tensors (tensor operations)
+
+The operations are:
+
+* Addition
+* Subtraction
+* Multiplication (element-wise)
+* Division
+* Matrix multiplication
+
+### 1. Addition
+
+```python
+tensor = torch.tensor([1,2,3])
+print(tensor+10)
+```
+
+#### Output
+```
+tensor([11, 12, 13])
+```
+
+We have added 10 to each tensor element.
+
+```python
+tensor1 = torch.tensor([4,5,6])
+print(tensor+tensor1)
+```
+
+#### Output
+```
+tensor([5, 7, 9])
+```
+
+We have added two tensors; remember that addition takes place element-wise.
+
+### 2. Subtraction
Subtraction
+
+
+```python
+print(tensor-8)
+```
+
+#### Output
+```
+tensor([-7, -6, -5])
+```
+
+We've subtracted 8 from the above tensor.
+
+
+```python
+print(tensor-tensor1)
+```
+
+#### Output
+```
+tensor([-3, -3, -3])
+```
+
+### 3. Multiplication
+
+
+```python
+# Multiply the tensor with 10 (element wise)
+print(tensor*10)
+```
+
+#### Output
+```
+tensor([10, 20, 30])
+```
+
+Each element of the tensor gets multiplied by 10.
+
+Note:
+
+PyTorch also has a bunch of built-in functions like `torch.mul()` (short for multiplication) and `torch.add()` to perform basic operations.
+
+
+```python
+# let's see them
+print(torch.add(tensor,10))
+```
+
+#### Output
+```
+tensor([11, 12, 13])
+```
+
+
+```python
+print(torch.mul(tensor,10))
+```
+
+#### Output
+```
+tensor([10, 20, 30])
+```
+
+### Matrix multiplication (is all you need)
+One of the most common operations in machine learning and deep learning algorithms (like neural networks) is matrix multiplication.
+
+PyTorch implements matrix multiplication functionality in the `torch.matmul()` method.
+
+The two main rules for matrix multiplication to remember are:
+
+The inner dimensions must match:
+* (3, 2) @ (3, 2) won't work
+* (2, 3) @ (3, 2) will work
+* (3, 2) @ (2, 3) will work
+
+The resulting matrix has the shape of the outer dimensions:
+* (2, 3) @ (3, 2) -> (2, 2)
+* (3, 2) @ (2, 3) -> (3, 3)
+
+
+Note: "@" in Python is the symbol for matrix multiplication.
+
+
+```python
+# let's perform the matrix multiplication
+tensor1 = torch.tensor([[[1,2,3],
+                         [4,5,6],
+                         [7,8,9]]])
+tensor2 = torch.tensor([[[1,1,1],
+                         [2,2,2],
+                         [3,3,3]]])
+
+print(tensor1)
+print(tensor2)
+```
+
+#### Output
+```
+tensor([[[1, 2, 3],
+         [4, 5, 6],
+         [7, 8, 9]]])
+tensor([[[1, 1, 1],
+         [2, 2, 2],
+         [3, 3, 3]]])
+```
+
+Let's check the shapes
+```python
+print(tensor1.shape)
+print(tensor2.shape)
+```
+
+#### Output
+```
+torch.Size([1, 3, 3])
+torch.Size([1, 3, 3])
+```
+
+Matrix multiplication
+```python
+print(torch.matmul(tensor1, tensor2))
+```
+
+#### Output
+```
+tensor([[[14, 14, 14],
+         [32, 32, 32],
+         [50, 50, 50]]])
+```
+
+You can also use the `@` operator for matrix multiplication, though `torch.matmul()` is generally preferred for readability.
+```python
+print(tensor1 @ tensor2)
+```
+
+#### Output
+```
+tensor([[[14, 14, 14],
+         [32, 32, 32],
+         [50, 50, 50]]])
+```
+
+Note:
+
+If the inner dimensions don't match, you can transpose one of the tensors (e.g. with `torch.transpose()`) so that the shapes align, and then perform the matrix multiplication.
diff --git a/contrib/machine-learning/pytorch.md b/contrib/machine-learning/pytorch.md
new file mode 100644
index 00000000..ccfee216
--- /dev/null
+++ b/contrib/machine-learning/pytorch.md
@@ -0,0 +1,113 @@
+# PyTorch: A Comprehensive Overview
+
+## Introduction
+PyTorch is an open-source deep learning framework developed by Facebook's AI Research lab. It provides a flexible and efficient platform for building and deploying machine learning models. PyTorch is known for its dynamic computational graph, ease of use, and strong support for GPU acceleration.
+
+## Key Features
+- **Dynamic Computational Graphs**: PyTorch's dynamic computation graph (or define-by-run) allows you to change the network architecture during runtime. This feature makes debugging and experimenting with different model architectures easier.
+- **GPU Acceleration**: PyTorch supports CUDA, enabling efficient computation on GPUs.
+- **Extensive Libraries and Tools**: PyTorch has a rich ecosystem of libraries and tools such as torchvision for computer vision, torchtext for natural language processing, and more.
+- **Community Support**: PyTorch has a large and active community, providing extensive resources, tutorials, and forums for support. + +## Installation +To install PyTorch, you can use pip: + +```sh +pip install torch torchvision +``` + +For detailed installation instructions, including GPU support, visit the [official PyTorch installation guide](https://pytorch.org/get-started/locally/). + +## Basic Usage + +### Tensors +Tensors are the fundamental building blocks in PyTorch. They are similar to NumPy arrays but can run on GPUs. + +```python +import torch + +# Creating a tensor +x = torch.tensor([1.0, 2.0, 3.0]) +print(x) + +# Performing basic operations +y = torch.tensor([4.0, 5.0, 6.0]) +z = x + y +print(z) +``` + +### Autograd +Autograd is PyTorch's automatic differentiation engine that powers neural network training. It tracks operations on tensors to automatically compute gradients. + +```python +# Requires gradient +x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) + +# Perform operations +y = x ** 2 +z = y.sum() + +# Compute gradients +z.backward() +print(x.grad) +``` + +### Building Neural Networks +PyTorch provides the `torch.nn` module to build neural networks. + +```python +import torch +import torch.nn as nn +import torch.optim as optim + +# Define a simple neural network +class SimpleNN(nn.Module): + def __init__(self): + super(SimpleNN, self).__init__() + self.fc1 = nn.Linear(3, 1) + + def forward(self, x): + x = self.fc1(x) + return x + +# Create the network, define the criterion and optimizer +model = SimpleNN() +criterion = nn.MSELoss() +optimizer = optim.SGD(model.parameters(), lr=0.01) + +# Dummy input and target +inputs = torch.tensor([[1.0, 2.0, 3.0]]) +targets = torch.tensor([[0.5]]) + +# Forward pass +outputs = model(inputs) +loss = criterion(outputs, targets) + +# Backward pass and optimization +loss.backward() +optimizer.step() + +print(f'Loss: {loss.item()}') +``` + +## When to Use PyTorch +### Use PyTorch When: +1. **Research and Development**: PyTorch's dynamic computation graph makes it ideal for experimentation and prototyping. +2. **Computer Vision and NLP**: With extensive libraries like torchvision and torchtext, PyTorch is well-suited for these domains. +3. **Custom Operations**: If your work involves custom layers or operations, PyTorch provides the flexibility to implement and integrate them easily. +4. **Community and Ecosystem**: If you prefer a strong community support and extensive third-party resources, PyTorch is a good choice. + +### Consider Alternatives When: +1. **Production Deployment**: While PyTorch has made strides in deployment (e.g., TorchServe), TensorFlow's TensorFlow Serving is more mature for large-scale deployment. +2. **Static Graphs**: If your model architecture doesn't change frequently and you prefer static computation graphs, TensorFlow might be more suitable. +3. **Multi-Language Support**: If you need integration with languages other than Python (e.g., Java, JavaScript), TensorFlow offers better support. + +## Conclusion +PyTorch is a powerful and flexible deep learning framework that caters to both researchers and practitioners. Its ease of use, dynamic computation graph, and strong community support make it an excellent choice for many machine learning tasks. However, for certain production scenarios or specific requirements, alternatives like TensorFlow may be more appropriate. 
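+
+As a final practical note, the GPU acceleration mentioned under Key Features deserves a quick illustration. A minimal sketch, assuming a CUDA-capable GPU may or may not be present:
+
+```python
+import torch
+
+# Pick the GPU when available, otherwise fall back to the CPU
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+x = torch.randn(3, 3, device=device)  # create the tensor directly on the device
+y = (x @ x).cpu()                     # compute on the device, copy back to host
+print(device, y.shape)
+```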
+ +## Additional Resources +- [PyTorch Official Documentation](https://pytorch.org/docs/stable/index.html) +- [PyTorch Tutorials](https://pytorch.org/tutorials/) +- [PyTorch Forum](https://discuss.pytorch.org/) + +Feel free to explore and experiment with PyTorch to harness the full potential of this versatile framework! diff --git a/contrib/machine-learning/random-forest.md b/contrib/machine-learning/random-forest.md new file mode 100644 index 00000000..feaaa7a7 --- /dev/null +++ b/contrib/machine-learning/random-forest.md @@ -0,0 +1,171 @@ +# Random Forest + +Random Forest is a versatile machine learning algorithm capable of performing both regression and classification tasks. It is an ensemble method that operates by constructing a multitude of decision trees during training and outputting the average prediction of the individual trees (for regression) or the mode of the classes (for classification). + +## Introduction +Random Forest is an ensemble learning method used for classification and regression tasks. It is built from multiple decision trees and combines their outputs to improve the model's accuracy and control over-fitting. + +## How Random Forest Works +### 1. Bootstrap Sampling: +* Random subsets of the training dataset are created with replacement. Each subset is used to train an individual tree. +### 2. Decision Trees: +* Multiple decision trees are trained on these subsets. +### 3. Feature Selection: +* At each split in the decision tree, a random selection of features is chosen. This randomness helps create diverse trees. +### 4. Voting/Averaging: +For classification, the mode of the classes predicted by individual trees is taken (majority vote). +For regression, the average of the outputs of the individual trees is taken. +### Detailed Working Mechanism +#### Step 1: Bootstrap Sampling: + Each tree is trained on a random sample of the original data, drawn with replacement (bootstrap sample). This means some data points may appear multiple times in a sample while others may not appear at all. +#### Step 2: Tree Construction: + Each node in the tree is split using the best split among a random subset of the features. This process adds an additional layer of randomness, contributing to the robustness of the model. +#### Step 3: Aggregation: + For classification tasks, the final prediction is based on the majority vote from all the trees. For regression tasks, the final prediction is the average of all the tree predictions. +### Advantages and Disadvantages +#### Advantages +* Robustness: Reduces overfitting and generalizes well due to the law of large numbers. +* Accuracy: Often provides high accuracy because of the ensemble method. +* Versatility: Can be used for both classification and regression tasks. +* Handles Missing Values: Can handle missing data better than many other algorithms. +* Feature Importance: Provides estimates of feature importance, which can be valuable for understanding the model. +#### Disadvantages +* Complexity: More complex than individual decision trees, making interpretation difficult. +* Computational Cost: Requires more computational resources due to multiple trees. +* Training Time: Can be slow to train compared to simpler models, especially with large datasets. +### Hyperparameters +#### Key Hyperparameters +* n_estimators: The number of trees in the forest. +* max_features: The number of features to consider when looking for the best split. +* max_depth: The maximum depth of the tree. 
+
+* min_samples_split: The minimum number of samples required to split an internal node.
+* min_samples_leaf: The minimum number of samples required to be at a leaf node.
+* bootstrap: Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
+##### Tuning Hyperparameters
+Hyperparameter tuning can significantly improve the performance of a Random Forest model. Common techniques include Grid Search and Random Search.
+
+### Code Examples
+#### Classification Example
+Below is a simple example of using Random Forest for a classification task with the Iris dataset.
+
+```python
+import numpy as np
+import pandas as pd
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import accuracy_score, classification_report
+
+
+# Load dataset
+iris = load_iris()
+X, y = iris.data, iris.target
+
+# Split dataset
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+
+# Initialize Random Forest model
+clf = RandomForestClassifier(n_estimators=100, random_state=42)
+
+# Train the model
+clf.fit(X_train, y_train)
+
+# Make predictions
+y_pred = clf.predict(X_test)
+
+# Evaluate the model
+accuracy = accuracy_score(y_test, y_pred)
+print(f"Accuracy: {accuracy * 100:.2f}%")
+print("Classification Report:\n", classification_report(y_test, y_pred))
+```
+
+#### Feature Importance
+Random Forest provides a way to measure the importance of each feature in making predictions.
+
+
+```python
+import matplotlib.pyplot as plt
+
+# Get feature importances
+importances = clf.feature_importances_
+indices = np.argsort(importances)[::-1]
+
+# Print feature ranking
+print("Feature ranking:")
+for f in range(X.shape[1]):
+    print(f"{f + 1}. Feature {indices[f]} ({importances[indices[f]]})")
+
+# Plot the feature importances
+plt.figure()
+plt.title("Feature importances")
+plt.bar(range(X.shape[1]), importances[indices], align='center')
+plt.xticks(range(X.shape[1]), indices)
+plt.xlim([-1, X.shape[1]])
+plt.show()
+```
+#### Hyperparameter Tuning
+Using Grid Search for hyperparameter tuning.
+
+```python
+from sklearn.model_selection import GridSearchCV
+
+# Define the parameter grid
+param_grid = {
+    'n_estimators': [100, 200, 300],
+    'max_features': ['sqrt', 'log2', None],  # 'auto' was removed in recent scikit-learn releases
+    'max_depth': [4, 6, 8, 10, 12],
+    'criterion': ['gini', 'entropy']
+}
+
+# Initialize the Grid Search model
+grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)
+
+# Fit the model
+grid_search.fit(X_train, y_train)
+
+# Print the best parameters
+print("Best parameters found: ", grid_search.best_params_)
+```
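+
+Random Search, the other technique mentioned above, samples a fixed number of parameter combinations instead of exhaustively trying every one. A minimal sketch using scikit-learn's `RandomizedSearchCV`, reusing `clf`, `param_grid`, and the training split from the examples above:
+
+```python
+from sklearn.model_selection import RandomizedSearchCV
+
+# Sample 10 random combinations from the grid instead of trying all of them
+random_search = RandomizedSearchCV(estimator=clf, param_distributions=param_grid,
+                                   n_iter=10, cv=3, n_jobs=-1, random_state=42)
+random_search.fit(X_train, y_train)
+print("Best parameters found: ", random_search.best_params_)
+```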
+#### Regression Example
+Below is a simple example of using Random Forest for a regression task with the California housing dataset.
+
+```python
+import numpy as np
+import pandas as pd
+from sklearn.datasets import fetch_california_housing
+from sklearn.ensemble import RandomForestRegressor
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import mean_squared_error, r2_score
+
+# Load dataset (load_boston was removed from scikit-learn in version 1.2,
+# so we use the California housing dataset instead)
+housing = fetch_california_housing()
+X, y = housing.data, housing.target
+
+# Split dataset
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+
+# Initialize Random Forest model
+regr = RandomForestRegressor(n_estimators=100, random_state=42)
+
+# Train the model
+regr.fit(X_train, y_train)
+
+# Make predictions
+y_pred = regr.predict(X_test)
+
+# Evaluate the model
+mse = mean_squared_error(y_test, y_pred)
+r2 = r2_score(y_test, y_pred)
+print(f"Mean Squared Error: {mse:.2f}")
+print(f"R^2 Score: {r2:.2f}")
+```
+## Conclusion
+Random Forest is a powerful and flexible machine learning algorithm that can handle both classification and regression tasks. Its ability to create an ensemble of decision trees leads to robust and accurate models. However, it is important to be mindful of the computational cost associated with training multiple trees.
+
+## References
+- Scikit-learn Random Forest Documentation
+- Wikipedia: Random Forest
+- Machine Learning Mastery: Introduction to Random Forest
+- Kaggle: Random Forest Guide
+- Towards Data Science: Understanding Random Forests
diff --git a/contrib/machine-learning/regression.md b/contrib/machine-learning/regression.md
new file mode 100644
index 00000000..6ff6d285
--- /dev/null
+++ b/contrib/machine-learning/regression.md
@@ -0,0 +1,171 @@
+# Regression
+
+
+* Regression is a supervised machine learning technique used to predict continuous values.
+
+
+> Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.
+
+* Regression is a statistical method used to model the relationship between a dependent variable (often denoted as 'y') and one or more independent variables (often denoted as 'x'). The goal of regression analysis is to understand how the dependent variable changes as the independent variables change.
+
+# Types Of Regression
+
+1. Linear Regression
+2. Polynomial Regression
+3. Stepwise Regression
+4. Decision Tree Regression
+5. Random Forest Regression
+6. Ridge Regression
+7. Lasso Regression
+8. ElasticNet Regression
+9. Bayesian Linear Regression
+10. Support Vector Regression
+
+We'll start with Linear Regression.
+# Linear Regression
+
+* Linear regression is a fundamental statistical method used to model the relationship between a dependent variable (often denoted as 𝑌) and one or more independent variables (often denoted as 𝑋). The relationship is assumed to be linear, meaning that changes in the independent variables are associated with changes in the dependent variable in a straight-line fashion.
+
+The basic form of linear regression for a single independent variable is:
+
+**𝑌 = 𝛽0 + 𝛽1𝑋 + 𝜖**
+
+Where:
+
+* Y is the dependent variable.
+* X is the independent variable.
+* 𝛽0 is the intercept, representing the value of Y when X is zero.
+* 𝛽1 is the slope coefficient, representing the change in Y for a one-unit change in X.
+* ϵ is the error term, representing the variability in Y that is not explained by the linear relationship with X.
+
+# Basic Code of Linear Regression
+
+* This line imports the numpy library, which is widely used for numerical operations in Python. 
We use np as an alias for numpy, making it easier to reference functions and objects from the library. +``` +import numpy as np +``` + +* This line imports the LinearRegression class from the linear_model module of the scikit-learn library.scikit-learn is a powerful library for machine learning tasks in Python, and LinearRegression is a class provided by it for linear regression. +``` +from sklearn.linear_model import LinearRegression +``` +* This line creates a NumPy array X containing the independent variable values. In this example, we have a simple one-dimensional array representing the independent variable. The reshape(-1, 1) method reshapes the array into a column vector, necessary for use with scikit-learn + +``` +X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1) +``` +* This line creates a NumPy array Y containing the corresponding dependent variable values. These are the observed values of the dependent variable corresponding to the independent variable values in X. +``` +Y = np.array([2, 4, 5, 8, 5]) +``` + +* This line creates an instance of the LinearRegression class, which represents the linear regression model. We'll use this object to train the model and make predictions. +``` +model = LinearRegression() +``` + +* This line fits the linear regression model to the data. The fit() method takes two arguments: the independent variable (X) and the dependent variable (Y). This method estimates the coefficients of the linear regression equation that best fit the given data. +``` +model.fit(X, Y) +``` +* These lines print out the intercept (beta_0) and coefficient (beta_1) of the linear regression model. model.intercept_ gives the intercept value, and model.coef_ gives an array of coefficients, where model.coef_[0] corresponds to the coefficient of the first independent variable (in this case, there's only one). +``` +print("Intercept:", model.intercept_) +print("Coefficient:", model.coef_[0]) +``` + +* These lines demonstrate how to use the trained model to make predictions for new data. +* We create a new NumPy array new_data containing the values of the independent variable for which we want to predict the dependent variable values. +* We then use the predict() method of the model to obtain the predictions for these new data points. Finally, we print out the predicted values. +``` +new_data = np.array([[6], [7]]) +predictions = model.predict(new_data) +print("Predictions:", predictions) +``` +# Assumptions of Linear Regression + +# Linearity: + +* To assess the linearity assumption, we can visually inspect a scatter plot of the observed values versus the predicted values. +* If the relationship between them appears linear, it suggests that the linearity assumption is reasonable. +``` +import matplotlib.pyplot as plt +predictions = model.predict(X) +plt.scatter(predictions,Y) +plt.xlabel("Predicted Values") +plt.ylabel("Observed Values") +plt.title("Linearity Check: Observed vs Predicted") +plt.show() +``` +# Homoscedasticity: +* Homoscedasticity refers to the constant variance of the residuals across all levels of the independent variable(s). We can visually inspect a plot of residuals versus predicted values to check for homoscedasticity. 
+
+```
+residuals = Y - predictions
+plt.scatter(predictions, residuals)
+plt.xlabel("Predicted Values")
+plt.ylabel("Residuals")
+plt.title("Homoscedasticity Check: Residuals vs Predicted Values")
+plt.axhline(y=0, color='red', linestyle='--')  # Add horizontal line at y=0
+plt.show()
+
+```
+# Normality of Residuals:
+* To assess the normality of residuals, we can visually inspect a histogram or a Q-Q plot of the residuals.
+```
+import seaborn as sns
+
+sns.histplot(residuals, kde=True)
+plt.xlabel("Residuals")
+plt.ylabel("Frequency")
+plt.title("Normality of Residuals: Histogram")
+plt.show()
+
+import scipy.stats as stats
+
+stats.probplot(residuals, dist="norm", plot=plt)
+plt.title("Normal Q-Q Plot")
+plt.show()
+
+```
+# Metrics for Regression
+
+
+# Mean Absolute Error (MAE)
+
+* MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between predicted and actual values.
+```
+from sklearn.metrics import mean_absolute_error
+
+mae = mean_absolute_error(Y, predictions)
+print(f"Mean Absolute Error (MAE): {mae}")
+
+```
+# Mean Squared Error (MSE)
+
+* MSE measures the average of the squares of the errors. It gives more weight to larger errors, making it sensitive to outliers.
+```
+from sklearn.metrics import mean_squared_error
+
+mse = mean_squared_error(Y, predictions)
+print(f"Mean Squared Error (MSE): {mse}")
+```
+# Root Mean Squared Error (RMSE)
+* RMSE is the square root of the MSE. It provides an error metric that is in the same units as the dependent variable, making it more interpretable.
+```
+rmse = np.sqrt(mse)
+print(f"Root Mean Squared Error (RMSE): {rmse}")
+
+```
+# R-squared (Coefficient of Determination)
+* R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, where 1 indicates a perfect fit.
+```
+from sklearn.metrics import r2_score
+
+r2 = r2_score(Y, predictions)
+print(f"R-squared (R^2): {r2}")
+```
+
+> The sample dataset in this tutorial is for learning purposes only.
+
+
+diff --git a/contrib/machine-learning/reinforcement-learning.md b/contrib/machine-learning/reinforcement-learning.md
new file mode 100644
index 00000000..c5529fc9
--- /dev/null
+++ b/contrib/machine-learning/reinforcement-learning.md
@@ -0,0 +1,233 @@
+# Reinforcement Learning: A Comprehensive Guide
+
+Reinforcement Learning (RL) is a field of Machine Learning which focuses on goal-directed learning from interaction with the environment. In RL, an agent learns to make decisions by performing actions in an environment to maximize a cumulative numerical reward signal. This README aims to provide a thorough understanding of RL, covering key concepts, algorithms, applications, and resources.
+
+## What is Reinforcement Learning?
+
+Reinforcement learning involves determining the best actions to take in various situations to maximize a numerical reward signal. Instead of being instructed on which actions to take, the learner must explore and identify the actions that lead to the highest rewards through trial and error. After each action performed in its environment, a trainer may give feedback in the form of rewards or penalties to indicate the desirability of the resulting state. Unlike supervised learning, reinforcement learning does not depend on labeled data but instead learns from the outcomes of its actions. 
+
+## Key Concepts and Terminology
+
+### Agent
+An agent is a system or entity that learns to make decisions by interacting with an environment. The agent improves its performance by trial and error, receiving feedback from the environment in the form of rewards or punishments.
+
+### Environment
+The environment is the setting or world in which the agent operates and interacts. It provides the agent with states and feedback based on the agent's actions.
+
+### State
+A state represents the current situation of the environment, encapsulating all the relevant information needed for decision-making.
+
+### Action
+An action represents a move that can be taken by the agent, which affects the state of the environment. The set of all possible actions is called the action space.
+
+### Reward
+A reward is the feedback from the environment in response to the agent’s action, thereby defining which actions are good and which are bad. The agent aims to maximize the total reward over time.
+
+### Policy
+A policy is a strategy used by the agent to determine its actions based on the current state. In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process.
+
+### Value Function
+The value function of a state is the expected total amount of reward an agent can expect to accumulate over the future, starting from that state. There are two main types of value functions:
+  - **State Value Function (V)**: The expected reward starting from a state and following a certain policy thereafter.
+  - **Action Value Function (Q)**: The expected reward starting from a state, taking a specific action, and following a certain policy thereafter.
+
+### Model
+A model mimics the behavior of the environment or, more generally, allows inferences to be made about how the environment will behave.
+
+### Exploration vs. Exploitation
+To accumulate substantial rewards, a reinforcement learning agent needs to favor actions that have previously yielded high rewards. However, to identify these effective actions, the agent must also attempt actions it hasn't tried before. This means the agent must *exploit* its past experiences to gain rewards, while also *exploring* new actions to improve its future decision-making.
+
+## Types of Reinforcement Learning
+
+### Model-Based vs Model-Free
+
+**Model-Based Reinforcement Learning:** Model-based methods involve creating a model of the environment to predict future states and rewards, allowing the agent to plan its actions by simulating various scenarios. These methods often involve two main components: a transition model that predicts the next state and a reward model that predicts the reward for a given state-action pair.
+
+**Model-Free Reinforcement Learning:** Model-free methods do not explicitly learn a model of the environment. Instead, they learn a policy or value function directly from the interactions with the environment. These methods can be further divided into two categories: value-based and policy-based methods.
+
+### Value-Based Methods:
+Value-based methods focus on estimating the value function, and the policy is indirectly derived from the value function.
+
+### Policy-Based Methods:
+Policy-based methods directly optimize the policy by maximizing the expected cumulative reward to find the optimal parameters.
+
+### Actor-Critic Methods:
+Actor-Critic methods combine the strengths of both value-based and policy-based methods. The actor learns the policy that maps states to actions, and the critic learns the value function that evaluates the actions chosen by the actor. 
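+
+The exploration vs. exploitation trade-off described above is commonly handled with an epsilon-greedy rule: explore with a small probability, otherwise exploit the best-known action. A minimal sketch (the Q-values and epsilon value here are illustrative assumptions):
+
+```python
+import numpy as np
+
+def epsilon_greedy(q_values, epsilon=0.1):
+    """Pick a random action with probability epsilon, else the greedy one."""
+    if np.random.random() < epsilon:
+        return np.random.randint(len(q_values))  # explore: random action
+    return int(np.argmax(q_values))              # exploit: best-known action
+
+action = epsilon_greedy(np.array([0.0, 0.5, 0.2]))
+```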
+ +## Important Algorithms + +### Q-Learning +Q-Learning is a model-free algorithm used in reinforcement learning to learn the value of an action in a particular state. It aims to find the optimal policy by iteratively updating the Q-values, which represent the expected cumulative reward of taking a particular action in a given state and following the optimal policy thereafter. + +#### Algorithm: +1. Initialize Q-values arbitrarily for all state-action pairs. +2. Repeat for each episode: + - Choose an action using an exploration strategy (e.g., epsilon-greedy). + - Take the action, observe the reward and the next state. + - Update the Q-value of the current state-action pair using the Bellman equation: + $$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$ + + where: + - $Q(s, a)$ is the Q-value of state $s$ and action $a$. + - $r$ is the observed reward. + - $s'$ is the next state. + - $\alpha$ is the learning rate. + - $\gamma$ is the discount factor. +3. Until convergence or a maximum number of episodes. + +### SARSA +SARSA (State-Action-Reward-State-Action) is an on-policy temporal difference algorithm used for learning the Q-function. Unlike Q-learning, SARSA directly updates the Q-values based on the current policy. + +#### Algorithm: +1. Initialize Q-values arbitrarily for all state-action pairs. +2. Repeat for each episode: + - Initialize the environment state $s$. + - Choose an action $a$ using the current policy (e.g., epsilon-greedy). + - Repeat for each timestep: + - Take action $a$, observe the reward $r$ and the next state $s'$. + - Choose the next action $a'$ using the current policy. + - Update the Q-value of the current state-action pair using the SARSA update rule: + $$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma Q(s', a') - Q(s, a) \right)$$ +3. Until convergence or a maximum number of episodes. + +### REINFORCE Algorithm: +REINFORCE (Monte Carlo policy gradient) is a simple policy gradient method that updates the policy parameters in the direction of the gradient of expected rewards. + +### Proximal Policy Optimization (PPO): +PPO is an advanced policy gradient method that improves stability by limiting the policy updates within a certain trust region. + +### A2C/A3C: +Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C) are variants of actor-critic methods that utilize multiple parallel agents to improve sample efficiency. + +## Mathematical Background + +### Markov Decision Processes (MDPs) +A Markov Decision Process (MDP) is a mathematical framework used to model decision-making problems. It consists of states, actions, rewards and transition probabilities. + +### Bellman Equations +Bellman equations are fundamental recursive equations in dynamic programming and reinforcement learning. They express the value of a decision at one point in time in terms of the expected value of the subsequent decisions. + +## Applications of Reinforcement Learning + +### Gaming +Reinforcement learning is extensively used in gaming for developing AI agents capable of playing complex games like AlphaGo, Chess, and video games. RL algorithms enable these agents to learn optimal strategies by interacting with the game environment and receiving feedback in the form of rewards. + +### Robotics +In robotics, reinforcement learning is employed to teach robots various tasks such as navigation, manipulation, and control. 
RL algorithms allow robots to learn from their interactions with the environment, enabling them to adapt and improve their behavior over time without explicit programming. + +### Finance +Reinforcement learning plays a crucial role in finance, particularly in algorithmic trading and portfolio management. RL algorithms are utilized to optimize trading strategies, automate decision-making processes, and manage investment portfolios dynamically based on changing market conditions and objectives. + +### Healthcare +In healthcare, reinforcement learning is utilized for various applications such as personalized treatment, drug discovery, and optimizing healthcare operations. RL algorithms can assist in developing personalized treatment plans for patients, identifying effective drug candidates, and optimizing resource allocation in hospitals to improve patient care and outcomes. + +## Tools and Libraries +- **OpenAI Gym:** A toolkit for developing and comparing RL algorithms. +- **TensorFlow/TF-Agents:** A library for RL in TensorFlow. +- **PyTorch:** Popular machine learning library with RL capabilities. +- **Stable Baselines3:** A set of reliable implementations of RL algorithms in PyTorch. + +## How to Start with Reinforcement Learning + +### Prerequisites +- Basic knowledge of machine learning and neural networks. +- Proficiency in Python. + +### Beginner Project +The provided Python code implements the Q-learning algorithm for a basic grid world environment. It defines the grid world, actions, and parameters such as discount factor and learning rate. The algorithm iteratively learns the optimal action-value function (Q-values) by updating them based on rewards obtained from actions taken in each state. Finally, the learned Q-values are printed for each state-action pair. + +```python +import numpy as np + +# Define the grid world environment +# 'S' represents the start state +# 'G' represents the goal state +# 'H' represents the hole (negative reward) +# '.' 
represents empty cells (neutral reward)
+# 'W' represents walls (impassable)
+grid_world = np.array([
+    ['S', '.', '.', '.', '.'],
+    ['.', 'W', '.', 'H', '.'],
+    ['.', '.', '.', 'W', '.'],
+    ['.', 'W', '.', '.', 'G']
+])
+
+# Define the actions (up, down, left, right)
+actions = ['UP', 'DOWN', 'LEFT', 'RIGHT']
+
+# Define parameters
+gamma = 0.9    # discount factor
+alpha = 0.1    # learning rate
+epsilon = 0.1  # exploration rate
+
+# Initialize Q-values
+num_rows, num_cols = grid_world.shape
+num_actions = len(actions)
+Q = np.zeros((num_rows, num_cols, num_actions))
+
+# Define helper function to get possible actions in a state
+def possible_actions(state):
+    row, col = state
+    possible_actions = []
+    for i, action in enumerate(actions):
+        if action == 'UP' and row > 0 and grid_world[row - 1, col] != 'W':
+            possible_actions.append(i)
+        elif action == 'DOWN' and row < num_rows - 1 and grid_world[row + 1, col] != 'W':
+            possible_actions.append(i)
+        elif action == 'LEFT' and col > 0 and grid_world[row, col - 1] != 'W':
+            possible_actions.append(i)
+        elif action == 'RIGHT' and col < num_cols - 1 and grid_world[row, col + 1] != 'W':
+            possible_actions.append(i)
+    return possible_actions
+
+# Q-learning algorithm
+num_episodes = 1000
+for episode in range(num_episodes):
+    # Initialize the starting state
+    state = (0, 0)  # start state
+    while True:
+        # Choose an action using an epsilon-greedy policy. Both branches are
+        # restricted to the valid moves for this state; an unrestricted argmax
+        # over all four actions could pick a move into a wall or off the grid.
+        valid = possible_actions(state)
+        if np.random.uniform(0, 1) < epsilon:
+            action = np.random.choice(valid)
+        else:
+            action = valid[np.argmax(Q[state[0], state[1], valid])]
+
+        # Perform the action and observe the next state and reward
+        if actions[action] == 'UP':
+            next_state = (state[0] - 1, state[1])
+        elif actions[action] == 'DOWN':
+            next_state = (state[0] + 1, state[1])
+        elif actions[action] == 'LEFT':
+            next_state = (state[0], state[1] - 1)
+        elif actions[action] == 'RIGHT':
+            next_state = (state[0], state[1] + 1)
+
+        # Get the reward
+        if grid_world[next_state] == 'G':
+            reward = 1   # goal state
+        elif grid_world[next_state] == 'H':
+            reward = -1  # hole
+        else:
+            reward = 0
+
+        # Update Q-value using the Bellman equation
+        best_next_action = np.argmax(Q[next_state[0], next_state[1]])
+        Q[state[0], state[1], action] += alpha * (
+            reward + gamma * Q[next_state[0], next_state[1], best_next_action] - Q[state[0], state[1], action])
+
+        # Move to the next state
+        state = next_state
+
+        # Check if the episode is terminated
+        if grid_world[state] in ['G', 'H']:
+            break
+
+# Print the learned Q-values
+print("Learned Q-values:")
+for i in range(num_rows):
+    for j in range(num_cols):
+        print(f"State ({i}, {j}):", Q[i, j])
+```
+
+## Conclusion
+Congratulations on completing your journey through this comprehensive guide to reinforcement learning! Armed with this knowledge, you are well-equipped to dive deeper into the exciting world of RL, whether it's for gaming, robotics, finance, healthcare, or any other domain. Keep exploring, experimenting, and learning, and remember, the only limit to what you can achieve with reinforcement learning is your imagination.
diff --git a/contrib/machine-learning/sklearn-introduction.md b/contrib/machine-learning/sklearn-introduction.md
new file mode 100644
index 00000000..7bb5aa8d
--- /dev/null
+++ b/contrib/machine-learning/sklearn-introduction.md
@@ -0,0 +1,144 @@
+# scikit-learn (sklearn) Python Library
+
+## Overview
+
+scikit-learn, also known as sklearn, is a popular open-source Python library that provides simple and efficient tools for data mining and data analysis. 
It is built on NumPy, SciPy, and matplotlib. The library is designed to interoperate with the Python numerical and scientific libraries. + +## Key Features + +- **Classification**: Identifying which category an object belongs to. Example algorithms include SVM, nearest neighbors, random forest. +- **Regression**: Predicting a continuous-valued attribute associated with an object. Example algorithms include support vector regression (SVR), ridge regression, Lasso. +- **Clustering**: Automatic grouping of similar objects into sets. Example algorithms include k-means, spectral clustering, mean-shift. +- **Dimensionality Reduction**: Reducing the number of random variables to consider. Example algorithms include PCA, feature selection, non-negative matrix factorization. +- **Model Selection**: Comparing, validating, and choosing parameters and models. Example methods include grid search, cross-validation, metrics. +- **Preprocessing**: Feature extraction and normalization. + +## When to Use scikit-learn + +- **Use scikit-learn if**: + - You are working on machine learning tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. + - You need an easy-to-use, well-documented library. + - You require tools that are compatible with NumPy and SciPy. + +- **Do not use scikit-learn if**: + - You need to perform deep learning tasks. In such cases, consider using TensorFlow or PyTorch. + - You need out-of-the-box support for large-scale data. scikit-learn is designed to work with in-memory data, so for very large datasets, you might want to consider libraries like Dask-ML. + +## Installation + +You can install scikit-learn using pip: + +```bash +pip install scikit-learn +``` + +Or via conda: + +```bash +conda install scikit-learn +``` + +## Basic Usage with Code Snippets + +### Importing the Library + +```python +import numpy as np +from sklearn.model_selection import train_test_split +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import accuracy_score +``` + +### Loading Data + +For illustration, let's create a simple synthetic dataset: + +```python +from sklearn.datasets import make_classification + +X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42) +``` + +### Splitting Data + +Split the dataset into training and testing sets: + +```python +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) +``` + +### Preprocessing + +Standardizing the features: + +```python +scaler = StandardScaler() +X_train = scaler.fit_transform(X_train) +X_test = scaler.transform(X_test) +``` + +### Training a Model + +Train a Logistic Regression model: + +```python +model = LogisticRegression() +model.fit(X_train, y_train) +``` + +### Making Predictions + +Make predictions on the test set: + +```python +y_pred = model.predict(X_test) +``` + +### Evaluating the Model + +Evaluate the accuracy of the model: + +```python +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +``` + +### Putting it All Together + +Here is a complete example from data loading to model evaluation: + +```python +import numpy as np +from sklearn.datasets import make_classification +from sklearn.model_selection import train_test_split +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import accuracy_score + +# Load data +X, y = 
make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
+
+# Split data
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+
+# Preprocess data
+scaler = StandardScaler()
+X_train = scaler.fit_transform(X_train)
+X_test = scaler.transform(X_test)
+
+# Train model
+model = LogisticRegression()
+model.fit(X_train, y_train)
+
+# Make predictions
+y_pred = model.predict(X_test)
+
+# Evaluate model
+accuracy = accuracy_score(y_test, y_pred)
+print(f"Accuracy: {accuracy * 100:.2f}%")
+```
+
+## Conclusion
+
+scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively.
diff --git a/contrib/machine-learning/support-vector-machine.md b/contrib/machine-learning/support-vector-machine.md
new file mode 100644
index 00000000..0117e9f4
--- /dev/null
+++ b/contrib/machine-learning/support-vector-machine.md
@@ -0,0 +1,62 @@
+## Support Vector Machine
+
+Support Vector Machine (SVM) is one of the most popular Supervised Learning algorithms, used for both Classification and Regression problems. In Machine Learning, however, it is primarily used for classification.
+
+SVM can be of two types -
+1. Linear SVM: Linear SVM is used for linearly separable data: if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
+2. Non-linear SVM: Non-Linear SVM is used for non-linearly separable data: if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
+
+Working of SVM - The goal of SVM is to find a hyperplane that separates the data points into different classes. A hyperplane is a line in 2D space, a plane in 3D space, or a higher-dimensional surface in n-dimensional space. The hyperplane is chosen in such a way that it maximizes the margin, which is the distance between the hyperplane and the closest data points of each class. The closest data points are called the support vectors.
+
+The distance between the hyperplane and a data point "x" can be calculated using the formula −
+```
+distance = (w . x + b) / ||w||
+```
+where "w" is the weight vector, "b" is the bias term, and "||w||" is the Euclidean norm of the weight vector. The weight vector "w" is perpendicular to the hyperplane and determines its orientation, while the bias term "b" determines its position.
+
+The optimal hyperplane is found by solving an optimization problem: maximize the margin subject to the constraint that all data points are correctly classified. In other words, we want to find the hyperplane that maximizes the margin between the two classes while ensuring that no data point is misclassified. This is a convex optimization problem that can be solved using quadratic programming. If the data points are not linearly separable, we can use a technique called the kernel trick to map the data points into a higher-dimensional space where they become separable. 
The kernel function computes the inner product between the mapped data points without computing the mapping itself. This allows us to work with the data points in the higher-dimensional space without incurring the computational cost of mapping them.
+
+1. Hyperplane:
+There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.
+The dimensions of the hyperplane depend on the features present in the dataset: if there are 2 features, the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane. We always create a hyperplane that has the maximum margin, which means the maximum distance between the data points.
+2. Support Vectors:
+The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
+3. Margin:
+It may be defined as the gap between two lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
+
+We will use the famous Iris dataset, which contains the sepal length, sepal width, petal length, and petal width of three species of iris flowers: Iris setosa, Iris versicolor, and Iris virginica. The goal is to classify the flowers into their respective species based on these four features. We load the iris dataset using load_iris and split the data into training and testing sets using train_test_split. We use a test size of 0.2, which means that 20% of the data will be used for testing and 80% for training. We set the random state to 42 to ensure reproducibility of the results.
+
+### Implementation of SVM in Python
+
+```python
+from sklearn.datasets import load_iris
+from sklearn.model_selection import train_test_split
+from sklearn.svm import SVC
+from sklearn.metrics import accuracy_score
+
+# load the iris dataset
+iris = load_iris()
+
+# split the data into training and testing sets
+X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
+                                                    test_size=0.2, random_state=42)
+
+# create an SVM classifier with a linear kernel
+svm = SVC(kernel='linear')
+
+# train the SVM classifier on the training set
+svm.fit(X_train, y_train)
+
+# make predictions on the testing set
+y_pred = svm.predict(X_test)
+
+# calculate the accuracy of the classifier
+accuracy = accuracy_score(y_test, y_pred)
+print("Accuracy:", accuracy)
+```
+
+#### Output
+```
+Accuracy: 1.0
+```
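+
+The classifier above uses a linear kernel. For the non-linearly separable case described earlier, a different kernel can be swapped in via the kernel trick; a minimal sketch using the RBF kernel, reusing the split from the example above:
+
+```python
+# Non-linear SVM: the RBF kernel maps the data implicitly into a
+# higher-dimensional space via the kernel trick
+svm_rbf = SVC(kernel='rbf', gamma='scale')
+svm_rbf.fit(X_train, y_train)
+print("RBF accuracy:", accuracy_score(y_test, svm_rbf.predict(X_test)))
+```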
+
diff --git a/contrib/machine-learning/tensorflow.md b/contrib/machine-learning/tensorflow.md
new file mode 100644
index 00000000..b2c847c6
--- /dev/null
+++ b/contrib/machine-learning/tensorflow.md
@@ -0,0 +1,64 @@
+# TensorFlow
+
+Developed by the Google Brain team, TensorFlow is an open-source library that provides a comprehensive ecosystem for building and deploying machine learning models. It supports deep learning and neural networks and offers tools for both beginners and experts.
+
+## Key Features
+
+- **Flexible and comprehensive ecosystem**
+- **Scalable for both production and research**
+- **Supports CPUs, GPUs, and TPUs**
+
+## Basic Example: Linear Regression
+
+Let's start with a simple linear regression example in TensorFlow.
+
+```python
+import tensorflow as tf
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Generate synthetic data
+X = np.array([1, 2, 3, 4, 5], dtype=np.float32)
+Y = np.array([2, 4, 6, 8, 10], dtype=np.float32)
+
+# Define the model
+model = tf.keras.Sequential([
+    tf.keras.layers.Dense(units=1, input_shape=[1])
+])
+
+# Compile the model
+model.compile(optimizer='sgd', loss='mean_squared_error')
+
+# Train the model
+history = model.fit(X, Y, epochs=500)
+
+# Predict
+predictions = model.predict(X)
+
+# Plot the results
+plt.plot(X, Y, 'ro', label='Original data')
+plt.plot(X, predictions, 'b-', label='Fitted line')
+plt.legend()
+plt.show()
+```
+
+In this example:
+
+1. We define a simple dataset with a linear relationship.
+2. We build a sequential model with one dense layer (linear regression).
+3. We compile the model with the stochastic gradient descent (SGD) optimizer and mean squared error loss.
+4. We train the model for 500 epochs and then plot the original data and the fitted line.
+
+## When to Use TensorFlow
+
+TensorFlow is a great choice if you:
+
+- **Need to deploy machine learning models in production:** TensorFlow’s robust deployment options, including TensorFlow Serving, TensorFlow Lite, and TensorFlow.js, make it ideal for production environments.
+- **Work on large-scale deep learning projects:** TensorFlow’s comprehensive ecosystem supports distributed training and has tools like TensorBoard for visualization.
+- **Require high performance and scalability:** TensorFlow is optimized for performance and can leverage GPUs and TPUs for accelerated computing.
+- **Want extensive support and documentation:** TensorFlow has a large community and extensive documentation, which can be very helpful for both beginners and advanced users.
+
+## Example Use Cases
+
+- Building and deploying complex neural networks for image recognition, natural language processing, or recommendation systems.
+- Developing models that need to be run on mobile or embedded devices.
diff --git a/contrib/machine-learning/transformers.md b/contrib/machine-learning/transformers.md
new file mode 100644
index 00000000..5a276885
--- /dev/null
+++ b/contrib/machine-learning/transformers.md
@@ -0,0 +1,443 @@
+# Transformers
+## Introduction
+A transformer is a deep learning architecture developed by Google, built around the softmax-based multi-head attention mechanism. Before transformers, attention mechanisms were added to gated recurrent neural networks such as LSTMs and gated recurrent units (GRUs), which processed datasets sequentially. Their dependency on previous token computations prevented them from parallelizing the attention mechanism.
+
+Transformers are a revolutionary approach to natural language processing (NLP). Unlike older models, they excel at understanding long-range connections between words. This "attention" mechanism lets them grasp the context of a sentence, making them powerful for tasks like machine translation, text summarization, and question answering. Introduced in 2017, transformers are now the backbone of many large language models, including tools you might use every day. Their ability to handle complex relationships in language is fueling advancements in AI across various fields.
+
+## Model Architecture
+
+![Model Architecture](assets/transformer-architecture.png)
+
+Source: [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
+
+
+### Encoder
+The encoder is composed of a stack of identical layers. 
Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. Each encoder consists of two major components: a self-attention mechanism and a feed-forward neural network. The self-attention mechanism accepts input encodings from the previous encoder and weights their relevance to each other to generate output encodings. The feed-forward neural network further processes each output encoding individually. These output encodings are then passed to the next encoder as its input, as well as to the decoders.
+
+### Decoder
+The decoder is also composed of a stack of identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. This mechanism can also be called the encoder-decoder attention.
+
+### Attention
+#### Scaled Dot-Product Attention
+The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$. We compute the dot products of the query with all keys, divide each by $\sqrt {d_k}$, and apply a softmax function to obtain the weights on the values.
+
+$$Attention(Q, K, V) = softmax(\dfrac{QK^T}{\sqrt{d_k}}) \times V$$
+
+#### Multi-Head Attention
+Instead of performing a single attention function with $d_{model}$-dimensional keys, values and queries, it is beneficial to linearly project the queries, keys and values h times with different, learned linear projections to $d_k$, $d_k$ and $d_v$ dimensions, respectively.
+
+Multi-head attention allows the model to jointly attend to information from different representation
+subspaces at different positions. With a single attention head, averaging inhibits this.
+
+$$MultiHead(Q, K, V) = Concat(head_1, \dots, head_h) \times W^O$$
+
+where,
+
+$$head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$$
+
+where the projections are parameter matrices.
+
+#### Masked Attention
+It may be necessary to cut out attention links between some word-pairs. For example, the decoder for token position $t$ should not have access to token position $t+1$.
+
+$$MaskedAttention(Q, K, V) = softmax(M + \dfrac{QK^T}{\sqrt{d_k}}) \times V$$
+
+### Feed-Forward Network
+Each of the layers in the encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This
+consists of two linear transformations with a ReLU activation in between.
+
+$$FFN(x) = \max(0, xW_1 + b_1)W_2 + b_2$$
+
+### Positional Encoding
+A positional encoding is a fixed-size vector representation that encapsulates the relative positions of tokens within a target sequence: it provides the transformer model with information about where the words are in the input sequence.
+
+The sine and cosine functions of different frequencies (even dimensions use the sine, odd dimensions the cosine):
+
+$$PE(pos,2i) = \sin({\dfrac{pos}{10000^{\dfrac{2i}{d_{model}}}}})$$
+
+$$PE(pos,2i+1) = \cos({\dfrac{pos}{10000^{\dfrac{2i}{d_{model}}}}})$$
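+
+The sinusoidal encoding above can be computed directly. A minimal NumPy sketch (the sequence length and model dimension are illustrative, and d_model is assumed to be even):
+
+```python
+import numpy as np
+
+def positional_encoding(length, d_model):
+    """Sinusoidal positional encoding following the formulas above."""
+    pos = np.arange(length)[:, np.newaxis]      # (length, 1)
+    i = np.arange(d_model // 2)[np.newaxis, :]  # (1, d_model/2)
+    angle = pos / np.power(10000, (2 * i) / d_model)
+    pe = np.zeros((length, d_model))
+    pe[:, 0::2] = np.sin(angle)  # even dimensions: sine
+    pe[:, 1::2] = np.cos(angle)  # odd dimensions: cosine
+    return pe
+
+print(positional_encoding(50, 128).shape)  # (50, 128)
+```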
+
+## Implementation
+### Theory
+Text is converted to numerical representations called tokens, and each token is converted into a vector via looking up from a word embedding table.
+At each layer, each token is then contextualized within the scope of the context window with other tokens via a parallel multi-head attention mechanism,
+allowing the signal for key tokens to be amplified and less important tokens to be diminished.
+
+The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses the features to produce an output sentence. Some architectures use full encoders and decoders, autoregressive encoders and decoders, or a combination of both. This depends on the usage and context of the input.
+
+### Tensorflow
+TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. It was developed by the Google Brain team for Google's internal use in research and production.
+
+Tensorflow provides the transformer encoder and decoder blocks that can be assembled to the specification of the user. However, the transformer is not provided as a standalone model that can be imported and executed; the user has to build the model first. There is also a tutorial on how to implement the transformer from scratch for machine translation, which can be found [here](https://www.tensorflow.org/text/tutorials/transformer).
+
+More information on the [encoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerEncoderBlock) and [decoder](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers/TransformerDecoderBlock) block mentioned in the code.
+
+Imports:
+```python
+import tensorflow as tf
+import tensorflow_models as tfm
+```
+
+Adding word embeddings and positional encoding:
+```python
+class PositionalEmbedding(tf.keras.layers.Layer):
+    def __init__(self, vocab_size, d_model):
+        super().__init__()
+        self.d_model = d_model
+        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model, mask_zero=True)
+        self.pos_encoding = tfm.nlp.layers.RelativePositionEmbedding(hidden_size=d_model)
+
+    def compute_mask(self, *args, **kwargs):
+        return self.embedding.compute_mask(*args, **kwargs)
+
+    def call(self, x):
+        x = self.embedding(x)
+        # RelativePositionEmbedding returns a [length, d_model] tensor, which
+        # broadcasts over the batch dimension when added to the embeddings
+        x = x + self.pos_encoding(x)
+        return x
+```
+
+Creating the encoder for the transformer:
+```python
+class Encoder(tf.keras.layers.Layer):
+    def __init__(self, num_layers, d_model, num_heads,
+                 dff, vocab_size, dropout_rate=0.1):
+        super().__init__()
+
+        self.d_model = d_model
+        self.num_layers = num_layers
+
+        self.pos_embedding = PositionalEmbedding(
+            vocab_size=vocab_size, d_model=d_model)
+
+        self.enc_layers = [
+            tfm.nlp.layers.TransformerEncoderBlock(output_last_dim=d_model,
+                                                   num_attention_heads=num_heads,
+                                                   inner_dim=dff,
+                                                   inner_activation="relu",
+                                                   inner_dropout=dropout_rate)
+            for _ in range(num_layers)]
+        self.dropout = tf.keras.layers.Dropout(dropout_rate)
+
+    def call(self, x):
+        x = self.pos_embedding(x)  # PositionalEmbedding takes only the token ids
+        x = self.dropout(x)
+
+        for i in range(self.num_layers):
+            x = self.enc_layers[i](x)
+
+        return x
+```
+
+Creating the decoder for the transformer:
+```python
+class Decoder(tf.keras.layers.Layer):
+    def __init__(self, num_layers, d_model, num_heads, dff, vocab_size,
+                 dropout_rate=0.1):
+        super(Decoder, self).__init__()
+
+        self.d_model = d_model
+        self.num_layers = num_layers
+
+        self.pos_embedding = PositionalEmbedding(vocab_size=vocab_size,
+                                                 d_model=d_model)
+        self.dropout = tf.keras.layers.Dropout(dropout_rate)
+        self.dec_layers = [
+            
tfm.nlp.layers.TransformerDecoderBlock(num_attention_heads=num_heads, + intermediate_size=dff, + intermediate_activation="relu", + dropout_rate=dropout_rate) + for _ in range(num_layers)] + + def call(self, x, context): + x = self.pos_embedding(x) + x = self.dropout(x) + + for i in range(self.num_layers): + x = self.dec_layers[i](x, context) + + return x +``` + +Combining the encoder and decoder to create the transformer: +```python +class Transformer(tf.keras.Model): + def __init__(self, num_layers, d_model, num_heads, dff, + input_vocab_size, target_vocab_size, dropout_rate=0.1): + super().__init__() + self.encoder = Encoder(num_layers=num_layers, d_model=d_model, + num_heads=num_heads, dff=dff, + vocab_size=input_vocab_size, + dropout_rate=dropout_rate) + + self.decoder = Decoder(num_layers=num_layers, d_model=d_model, + num_heads=num_heads, dff=dff, + vocab_size=target_vocab_size, + dropout_rate=dropout_rate) + + self.final_layer = tf.keras.layers.Dense(target_vocab_size) + + def call(self, inputs): + context, x = inputs + + context = self.encoder(context) + x = self.decoder(x, context) + logits = self.final_layer(x) + + return logits +``` + +Model initialization that be used for training and inference: +```python +transformer = Transformer( + num_layers=num_layers, + d_model=d_model, + num_heads=num_heads, + dff=dff, + input_vocab_size=64, + target_vocab_size=64, + dropout_rate=dropout_rate +) +``` + +Sample: +```python +src = tf.random.uniform((64, 40)) +tgt = tf.random.uniform((64, 50)) + +output = transformer((src, tgt)) +``` + +O/P: +``` + +``` +``` +>>> output.shape +TensorShape([64, 50, 64]) +``` + +### PyTorch +PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. + +Unlike Tensorflow, PyTorch provides the full implementation of the transformer model that can be executed on the go. More information can be found [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/transformer.html#Transformer). A full implementation of the model can be found [here](https://github.com/pytorch/examples/tree/master/word_language_model). 
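+To round off the TensorFlow section: at inference time the decoder is usually run autoregressively, feeding back its own predictions one token at a time. The sketch below is an illustrative greedy decoder for the model defined above (the `START_ID` token id and the fixed output length are assumptions, not part of the tutorial):
+
+```python
+START_ID = 1   # assumed reserved start-of-sequence token id
+MAX_LEN = 20   # fixed decode length for the sketch
+
+def greedy_decode(model, src):
+    batch = tf.shape(src)[0]
+    out = tf.cast(tf.fill((batch, 1), START_ID), tf.int64)
+    for _ in range(MAX_LEN):
+        logits = model((src, out))                      # (batch, cur_len, vocab)
+        next_id = tf.argmax(logits[:, -1, :], axis=-1)  # most likely next token
+        out = tf.concat([out, next_id[:, tf.newaxis]], axis=1)
+    return out
+
+print(greedy_decode(transformer, src).shape)  # (64, 21) including the start token
+```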
+### PyTorch
+PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It was originally developed by Meta AI and is now part of the Linux Foundation umbrella.
+
+Unlike TensorFlow, PyTorch provides a full implementation of the transformer model that can be used out of the box. More information can be found [here](https://pytorch.org/docs/stable/_modules/torch/nn/modules/transformer.html#Transformer). A full implementation of the model can be found [here](https://github.com/pytorch/examples/tree/master/word_language_model).
+
+Imports:
+```python
+import torch
+import torch.nn as nn
+```
+
+Initializing the model:
+```python
+transformer = nn.Transformer(nhead=16, num_encoder_layers=8)
+```
+
+Sample:
+```python
+src = torch.rand((10, 32, 512))
+tgt = torch.rand((20, 32, 512))
+
+output = transformer(src, tgt)
+```
+
+O/P:
+```
+tensor([[[ 0.2938, -0.4824, -0.7816,  ...,  0.0742,  0.5162,  0.3632],
+         [-0.0786, -0.5241,  0.6384,  ...,  0.3462, -0.0618,  0.9943],
+         [ 0.7827,  0.1067, -0.1637,  ..., -1.7730, -0.3322, -0.0029],
+         ...,
+         [-0.3202,  0.2341, -0.0896,  ..., -0.9714, -0.1251, -0.0711],
+         [-0.1663, -0.5047, -0.0404,  ..., -0.9339,  0.3963,  0.1018],
+         [ 1.2834, -0.4400,  0.0486,  ..., -0.6876, -0.4752,  0.0180]],
+
+        [[ 0.9869, -0.7384, -1.0704,  ..., -0.9417,  1.3279, -0.1665],
+         [ 0.3445, -0.2454, -0.3644,  ..., -0.4856, -1.1004, -0.6819],
+         [ 0.7568, -0.3151, -0.5034,  ..., -1.2081, -0.7119,  0.3775],
+         ...,
+         [-0.0451, -0.7596,  0.0168,  ..., -0.8267, -0.3272,  1.0457],
+         [ 0.3150, -0.6588, -0.1840,  ...,  0.1822, -0.0653,  0.9053],
+         [ 0.8692, -0.3519,  0.3128,  ..., -1.8446, -0.2325, -0.8662]],
+
+        [[ 0.9719, -0.3113,  0.4637,  ..., -0.4422,  1.2348,  0.8274],
+         [ 0.3876, -0.9529, -0.7810,  ..., -0.5843, -1.1439, -0.3366],
+         [-0.5774,  0.3789, -0.2819,  ..., -1.4057,  0.4352,  0.1474],
+         ...,
+         [ 0.6899, -0.1146, -0.3297,  ..., -1.7059, -0.1750,  0.4203],
+         [ 0.3689, -0.5174, -0.1253,  ...,  0.1417,  0.4159,  0.7560],
+         [ 0.5024, -0.7996,  0.1592,  ..., -0.8344, -1.1125,  0.4736]],
+
+        ...,
+
+        [[ 0.0704, -0.3971, -0.2768,  ..., -1.9929,  0.8608,  1.2264],
+         [ 0.4013, -0.0962, -0.0965,  ..., -0.4452, -0.8682, -0.4593],
+         [ 0.1656,  0.5224, -0.1723,  ..., -1.5785,  0.3219,  1.1507],
+         ...,
+         [-0.9443,  0.4653,  0.2936,  ..., -0.9840, -0.0142, -0.1595],
+         [-0.6544, -0.3294, -0.0803,  ...,  0.1623, -0.5061,  0.9824],
+         [-0.0978, -1.0023, -0.6915,  ..., -0.2296, -0.0594, -0.4715]],
+
+        [[ 0.6531, -0.9285, -0.0331,  ..., -1.1481,  0.7768, -0.7321],
+         [ 0.3325, -0.6683, -0.6083,  ..., -0.4501,  0.2289,  0.3573],
+         [-0.6750,  0.4600, -0.8512,  ..., -2.0097, -0.5159,  0.2773],
+         ...,
+         [-1.4356, -1.0135,  0.0081,  ..., -1.2985, -0.3715, -0.2678],
+         [ 0.0546, -0.2111, -0.0965,  ..., -0.3822, -0.4612,  1.6217],
+         [ 0.7700, -0.5309, -0.1754,  ..., -2.2807, -0.0320, -1.5551]],
+
+        [[ 0.2399, -0.9659,  0.1086,  ..., -1.1756,  0.4063,  0.0615],
+         [-0.2202, -0.7972, -0.5024,  ..., -0.9126, -1.5248,  0.2418],
+         [ 0.5215,  0.4540,  0.0036,  ..., -0.2135,  0.2145,  0.6638],
+         ...,
+         [-0.2190, -0.4967,  0.7149,  ..., -0.3324,  0.3502,  1.0624],
+         [-0.0108, -0.9205, -0.1315,  ..., -1.0153,  0.2989,  1.1415],
+         [ 1.1284, -0.6560,  0.6755,  ..., -1.2157,  0.8580, -0.5022]]],
+       grad_fn=<...>)
+```
+```
+>>> output.shape
+torch.Size([20, 32, 512])
+```
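+One detail worth flagging: by default this forward pass does not apply the causal mask described in the Masked Attention section, so every target position can attend to later ones. For autoregressive use, PyTorch provides a helper to build the mask:
+
+```python
+# Build a (tgt_len, tgt_len) causal mask so position t cannot attend to t+1.
+tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(0))
+output = transformer(src, tgt, tgt_mask=tgt_mask)
+```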
+### HuggingFace
+Hugging Face, Inc. is a French-American company incorporated under the Delaware General Corporation Law and based in New York City that develops computation tools for building applications using machine learning.
+
+It has a wide range of models that can be run with TensorFlow, PyTorch and other backends as well. The models come already trained on a dataset and can be fine-tuned on a custom dataset for the user's specific use case. Information on training a model and loading a pretrained one can be found [here](https://huggingface.co/docs/transformers/en/training).
+
+In HuggingFace, `pipeline` is used to run inference with a trained model available in the Hub, which is very beginner friendly. The model is downloaded to the local system the first time the script runs, before inference, so make sure the download does not exceed your available data plan.
+
+Imports:
+```python
+from transformers import pipeline
+```
+
+Initialization:
+
+The model used here is BART (large), trained on the MultiNLI dataset, which consists of sentences paired with their textual entailment.
+```python
+classifier = pipeline(model="facebook/bart-large-mnli")
+```
+
+Sample:
+
+The first argument is the sentence to be analyzed. The second argument, `candidate_labels`, is the list of labels the sentence most likely belongs to. The output dictionary contains a `scores` key; the label with the highest score is the most likely entailment for the sentence.
+
+```python
+output = classifier(
+    "I need to leave but later",
+    candidate_labels=["urgent", "not urgent", "sleep"],
+)
+```
+
+O/P:
+
+```
+{'sequence': 'I need to leave but later',
+ 'labels': ['not urgent', 'urgent', 'sleep'],
+ 'scores': [0.8889380097389221, 0.10631518065929413, 0.00474683940410614]}
+```
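+Equivalently, the pipeline can be created by naming the task explicitly, which makes the intent clearer:
+
+```python
+classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
+```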
+## Application
+The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, GPT-4, Claude, BERT, XLNet, RoBERTa and ChatGPT demonstrate the ability of transformers to perform a wide variety of NLP-related tasks, and have the potential to find real-world applications.
+
+These may include:
+- Machine translation
+- Document summarization
+- Text generation
+- Biological sequence analysis
+- Computer code generation
+
+## Bibliography
+- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762)
+- [Tensorflow Tutorial](https://www.tensorflow.org/text/tutorials/transformer)
+- [Tensorflow Models Docs](https://www.tensorflow.org/api_docs/python/tfm/nlp/layers)
+- [Wikipedia](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture))
+- [HuggingFace](https://huggingface.co/docs/transformers/en/index)
+- [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html)
diff --git a/contrib/machine-learning/types-of-optimizers.md b/contrib/machine-learning/types-of-optimizers.md
new file mode 100644
index 00000000..ae2759d1
--- /dev/null
+++ b/contrib/machine-learning/types-of-optimizers.md
@@ -0,0 +1,357 @@
+# Optimizers in Machine Learning
+
+Optimizers are algorithms or methods used to change the attributes of your neural network, such as weights and learning rate, in order to reduce the losses. Optimization algorithms help to minimize (or maximize) an objective function (also called a loss function), which is simply a mathematical function of the model's internal learnable parameters that are used in computing the target values from the set of features.
+
+## Types of Optimizers
+
+### 1. Gradient Descent
+
+**Explanation:**
+Gradient Descent is the simplest and most commonly used optimization algorithm. It works by iteratively updating the model parameters in the opposite direction of the gradient of the objective function with respect to the parameters. The idea is to find the minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point.
+
+**Mathematical Formulation:**
+
+The update rule for the parameter vector $\theta$ in gradient descent is represented by the equation:
+
+- $$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \cdot \nabla J(\theta)$$
+
+Where:
+- $\theta_{\text{old}}$ is the old parameter vector.
+- $\theta_{\text{new}}$ is the updated parameter vector.
+- $\alpha$ is the learning rate.
+- $\nabla J(\theta)$ is the gradient of the objective function with respect to the parameters.
+
+**Intuition:**
+- At each iteration, we calculate the gradient of the cost function.
+- The parameters are updated in the opposite direction of the gradient.
+- The size of the step is controlled by the learning rate $\alpha$.
+
+**Advantages:**
+- Simple to implement.
+- Suitable for convex problems.
+
+**Disadvantages:**
+- Can be slow for large datasets.
+- May get stuck in local minima for non-convex problems.
+- Requires careful tuning of the learning rate.
+
+**Python Implementation:**
+```python
+import numpy as np
+
+def gradient_descent(X, y, lr=0.01, epochs=1000):
+    m, n = X.shape
+    theta = np.zeros(n)
+    for epoch in range(epochs):
+        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
+        theta -= lr * gradient
+    return theta
+```
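+For reference, here is how the function above can be exercised on synthetic data (the data and hyperparameters are illustrative assumptions); the same call pattern applies to every optimizer implementation below:
+
+```python
+# Synthetic linear-regression data: y = X @ [2, -3] + small noise
+rng = np.random.default_rng(0)
+X = np.hstack([np.ones((200, 1)), rng.random((200, 1))])  # bias column + one feature
+y = X @ np.array([2.0, -3.0]) + 0.01 * rng.standard_normal(200)
+
+theta = gradient_descent(X, y, lr=0.1, epochs=5000)
+print(theta)  # should be close to [2, -3]
+```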
+### 2. Stochastic Gradient Descent (SGD)
+
+**Explanation:**
+SGD is a variation of gradient descent where we use only one training example to calculate the gradient and update the parameters. This introduces noise into the parameter updates, which can help to escape local minima but may cause the loss to fluctuate.
+
+**Mathematical Formulation:**
+
+- $$\theta = \theta - \alpha \cdot \frac{\partial J(\theta; x_i, y_i)}{\partial \theta}$$
+
+Where:
+- $x_i$, $y_i$ are a single training example and its target.
+
+**Intuition:**
+- At each iteration, a random training example is selected.
+- The gradient is calculated and the parameters are updated for this single example.
+- This process is repeated for a specified number of epochs.
+
+**Advantages:**
+- Faster updates compared to batch gradient descent.
+- Can handle large datasets.
+- Helps to escape local minima due to the noise in updates.
+
+**Disadvantages:**
+- Loss function may fluctuate.
+- Requires more iterations to converge.
+
+**Python Implementation:**
+```python
+def stochastic_gradient_descent(X, y, lr=0.01, epochs=1000):
+    m, n = X.shape
+    theta = np.zeros(n)
+    for epoch in range(epochs):
+        for _ in range(m):
+            # sample one training example (with replacement)
+            rand_index = np.random.randint(0, m)
+            xi = X[rand_index:rand_index+1]
+            yi = y[rand_index:rand_index+1]
+            gradient = np.dot(xi.T, (np.dot(xi, theta) - yi))
+            theta -= lr * gradient
+    return theta
+```
+
+### 3. Mini-Batch Gradient Descent
+
+**Explanation:**
+Mini-Batch Gradient Descent is a variation where, instead of a single training example or the whole dataset, a mini-batch of examples is used to compute the gradient. This reduces the variance of the parameter updates, leading to more stable convergence.
+
+**Mathematical Formulation:**
+
+- $$\theta = \theta - \alpha \cdot \frac{1}{k} \sum_{i=1}^{k} \frac{\partial J(\theta; x_i, y_i)}{\partial \theta}$$
+
+Where:
+- $k$ is the batch size.
+
+**Intuition:**
+- At each iteration, a mini-batch of training examples is selected.
+- The gradient is calculated for this mini-batch.
+- The parameters are updated based on the average gradient of the mini-batch.
+
+**Advantages:**
+- More stable updates compared to SGD.
+- Faster convergence than batch gradient descent.
+- Efficient on large datasets.
+
+**Disadvantages:**
+- Requires tuning of the batch size.
+- Computationally more expensive than SGD per iteration.
+
+**Python Implementation:**
+```python
+def mini_batch_gradient_descent(X, y, lr=0.01, epochs=1000, batch_size=32):
+    m, n = X.shape
+    theta = np.zeros(n)
+    for epoch in range(epochs):
+        indices = np.random.permutation(m)
+        X_shuffled = X[indices]
+        y_shuffled = y[indices]
+        for i in range(0, m, batch_size):
+            X_i = X_shuffled[i:i+batch_size]
+            y_i = y_shuffled[i:i+batch_size]
+            gradient = np.dot(X_i.T, (np.dot(X_i, theta) - y_i)) / len(y_i)  # last batch may be smaller
+            theta -= lr * gradient
+    return theta
+```
+
+### 4. Momentum
+
+**Explanation:**
+Momentum helps accelerate gradient vectors in the right directions, thus leading to faster convergence. It accumulates a velocity vector in directions of persistent reduction in the objective function, which helps to smooth the path towards the minimum.
+
+**Mathematical Formulation:**
+
+- $$v_t = \gamma \cdot v_{t-1} + \alpha \cdot \nabla J(\theta)$$
+- $$\theta = \theta - v_t$$
+
+Where:
+- $v_t$ is the velocity.
+- $\gamma$ is the momentum term, typically set between 0.9 and 0.99.
+
+**Intuition:**
+- At each iteration, the gradient is calculated.
+- The velocity is updated based on the current gradient and the previous velocity.
+- The parameters are updated based on the velocity.
+
+**Advantages:**
+- Faster convergence.
+- Reduces oscillations in the parameter updates.
+
+**Disadvantages:**
+- Requires tuning of the momentum term.
+
+**Python Implementation:**
+```python
+def momentum_gradient_descent(X, y, lr=0.01, epochs=1000, gamma=0.9):
+    m, n = X.shape
+    theta = np.zeros(n)
+    v = np.zeros(n)
+    for epoch in range(epochs):
+        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
+        v = gamma * v + lr * gradient
+        theta -= v
+    return theta
+```
+
+### 5. Nesterov Accelerated Gradient (NAG)
+
+**Explanation:**
+NAG is a variant of gradient descent with momentum. It looks ahead by a step and calculates the gradient at that point, thus providing more accurate updates. This method helps to correct the overshooting problem seen in standard momentum.
+
+**Mathematical Formulation:**
+
+- $$v_t = \gamma v_{t-1} + \alpha \cdot \nabla J(\theta - \gamma \cdot v_{t-1})$$
+
+- $$\theta = \theta - v_t$$
+
+**Intuition:**
+- At each iteration, the parameters are temporarily updated using the previous velocity.
+- The gradient is calculated at this lookahead position.
+- The velocity and parameters are then updated based on this gradient.
+
+**Advantages:**
+- More accurate updates compared to standard momentum.
+- Faster convergence.
+
+**Disadvantages:**
+- Requires tuning of the momentum term.
+
+**Python Implementation:**
+```python
+def nesterov_accelerated_gradient(X, y, lr=0.01, epochs=1000, gamma=0.9):
+    m, n = X.shape
+    theta = np.zeros(n)
+    v = np.zeros(n)
+    for epoch in range(epochs):
+        lookahead_theta = theta - gamma * v
+        gradient = np.dot(X.T, (np.dot(X, lookahead_theta) - y)) / m
+        v = gamma * v + lr * gradient
+        theta -= v
+    return theta
+```
+
+### 6. AdaGrad
+
+**Explanation:**
+AdaGrad adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones. It scales the learning rate inversely proportional to the square root of the sum of all historical squared values of the gradient.
+
+**Mathematical Formulation:**
+
+- $$G_t = G_{t-1} + \left(\frac{\partial J(\theta)}{\partial \theta}\right)^2$$
+
+- $$\theta = \theta - \frac{\alpha}{\sqrt{G_t + \epsilon}} \cdot \nabla J(\theta)$$
+
+Where:
+- $G_t$ is the sum of squares of the gradients up to time step $t$.
+- $\epsilon$ is a small constant to avoid division by zero.
+
+**Intuition:**
+- Accumulates the sum of the squares of the gradients for each parameter.
+- Uses this accumulated sum to scale the learning rate.
+- Parameters with large gradients in the past have smaller learning rates.
+
+**Advantages:**
+- Effective for sparse data.
+- Automatically adjusts the learning rate.
+
+**Disadvantages:**
+- The learning rate decreases continuously, which can lead to premature convergence.
+
+**Python Implementation:**
+```python
+def adagrad(X, y, lr=0.01, epochs=1000, epsilon=1e-8):
+    m, n = X.shape
+    theta = np.zeros(n)
+    G = np.zeros(n)
+    for epoch in range(epochs):
+        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
+        G += gradient**2
+        adjusted_lr = lr / (np.sqrt(G) + epsilon)
+        theta -= adjusted_lr * gradient
+    return theta
+```
+
+### 7. RMSprop
+
+**Explanation:**
+RMSprop modifies AdaGrad to perform well in non-convex settings by using a moving average of squared gradients to scale the learning rate. It helps to keep the learning rate in check, especially in the presence of noisy gradients.
+
+**Mathematical Formulation:**
+
+- $$E[g^2]_t = \beta E[g^2]_{t-1} + (1 - \beta)\left(\frac{\partial J(\theta)}{\partial \theta}\right)^2$$
+
+- $$\theta = \theta - \frac{\alpha}{\sqrt{E[g^2]_t + \epsilon}} \cdot \nabla J(\theta)$$
+
+Where:
+- $E[g^2]_t$ is the exponentially decaying average of past squared gradients.
+- $\beta$ is the decay rate.
+
+**Intuition:**
+- Keeps a running average of the squared gradients.
+- Uses this average to scale the learning rate.
+- Parameters with large gradients have their learning rates reduced.
+
+**Advantages:**
+- Effective for non-convex problems.
+- Reduces oscillations in parameter updates.
+
+**Disadvantages:**
+- Requires tuning of the decay rate.
+
+**Python Implementation:**
+```python
+def rmsprop(X, y, lr=0.01, epochs=1000, beta=0.9, epsilon=1e-8):
+    m, n = X.shape
+    theta = np.zeros(n)
+    E_g = np.zeros(n)
+    for epoch in range(epochs):
+        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
+        E_g = beta * E_g + (1 - beta) * gradient**2
+        adjusted_lr = lr / (np.sqrt(E_g) + epsilon)
+        theta -= adjusted_lr * gradient
+    return theta
+```
+
+### 8. Adam
+
+**Explanation:**
+Adam (Adaptive Moment Estimation) combines the advantages of both RMSprop and AdaGrad by keeping an exponentially decaying average of past gradients and past squared gradients.
+
+**Mathematical Formulation:**
+
+- $$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\frac{\partial J(\theta)}{\partial \theta}$$
+- $$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\left(\frac{\partial J(\theta)}{\partial \theta}\right)^2$$
+- $$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$
+- $$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
+- $$\theta = \theta - \frac{\alpha\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
+
+Where:
+- $m_t$ is the first moment (mean) of the gradient.
+- $v_t$ is the second moment (uncentered variance) of the gradient.
+- $\beta_1$, $\beta_2$ are the decay rates for the moment estimates.
+
+**Intuition:**
+- Keeps track of both the mean and the variance of the gradients.
+- Uses these to adaptively scale the learning rate.
+- Provides a balance between AdaGrad and RMSprop.
+
+**Advantages:**
+- Efficient for large datasets.
+- Well-suited for non-convex optimization.
+- Handles sparse gradients well.
+
+**Disadvantages:**
+- Requires careful tuning of hyperparameters.
+- Can be computationally intensive.
+
+**Python Implementation:**
+```python
+def adam(X, y, lr=0.01, epochs=1000, beta1=0.9, beta2=0.999, epsilon=1e-8):
+    m, n = X.shape
+    theta = np.zeros(n)
+    m_t = np.zeros(n)
+    v_t = np.zeros(n)
+    for epoch in range(1, epochs+1):
+        gradient = np.dot(X.T, (np.dot(X, theta) - y)) / m
+        m_t = beta1 * m_t + (1 - beta1) * gradient
+        v_t = beta2 * v_t + (1 - beta2) * gradient**2
+        m_t_hat = m_t / (1 - beta1**epoch)
+        v_t_hat = v_t / (1 - beta2**epoch)
+        theta -= lr * m_t_hat / (np.sqrt(v_t_hat) + epsilon)
+    return theta
+```
+
+These implementations are basic examples of how these optimizers can be implemented in Python using NumPy. In practice, libraries like TensorFlow and PyTorch provide highly optimized and more sophisticated implementations of these and other optimization algorithms.
diff --git a/contrib/machine-learning/xgboost.md b/contrib/machine-learning/xgboost.md
new file mode 100644
index 00000000..1eb7f09a
--- /dev/null
+++ b/contrib/machine-learning/xgboost.md
@@ -0,0 +1,92 @@
+# XGBoost
+XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
+
+## Introduction to Gradient Boosting
+Gradient boosting is a powerful technique for building predictive models that has seen widespread success in various applications.
+- **Boosting Concept**: Boosting originated from the idea of modifying weak learners to improve their predictive capability.
+- **AdaBoost**: The first successful boosting algorithm was Adaptive Boosting (AdaBoost), which utilizes decision stumps as weak learners.
+- **Gradient Boosting Machines (GBM)**: AdaBoost and related algorithms were later reformulated as Gradient Boosting Machines, casting boosting as a numerical optimization problem.
+- **Algorithm Elements**:
+    - _Loss function_: Determines the objective to minimize (e.g., cross-entropy for classification, mean squared error for regression).
+    - _Weak learner_: Typically, decision trees are used as weak learners.
+    - _Additive model_: New weak learners are added iteratively to minimize the loss function, correcting the errors of previous models.
+
+## Introduction to XGBoost
+- eXtreme Gradient Boosting (XGBoost) is a more **regularized form** of Gradient Boosting, as it uses **advanced regularization (L1 & L2)**, improving the model's **generalization capabilities**.
+- It's suitable when there is **a large number of training samples and a small number of features**, or when there is **a mixture of categorical and numerical features**.
+- **Development**: Created by Tianqi Chen, XGBoost is designed for computational speed and model performance.
+- **Key Features**:
+    - _Speed_: Achieved through careful engineering, including parallelization of tree construction, distributed computing, and cache optimization.
+    - _Support for Variations_: XGBoost supports various techniques and optimizations.
+    - _Out-of-Core Computing_: Can handle very large datasets that don't fit into memory.
+- **Advantages**:
+    - _Sparse Optimization_: Suitable for datasets with many zero values.
+    - _Regularization_: Implements advanced regularization techniques (L1 and L2), enhancing generalization capabilities.
+    - _Parallel Training_: Utilizes all CPU cores during training for faster processing.
+    - _Multiple Loss Functions_: Supports different loss functions based on the problem type.
+    - _Bagging and Early Stopping_: Additional techniques for improving performance and efficiency.
+- **Pre-Sorted Decision Tree Algorithm**:
+    1. Features are pre-sorted by their values.
+    2. Traversing the candidate split points involves finding the best split point on a feature at a cost of O(#data).
+    3. Data is split into left and right child nodes after finding the split point.
+    4. Pre-sorting allows for accurate split point determination.
+    - **Limitations**:
+        1. Iterative Traversal: Each iteration requires traversing the entire training data multiple times.
+        2. Memory Consumption: Loading the entire training data into memory limits its size, while not loading it leads to time-consuming read/write operations.
+        3. Space Consumption: Pre-sorting consumes space, storing feature sorting results and split gain calculations.
+
+    XGBoost:
+    ![image](assets/XG_1.webp)
+
+## Develop Your First XGBoost Model
+This code uses the XGBoost library to train a model on the Iris dataset: it splits the data, sets hyperparameters, trains the model, makes predictions, and evaluates accuracy. In this run it achieves an accuracy score of 1.0 on the testing set.
+
+```python
+# XGBoost with Iris Dataset
+# Importing necessary libraries
+import numpy as np
+import xgboost as xgb
+from sklearn.datasets import load_iris
+from sklearn.model_selection import train_test_split
+from sklearn.metrics import accuracy_score
+
+# Loading a sample dataset (Iris dataset)
+data = load_iris()
+X = data.data
+y = data.target
+
+# Splitting the dataset into training and testing sets
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+# Converting the dataset into DMatrix format
+dtrain = xgb.DMatrix(X_train, label=y_train)
+dtest = xgb.DMatrix(X_test, label=y_test)
+
+# Setting hyperparameters for XGBoost
+params = {
+    'max_depth': 3,
+    'eta': 0.1,
+    'objective': 'multi:softmax',
+    'num_class': 3
+}
+
+# Training the XGBoost model
+num_round = 50
+model = xgb.train(params, dtrain, num_round)
+
+# Making predictions on the testing set
+y_pred = model.predict(dtest)
+
+# Evaluating the model
+accuracy = accuracy_score(y_test, y_pred)
+print("Accuracy:", accuracy)
+```
+
+### Output
+
+    Accuracy: 1.0
+
+## Conclusion
+XGBoost's focus on speed, performance, and scalability has made it one of the most widely used and powerful predictive modeling algorithms available. Its ability to handle large datasets efficiently, along with its advanced features and optimizations, makes it a valuable tool in machine learning and data science.
+
+## Reference
+- [Machine Learning Prediction of Turning Precision Using Optimized XGBoost Model](https://www.mdpi.com/2076-3417/12/15/7739)
diff --git a/contrib/mini-projects/Rock_Paper_Scissors_Game.md b/contrib/mini-projects/Rock_Paper_Scissors_Game.md
new file mode 100644
index 00000000..36326b0d
--- /dev/null
+++ b/contrib/mini-projects/Rock_Paper_Scissors_Game.md
@@ -0,0 +1,84 @@
+# Rock Paper Scissors Game
+
+This is a simple implementation of the classic rock-paper-scissors game in Python.
+
+## Code Explanation:
+
+In this section, we import the required libraries (`tkinter` for the GUI and `random` for generating computer choices) and define two functions:
+
+- `determine_winner(user_choice, computer_choice)`:
+    - This function determines the winner of the game based on the choices made by the user and the computer.
+    - It returns a tuple containing the result of the game and the computer's choice.
+
+- `play_game()`:
+    - This function handles the gameplay logic.
+    - It gets the user's choice from the radio buttons, generates a random choice for the computer, determines the winner using the `determine_winner()` function, and updates the result and computer pick labels accordingly.
+
+### Imports and Function Definitions:
+```python
+import tkinter as tk
+import random
+
+def determine_winner(user_choice, computer_choice):
+    """Determine the winner of the game."""
+    if user_choice == computer_choice:
+        return "It's a tie!", computer_choice
+    elif (user_choice == "rock" and computer_choice == "scissors") or \
+         (user_choice == "paper" and computer_choice == "rock") or \
+         (user_choice == "scissors" and computer_choice == "paper"):
+        return "You win!", computer_choice
+    else:
+        return "Computer wins!", computer_choice
+
+def play_game():
+    """Play the game and display the result."""
+    user_choice = user_var.get()
+    computer_choice = random.choice(["rock", "paper", "scissors"])
+    result, computer_pick = determine_winner(user_choice, computer_choice)
+    result_label.config(text=result)
+    computer_label.config(text=f"Computer picked: {computer_pick}")
+```
+### GUI Setup:
+```python
+# Create main window
+root = tk.Tk()
+root.title("Rock Paper Scissors")
+
+# User choice options
+user_var = tk.StringVar()
+user_var.set("rock")  # Default choice
+choices = ["rock", "paper", "scissors"]
+for choice in choices:
+    rb = tk.Radiobutton(root, text=choice, variable=user_var, value=choice)
+    rb.pack()
+```
+- Here, we create the main window for the game using `tkinter.Tk()`. We set the title to "Rock Paper Scissors".
+- We define a `StringVar` to store the user's choice and set the default choice to "rock".
+- We create radio buttons for the user to choose from ("rock", "paper", "scissors") and pack them into the main window.
+
+### Play Button and Result Labels:
+```python
+# Play button
+play_button = tk.Button(root, text="Play", command=play_game)
+play_button.pack()
+
+# Result label
+result_label = tk.Label(root, text="", font=("Helvetica", 16))
+result_label.pack()
+
+# Computer pick label
+computer_label = tk.Label(root, text="", font=("Helvetica", 12))
+computer_label.pack()
+```
+- We create a "Play" button that triggers the `play_game()` function when clicked, using `tkinter.Button`.
+- We create two labels to display the result of the game (`result_label`) and the computer's choice (`computer_label`). Both labels initially display no text and are packed into the main window.
+
+### Mainloop:
+```python
+root.mainloop()
+```
+- Finally, we start the Tkinter event loop using `root.mainloop()`, which keeps the GUI window open and responsive until the user closes it.
diff --git a/contrib/mini-projects/dice_roller.md b/contrib/mini-projects/dice_roller.md
new file mode 100644
index 00000000..0f4e5f93
--- /dev/null
+++ b/contrib/mini-projects/dice_roller.md
@@ -0,0 +1,36 @@
+## Dice Roller
+
+The aim of this project is to replicate a die and generate a random number from 1 to 6.
+
+For this, we will first import the random library, which helps make random choices.
+
+```python
+import random
+
+def dice():
+    dice_no = random.choice([1, 2, 3, 4, 5, 6])
+    return "You got " + str(dice_no)
+```
+
+The above snippet of code defines a function called `dice()` which makes the random choice and returns the number that is generated.
+
+```python
+def roll_dice():
+    print("Hey Guys, you will now roll a single dice using Python!")
+    while True:
+        start = input("Type 'k' to roll the dice: ").lower()
+        if start != 'k':
+            print("Invalid input. Please try again.")
+            continue
+        print(dice())
+        roll_again = input("Do you want to reroll? (Yes/No): ").lower()
+        if roll_again != 'yes':
+            break
+    print("Thanks for rolling the dice.")
+
+roll_dice()
+```
+
+The above code defines a function called `roll_dice()` which interacts with the user.
+
+It prompts the user for input; if the input is `k`, the code proceeds to generate a random number, otherwise it prints an invalid-input message and asks the user to try again.
+
+After the dice has been rolled once, the function asks the user whether they want a reroll in the form of a `yes` or `no` question. The dice is rolled again if the user answers `yes`, and the code exits if the user replies with anything other than yes.
diff --git a/contrib/mini-projects/hangman_game.md b/contrib/mini-projects/hangman_game.md
new file mode 100644
index 00000000..d62db912
--- /dev/null
+++ b/contrib/mini-projects/hangman_game.md
@@ -0,0 +1,220 @@
+# Hangman - Movies Edition
+The Hangman game script is a simple Python program designed to let players guess movie titles. It starts by importing the random module to select a movie from a predefined list. The game displays the movie title as underscores and reveals correctly guessed letters. Players have six attempts to guess the entire title, entering one letter at a time. The script checks if the input is valid, updates the list of guessed letters, and adjusts the number of attempts based on the correctness of the guess. The game continues until the player either guesses the title correctly or runs out of attempts. Upon completion, it congratulates the player for a correct guess or reveals the movie title if the attempts are exhausted. The main execution block ensures the game runs only when the script is executed directly. Below is first the code and then an explanation of the code and its components.
+
+## Code
+
+```python
+import random
+
+def choose_movie():
+    movies = ['avatar', 'titanic', 'inception', 'jurassicpark', 'thegodfather', 'forrestgump', 'interstellar', 'pulpfiction', 'shawshank']
+    return random.choice(movies)
+
+def display_word(movie, guessed_letters):
+    display = ""
+    for letter in movie:
+        if letter in guessed_letters:
+            display += letter + " "
+        else:
+            display += "_ "
+    return display
+
+def hangman_movies():
+    movie = choose_movie()
+    guessed_letters = []
+    attempts = 6
+
+    print("Welcome to Hangman - Movies Edition!")
+    print("Try to guess the name of the movie. You have 6 attempts.")
+
+    while attempts > 0:
+        print("\n" + display_word(movie, guessed_letters))
+        guess = input("Guess a letter: ").lower()
+
+        if len(guess) != 1 or not guess.isalpha():
+            print("Please enter a single letter.")
+            continue
+
+        if guess in guessed_letters:
+            print("You've already guessed that letter.")
+            continue
+
+        guessed_letters.append(guess)
+
+        if guess not in movie:
+            attempts -= 1
+            print(f"Sorry, '{guess}' is not in the movie name. You have {attempts} attempts left.")
+        else:
+            print(f"Good guess! '{guess}' is in the movie name.")
+
+        if "_" not in display_word(movie, guessed_letters):
+            print(f"\nCongratulations! You guessed the movie '{movie.capitalize()}' correctly!")
+            break
+
+    if attempts == 0:
+        print(f"\nSorry, you ran out of attempts. The movie was '{movie.capitalize()}'.")
+
+if __name__ == "__main__":
+    hangman_movies()
+```
+
+## Code Explanation
+
+### Importing the Random Module
+
+```python
+import random
+```
+
+The `random` module is imported to use the `choice` function, which will help in selecting a random movie from a predefined list.
+
+### Choosing a Movie
+
+```python
+def choose_movie():
+    movies = ['avatar', 'titanic', 'inception', 'jurassicpark', 'thegodfather', 'forrestgump', 'interstellar', 'pulpfiction', 'shawshank']
+    return random.choice(movies)
+```
+
+The `choose_movie` function returns a random movie title from the `movies` list.
+
+### Displaying the Word
+
+```python
+def display_word(movie, guessed_letters):
+    display = ""
+    for letter in movie:
+        if letter in guessed_letters:
+            display += letter + " "
+        else:
+            display += "_ "
+    return display
+```
+
+The `display_word` function takes the movie title and a list of guessed letters as arguments. It constructs a string where correctly guessed letters are shown in their positions, and unknown letters are represented by underscores (`_`).
+
+### Hangman Game Logic
+
+```python
+def hangman_movies():
+    movie = choose_movie()
+    guessed_letters = []
+    attempts = 6
+
+    print("Welcome to Hangman - Movies Edition!")
+    print("Try to guess the name of the movie. You have 6 attempts.")
+
+    while attempts > 0:
+        print("\n" + display_word(movie, guessed_letters))
+        guess = input("Guess a letter: ").lower()
+
+        if len(guess) != 1 or not guess.isalpha():
+            print("Please enter a single letter.")
+            continue
+
+        if guess in guessed_letters:
+            print("You've already guessed that letter.")
+            continue
+
+        guessed_letters.append(guess)
+
+        if guess not in movie:
+            attempts -= 1
+            print(f"Sorry, '{guess}' is not in the movie name. You have {attempts} attempts left.")
+        else:
+            print(f"Good guess! '{guess}' is in the movie name.")
+
+        if "_" not in display_word(movie, guessed_letters):
+            print(f"\nCongratulations! You guessed the movie '{movie.capitalize()}' correctly!")
+            break
+
+    if attempts == 0:
+        print(f"\nSorry, you ran out of attempts. The movie was '{movie.capitalize()}'.")
+```
+
+The `hangman_movies` function manages the game's flow:
+
+1. It selects a random movie title using `choose_movie`.
+2. Initializes an empty list `guessed_letters` and sets the number of attempts to 6.
+3. Prints a welcome message and the initial game state.
+4. Enters a loop that continues until the player runs out of attempts or guesses the movie title.
+5. Displays the current state of the movie title with guessed letters revealed.
+6. Prompts the player to guess a letter.
+7. Validates the player's input:
+    - Ensures it is a single alphabetic character.
+    - Checks if the letter has already been guessed.
+8. Adds the guessed letter to `guessed_letters`.
+9. Updates the number of attempts if the guessed letter is not in the movie title.
+10. Congratulates the player if they guess the movie correctly.
+11. Informs the player of the correct movie title if they run out of attempts.
+
+### Main Execution Block
+
+```python
+if __name__ == "__main__":
+    hangman_movies()
+```
+
+This block ensures that the game runs only when the script is executed directly, not when it is imported as a module.
+
+## Output Screenshots:
+
+![image](https://github.com/Aditi22Bansal/learn-python/assets/142652964/a7af1f7e-c80e-4f83-b1f7-c7c5c72158b4)
+![image](https://github.com/Aditi22Bansal/learn-python/assets/142652964/082e54dc-ce68-48fd-85da-3252d7629df8)
+
+## Conclusion
+
+This script provides a simple yet entertaining Hangman game focused on guessing movie titles. It demonstrates the use of functions, loops, conditionals, and user input handling in Python.
diff --git a/contrib/mini-projects/index.md b/contrib/mini-projects/index.md
new file mode 100644
index 00000000..d7a22172
--- /dev/null
+++ b/contrib/mini-projects/index.md
@@ -0,0 +1,8 @@
+# List of sections
+
+- [Dice Roller](dice_roller.md)
+- [Rock Paper Scissors Game](Rock_Paper_Scissors_Game.md)
+- [Password strength checker](password_strength_checker.md)
+- [Path Finder](path-finder.md)
+- [Hangman Game Based on Movies](hangman_game.md)
+- [Tic-tac-toe](tic-tac-toe.md)
diff --git a/contrib/mini-projects/password_strength_checker.md b/contrib/mini-projects/password_strength_checker.md
new file mode 100644
index 00000000..ca65b8eb
--- /dev/null
+++ b/contrib/mini-projects/password_strength_checker.md
@@ -0,0 +1,100 @@
+# Password Strength Checker
+
+> This code is a simple password strength checker.
+It evaluates the strength of a user's password based on the presence of
+uppercase letters, lowercase letters, digits, spaces, and special characters.
+
+### About the code:
+
+- The codebase is broken down into two files, `password_strength_checker.py` and `main.py`.
+
+`password_strength_checker.py` defines a class that evaluates password strength based on character types (uppercase, lowercase, digits, spaces, special characters) and provides feedback on its security, while `main.py` contains the driver code.
+
+```python
+import string
+
+
+class password_checker:
+    def __init__(self, password):
+        self.password = password
+
+    def check_password_strength(self):
+        """This function evaluates the strength of the stored password."""
+
+        password_strength = 0
+        upper_count = 0
+        lower_count = 0
+        num_count = 0
+        space_count = 0
+        specialcharacter_count = 0
+        review = ""
+
+        for char in self.password:
+            if char in string.ascii_uppercase:
+                upper_count += 1
+            elif char in string.ascii_lowercase:
+                lower_count += 1
+            elif char in string.digits:
+                num_count += 1
+            elif char == " ":
+                space_count += 1
+            else:
+                specialcharacter_count += 1
+
+        if upper_count >= 1:
+            password_strength += 1
+        if lower_count >= 1:
+            password_strength += 1
+        if num_count >= 1:
+            password_strength += 1
+        if space_count >= 1:
+            password_strength += 1
+        if specialcharacter_count >= 1:
+            password_strength += 1
+
+        if password_strength == 1:
+            review = "That's a very easy password, not good for use."
+        elif password_strength == 2:
+            review = (
+                "That's a weak password, you should change it to a stronger one."
+            )
+        elif password_strength == 3:
+            review = "Your password is just okay, you may change it."
+        elif password_strength == 4:
+            review = "Your password is hard to guess."
+        elif password_strength == 5:
+            review = "That's a strong password, no one can guess it."
+
+        about_password = {
+            "uppercase_letters": upper_count,
+            "lowercase_letters": lower_count,
+            "numeric_digits": num_count,
+            "space_count": space_count,
+            "specialcharacter_count": specialcharacter_count,
+            "password_strength": password_strength,
+            "about_password_strength": review,
+        }
+        print(about_password)
+
+    @staticmethod
+    def check_password():
+        """This function prompts the user to decide if they want to check their password strength."""
+
+        choice = input("Do you want to check your password's strength? (Y/N): ")
+        if choice.upper() == "Y":
+            return True
+        elif choice.upper() == "N":
+            return False
+        else:
+            print("Invalid input. Please enter 'Y' for Yes or 'N' for No.")
+            return password_checker.check_password()
+```
+### Here's the implementation of `main.py`
+```python
+from password_strength_checker import password_checker
+
+while password_checker.check_password():
+    password = input("Enter your password: ")
+    p = password_checker(password)
+    p.check_password_strength()
+```
\ No newline at end of file
diff --git a/contrib/mini-projects/path-finder.md b/contrib/mini-projects/path-finder.md
new file mode 100644
index 00000000..98e9b87d
--- /dev/null
+++ b/contrib/mini-projects/path-finder.md
@@ -0,0 +1,120 @@
+# Path Finder
+This Python script uses the curses library to visualize the process of finding a path through a maze in real-time within a terminal window. The program represents the maze as a list of lists, where each inner list represents a row in the maze and each string element represents a cell. The maze includes walls (#), a start point (O), and an end point (X), with empty spaces ( ) that can be traversed.
+## The script includes the following main components:
+- Visualization Functions:
+ print_maze(maze, stdscr, path=[]): This function is used to display the maze in the terminal. It utilizes color pairs to distinguish between the maze walls, the path, and unexplored spaces. The current path being explored is displayed with a different color to make it stand out. + +- Utility Functions:
+ find_start(maze, start): This function searches the maze for the starting point (marked as O) and returns its position as a tuple (row, col).
+ find_neighbors(maze, row, col): This function identifies the valid adjacent cells (up, down, left, right) that can be moved to from the current position, + ignoring any walls or out-of-bound positions. + +- Pathfinding Logic:
+ find_path(maze, stdscr): This function implements a Breadth-First Search (BFS) algorithm to find a path from the start point to the end point (X). It uses a + queue to explore each possible path sequentially. As it explores the maze, it updates the display in real-time, allowing the viewer to follow the progress + visually. Each visited position is marked and not revisited, ensuring the algorithm efficiently covers all possible paths without repetition. + +Overall, the script demonstrates an effective use of the curses library to create a dynamic visual representation of the BFS algorithm solving a maze, providing both an educational tool for understanding pathfinding and an example of real-time data visualization in a terminal. + +#### Below is the code of the path finder + + +```python +import curses +from curses import wrapper +import queue +import time + +# Define the structure of the maze as a list of lists where each inner list represents a row. +maze = [ + ["#", "O", "#", "#", "#", "#", "#", "#", "#"], + ["#", " ", " ", " ", " ", " ", " ", " ", "#"], + ["#", " ", "#", "#", " ", "#", "#", " ", "#"], + ["#", " ", "#", " ", " ", " ", "#", " ", "#"], + ["#", " ", "#", " ", "#", " ", "#", " ", "#"], + ["#", " ", "#", " ", "#", " ", "#", " ", "#"], + ["#", " ", "#", " ", "#", " ", "#", "#", "#"], + ["#", " ", " ", " ", " ", " ", " ", " ", "#"], + ["#", "#", "#", "#", "#", "#", "#", "X", "#"] +] + +# Function to print the current state of the maze in the terminal. +def print_maze(maze, stdscr, path=[]): + BLUE = curses.color_pair(1) # Color pair for walls and free paths + RED = curses.color_pair(2) # Color pair for the current path + + for i, row in enumerate(maze): + for j, value in enumerate(row): + if (i, j) in path: + stdscr.addstr(i, j*2, "X", RED) # Print path character with red color + else: + stdscr.addstr(i, j*2, value, BLUE) # Print walls and free paths with blue color + +# Function to locate the starting point (marked 'O') in the maze. +def find_start(maze, start): + for i, row in enumerate(maze): + for j, value in enumerate(row): + if value == start: + return i, j + return None + +# Function to find a path from start ('O') to end ('X') using BFS. +def find_path(maze, stdscr): + start = "O" + end = "X" + start_pos = find_start(maze, start) # Get the start position + + q = queue.Queue() + q.put((start_pos, [start_pos])) # Initialize the queue with the start position + + visited = set() # Set to keep track of visited positions + + while not q.empty(): + current_pos, path = q.get() # Get the current position and path + row, col = current_pos + + stdscr.clear() # Clear the screen + print_maze(maze, stdscr, path) # Print the current state of the maze + time.sleep(0.2) # Delay for visibility + stdscr.refresh() # Refresh the screen + + if maze[row][col] == end: # Check if the current position is the end + return path # Return the path if end is reached + + # Get neighbors (up, down, left, right) that are not walls + neighbors = find_neighbors(maze, row, col) + for neighbor in neighbors: + if neighbor not in visited: + r, c = neighbor + if maze[r][c] != "#": + new_path = path + [neighbor] + q.put((neighbor, new_path)) + visited.add(neighbor) + +# Function to find the valid neighboring cells (not walls or out of bounds). 
+def find_neighbors(maze, row, col):
+    neighbors = []
+    if row > 0:  # UP
+        neighbors.append((row - 1, col))
+    if row + 1 < len(maze):  # DOWN
+        neighbors.append((row + 1, col))
+    if col > 0:  # LEFT
+        neighbors.append((row, col - 1))
+    if col + 1 < len(maze[0]):  # RIGHT
+        neighbors.append((row, col + 1))
+    return neighbors
+
+# Main function to set up curses and run the pathfinding algorithm.
+def main(stdscr):
+    curses.init_pair(1, curses.COLOR_BLUE, curses.COLOR_BLACK)  # Initialize color pair for blue
+    curses.init_pair(2, curses.COLOR_RED, curses.COLOR_BLACK)  # Initialize color pair for red
+
+    find_path(maze, stdscr)  # Find the path using BFS
+    stdscr.getch()  # Wait for a key press before exiting
+
+wrapper(main)  # Use the wrapper to initialize and finalize curses automatically.
+```
diff --git a/contrib/mini-projects/tic-tac-toe.md b/contrib/mini-projects/tic-tac-toe.md
new file mode 100644
index 00000000..8589e64c
--- /dev/null
+++ b/contrib/mini-projects/tic-tac-toe.md
@@ -0,0 +1,91 @@
+# Tic Tac Toe Game
+
+## Overview
+
+### Objective
+- Get three of your symbols (X or O) in a row (horizontally, vertically, or diagonally) on a 3x3 grid.
+
+### Gameplay
+- Two players take turns.
+- Player 1 uses X, Player 2 uses O.
+- Players mark an empty square in each turn.
+
+### Winning
+- The first player to align three of their symbols wins.
+- If all squares are filled without any player aligning three symbols, the game is a draw.
+
+```python
+print("This game is played by two people: player 1 takes X, player 2 takes O.")
+board = [['1','2','3'],['4','5','6'],['7','8','9']]
+x = 'X'
+o = 'O'
+def displayBoard():
+    print(f" {board[0][0]} | {board[0][1]} | {board[0][2]}")
+    print("----------------------------------------")
+    print(f" {board[1][0]} | {board[1][1]} | {board[1][2]}")
+    print("----------------------------------------")
+    print(f" {board[2][0]} | {board[2][1]} | {board[2][2]}")
+    print("----------------------------------------")
+def updateBoard(character, position):
+    row = (position-1)//3
+    column = (position-1)%3
+    board[row][column] = character
+def check_win():
+    for i in range(3):
+        if board[i][0] == board[i][1] == board[i][2]:
+            return 1
+        elif board[0][i] == board[1][i] == board[2][i]:
+            return 1
+    if board[0][2] == board[1][1] == board[2][0]:
+        return 1
+    elif board[0][0] == board[1][1] == board[2][2]:
+        return 1
+    return 0
+def check_position(position):
+    row = (position-1)//3
+    column = (position-1)%3
+    if board[row][column] == x or board[row][column] == o:
+        return 0
+    return 1
+print("============================== welcome to the tic tac toe game ==============================")
+counter = 0
+while 1:
+    if counter % 2 == 0:
+        displayBoard()
+        while 1:
+            choice = int(input(f"Player {(counter%2)+1}, enter your position ('{x}'): "))
+            if choice < 1 or choice > 9:
+                print("Invalid input. Please try again.")
+                continue
+            if check_position(choice):
+                updateBoard(x, choice)
+                if check_win():
+                    print(f"Congratulations!!! Player {(counter % 2)+1} won!!!")
+                    exit(0)
+                else:
+                    counter += 1
+                    break
+            else:
+                print(f"Position {choice} is already occupied. Choose another position.")
+        if counter == 9:
+            print("The match ended in a draw. Better luck next time!")
+            exit(0)
+    else:
+        displayBoard()
+        while 1:
+            choice = int(input(f"Player {(counter%2)+1}, enter your position ('{o}'): "))
+            if choice < 1 or choice > 9:
+                print("Invalid input. Please try again.")
+                continue
+            if check_position(choice):
+                updateBoard(o, choice)
+                if check_win():
+                    print(f"Congratulations!!! Player {(counter%2)+1} won!!!")
+                    exit(0)
+                else:
+                    counter += 1
+                    break
+            else:
+                print(f"Position {choice} is already occupied. Choose another position.")
+    print()
+```
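+As an optional refactor (not part of the original script), the eight winning lines can be listed explicitly, which makes `check_win` shorter and easier to verify:
+
+```python
+WINNING_LINES = [
+    [(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (2, 2)],  # rows
+    [(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)], [(0, 2), (1, 2), (2, 2)],  # columns
+    [(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)],                            # diagonals
+]
+
+def check_win():
+    for line in WINNING_LINES:
+        values = [board[r][c] for r, c in line]
+        if values[0] == values[1] == values[2]:
+            return 1
+    return 0
+```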
diff --git a/contrib/numpy/array-iteration.md b/contrib/numpy/array-iteration.md
new file mode 100644
index 00000000..b0a499f5
--- /dev/null
+++ b/contrib/numpy/array-iteration.md
@@ -0,0 +1,120 @@
+# NumPy Array Iteration
+
+Iterating over arrays in NumPy is a common task when processing data. NumPy provides several ways to iterate over elements of an array efficiently.
+Understanding these methods is crucial for performing operations on array elements effectively.
+
+## 1. Basic Iteration
+
+- Iterating using a basic `for` loop.
+
+### Single-dimensional array
+
+Iterating over a single-dimensional array is straightforward using a basic `for` loop.
+
+```python
+import numpy as np
+
+arr = np.array([1, 2, 3, 4, 5])
+for i in arr:
+    print(i)
+```
+
+#### Output
+
+```python
+1
+2
+3
+4
+5
+```
+
+### Multi-dimensional array
+
+When iterating over a multi-dimensional array, each iteration returns a sub-array along the first axis.
+
+```python
+marr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
+
+for arr in marr:
+    print(arr)
+```
+
+#### Output
+
+```python
+[1 2 3]
+[4 5 6]
+[7 8 9]
+```
+
+## 2. Iterating with `nditer`
+
+- `nditer` is a powerful iterator provided by NumPy for iterating over multi-dimensional arrays.
+- Each iteration yields a single element.
+
+```python
+import numpy as np
+
+arr = np.array([[1, 2, 3], [4, 5, 6]])
+for i in np.nditer(arr):
+    print(i)
+```
+
+#### Output
+
+```python
+1
+2
+3
+4
+5
+6
+```
+
+## 3. Iterating with `ndenumerate`
+
+- `ndenumerate` allows you to iterate with both the index and the value of each element.
+- It gives the index and value as output in each iteration.
+
+```python
+import numpy as np
+
+arr = np.array([[1, 2], [3, 4]])
+for index, value in np.ndenumerate(arr):
+    print(index, value)
+```
+
+#### Output
+
+```python
+(0, 0) 1
+(0, 1) 2
+(1, 0) 3
+(1, 1) 4
+```
+
+## 4. Iterating with flat
+
+- The `flat` attribute returns a 1-D iterator over the array.
+
+```python
+import numpy as np
+
+arr = np.array([[1, 2], [3, 4]])
+for element in arr.flat:
+    print(element)
+```
+
+#### Output
+
+```python
+1
+2
+3
+4
+```
+
+Understanding the various ways to iterate over NumPy arrays can significantly enhance your data processing efficiency.
+
+Whether you are working with single-dimensional or multi-dimensional arrays, NumPy provides versatile tools to iterate and manipulate array elements effectively.
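+As a final note, `nditer` can also modify array elements in place when given `op_flags=['readwrite']` (a supplementary example beyond those above):
+
+```python
+import numpy as np
+
+arr = np.array([[1, 2], [3, 4]])
+with np.nditer(arr, op_flags=['readwrite']) as it:
+    for x in it:
+        x[...] = x * 2  # write back through the iterator
+
+print(arr)  # [[2 4]
+            #  [6 8]]
+```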
diff --git a/contrib/numpy/basic_math.md b/contrib/numpy/basic_math.md
new file mode 100644
index 00000000..d19f2f88
--- /dev/null
+++ b/contrib/numpy/basic_math.md
@@ -0,0 +1,371 @@
+# Basic Mathematics
+## What is a Matrix?
+A matrix is a collection of numbers ordered in rows and columns. Here is one.
+
+<table>
+  <tr><td>1</td><td>2</td><td>3</td></tr>
+  <tr><td>4</td><td>5</td><td>6</td></tr>
+  <tr><td>7</td><td>8</td><td>9</td></tr>
+</table>
+A matrix is generally written within square brackets []. The dimensions of a matrix are represented by (number of rows x number of columns), so the dimensions of the above matrix are 3x3.
+
+Matrices are central to mathematical operations like addition and subtraction, especially as used in Pandas and NumPy. They can contain only numbers, symbols or expressions.
+
+In order to refer to a particular element in the matrix we denote it by:
+A<sub>ij</sub>
+
+where i represents the i-th row and j represents the j-th column of the matrix.
+
+## Scalars and Vectors
+### Scalars
+There exist specific cases of matrices which are also widely used.
+
+A matrix with only one row and one column, i.e. containing only one element, is commonly referred to as a scalar.
+
+The numbers ```[12] ; [-5] ; [0] ; [3.14]``` all represent scalars. Scalars have 0 dimensions.
+
+### Vectors
+Vectors are objects with 1 dimension. They sit somewhere between scalars and matrices. They can also be referred to as one-dimensional matrices.
+
+```[1 3 5]``` represents a vector with dimension 1x3.
+
+A vector is the simplest linear algebraic object. A matrix can be referred to as a collection of vectors.
+
+Vectors are broadly classified into 2 types:
+- Row Vectors: Of the form 1 x n, where n refers to the number of columns the vector has.
+- Column Vectors: Of the form m x 1, where m refers to the number of rows the vector has.
+
+m or n are also called the length of the column and row vector respectively.
+
+## Arrays in Python
+
+To understand arrays, let us start by declaring scalars, vectors and matrices in Python.
+
+First we need to import NumPy. We do so by importing it as 'np', as it provides better readability, namespace clarity and also aligns with the community guidelines.
+
+```python
+import numpy as np
+```
+Next up, we declare a scalar s as,
+```python
+s = 5
+```
+
+Now we declare a vector,
+```python
+v = np.array([5,-2,4])
+```
+On printing v we get the following output,
+```python
+array([ 5, -2,  4])
+```
+By default, a vector is declared as a **'row vector'**.
+
+Finally, we declare matrices,
+```python
+m = np.array([[5,12,6],[-3,0,14]])
+```
+On printing m we get,
+```python
+array([[ 5, 12,  6],
+       [-3,  0, 14]])
+```
+> The type() function is used to return the data type of a given variable.
+
+* type(s) will return **'int'**.
+
+* type(v) will return **'numpy.ndarray'**, which represents an **n-dimensional array**, since it is a 1-dimensional array.
+
+* type(m) will also return **'numpy.ndarray'** since it is a 2-dimensional array.
+
+These are some ways in which arrays are useful in Python.
+
+> The `.shape` attribute is used to return the shape of a given array.
+
+* m.shape returns (2, 3) since we are dealing with a (2,3) matrix.
+
+* v.shape returns (3,), which indicates that it has only one dimension, storing 3 elements in order.
+
+* An 'int' object, however, has no shape, and therefore s.shape raises an error.
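+Putting these checks together in a single snippet (outputs shown as comments):
+
+```python
+import numpy as np
+
+s = 5
+v = np.array([5, -2, 4])
+m = np.array([[5, 12, 6], [-3, 0, 14]])
+
+print(type(s))   # <class 'int'>
+print(type(v))   # <class 'numpy.ndarray'>
+print(m.shape)   # (2, 3)
+print(v.shape)   # (3,)
+# s.shape        -> AttributeError: 'int' object has no attribute 'shape'
+```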
+ +If we want to manually create a tensor we write, +```python +t=np.array([[[5,12,6], [-3,0,14]],[[2,1,8], [-6,2,0]]]) +``` + ## Addition and Subtraction in Matrices + + ### Addition + For 2 matrices to be added to one another they must have **same dimensions**. + + If we have 2 matrices say, + +```python +A=np.array([[5,12,6],[-3,0,14]]) +B=np.array([[2,1,8],[-6,2,0]]) +C= A+B + ``` +The element at position Aij gets added to the element at position Bij. It's that simple! +The above input will give the resultant C as: +```python +array([[7,13,14], + [-9,2,14]]) +``` +### Subtraction + +As we know, subtraction is a type of addition, the same rules apply here. + + If we have 2 matrices say, + +```python +A=np.array([[5,12,6],[-3,0,14]]) +B=np.array([[2,1,8],[-6,2,0]]) +C= A-B + ``` + +The element at position Bij gets subtracted from the element at position Aij. +The above input will give the resultant C as: +```python +array([[3,11,-2], + [3,-2,14]]) +``` +Similarly the same operations can be done with **floating point numbers** as well. + +In a similar fashion, we can add or subtract vectors as well with the condition that they must be of the **same length**. +```python +A=np.array([1,2,3,4,5]) +B=np.array([6,7,8,9,10]) +C= A+B + ``` +The result is a vector of length 5 with C as, +```python +array([7,9,11,13,15]) +``` + ### Addition of scalars with vectors & matrices + + Scalars show unique behaviour when added to matrices or vectors. + + To demonstrate their behaviour, let's use an example, + Let's declare a matrix, + +```python +A=np.array([[5,12,6],[-3,0,14]]) +A+1 +``` +We see that if we perform the above function, i.e. add scalar [1] to the matrix A we get the output, +```python +array([[6,13,7],[-2,1,15]]) +``` +We see that the scalar is added to the matrix elementwise, i.e. each element gets incremented by 1. + +**The same applies to vectors as well.** + +Mathematically, it is not allowed as the shape of scalars are different from vectors or matrices but while programming in Python it works. + +## Transpose of Matrices & Vectors +### Transposing Vectors + +If X is the vector, then the transpose of the vector is represented as XT. It changes a vector of dimension n x 1 into a vector of dimension 1 x n, i.e. a row vector to a column vector and vice versa. + +> * The values are not changing or transforming ; only their position is. +> * Transposing the same vector (object) twice yields the initial vector (object). + +```python +x=np.array([1,2,3)) +``` +Transposing this in python using ```x.T``` will give +```python +array([1,2,3)) +``` +which is the same vector as the one taken as input. + +> 1-Dimensional arrays don't really get transposed (in the memory of the computer) + +To transpose a vector, we need to reshape it first. +```python +x_new= x.reshape(1,3) +x_new.T +``` +will now result in the vector getting transposed, +```python +array([[1], + [2], + [3]]) +``` + +### Transposing Matrices + +If M is a matrix, then the transpose of the matrix M is represented as MT. When transposed, a m x n matrix becomes a n x m matrix. + +The element Mij of the initial matrix becomes the Nji where N is the transposed matrix of M. + +Let's understand this further with the help of of an example, +```python +A = np.array([[1,5,-6],[8,-2,0]]) +``` +The output for the above code snippet will be, +```python +array([[1,5,-6], + [8,-2,0]]) +``` +> **array.T** returns the transpose of an array (matrix). 
+
+```python
+A.T
+```
+will give the output as,
+```python
+array([[ 1,  8],
+       [ 5, -2],
+       [-6,  0]])
+```
+
+Hopefully these examples have made the concept of transposing clear.
+
+## Dot Product
+
+> **np.dot()** returns the dot product of two objects.
+> The dot product is written with a dot, x(dot)y, or with an asterisk, x * y.
+
+### Scalar * Scalar
+Let's start with scalar multiplication first.
+
+```
+ [6] * [5] = [30]
+ [10] * [-2] = [-20]
+```
+It is the same multiplication we have been familiar with since childhood.
+Therefore, ```np.dot([6],[5])``` returns ```30```.
+
+### Vector * Vector
+To take the dot product of two vectors, they must be of the **same length**.
+
+Now let's understand this with an example,
+```python
+x = np.array([2,8,-4])
+y = np.array([1,-7,3])
+```
+
+The dot product is the sum of the elementwise products of the vectors, i.e.
+x * y = ( x1 * y1 ) + ( x2 * y2 ) + ( x3 * y3 ) in the above example.
+
+Therefore, ```np.dot(x,y)``` gives ```-66``` as the output.
+
+We observe that the **dot product of 2 vectors returns a scalar**.
+
+### Scalar * Vector
+
+When we multiply a scalar with a vector, each element of the vector gets multiplied by the scalar individually.
+
+A scalar k multiplied with a vector v = [x1, x2, x3] gives the product [k * x1, k * x2, k * x3].
+
+An example would bring further clarity,
+```python
+y = np.array([1,-7,3])
+y*5
+```
+will give the following output
+```python
+array([  5, -35,  15])
+```
+
+We observe that the **product of a vector and a scalar returns a vector**.
+
+## Dot Product of Matrices
+
+### Scalar * Matrix
+The product of a scalar with a matrix works just like the product of a vector with a scalar, and it is a very useful operation while working in Python.
+
+Each element of the matrix gets multiplied by the scalar individually.
+
+```python
+A = np.array([[1,5,-6],[8,-2,0]])
+B = 3 * A
+```
+will give the resultant B as
+```python
+array([[  3,  15, -18],
+       [ 24,  -6,   0]])
+```
+Thus each element gets multiplied by 3.
+> NOTE: The product of a scalar and a matrix gives a matrix of the same shape as the input matrix.
+
+### Matrix * Matrix
+A matrix can be multiplied with another matrix. However, there are certain compatibility requirements:
+* We can only multiply an m x n matrix with an n x k matrix.
+* In other words, the 2nd dimension of the first matrix has to match the 1st dimension of the 2nd matrix.
+> The product of an m x n matrix with an n x k matrix is an **m x k** matrix.
+
+**Whenever we take the dot product of 2 matrices, we multiply the row vectors of the first matrix with the column vectors of the 2nd matrix.**
+
+For example, let's multiply a row vector with a column vector to understand it further.
+
+```
+              [[1]
+ ([2 8 4]) *   [2]  = [(2*1) + (8*2) + (4*3)] = [30]
+               [3]]
+```
+Now, let's multiply a 2 x 3 matrix with a 3 x 2 matrix.
+```
+ ([[A1,A2,A3],   ([[B1,B2]     ([[(A1 * B1 + A2 * B3 + A3 * B5) , (A1 * B2 + A2 * B4 + A3 * B6)]
+  [A4,A5,A6]]) *  [B3,B4],  =   [(A4 * B1 + A5 * B3 + A6 * B5) , (A4 * B2 + A5 * B4 + A6 * B6)]])
+                  [B5,B6]])
+```
+Thus we obtain a 2 x 2 matrix.
+
+We use the np.dot() method to directly obtain the dot product of the 2 matrices.
+
+Now let's do an example using Python just to solidify our knowledge.
+
+```python
+A = np.array([[5,12,6],[-3,0,14]])
+B = np.array([[2,-1],[8,0],[3,0]])
+np.dot(A,B)
+```
+The output we obtain is,
+```python
+array([[124,  -5],
+       [ 36,   3]])
+```
diff --git a/contrib/numpy/concatenation-of-arrays.md b/contrib/numpy/concatenation-of-arrays.md
new file mode 100644
index 00000000..bf27512d
--- /dev/null
+++ b/contrib/numpy/concatenation-of-arrays.md
@@ -0,0 +1,223 @@
+# Concatenation of Arrays
+
+Concatenation of arrays in NumPy refers to combining multiple arrays into a single array, either along existing axes or by adding new axes. NumPy provides several functions for this purpose.
+
+# Functions of Concatenation
+
+## np.concatenate
+
+Joins two or more arrays along an existing axis.
+
+### Syntax
+
+```python
+numpy.concatenate((arr1, arr2, ...), axis)
+```
+
+Args:
+- arr1, arr2, ...: Sequence of arrays to concatenate.
+- axis: Axis along which the arrays will be joined. Default is 0.
+
+### Example
+
+#### Concatenate along axis 0
+
+```python
+import numpy as np
+# creating 2 arrays
+arr1 = np.array([[1, 2, 3], [7, 8, 9]])
+arr2 = np.array([[4, 5, 6], [10, 11, 12]])
+
+result_1 = np.concatenate((arr1, arr2), axis=0)
+print(result_1)
+```
+
+#### Output
+```
+[[ 1  2  3]
+ [ 7  8  9]
+ [ 4  5  6]
+ [10 11 12]]
+```
+
+#### Concatenate along axis 1
+
+```python
+result_2 = np.concatenate((arr1, arr2), axis=1)
+print(result_2)
+```
+
+#### Output
```
+[[ 1  2  3  4  5  6]
+ [ 7  8  9 10 11 12]]
+```
+
+## np.vstack
+
+Vertical stacking of arrays (row-wise).
+
+### Syntax
+
+```python
+numpy.vstack(arrays)
+```
+
+Args:
+- arrays: Sequence of arrays to stack.
+
+### Example
+
+```python
+import numpy as np
+# create arrays
+arr1 = np.array([[1, 2, 3], [7, 8, 9]])
+arr2 = np.array([[4, 5, 6], [10, 11, 12]])
+
+result = np.vstack((arr1, arr2))
+print(result)
+```
+
+#### Output
+```
+[[ 1  2  3]
+ [ 7  8  9]
+ [ 4  5  6]
+ [10 11 12]]
+```
+
+## np.hstack
+
+Stacks arrays horizontally (column-wise).
+
+### Syntax
+
+```python
+numpy.hstack(arrays)
+```
+
+Args:
+- arrays: Sequence of arrays to stack.
+
+### Example
+
+```python
+import numpy as np
+# create arrays
+arr1 = np.array([[1, 2, 3], [7, 8, 9]])
+arr2 = np.array([[4, 5, 6], [10, 11, 12]])
+
+result = np.hstack((arr1, arr2))
+print(result)
+```
+
+#### Output
+```
+[[ 1  2  3  4  5  6]
+ [ 7  8  9 10 11 12]]
+```
+
+## np.dstack
+
+Stacks arrays along the third axis (depth-wise).
+
+### Syntax
+
+```python
+numpy.dstack(arrays)
+```
+
+Args:
+- arrays: Sequence of arrays to stack.
+
+### Example
+
+```python
+import numpy as np
+# create arrays
+arr1 = np.array([[1, 2, 3], [7, 8, 9]])
+arr2 = np.array([[4, 5, 6], [10, 11, 12]])
+
+result = np.dstack((arr1, arr2))
+print(result)
+```
+
+#### Output
+```
+[[[ 1  4]
+  [ 2  5]
+  [ 3  6]]
+
+ [[ 7 10]
+  [ 8 11]
+  [ 9 12]]]
+```
+
+## np.stack
+
+Joins a sequence of arrays along a new axis.
+
+### Syntax
+
+```python
+numpy.stack(arrays, axis)
+```
+
+Args:
+- arrays: Sequence of arrays to stack.
+- axis: The new axis along which the arrays are stacked. Default is 0.
+
+### Example
+
+```python
+import numpy as np
+# create arrays
+arr1 = np.array([[1, 2, 3], [7, 8, 9]])
+arr2 = np.array([[4, 5, 6], [10, 11, 12]])
+
+result = np.stack((arr1, arr2), axis=0)
+print(result)
+```
+
+#### Output
+```
+[[[ 1  2  3]
+  [ 7  8  9]]
+
+ [[ 4  5  6]
+  [10 11 12]]]
+```
+
+# Concatenation with Mixed Dimensions
+
+When concatenating arrays with different shapes, it's often necessary to reshape them or add new axes to make their dimensions compatible.
+
+## Example
+
+#### Concatenate along axis 0
+
+```python
+arr1 = np.array([[1, 2, 3], [4, 5, 6]])
+arr2 = np.array([7, 8, 9])
+
+result_0 = np.concatenate((arr1, arr2[np.newaxis, :]), axis=0)
+print(result_0)
+```
+
+#### Output
+```
+[[1 2 3]
+ [4 5 6]
+ [7 8 9]]
+```
+
+#### Concatenate along axis 1
+
+To join along axis 1, the second array must have the same number of rows as `arr1`, so here we use a 2-element array and turn it into a (2, 1) column with `np.newaxis`:
+
+```python
+arr3 = np.array([7, 8])
+
+result_1 = np.concatenate((arr1, arr3[:, np.newaxis]), axis=1)
+print(result_1)
+```
+
+#### Output
+```
+[[1 2 3 7]
+ [4 5 6 8]]
+```
+
diff --git a/contrib/numpy/datatypes.md b/contrib/numpy/datatypes.md
new file mode 100644
index 00000000..fb9df47b
--- /dev/null
+++ b/contrib/numpy/datatypes.md
@@ -0,0 +1,267 @@
+# Numpy Data Types
+In NumPy, data types play a crucial role in representing and manipulating numerical data.
+
+NumPy supports the following data types:
+
+- `i` - integer
+- `b` - boolean
+- `u` - unsigned integer
+- `f` - float
+- `c` - complex float
+- `m` - timedelta
+- `M` - datetime
+- `O` - object
+- `S` - string
+- `U` - unicode string
+
+
+_Referred from: W3schools_
+
+## The dtype Attribute
+The `dtype` attribute holds the data type of a NumPy array object. Note that it is an attribute, not a function, so it is written without parentheses.
+
+Example 1
+``` python
+    import numpy as np
+
+    arr = np.array([1, 2, 3, 4])
+
+    print(arr.dtype)
+
+    # Output: int64
+```
+
+Example 2
+``` python
+    import numpy as np
+
+    arr = np.array(['apple', 'banana', 'cherry'])
+
+    print(arr.dtype)
+
+    # Output: <U6
+```
+ **dtype** : Data type of the resulting array. (By default is float)
+ **delimiter**: String or character separating columns. (By default is whitespace)
+ **converters**: Dictionary mapping column number to a function to convert that column's string to a float.
+ **skiprows**: Number of lines to skip at the beginning of the file.
+ **usecols**: Which columns to read starting from 0. + +- #### Example for `loadtxt`: + + **example.txt**
+ + ![image](https://github.com/Santhosh-Siddhardha/learn-python/assets/103999924/a0148d29-5fba-45fa-b3f4-058406b3016b) + + **Code**
+ ```python + import numpy as np + + arr = np.loadtxt("example.txt", dtype=int) + print(arr) + ``` + + **Output**
+ ```python + [1 2 3 4 5] + ``` + +
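+
+The `delimiter`, `skiprows` and `usecols` parameters listed above can be combined. As a minimal sketch (the inline CSV text here is hypothetical; a file path works the same way), note that `loadtxt` also accepts any file-like object:
+
+```python
+import io
+import numpy as np
+
+data = io.StringIO("id,x,y\n1,2.0,3.0\n2,4.0,5.0")
+# skip the header line and keep only columns 1 and 2
+arr = np.loadtxt(data, delimiter=",", skiprows=1, usecols=(1, 2))
+print(arr)
+# [[2. 3.]
+#  [4. 5.]]
+```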
+ +### 2. numpy.genfromtxt(): +The `genfromtxt` function is similar to loadtxt but provides more flexibility. It handles missing values (such as NaNs), allows custom converters +for data parsing, and can handle different data types within the same file. It’s particularly useful for handling complex data formats. + +- #### Syntax: + ```python + numpy.genfromtxt(fname, dtype=float, delimiter=None, converters=None, missing_values=None, filling_values=None, usecols=None) + ``` + + **fname** : Name of the file
+ **dtype** : Data type of the resulting array. (By default is float)
+ **delimiter**: String or character separating columns; default is any whitespace.
+ **converters**: Dictionary mapping column number to a function to convert that column's string to a float.
+ **missing_values**: Set of strings corresponding to missing data.
+ **filling_values**: Value used to fill in missing data. Default is NaN.
+ **usecols**: Which columns to read starting from 0. + +- #### Example for `genfromtxt`: + + **example.txt**
+ + ![image](https://github.com/Santhosh-Siddhardha/learn-python/assets/103999924/3f9cdd91-4255-4e30-923d-f29c5f237798) + + + **Code**
+ ```python + import numpy as np + + arr = np.genfromtxt("example.txt", dtype='str', usecols=1) + print(arr) + ``` + + **Output**
+ ```python + ['Name' 'Kohli' 'Dhoni' 'Rohit'] + ``` + +
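+
+Since handling missing values is `genfromtxt`'s main advantage over `loadtxt`, a small sketch of that feature may help (the inline data below is hypothetical):
+
+```python
+import io
+import numpy as np
+
+data = io.StringIO("1,2,,4\n5,,7,8")
+# empty fields are treated as missing and replaced by filling_values
+arr = np.genfromtxt(data, delimiter=",", filling_values=0)
+print(arr)
+# [[1. 2. 0. 4.]
+#  [5. 0. 7. 8.]]
+```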
+ + +### 3. numpy.load(): +`load` method is used to load arrays saved in NumPy’s native binary format (.npy or .npz). These files preserve the array structure, data types, and metadata. +It’s an efficient way to store and load large arrays. + +- #### Syntax: + ```python + numpy.load(fname, mmap_mode=None, encoding='ASCII') + ``` + + **fname** : Name of the file
 **mmap_mode** : Memory-map the file using the given mode (r, r+, w+, c). (By Default None.) Memory-mapping only works with arrays stored in a binary .npy file on disk, not with compressed archives like .npz.
 **encoding** : Encoding used only when reading Python 2 strings. (By Default 'ASCII')
+ +- #### Example for `load`: + + **Code**
+ ```python
+ import numpy as np
+
+ arr = np.array(['a','b','c'])
+ np.savez('example.npz', array=arr)  # stores arr in example.npz in NumPy's native binary format
+
+ data = np.load('example.npz')
+ print(data['array'])
+ ```
+
+ **Output**
+ ```python + ['a' 'b' 'c'] + ``` +
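+
+Because memory-mapping only works with plain `.npy` files (see `mmap_mode` above), here is a minimal sketch of that use case:
+
+```python
+import numpy as np
+
+arr = np.arange(10)
+np.save('big_array.npy', arr)  # .npy, not .npz, so it can be memory-mapped
+
+# opens the file read-only without loading it all into memory
+mm = np.load('big_array.npy', mmap_mode='r')
+print(mm[3:6])  # [3 4 5]
+```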
+ +These methods empower users to seamlessly integrate data into their scientific workflows, whether from text files or binary formats. diff --git a/contrib/numpy/operations-on-arrays.md b/contrib/numpy/operations-on-arrays.md new file mode 100644 index 00000000..e3966d27 --- /dev/null +++ b/contrib/numpy/operations-on-arrays.md @@ -0,0 +1,281 @@ +# Operations on Arrays + +## NumPy Arithmetic Operations + +NumPy offers a broad array of operations for arrays, including arithmetic functions. + +The arithmetic operations in NumPy are popular for their simplicity and efficiency in handling array calculations. + +**Addition** + +we can use the `+` operator to perform element-wise addition between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 + array_2 +print("Utilizing the + operator:", result_1) +``` + +**Output:** +``` +Utilizing the + operator: [10 13 16 19] +``` + +**Subtraction** + +we can use the `-` operator to perform element-wise subtraction between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 - array_2 +print("Utilizing the - operator:", result_1) +``` + +**Output:** +``` +Utilizing the - operator: [8 7 6 5] +``` + +**Multiplication** + +we can use the `*` operator to perform element-wise multiplication between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 * array_2 +print("Utilizing the * operator:", result_1) +``` + +**Output:** +``` +Utilizing the * operator: [9 30 55 84] +``` + +**Division** + +we can use the `/` operator to perform element-wise division between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 / array_2 +print("Utilizing the / operator:", result_1) +``` + +**Output:** +``` +Utilizing the / operator: [9. 3.33333333 2.2 1.71428571] +``` + +**Exponentiation** + +we can use the `**` operator to perform element-wise exponentiation between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 ** array_2 +print("Utilizing the ** operator:", result_1) +``` + +**Output:** +``` +Utilizing the ** operator: [9 1000 161051 35831808] +``` + +**Modulus** + +We can use the `%` operator to perform element-wise modulus operations between two or more NumPy arrays. + +**Code** +```python +import numpy as np +array_1 = np.array([9, 10, 11, 12]) +array_2 = np.array([1, 3, 5, 7]) +result_1 = array_1 % array_2 +print("Utilizing the % operator:", result_1) +``` + +**Output:** +``` +Utilizing the % operator: [0 1 1 5] +``` + +
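+
+All of these operators work element-wise and also broadcast against plain scalars; under the hood each operator corresponds to a NumPy ufunc such as `np.add` or `np.multiply`. A small sketch:
+
+```python
+import numpy as np
+array_1 = np.array([9, 10, 11, 12])
+
+# a scalar is broadcast to every element
+print(array_1 + 100)          # [109 110 111 112]
+
+# the + operator is equivalent to the np.add ufunc
+print(np.add(array_1, 100))   # [109 110 111 112]
+```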
+
+## NumPy Comparison Operations
+
+ +NumPy provides various comparison operators that can compare elements across multiple NumPy arrays. + +**less than operator** + +The `<` operator returns `True` if the value of operand on left is less than the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_1 = array_1 < array_2 +print("array_1 < array_2:",result_1) +``` +**Output:** +``` +array_1 < array_2 : [True False False] +``` + +**less than or equal to operator** + +The `<=` operator returns `True` if the value of operand on left is lesser than or equal to the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_1 = array_1 <= array_2 +print("array_1 <= array_2:",result_1) +``` +**Output:** +``` +array_1 <= array_2: [True True False] +``` + +**greater than operator** + +The `>` operator returns `True` if the value of operand on left is greater than the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_2 = array_1 > array_2 +print("array_1 > array_2:",result_2) +``` +**Output:** +``` +array_1 > array_2 : [False False True] +``` + +**greater than or equal to operator** + +The `>=` operator returns `True` if the value of operand on left is greater than or equal to the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_2 = array_1 >= array_2 +print("array_1 >= array_2:",result_2) +``` +**Output:** +``` +array_1 >= array_2: [False True True] +``` + +**equal to operator** + +The `==` operator returns `True` if the value of operand on left is same as the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_3 = array_1 == array_2 +print("array_1 == array_2:",result_3) +``` +**Output:** +``` +array_1 == array_2: [False True False] +``` + +**not equal to operator** + +The `!=` operator returns `True` if the value of operand on left is not equal to the value of operand on right. + +**Code** +```python +import numpy as np +array_1 = np.array([12,15,20]) +array_2 = np.array([20,15,12]) +result_3 = array_1 != array_2 +print("array_1 != array_2:",result_3) +``` +**Output:** +``` +array_1 != array_2: [True False True] +``` + +
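+
+A common use of these comparison operators is boolean masking, where the resulting `True`/`False` array is used to select elements. A short sketch:
+
+```python
+import numpy as np
+array_1 = np.array([12, 15, 20])
+array_2 = np.array([20, 15, 12])
+
+mask = array_1 < array_2   # [ True False False]
+print(array_1[mask])       # [12] -> keeps only elements where the mask is True
+```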
+
+## NumPy Logical Operations
+
+Logical operators perform Boolean algebra, the branch of algebra that deals with `True` and `False` statements.
+
+The examples below illustrate the logical operations AND, OR, and NOT using the np.logical_and(), np.logical_or(), and np.logical_not() functions, respectively.
+
+**Logical AND**
+
+Evaluates the element-wise truth value of `array_1` AND `array_2`
+
+**Code**
+```python
+import numpy as np
+array_1 = np.array([True, False, True])
+array_2 = np.array([False, False, True])
+print(np.logical_and(array_1, array_2))
+```
+**Output:**
+```
+[False False True]
+```
+
+**Logical OR**
+
+Evaluates the element-wise truth value of `array_1` OR `array_2`
+
+**Code**
+```python
+import numpy as np
+array_1 = np.array([True, False, True])
+array_2 = np.array([False, False, True])
+print(np.logical_or(array_1, array_2))
+```
+**Output:**
+```
+[True False True]
+```
+
+**Logical NOT**
+
+Evaluates the element-wise truth value of NOT `array_1`. Note that np.logical_not() takes a single array as input.
+
+**Code**
+```python
+import numpy as np
+array_1 = np.array([True, False, True])
+print(np.logical_not(array_1))
+```
+**Output:**
+```
+[False True False]
+```
diff --git a/contrib/numpy/reshape-array.md b/contrib/numpy/reshape-array.md
new file mode 100644
index 00000000..91da3660
--- /dev/null
+++ b/contrib/numpy/reshape-array.md
@@ -0,0 +1,57 @@
+# Numpy Array Shape and Reshape
+
+In NumPy, the primary data structure is the ndarray (N-dimensional array). An array can have one or more dimensions, and it organizes your data efficiently.
+
+Let us create a 2D array
+
+``` python
+import numpy as np
+
+numbers = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
+print(numbers)
+```
+
+#### Output:
+
+``` python
+[[1 2 3 4]
+ [5 6 7 8]]
+```
+
+## Changing Array Shape using `reshape()`
+
+The `reshape()` function allows you to rearrange the data within a NumPy array.
+
+It takes the new dimensions, e.g. rows and columns, as arguments. `reshape()` can add or remove dimensions; for instance, it can convert a 1D array into a 2D array or vice versa. It returns a reshaped array and leaves the original array unchanged.
+
+``` python
+arr_1d = np.array([1, 2, 3, 4, 5, 6])  # 1D array
+arr_2d = arr_1d.reshape(2, 3)  # Reshaping into 2 rows and 3 cols
+
+print(arr_2d)
+```
+
+#### Output:
+
+``` python
+[[1 2 3]
+ [4 5 6]]
+```
+
+## Changing Array Shape using `resize()`
+
+The `resize()` method modifies the shape of a NumPy array directly, in place.
+
+It takes the new shape, e.g. rows and columns, as its argument.
+
+``` python
+import numpy as np
+arr_1d = np.array([1, 2, 3, 4, 5, 6])
+
+arr_1d.resize((2, 3))  # 2 rows and 3 cols; modifies arr_1d in place
+print(arr_1d)
+```
+
+#### Output:
+
+``` python
+[[1 2 3]
+ [4 5 6]]
+```
diff --git a/contrib/numpy/saving_numpy_arrays_to_files.md b/contrib/numpy/saving_numpy_arrays_to_files.md
new file mode 100644
index 00000000..c8935b8f
--- /dev/null
+++ b/contrib/numpy/saving_numpy_arrays_to_files.md
@@ -0,0 +1,126 @@
+# Saving NumPy Arrays to Files
+
+- Saving arrays in NumPy is important due to its efficiency in storage and speed, maintaining data integrity and precision, and offering convenience and interoperability.
+- NumPy provides several methods to save arrays efficiently, either in binary or text formats.
+- The primary methods are `save`, `savez`, and `savetxt`.
+
+### 1. numpy.save():
+
+The `np.save` function saves a single NumPy array to a binary file with a `.npy` extension. This format is efficient and preserves the array's data type and shape.
+
+#### Syntax :
+
+```python
+numpy.save(file, arr, allow_pickle=True, fix_imports=True)
+```
+- **file** : Name of the file.
+- **arr** : Array to be saved.
+- **allow_pickle** : Optional parameter. Allows saving object arrays using Python pickles. (By Default True)
+- **fix_imports** : Optional parameter. Fixes issues for Python 2 to Python 3 compatibility. (By Default True)
+
+#### Example :
+
+```python
+import numpy as np
+
+arr = np.array([1,2,3,4,5])
+np.save("example.npy", arr)  # saves arr into example.npy file in binary format
+```
+
+In order to load the array from example.npy,
+
+```python
+arr1 = np.load("example.npy")
+print(arr1)
+```
+**Output** :
+
+```python
+[1 2 3 4 5]
+```
+### 2. numpy.savez():
+
+The `np.savez` function saves multiple NumPy arrays into a single file with a `.npz` extension. Each array is stored with a unique name.
+
+#### Syntax :
+
+```python
+numpy.savez(file, *args, **kwds)
+```
+- **file** : Name of the file.
+- **args** : Arrays to be saved. (If arrays are unnamed, they are stored with default names like arr_0, arr_1, etc.)
+- **kwds** : Named arrays to be saved.
+
+#### Example :
+
+```python
+import numpy as np
+
+arr1 = np.array([1,2,3,4,5])
+arr2 = np.array(['a','b','c','d'])
+arr3 = np.array([1.2,3.4,5])
+np.savez('example.npz', a1=arr1, a2=arr2, a3=arr3)  # saves the arrays in npz format
+```
+
+In order to load the arrays from example.npz,
+
+```python
+arr = np.load('example.npz')
+print(arr['a1'])
+print(arr['a2'])
+print(arr['a3'])
+```
+**Output** :
+```python
+[1 2 3 4 5]
+['a' 'b' 'c' 'd']
+[1.2 3.4 5. ]
+```
+
+### 3. np.savetxt()
+
+The `np.savetxt` function saves a NumPy array to a text file, such as `.txt` or `.csv`. This format is human-readable and can be used for interoperability with other tools.
+
+#### Syntax :
+
+```python
+numpy.savetxt(fname, X, delimiter=' ', newline='\n', header='', footer='', encoding=None)
+```
+- **fname** : Name of the file.
+- **X** : Array to be saved.
+- **delimiter** : Optional parameter. Character or string used to separate columns. (By Default " ")
+- **newline** : Optional parameter. Character for separating lines. (By Default "\n")
+- **header** : Optional parameter. String that is written at the beginning of the file.
+- **footer** : Optional parameter. String that is written at the end of the file.
+- **encoding** : Optional parameter. Encoding of the output file. (By Default None)
+
+#### Example :
+
+```python
+import numpy as np
+
+arr = np.array([1.1,2.2,3,4.4,5])
+np.savetxt("example.txt", arr)  # saves the array in example.txt
+```
+
+In order to load the array from example.txt,
+
+```python
+arr1 = np.loadtxt("example.txt")
+print(arr1)
+```
+**Output** :
+```python
+[1.1 2.2 3.  4.4 5. ]
+```
+
+By using these methods, you can efficiently save and load NumPy arrays in various formats suitable for your needs.
+
diff --git a/contrib/numpy/sorting-array.md b/contrib/numpy/sorting-array.md
new file mode 100644
index 00000000..65e9c255
--- /dev/null
+++ b/contrib/numpy/sorting-array.md
@@ -0,0 +1,104 @@
+# Sorting NumPy Arrays
+- Sorting arrays is a common operation in data manipulation and analysis.
+- NumPy provides various functions to sort arrays efficiently.
+- The primary methods are `numpy.sort`, `numpy.argsort`, and `numpy.lexsort`.
+
+### 1. numpy.sort()
+
+The `numpy.sort` function returns a sorted copy of an array.
+
+#### Syntax :
+
+```python
+numpy.sort(arr, axis=-1, kind=None, order=None)
+```
+- **arr** : Array to be sorted.
+- **axis** : Axis along which to sort. (By Default is -1)
+- **kind** : Sorting algorithm. Options are 'quicksort', 'mergesort', 'heapsort', and 'stable'. (By Default 'quicksort')
+- **order** : When arr is an array with fields defined, this argument specifies which fields to compare first.
+
+#### Example :
+
+```python
+import numpy as np
+
+arr = np.array([1,7,0,4,6])
+sarr = np.sort(arr)
+print(sarr)
+```
+
+**Output** :
+```python
+[0 1 4 6 7]
+```
+
+### 2. numpy.argsort()
+
+The `numpy.argsort` function returns the indices that would sort an array. Using those indices you can sort the array.
+
+#### Syntax :
+
+```python
+numpy.argsort(a, axis=-1, kind=None, order=None)
+```
+- **a** : Array to be sorted.
+- **axis** : Axis along which to sort. (By Default is -1)
+- **kind** : Sorting algorithm. Options are 'quicksort', 'mergesort', 'heapsort', and 'stable'. (By Default 'quicksort')
+- **order** : When a is an array with fields defined, this argument specifies which fields to compare first.
+
+#### Example :
+
+```python
+import numpy as np
+
+arr = np.array([2.1,7,4.2,4.3,6])
+indices = np.argsort(arr)
+print(indices)
+s_arr = arr[indices]
+print(s_arr)
+```
+
+**Output** :
+```python
+[0 2 3 4 1]
+[2.1 4.2 4.3 6.  7. ]
+```
+
+### 3. np.lexsort()
+
+The np.lexsort function performs an indirect stable sort using a sequence of keys.
+
+#### Syntax :
+
+```python
+numpy.lexsort(keys, axis=-1)
+```
+- **keys** : Sequence of arrays to sort by. The last key is the primary sort key.
+- **axis** : Axis to be indirectly sorted. (By Default -1)
+
+#### Example :
+
+```python
+import numpy as np
+
+a = np.array([5,4,3,2])
+b = np.array(['a','d','c','b'])
+indices = np.lexsort((a,b))  # sorts primarily by b, then by a
+print(indices)
+
+s_arr = a[indices]
+print(s_arr)
+
+s_arr = b[indices]
+print(s_arr)
+```
+
+**Output** :
+```python
+[0 3 2 1]
+[5 2 3 4]
+['a' 'b' 'c' 'd']
+```
+
+NumPy provides powerful and flexible functions for sorting arrays, including `np.sort`, `np.argsort`, and `np.lexsort`.
+These functions support sorting along different axes, using various algorithms, and sorting by multiple keys, making them suitable for a wide range of data manipulation tasks.
diff --git a/contrib/numpy/splitting-arrays.md b/contrib/numpy/splitting-arrays.md
new file mode 100644
index 00000000..3228cb71
--- /dev/null
+++ b/contrib/numpy/splitting-arrays.md
@@ -0,0 +1,135 @@
+# Splitting Arrays
+
+Splitting a NumPy array refers to dividing the array into smaller sub-arrays. This can be done in various ways, along specific rows, columns, or even based on conditions applied to the elements.
+
+There are several ways to split a NumPy array in Python using different functions. Some of these methods include:
+
+- Splitting a NumPy array using `numpy.split()`
+- Splitting a NumPy array using `numpy.array_split()`
+- Splitting a NumPy array using `numpy.vsplit()`
+- Splitting a NumPy array using `numpy.hsplit()`
+- Splitting a NumPy array using `numpy.dsplit()`
+
+## NumPy split()
+
+The `numpy.split()` function divides an array into equal parts along a specified axis.
+ +**Code** +```python +import numpy as np +array = np.array([1,2,3,4,5,6]) +#Splitting the array into 3 equal parts along axis=0 +result = np.split(array,3) +print(result) +``` + +**Output** +``` +[array([1, 2]), array([3, 4]), array([5, 6])] +``` + +## NumPy array_split() + +The `numpy.array_split()` function divides an array into equal or nearly equal sub-arrays. Unlike `numpy.split()`, it allows for uneven splitting, making it useful when the array cannot be evenly divided by the specified number of splits. + +**Code** +```python +import numpy as np +array = np.array([1,2,3,4,5,6,7,8]) +#Splitting the array into 3 unequal parts along axis=0 +result = np.array_split(array,3) +print(result) +``` + +**Output** +``` +[array([1, 2, 3]), array([4, 5, 6]), array([7, 8])] +``` + +## NumPy vsplit() + +The `numpy.vsplit()`, which is vertical splitting (row-wise), divides an array along the vertical axis (axis=0). + +**Code** +```python +import numpy as np +array = np.array([[1, 2, 3], + [4, 5, 6], + [7, 8, 9], + [10, 11, 12]]) +#Vertically Splitting the array into 2 subarrays along axis=0 +result = np.vsplit(array,2) +print(result) +``` + +**Output** +``` +[array([[1, 2, 3], + [4, 5, 6]]), array([[ 7, 8, 9], + [10, 11, 12]])] +``` + + +## NumPy hsplit() + +The `numpy.hsplit()`, which is horizontal splitting (column-wise), divides an array along the horizontal axis (axis=1). + +**Code** +```python +import numpy as np +array = np.array([[1, 2, 3, 4], + [5, 7, 8, 9], + [11,12,13,14]]) +#Horizontally Splitting the array into 4 subarrays along axis=1 +result = np.hsplit(array,4) +print(result) +``` + +**Output** +``` +[array([[ 1], + [ 5], + [11]]), array([[ 2], + [ 7], + [12]]), array([[ 3], + [ 8], + [13]]), array([[ 4], + [ 9], + [14]])] +``` + +## NumPy dsplit() + +The`numpy.dsplit()` is employed for splitting arrays along the third axis (axis=2), which is applicable for 3D arrays and beyond. + +**Code** +```python +import numpy as np +#3D array +array = np.array([[[ 1, 2, 3, 4,], + [ 5, 6, 7, 8,], + [ 9, 10, 11, 12]], + [[13, 14, 15, 16,], + [17, 18, 19, 20,], + [21, 22, 23, 24]]]) +#Splitting the array along axis=2 +result = np.dsplit(array,2) +print(result) +``` + +**Output** +``` +[array([[[ 1, 2], + [ 5, 6], + [ 9, 10]], + + [[13, 14], + [17, 18], + [21, 22]]]), array([[[ 3, 4], + [ 7, 8], + [11, 12]], + + [[15, 16], + [19, 20], + [23, 24]]])] +``` diff --git a/contrib/numpy/statistical-functions.md b/contrib/numpy/statistical-functions.md new file mode 100644 index 00000000..06fbae22 --- /dev/null +++ b/contrib/numpy/statistical-functions.md @@ -0,0 +1,154 @@ +# Statistical Operations on Arrays + +Statistics involves collecting data, analyzing it, and drawing conclusions from the gathered information. + +NumPy provides powerful statistical functions to perform efficient data analysis on arrays, including `minimum`, `maximum`, `mean`, `median`, `variance`, `standard deviation`, and more. + +## Minimum + +In NumPy, the minimum value of an array is the smallest element present. + +The smallest element of an array is calculated using the `np.min()` function. + +**Code** +```python +import numpy as np +array = np.array([100,20,300,400]) +#Calculating the minimum +result = np.min(array) +print("Minimum :", result) +``` + +**Output** +``` +Minimum : 20 +``` + +## Maximum + +In NumPy, the maximum value of an array is the largest element present. + +The largest element of an array is calculated using the `np.max()` function. 
+ +**Code** +```python +import numpy as np +array = np.array([100,20,300,400]) +#Calculating the maximum +result = np.max(array) +print("Maximum :", result) +``` + +**Output** +``` +Maximum : 400 +``` + +## Mean + +The mean value of a NumPy array is the average of all its elements. + +It is calculated by summing all the elements and then dividing by the total number of elements. + +The mean of an array is calculated using the `np.mean()` function. + +**Code** +```python +import numpy as np +array = np.array([10,20,30,40]) +#Calculating the mean +result = np.mean(array) +print("Mean :", result) +``` + +**Output** +``` +Mean : 25.0 +``` + +## Median + +The median value of a NumPy array is the middle value in a sorted array. + +It separates the higher half of the data from the lower half. + +The median of an array is calculated using the `np.median()` function. + +It is important to note that: + +- If the number of elements is `odd`, the median is the middle element. +- If the number of elements is `even`, the median is the average of the two middle elements. + +**Code** +```python +import numpy as np +#The number of elements is odd +array = np.array([5,6,7,8,9]) +#Calculating the median +result = np.median(array) +print("Median :", result) +``` + +**Output** +``` +Median : 7.0 +``` + +**Code** +```python +import numpy as np +#The number of elements is even +array = np.array([1,2,3,4,5,6]) +#Calculating the median +result = np.median(array) +print("Median :", result) +``` + +**Output** +``` +Median : 3.5 +``` + +## Variance + +Variance in a NumPy array measures the spread or dispersion of data points. + +Calculated as the average of the squared differences from the mean. + +The variance of an array is calculated using the `np.var()` function. + +**Code** +```python +import numpy as np +array = np.array([10,70,80,50,30]) +#Calculating the variance +result = np.var(array) +print("Variance :", result) +``` + +**Output** +``` +Variance : 656.0 +``` + +## Standard Deviation + +The standard deviation of a NumPy array measures the amount of variation or dispersion of the elements in the array. + +It is calculated as the square root of the average of the squared differences from the mean, providing insight into how spread out the values are around the mean. + +The standard deviation of an array is calculated using the `np.std()` function. + +**Code** +```python +import numpy as np +array = np.array([25,30,40,55,75,100]) +#Calculating the standard deviation +result = np.std(array) +print("Standard Deviation :", result) +``` + +**Output** +``` +Standard Deviation : 26.365486699260625 +``` diff --git a/contrib/numpy/universal-functions.md b/contrib/numpy/universal-functions.md new file mode 100644 index 00000000..090f33c5 --- /dev/null +++ b/contrib/numpy/universal-functions.md @@ -0,0 +1,130 @@ +# Universal functions (ufunc) + +--- + +A `ufunc`, short for "`universal function`," is a fundamental concept in NumPy, a powerful library for numerical computing in Python. Universal functions are highly optimized, element-wise functions designed to perform operations on data stored in NumPy arrays. + + + +## Uses of Ufuncs in NumPy + +Universal functions (ufuncs) in NumPy provide a wide range of functionalities for efficient and powerful numerical computations. Below is a detailed explanation of their uses: + +### 1. **Element-wise Operations** +Ufuncs perform operations on each element of the arrays independently. 
+ +```python +import numpy as np + +A = np.array([1, 2, 3, 4]) +B = np.array([5, 6, 7, 8]) + +# Element-wise addition +np.add(A, B) # Output: array([ 6, 8, 10, 12]) +``` + +### 2. **Broadcasting** +Ufuncs support broadcasting, allowing operations on arrays with different shapes, making it possible to perform operations without explicitly reshaping arrays. + +```python +C = np.array([1, 2, 3]) +D = np.array([[1], [2], [3]]) + +# Broadcasting addition +np.add(C, D) # Output: array([[2, 3, 4], [3, 4, 5], [4, 5, 6]]) +``` + +### 3. **Vectorization** +Ufuncs are vectorized, meaning they are implemented in low-level C code, allowing for fast execution and avoiding the overhead of Python loops. + +```python +# Vectorized square root +np.sqrt(A) # Output: array([1., 1.41421356, 1.73205081, 2.]) +``` + +### 4. **Type Flexibility** +Ufuncs handle various data types and perform automatic type casting as needed. + +```python +E = np.array([1.0, 2.0, 3.0]) +F = np.array([4, 5, 6]) + +# Addition with type casting +np.add(E, F) # Output: array([5., 7., 9.]) +``` + +### 5. **Reduction Operations** +Ufuncs support reduction operations, such as summing all elements of an array or finding the product of all elements. + +```python +# Summing all elements +np.add.reduce(A) # Output: 10 + +# Product of all elements +np.multiply.reduce(A) # Output: 24 +``` + +### 6. **Accumulation Operations** +Ufuncs can perform accumulation operations, which keep a running tally of the computation. + +```python +# Cumulative sum +np.add.accumulate(A) # Output: array([ 1, 3, 6, 10]) +``` + +### 7. **Reduceat Operations** +Ufuncs can perform segmented reductions using the `reduceat` method, which applies the ufunc at specified intervals. + +```python +G = np.array([0, 1, 2, 3, 4, 5, 6, 7]) +indices = [0, 2, 5] +np.add.reduceat(G, indices) # Output: array([ 1, 9, 18]) +``` + +### 8. **Outer Product** +Ufuncs can compute the outer product of two arrays, producing a matrix where each element is the result of applying the ufunc to each pair of elements from the input arrays. + +```python +# Outer product +np.multiply.outer([1, 2, 3], [4, 5, 6]) +# Output: array([[ 4, 5, 6], +# [ 8, 10, 12], +# [12, 15, 18]]) +``` + +### 9. **Out Parameter** +Ufuncs can use the `out` parameter to store results in a pre-allocated array, saving memory and improving performance. + +```python +result = np.empty_like(A) +np.multiply(A, B, out=result) # Output: array([ 5, 12, 21, 32]) +``` + +# Create Your Own Ufunc + +You can create custom ufuncs for specific needs using np.frompyfunc or np.vectorize, allowing Python functions to behave like ufuncs. + +Here, we are using `frompyfunc()` which takes three argument: + +1. function - the name of the function. +2. inputs - the number of input (arrays). +3. outputs - the number of output arrays. 
+
+```python
+def my_add(x, y):
+    return x + y
+
+my_add_ufunc = np.frompyfunc(my_add, 2, 1)
+my_add_ufunc(A, B)  # Output: array([6, 8, 10, 12], dtype=object)
+```
+# Some Common Ufuncs
+
+Here are some commonly used ufuncs in NumPy:
+
+- **Arithmetic**: `np.add`, `np.subtract`, `np.multiply`, `np.divide`
+- **Trigonometric**: `np.sin`, `np.cos`, `np.tan`
+- **Exponential and Logarithmic**: `np.exp`, `np.log`, `np.log10`
+- **Comparison**: `np.maximum`, `np.minimum`, `np.greater`, `np.less`
+- **Logical**: `np.logical_and`, `np.logical_or`, `np.logical_not`
+
+For more such ufuncs, refer to [Universal functions (ufunc) — NumPy](https://numpy.org/doc/stable/reference/ufuncs.html)
diff --git a/contrib/pandas/Datasets/car-sales-missing-data.csv b/contrib/pandas/Datasets/car-sales-missing-data.csv
new file mode 100644
index 00000000..21a3157f
--- /dev/null
+++ b/contrib/pandas/Datasets/car-sales-missing-data.csv
@@ -0,0 +1,11 @@
+Make,Colour,Odometer,Doors,Price
+Toyota,White,150043,4,"$4,000"
+Honda,Red,87899,4,"$5,000"
+Toyota,Blue,,3,"$7,000"
+BMW,Black,11179,5,"$22,000"
+Nissan,White,213095,4,"$3,500"
+Toyota,Green,,4,"$4,500"
+Honda,,,4,"$7,500"
+Honda,Blue,,4,
+Toyota,White,60000,,
+,White,31600,4,"$9,700"
diff --git a/contrib/pandas/Datasets/car-sales.csv b/contrib/pandas/Datasets/car-sales.csv
new file mode 100644
index 00000000..81e534a5
--- /dev/null
+++ b/contrib/pandas/Datasets/car-sales.csv
@@ -0,0 +1,11 @@
+Make,Colour,Odometer (KM),Doors,Price
+Toyota,White,150043,4,"$4,000.00"
+Honda,Red,87899,4,"$5,000.00"
+Toyota,Blue,32549,3,"$7,000.00"
+BMW,Black,11179,5,"$22,000.00"
+Nissan,White,213095,4,"$3,500.00"
+Toyota,Green,99213,4,"$4,500.00"
+Honda,Blue,45698,4,"$7,500.00"
+Honda,Blue,54738,4,"$7,000.00"
+Toyota,White,60000,4,"$6,250.00"
+Nissan,White,31600,4,"$9,700.00"
diff --git a/contrib/pandas/Datasets/readme.md b/contrib/pandas/Datasets/readme.md
new file mode 100644
index 00000000..ea2255c1
--- /dev/null
+++ b/contrib/pandas/Datasets/readme.md
@@ -0,0 +1 @@
+## This folder contains all the Datasets used in the content.
diff --git a/contrib/pandas/datetime.md b/contrib/pandas/datetime.md
new file mode 100644
index 00000000..008d6fc5
--- /dev/null
+++ b/contrib/pandas/datetime.md
@@ -0,0 +1,158 @@
+# Working with Date & Time in Pandas
+
+While working with data, it is common to come across data containing dates and times. Pandas is a very handy tool for dealing with such data and provides a wide range of date and time processing options.
+
+- **Parsing dates and times**: Pandas provides a number of tools for parsing dates and times from strings, such as the `to_datetime()` function and the `parse_dates` argument of readers like `read_csv()`. These can handle a variety of date and time formats, Unix timestamps, and human-readable formats.
+
+- **Manipulating dates and times**: Pandas provides a number of functions for manipulating dates and times, including `shift()`, `resample()`, and `to_timedelta()`. These functions can be used to add or subtract time periods, change the frequency of a time series, and calculate the difference between two dates or times.
+
+- **Visualizing dates and times**: Pandas provides a number of functions for visualizing dates and times, including `plot()`, `hist()`, and `bar()`. These functions can be used to create line charts, histograms, and bar charts of date and time data.
+
+### `Timestamp` function
+
+The `Timestamp` class in Pandas is the pandas equivalent of Python's `datetime` object and represents a single point in time. A Unix timestamp, in turn, is a numerical representation of such a point in time.
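+
+As a small sketch of how such objects are created, `pd.Timestamp` and `pd.to_datetime` both accept strings, and numeric Unix timestamps can be converted by passing a unit:
+
+```python
+import pandas as pd
+
+print(pd.Timestamp('2024-05-05'))           # 2024-05-05 00:00:00
+print(pd.to_datetime('2024-05-05 12:30'))   # 2024-05-05 12:30:00
+
+# a Unix timestamp counts seconds since 1970-01-01
+print(pd.to_datetime(86400, unit='s'))      # 1970-01-02 00:00:00
+```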
+
+Example for retrieving the day, month and year from a given date:
+
+```python
+import pandas as pd
+
+ts = pd.Timestamp('2024-05-05')
+y = ts.year
+print('Year is: ', y)
+m = ts.month
+print('Month is: ', m)
+d = ts.day
+print('Day is: ', d)
+```
+
+Output:
+
+```python
+Year is:  2024
+Month is:  5
+Day is:  5
+```
+
+Example for extracting time related data from a given date:
+
+```python
+import pandas as pd
+
+ts = pd.Timestamp('2024-10-24 12:00:00')
+print('Hour is: ', ts.hour)
+print('Minute is: ', ts.minute)
+print('Weekday is: ', ts.weekday())  # Monday is 0
+print('Quarter is: ', ts.quarter)
+```
+
+Output:
+
+```python
+Hour is:  12
+Minute is:  0
+Weekday is:  3
+Quarter is:  4
+```
+
+### `Timestamp.now()`
+
+Example for getting the current date and time:
+
+```python
+import pandas as pd
+
+ts = pd.Timestamp.now()
+print('Current date and time is: ', ts)
+```
+
+Output:
+```python
+Current date and time is:  2024-05-25 11:48:25.593213
+```
+
+### `date_range` function
+
+Example for generating dates for the next five days:
+
+```python
+import pandas as pd
+
+ts = pd.date_range(start = pd.Timestamp.now(), periods = 5)
+for i in ts:
+    print(i.date())
+```
+
+Output:
+
+```python
+2024-05-25
+2024-05-26
+2024-05-27
+2024-05-28
+2024-05-29
+```
+
+Example for generating dates for the previous five days:
+
+```python
+import pandas as pd
+
+ts = pd.date_range(end = pd.Timestamp.now(), periods = 5)
+for i in ts:
+    print(i.date())
+```
+
+Output:
+```python
+2024-05-21
+2024-05-22
+2024-05-23
+2024-05-24
+2024-05-25
+```
+
+### Built-in vs pandas date & time operations
+
+In pandas, you can add a time delta to a full column of dates in a single vectorized operation, whereas Python's built-in datetime requires a loop (or comprehension).
+
+Example in Pandas:
+
+```python
+import pandas as pd
+
+dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='T'))
+dates += pd.Timedelta(days=1)
+print(dates)
+```
+
+Output:
+```python
+                        0
+0     2023-01-02 00:00:00
+1     2023-01-02 00:01:00
+2     2023-01-02 00:02:00
+3     2023-01-02 00:03:00
+4     2023-01-02 00:04:00
+...                   ...
+99995 2023-03-12 10:35:00
+99996 2023-03-12 10:36:00
+99997 2023-03-12 10:37:00
+99998 2023-03-12 10:38:00
+99999 2023-03-12 10:39:00
+```
+
+Example using the built-in datetime library:
+
+```python
+from datetime import datetime, timedelta
+
+dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
+dates = [date + timedelta(days=1) for date in dates]
+```
+
+Why use pandas functions?
+
+- Pandas employs NumPy's datetime64 dtype, which takes up a set amount of bytes (usually 8 bytes per date), to store datetime data more compactly and efficiently.
+- Each datetime object in Python takes up extra memory since it contains not only the date and time but also the additional metadata and overhead associated with Python objects.
+- Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as `pd.to_datetime()`, `date_range()`, `timedelta_range()`, and more. The datetime library requires manual implementation for many of these operations, leading to longer and less efficient code.
diff --git a/contrib/pandas/descriptive-statistics.md b/contrib/pandas/descriptive-statistics.md
new file mode 100644
index 00000000..abdb33b2
--- /dev/null
+++ b/contrib/pandas/descriptive-statistics.md
@@ -0,0 +1,573 @@
+## Descriptive Statistics
+
+In the realm of data science, understanding the characteristics of data is fundamental. Descriptive statistics provide the tools and techniques to succinctly summarize and present the key features of a dataset.
It serves as the cornerstone for exploring, visualizing, and ultimately gaining insights from data. + +Descriptive statistics encompasses a range of methods designed to describe the central tendency, dispersion, and shape of a dataset. Through measures such as mean, median, mode, standard deviation, and variance, descriptive statistics offer a comprehensive snapshot of the data's distribution and variability. + +Data scientists utilize descriptive statistics to uncover patterns, identify outliers, and assess the overall structure of data before delving into more advanced analyses. By summarizing large and complex datasets into manageable and interpretable summaries, descriptive statistics facilitate informed decision-making and actionable insights. + + +```python +import pandas as pd +import numpy as np + +df = pd.read_csv("Age-Income-Dataset.csv") +df +``` + +| | Age | Income | +| --- | ----------- | ------ | +| 0 | Young | 25000 | +| 1 | Middle Age | 54000 | +| 2 | Old | 60000 | +| 3 | Young | 15000 | +| 4 | Young | 45000 | +| 5 | Young | 65000 | +| 6 | Young | 70000 | +| 7 | Young | 30000 | +| 8 | Middle Age | 27000 | +| 9 | Young | 23000 | +| 10 | Young | 48000 | +| 11 | Old | 52000 | +| 12 | Young | 33000 | +| 13 | Old | 80000 | +| 14 | Old | 75000 | +| 15 | Old | 35000 | +| 16 | Middle Age | 29000 | +| 17 | Middle Age | 57000 | +| 18 | Old | 43000 | +| 19 | Middle Age | 56000 | +| 20 | Old | 63000 | +| 21 | Old | 32000 | +| 22 | Old | 45000 | +| 23 | Old | 89000 | +| 24 | Middle Age | 90000 | +| 25 | Middle Age | 93000 | +| 26 | Young | 80000 | +| 27 | Young | 87000 | +| 28 | Young | 38000 | +| 29 | Young | 23000 | +| 30 | Middle Age | 38900 | +| 31 | Middle Age | 53200 | +| 32 | Old | 43800 | +| 33 | Middle Age | 25600 | +| 34 | Middle Age | 65400 | +| 35 | Old | 76800 | +| 36 | Old | 89700 | +| 37 | Old | 41800 | +| 38 | Young | 31900 | +| 39 | Old | 25600 | +| 40 | Middle Age | 45700 | +| 41 | Old | 35600 | +| 42 | Young | 54300 | +| 43 | Middle Age | 65400 | +| 44 | Old | 67800 | +| 45 | Old | 24500 | +| 46 | Middle Age | 34900 | +| 47 | Old | 45300 | +| 48 | Young | 68400 | +| 49 | Middle Age | 51700 | + +```python +df.describe() +``` + +| | Income | +|-------|-------------| +| count | 50.000000 | +| mean | 50966.000000 | +| std | 21096.683268 | +| min | 15000.000000 | +| 25% | 33475.000000 | +| 50% | 46850.000000 | +| 75% | 65400.000000 | +| max | 93000.000000 | + + +### Mean + +The mean, also known as the average, is a measure of central tendency in a dataset. It represents the typical value of a set of numbers. The formula to calculate the mean of a dataset is: + +$$ \overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n} $$ + +* $\overline{x}$ (pronounced "x bar") represents the mean value. +* $x_i$ represents the individual value in the dataset (where i goes from 1 to n). +* $\sum$ (sigma) represents the summation symbol, indicating we add up all the values from i=1 to n. +* $n$ represents the total number of values in the dataset. 
+
+```python
+df['Income'].mean()
+```
+
+#### Result
+
+```
+50966.0
+```
+
+#### Without pandas
+
+```python
+def mean_f(df):
+    for col in df.columns:
+        if df[col].dtype != 'O':
+            temp = 0
+            for i in df[col]:
+                temp = temp + i
+            print("Without pandas Library -> ")
+            print("Average of {} is {}".format(col, (temp / len(df[col]))))
+            print()
+            print("With pandas Library -> ")
+            print(df[col].mean())
+
+mean_f(df)
+```
+
+Average of Income:
+
+- Without pandas Library -> 50966.0
+- With pandas Library -> 50966.0
+
+### Median
+
+The median is another measure of central tendency in a dataset. Unlike the mean, which is the average value of all data points, the median represents the middle value when the dataset is ordered from smallest to largest. If the dataset has an odd number of observations, the median is the middle value. If the dataset has an even number of observations, the median is the average of the two middle values.
+
+The median represents the "middle" value in a dataset. There are two cases to consider depending on whether the number of observations (n) is odd or even:
+
+**Odd number of observations (n):**
+
+In this case, the median (M) is the value located at the middle position when the data is ordered from least to greatest. We can calculate the position using the following formula:
+
+$$ M = x_{(n+1)/2} $$
+
+**Even number of observations (n):**
+
+When we have an even number of observations, there isn't a single "middle" value. Instead, the median is the average of the two middle values after ordering the data. Here's the formula to find the median:
+
+$$ M = \frac{x_{n/2} + x_{(n/2)+1}}{2} $$
+
+**Explanation:**
+
+* M represents the median value.
+* n represents the total number of observations in the dataset.
+* $x$ represents the individual value.
+
+```python
+df['Income'].median()
+```
+
+#### Result
+
+```
+46850.0
+```
+
+#### Without pandas
+
+```python
+def median_f(df):
+    for col in df.columns:
+        if df[col].dtype != 'O':
+            sorted_data = sorted(df[col])
+            n = len(df[col])
+            if n % 2 == 0:
+                # with 0-based indexing the two middle values sit at n//2 - 1 and n//2
+                x1 = sorted_data[n//2 - 1]
+                x2 = sorted_data[n//2]
+                median = (x1 + x2) / 2
+            else:
+                median = sorted_data[n//2]
+            print("Median without library ->")
+            print("Median of {} is {} ".format(col, median))
+            print("Median with library ->")
+            print(df[col].median())
+median_f(df)
+```
+
+Median of Income:
+
+- Median without library -> 46850.0
+- Median with library -> 46850.0
+
+### Mode
+
+The mode is a measure of central tendency that represents the value or values that occur most frequently in a dataset. Unlike the mean and median, which focus on the average or middle value, the mode identifies the most common value(s) in the dataset.
+
+```python
+def mode_f(df):
+    for col in df.columns:
+        if df[col].dtype == 'O':
+            print("Column:", col)
+            arr = df[col].sort_values().to_list()
+
+            # track the longest run of equal values in the sorted list
+            bestcnt = 1
+            cnt = 1
+            ans = arr[0]
+            temp = arr[0]
+
+            for i in arr[1:]:
+                if temp == i:
+                    cnt += 1
+                else:
+                    cnt = 1
+                    temp = i
+                if cnt > bestcnt:
+                    bestcnt = cnt
+                    ans = i
+
+            print("Without pandas Library -> ")
+            print("Mode of {} is {}".format(col, ans))
+            print()
+            print("With pandas Library -> ")
+            print(df[col].mode())
+mode_f(df)
+```
+
+#### Result
+
+```
+Column: Age
+Without pandas Library -> 
+Mode of Age is Old
+
+With pandas Library -> 
+0    Old
+Name: Age, dtype: object
+```
+
+### Standard Deviation
+
+Standard deviation is a measure of the dispersion or spread of a dataset. It quantifies the amount of variation or dispersion of a set of values from the mean.
In other words, it indicates how much individual values in a dataset deviate from the mean.
+
+$$s = \sqrt{\frac{\sum(x_i-\overline{x})^{2}}{n-1}}$$
+
+* $s$ represents the (sample) standard deviation.
+* $\sum$ (sigma) represents the summation symbol, indicating we add up the values for all data points.
+* $x_i$ represents the individual value in the dataset.
+* $\overline{x}$ (x bar) represents the mean value of the dataset.
+* $n$ represents the total number of values in the dataset.
+
+```python
+df['Income'].std()
+```
+
+#### Result
+
+```
+21096.683267707253
+```
+
+#### Without pandas
+
+```python
+import math
+def std_f(df):
+    for col in df.columns:
+        if len(df[col]) == 0:
+            print("Column is empty")
+        if df[col].dtype != 'O':
+            sum = 0
+            mean = df[col].mean()
+            for i in df[col]:
+                sum = sum + (i - mean)**2
+
+            # divide by n-1 to match the sample standard deviation formula above
+            std = math.sqrt(sum / (len(df[col]) - 1))
+            print("Without pandas library ->")
+            print("Std : ", std)
+            print("With pandas library: ->")
+            print("Std : {}".format(df[col].std()))
+
+std_f(df)
+```
+
+Without pandas library ->
+Std :  21096.683267707253 \
+With pandas library: ->
+Std : 21096.683267707253
+
+Note that `np.std()` computes the population standard deviation (dividing by n) by default, while pandas' `.std()` computes the sample standard deviation (dividing by n-1); pass `ddof` to either one to switch behaviour.
+
+### Count
+
+```python
+df['Income'].count()
+```
+
+#### Result
+
+```
+50
+```
+
+### Minimum
+
+```python
+df['Income'].min()
+```
+
+#### Result
+
+```
+15000
+```
+
+#### Without pandas
+
+```python
+def min_f(df):
+    for col in df.columns:
+        if df[col].dtype != "O":
+            sorted_data = sorted(df[col])
+            min = sorted_data[0]
+            print("Without pandas Library->", min)
+            print("With pandas Library->", df[col].min())
+
+min_f(df)
+```
+
+Without pandas Library-> 15000 \
+With pandas Library-> 15000
+
+### Maximum
+
+```python
+df['Income'].max()
+```
+
+#### Result
+
+```
+93000
+```
+
+#### Without pandas
+
+```python
+def max_f(df):
+    for col in df.columns:
+        if df[col].dtype != "O":
+            sorted_data = sorted(df[col])
+            max = sorted_data[len(df[col]) - 1]
+            print("Without pandas Library->", max)
+            print("With pandas Library->", df[col].max())
+
+max_f(df)
+```
+
+Without pandas Library-> 93000 \
+With pandas Library-> 93000
+
+### Percentile
+
+```python
+df['Income'].quantile(0.25)
+```
+
+#### Result
+
+```
+33475.0
+```
+
+```python
+df['Income'].quantile(0.75)
+```
+
+#### Result
+
+```
+65400.0
+```
+
+#### Without pandas
+
+```python
+def percentile_f(df, percentile):
+    for col in df.columns:
+        if df[col].dtype != 'O':
+            sorted_data = sorted(df[col])
+            index = int(percentile * len(df[col]))
+            percentile_result = sorted_data[index]
+            print(f"{percentile} Percentile is : ", percentile_result)
+
+percentile_f(df, 0.25)
+```
+
+0.25 Percentile is :  33000
+
+We have used the method of nearest rank to calculate the percentile manually.
+
+Pandas uses linear interpolation of data to calculate percentiles, which is why the two results differ slightly.
+
+## Correlation and Covariance
+
+```python
+df = pd.read_csv('Iris.csv')
+df.head(5)
+```
+
+| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
+|---|----|---------------|--------------|---------------|--------------|-------------|
+| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
+| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
+| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
+| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
+| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
+
+```python
+df.drop(['Id','Species'], axis=1, inplace=True)
+```
+
+### Covariance
+
+Covariance measures the degree to which two variables change together. If the covariance between two variables is positive, it means that they tend to increase or decrease together.
If the covariance is negative, it means that as one variable increases, the other tends to decrease. However, covariance does not provide a standardized measure, making it difficult to interpret the strength of the relationship between variables, especially if the variables are measured in different units. + +$$ COV(X,Y) = \frac{\sum\limits_{i=1}^{n} (X_i - \overline{X}) (Y_i - \overline{Y})}{n - 1}$$ + +**Explanation:** + +* $COV(X, Y)$ represents the covariance between variables X and Y. +* $X_i$ and $Y_i$ represent the individual values for variables X and Y in the i-th observation. +* $\overline{X}$ and $\overline{Y}$ represent the mean values for variables X and Y, respectively. +* $n$ represents the total number of observations in the dataset. + +```python +df.cov() +``` + +| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | +|-------------------|-------------- |---------------|-----------------|--------------| +| **SepalLengthCm** | 0.685694 | -0.039268 | 1.273682 | 0.516904 | +| **SepalWidthCm** | -0.039268 | 0.188004 | -0.321713 | -0.117981 | +| **PetalLengthCm** | 1.273682 | -0.321713 | 3.113179 | 1.296387 | +| **PetalWidthCm** | 0.516904 | -0.117981 | 1.296387 | 0.582414 | + +#### Without pandas + +```python +def cov_f(df): + for x in df.columns: + for y in df.columns: + mean_x = df[x].mean() + mean_y = df[y].mean() + + sum = 0 + n = len(df[x]) + + for val in range(n): + sum += (df[x].iloc[val] - mean_x)*(df[y].iloc[val] - mean_y) + print("Covariance of {} and {} is : {}".format(x,y, sum/(n-1))) + print() +cov_f(df) +``` + +#### Result + +``` +Covariance of SepalLengthCm and SepalLengthCm is : 0.6856935123042504 +Covariance of SepalLengthCm and SepalWidthCm is : -0.03926845637583892 +Covariance of SepalLengthCm and PetalLengthCm is : 1.2736823266219246 +Covariance of SepalLengthCm and PetalWidthCm is : 0.5169038031319911 + +Covariance of SepalWidthCm and SepalLengthCm is : -0.03926845637583892 +Covariance of SepalWidthCm and SepalWidthCm is : 0.1880040268456377 +Covariance of SepalWidthCm and PetalLengthCm is : -0.32171275167785235 +Covariance of SepalWidthCm and PetalWidthCm is : -0.11798120805369115 + +Covariance of PetalLengthCm and SepalLengthCm is : 1.2736823266219246 +Covariance of PetalLengthCm and SepalWidthCm is : -0.32171275167785235 +Covariance of PetalLengthCm and PetalLengthCm is : 3.113179418344519 +Covariance of PetalLengthCm and PetalWidthCm is : 1.2963874720357946 + +Covariance of PetalWidthCm and SepalLengthCm is : 0.5169038031319911 +Covariance of PetalWidthCm and SepalWidthCm is : -0.11798120805369115 +Covariance of PetalWidthCm and PetalLengthCm is : 1.2963874720357946 +Covariance of PetalWidthCm and PetalWidthCm is : 0.5824143176733781 +```` + +### Correlation + +Correlation, on the other hand, standardizes the measure of relationship between two variables, making it easier to interpret. It measures both the strength and direction of the linear relationship between two variables. Correlation values range between -1 and 1, where: + +$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \cdot \sqrt{n(\sum y^2) - (\sum y)^2}}$$ + +* r represents the correlation coefficient. +* n is the number of data points. 
+ +```python +df.corr() +``` + +| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | +|-------------------|---------------|--------------|---------------|--------------| +| **SepalLengthCm** | 1.000000 | -0.109369 | 0.871754 | 0.817954 | +| **SepalWidthCm** | -0.109369 | 1.000000 | -0.420516 | -0.356544 | +| **PetalLengthCm** | 0.871754 | -0.420516 | 1.000000 | 0.962757 | +| **PetalWidthCm** | 0.817954 | -0.356544 | 0.962757 | 1.000000 | + +#### Without using pandas + +```python +import math +def corr_f(df): + for i in df.columns: + for j in df.columns: + n = len(df[i]) + + sumX = 0 + for x in df[i]: + sumX += x + sumY = 0 + for y in df[j]: + sumY += y + + sumXY = 0 + for xy in range(n): + sumXY += (df[i].iloc[xy] * df[j].iloc[xy]) + + sumX2 = 0 + for x in df[i]: + sumX2 += (x**2) + sumY2 = 0 + for y in df[j]: + sumY2 += (y**2) + + NR = (n * sumXY) - (sumX*sumY) + DR = math.sqrt( ( (n * sumX2) - (sumX**2))*( (n * sumY2) - (sumY ** 2) ) ) + + print("Correlation of {} and {} :{}".format(i,j,NR/DR)) + print() + +corr_f(df) +``` + +#### Result + +``` +Correlation of SepalLengthCm and SepalLengthCm :1.0 +Correlation of SepalLengthCm and SepalWidthCm :-0.10936924995067286 +Correlation of SepalLengthCm and PetalLengthCm :0.8717541573048861 +Correlation of SepalLengthCm and PetalWidthCm :0.8179536333691775 + +Correlation of SepalWidthCm and SepalLengthCm :-0.10936924995067286 +Correlation of SepalWidthCm and SepalWidthCm :1.0 +Correlation of SepalWidthCm and PetalLengthCm :-0.42051609640118826 +Correlation of SepalWidthCm and PetalWidthCm :-0.3565440896138223 + +Correlation of PetalLengthCm and SepalLengthCm :0.8717541573048861 +Correlation of PetalLengthCm and SepalWidthCm :-0.42051609640118826 +Correlation of PetalLengthCm and PetalLengthCm :1.0 +Correlation of PetalLengthCm and PetalWidthCm :0.9627570970509656 + +Correlation of PetalWidthCm and SepalLengthCm :0.8179536333691775 +Correlation of PetalWidthCm and SepalWidthCm :-0.3565440896138223 +Correlation of PetalWidthCm and PetalLengthCm :0.9627570970509656 +Correlation of PetalWidthCm and PetalWidthCm :1.0 +``` diff --git a/contrib/pandas/excel-with-pandas.md b/contrib/pandas/excel-with-pandas.md new file mode 100644 index 00000000..d325e466 --- /dev/null +++ b/contrib/pandas/excel-with-pandas.md @@ -0,0 +1,63 @@ +# Pandas DataFrame + +The Pandas DataFrame is a two-dimensional, size-mutable, and possibly heterogeneous tabular data format with labelled axes. A data frame is a two-dimensional data structure in which the data can be organised in rows and columns. Pandas DataFrames are comprised of three main components: data, rows, and columns. + +In the real world, Pandas DataFrames are formed by importing datasets from existing storage, which can be a Excel file, a SQL database or CSV file. Pandas DataFrames may be constructed from lists, dictionaries, or lists of dictionaries, etc. + + +Features of Pandas `DataFrame`: + +- **Size mutable**: DataFrames are mutable in size, meaning that new rows and columns can be added or removed as needed. +- **Labeled axes**: DataFrames have labeled axes, which makes it easy to keep track of the data. +- **Arithmetic operations**: DataFrames support arithmetic operations on rows and columns. +- **High performance**: DataFrames are highly performant, making them ideal for working with large datasets. + + +### Installation of libraries + +`pip install pandas`
+`pip install openpyxl`
+
+- **Note**: pandas needs an engine library for Excel operations: `openpyxl` supports the modern `.xlsx` format, while the older `xlrd` library now only reads legacy `.xls` files.
+
+Example for reading data from an Excel file:
+
+```python
+import pandas as pd
+
+# read_excel() already returns a DataFrame
+df = pd.read_excel('example.xlsx')
+print(df)
+```
+Output:
+```python
+   Name  Age
+0  John   12
+```
+
+
+Example for inserting data into an Excel file:
+
+```python
+import pandas as pd
+
+existing = pd.read_excel('file_name.xlsx')   # here file_name.xlsx is initially empty
+new_rows = pd.DataFrame({'Name': ['Bob', 'John'], 'Age': [12, 28]})
+
+# Append the new rows and write the result back to the file
+combined = pd.concat([existing, new_rows], ignore_index = True)
+combined.to_excel('file_name.xlsx', index = False)
+print(combined)
+```
+
+Output:
+```python
+   Name  Age
+0   Bob   12
+1  John   28
+```
+
+### Usage of Pandas DataFrame:
+
+- Can be used to store and analyze financial data, such as stock prices, trading data, and economic data.
+- Can be used to store and analyze sensor data, such as data from temperature sensors, motion sensors, and GPS sensors.
+- Can be used to store and analyze log data, such as web server logs, application logs, and system logs.
diff --git a/contrib/pandas/groupby-functions.md b/contrib/pandas/groupby-functions.md
new file mode 100644
index 00000000..00bad796
--- /dev/null
+++ b/contrib/pandas/groupby-functions.md
@@ -0,0 +1,391 @@
+## Group By Functions
+
+GroupBy is a powerful function in pandas that allows you to split data into distinct groups based on one or more columns and perform operations on each group independently. It is a fundamental technique for data analysis and summarization.
+
+Here's a step-by-step breakdown of how groupby functions work in pandas:
+
+* __Splitting the Data:__ You can group your data based on one or more columns using the `.groupby()` method. This method takes a column name or a list of column names as input and splits the DataFrame into groups according to the values in those columns.
+
+* __Applying a Function:__ Once the data is grouped, you can apply various functions to each group. Pandas offers a variety of built-in aggregation functions like `sum()`, `mean()`, and `count()` that summarize the data within each group. You can also use custom functions or lambda functions for more specific operations.
+
+* __Combining the Results:__ After applying the function to each group, the results are combined into a new DataFrame or Series, depending on the input data and the function used. This new data structure summarizes the data by group.
+
+
+```python
+import pandas as pd
+import seaborn as sns
+import numpy as np
+```
+
+
+```python
+iris_data = sns.load_dataset('iris')
+```
+
+This code loads the built-in Iris dataset from seaborn and stores it in a pandas DataFrame named `iris_data`. The Iris dataset contains measurements of flower sepal and petal dimensions for three Iris species (Setosa, Versicolor, Virginica).
+
+
+```python
+iris_data
+```
+
+| | sepal_length | sepal_width | petal_length | petal_width | species |
+|----|--------------|-------------|--------------|-------------|-----------|
+| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
+| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
+| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
+| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
+| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
+| ...| ... | ... | ... | ... | ... |
+| 145| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
+| 146| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
+| 147| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
+| 148| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
+| 149| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
+
+
+```python
+iris_data.groupby(['species']).count()
+```
+
+| species | sepal_length | sepal_width | petal_length | petal_width |
+|------------|--------------|-------------|--------------|-------------|
+| setosa | 50 | 50 | 50 | 50 |
+| versicolor | 50 | 50 | 50 | 50 |
+| virginica | 50 | 50 | 50 | 50 |
+
+
+* We group the data by the 'species' column, and `count()` is applied to each group, counting the number of occurrences (rows) in each species category.
+* The output is a DataFrame showing the count of each species in the dataset.
+
+
+```python
+iris_data.groupby(["species"])["sepal_length"].mean()
+```
+
+    species
+    setosa        5.006
+    versicolor    5.936
+    virginica     6.588
+    Name: sepal_length, dtype: float64
+
+
+* This groups the data by 'species' and selects the 'sepal_length' column; `mean()` then calculates the average sepal length for each species group.
+* The output is a Series containing the mean sepal length for each species.
+
+
+```python
+iris_data.groupby(["species"])["sepal_length"].std()
+```
+
+    species
+    setosa        0.352490
+    versicolor    0.516171
+    virginica     0.635880
+    Name: sepal_length, dtype: float64
+
+
+* Similar to the previous example, this groups by 'species' and selects the 'sepal_length' column, but it calculates the standard deviation (spread) of sepal length for each group using `std()`.
+* The output is a Series containing the standard deviation of sepal length for each species.
+
+
+```python
+iris_data.groupby(["species"])["sepal_length"].describe()
+```
+
+| species | count | mean | std | min | 25% | 50% | 75% | max |
+|------------|-------|-------|----------|------|--------|------|------|------|
+| setosa | 50.0 | 5.006 | 0.352490 | 4.3 | 4.800 | 5.0 | 5.2 | 5.8 |
+| versicolor | 50.0 | 5.936 | 0.516171 | 4.9 | 5.600 | 5.9 | 6.3 | 7.0 |
+| virginica | 50.0 | 6.588 | 0.635880 | 4.9 | 6.225 | 6.5 | 6.9 | 7.9 |
+
+
+* We have used `describe()` to generate a more comprehensive summary of sepal length for each species group.
+* It provides statistics like count, mean, standard deviation, minimum, maximum, and percentiles. The output is a DataFrame containing these descriptive statistics for each species.
+
+
+```python
+iris_data.groupby(["species"])["sepal_length"].quantile(q=0.25)
+```
+
+    species
+    setosa        4.800
+    versicolor    5.600
+    virginica     6.225
+    Name: sepal_length, dtype: float64
+
+
+```python
+iris_data.groupby(["species"])["sepal_length"].quantile(q=0.75)
+```
+
+    species
+    setosa        5.2
+    versicolor    6.3
+    virginica     6.9
+    Name: sepal_length, dtype: float64
+
+
+* These calculate the quartiles (25th and 75th percentiles) of sepal length for each species group.
+* quantile(q=0.25) gives the 25th percentile, which represents the value below which 25% of the data points lie.
+* quantile(q=0.75) gives the 75th percentile, which represents the value below which 75% of the data points lie.
+* The outputs are Series containing the respective quartile values for each species.
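+
+GroupBy can also apply several aggregations in a single pass. A minimal sketch using the standard `agg()` method, assuming `iris_data` is loaded as above:
+
+```python
+# One row per species, one column per aggregation
+summary = iris_data.groupby("species")["sepal_length"].agg(["mean", "std", "min", "max"])
+print(summary)
+```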
+
+## Custom Function For Group By
+
+
+```python
+nc = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width','species']
+```
+
+
+```python
+nc
+```
+
+    ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
+
+
+```python
+# Keep only the numeric columns for the summary statistics
+nc = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
+
+def species_stats(species_data, species_name):
+    print("Species Name: {}".format(species_name))
+    print()
+    print("Mean:\n", species_data[nc].mean())
+    print()
+    print("Median:\n", species_data[nc].median())
+    print()
+    print("std:\n", species_data[nc].std())
+    print()
+    print("25% percentile:\n", species_data[nc].quantile(0.25))
+    print()
+    print("75% percentile:\n", species_data[nc].quantile(0.75))
+    print()
+    print("Min:\n", species_data[nc].min())
+    print()
+    print("Max:\n", species_data[nc].max())
+    print()
+```
+
+
+```python
+setosa_data = iris_data[iris_data['species'] == 'setosa']
+```
+
+
+```python
+versicolor_data = iris_data[iris_data['species'] == 'versicolor']
+```
+
+
+```python
+virginica_data = iris_data[iris_data['species'] == 'virginica']
+```
+
+
+```python
+# Pair each label with its corresponding DataFrame so that every
+# species is summarized with its own data
+species_datasets = [('setosa_data', setosa_data),
+                    ('virginica_data', virginica_data),
+                    ('versicolor_data', versicolor_data)]
+for name, data in species_datasets:
+    print("************** Species name {} *****************".format(name))
+    species_stats(data, name)
+    print("------------------------------------")
+```
+
+    ************** Species name setosa_data *****************
+    Species Name: setosa_data
+
+    Mean:
+    sepal_length    5.006
+    sepal_width     3.428
+    petal_length    1.462
+    petal_width     0.246
+    dtype: float64
+
+    Median:
+    sepal_length    5.0
+    sepal_width     3.4
+    petal_length    1.5
+    petal_width     0.2
+    dtype: float64
+
+    std:
+    sepal_length    0.352490
+    sepal_width     0.379064
+    petal_length    0.173664
+    petal_width     0.105386
+    dtype: float64
+
+    25% percentile:
+    sepal_length    4.8
+    sepal_width     3.2
+    petal_length    1.4
+    petal_width     0.2
+    Name: 0.25, dtype: float64
+
+    75% percentile:
+    sepal_length    5.200
+    sepal_width     3.675
+    petal_length    1.575
+    petal_width     0.300
+    Name: 0.75, dtype: float64
+
+    Min:
+    sepal_length    4.3
+    sepal_width     2.3
+    petal_length    1.0
+    petal_width     0.1
+    dtype: float64
+
+    Max:
+    sepal_length    5.8
+    sepal_width     4.4
+    petal_length    1.9
+    petal_width     0.6
+    dtype: float64
+
+    ------------------------------------
+
+*(The loop then prints the corresponding summaries for `virginica_data` and `versicolor_data` in the same format.)*
diff --git a/contrib/pandas/handling-missing-values.md b/contrib/pandas/handling-missing-values.md
new file mode 100644
index 00000000..da6c377a
--- /dev/null
+++ b/contrib/pandas/handling-missing-values.md
@@ -0,0 +1,264 @@
+# Handling Missing Values in Pandas
+
+In real life, many datasets arrive with missing data, either because it exists but was not collected or because it never existed.
+
+In Pandas, missing data is represented by two values:
+
+* `None` : None is a Python keyword that refers to an empty or missing value.
+* `NaN` : Acronym for `Not a Number`.
+
+There are several useful functions for detecting, removing, and replacing null values in a Pandas DataFrame:
+
+1. `isnull()`
+2. `notnull()`
+3. `dropna()`
+4. `fillna()`
+5. `replace()`
+
+## 1. Checking for missing values using `isnull()` and `notnull()`
+
+Let's import pandas and our fancy car-sales dataset, which has some missing values.
+
+```python
+import pandas as pd
+
+car_sales_missing_df = pd.read_csv("Datasets/car-sales-missing-data.csv")
+print(car_sales_missing_df)
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue       NaN    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green       NaN    4.0   $4,500
+    6   Honda    NaN       NaN    4.0   $7,500
+    7   Honda   Blue       NaN    4.0      NaN
+    8  Toyota  White   60000.0    NaN      NaN
+    9     NaN  White   31600.0    4.0   $9,700
+
+
+```python
+## Using isnull()
+
+print(car_sales_missing_df.isnull())
+```
+
+        Make  Colour  Odometer  Doors  Price
+    0  False   False     False  False  False
+    1  False   False     False  False  False
+    2  False   False      True  False  False
+    3  False   False     False  False  False
+    4  False   False     False  False  False
+    5  False   False      True  False  False
+    6  False    True      True  False  False
+    7  False   False      True  False   True
+    8  False   False     False   True   True
+    9   True   False     False  False  False
+
+
+Note here:
+* `True` marks `NaN` values
+* `False` marks non-`NaN` values
+
+To find the number of missing values in each column, use `isnull().sum()`.
+
+
+```python
+print(car_sales_missing_df.isnull().sum())
+```
+
+    Make        1
+    Colour      1
+    Odometer    4
+    Doors       1
+    Price       2
+    dtype: int64
+
+
+You can also check the presence of null values in a single column.
+
+```python
+print(car_sales_missing_df["Odometer"].isnull())
+```
+
+    0    False
+    1    False
+    2     True
+    3    False
+    4    False
+    5     True
+    6     True
+    7     True
+    8    False
+    9    False
+    Name: Odometer, dtype: bool
+
+
+```python
+## Using notnull()
+
+print(car_sales_missing_df.notnull())
+```
+
+        Make  Colour  Odometer  Doors  Price
+    0   True    True      True   True   True
+    1   True    True      True   True   True
+    2   True    True     False   True   True
+    3   True    True      True   True   True
+    4   True    True      True   True   True
+    5   True    True     False   True   True
+    6   True   False     False   True   True
+    7   True    True     False   True  False
+    8   True    True      True  False  False
+    9  False    True      True   True   True
+
+
+Note here:
+* `True` means no `NaN` value
+* `False` means a `NaN` value
+
+`isnull()` detects null values, so it returns `True` for `NaN` entries, while `notnull()` detects non-null values, so it returns `True` wherever there is no `NaN`.
+
+## 2. Filling missing values using `fillna()` and `replace()`
+
+
+```python
+## Filling missing values with a single value using `fillna()`
+print(car_sales_missing_df.fillna(0))
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue       0.0    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green       0.0    4.0   $4,500
+    6   Honda      0       0.0    4.0   $7,500
+    7   Honda   Blue       0.0    4.0        0
+    8  Toyota  White   60000.0    0.0        0
+    9       0  White   31600.0    4.0   $9,700
+
+
+```python
+## Filling missing values with the previous value using `ffill()`
+print(car_sales_missing_df.ffill())
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue   87899.0    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green  213095.0    4.0   $4,500
+    6   Honda  Green  213095.0    4.0   $7,500
+    7   Honda   Blue  213095.0    4.0   $7,500
+    8  Toyota  White   60000.0    4.0   $7,500
+    9  Toyota  White   31600.0    4.0   $9,700
+
+
+```python
+## Filling missing values with the next value using `bfill()`
+print(car_sales_missing_df.bfill())
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue   11179.0    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green   60000.0    4.0   $4,500
+    6   Honda   Blue   60000.0    4.0   $7,500
+    7   Honda   Blue   60000.0    4.0   $9,700
+    8  Toyota  White   60000.0    4.0   $9,700
+    9     NaN  White   31600.0    4.0   $9,700
+
+
+#### Filling null values using the `replace()` method
+
+Now we are going to replace all `NaN` values in the DataFrame with the value -125.
+
+For this we will also need NumPy.
+
+
+```python
+import numpy as np
+
+print(car_sales_missing_df.replace(to_replace = np.nan, value = -125))
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue    -125.0    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green    -125.0    4.0   $4,500
+    6   Honda   -125    -125.0    4.0   $7,500
+    7   Honda   Blue    -125.0    4.0     -125
+    8  Toyota  White   60000.0 -125.0     -125
+    9    -125  White   31600.0    4.0   $9,700
+
+
+## 3. Dropping missing values using `dropna()`
+
+In order to drop null values from a DataFrame, we use the `dropna()` function, which can drop rows or columns with null values in several different ways.
+
+#### Dropping rows with at least 1 null value.
+
+```python
+print(car_sales_missing_df.dropna(axis = 0))  # drop rows with at least one NaN (null) value
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+
+
+#### Dropping rows only if all values in that row are missing.
+
+
+```python
+print(car_sales_missing_df.dropna(how = 'all', axis = 0))  # rows with any non-null value are kept as they are
+```
+
+         Make Colour  Odometer  Doors    Price
+    0  Toyota  White  150043.0    4.0   $4,000
+    1   Honda    Red   87899.0    4.0   $5,000
+    2  Toyota   Blue       NaN    3.0   $7,000
+    3     BMW  Black   11179.0    5.0  $22,000
+    4  Nissan  White  213095.0    4.0   $3,500
+    5  Toyota  Green       NaN    4.0   $4,500
+    6   Honda    NaN       NaN    4.0   $7,500
+    7   Honda   Blue       NaN    4.0      NaN
+    8  Toyota  White   60000.0    NaN      NaN
+    9     NaN  White   31600.0    4.0   $9,700
+
+
+#### Dropping columns with at least 1 null value
+
+
+```python
+print(car_sales_missing_df.dropna(axis = 1))
+```
+
+    Empty DataFrame
+    Columns: []
+    Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+
+Now we drop the columns which have at least one missing value.
+
+Here the dataset becomes empty after `dropna()` because each column has at least one null value, so every column is removed, resulting in an empty DataFrame.
diff --git a/contrib/pandas/import-export.md b/contrib/pandas/import-export.md
new file mode 100644
index 00000000..23d1ad82
--- /dev/null
+++ b/contrib/pandas/import-export.md
@@ -0,0 +1,46 @@
+# Importing and Exporting Data in Pandas
+
+## Importing Data from a CSV
+
+We can create `Series` and `DataFrame` objects in pandas directly, but often we have to import data that comes as a `.csv` (Comma Separated Values) file, a spreadsheet file, or a similar tabular data format.
+
+`pandas` allows for easy importing of this data using functions such as `read_csv()` and `read_excel()` for Microsoft Excel files.
+
+*Note: In case you want to get the information from a **Google Sheet** you can export it as a .csv file.*
+
+The `read_csv()` function can be used to import a CSV file into a pandas DataFrame. The path can be a file system path or a URL where the CSV is available.
+
+```python
+import pandas as pd
+
+car_sales_df = pd.read_csv("Datasets/car-sales.csv")
+print(car_sales_df)
+```
+
+```
+     Make Colour  Odometer (KM)  Doors       Price
+0  Toyota  White         150043      4   $4,000.00
+1   Honda    Red          87899      4   $5,000.00
+2  Toyota   Blue          32549      3   $7,000.00
+3     BMW  Black          11179      5  $22,000.00
+4  Nissan  White         213095      4   $3,500.00
+5  Toyota  Green          99213      4   $4,500.00
+6   Honda   Blue          45698      4   $7,500.00
+7   Honda   Blue          54738      4   $7,000.00
+8  Toyota  White          60000      4   $6,250.00
+9  Nissan  White          31600      4   $9,700.00
+```
+
+You can find the dataset used above in the `Datasets` folder.
+
+*Note: If you want to import data from GitHub, you can't use the repository page link directly; first obtain the raw file URL by clicking the **Raw** button in the repo.*
+
+## Exporting Data to a CSV
+
+`pandas` allows you to export a `DataFrame` to `.csv` format using `.to_csv()`, or to an Excel spreadsheet using `.to_excel()`.
+
+```python
+car_sales_df.to_csv("exported_car_sales.csv")
+```
+
+Running this will save a file called ``exported_car_sales.csv`` to the current folder.
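+
+One caveat worth knowing: by default, `to_csv()` also writes the DataFrame index as an unnamed first column, which shows up as an `Unnamed: 0` column when the file is read back in. A small sketch of the usual fix:
+
+```python
+# Skip the index on export so re-importing doesn't add an "Unnamed: 0" column
+car_sales_df.to_csv("exported_car_sales.csv", index=False)
+```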
diff --git a/contrib/pandas/index.md b/contrib/pandas/index.md
new file mode 100644
index 00000000..db008e2f
--- /dev/null
+++ b/contrib/pandas/index.md
@@ -0,0 +1,12 @@
+# List of sections
+
+- [Pandas Introduction and Dataframes in Pandas](introduction.md)
+- [Viewing data in pandas](viewing-data.md)
+- [Pandas Series Vs NumPy ndarray](pandas-series-vs-numpy-ndarray.md)
+- [Pandas Descriptive Statistics](descriptive-statistics.md)
+- [Group By Functions with Pandas](groupby-functions.md)
+- [Excel using Pandas DataFrame](excel-with-pandas.md)
+- [Working with Date & Time in Pandas](datetime.md)
+- [Importing and Exporting Data in Pandas](import-export.md)
+- [Handling Missing Values in Pandas](handling-missing-values.md)
+- [Pandas Series](pandas-series.md)
diff --git a/contrib/pandas/introduction.md b/contrib/pandas/introduction.md
new file mode 100644
index 00000000..3552437e
--- /dev/null
+++ b/contrib/pandas/introduction.md
@@ -0,0 +1,244 @@
+# Introduction to the Pandas Library and DataFrames
+
+**As you have learnt Python programming, it is now time for some applications.**
+
+- Machine Learning and Data Science are among the most in-demand fields today. To work in these fields, your first step should be `Data Science`, as Machine Learning is all about data.
+- To begin with Data Science, your first tool will be the `Pandas Library`.
+
+## Introduction to the Pandas Library
+
+Pandas is a data analysis and manipulation tool built on top of the Python programming language. Pandas got its name from the term *panel data* ('Pa' from Panel and 'da' from data). Panel data is data arranged in rows and columns, like Excel spreadsheets or CSV files.
+
+**To use Pandas, we first have to import it.**
+
+## Why pandas?
+
+* Pandas provides a simple-to-use but very capable set of functions that you can use on your data.
+* It also works hand-in-hand with other machine learning libraries, so it is important to learn it.
+* For example, it is widely used to transform the data that a machine learning model will consume during training.
+
+
+```python
+# Importing the pandas
+import pandas as pd
+```
+
+*To import any module in Python, use the `import <module name>` statement. Here `pd` is used as an abbreviation for pandas, so we don't need to type "pandas" every time; typing `pd` is enough.*
+
+
+```python
+# To check the available pandas version
+print(f"Pandas Version is : {pd.__version__}")
+```
+
+    Pandas Version is : 2.1.4
+
+
+## Understanding Pandas data types
+
+Pandas has two main data types: `Series` and `DataFrame`.
+
+* `pandas.Series` is a 1-dimensional column of data.
+* `pandas.DataFrame` is a 2-dimensional data table having rows and columns.
+
+### 1. Series datatype
+
+**To create a series, use `pd.Series()` and pass a Python list inside the parentheses.**
+
+Note: The S in `Series` is capital; if you use a lowercase s, it will give you an error.
+
+> Let's create a series
+
+
+
+```python
+# Creating a series of car companies
+cars = pd.Series(["Honda","Audi","Thar","BMW"])
+cars
+```
+
+    0    Honda
+    1     Audi
+    2     Thar
+    3      BMW
+    dtype: object
+
+
+The above code creates a Series of car companies named "cars". The code `pd.Series(["Honda", "Audi", "Thar", "BMW"])` tells pandas (`pd`) to create a Series containing "Honda", "Audi", "Thar" and "BMW".
+
+The default index of a series is 0, 1, 2, ... (remember, it starts from 0).
+
+To change the index of any series, set the `index` parameter accordingly. 
It takes a list of index values:
+
+
+```python
+cars = pd.Series(["Honda","Audi","Thar","BMW"],index = ["A" , "B" , "C" ,"D"])
+cars
+```
+
+    A    Honda
+    B     Audi
+    C     Thar
+    D      BMW
+    dtype: object
+
+
+You can see that the index has been changed from numbers to A, B, C and D.
+
+The `dtype` shown tells us about the type of data we have in the series.
+
+### 2. DataFrame datatype
+
+A DataFrame contains rows and columns, like a CSV file does.
+
+You can also create a DataFrame by using `pd.DataFrame()` and passing it a Python dictionary.
+
+
+```python
+# Let's create
+cars_with_colours = pd.DataFrame({"Cars" : ["BMW","Audi","Thar","Honda"],
+                                  "Colour" : ["Black","White","Red","Green"]})
+print(cars_with_colours)
+```
+
+        Cars Colour
+    0    BMW  Black
+    1   Audi  White
+    2   Thar    Red
+    3  Honda  Green
+
+
+Each dictionary key becomes a `column name`, and its values become the `column data`.
+
+*You can also create a DataFrame with the help of Series.*
+
+
+```python
+# Let's create two series
+students = pd.Series(["Ram","Mohan","Krishna","Shivam"])
+age = pd.Series([19,20,21,24])
+
+students
+```
+
+    0        Ram
+    1      Mohan
+    2    Krishna
+    3     Shivam
+    dtype: object
+
+
+```python
+age
+```
+
+    0    19
+    1    20
+    2    21
+    3    24
+    dtype: int64
+
+
+```python
+# Now let's create a dataframe with the help of the above series;
+# pass each series as a dictionary value
+
+record = pd.DataFrame({"Student_Name":students ,
+                       "Age" :age})
+print(record)
+```
+
+      Student_Name  Age
+    0          Ram   19
+    1        Mohan   20
+    2      Krishna   21
+    3       Shivam   24
+
+
+```python
+# To print the list of column names
+record.columns
+```
+
+    Index(['Student_Name', 'Age'], dtype='object')
+
+
+### Describe Data
+
+**The good news is that pandas has many built-in functions which allow you to quickly get information about a DataFrame.**
+Let's explore the `record` dataframe.
+
+#### 1. Use `.dtypes` to find what datatype a column contains
+
+
+```python
+record.dtypes
+```
+
+    Student_Name    object
+    Age              int64
+    dtype: object
+
+
+#### 2. Use `.describe()` for a statistical overview
+
+
+```python
+print(record.describe()) # It only displays results for numeric data
+```
+
+                 Age
+    count   4.000000
+    mean   21.000000
+    std     2.160247
+    min    19.000000
+    25%    19.750000
+    50%    20.500000
+    75%    21.750000
+    max    24.000000
+
+
+#### 3. Use `.info()` to find information about the dataframe
+
+
+```python
+record.info()
+```
+
+    <class 'pandas.core.frame.DataFrame'>
+    RangeIndex: 4 entries, 0 to 3
+    Data columns (total 2 columns):
+     #   Column        Non-Null Count  Dtype
+    ---  ------        --------------  -----
+     0   Student_Name  4 non-null      object
+     1   Age           4 non-null      int64
+    dtypes: int64(1), object(1)
+    memory usage: 196.0+ bytes
diff --git a/contrib/pandas/pandas-series-vs-numpy-ndarray.md b/contrib/pandas/pandas-series-vs-numpy-ndarray.md
new file mode 100644
index 00000000..e739766b
--- /dev/null
+++ b/contrib/pandas/pandas-series-vs-numpy-ndarray.md
@@ -0,0 +1,72 @@
+# Pandas Series Vs NumPy ndarray
+
+NumPy ndarray and Pandas Series are two fundamental data structures in Python for handling and manipulating data. While they share some similarities, they also have distinct characteristics that make them suitable for different tasks.
+
+Both NumPy ndarray and Pandas Series are essential tools for data manipulation in Python. Choosing between them depends on the nature of your data and the specific tasks you need to perform. 
+
+## NumPy ndarray
+
+NumPy (short for Numerical Python) provides a powerful array object called `ndarray` (n-dimensional array), which underpins many scientific and mathematical Python libraries. Indexing in an ndarray is integer based (like `arr[0]`, `arr[3]`, etc.).
+
+Features of NumPy `ndarray`:
+
+- **Homogeneous Data**: All elements in a NumPy array are of the same data type, which allows for efficient storage and computation.
+- **Efficient Computation and Performance**: NumPy arrays are designed for numerical operations and are highly efficient. They support vectorized operations, allowing you to perform operations on entire arrays rather than individual elements.
+- **Multi-dimensional**: NumPy arrays can be multi-dimensional, making them suitable for representing complex numerical data structures like matrices and n-dimensional arrays.
+
+Example of creating a NumPy array:
+
+```python
+import numpy as np
+
+narr = np.array(['A', 'B', 'C', 'D', 'E'])
+print(narr)
+```
+Output:
+```python
+['A' 'B' 'C' 'D' 'E']
+```
+### Usage of NumPy ndarray:
+
+- When you need to perform mathematical operations on numerical data.
+- When you're working with multi-dimensional data.
+- When computational efficiency is important.
+- When you need to store data of the same data type.
+
+## Pandas Series
+
+Pandas, a Python library for data manipulation and analysis, introduces the `Series` data structure, which is designed for handling labeled one-dimensional data efficiently. Indexing in a Pandas Series is label-based, and it handles heterogeneous data effectively.
+
+Features of Pandas `Series`:
+
+- **Labeled Data**: Pandas Series associates a label (or index) with each element of the array, making it easier to work with heterogeneous or labeled data.
+
+- **Heterogeneous Data**: Unlike NumPy arrays, Pandas Series can hold data of different types (integers, floats, strings, etc.) within the same object.
+
+- **Data Alignment**: One of the powerful features of Pandas Series is its ability to automatically align data based on labels.
+
+Example of creating a Pandas Series:
+
+```python
+import pandas as pd
+
+series = pd.Series([1,'B', 5, 7, 6, 8], index = ['a','b','c','d','e','f'])
+print(series)
+```
+Output:
+```python
+a    1
+b    B
+c    5
+d    7
+e    6
+f    8
+dtype: object
+```
+
+### Usage of Pandas Series:
+
+- When you need to manipulate and analyze labeled data.
+- When you're dealing with heterogeneous data or missing values.
+- When you need more high-level, flexible data manipulation functions.
+- When you are dealing with one-dimensional data.
diff --git a/contrib/pandas/pandas-series.md b/contrib/pandas/pandas-series.md
new file mode 100644
index 00000000..88b12351
--- /dev/null
+++ b/contrib/pandas/pandas-series.md
@@ -0,0 +1,317 @@
+# Pandas Series
+
+A Series is a pandas data structure that represents a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.
+
+## Creating a Series object:
+
+### Basic Series
+To create a basic Series, you can pass a list or array of data to the `pd.Series()` function.
+
+```python
+import pandas as pd
+
+s1 = pd.Series([4, 5, 2, 3])
+print(s1)
+```
+
+#### Output
+```
+0    4
+1    5
+2    2
+3    3
+dtype: int64
+```
+
+### Series from a Dictionary
+
+If you pass a dictionary to `pd.Series()`, the keys become the index and the values become the data of the Series. 
+```python
+import pandas as pd
+
+s2 = pd.Series({'A': 1, 'B': 2, 'C': 3})
+print(s2)
+```
+
+#### Output
+```
+A    1
+B    2
+C    3
+dtype: int64
+```
+
+
+## Additional Functionality
+
+
+### Specifying Data Type and Index
+You can specify the data type and index while creating a Series.
+```python
+import pandas as pd
+
+s4 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64')
+print(s4)
+```
+
+#### Output
+```
+a    1.0
+b    2.0
+c    3.0
+dtype: float64
+```
+
+### Specifying NaN Values
+* Sometimes you need to create a Series of a certain size but do not have complete data available; in such cases you can fill the missing entries with a NaN (Not a Number) value.
+* When a Series holds NaN values, its data type must be a floating point type. Even if you specify an integer type, pandas will promote it to floating point automatically, because NaN is not supported by integer types.
+
+```python
+import pandas as pd
+import numpy as np
+
+s3 = pd.Series([1, np.nan, 2])
+print(s3)
+```
+
+#### Output
+```
+0    1.0
+1    NaN
+2    2.0
+dtype: float64
+```
+
+
+### Creating Data from Expressions
+You can create a Series using an expression or function, following the general form `<Series object> = pd.Series(data = <expression>, index = <sequence>)`:
+
+```python
+import pandas as pd
+import numpy as np
+
+a = np.arange(1, 5)   # [1, 2, 3, 4]
+s5 = pd.Series(data = a**2, index = a)
+print(s5)
+```
+
+#### Output
+```
+1     1
+2     4
+3     9
+4    16
+dtype: int64
+```
+
+## Series Object Attributes
+
+| **Attribute** | **Description** |
+|--------------------------|---------------------------------------------------|
+| `.index` | Array of index of the Series |
+| `.values` | Array of values of the Series |
+| `.dtype` | Return the dtype of the data |
+| `.shape` | Return a tuple representing the shape of the data |
+| `.ndim` | Return the number of dimensions of the data |
+| `.size` | Return the number of elements in the data |
+| `.hasnans` | Return True if there is any NaN in the data |
+| `.empty` | Return True if the Series object is empty |
+
+- If you use `len()` on a Series object, it returns the total number of elements, whereas `.count()` returns only the number of non-NaN elements.
+
+## Accessing a Series object and its elements
+
+### Accessing Individual Elements
+You can access individual elements using their index labels; only 'legal' (existing) index labels can be used.
+```python
+import pandas as pd
+
+s7 = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
+print(s7['A'])
+```
+
+#### Output
+```
+13
+```
+
+### Slicing a Series
+
+- Slices are extracted based on their positional index, regardless of the custom index labels.
+- Each element in the Series has a positional index starting from 0 (i.e., 0 for the first element, 1 for the second element, and so on).
+- `[:]` will return the values of the elements between the start and end positions (excluding the end position).
+
+#### Example
+
+```python
+import pandas as pd
+
+s = pd.Series(data=[13, 45, 67, 89], index=['A', 'B', 'C', 'D'])
+print(s[:2])
+```
+
+#### Output
+```
+A    13
+B    45
+dtype: int64
+```
+
+This example demonstrates that the first two elements (positions 0 and 1) are returned, regardless of their custom index labels. 
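+
+For comparison, slicing by labels behaves differently: a label slice via `.loc` includes the end label. A small sketch, assuming the same `s` as above:
+
+```python
+# Label-based slicing with .loc includes both endpoints
+print(s.loc['A':'C'])
+```
+
+This prints the elements labelled 'A', 'B' and 'C', unlike positional slicing, which excludes the end position.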
+
+## Operations on a Series object
+
+### Modifying elements and indexes
+* `<Series object>[index] = <new data value>`
+* `<Series object>[start : end] = <new data value>`
+* `<Series object>.index = [new indexes]`
+
+```python
+import pandas as pd
+
+s8 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
+s8['a'] = 100
+s8.index = ['x', 'y', 'z']
+print(s8)
+```
+
+#### Output
+```
+x    100
+y     20
+z     30
+dtype: int64
+```
+
+**Note: Series objects are value-mutable but size-immutable.**
+
+### Vector operations
+We can perform vector operations such as `+`, `-`, `/`, `%`, etc.
+
+#### Addition
+```python
+import pandas as pd
+
+s9 = pd.Series([1, 2, 3])
+print(s9 + 5)
+```
+
+#### Output
+```
+0    6
+1    7
+2    8
+dtype: int64
+```
+
+#### Subtraction
+```python
+print(s9 - 2)
+```
+
+#### Output
+```
+0   -1
+1    0
+2    1
+dtype: int64
+```
+
+### Arithmetic on Series objects
+
+#### Addition
+```python
+import pandas as pd
+
+s10 = pd.Series([1, 2, 3])
+s11 = pd.Series([4, 5, 6])
+print(s10 + s11)
+```
+
+#### Output
+```
+0    5
+1    7
+2    9
+dtype: int64
+```
+
+#### Multiplication
+
+```python
+print(s10 * s11)
+```
+
+#### Output
+```
+0     4
+1    10
+2    18
+dtype: int64
+```
+
+Keep in mind that both Series objects should have the same indexes; otherwise the result will contain NaN for every index label that the two Series do not share.
+
+
+### Head and Tail Functions
+
+| **Functions** | **Description** |
+|--------------------------|---------------------------------------------------|
+| `.head(n)` | Return the first n elements of the Series |
+| `.tail(n)` | Return the last n elements of the Series |
+
+```python
+import pandas as pd
+
+s12 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
+print(s12.head(3))
+print(s12.tail(3))
+```
+
+#### Output
+```
+0    10
+1    20
+2    30
+dtype: int64
+7     80
+8     90
+9    100
+dtype: int64
+```
+
+If you don't provide a value for n, it defaults to `n=5`.
+
+### A few extra functions
+
+| **Function** | **Description** |
+|----------------------------------------|------------------------------------------------------------------------|
+| `.sort_values()` | Return the Series object in ascending order based on its values. |
+| `.sort_index()` | Return the Series object in ascending order based on its index. |
+| `.drop()` | Return the Series with the given index label and its corresponding value removed. |
+
+```python
+import pandas as pd
+
+s13 = pd.Series([3, 1, 2], index=['c', 'a', 'b'])
+print(s13.sort_values())
+print(s13.sort_index())
+print(s13.drop('a'))
+```
+
+#### Output
```
+a    1
+b    2
+c    3
+dtype: int64
+a    1
+b    2
+c    3
+dtype: int64
+c    3
+b    2
+dtype: int64
+```
+
+## Conclusion
+In short, a Pandas Series is a fundamental data structure in Python for handling one-dimensional data. It combines an array of values with an index, offering efficient methods for data manipulation and analysis. With its ease of use and powerful functionality, the Pandas Series is widely used in data science and analytics for tasks such as data cleaning, exploration, and visualization.
diff --git a/contrib/pandas/viewing-data.md b/contrib/pandas/viewing-data.md
new file mode 100644
index 00000000..8aaa4ae4
--- /dev/null
+++ b/contrib/pandas/viewing-data.md
@@ -0,0 +1,67 @@
+# Viewing rows of the frame
+
+## `head()` method
+
+The pandas library in Python provides a convenient method called `head()` that allows you to view the first few rows of a DataFrame. Here is how it works:
+- The `head()` function returns the first n rows of a DataFrame or Series. 
- By default, it displays the first 5 rows, but you can specify a different number of rows using the `n` parameter.
+
+### Syntax
+
+```python
+dataframe.head(n)
+```
+
+`n` is optional: the number of rows to return. The default value is `5`.
+
+### Example
+
+```python
+import pandas as pd
+df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion','tiger','rabbit','dog','fox','monkey','elephant']})
+df.head(n=5)
+```
+
+#### Output
+
+```
+      animal
+0  alligator
+1        bee
+2     falcon
+3       lion
+4      tiger
+```
+
+## `tail()` method
+
+The `tail()` function displays the last five rows of the dataframe by default. It takes in a single parameter: the number of rows. We can use this parameter to display the number of rows of our choice.
+- The `tail()` function returns the last n rows of a DataFrame or Series.
+- By default, it displays the last 5 rows, but you can specify a different number of rows using the `n` parameter.
+
+### Syntax
+
+```python
+dataframe.tail(n)
+```
+
+`n` is optional: the number of rows to return. The default value is `5`.
+
+### Example
+
+```python
+import pandas as pd
+df = pd.DataFrame({'fruits': ['mango', 'orange', 'apple', 'lemon','banana','water melon','papaya','grapes','cherry','coconut']})
+df.tail(n=5)
+```
+
+#### Output
+
+```
+        fruits
+5  water melon
+6       papaya
+7       grapes
+8       cherry
+9      coconut
+```
diff --git a/contrib/plotting-visualization/images/Subplots.png b/contrib/plotting-visualization/images/Subplots.png
new file mode 100644
index 00000000..9a9ceb57
Binary files /dev/null and b/contrib/plotting-visualization/images/Subplots.png differ
diff --git a/contrib/plotting-visualization/images/autopct.png b/contrib/plotting-visualization/images/autopct.png
new file mode 100644
index 00000000..8030fc0e
Binary files /dev/null and b/contrib/plotting-visualization/images/autopct.png differ
diff --git a/contrib/plotting-visualization/images/bar_colors_and_legends.png b/contrib/plotting-visualization/images/bar_colors_and_legends.png
new file mode 100644
index 00000000..e3af5965
Binary files /dev/null and b/contrib/plotting-visualization/images/bar_colors_and_legends.png differ
diff --git a/contrib/plotting-visualization/images/bar_labels.png b/contrib/plotting-visualization/images/bar_labels.png
new file mode 100644
index 00000000..9fdc1a6a
Binary files /dev/null and b/contrib/plotting-visualization/images/bar_labels.png differ
diff --git a/contrib/plotting-visualization/images/barplot.png b/contrib/plotting-visualization/images/barplot.png
new file mode 100644
index 00000000..b6c9446e
Binary files /dev/null and b/contrib/plotting-visualization/images/barplot.png differ
diff --git a/contrib/plotting-visualization/images/basic_bar_plot.png b/contrib/plotting-visualization/images/basic_bar_plot.png
new file mode 100644
index 00000000..c54fa0f2
Binary files /dev/null and b/contrib/plotting-visualization/images/basic_bar_plot.png differ
diff --git a/contrib/plotting-visualization/images/basic_pie_chart.png b/contrib/plotting-visualization/images/basic_pie_chart.png
new file mode 100644
index 00000000..3feb244a
Binary files /dev/null and b/contrib/plotting-visualization/images/basic_pie_chart.png differ
diff --git a/contrib/plotting-visualization/images/coloring_slices.png b/contrib/plotting-visualization/images/coloring_slices.png
new file mode 100644
index 00000000..9c87b70a
Binary files /dev/null and b/contrib/plotting-visualization/images/coloring_slices.png differ
diff --git a/contrib/plotting-visualization/images/dot-line.png 
b/contrib/plotting-visualization/images/dot-line.png new file mode 100644 index 00000000..ff9e67f5 Binary files /dev/null and b/contrib/plotting-visualization/images/dot-line.png differ diff --git a/contrib/plotting-visualization/images/explode_slice.png b/contrib/plotting-visualization/images/explode_slice.png new file mode 100644 index 00000000..7bf344d5 Binary files /dev/null and b/contrib/plotting-visualization/images/explode_slice.png differ diff --git a/contrib/plotting-visualization/images/hatch_patterns.png b/contrib/plotting-visualization/images/hatch_patterns.png new file mode 100644 index 00000000..d5fed83e Binary files /dev/null and b/contrib/plotting-visualization/images/hatch_patterns.png differ diff --git a/contrib/plotting-visualization/images/histogram.png b/contrib/plotting-visualization/images/histogram.png new file mode 100644 index 00000000..af35141b Binary files /dev/null and b/contrib/plotting-visualization/images/histogram.png differ diff --git a/contrib/plotting-visualization/images/horizontal_bar_plot_1.png b/contrib/plotting-visualization/images/horizontal_bar_plot_1.png new file mode 100644 index 00000000..fa65b5ad Binary files /dev/null and b/contrib/plotting-visualization/images/horizontal_bar_plot_1.png differ diff --git a/contrib/plotting-visualization/images/horizontal_bar_plot_2.png b/contrib/plotting-visualization/images/horizontal_bar_plot_2.png new file mode 100644 index 00000000..3d2d134c Binary files /dev/null and b/contrib/plotting-visualization/images/horizontal_bar_plot_2.png differ diff --git a/contrib/plotting-visualization/images/img_colorbar.png b/contrib/plotting-visualization/images/img_colorbar.png new file mode 100644 index 00000000..acc1ec5b Binary files /dev/null and b/contrib/plotting-visualization/images/img_colorbar.png differ diff --git a/contrib/plotting-visualization/images/legends.png b/contrib/plotting-visualization/images/legends.png new file mode 100644 index 00000000..880d3233 Binary files /dev/null and b/contrib/plotting-visualization/images/legends.png differ diff --git a/contrib/plotting-visualization/images/line-asymptote.png b/contrib/plotting-visualization/images/line-asymptote.png new file mode 100644 index 00000000..b3d21da0 Binary files /dev/null and b/contrib/plotting-visualization/images/line-asymptote.png differ diff --git a/contrib/plotting-visualization/images/line-curve.png b/contrib/plotting-visualization/images/line-curve.png new file mode 100644 index 00000000..2c2531a0 Binary files /dev/null and b/contrib/plotting-visualization/images/line-curve.png differ diff --git a/contrib/plotting-visualization/images/line-labels.png b/contrib/plotting-visualization/images/line-labels.png new file mode 100644 index 00000000..26505b15 Binary files /dev/null and b/contrib/plotting-visualization/images/line-labels.png differ diff --git a/contrib/plotting-visualization/images/line-ticks.png b/contrib/plotting-visualization/images/line-ticks.png new file mode 100644 index 00000000..3a6ed78a Binary files /dev/null and b/contrib/plotting-visualization/images/line-ticks.png differ diff --git a/contrib/plotting-visualization/images/line-with-text-scale.png b/contrib/plotting-visualization/images/line-with-text-scale.png new file mode 100644 index 00000000..e402a3e5 Binary files /dev/null and b/contrib/plotting-visualization/images/line-with-text-scale.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-colors-basic.png b/contrib/plotting-visualization/images/plotly-bar-colors-basic.png new file mode 100644 
index 00000000..9b940133 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-colors-basic.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-colors.png b/contrib/plotting-visualization/images/plotly-bar-colors.png new file mode 100644 index 00000000..d7e20a73 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-colors.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-labels-1.png b/contrib/plotting-visualization/images/plotly-bar-labels-1.png new file mode 100644 index 00000000..2c4d9f5d Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-labels-1.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-labels-2.png b/contrib/plotting-visualization/images/plotly-bar-labels-2.png new file mode 100644 index 00000000..05fcda6f Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-labels-2.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-labels-3.png b/contrib/plotting-visualization/images/plotly-bar-labels-3.png new file mode 100644 index 00000000..967b0655 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-labels-3.png differ diff --git a/contrib/plotting-visualization/images/plotly-bar-title.png b/contrib/plotting-visualization/images/plotly-bar-title.png new file mode 100644 index 00000000..6e622abe Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-bar-title.png differ diff --git a/contrib/plotting-visualization/images/plotly-basic-bar-plot.png b/contrib/plotting-visualization/images/plotly-basic-bar-plot.png new file mode 100644 index 00000000..7e1f300b Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-basic-bar-plot.png differ diff --git a/contrib/plotting-visualization/images/plotly-basic-line-chart.png b/contrib/plotting-visualization/images/plotly-basic-line-chart.png new file mode 100644 index 00000000..fa5955f0 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-basic-line-chart.png differ diff --git a/contrib/plotting-visualization/images/plotly-basic-pie-chart.png b/contrib/plotting-visualization/images/plotly-basic-pie-chart.png new file mode 100644 index 00000000..bb827f9a Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-basic-pie-chart.png differ diff --git a/contrib/plotting-visualization/images/plotly-basic-scatter-plot.png b/contrib/plotting-visualization/images/plotly-basic-scatter-plot.png new file mode 100644 index 00000000..64b6234e Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-basic-scatter-plot.png differ diff --git a/contrib/plotting-visualization/images/plotly-horizontal-bar-plot.png b/contrib/plotting-visualization/images/plotly-horizontal-bar-plot.png new file mode 100644 index 00000000..dde43a13 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-horizontal-bar-plot.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-color.png b/contrib/plotting-visualization/images/plotly-line-color.png new file mode 100644 index 00000000..e8dbc1c4 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-color.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-dashed.png b/contrib/plotting-visualization/images/plotly-line-dashed.png new file mode 100644 index 00000000..b7e18e2b Binary files /dev/null and 
b/contrib/plotting-visualization/images/plotly-line-dashed.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-dasheddotted.png b/contrib/plotting-visualization/images/plotly-line-dasheddotted.png new file mode 100644 index 00000000..c3e31fb2 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-dasheddotted.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-datapoint-label.png b/contrib/plotting-visualization/images/plotly-line-datapoint-label.png new file mode 100644 index 00000000..6480d654 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-datapoint-label.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-dotted.png b/contrib/plotting-visualization/images/plotly-line-dotted.png new file mode 100644 index 00000000..5f92ad4d Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-dotted.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-markers.png b/contrib/plotting-visualization/images/plotly-line-markers.png new file mode 100644 index 00000000..1197268f Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-markers.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-multiple-lines.png b/contrib/plotting-visualization/images/plotly-line-multiple-lines.png new file mode 100644 index 00000000..68a4139e Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-multiple-lines.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-title.png b/contrib/plotting-visualization/images/plotly-line-title.png new file mode 100644 index 00000000..1d7ce85f Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-title.png differ diff --git a/contrib/plotting-visualization/images/plotly-line-width.png b/contrib/plotting-visualization/images/plotly-line-width.png new file mode 100644 index 00000000..7cbe21e3 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-line-width.png differ diff --git a/contrib/plotting-visualization/images/plotly-long-format-bar-plot.png b/contrib/plotting-visualization/images/plotly-long-format-bar-plot.png new file mode 100644 index 00000000..5bb67784 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-long-format-bar-plot.png differ diff --git a/contrib/plotting-visualization/images/plotly-pie-color-1.png b/contrib/plotting-visualization/images/plotly-pie-color-1.png new file mode 100644 index 00000000..9ff0ab91 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-color-1.png differ diff --git a/contrib/plotting-visualization/images/plotly-pie-color-2.png b/contrib/plotting-visualization/images/plotly-pie-color-2.png new file mode 100644 index 00000000..d46fea98 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-color-2.png differ diff --git a/contrib/plotting-visualization/images/plotly-pie-labels.png b/contrib/plotting-visualization/images/plotly-pie-labels.png new file mode 100644 index 00000000..3a246591 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-labels.png differ diff --git a/contrib/plotting-visualization/images/plotly-pie-patterns.png b/contrib/plotting-visualization/images/plotly-pie-patterns.png new file mode 100644 index 00000000..a07bb3d0 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-patterns.png differ diff --git 
a/contrib/plotting-visualization/images/plotly-pie-pull.png b/contrib/plotting-visualization/images/plotly-pie-pull.png new file mode 100644 index 00000000..202314b8 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-pull.png differ diff --git a/contrib/plotting-visualization/images/plotly-pie-title.png b/contrib/plotting-visualization/images/plotly-pie-title.png new file mode 100644 index 00000000..e3d3ae7e Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-pie-title.png differ diff --git a/contrib/plotting-visualization/images/plotly-rounded-bars.png b/contrib/plotting-visualization/images/plotly-rounded-bars.png new file mode 100644 index 00000000..fa3b83b8 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-rounded-bars.png differ diff --git a/contrib/plotting-visualization/images/plotly-scatter-colour-2.png b/contrib/plotting-visualization/images/plotly-scatter-colour-2.png new file mode 100644 index 00000000..c6b3f14f Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-scatter-colour-2.png differ diff --git a/contrib/plotting-visualization/images/plotly-scatter-colour.png b/contrib/plotting-visualization/images/plotly-scatter-colour.png new file mode 100644 index 00000000..ef4819b2 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-scatter-colour.png differ diff --git a/contrib/plotting-visualization/images/plotly-scatter-hover.png b/contrib/plotting-visualization/images/plotly-scatter-hover.png new file mode 100644 index 00000000..20889573 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-scatter-hover.png differ diff --git a/contrib/plotting-visualization/images/plotly-scatter-size.png b/contrib/plotting-visualization/images/plotly-scatter-size.png new file mode 100644 index 00000000..3f8b78c2 Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-scatter-size.png differ diff --git a/contrib/plotting-visualization/images/plotly-scatter-title.png b/contrib/plotting-visualization/images/plotly-scatter-title.png new file mode 100644 index 00000000..39f85d0d Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-scatter-title.png differ diff --git a/contrib/plotting-visualization/images/plotly-wide-format-bar-plot.png b/contrib/plotting-visualization/images/plotly-wide-format-bar-plot.png new file mode 100644 index 00000000..ff3523ca Binary files /dev/null and b/contrib/plotting-visualization/images/plotly-wide-format-bar-plot.png differ diff --git a/contrib/plotting-visualization/images/radius.png b/contrib/plotting-visualization/images/radius.png new file mode 100644 index 00000000..01b4664b Binary files /dev/null and b/contrib/plotting-visualization/images/radius.png differ diff --git a/contrib/plotting-visualization/images/rotating_slices.png b/contrib/plotting-visualization/images/rotating_slices.png new file mode 100644 index 00000000..bca1c978 Binary files /dev/null and b/contrib/plotting-visualization/images/rotating_slices.png differ diff --git a/contrib/plotting-visualization/images/scatter_color.png b/contrib/plotting-visualization/images/scatter_color.png new file mode 100644 index 00000000..17c6ddcc Binary files /dev/null and b/contrib/plotting-visualization/images/scatter_color.png differ diff --git a/contrib/plotting-visualization/images/scatter_coloreachdot.png b/contrib/plotting-visualization/images/scatter_coloreachdot.png new file mode 100644 index 00000000..c6636296 Binary files /dev/null 
and b/contrib/plotting-visualization/images/scatter_coloreachdot.png differ diff --git a/contrib/plotting-visualization/images/scatter_colormap1.png b/contrib/plotting-visualization/images/scatter_colormap1.png new file mode 100644 index 00000000..212b3680 Binary files /dev/null and b/contrib/plotting-visualization/images/scatter_colormap1.png differ diff --git a/contrib/plotting-visualization/images/scatter_colormap2.png b/contrib/plotting-visualization/images/scatter_colormap2.png new file mode 100644 index 00000000..08c40cce Binary files /dev/null and b/contrib/plotting-visualization/images/scatter_colormap2.png differ diff --git a/contrib/plotting-visualization/images/scatter_compare.png b/contrib/plotting-visualization/images/scatter_compare.png new file mode 100644 index 00000000..f94e18be Binary files /dev/null and b/contrib/plotting-visualization/images/scatter_compare.png differ diff --git a/contrib/plotting-visualization/images/scatterplot.png b/contrib/plotting-visualization/images/scatterplot.png new file mode 100644 index 00000000..94c91484 Binary files /dev/null and b/contrib/plotting-visualization/images/scatterplot.png differ diff --git a/contrib/plotting-visualization/images/seaborn-basics1.png b/contrib/plotting-visualization/images/seaborn-basics1.png new file mode 100644 index 00000000..bedcee91 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-basics1.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image1.png b/contrib/plotting-visualization/images/seaborn-plotting/image1.png new file mode 100644 index 00000000..a8a6017e Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image1.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image10.png b/contrib/plotting-visualization/images/seaborn-plotting/image10.png new file mode 100644 index 00000000..e6df1bdd Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image10.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image11.png b/contrib/plotting-visualization/images/seaborn-plotting/image11.png new file mode 100644 index 00000000..e485ff71 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image11.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image12.png b/contrib/plotting-visualization/images/seaborn-plotting/image12.png new file mode 100644 index 00000000..ae2a54dc Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image12.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image13.png b/contrib/plotting-visualization/images/seaborn-plotting/image13.png new file mode 100644 index 00000000..0f3b05cd Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image13.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image14.png b/contrib/plotting-visualization/images/seaborn-plotting/image14.png new file mode 100644 index 00000000..4bcf460e Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image14.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image15.png b/contrib/plotting-visualization/images/seaborn-plotting/image15.png new file mode 100644 index 00000000..de6603cf Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image15.png differ diff --git 
a/contrib/plotting-visualization/images/seaborn-plotting/image16.png b/contrib/plotting-visualization/images/seaborn-plotting/image16.png new file mode 100644 index 00000000..ceb0df69 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image16.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image2.png b/contrib/plotting-visualization/images/seaborn-plotting/image2.png new file mode 100644 index 00000000..a63d89e2 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image2.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image3.png b/contrib/plotting-visualization/images/seaborn-plotting/image3.png new file mode 100644 index 00000000..2336257b Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image3.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image4.png b/contrib/plotting-visualization/images/seaborn-plotting/image4.png new file mode 100644 index 00000000..897634b4 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image4.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image5.png b/contrib/plotting-visualization/images/seaborn-plotting/image5.png new file mode 100644 index 00000000..5b7c14f8 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image5.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image6.png b/contrib/plotting-visualization/images/seaborn-plotting/image6.png new file mode 100644 index 00000000..ea1bbced Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image6.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image7.png b/contrib/plotting-visualization/images/seaborn-plotting/image7.png new file mode 100644 index 00000000..ff1de854 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image7.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image8.png b/contrib/plotting-visualization/images/seaborn-plotting/image8.png new file mode 100644 index 00000000..1343cc65 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image8.png differ diff --git a/contrib/plotting-visualization/images/seaborn-plotting/image9.png b/contrib/plotting-visualization/images/seaborn-plotting/image9.png new file mode 100644 index 00000000..a18193e0 Binary files /dev/null and b/contrib/plotting-visualization/images/seaborn-plotting/image9.png differ diff --git a/contrib/plotting-visualization/images/shadow.png b/contrib/plotting-visualization/images/shadow.png new file mode 100644 index 00000000..d1101a78 Binary files /dev/null and b/contrib/plotting-visualization/images/shadow.png differ diff --git a/contrib/plotting-visualization/images/simple_line.png b/contrib/plotting-visualization/images/simple_line.png new file mode 100644 index 00000000..3097f469 Binary files /dev/null and b/contrib/plotting-visualization/images/simple_line.png differ diff --git a/contrib/plotting-visualization/images/simple_scatter.png b/contrib/plotting-visualization/images/simple_scatter.png new file mode 100644 index 00000000..bfa5b408 Binary files /dev/null and b/contrib/plotting-visualization/images/simple_scatter.png differ diff --git a/contrib/plotting-visualization/images/split-violin-plot.png b/contrib/plotting-visualization/images/split-violin-plot.png 
new file mode 100644 index 00000000..170d287a Binary files /dev/null and b/contrib/plotting-visualization/images/split-violin-plot.png differ diff --git a/contrib/plotting-visualization/images/stacked_violin_plots.png b/contrib/plotting-visualization/images/stacked_violin_plots.png new file mode 100644 index 00000000..a580a2c9 Binary files /dev/null and b/contrib/plotting-visualization/images/stacked_violin_plots.png differ diff --git a/contrib/plotting-visualization/images/title_and_axis_labels.png b/contrib/plotting-visualization/images/title_and_axis_labels.png new file mode 100644 index 00000000..e97ce40a Binary files /dev/null and b/contrib/plotting-visualization/images/title_and_axis_labels.png differ diff --git a/contrib/plotting-visualization/images/two-lines.png b/contrib/plotting-visualization/images/two-lines.png new file mode 100644 index 00000000..db2c8858 Binary files /dev/null and b/contrib/plotting-visualization/images/two-lines.png differ diff --git a/contrib/plotting-visualization/images/violen-plots1.webp b/contrib/plotting-visualization/images/violen-plots1.webp new file mode 100644 index 00000000..9e842df5 Binary files /dev/null and b/contrib/plotting-visualization/images/violen-plots1.webp differ diff --git a/contrib/plotting-visualization/images/violenplotnormal.png b/contrib/plotting-visualization/images/violenplotnormal.png new file mode 100644 index 00000000..63d7c2d6 Binary files /dev/null and b/contrib/plotting-visualization/images/violenplotnormal.png differ diff --git a/contrib/plotting-visualization/images/violin-hatching.png b/contrib/plotting-visualization/images/violin-hatching.png new file mode 100644 index 00000000..ceab19b6 Binary files /dev/null and b/contrib/plotting-visualization/images/violin-hatching.png differ diff --git a/contrib/plotting-visualization/images/violin-labelling.png b/contrib/plotting-visualization/images/violin-labelling.png new file mode 100644 index 00000000..0bc7f813 Binary files /dev/null and b/contrib/plotting-visualization/images/violin-labelling.png differ diff --git a/contrib/plotting-visualization/images/violin-plot4.png b/contrib/plotting-visualization/images/violin-plot4.png new file mode 100644 index 00000000..12fb04b3 Binary files /dev/null and b/contrib/plotting-visualization/images/violin-plot4.png differ diff --git a/contrib/plotting-visualization/images/violinplotnocolor.png b/contrib/plotting-visualization/images/violinplotnocolor.png new file mode 100644 index 00000000..960dbc31 Binary files /dev/null and b/contrib/plotting-visualization/images/violinplotnocolor.png differ diff --git a/contrib/plotting-visualization/index.md b/contrib/plotting-visualization/index.md new file mode 100644 index 00000000..479ee883 --- /dev/null +++ b/contrib/plotting-visualization/index.md @@ -0,0 +1,17 @@ +# List of sections + +- [Installing Matplotlib](matplotlib-installation.md) +- [Introducing Matplotlib](matplotlib-introduction.md) +- [Bar Plots in Matplotlib](matplotlib-bar-plots.md) +- [Pie Charts in Matplotlib](matplotlib-pie-charts.md) +- [Line Charts in Matplotlib](matplotlib-line-plots.md) +- [Scatter Plots in Matplotlib](matplotlib-scatter-plot.md) +- [Violin Plots in Matplotlib](matplotlib-violin-plots.md) +- [subplots in Matplotlib](matplotlib-sub-plot.md) +- [Introduction to Seaborn and Installation](seaborn-intro.md) +- [Seaborn Plotting Functions](seaborn-plotting.md) +- [Getting started with Seaborn](seaborn-basics.md) +- [Bar Plots in Plotly](plotly-bar-plots.md) +- [Pie Charts in Plotly](plotly-pie-charts.md) +- 
[Line Charts in Plotly](plotly-line-charts.md) +- [Scatter Plots in Plotly](plotly-scatter-plots.md) diff --git a/contrib/plotting-visualization/matplotlib-bar-plots.md b/contrib/plotting-visualization/matplotlib-bar-plots.md new file mode 100644 index 00000000..97d45cea --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-bar-plots.md @@ -0,0 +1,216 @@ +# Bar Plots in Matplotlib +A bar plot or a bar chart is a type of data visualisation that represents data in the form of rectangular bars, with lengths or heights proportional to the values they represent. Bar plots can be drawn both vertically and horizontally. + +It is one of the most widely used types of data visualisation as it is easy to interpret and is pleasing to the eyes. + +Matplotlib provides a very easy and intuitive method to create highly customized bar plots. + +## Prerequisites + +Before creating bar plots in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system. + +## Creating a simple Bar Plot with `bar()` method + +A very basic bar plot can be created with the `bar()` method in `matplotlib.pyplot`. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +x = ["A", "B", "C", "D"] +y = [2, 7, 9, 11] + +# Creating bar plot +plt.bar(x,y) +plt.show() # Shows the plot +``` +When executed, this would show the following bar plot: + +![Basic Bar Plot](images/basic_bar_plot.png) + +The `bar()` function takes arguments that describe the layout of the bars. + +Here, `plt.bar(x,y)` is used to specify that the bar chart is to be plotted by taking the `x` array as the X-axis and the `y` array as the Y-axis. You can customize the graph further, like adding axis labels, changing bar colors, etc. These will be explored in the upcoming sections. + +Additionally, you can also use `numpy` arrays for faster processing when handling large datasets. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Using numpy array +x = np.array(["A", "B", "C", "D"]) +y = np.array([2, 7, 9, 11]) + +plt.bar(x,y) +plt.show() +``` +Its output would be the same as above. + +## Customizing Bar Plots + +For creating customized bar plots, it is **highly recommended** to create the plots using `matplotlib.pyplot.subplots()`, otherwise it is difficult to apply the customizations in the newer versions of Matplotlib. + +### Adding title to the graph and labeling the axes + +Let us create an imaginary graph of the number of cars sold in various years. + +```Python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() + +years = ['1999', '2000', '2001', '2002'] +num_of_cars_sold = [300, 500, 700, 1000] + +# Creating bar plot +ax.bar(years, num_of_cars_sold) + +# Adding axis labels +ax.set_xlabel("Years") +ax.set_ylabel("Number of cars sold") + +# Adding plot title +ax.set_title("Number of cars sold in various years") + +plt.show() +``` + +![Title and axis labels](images/title_and_axis_labels.png) + +Here, `matplotlib.pyplot.subplots()` returns a `Figure` object `fig` as well as an `Axes` object `ax`, both of which are used for customizing the bar plot. `ax.set_xlabel`, `ax.set_ylabel` and `ax.set_title` are used for adding the X-axis label, the Y-axis label and the plot title respectively. + +### Adding bar colors and legends + +Let us consider our previous example of the number of cars sold in various years and suppose that we want to add different colors to the bars from different centuries and respective legends for better interpretation.
+ +This can be achieved by creating two separate arrays, `bar_colors` for bar colors and `bar_labels` for legend labels, and passing them to the `color` and `label` parameters of the `ax.bar()` method respectively. + +```Python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() + +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] +bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue'] +bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s'] + +# Creating the customized bar plot +ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels) + +# Adding axis labels +ax.set_xlabel("Years") +ax.set_ylabel("Number of cars sold") + +# Adding plot title +ax.set_title("Number of cars sold in various years") + +# Adding legend title +ax.legend(title='Centuries') + +plt.show() +``` + +![Bar colors and Legends](images/bar_colors_and_legends.png) + +Note that the labels with a preceding underscore won't show up in the legend. Legend titles can be added by simply passing the `title` argument to `ax.legend()`, as shown. Also, you can give all the bars a single color by passing its hex value to the `color` parameter. + +### Adding labels to bars + +We may want to add labels to bars representing their absolute (or truncated) values for instant and accurate reading. This can be achieved by passing the `BarContainer` object (returned by the `ax.bar()` method), which is basically a container with all the bars and optionally error bars, to the `ax.bar_label()` method. + +```Python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() + +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] +bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue'] +bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s'] + +# BarContainer object +bar_container = ax.bar(years, num_of_cars_sold, color=bar_colors, label=bar_labels) + +ax.set_xlabel("Years") +ax.set_ylabel("Number of cars sold") +ax.set_title("Number of cars sold in various years") +ax.legend(title='Centuries') + +# Adding bar labels +ax.bar_label(bar_container) + +plt.show() +``` + +![Bar Labels](images/bar_labels.png) + +**Note:** There are various other methods of adding bar labels in matplotlib. + +## Horizontal Bar Plot + +We can create horizontal bar plots by using the `barh()` method in `matplotlib.pyplot`. All the relevant customizations discussed above are applicable here as well. + +```Python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure + +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] +bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue'] +bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s'] + +# Creating horizontal bar plot +bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels) + +# Adding axis labels (note that the counts now lie along the X-axis) +ax.set_xlabel("Number of cars sold") +ax.set_ylabel("Years") + +# Adding Title +ax.set_title("Number of cars sold in various years") +ax.legend(title='Centuries') + +# Adding bar labels +ax.bar_label(bar_container) + +plt.show() +``` + +![Horizontal Bar Plot-1](images/horizontal_bar_plot_1.png) + +We can also invert the Y-axis labels here to show the top values first.
+ +```Python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots(figsize=(10,5)) # figsize is used to alter the size of figure + +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] +bar_colors = ['tab:green', 'tab:green', 'tab:blue', 'tab:blue', 'tab:blue'] +bar_labels = ['1900s', '_1900s', '2000s', '_2000s', '_2000s'] + +# Creating horizontal bar plot +bar_container = ax.barh(years, num_of_cars_sold, color=bar_colors, label=bar_labels) + +# Adding axis labels +ax.set_xlabel("Number of cars sold") +ax.set_ylabel("Years") + +# Adding Title +ax.set_title("Number of cars sold in various years") +ax.legend(title='Centuries') + +# Adding bar labels +ax.bar_label(bar_container) + +# Inverting Y-axis +ax.invert_yaxis() + +plt.show() +``` + +![Horizontal Bar Plot-2](images/horizontal_bar_plot_2.png) diff --git a/contrib/plotting-visualization/matplotlib-installation.md b/contrib/plotting-visualization/matplotlib-installation.md new file mode 100644 index 00000000..263e99bd --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-installation.md @@ -0,0 +1,31 @@ +# Matplotlib Installation + +Matplotlib is a widely used Python library for creating static, animated, and interactive visualizations. It can be installed using `pip`, Python's package manager. + +## Prerequisites + +Before installing Matplotlib, ensure you have Python installed on your system. You can download and install Python from the [official Python website](https://www.python.org/). + +## Installation Steps + +1. **Install Matplotlib**: Open your terminal or command prompt and run the following command to install Matplotlib using `pip`: + +```bash +pip install matplotlib +``` + +2. **Verify Installation**: After installation, you can verify that Matplotlib is installed correctly by importing it in a Python environment: + +```python +import matplotlib + +print(matplotlib.__version__) +``` + +Output (the version number may differ on your system): + +``` +3.4.3 +``` diff --git a/contrib/plotting-visualization/matplotlib-introduction.md b/contrib/plotting-visualization/matplotlib-introduction.md new file mode 100644 index 00000000..691d1f8a --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-introduction.md @@ -0,0 +1,80 @@ +# Introducing Matplotlib + +Data visualisation is the analysis and understanding of data through its graphical representation by means of pie charts, histograms, scatter plots and line graphs. + +The Matplotlib library is used to make this process of data visualisation easier and clearer. + +## Features of the Matplotlib library +- Matplotlib is one of the most popular Python packages for 2D representation of data. +- The combination of Matplotlib and NumPy is used for easier computation and visualization of large arrays and data. Matplotlib along with NumPy can be considered the open source equivalent of MATLAB. + +- Matplotlib has a procedural interface named Pylab, which is designed to resemble MATLAB. However, it is completely independent of MATLAB. + +## Starting with Matplotlib + +### 1. Install and import the necessary library - matplotlib.pyplot + +```bash +pip install matplotlib +``` + +```python +import matplotlib.pyplot as plt +import numpy as np +``` + +### 2. Scatter plot +A scatter plot is a type of plot that uses the Cartesian coordinates of x and y to describe the relation between them. It uses dots to represent the relation between the data variables of the data set.
+ +```python +x = [5,4,5,8,9,8,6,7,3,2] +y = [9,1,7,3,5,7,6,1,2,8] + +plt.scatter(x,y, color = "red") + +plt.title("Scatter plot") +plt.xlabel("X values") +plt.ylabel("Y values") + +plt.tight_layout() +plt.show() +``` + +![scatterplot](images/scatterplot.png) + +### 3. Bar plot +A bar plot shows the frequency distribution of a categorical variable. Each category of the variable is represented as a bar, and the size of the bar represents its numeric value. + +```python +x = np.array(['A','B','C','D']) +y = np.array([42,50,15,35]) + +plt.bar(x,y,color = "red") + +plt.title("Bar plot") +plt.xlabel("X values") +plt.ylabel("Y values") + +plt.show() +``` + +![barplot](images/barplot.png) + +### 4. Histogram +A histogram represents the frequency distribution of numerical data. The height of each rectangle shows how often values in that range appear. + +```python +x = [9,1,7,3,5,7,6,1,2,8] + +plt.hist(x, color = "red", edgecolor= "white", bins =5) + +plt.title("Histogram") +plt.xlabel("X values") +plt.ylabel("Frequency Distribution") + +plt.show() +``` + +![histogram](images/histogram.png) + + diff --git a/contrib/plotting-visualization/matplotlib-line-plots.md b/contrib/plotting-visualization/matplotlib-line-plots.md new file mode 100644 index 00000000..b7488e6f --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-line-plots.md @@ -0,0 +1,278 @@ +# Line Chart in Matplotlib + +A line chart is a simple way to visualize data where we connect individual data points. It helps us to see trends and patterns over time or across categories. + +This type of chart is particularly useful for: +- Comparing Data: Comparing multiple datasets on the same axes. +- Highlighting Changes: Illustrating changes and patterns in data. +- Visualizing Trends: Showing trends over time or other continuous variables. + +## Prerequisites + +Line plots can be created in Python with Matplotlib's `pyplot` library. To build a line plot, first import `matplotlib`. It is a standard convention to import Matplotlib's pyplot library as `plt`. + +```python +import matplotlib.pyplot as plt +``` + +## Creating a simple Line Plot + +First import matplotlib and numpy; these are useful for charting. + +You can use the `plot(x,y)` method to create a line chart. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y = 2*x + 1 + +plt.plot(x, y) +plt.show() +``` + +When executed, this will show the following line plot: + +![Basic line Chart](images/simple_line.png) + + +## Curved line + +The `plot()` method also works for other types of line charts. It doesn't need to be a straight line; y can hold any kind of values. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y = 2**x + 1 + +plt.plot(x, y) +plt.show() +``` + +When executed, this will show the following curved line plot: + +![Curved line](images/line-curve.png) + + +## Line with Labels + +To know what you are looking at, you need metadata. Labels are a type of metadata. They show what the chart is about. The chart has an `x label`, `y label` and `title`.
+ +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y1 = 2*x + 1 + +plt.figure() +plt.plot(x, y1) + +plt.xlabel("I am x") +plt.ylabel("I am y") +plt.title("With Labels") + +plt.show() +``` + +When executed, this will show the following line with labels plot: + +![line with labels](images/line-labels.png) + +## Multiple lines + +More than one line can be in the plot. To add another line, just call the `plot(x,y)` function again. In the example below we have two different sets of y-values (`y1`, `y2`) that are plotted onto the chart. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y1 = 2*x + 1 +y2 = 2**x + 1 + +plt.figure(num = 3, figsize=(8, 5)) +plt.plot(x, y2) +plt.plot(x, y1, + color='red', + linewidth=1.0, + linestyle='--' + ) + +plt.show() +``` + +When executed, this will show the following multiple lines plot: + +![multiple lines](images/two-lines.png) + + +## Dotted line + +Lines can be in the form of dots like the image below. Instead of calling `plot(x,y)` call the `scatter(x,y)` method. The `scatter(x,y)` method can also be used to (randomly) plot points onto the chart. + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Plotting five points along the diagonal as dots +plt.scatter(np.arange(5), np.arange(5)) + +plt.xticks(()) +plt.yticks(()) + +plt.show() +``` + +When executed, this will show the following dotted line plot: + +![dotted lines](images/dot-line.png) + +## Line ticks + +You can change the ticks on the plot. Set them on the `x-axis`, `y-axis` or even change their color. The line can be made thicker and given an alpha value. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y = 2*x - 1 + +plt.figure(figsize=(12, 8)) +plt.plot(x, y, color='r', linewidth=10.0, alpha=0.5) + +ax = plt.gca() + +ax.spines['right'].set_color('none') +ax.spines['top'].set_color('none') + +ax.xaxis.set_ticks_position('bottom') +ax.yaxis.set_ticks_position('left') + +ax.spines['bottom'].set_position(('data', 0)) +ax.spines['left'].set_position(('data', 0)) + +for label in ax.get_xticklabels() + ax.get_yticklabels(): + label.set_fontsize(12) + label.set_bbox(dict(facecolor='y', edgecolor='None', alpha=0.7)) + +plt.show() +``` + +When executed, this will show the following line ticks plot: + +![line ticks](images/line-ticks.png) + +## Line with asymptote + +An asymptote can be added to the plot. To do that, use `plt.annotate()`. There's also a dotted line in the plot below. You can play around with the code to see how it works.
+ +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y1 = 2*x + 1 +y2 = 2**x + 1 + +plt.figure(figsize=(12, 8)) +plt.plot(x, y2) +plt.plot(x, y1, color='red', linewidth=1.0, linestyle='--') + +ax = plt.gca() + +ax.spines['right'].set_color('none') +ax.spines['top'].set_color('none') + +ax.xaxis.set_ticks_position('bottom') +ax.yaxis.set_ticks_position('left') + +ax.spines['bottom'].set_position(('data', 0)) +ax.spines['left'].set_position(('data', 0)) + + +x0 = 1 +y0 = 2*x0 + 1 + +plt.scatter(x0, y0, s = 66, color = 'b') +plt.plot([x0, x0], [y0, 0], 'k-.', lw= 2.5) + +plt.annotate(r'$2x+1=%s$' % + y0, + xy=(x0, y0), + xycoords='data', + + xytext=(+30, -30), + textcoords='offset points', + fontsize=16, + arrowprops=dict(arrowstyle='->',connectionstyle='arc3,rad=.2') + ) + +plt.text(0, 3, + r'$This\ is\ a\ good\ idea.\ \mu\ \sigma_i\ \alpha_t$', + fontdict={'size':16,'color':'r'}) + +plt.show() +``` + +When executed, this will show the following line with asymptote plot: + +![Line with asymptote](images/line-asymptote.png) + +## Line with text scale + +It doesn't have to be a numeric scale. The scale can also contain textual words like the example below. In `plt.yticks()` we just pass a list with text values. These values are then shown against the y-axis. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-1, 1, 50) +y1 = 2*x + 1 +y2 = 2**x + 1 + +plt.figure(num = 3, figsize=(8, 5)) +plt.plot(x, y2) + +plt.plot(x, y1, + color='red', + linewidth=1.0, + linestyle='--' + ) + +plt.xlim((-1, 2)) +plt.ylim((1, 3)) + +new_ticks = np.linspace(-1, 2, 5) +plt.xticks(new_ticks) +plt.yticks([-2, -1.8, -1, 1.22, 3], + [r'$really\ bad$', r'$bad$', r'$normal$', r'$good$', r'$really\ good$']) + +ax = plt.gca() +ax.spines['right'].set_color('none') +ax.spines['top'].set_color('none') + +ax.xaxis.set_ticks_position('bottom') +ax.yaxis.set_ticks_position('left') + +ax.spines['bottom'].set_position(('data', 0)) +ax.spines['left'].set_position(('data', 0)) + +plt.show() +``` + +When executed, this will show the following line with text scale plot: + +![Line with text scale](images/line-with-text-scale.png) + + diff --git a/contrib/plotting-visualization/matplotlib-pie-charts.md b/contrib/plotting-visualization/matplotlib-pie-charts.md new file mode 100644 index 00000000..66f2aa15 --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-pie-charts.md @@ -0,0 +1,233 @@ +# Pie Charts in Matplotlib + +A pie chart is a type of graph that represents data in a circular graph. The slices of the pie show the relative size of the data, and it is a type of pictorial representation of data. A pie chart requires a list of categorical variables and corresponding numerical values. Here, the term "pie" represents the whole, and the "slices" represent the parts of the whole. + +Pie charts are commonly used in business presentations like sales, operations, survey results, resources, etc. as they are pleasing to the eye and provide a quick summary. + +## Prerequisites + +Before creating pie charts in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system. + +## Creating a simple pie chart with `pie()` method + +A basic pie chart can be created with the `pie()` method in `matplotlib.pyplot`.
+ +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['A','B','C','D','E'] +data = [10,20,30,40,50] + +# Creating Plot +plt.pie(data, labels=labels) + +# Show plot +plt.show() +``` + +When executed, this would show the following pie chart: + +![Basic Pie Chart](images/basic_pie_chart.png) + +Note that the slices of the pie are sized according to their corresponding proportion in the `data` as a whole and labelled using the `labels` array. + +The `pie()` function takes arguments that describe the layout of the pie chart. + +Here, `plt.pie(data, labels=labels)` is used to specify that the pie chart is to be plotted by taking the values from array `data` and the fractional area of each slice is represented by **data/sum(data)**. The array `labels` represents the labels of slices corresponding to each value in `data`. + +You can customize the graph further, like specifying custom colors for slices, exploding slices, labeling wedges (slices), etc. These will be explored in the upcoming sections. + +## Customizing Pie Chart in Matplotlib + +For creating customized plots, it is highly recommended to create the plots using `matplotlib.pyplot.subplots()`, otherwise it is difficult to apply the customizations in the newer versions of Matplotlib. + +### Coloring Slices + +You can add a custom set of colors to the slices by passing an array of colors to the `colors` parameter in the `pie()` method. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['A','B','C','D','E'] +data = [10,20,30,40,50] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors) + +# Show plot +plt.show() +``` +![Coloring Slices](images/coloring_slices.png) + +Here, `matplotlib.pyplot.subplots()` returns a `Figure` object `fig` as well as an `Axes` object `ax`, both of which are used for customizing the pie chart. + +**Note:** Each slice of the pie chart is a `patches.Wedge` object; therefore in addition to the customizations shown here, each wedge can be customized using the `wedgeprops` argument, which takes a Python dictionary with name-value pairs denoting wedge properties like linewidth, edgecolor, etc. + +### Hatching Slices + +To make the pie chart more pleasing, you can pass a list of hatch patterns to the `hatch` parameter to set the pattern of each slice. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['A','B','C','D','E'] +data = [10,20,30,40,50] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] +hatch = ['*O', 'oO', 'OO', '.||.', '|*|'] # Hatch patterns + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, hatch=hatch) + +# Show plot +plt.show() +``` +![Hatch Patterns](images/hatch_patterns.png) + +You can try and test your own beautiful hatch patterns! + +### Labeling Slices + +You can pass a function or format string to the `autopct` parameter to label slices.
+ +An example is shown here: + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, autopct='%1.1f%%') + +# Show plot +plt.show() +``` +![Autopct Example](images/autopct.png) + +Here, `autopct='%1.1f%%'` specifies that the wedges (slices) are labelled with the percentage proportion they occupy out of 100%, with precision up to 1 decimal place. + +### Exploding Slices + +The `explode` parameter separates a portion of the chart. You can explode slices by passing an array of numbers to the `explode` parameter. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Explode only the first slice, i.e. 'Rose' +explode = [0.1, 0, 0, 0, 0] + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, explode=explode, autopct='%1.1f%%') + +# Show plot +plt.show() +``` +![Explode Slice](images/explode_slice.png) + +### Shading Slices + +You can add a shadow to slices by passing `shadow=True` in the `pie()` method. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Explode only the first slice, i.e. 'Rose' +explode = [0.1, 0, 0, 0, 0] + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, explode=explode, shadow=True, autopct='%1.1f%%') + +# Show plot +plt.show() +``` +![Shadow](images/shadow.png) + +### Rotating Slices + +You can rotate slices by passing a custom start angle value to the `startangle` parameter. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%') + +# Show plot +plt.show() +``` +![Rotating Slices](images/rotating_slices.png) + +The default `startangle` is 0, which would start the first slice ('Rose') on the positive x-axis. This example sets `startangle=90` such that all the slices are rotated counter-clockwise by 90 degrees, and the `'Rose'` slice starts on the positive y-axis. + +### Controlling Size of Pie Chart + +In addition to the size of the figure, you can also control the size of the pie chart using the `radius` parameter. + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot +fig, ax = plt.subplots() +ax.pie(data, labels=labels, colors=colors, startangle=90, autopct='%1.1f%%', textprops={'size': 'smaller'}, radius=0.7) + +# Show plot +plt.show() +``` +![Controlling Size](images/radius.png) + +Note that `textprops` is an additional argument which can be used for controlling the properties of any text in the pie chart. In this case, we have specified that the size of the text should be smaller. There are many more such properties available in `textprops`, as the sketch below illustrates.
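+ +As a quick sketch of this idea (reusing the flower dataset from above), `textprops` accepts any matplotlib `Text` property, so you could, for example, also set the text color and weight: + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot +fig, ax = plt.subplots() + +# textprops accepts any matplotlib Text property, e.g. color and fontweight +ax.pie(data, labels=labels, colors=colors, autopct='%1.1f%%', textprops={'size': 'smaller', 'color': 'darkblue', 'fontweight': 'bold'}) + +# Show plot +plt.show() +```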
+ +### Adding Legends + +You can also use legends to act as labels for the slices, like this: + +```Python +import matplotlib.pyplot as plt + +# Creating dataset +labels = ['Rose','Tulip','Marigold','Sunflower','Daffodil'] +data = [11,9,17,4,7] +colors=['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:pink'] + +# Creating plot +fig, ax = plt.subplots(figsize=(7,7)) +ax.pie(data, colors=colors, startangle=90, autopct='%1.1f%%', radius=0.7) +plt.legend(labels, title="Flowers") + +# Show plot +plt.show() +``` +![Legends](images/legends.png) diff --git a/contrib/plotting-visualization/matplotlib-scatter-plot.md b/contrib/plotting-visualization/matplotlib-scatter-plot.md new file mode 100644 index 00000000..535a3a35 --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-scatter-plot.md @@ -0,0 +1,160 @@ +# Scatter Plots in Matplotlib +* A scatter plot is a type of data visualization that uses dots to show values for two variables, with one variable on the x-axis and the other on the y-axis. It's useful for identifying relationships, trends, and correlations, as well as spotting clusters and outliers. +* The dots on the plot show how the variables are related. A scatter plot is made with the matplotlib library's `scatter()` method. +## Syntax +**Here's the general syntax of the `scatter()` method:** +```python +matplotlib.pyplot.scatter(x_axis_values, y_axis_values, s=None, c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None) +``` +## Prerequisites +Scatter plots can be created in Python with Matplotlib's pyplot library. To build a scatter plot, first import matplotlib. It is a standard convention to import Matplotlib's pyplot library as `plt`. +```python +import matplotlib.pyplot as plt +``` +## Creating a simple Scatter Plot +With Pyplot, you can use the `scatter()` function to draw a scatter plot. + +The `scatter()` function plots one dot for each observation. It needs two arrays of the same length, one for the values of the x-axis, and one for values on the y-axis: +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) + +plt.scatter(x, y) +plt.show() +``` + +When executed, this will show the following scatter plot: + +![Basic Scatter Plot](images/simple_scatter.png) + +## Compare Plots + +In a scatter plot, comparing plots involves examining multiple sets of points to identify differences or similarities in patterns, trends, or correlations between the data sets.
+ +```python +import matplotlib.pyplot as plt +import numpy as np + +# day one, the age and speed of 13 cars: +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) +plt.scatter(x, y) + +# day two, the age and speed of 15 cars: +x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]) +y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]) +plt.scatter(x, y) + +plt.show() +``` + +When executed, this will show the following comparison scatter plot: + +![Compare Plots](images/scatter_compare.png) + +## Colors in Scatter plot +You can set your own color for each scatter plot with the `color` or the `c` argument: + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) +plt.scatter(x, y, color = 'hotpink') + +x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]) +y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]) +plt.scatter(x, y, color = '#88c999') + +plt.show() +``` + +When executed, this will show the following colored scatter plot: + +![Colors in Scatter plot](images/scatter_color.png) + +## Color Each Dot +You can even set a specific color for each dot by using an array of colors as value for the `c` argument: + +**Note:** You cannot use the `color` argument for this, only the `c` argument. + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) +colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","beige","brown","gray","cyan","magenta"]) + +plt.scatter(x, y, c=colors) + +plt.show() +``` + +When executed, this will show the following plot with each dot colored individually: + +![Color Each Dot](images/scatter_coloreachdot.png) + +## ColorMap +The Matplotlib module has a number of available colormaps. + +A colormap is like a list of colors, where each color has a value that ranges from 0 to 100. + +Here is an example of a colormap: + +![ColorMap](images/img_colorbar.png) + +This colormap is called 'viridis' and, as you can see, it ranges from 0, which is a purple color, up to 100, which is a yellow color. + +## How to Use the ColorMap +You can specify the colormap with the keyword argument `cmap` with the value of the colormap, in this case `'viridis'`, which is one of the built-in colormaps available in Matplotlib.
+ +In addition, you have to create an array with values (from 0 to 100), one value for each point in the scatter plot: + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) +colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]) + +plt.scatter(x, y, c=colors, cmap='viridis') + +plt.show() +``` + +When executed, this will show the following scatter colormap: + +![Scatter ColorMap](images/scatter_colormap1.png) + +You can include the colormap in the drawing by including the `plt.colorbar()` statement: + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6]) +y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86]) +colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]) + +plt.scatter(x, y, c=colors, cmap='viridis') + +plt.colorbar() + +plt.show() +``` + +When executed, this will show the following scatter colormap using `plt.colorbar()`: + +![Scatter ColorMap1](images/scatter_colormap2.png) + + + + diff --git a/contrib/plotting-visualization/matplotlib-sub-plot.md b/contrib/plotting-visualization/matplotlib-sub-plot.md new file mode 100644 index 00000000..16c294cc --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-sub-plot.md @@ -0,0 +1,130 @@ +# Subplots in Matplotlib + +### 1. Using `plt.subplots()` + +The `plt.subplots()` function is a versatile and easy way to create a grid of subplots. It returns a figure and an array of Axes objects. + +#### Code Explanation + +1. **Import Libraries**: + ```python + import matplotlib.pyplot as plt + import numpy as np + ``` + +2. **Generate Sample Data**: + ```python + x = np.linspace(0, 10, 100) + y1 = np.sin(x) + y2 = np.cos(x) + y3 = np.tan(x) + ``` + +3. **Create Subplots**: + ```python + fig, axs = plt.subplots(3, 1, figsize=(8, 12)) + ``` + + - `3, 1` indicates a 3-row, 1-column grid. + - `figsize` specifies the overall size of the figure. + +4. **Plot Data**: + ```python + axs[0].plot(x, y1, 'r') + axs[0].set_title('Sine Function') + + axs[1].plot(x, y2, 'g') + axs[1].set_title('Cosine Function') + + axs[2].plot(x, y3, 'b') + axs[2].set_title('Tangent Function') + ``` + +5. **Adjust Layout and Show Plot**: + ```python + plt.tight_layout() + plt.show() + ``` + +#### Result + +The result will be a figure with three vertically stacked subplots. +![subplot Chart](images/subplots.png) + +### 2. Using `plt.subplot()` + +The `plt.subplot()` function allows you to add a single subplot at a time to a figure. + +#### Code Explanation + +1. **Import Libraries and Generate Data** (same as above). + +2. **Create Figure and Subplots**: + ```python + plt.figure(figsize=(8, 12)) + + plt.subplot(3, 1, 1) + plt.plot(x, y1, 'r') + plt.title('Sine Function') + + plt.subplot(3, 1, 2) + plt.plot(x, y2, 'g') + plt.title('Cosine Function') + + plt.subplot(3, 1, 3) + plt.plot(x, y3, 'b') + plt.title('Tangent Function') + ``` + +3. **Adjust Layout and Show Plot** (same as above). + +#### Result + +The result will be similar to the first method but created using individual subplot commands. + +![subplot Chart](images/subplots.png) + +### 3. Using `GridSpec` + +`GridSpec` allows for more complex subplot layouts. + +#### Code Explanation + +1. **Import Libraries and Generate Data** (same as above). + +2. **Create Figure and GridSpec**: + ```python + from matplotlib.gridspec import GridSpec + + fig = plt.figure(figsize=(8, 12)) + gs = GridSpec(3, 1, figure=fig) + ``` + +3. 
**Create Subplots**: + ```python + ax1 = fig.add_subplot(gs[0, 0]) + ax1.plot(x, y1, 'r') + ax1.set_title('Sine Function') + + ax2 = fig.add_subplot(gs[1, 0]) + ax2.plot(x, y2, 'g') + ax2.set_title('Cosine Function') + + ax3 = fig.add_subplot(gs[2, 0]) + ax3.plot(x, y3, 'b') + ax3.set_title('Tangent Function') + ``` + +4. **Adjust Layout and Show Plot** (same as above). + +#### Result + +The result will again be three subplots in a vertical stack, created using the flexible `GridSpec`. + +![subplot Chart](images/subplots.png) + +### Summary + +- **`plt.subplots()`**: Creates an entire grid of subplots in a single call. +- **`plt.subplot()`**: Adds individual subplots to a figure, one at a time. +- **`GridSpec`**: Allows for complex and custom subplot layouts. + +By mastering these techniques, you can create detailed and organized visualizations, enhancing the clarity and comprehension of your data presentations. \ No newline at end of file diff --git a/contrib/plotting-visualization/matplotlib-violin-plots.md b/contrib/plotting-visualization/matplotlib-violin-plots.md new file mode 100644 index 00000000..ef2ec42c --- /dev/null +++ b/contrib/plotting-visualization/matplotlib-violin-plots.md @@ -0,0 +1,277 @@ +# Violin Plots in Matplotlib + +A violin plot is a method of plotting numeric data together with its probability density. It is a combination of a box plot and a kernel density plot, providing a richer visualization of the distribution of the data. In a violin plot, each dataset is represented by a kernel density plot, mirrored and joined together to form a symmetrical shape resembling a violin, hence the name. + +Violin plots are particularly useful when comparing distributions across different categories or groups. They provide insights into the shape, spread, and central tendency of the data, allowing for a more comprehensive understanding than traditional box plots. + +Compared to box plots, violin plots offer a more detailed representation of the distribution: they combine summary statistics with kernel density plots, handle unequal sample sizes effectively, allow easy comparison across groups, and make it easier to identify multiple modes. + +![Violin plot](images/violen-plots1.webp) + +## Prerequisites + +Before creating violin plots in matplotlib you must ensure that you have Python as well as Matplotlib installed on your system. + +## Creating a simple Violin Plot with `violinplot()` method + +A basic violin plot can be created with the `violinplot()` method in `matplotlib.pyplot`. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] + +# Creating Plot +plt.violinplot(data) + +# Show plot +plt.show() +``` + +When executed, this would show the following violin plot: + + +![Basic violin plot](images/violinplotnocolor.png) + + +The `violinplot()` function in `matplotlib.pyplot` creates a violin plot, which is a graphical representation of the distribution of data across different levels of a categorical variable. Here's a breakdown of its usage: + +```Python +plt.violinplot(data, showmeans=False, showextrema=False) +``` + +- `data`: This parameter represents the dataset used to create the violin plot. It can be a single array or a sequence of arrays. + +- `showmeans`: This optional parameter, if set to True, displays the mean value as a line on the violin plot. Default is False. + +- `showextrema`: This optional parameter, if set to True, displays the minimum and maximum values as lines on the violin plot. 
Default is True. + +Additional parameters can be used to further customize the appearance of the violin plot, such as setting custom colors, adding labels, and adjusting the orientation. For instance: + +```Python +plt.violinplot(data, showmedians=True, showmeans=True, showextrema=True, vert=False, widths=0.9, bw_method=0.5) +``` +- `showmedians`: Setting this parameter to True displays the median value as a line on the violin plot. Default is False. + +- `vert`: This parameter determines the orientation of the violin plot. Setting it to False creates a horizontal violin plot. Default is True. + +- `widths`: This parameter sets the width of the violins. Default is 0.5. + +- `bw_method`: This parameter determines the method used to calculate the kernel bandwidth for the kernel density estimation. Default is None, which uses Scott's rule. + +Using these parameters, you can customize the violin plot according to your requirements, enhancing its readability and visual appeal. + + +## Customizing Violin Plots in Matplotlib + +When customizing violin plots in Matplotlib, using `matplotlib.pyplot.subplots()` provides greater flexibility for applying customizations. + +### Coloring Violin Plots + +You can assign custom colors to the violins by setting the face color of each violin body returned by the `violinplot()` method. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange'] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() + +# Customizing colors of violins +for i in range(len(data)): + parts = ax.violinplot(data[i], positions=[i], vert=False, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5) + for pc in parts['bodies']: + pc.set_facecolor(colors[i]) + +# Show plot +plt.show() +``` +This code snippet creates a violin plot with custom colors assigned to each violin, enhancing the visual appeal and clarity of the plot. + + +![Coloring violin](images/violenplotnormal.png) + + +When customizing violin plots using `matplotlib.pyplot.subplots()`, you obtain a `Figure` object `fig` and an `Axes` object `ax`, allowing for extensive customization. Each violin plot consists of various components, including the violin body, lines representing the median and quartiles, and potential markers for the mean and outliers. You can customize these components using the appropriate methods and attributes of the Axes object.
+ +Here's an example of how to customize violin plots: + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange'] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() + +# Creating violin plots +parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5) + +# Customizing colors of violins +for i, pc in enumerate(parts['bodies']): + pc.set_facecolor(colors[i]) + +# Customizing the median lines ('cmedians' is a LineCollection) +parts['cmedians'].set_color('black') +parts['cmedians'].set_linestyle('--') +parts['cmedians'].set_linewidth(2) + +# Adding mean markers (violins are drawn at positions 1, 2, ...) +for i, d in enumerate(data): + ax.scatter(i + 1, np.mean(d), marker='o', color='black', zorder=3) + +# Customizing axes labels +ax.set_xlabel('X Label') +ax.set_ylabel('Y Label') + +# Adding title +ax.set_title('Customized Violin Plot') + +# Show plot +plt.show() +``` + +![Customizing violin](images/violin-plot4.png) + +In this example, we customize various components of the violin plot, such as colors, line styles, and markers, to enhance its visual appeal and clarity. Additionally, we modify the axes labels and add a title to provide context to the plot. + +### Adding Hatching to Violin Plots + +You can add hatching patterns to the violin plots to enhance their visual distinction. This can be achieved by calling `set_hatch()` on each violin body. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange'] +hatches = ['/', '\\', '|', '-'] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() + +# Creating violin plots with hatching +parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5) + +for i, pc in enumerate(parts['bodies']): + pc.set_facecolor(colors[i]) + pc.set_hatch(hatches[i]) + +# Show plot +plt.show() +``` + +![violin_hatching](images/violin-hatching.png) + + + +### Labeling Violin Plots + +You can add labels to violin plots to provide additional information about the data. This can be achieved by setting a label on each violin body and adding a legend. + +An example is shown here: + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] +labels = ['Group {}'.format(i) for i in range(1, 5)] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() + +# Creating violin plots +parts = ax.violinplot(data, showmeans=False, showextrema=False, showmedians=True, widths=0.9, bw_method=0.5) + +# Adding labels to violin plots +for i, label in enumerate(labels): + parts['bodies'][i].set_label(label) + +# Show plot +plt.legend() +plt.show() +``` +![violin_labeling](images/violin-labelling.png) + +In this example, each violin plot is labeled according to its group, providing context to the viewer. +These customizations can be combined and further refined to create violin plots that effectively convey the underlying data distributions, as the combined sketch below illustrates.
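+ +Here is a minimal sketch (reusing the colors, hatch patterns, and group labels from the examples above) of how these customizations can be applied together in a single plot: + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Creating dataset +data = [np.random.normal(0, std, 100) for std in range(1, 5)] +colors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange'] +hatches = ['/', '\\', '|', '-'] +labels = ['Group {}'.format(i) for i in range(1, 5)] + +# Creating plot using matplotlib.pyplot.subplots() +fig, ax = plt.subplots() +parts = ax.violinplot(data, showmedians=True) + +# Combining color, hatching, and legend labels on each violin body +for i, pc in enumerate(parts['bodies']): + pc.set_facecolor(colors[i]) + pc.set_hatch(hatches[i]) + pc.set_label(labels[i]) + +# Show plot +plt.legend() +plt.show() +```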
+ +### Stacked Violin Plots + +Stacked violin plots are useful when you want to compare the distribution of a single variable across different categories or groups. The violins for each category or group are drawn together on a shared axis, allowing for easy visual comparison. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Generating sample data +np.random.seed(0) +data1 = np.random.normal(0, 1, 100) +data2 = np.random.normal(2, 1, 100) +data3 = np.random.normal(1, 1, 100) + +# Creating a stacked violin plot +plt.violinplot([data1, data2, data3], showmedians=True) + +# Adding labels to x-axis ticks +plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3']) + +# Adding title and labels +plt.title('Stacked Violin Plot') +plt.xlabel('Groups') +plt.ylabel('Values') + +# Displaying the plot +plt.show() +``` +![stacked violin plots](images/stacked_violin_plots.png) + + +### Split Violin Plots + +Split violin plots are effective for comparing the distribution of a single variable across two different categories or groups. In a split violin plot, each violin is split into two halves, one per category; in the simplified example below, the two distributions are drawn next to each other for direct comparison. + +```Python +import matplotlib.pyplot as plt +import numpy as np + +# Generating sample data +np.random.seed(0) +data_male = np.random.normal(0, 1, 100) +data_female = np.random.normal(2, 1, 100) + +# Creating a split violin plot +plt.violinplot([data_male, data_female], showmedians=True) + +# Adding labels to x-axis ticks +plt.xticks([1, 2], ['Male', 'Female']) + +# Adding title and labels +plt.title('Split Violin Plot') +plt.xlabel('Gender') +plt.ylabel('Values') + +# Displaying the plot +plt.show() +``` + +![Split violin plot](images/split-violin-plot.png) + +In both examples, we use Matplotlib's `violinplot()` function to create the violin plots. These features provide additional flexibility and insights when analyzing data distributions across different groups or categories. + diff --git a/contrib/plotting-visualization/plotly-bar-plots.md b/contrib/plotting-visualization/plotly-bar-plots.md new file mode 100644 index 00000000..5f2159a8 --- /dev/null +++ b/contrib/plotting-visualization/plotly-bar-plots.md @@ -0,0 +1,348 @@ +# Bar Plots in Plotly + +A bar plot or a bar chart is a type of data visualisation that represents data in the form of rectangular bars, with lengths or heights proportional to the values they represent. Bar plots can be drawn both vertically and horizontally. + +It is one of the most widely used types of data visualisation as it is easy to interpret and is pleasing to the eyes. + +Plotly is a very powerful library for creating modern visualizations and it provides a very easy and intuitive method to create highly customized bar plots. + +## Prerequisites + +Before creating bar plots in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system. + +## Introduction + +There are various ways to create bar plots in `plotly`. One of the most prominent and easiest ones is using `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. On the other hand, you can also use `plotly.graph_objects` to create various plots. + +Here, we'll be using `plotly.express` to create the bar plots. Also, we'll be converting our datasets into pandas DataFrames, which makes it extremely convenient to create plots, as the small sketch below shows.
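+ +For example, here is a minimal sketch of the dictionary-to-DataFrame conversion pattern used throughout the examples that follow: + +```Python +import pandas as pd + +# Converting a dictionary of columns into a pandas DataFrame +dataset = {"Years": ['1998', '1999', '2000'], "Number of Cars sold": [200, 300, 500]} +df = pd.DataFrame(dataset) + +print(df) +```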
+ +Also, note that when you execute the code in a plain Python file, the output plot will be shown in your **browser**, rather than in a pop-up window like in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like jupyter)**. For this, install an additional library `nbformat`. This way you can see the output on the notebook itself, and can also render it to png, jpg, etc. + +## Creating a simple bar plot using `plotly.express.bar` + +With `plotly.express.bar`, each row of the DataFrame is represented as a rectangular mark. + +```Python +import plotly.express as px +import pandas as pd + +# Creating dataset +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] + +# Converting dataset to pandas DataFrame +dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold} +df = pd.DataFrame(dataset) + +# Creating bar plot +fig = px.bar(df, x='Years', y='Number of Cars sold') + +# Showing plot +fig.show() +``` +![Basic Bar Plot](images/plotly-basic-bar-plot.png) + +Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys becoming the DataFrame columns. Next, we are plotting the bar chart by using `px.bar`. In the `x` and `y` parameters, we have to specify a column name in the DataFrame. + +**Note:** When you generate the image using the above code, it will show you an **interactive plot**; if you want a static image, you can download it from the plot itself. + +## Customizing Bar Plots + +### Adding title to the graph + +Let us create an imaginary graph of the number of cars sold in various years. Simply pass the title of your graph as a parameter in `px.bar`. + +```Python +import plotly.express as px +import pandas as pd + +# Creating dataset +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] + +# Converting dataset to pandas DataFrame +dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold} +df = pd.DataFrame(dataset) + +# Creating bar plot +fig = px.bar(df, x='Years', y='Number of Cars sold', + title='Number of cars sold in various years') + +# Showing plot +fig.show() +``` +![Bar Plot Title](images/plotly-bar-title.png) + +### Adding bar colors and legends + +To add different colors to different bars, simply pass the column name of the x-axis, or a custom column which groups different bars, to the `color` parameter. + +```Python +import plotly.express as px +import pandas as pd + +# Creating dataset +years = ['1998', '1999', '2000', '2001', '2002'] +num_of_cars_sold = [200, 300, 500, 700, 1000] + +# Converting dataset to pandas DataFrame +dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold} +df = pd.DataFrame(dataset) + +# Creating bar plot +fig = px.bar(df, x='Years', y='Number of Cars sold', + title='Number of cars sold in various years', + color='Years') + +# Showing plot +fig.show() +``` +![Bar Colors Basic](images/plotly-bar-colors-basic.png) + +Now, let us consider our previous example of the number of cars sold in various years and suppose that we want to add different colors to the bars from different centuries and respective legends for better interpretation. + +The easiest way to achieve this is to add a new column to the DataFrame and then pass it to the `color` parameter.
+
+Now, let us consider our previous example of the number of cars sold in various years, and suppose that we want to add different colors to the bars from different centuries, with respective legends for better interpretation.
+
+The easiest way to achieve this is to add a new column to the dataframe and then pass it to the `color` parameter.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+# Creating the relevant colors dataset
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Years', y='Number of Cars sold',
+             title='Number of cars sold in various years',
+             color='Century')
+
+# Showing plot
+fig.show()
+```
+![Bar Colors](images/plotly-bar-colors.png)
+
+### Adding labels to bars
+
+We may want to add labels to bars representing their absolute (or truncated) values for instant and accurate reading. This can be achieved by setting the `text_auto` parameter to `True`. If you want custom text, you can pass a column name to the `text` parameter.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Years', y='Number of Cars sold',
+             title='Number of cars sold in various years',
+             color='Century',
+             text_auto=True)
+
+# Showing plot
+fig.show()
+```
+![Bar Labels-1](images/plotly-bar-labels-1.png)
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Years', y='Number of Cars sold',
+             title='Number of cars sold in various years',
+             color='Century',
+             text='Century')
+
+# Showing plot
+fig.show()
+```
+![Bar Labels-2](images/plotly-bar-labels-2.png)
+
+You can also change the text properties (or any other element of your plot) using `fig.update_traces`.
+
+Here, we are moving the text outside the bars.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Years', y='Number of Cars sold',
+             title='Number of cars sold in various years',
+             color='Century',
+             text_auto=True)
+
+# Updating bar text properties
+fig.update_traces(textposition="outside", cliponaxis=False)
+
+# Showing plot
+fig.show()
+```
+![Bar Labels-3](images/plotly-bar-labels-3.png)
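+
+Beyond position, you can also format the label text itself. A small sketch (an assumption: it reuses the figure built just above) using the trace-level `texttemplate` attribute, where `%{y}` refers to each bar's y value:
+
+```Python
+# Formatting the bar labels with a template string
+fig.update_traces(texttemplate='%{y} cars', textposition="outside", cliponaxis=False)
+fig.show()
+```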
+
+### Rounded Bars
+
+You can create rounded bars by specifying a radius value for `barcornerradius` in `fig.update_layout`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Years', y='Number of Cars sold',
+             title='Number of cars sold in various years',
+             color='Century',
+             text_auto=True)
+
+# Updating bar text properties
+fig.update_traces(textposition="outside", cliponaxis=False)
+
+# Updating figure layout
+fig.update_layout({
+'plot_bgcolor': 'rgba(255, 255, 255, 1)',
+'paper_bgcolor': 'rgba(255, 255, 255, 1)',
+'barcornerradius': 15
+})
+
+# Showing plot
+fig.show()
+```
+
+![Rounded Bars](images/plotly-rounded-bars.png)
+
+## Horizontal Bar Plot
+
+To create a horizontal bar plot, you just have to interchange your `x` and `y` DataFrame columns. Plotly takes care of the rest!
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+colors = ['1900s','1900s','2000s','2000s','2000s']
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold, "Century":colors}
+df = pd.DataFrame(dataset)
+
+# Creating bar plot
+fig = px.bar(df, x='Number of Cars sold', y='Years',
+             title='Number of cars sold in various years',
+             color='Century',
+             text_auto=True)
+
+# Updating bar text properties
+fig.update_traces(textposition="outside", cliponaxis=False)
+
+# Updating figure layout
+fig.update_layout({
+'barcornerradius': 30
+})
+
+# Showing plot
+fig.show()
+```
+![Horizontal Bar Plot](images/plotly-horizontal-bar-plot.png)
+
+## Plotting Long Format and Wide Format Data
+
+Long-form data has one row per observation, and one column per variable. This is suitable for storing and displaying multivariate data, i.e., with dimension greater than 2.
+
+```Python
+# Plotting long format data
+
+import plotly.express as px
+
+# Long format dataset
+long_df = px.data.medals_long()
+
+# Creating Bar Plot
+fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input")
+
+# Showing Plot
+fig.show()
+```
+![Long format bar plot](images/plotly-long-format-bar-plot.png)
+
+```Python
+print(long_df)
+
+# Output
+         nation   medal  count
+0   South Korea    gold     24
+1         China    gold     10
+2        Canada    gold      9
+3   South Korea  silver     13
+4         China  silver     15
+5        Canada  silver     12
+6   South Korea  bronze     11
+7         China  bronze      8
+8        Canada  bronze     12
+```
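+
+By default, `px.bar` stacks the bars of the different color groups on top of each other. If you'd rather compare them side by side, one option (a quick sketch reusing `long_df` from above) is to switch the layout's `barmode`:
+
+```Python
+# Grouped instead of stacked bars for the long-format example
+fig = px.bar(long_df, x="nation", y="count", color="medal",
+             barmode="group", title="Long-Form Input, Grouped")
+fig.show()
+```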
+
+Wide-form data has one row per value of the first variable, and one column per value of the second variable. This is suitable for storing and displaying 2-dimensional data.
+
+```Python
+# Plotting wide format data
+import plotly.express as px
+
+# Wide format dataset
+wide_df = px.data.medals_wide()
+
+# Creating Bar Plot
+fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input")
+
+# Showing Plot
+fig.show()
+```
+![Wide format bar plot](images/plotly-wide-format-bar-plot.png)
+
+```Python
+print(wide_df)
+
+# Output
+        nation  gold  silver  bronze
+0  South Korea    24      13      11
+1        China    10      15       8
+2       Canada     9      12      12
+```
\ No newline at end of file
diff --git a/contrib/plotting-visualization/plotly-line-charts.md b/contrib/plotting-visualization/plotly-line-charts.md
new file mode 100644
index 00000000..35a2bea3
--- /dev/null
+++ b/contrib/plotting-visualization/plotly-line-charts.md
@@ -0,0 +1,300 @@
+# Line Charts in Plotly
+
+A line chart displays information as a series of data points connected by straight line segments. It represents the change in a quantity with respect to another quantity and helps us to see trends and patterns over time or across categories. It is a basic type of chart common in many fields. For example, it is used to represent the price of stocks with respect to time, among many others.
+
+It is one of the most widely used types of data visualisation as it is easy to interpret and is pleasing to the eyes.
+
+Plotly is a very powerful library for creating modern visualizations and it provides a very easy and intuitive method to create highly customized line charts.
+
+## Prerequisites
+
+Before creating line charts in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system.
+
+## Introduction
+
+There are various ways to create line charts in `plotly`. One of the most prominent and easiest ways is using `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. On the other hand, you can also use `plotly.graph_objects` to create various plots.
+
+Here, we'll be using `plotly.express` to create the line charts. Also, we'll be converting our datasets into pandas DataFrames, which makes it extremely convenient to create plots.
+
+Also, note that when you execute the code in a simple Python file, the output plot will be shown in your **browser**, rather than in a pop-up window like in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like jupyter)**. For this, install an additional library `nbformat`. This way you can see the output on the notebook itself, and can also export it as png, jpg, etc.
+
+## Creating a simple line chart using `plotly.express.line`
+
+With `plotly.express.line`, each data point is represented as a vertex (whose location is given by the `x` and `y` columns) of a polyline mark in 2D space.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold')
+
+# Showing plot
+fig.show()
+```
+
+![Basic Line Chart](images/plotly-basic-line-chart.png)
+
+Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys being the DataFrame columns. Next, we are plotting the line chart by using `px.line`.
In the `x` and `y` parameters, we have to specify a column name in the DataFrame.
+
+**Note:** When you run the above code, it will show you an **interactive plot**; if you want a static image, you can download it from the plot itself.
+
+## Customizing Line Charts
+
+### Adding title to the chart
+
+Simply pass the title of your graph as a parameter in `px.line`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+# Showing plot
+fig.show()
+```
+
+![Line Chart Title](images/plotly-line-title.png)
+
+### Adding Markers to the lines
+
+The `markers` argument can be set to `True` to show markers on the lines.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years',
+              markers=True)
+
+# Showing plot
+fig.show()
+```
+
+![Line Markers](images/plotly-line-markers.png)
+
+### Dashed Lines
+
+You can plot dashed lines by changing the `dash` property of `line` to `dash` or `longdash` and passing it as a dictionary to the `patch` parameter in `fig.update_traces`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+fig.update_traces(patch={"line": {"dash": 'dash'}})
+
+# Showing plot
+fig.show()
+```
+
+![Dashed Line](images/plotly-line-dashed.png)
+
+### Dotted Lines
+
+You can plot dotted lines by changing the `dash` property of `line` to `dot` and passing it as a dictionary to the `patch` parameter in `fig.update_traces`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+fig.update_traces(patch={"line": {"dash": 'dot'}})
+
+# Showing plot
+fig.show()
+```
+
+![Dotted Line](images/plotly-line-dotted.png)
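+
+Besides the named styles, the `dash` property also accepts a custom dash length list written as a string, e.g. `'5px,10px,2px,2px'`. A quick sketch, reusing the figure from above:
+
+```Python
+# Custom dash pattern: 5px dash, 10px gap, 2px dash, 2px gap
+fig.update_traces(patch={"line": {"dash": "5px,10px,2px,2px"}})
+fig.show()
+```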
+
+### Dashed and Dotted Lines
+
+You can plot dash-dot lines by changing the `dash` property of `line` to `dashdot` and passing it as a dictionary to the `patch` parameter in `fig.update_traces`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+fig.update_traces(patch={"line": {"dash": 'dashdot'}})
+
+# Showing plot
+fig.show()
+```
+
+![Dash-Dot Line](images/plotly-line-dasheddotted.png)
+
+### Changing line colors
+
+You can set a custom line color by changing the `color` property of `line` to your color and passing it as a dictionary to the `patch` parameter in `fig.update_traces`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+fig.update_traces(patch={"line": {"color": 'red'}})
+
+# Showing plot
+fig.show()
+```
+
+![Colored Line](images/plotly-line-color.png)
+
+### Changing line width
+
+You can set a custom line width by changing the `width` property of `line` and passing it as a dictionary to the `patch` parameter in `fig.update_traces`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years')
+
+fig.update_traces(patch={"line": {"width": 7}})
+
+# Showing plot
+fig.show()
+```
+
+![Width Line](images/plotly-line-width.png)
+
+### Labeling Data Points
+
+You can label your data points by passing the relevant column name of your DataFrame to the `text` parameter in `px.line`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years":years, "Number of Cars sold":num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x='Years', y='Number of Cars sold',
+              title='Number of cars sold in various years',
+              text='Number of Cars sold')
+
+fig.update_traces(textposition="bottom right")
+
+# Showing plot
+fig.show()
+```
+
+![Data Point Labelling](images/plotly-line-datapoint-label.png)
+
+## Plotting multiple lines
+
+There are several ways to plot multiple lines in plotly, like using `plotly.graph_objects`, using `fig.add_scatter`, having multiple columns in the DataFrame, etc.
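+
+For instance, a minimal sketch of the `fig.add_scatter` approach (the data and trace name here are just placeholders):
+
+```Python
+import plotly.express as px
+
+# Start from a single-line figure, then add a second trace to it
+fig = px.line(x=[1, 2, 3], y=[2, 4, 8])
+fig.add_scatter(x=[1, 2, 3], y=[1, 3, 9], mode='lines', name='Second line')
+fig.show()
+```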
+
+Here, we'll be creating a simple dataset of the runs scored by the end of each over by India and South Africa in the recent T20 World Cup final, and plot it using plotly.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+overs = list(range(0,21))
+runs_india = [0,15,23,26,32,39,45,49,59,68,75,82,93,98,108,118,126,134,150,167,176]
+runs_rsa = [0,6,11,14,22,32,42,49,62,71,81,93,101,109,123,147,151,155,157,161,169]
+
+# Converting dataset to pandas DataFrame
+dataset = {"overs":overs, "India":runs_india, "South Africa":runs_rsa}
+df = pd.DataFrame(dataset)
+
+# Creating line chart
+fig = px.line(df, x="overs", y=["India", "South Africa"])
+fig.update_layout(xaxis_title="Overs", yaxis_title="Runs", legend_title=None)
+
+# Showing plot
+fig.show()
+```
+
+![Multiple Lines](images/plotly-line-multiple-lines.png)
+
+To plot multiple lines, we have passed multiple columns of the DataFrame in the `y` parameter.
diff --git a/contrib/plotting-visualization/plotly-pie-charts.md b/contrib/plotting-visualization/plotly-pie-charts.md
new file mode 100644
index 00000000..2f788096
--- /dev/null
+++ b/contrib/plotting-visualization/plotly-pie-charts.md
@@ -0,0 +1,221 @@
+# Pie Charts in Plotly
+
+A pie chart is a type of graph that represents data as a circular graph. The slices of the pie show the relative size of the data; it is a type of pictorial representation of data. A pie chart requires a list of categorical variables and the corresponding numerical variables. Here, the term "pie" represents the whole, and the "slices" represent the parts of the whole.
+
+Pie charts are commonly used in business presentations like sales, operations, survey results, resources, etc. as they are pleasing to the eye and provide a quick summary.
+
+Plotly is a very powerful library for creating modern visualizations and it provides a very easy and intuitive method to create highly customized pie charts.
+
+## Prerequisites
+
+Before creating pie charts in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system.
+
+## Introduction
+
+There are various ways to create pie charts in `plotly`. One of the most prominent and easiest ways is using `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. On the other hand, you can also use `plotly.graph_objects` to create various plots.
+
+Here, we'll be using `plotly.express` to create the pie charts. Also, we'll be converting our datasets into pandas DataFrames, which makes it extremely convenient and easy to create charts.
+
+Also, note that when you execute the code in a simple Python file, the output plot will be shown in your **browser**, rather than in a pop-up window like in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like jupyter)**. For this, install an additional library `nbformat`. This way you can see the output on the notebook itself, and can also export it as png, jpg, etc.
+
+## Creating a simple pie chart using `plotly.express.pie`
+
+In `plotly.express.pie`, the data visualized by the sectors of the pie is set in `values`. The sector labels are set in `names`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers')
+
+# Showing plot
+fig.show()
+```
+![Basic Pie Chart](images/plotly-basic-pie-chart.png)
+
+Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys being the DataFrame columns. Next, we are plotting the pie chart by using `px.pie`. In the `values` and `names` parameters, we have to specify a column name in the DataFrame.
+
+`px.pie(df, values='petals', names='flowers')` is used to specify that the pie chart is to be plotted by taking the values from the column `petals`, so that the fractional area of each slice is **petal/sum(petals)**. The column `flowers` provides the labels of the slices corresponding to each value in `petals`.
+
+**Note:** When you run the above code, it will show you an **interactive plot**; if you want a static image, you can download it from the plot itself.
+
+## Customizing Pie Charts
+
+### Adding title to the chart
+
+Simply pass the title of your chart as a parameter in `px.pie`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers')
+
+# Showing plot
+fig.show()
+```
+![Title in Pie Chart](images/plotly-pie-title.png)
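+
+A related customization: passing the `hole` parameter to `px.pie` turns the pie into a donut chart. A quick sketch with the same data:
+
+```Python
+# Donut chart: `hole` sets the fraction of the radius to cut out
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers', hole=0.4)
+fig.show()
+```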
+
+### Coloring Slices
+
+There are a lot of beautiful color scales available in plotly; they can be found at [plotly color scales](https://plotly.com/python/builtin-colorscales/).
+Choose your favourite colorscale and apply it like this:
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers',
+             color_discrete_sequence=px.colors.sequential.Agsunset)
+
+# Showing plot
+fig.show()
+```
+![Pie Chart Colors-1](images/plotly-pie-color-1.png)
+
+You can also set custom colors for each label by passing them as a dictionary (map) to `color_discrete_map`, like this:
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers',
+             color='flowers',
+             color_discrete_map={'Rose':'red',
+                                 'Tulip':'magenta',
+                                 'Marigold':'green',
+                                 'Sunflower':'yellow',
+                                 'Daffodil':'royalblue'})
+
+# Showing plot
+fig.show()
+```
+![Pie Chart Colors-2](images/plotly-pie-color-2.png)
+
+### Labeling Slices
+
+You can use `fig.update_traces` to control the properties of the text displayed on your figure. For example, if we want the flower name, petal count, and percentage in our slices, we can do it like this:
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers')
+
+# Updating text properties
+fig.update_traces(textposition='inside', textinfo='label+value+percent')
+
+# Showing plot
+fig.show()
+```
+![Pie Labels](images/plotly-pie-labels.png)
+
+### Pulling out a slice
+
+To pull out a slice, pass an array to the `pull` parameter in `fig.update_traces`, with one entry per slice giving the amount to be pulled.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers')
+
+# Updating text properties
+fig.update_traces(textposition='inside', textinfo='label+value')
+
+# Pulling out slice
+fig.update_traces(pull=[0,0,0,0.2,0])
+
+# Showing plot
+fig.show()
+```
+![Slice Pull](images/plotly-pie-pull.png)
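+
+You can also rotate the pie so that the first slice starts at a different angle. A small sketch (reusing the figure above) with the trace-level `rotation` attribute, in degrees:
+
+```Python
+# Rotate the pie 90 degrees from the default 12 o'clock start
+fig.update_traces(rotation=90)
+fig.show()
+```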
+
+### Pattern Fills
+
+You can also add patterns (hatches), in addition to colors, in pie charts.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+flowers = ['Rose','Tulip','Marigold','Sunflower','Daffodil']
+petals = [11,9,17,4,7]
+
+# Converting dataset to pandas DataFrame
+dataset = {'flowers':flowers, 'petals':petals}
+df = pd.DataFrame(dataset)
+
+# Creating pie chart
+fig = px.pie(df, values='petals', names='flowers',
+             title='Number of Petals in Flowers')
+
+# Updating text properties
+fig.update_traces(textposition='outside', textinfo='label+value')
+
+# Adding pattern fills
+fig.update_traces(marker=dict(pattern=dict(shape=[".", "/", "+", "-","+"])))
+
+# Showing plot
+fig.show()
+```
+![Pattern Fills Pie Chart](images/plotly-pie-patterns.png)
\ No newline at end of file
diff --git a/contrib/plotting-visualization/plotly-scatter-plots.md b/contrib/plotting-visualization/plotly-scatter-plots.md
new file mode 100644
index 00000000..dc40b1f7
--- /dev/null
+++ b/contrib/plotting-visualization/plotly-scatter-plots.md
@@ -0,0 +1,198 @@
+# Scatter Plots in Plotly
+
+* A scatter plot is a type of data visualization that uses dots to show values for two variables, with one variable on the x-axis and the other on the y-axis. It's useful for identifying relationships, trends, and correlations, as well as spotting clusters and outliers.
+* The dots on the plot show how the variables are related. A scatter plot is made with the plotly library's `px.scatter()`.
+
+## Prerequisites
+
+Before creating scatter plots in Plotly you must ensure that you have Python, Plotly and Pandas installed on your system.
+
+## Introduction
+
+There are various ways to create scatter plots in `plotly`. One of the most prominent and easiest ways is using `plotly.express`. Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures. On the other hand, you can also use `plotly.graph_objects` to create various plots.
+
+Here, we'll be using `plotly.express` to create the scatter plots. Also, we'll be converting our datasets into pandas DataFrames, which makes it extremely convenient and easy to create charts.
+
+Also, note that when you execute the code in a simple Python file, the output plot will be shown in your **browser**, rather than in a pop-up window like in matplotlib. If you do not want that, it is **recommended to create the plots in a notebook (like jupyter)**. For this, install an additional library `nbformat`. This way you can see the output on the notebook itself, and can also export it as png, jpg, etc.
+
+## Creating a simple Scatter Plot using `plotly.express.scatter`
+
+In `plotly.express.scatter`, each data point is represented as a marker point, whose location is given by the `x` and `y` columns.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years', y='Number of Cars sold')
+
+# Showing plot
+fig.show()
+```
+![Basic Scatter Plot](images/plotly-basic-scatter-plot.png)
+
+Here, we are first creating the dataset and converting it into a pandas DataFrame using a dictionary, with its keys being DataFrame columns. Next, we are plotting the scatter plot by using `px.scatter`. In the `x` and `y` parameters, we have to specify a column name in the DataFrame.
+
+`px.scatter(df, x='Years', y='Number of Cars sold')` is used to specify that the scatter plot is to be plotted by taking the values from column `Years` for the x-axis and the values from column `Number of Cars sold` for the y-axis.
+
+Note: When you generate the image using the above code, it will show you an interactive plot. If you want an image, you can download it from the interactive plot itself.
+
+## Customizing Scatter Plots
+
+### Adding title to the plot
+
+Simply pass the title of your plot as a parameter in `px.scatter`.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years', y='Number of Cars sold', title='Number of cars sold in various years')
+
+# Showing plot
+fig.show()
+```
+![Scatter Plot title](images/plotly-scatter-title.png)
+
+### Adding colors and legends
+
+* To add different colors to different points, simply pass the column name of the x-axis, or a custom column which groups the points, to the `color` parameter.
+* There are a lot of beautiful color scales available in plotly; they can be found at [plotly color scales](https://plotly.com/python/builtin-colorscales/). Choose your favourite colorscale and apply it like this:
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years', y='Number of Cars sold',
+                 title='Number of cars sold in various years',
+                 color='Years',
+                 color_discrete_sequence=px.colors.sequential.Agsunset)
+
+# Showing plot
+fig.show()
+```
+![Scatter Plot Colors-1](images/plotly-scatter-colour.png)
+
+You can also set custom colors for each label by passing them as a dictionary (map) to `color_discrete_map`, like this:
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years',
+                 y='Number of Cars sold',
+                 title='Number of cars sold in various years',
+                 color='Years',
+                 color_discrete_map={'1998':'red',
+                                     '1999':'magenta',
+                                     '2000':'green',
+                                     '2001':'yellow',
+                                     '2002':'royalblue'})
+
+# Showing plot
+fig.show()
+```
+![Scatter Plot Colors-2](images/plotly-scatter-colour-2.png)
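+
+Another common customization is overlaying a fitted trend line via the `trendline` parameter of `px.scatter`. A sketch under two assumptions: the optional `statsmodels` package is installed, and the x values are converted to numbers (trend lines need a numeric or datetime axis):
+
+```Python
+# Trend lines require numeric x values, so convert the year strings first
+df['Years'] = df['Years'].astype(int)
+
+# Scatter plot with an ordinary-least-squares trend line
+fig = px.scatter(df, x='Years', y='Number of Cars sold',
+                 title='Number of cars sold in various years',
+                 trendline='ols')
+fig.show()
+```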
+
+### Setting Size of Scatter
+
+We may want to set different sizes for the markers, for example to reflect differences between categories. This can be done by using the `size` parameter in `px.scatter`, where we specify a column in the DataFrame that determines the size of each scatter point.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years',
+                 y='Number of Cars sold',
+                 title='Number of cars sold in various years',
+                 color='Years',
+                 color_discrete_map={'1998':'red',
+                                     '1999':'magenta',
+                                     '2000':'green',
+                                     '2001':'yellow',
+                                     '2002':'royalblue'},
+                 size='Number of Cars sold')
+
+# Showing plot
+fig.show()
+```
+![Scatter plot size](images/plotly-scatter-size.png)
+
+### Giving a hover effect
+
+You can use the `hover_name` and `hover_data` parameters in `px.scatter`. The `hover_name` parameter specifies the column to use for the hover text, and the `hover_data` parameter allows you to specify additional data to display when hovering over a point.
+
+```Python
+import plotly.express as px
+import pandas as pd
+
+# Creating dataset
+years = ['1998', '1999', '2000', '2001', '2002']
+num_of_cars_sold = [200, 300, 500, 700, 1000]
+
+# Converting dataset to pandas DataFrame
+dataset = {"Years": years, "Number of Cars sold": num_of_cars_sold}
+df = pd.DataFrame(dataset)
+
+# Creating scatter plot
+fig = px.scatter(df, x='Years',
+                 y='Number of Cars sold',
+                 title='Number of cars sold in various years',
+                 color='Years',
+                 color_discrete_map={'1998':'red',
+                                     '1999':'magenta',
+                                     '2000':'green',
+                                     '2001':'yellow',
+                                     '2002':'royalblue'},
+                 size='Number of Cars sold',
+                 hover_name='Years',
+                 hover_data={'Number of Cars sold': True})
+
+# Showing plot
+fig.show()
+```
+![Scatter Hover](images/plotly-scatter-hover.png)
+
diff --git a/contrib/plotting-visualization/seaborn-basics.md b/contrib/plotting-visualization/seaborn-basics.md
new file mode 100644
index 00000000..42df5522
--- /dev/null
+++ b/contrib/plotting-visualization/seaborn-basics.md
@@ -0,0 +1,39 @@
+Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
+
+Here’s an example of what seaborn can do:
+```Python
+# Import seaborn
+import seaborn as sns
+
+# Apply the default theme
+sns.set_theme()
+
+# Load an example dataset
+tips = sns.load_dataset("tips")
+
+# Create a visualization
+sns.relplot(
+    data=tips,
+    x="total_bill", y="tip", col="time",
+    hue="smoker", style="smoker", size="size",
+)
+```
+Below is the output for the above code snippet:
+
+![Seaborn intro image](images/seaborn-basics1.png)
+
+```Python
+# Load an example dataset
+tips = sns.load_dataset("tips")
+```
+Most code in the docs will use the `load_dataset()` function to get quick access to an example dataset. There’s nothing special about these datasets: they are just pandas data frames, and we could have loaded them with `pandas.read_csv()` or built them by hand. Many users specify data using pandas data frames, but Seaborn is very flexible about the data structures that it accepts.
+
+```Python
+# Create a visualization
+sns.relplot(
+    data=tips,
+    x="total_bill", y="tip", col="time",
+    hue="smoker", style="smoker", size="size",
+)
+```
+This plot shows the relationship between five variables in the tips dataset using a single call to the seaborn function `relplot()`. Notice how only the names of the variables and their roles in the plot are provided. Unlike when using matplotlib directly, it wasn’t necessary to specify attributes of the plot elements in terms of the color values or marker codes. Behind the scenes, seaborn handled the translation from values in the dataframe to arguments that Matplotlib understands. This declarative approach lets you stay focused on the questions that you want to answer, rather than on the details of how to control matplotlib.
diff --git a/contrib/plotting-visualization/seaborn-intro.md b/contrib/plotting-visualization/seaborn-intro.md
new file mode 100644
index 00000000..6e0a7d82
--- /dev/null
+++ b/contrib/plotting-visualization/seaborn-intro.md
@@ -0,0 +1,41 @@
+Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
+
+## Seaborn Installation
+Before installing Seaborn, ensure you have Python installed on your system. You can download and install Python from the [official Python website](https://www.python.org/).
+
+Below are the steps to install and set up Seaborn:
+
+1. Open your terminal or command prompt and run the following command to install Seaborn using `pip`:
+
+```bash
+pip install seaborn
+```
+
+2. The basic invocation of `pip` will install seaborn and, if necessary, its mandatory dependencies. It is possible to include optional dependencies that give access to a few advanced features:
+```bash
+pip install seaborn[stats]
+```
+
+3. The library is also included as part of the Anaconda distribution, and it can be installed with `conda`:
+```bash
+conda install seaborn
+```
+
+4. As the main Anaconda repository can be slow to add new releases, you may prefer using the conda-forge channel:
+```bash
+conda install seaborn -c conda-forge
+```
+
+## Dependencies
+### Supported Python versions
+- Python 3.8+
+
+### Mandatory Dependencies
+ - [numpy](https://numpy.org/)
+ - [pandas](https://pandas.pydata.org/)
+ - [matplotlib](https://matplotlib.org/)
+
+### Optional Dependencies
+ - [statsmodels](https://www.statsmodels.org/stable/index.html) for advanced regression plots
+ - [scipy](https://scipy.org/) for clustering matrices and some advanced options
+ - [fastcluster](https://pypi.org/project/fastcluster/) for faster clustering of large matrices
diff --git a/contrib/plotting-visualization/seaborn-plotting.md b/contrib/plotting-visualization/seaborn-plotting.md
new file mode 100644
index 00000000..655f625e
--- /dev/null
+++ b/contrib/plotting-visualization/seaborn-plotting.md
@@ -0,0 +1,259 @@
+# Seaborn
+
+Seaborn is a powerful and easy-to-use data visualization library in Python built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Here we will cover various plotting functions provided by Seaborn, along with examples to illustrate their usage.
+Seaborn simplifies the process of creating complex visualizations with a few lines of code, and it integrates closely with pandas data structures, making it an excellent choice for data analysis and exploration.
+
+## Setting up Seaborn
+
+Make sure the seaborn library is installed on your system. If not, use the command:
+`pip install seaborn`
+
+After installing, you are all set to experiment with the plotting functions.
+
+```python
+# Import necessary libraries
+
+import seaborn as sns
+import matplotlib.pyplot as plt
+import pandas as pd
+```
+
+Seaborn includes several built-in datasets that you can use for practice.
+You can list all available datasets using the command below:
+```python
+sns.get_dataset_names()
+```
+
+Here we are using the 'tips' dataset:
+
+```python
+# Loading an example dataset
+tips = sns.load_dataset('tips')
+```
+
+Before delving into plotting, make yourself comfortable with the dataset. To do that, use the pandas library to understand what information the dataset contains and preprocess the data. If you get stuck, feel free to refer to the pandas documentation.
+
+## Relational Plots
+
+Relational plots are used to visualize the relationship between two or more variables.
+
+### Scatter Plot
+A scatter plot displays data points based on two numerical variables. Seaborn's `scatterplot` function allows you to create scatter plots with ease.
+
+```python
+# Scatter plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.scatterplot(data=tips, x='total_bill', y='tip', hue='day', style='time')
+plt.title('Scatter Plot of Total Bill vs Tip')
+plt.show()
+```
+![scatter plot](images/seaborn-plotting/image1.png)
+
+### Line Plot
+A line plot connects data points in the order they appear in the dataset. This is useful for time series data. The `lineplot` function allows you to create line plots.
+
+```python
+# Line plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.lineplot(data=tips, x='size', y='total_bill', hue='day')
+plt.title('Line Plot of Total Bill by Size and Day')
+plt.show()
+```
+
+![lineplot](images/seaborn-plotting/image2.png)
+
+## Distribution Plots
+
+Distribution plots visualize the distribution of a single numerical variable.
+
+### HistPlot
+A histplot displays the distribution of a numerical variable by dividing the data into bins.
+
+```python
+# Histplot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.histplot(data=tips, x='total_bill', kde=True)
+plt.title('Histplot of Total Bill')
+plt.show()
+```
+![Histplot](images/seaborn-plotting/image3.png)
+
+### KDE Plot
+A Kernel Density Estimate (KDE) plot represents the distribution of a variable as a smooth curve.
+
+```python
+# KDE Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.kdeplot(data=tips, x='total_bill', hue='sex', fill=True)
+plt.title('KDE Plot of Total Bill by Sex')
+plt.show()
+```
+![KDE](images/seaborn-plotting/image4.png)
+
+### ECDF Plot
+An Empirical Cumulative Distribution Function (ECDF) plot shows the proportion of data points below each value.
+
+```python
+# ECDF Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.ecdfplot(data=tips, x='total_bill', hue='sex')
+plt.title('ECDF Plot of Total Bill by Sex')
+plt.show()
+```
+![ECDF](images/seaborn-plotting/image5.png)
+
+### Rug Plot
+A rug plot in Seaborn is a simple way to show the distribution of a variable by drawing small vertical lines (or "rugs") at each data point along the x-axis.
+
+```python
+# Rug Plot using Seaborn
+
+plt.figure(figsize=(3,3))
+sns.rugplot(x='total_bill', data=tips)
+plt.title('Rug Plot of Total Bill Amounts')
+plt.show()
+```
+![Rug](images/seaborn-plotting/image6.png)
+
+## Categorical Plots
+Categorical plots are used to visualize data where one or more variables are categorical.
+
+### Bar Plot
+
+A bar plot shows the relationship between a categorical variable and a numerical variable.
+```python
+# Bar Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.barplot(data=tips, x='day', y='total_bill', hue='sex')
+plt.title('Bar Plot of Total Bill by Day and Sex')
+plt.show()
+```
+![Bar](images/seaborn-plotting/image7.png)
+
+### Point Plot
+A point plot in Seaborn shows a point estimate (by default, the mean) of a numerical variable for each level of a categorical variable, with the points connected by lines, which makes it easy to compare trends across categories.
+
+```python
+# Point Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.pointplot(x='day', y='total_bill', hue='sex', data=tips)
+plt.title('Average Total Bill by Day and Sex')
+plt.show()
+```
+![Point](images/seaborn-plotting/image8.png)
+
+### Box Plot
+A box plot displays the distribution of a numerical variable across different categories.
+
+```python
+# Box Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.boxplot(data=tips, x='day', y='total_bill', hue='sex')
+plt.title('Box Plot of Total Bill by Day and Sex')
+plt.show()
+```
+![Box](images/seaborn-plotting/image9.png)
+
+### Violin Plot
+A violin plot combines aspects of a box plot and a KDE plot to show the distribution of data.
+
+```python
+# Violin Plot using Seaborn
+
+plt.figure(figsize=(5,5))
+sns.violinplot(data=tips, x='day', y='total_bill', hue='sex', split=True)
+plt.title('Violin Plot of Total Bill by Day and Sex')
+plt.show()
+```
+![Violin](images/seaborn-plotting/image10.png)
+
+## Matrix Plots
+Matrix plots are useful for visualizing data in a matrix format.
+
+### Heatmap
+A heatmap displays data in a matrix where values are represented by color.
+
+```python
+# Heatmap using Seaborn
+
+plt.figure(figsize=(10,8))
+flights = sns.load_dataset('flights')
+flights_pivot = flights.pivot(index='month', columns='year', values='passengers')
+sns.heatmap(flights_pivot, annot=True, fmt='d', cmap='YlGnBu')
+plt.title('Heatmap of Flight Passengers')
+plt.show()
+```
+![Heatmap](images/seaborn-plotting/image11.png)
+
+## Pair Plot
+A pair plot shows the pairwise relationships between multiple variables in a dataset.
+
+```python
+# Pairplot using Seaborn
+
+# pairplot creates its own figure, so no plt.figure call is needed
+sns.pairplot(tips, hue='sex')
+plt.suptitle('Pair Plot of Tips Dataset', y=1.02)
+plt.show()
+```
+![Pair](images/seaborn-plotting/image12.png)
+
+## FacetGrid
+FacetGrid allows you to create a grid of plots based on the values of one or more categorical variables.
+
+```python
+# Facetgrid using Seaborn
+
+# FacetGrid also creates its own figure
+g = sns.FacetGrid(tips, col='sex', row='time', margin_titles=True)
+g.map(sns.scatterplot, 'total_bill', 'tip')
+plt.show()
+```
+![Facetgrid](images/seaborn-plotting/image13.png)
+
+## Customizing Seaborn Plots
+Seaborn plots can be customized to improve their appearance and convey more information.
+
+### Changing the Aesthetic Style
+Seaborn comes with several built-in themes.
+
+```python
+sns.set_style('whitegrid')
+sns.scatterplot(data=tips, x='total_bill', y='tip')
+plt.title('Scatter Plot with Whitegrid Style')
+plt.show()
+```
+![Aesthetic](images/seaborn-plotting/image14.png)
+
+### Customizing Colors
+You can use color palettes to customize the colors in your plots.
+
+```python
+sns.set_palette('pastel')
+sns.barplot(data=tips, x='day', y='total_bill', hue='sex')
+plt.title('Bar Plot with Pastel Palette')
+plt.show()
+```
+![Colors](images/seaborn-plotting/image15.png)
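+
+You can also scale fonts and plot elements for different presentation contexts ('paper', 'notebook', 'talk', 'poster') with `set_context`; a quick sketch:
+
+```python
+sns.set_context('talk')
+sns.scatterplot(data=tips, x='total_bill', y='tip')
+plt.title('Scatter Plot with Talk Context')
+plt.show()
+```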
+
+### Adding Titles and Labels
+Titles and labels can be added to make plots more informative.
+
+```python
+plot = sns.scatterplot(data=tips, x='total_bill', y='tip')
+plot.set_title('Scatter Plot of Total Bill vs Tip')
+plot.set_xlabel('Total Bill ($)')
+plot.set_ylabel('Tip ($)')
+plt.show()
+```
+![Titles](images/seaborn-plotting/image16.png)
+
+Seaborn is a versatile library that simplifies the creation of complex visualizations. By using Seaborn's plotting functions, you can create a wide range of statistical graphics with minimal effort. Whether you're working with relational data, categorical data, or distributions, Seaborn provides the tools you need to visualize your data effectively.
diff --git a/contrib/question-bank/index.md b/contrib/question-bank/index.md
new file mode 100644
index 00000000..82596a2f
--- /dev/null
+++ b/contrib/question-bank/index.md
@@ -0,0 +1,3 @@
+# List of sections
+
+- [Section title](filename.md)
diff --git a/contrib/scipy/index.md b/contrib/scipy/index.md
new file mode 100644
index 00000000..5425c4ac
--- /dev/null
+++ b/contrib/scipy/index.md
@@ -0,0 +1,5 @@
+# List of sections
+
+- [Installation of Scipy and its key uses](installation_features.md)
+- [SciPy Graphs](scipy-graphs.md)
+
diff --git a/contrib/scipy/installation_features.md b/contrib/scipy/installation_features.md
new file mode 100644
index 00000000..d5410869
--- /dev/null
+++ b/contrib/scipy/installation_features.md
@@ -0,0 +1,173 @@
+## Installation of Scipy
+
+You can install scipy using the command:
+
+```
+$ pip install scipy
+```
+
+You can also use a Python distribution that already has SciPy installed, like Anaconda.
+
+### Importing SciPy
+
+```python
+from scipy import constants
+```
+
+## Key Features of SciPy
+### 1. Numerical Integration
+
+It helps in computing definite integrals of functions numerically.
+
+```python
+from scipy import integrate
+
+# Define the function to integrate
+def f(x):
+    return x**2
+
+# Compute definite integral of f from 0 to 1
+result, error = integrate.quad(f, 0, 1)
+print(result)
+```
+
+#### Output
+
+```
+0.33333333333333337
+```
+
+### 2. Optimization
+
+It can be used to minimize or maximize functions. Here is an example of finding a local minimum of a function:
+
+```python
+from scipy.optimize import minimize
+import numpy as np
+
+# Define an objective function to minimize
+def objective(x):
+    return x**2 + 10*np.sin(x)
+
+# Minimize the objective function starting from x=0
+result = minimize(objective, x0=0)
+print(result.x)
+```
+
+#### Output
+
+```
+array([-1.30644012])
+```
+
+### 3. Linear Algebra
+
+Solving systems of linear equations:
+
+```python
+from scipy import linalg
+import numpy as np
+
+# Define a square matrix
+A = np.array([[1, 2], [3, 4]])
+
+# Define a vector
+b = np.array([5, 6])
+
+# Solve Ax = b for x
+x = linalg.solve(A, b)
+print(x)
+```
+
+#### Output
+
+```
+array([-4. ,  4.5])
+```
+
+### 4. Statistics
+
+Performing statistical operations; here we draw random samples from a normal distribution and fit a normal distribution to them.
+
+```python
+from scipy import stats
+import numpy as np
+
+# Generate random data from a normal distribution
+data = stats.norm.rvs(loc=0, scale=1, size=1000)
+
+# Fit a normal distribution to the data
+mean, std = stats.norm.fit(data)
+```
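+
+To sanity-check the fit, you can compare the fitted parameters against the sample statistics (a small sketch continuing the example above):
+
+```python
+# Both pairs should be close to the true loc=0 and scale=1
+print(mean, std)
+print(np.mean(data), np.std(data))
+```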
+### 5. Signal Processing
+
+To process signals, such as EEG or MEG recordings:
+
+```python
+from scipy import signal
+import numpy as np
+
+# Create a signal (e.g., sine wave)
+# (named sig so it does not shadow the scipy.signal module)
+t = np.linspace(0, 1, 1000)
+sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(1000)
+
+# Apply a low-pass Butterworth filter
+b, a = signal.butter(4, 0.1, 'low')
+filtered_signal = signal.filtfilt(b, a, sig)
+```
+
+The filters applied here are part of deeper signal analysis.
+
+### 6. Sparse Matrix
+
+The word 'sparse' means that most of the entries are unused (typically zero) during an operation or analysis. To handle such data efficiently, a sparse matrix is created.
+
+There are two types of sparse matrices:
+
+1. CSC: Compressed Sparse Column, it is used for efficient math functions and for column slicing
+2. CSR: Compressed Sparse Row, it is used for fast row slicing
+
+#### In CSC format
+
+```python
+from scipy import sparse
+import numpy as np
+
+# Build the matrix from (value, (row, column)) triplets:
+# entries (1, 1)=1, (2, 0)=2 and (1, 2)=1
+row_indices = np.array([1, 2, 1])
+col_indices = np.array([1, 0, 2])
+values = np.array([1, 2, 1])
+
+sparse_matrix_csc = sparse.csc_matrix((values, (row_indices, col_indices)))
+```
+
+#### In CSR format
+
+```python
+from scipy import sparse
+import numpy as np
+
+# Convert a dense array directly to CSR format
+data = np.array([[0, 0], [0, 1], [2, 0]])
+sparse_matrix = sparse.csr_matrix(data)
+```
+
+### 7. Image Processing
+
+It is used to process images, for example changing their dimensions or properties. This is common, for instance, in projects on medical imaging.
+
+```python
+from scipy import ndimage
+import matplotlib.pyplot as plt
+
+image = plt.imread('path/to/image.jpg')
+plt.imshow(image)
+plt.show()
+
+# Apply Gaussian blur to the image
+blurred_image = ndimage.gaussian_filter(image, sigma=1)
+plt.imshow(blurred_image)
+plt.show()
+```
+
+Gaussian blur is one of the filters provided by the `ndimage` package in SciPy; it is used here to smooth the image.
diff --git a/contrib/scipy/scipy-graphs.md b/contrib/scipy/scipy-graphs.md
new file mode 100644
index 00000000..60bb5659
--- /dev/null
+++ b/contrib/scipy/scipy-graphs.md
@@ -0,0 +1,165 @@
+# SciPy Graphs
+Graphs are also a type of data structure; SciPy provides a module called `scipy.sparse.csgraph` for working with graphs.
+
+## Adjacency Matrix
+An adjacency matrix is a way of representing a graph using a square matrix. In the matrix, the element at the i-th row and j-th column indicates whether there is an edge from vertex
+i to vertex j.
+
+```python
+import numpy as np
+from scipy.sparse import csr_matrix
+
+adj_matrix = np.array([
+    [0, 1, 0, 0],
+    [1, 0, 1, 0],
+    [0, 1, 0, 1],
+    [0, 0, 1, 0]
+])
+
+
+sparse_matrix = csr_matrix(adj_matrix)
+
+print(sparse_matrix)
+```
+
+In this example:
+
+1. The graph has 4 nodes.
+2. There is an edge between node 0 and node 1, node 1 and node 2, and node 2 and node 3.
+3. The csr_matrix function converts the dense adjacency matrix into a compressed sparse row (CSR) format, which is efficient for storing large, sparse matrices.
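+
+Once a graph is stored in CSR form, the `scipy.sparse.csgraph` routines can operate on it directly. For example, a quick sketch counting the connected components of the matrix above (this chain graph forms a single component):
+
+```python
+from scipy.sparse.csgraph import connected_components
+
+# Number of connected components and the component label of each node
+n_components, labels = connected_components(sparse_matrix, directed=False)
+print(n_components, labels)
+```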
+
+## Floyd Warshall
+
+The Floyd-Warshall algorithm is a classic algorithm used to find the shortest paths between all pairs of nodes in a weighted graph.
+
+```python
+import numpy as np
+from scipy.sparse.csgraph import floyd_warshall
+from scipy.sparse import csr_matrix
+
+arr = np.array([
+    [0, 1, 2],
+    [1, 0, 0],
+    [2, 0, 0]
+])
+
+newarr = csr_matrix(arr)
+
+print(floyd_warshall(newarr, return_predecessors=True))
+```
+
+#### Output
+
+```
+(array([[0., 1., 2.],
+       [1., 0., 3.],
+       [2., 3., 0.]]), array([[-9999,     0,     0],
+       [    1, -9999,     0],
+       [    2,     0, -9999]], dtype=int32))
+```
+
+## Dijkstra
+
+Dijkstra's algorithm is used to find the shortest path from a source node to all other nodes in a graph with non-negative edge weights.
+
+```python
+import numpy as np
+from scipy.sparse.csgraph import dijkstra
+from scipy.sparse import csr_matrix
+
+arr = np.array([
+    [0, 1, 2],
+    [1, 0, 0],
+    [2, 0, 0]
+])
+
+newarr = csr_matrix(arr)
+
+print(dijkstra(newarr, return_predecessors=True, indices=0))
+```
+
+#### Output
+
+```
+(array([ 0.,  1.,  2.]), array([-9999,     0,     0], dtype=int32))
+```
+
+## Bellman Ford
+
+The Bellman-Ford algorithm is used to find the shortest path from a single source vertex to all other vertices in a weighted graph. It can handle graphs with negative weights, and it also detects negative weight cycles.
+
+```python
+import numpy as np
+from scipy.sparse.csgraph import bellman_ford
+from scipy.sparse import csr_matrix
+
+arr = np.array([
+    [0, -1, 2],
+    [1, 0, 0],
+    [2, 0, 0]
+])
+
+newarr = csr_matrix(arr)
+
+print(bellman_ford(newarr, return_predecessors=True, indices=0))
+```
+
+#### Output
+
+```
+(array([ 0., -1.,  2.]), array([-9999,     0,     0], dtype=int32))
+```
+
+## Depth First Order
+
+Depth-First Search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root and explores as far as possible along each branch before backtracking.
+
+```python
+import numpy as np
+from scipy.sparse.csgraph import depth_first_order
+from scipy.sparse import csr_matrix
+
+arr = np.array([
+    [0, 1, 0, 1],
+    [1, 1, 1, 1],
+    [2, 1, 1, 0],
+    [0, 1, 0, 1]
+])
+
+newarr = csr_matrix(arr)
+
+print(depth_first_order(newarr, 1))
+```
+
+#### Output
+
+```
+(array([1, 0, 3, 2], dtype=int32), array([    1, -9999,     1,     0], dtype=int32))
+```
+
+## Breadth First Order
+
+Breadth-First Search (BFS) is an algorithm for traversing or searching tree or graph data structures. It starts at the root and explores all nodes at the present depth level before moving on to nodes at the next depth level.
+
+```python
+import numpy as np
+from scipy.sparse.csgraph import breadth_first_order
+from scipy.sparse import csr_matrix
+
+arr = np.array([
+    [0, 1, 0, 1],
+    [1, 1, 1, 1],
+    [2, 1, 1, 0],
+    [0, 1, 0, 1]
+])
+
+newarr = csr_matrix(arr)
+
+print(breadth_first_order(newarr, 1))
+```
+
+#### Output
+
+```
+(array([1, 0, 2, 3], dtype=int32), array([    1, -9999,     1,     1], dtype=int32))
+```
diff --git a/contrib/web-scrapping/flask.md b/contrib/web-scrapping/flask.md
new file mode 100644
index 00000000..d65da024
--- /dev/null
+++ b/contrib/web-scrapping/flask.md
@@ -0,0 +1,436 @@
+# Introduction to Flask: A Python Web Framework
+
+## Table of Contents
+1. Introduction
+2. Prerequisites
+3. Setting Up Your Environment
+4. Creating Your First Flask Application
+   - Project Structure
+   - Hello World Application
+5. Routing
+6. Templates and Static Files
+   - Jinja2 Templating Engine
+   - Serving Static Files
+7. Working with Forms
+   - Handling Form Data
+8. Database Integration
+   - Setting Up SQLAlchemy
+   - Performing CRUD Operations
+9. Error Handling
+10. Testing Your Application
Deploying Your Flask Application + - Using Gunicorn + - Deploying to Render +12. Conclusion +13. Further Reading and Resources + +--- + +## 1. Introduction +Flask is a lightweight WSGI web application framework in Python. It is designed with simplicity and flexibility in mind, allowing developers to create web applications with minimal setup. Flask was created by Armin Ronacher as part of the Pocoo project and has gained popularity for its ease of use and extensive documentation. + +## 2. Prerequisites +Before starting with Flask, ensure you have the following: +- Basic knowledge of Python. +- Understanding of web development concepts (HTML, CSS, JavaScript). +- Python installed on your machine (version 3.6 or higher). +- pip (Python package installer) installed. + +## 3. Setting Up Your Environment +1. **Install Python**: Download and install Python from python.org. +2. **Create a Virtual Environment**: + ``` + python -m venv venv + ``` +3. **Activate the Virtual Environment**: + - On Windows: + ``` + venv\Scripts\activate + ``` + - On macOS/Linux: + ``` + source venv/bin/activate + ``` +4. **Install Flask**: + ``` + pip install Flask + ``` + +## 4. Creating Your First Flask Application +### Project Structure +A typical Flask project structure might look like this: +``` +my_flask_app/ + app/ + __init__.py + routes.py + templates/ + static/ + venv/ + run.py +``` + +### Hello World Application +1. **Create a Directory for Your Project**: + ``` + mkdir my_flask_app + cd my_flask_app + ``` +2. **Initialize the Application**: + - Create `app/__init__.py`: + ```python + from flask import Flask + + def create_app(): + app = Flask(__name__) + + with app.app_context(): + from . import routes + return app + ``` + - Create `run.py`: + ```python + from app import create_app + + app = create_app() + + if __name__ == "__main__": + app.run(debug=True) + ``` + - Create `app/routes.py`: + ```python + from flask import current_app as app + + @app.route('/') + def hello_world(): + return 'Hello, World!' + ``` + +3. **Run the Application**: + ``` + python run.py + ``` + Navigate to `http://127.0.0.1:5000` in your browser to see "Hello, World!". + +## 5. Routing +In Flask, routes are defined using the `@app.route` decorator. Here's an example of different routes: + +```python +from flask import Flask + +app = Flask(__name__) + +@app.route('/') +def home(): + return 'Home Page' + +@app.route('/about') +def about(): + return 'About Page' + +@app.route('/user/') +def show_user_profile(username): + return f'User: {username}' +``` + +- **Explanation**: + - The `@app.route('/')` decorator binds the URL `'/'` to the `home` function, which returns 'Home Page'. + - The `@app.route('/about')` decorator binds the URL `/about` to the `about` function. + - The `@app.route('/user/')` decorator binds the URL `/user/` to the `show_user_profile` function, capturing the part of the URL as the `username` variable. + +## 6. Templates and Static Files +### Jinja2 Templating Engine +Jinja2 is Flask's templating engine. Templates are HTML files that can include dynamic content. + +- **Create a Template**: + - `app/templates/index.html`: + ```html + + + + {{ title }} + + +
+
+## 6. Templates and Static Files
+### Jinja2 Templating Engine
+Jinja2 is Flask's templating engine. Templates are HTML files that can include dynamic content.
+
+- **Create a Template**:
+  - `app/templates/index.html`:
+    ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>{{ title }}</title>
+    </head>
+    <body>
+        <h1>{{ heading }}</h1>
+        <p>{{ content }}</p>
+    </body>
+    </html>
+    ```
+
+- **Render the Template**:
+  ```python
+  from flask import Flask, render_template
+
+  app = Flask(__name__)
+
+  @app.route('/')
+  def home():
+      return render_template('index.html', title='Home', heading='Welcome to Flask', content='This is a Flask application.')
+  ```
+
+### Serving Static Files
+Static files like CSS, JavaScript, and images are placed in the `static` directory.
+
+- **Create Static Files**:
+  - `app/static/style.css`:
+    ```css
+    body {
+        font-family: Arial, sans-serif;
+    }
+    ```
+
+- **Include Static Files in Templates**:
+  ```html
+  <!DOCTYPE html>
+  <html>
+  <head>
+      <title>{{ title }}</title>
+      <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
+  </head>
+  <body>
+      <h1>{{ heading }}</h1>
+      <p>{{ content }}</p>
+  </body>
+  </html>
+  ```
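+
+Jinja2 also supports control structures such as loops and conditionals. A brief sketch using `render_template_string` (the `/items` route, the inline template, and the `items` list are all illustrative additions; in a real app the template would live under `app/templates/` like the examples above):
+
+```python
+from flask import Flask, render_template_string
+
+app = Flask(__name__)
+
+# Inline template kept here only so the example is self-contained.
+TEMPLATE = """
+<ul>
+{% for item in items %}
+    <li>{{ item }}</li>
+{% endfor %}
+</ul>
+"""
+
+@app.route('/items')
+def items():
+    # Each element of the list is rendered as one <li> by the for-loop.
+    return render_template_string(TEMPLATE, items=['one', 'two', 'three'])
+```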
+
+## 7. Working with Forms
+### Handling Form Data
+Forms are used to collect user input. Flask provides utilities to handle form submissions.
+
+- **Create a Form**:
+  - `app/templates/form.html`:
+    ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>Form</title>
+    </head>
+    <body>
+        <form action="/submit" method="post">
+            <label for="name">Name:</label>
+            <input type="text" id="name" name="name">
+            <input type="submit" value="Submit">
+        </form>
+    </body>
+    </html>
+    ```
+
+- **Handle Form Submission**:
+  ```python
+  from flask import Flask, request, render_template
+
+  app = Flask(__name__)
+
+  @app.route('/form')
+  def form():
+      return render_template('form.html')
+
+  @app.route('/submit', methods=['POST'])
+  def submit():
+      name = request.form['name']
+      return f'Hello, {name}!'
+  ```
+
+- **Explanation**:
+  - The `@app.route('/form')` route renders the form.
+  - The `@app.route('/submit', methods=['POST'])` route handles the form submission and displays the submitted name.
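+
+A common variation is to handle the form and its submission in a single route. A sketch (assuming the form's `action` attribute is changed to `/form`, which is not how `form.html` is written above):
+
+```python
+from flask import Flask, request, render_template
+
+app = Flask(__name__)
+
+@app.route('/form', methods=['GET', 'POST'])
+def form():
+    if request.method == 'POST':
+        # .get() avoids the 400 error that request.form['name'] raises
+        # when the field is missing from the request.
+        name = request.form.get('name', 'stranger')
+        return f'Hello, {name}!'
+    return render_template('form.html')
+```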
+
+## 8. Database Integration
+### Setting Up SQLAlchemy
+SQLAlchemy is an ORM that allows you to interact with databases using Python objects.
+
+- **Install SQLAlchemy**:
+  ```
+  pip install flask_sqlalchemy
+  ```
+
+- **Configure SQLAlchemy**:
+  - `app/__init__.py`:
+    ```python
+    from flask import Flask
+    from flask_sqlalchemy import SQLAlchemy
+
+    db = SQLAlchemy()
+
+    def create_app():
+        app = Flask(__name__)
+        app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///site.db'
+        db.init_app(app)
+        return app
+    ```
+
+### Performing CRUD Operations
+Define models and perform CRUD operations.
+
+- **Define a Model**:
+  - `app/models.py`:
+    ```python
+    from app import db
+
+    class User(db.Model):
+        id = db.Column(db.Integer, primary_key=True)
+        username = db.Column(db.String(80), unique=True, nullable=False)
+
+        def __repr__(self):
+            return f'<User {self.username}>'
+    ```
+
+- **Create the Database**:
+  ```python
+  from app import create_app, db
+  from app.models import User
+
+  app = create_app()
+  with app.app_context():
+      db.create_all()
+  ```
+
+- **Perform CRUD Operations**:
+  ```python
+  from app import db
+  from app.models import User
+
+  # Create
+  new_user = User(username='new_user')
+  db.session.add(new_user)
+  db.session.commit()
+
+  # Read
+  user = User.query.first()
+
+  # Update
+  user.username = 'updated_user'
+  db.session.commit()
+
+  # Delete
+  db.session.delete(user)
+  db.session.commit()
+  ```
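+
+Beyond `first()`, the query interface supports filtering and primary-key lookups. A brief sketch (run inside an application context, as with `db.create_all()` above; `db.session.get` assumes SQLAlchemy 1.4 or newer, where it replaces the older `Query.get`):
+
+```python
+from app import create_app, db
+from app.models import User
+
+app = create_app()
+with app.app_context():
+    # Look up by column value; first() returns None if there is no match.
+    user = User.query.filter_by(username='new_user').first()
+
+    # Fetch by primary key; returns None if the id does not exist.
+    same_user = db.session.get(User, 1)
+
+    # All rows as a list.
+    everyone = User.query.all()
+```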
+
+## 9. Error Handling
+Error handling in Flask can be managed by defining error handlers for different HTTP status codes.
+
+- **Define an Error Handler**:
+  ```python
+  from flask import Flask, render_template
+
+  app = Flask(__name__)
+
+  @app.errorhandler(404)
+  def page_not_found(e):
+      return render_template('404.html'), 404
+
+  @app.errorhandler(500)
+  def internal_server_error(e):
+      return render_template('500.html'), 500
+  ```
+
+- **Create Error Pages**:
+  - `app/templates/404.html`:
+    ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>Page Not Found</title>
+    </head>
+    <body>
+        <h1>404 - Page Not Found</h1>
+        <p>The page you are looking for does not exist.</p>
+    </body>
+    </html>
+    ```
+
+  - `app/templates/500.html`:
+    ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+        <title>Internal Server Error</title>
+    </head>
+    <body>
+        <h1>500 - Internal Server Error</h1>
+        <p>Something went wrong on our end. Please try again later.</p>
+    </body>
+    </html>
+    ```
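+
+For API endpoints, error handlers can return JSON instead of an HTML page. A minimal sketch (an illustrative addition, not part of the application above):
+
+```python
+from flask import Flask, jsonify
+
+app = Flask(__name__)
+
+@app.errorhandler(404)
+def page_not_found(e):
+    # Return a machine-readable body alongside the status code.
+    return jsonify(error='Not Found'), 404
+```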
+
+## 10. Testing Your Application
+Flask applications can be tested using Python's built-in `unittest` framework.
+
+- **Write a Test Case**:
+  - `tests/test_app.py`:
+    ```python
+    import unittest
+    from app import create_app
+
+    class BasicTestCase(unittest.TestCase):
+        def setUp(self):
+            self.app = create_app()
+            self.app.config['TESTING'] = True
+            self.client = self.app.test_client()
+
+        def test_home(self):
+            response = self.client.get('/')
+            self.assertEqual(response.status_code, 200)
+            self.assertIn(b'Hello, World!', response.data)
+
+    if __name__ == '__main__':
+        unittest.main()
+    ```
+
+- **Run the Tests**:
+  ```
+  python -m unittest discover -s tests
+  ```
+
+## 11. Deploying Your Flask Application
+### Using Gunicorn
+Gunicorn is a Python WSGI HTTP server for UNIX. It uses a pre-fork worker model, meaning that it forks multiple worker processes to handle requests.
+
+- **Install Gunicorn**:
+  ```
+  pip install gunicorn
+  ```
+
+- **Run Your Application with Gunicorn**:
+  ```
+  gunicorn -w 4 run:app
+  ```
+
+### Deploying to Render
+Render is a cloud platform for deploying web applications.
+
+- **Create a `requirements.txt` File**:
+  ```
+  Flask
+  gunicorn
+  flask_sqlalchemy
+  ```
+
+- **Create a `render.yaml` File**:
+  ```yaml
+  services:
+    - type: web
+      name: my-flask-app
+      env: python
+      plan: free
+      buildCommand: pip install -r requirements.txt
+      startCommand: gunicorn -w 4 run:app
+  ```
+
+- **Deploy Your Application**:
+  1. Push your code to a Git repository.
+  2. Sign in to Render and create a new Web Service.
+  3. Connect your repository and select the branch to deploy.
+  4. Render will automatically use the `render.yaml` file to configure and deploy your application.
+
+## 12. Conclusion
+Flask is a powerful and flexible framework for building web applications in Python. It offers simplicity and ease of use, making it a great choice for both beginners and experienced developers. This guide covered the basics of setting up a Flask application, routing, templating, working with forms, integrating databases, error handling, testing, and deployment.
+
+## 13. Further Reading and Resources
+- Flask Documentation: https://flask.palletsprojects.com/en/latest/
+- Jinja2 Documentation: https://jinja.palletsprojects.com/en/latest/
+- SQLAlchemy Documentation: https://docs.sqlalchemy.org/en/latest/
+- Render Documentation: https://render.com/docs
diff --git a/contrib/web-scrapping/index.md b/contrib/web-scrapping/index.md
new file mode 100644
index 00000000..276014ea
--- /dev/null
+++ b/contrib/web-scrapping/index.md
@@ -0,0 +1,4 @@
+# List of sections
+
+- [Introduction to Flask](flask.md)
diff --git a/hello.py b/hello.py
new file mode 100644
index 00000000..7df869a1
--- /dev/null
+++ b/hello.py
@@ -0,0 +1 @@
+print("Hello, World!")