# pyspark-sql

Here are 48 public repositories matching this topic...

Up-to-date data science and data analysis cheat sheets from DataCamp, collected in one place. They offer quick reads on Machine Learning, Deep Learning, Python, R, SQL, and more, and are well suited to reviewing a topic in a short time.

  • Updated Dec 13, 2022

This code demonstrates how to use PySpark with datasets and perform simple data transformations. It either loads a sample dataset using PySpark's built-in functionality or reads data from external sources, then converts it into a PySpark DataFrame for distributed processing and manipulation.

  • Updated Mar 31, 2025
  • Python

Generate a synthetic dataset with one million records of employee information from a fictional company, load it into a PostgreSQL database, create analytical reports using PySpark and large-scale data analysis techniques, and implement machine learning models to predict trends in hiring and layoffs on a monthly and yearly basis.

  • Updated Apr 29, 2025
  • Python
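
The first step in that description, generating synthetic employee records and summarizing hires by month, can be sketched in plain Python. This is only an illustration under stated assumptions: the field names (`employee_id`, `department`, `salary`, `hire_date`), the department list, the date range, and the record count are all hypothetical and are not taken from the repository, which works at a scale of one million records with PostgreSQL and PySpark.

```python
import random
from collections import Counter
from datetime import date, timedelta

# Hypothetical values for illustration; the repository's actual schema is not shown.
DEPARTMENTS = ["Engineering", "Sales", "Finance", "Support"]
START_DATE = date(2020, 1, 1)
END_DATE = date(2024, 12, 31)


def generate_employees(n, seed=0):
    """Return n synthetic employee records as dictionaries."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same data
    span_days = (END_DATE - START_DATE).days
    records = []
    for emp_id in range(1, n + 1):
        # randint is inclusive on both ends, so hire dates stay within the range
        hire_date = START_DATE + timedelta(days=rng.randint(0, span_days))
        records.append({
            "employee_id": emp_id,
            "department": rng.choice(DEPARTMENTS),
            "salary": rng.randint(30_000, 150_000),
            "hire_date": hire_date,
        })
    return records


def hires_per_month(records):
    """Count hires per calendar month, keyed by 'YYYY-MM'."""
    return Counter(r["hire_date"].strftime("%Y-%m") for r in records)


# Scaled-down run: 1,000 records instead of one million.
employees = generate_employees(1_000)
monthly = hires_per_month(employees)
```

At full scale, records like these would more plausibly be written to a file or database table and aggregated with a distributed engine rather than held in a Python list; the in-memory version here only shows the shape of the data and of the monthly report.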

Inventory value is also important for determining a company's liquidity, or its ability to meet its short-term financial obligations. A high inventory value can indicate that a company has too much money tied up in inventory, which could make it difficult for the company to pay its bills.

  • Updated Oct 15, 2023
  • Jupyter Notebook

Network Intrusion Detector is a distributed intrusion detection system built with PySpark. It preprocesses, encodes, and models network traffic data to detect anomalies using a Random Forest classifier, achieving high accuracy and efficiency through feature selection and scalable data processing. The system is suitable for large-scale environments.

  • Updated May 22, 2025
  • Jupyter Notebook
