Ultimate Developer Command Guide
Python, PySpark & SQL Reference
Essential Commands for Developers
Python Commands

print()
    Outputs data to the console.
    Example: print("Welcome to Python!")  # Prints: Welcome to Python!

len()
    Returns the length of an object.
    Example: my_list = [1, 2, 3]; print(len(my_list))  # Prints: 3

range()
    Generates a sequence of numbers.
    Example: for i in range(3): print(i)  # Prints: 0, 1, 2

def
    Defines a custom function.
    Example:
        def greet(name): return f"Hello, {name}"
        print(greet("Alice"))  # Prints: Hello, Alice

import
    Imports a module or library.
    Example: import math; print(math.pi)  # Prints: 3.141592653589793

[x for x in iterable]
    Creates a list using a comprehension.
    Example: squares = [x**2 for x in [1, 2, 3]]; print(squares)  # Prints: [1, 4, 9]

if/elif/else
    Conditional logic.
    Example:
        x = 10
        if x > 5: print("Big")
        else: print("Small")
        # Prints: Big

for
    Iterates over a sequence.
    Example: for fruit in ["apple", "banana"]: print(fruit)  # Prints: apple, banana

while
    Loops until its condition becomes false.
    Example:
        count = 0
        while count < 3: print(count); count += 1
        # Prints: 0, 1, 2

try/except
    Handles exceptions.
    Example:
        try: print(1 / 0)
        except ZeroDivisionError: print("Cannot divide by zero")
        # Prints: Cannot divide by zero

open()
    Opens a file for reading or writing.
    Example: with open("example.txt", "w") as f: f.write("Hello")  # Creates the file with text

list.append()
    Adds an item to the end of a list.
    Example: my_list = []; my_list.append(5); print(my_list)  # Prints: [5]

dict.get()
    Retrieves a value from a dictionary (returns None, or a supplied default, if the key is missing).
    Example: my_dict = {"key": "value"}; print(my_dict.get("key"))  # Prints: value
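Several of the commands above compose naturally. The sketch below is a small, self-contained illustration (the function name `describe` and the sample scores are invented for this example): it uses def, a list comprehension, dict.get() with a default, and try/except.

```python
# Combines commands from the table: def, list comprehension,
# dict.get() with a default, and try/except.

def describe(scores):
    """Return a one-line summary for a dict of name -> score."""
    # List comprehension: keep only passing scores
    passing = [s for s in scores.values() if s >= 50]
    # dict.get() with a default avoids a KeyError for missing keys
    alice = scores.get("alice", 0)
    try:
        average = sum(passing) / len(passing)
    except ZeroDivisionError:  # empty dict -> no passing scores
        average = 0.0
    return f"{len(passing)} passing, alice={alice}, avg={average:.1f}"

print(describe({"alice": 80, "bob": 40, "carol": 70}))  # Prints: 2 passing, alice=80, avg=75.0
```

Note how the try/except turns the empty-input edge case into a defined result instead of a crash.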
PySpark Commands

SparkSession.builder
    Initializes a Spark session.
    Example: from pyspark.sql import SparkSession; spark = SparkSession.builder.appName("MyApp").getOrCreate()

spark.read.csv()
    Loads a CSV file into a DataFrame.
    Example: df = spark.read.csv("data.csv", header=True, inferSchema=True); df.show()  # Displays CSV data

df.show()
    Displays the first n rows of a DataFrame.
    Example: df.show(3)  # Shows first 3 rows

df.printSchema()
    Displays the DataFrame schema.
    Example: df.printSchema()  # Shows column names and types

df.select()
    Selects specific columns.
    Example: df.select("name", "age").show()  # Shows name and age columns

df.filter()
    Filters rows based on a condition.
    Example: df.filter(df.age > 25).show()  # Shows rows where age > 25

df.where()
    Alias for filter(); also accepts a SQL expression string.
    Example: df.where("salary > 50000").show()  # Filters rows where salary > 50000

df.groupBy().agg()
    Groups data and applies an aggregation.
    Example: df.groupBy("department").agg({"salary": "avg"}).show()  # Shows avg salary per dept

df.join()
    Joins two DataFrames.
    Example: df1.join(df2, df1.id == df2.id, "inner").show()  # Inner join on id

df.withColumn()
    Adds or modifies a column.
    Example: df.withColumn("age_plus_10", df.age + 10).show()  # Adds column with age + 10

df.withColumnRenamed()
    Renames a column.
    Example: df.withColumnRenamed("old_name", "new_name").show()  # Renames column

df.drop()
    Drops specified columns.
    Example: df.drop("salary").show()  # Drops salary column

df.fillna()
    Replaces null values.
    Example: df.fillna({"age": 0}).show()  # Replaces null ages with 0

df.dropDuplicates()
    Removes duplicate rows.
    Example: df.dropDuplicates(["name"]).show()  # Keeps one row per name

df.write.csv()
    Saves a DataFrame as CSV (Spark writes a directory of part files at this path).
    Example: df.write.csv("output.csv", mode="overwrite")  # Saves DataFrame to CSV

df.createOrReplaceTempView()
    Registers a DataFrame as a temporary SQL view.
    Example: df.createOrReplaceTempView("temp_table")  # Creates SQL view

spark.sql()
    Runs a SQL query against registered views.
    Example: spark.sql("SELECT name FROM temp_table WHERE age > 30").show()  # Runs SQL query

Window.partitionBy()
    Defines a window for ranking/aggregation.
    Example:
        from pyspark.sql.window import Window
        from pyspark.sql.functions import row_number
        w = Window.partitionBy("dept").orderBy("salary")
        df.withColumn("rank", row_number().over(w)).show()  # Adds rank column
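Chained together, the DataFrame calls above form a typical small pipeline. The sketch below assumes PySpark is installed and a local session is acceptable; the app name, column names (dept, salary), and sample rows are illustrative, and createDataFrame() stands in for spark.read.csv() so the snippet needs no input file.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, row_number
from pyspark.sql.window import Window

# Local session; requires a working PySpark installation
spark = SparkSession.builder.appName("CheatSheetDemo").master("local[1]").getOrCreate()

# Small in-memory DataFrame in place of spark.read.csv()
df = spark.createDataFrame(
    [("Alice", "Eng", 60000), ("Bob", "Eng", 50000), ("Cara", "HR", 45000)],
    ["name", "dept", "salary"],
)

# filter -> groupBy/agg, as in the table entries above
df.filter(df.salary > 40000).groupBy("dept").agg(avg("salary").alias("avg_salary")).show()

# Window ranking: highest salary first within each dept
w = Window.partitionBy("dept").orderBy(df.salary.desc())
df.withColumn("rank", row_number().over(w)).show()

spark.stop()
```

The window step is where PySpark departs most from plain SQL habits: the window spec (w) is built once and reused inside over().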
SQL Commands

SELECT
    Retrieves data from a table.
    Example: SELECT name, age FROM employees  -- Selects name and age columns

WHERE
    Filters rows based on a condition.
    Example: SELECT * FROM employees WHERE age > 30  -- Filters employees older than 30

ORDER BY
    Sorts the result set.
    Example: SELECT * FROM employees ORDER BY salary DESC  -- Sorts by salary in descending order

GROUP BY
    Groups rows for aggregation.
    Example: SELECT department, AVG(salary) FROM employees GROUP BY department  -- Avg salary per dept

HAVING
    Filters grouped results.
    Example: SELECT department, COUNT(*) FROM employees GROUP BY department HAVING COUNT(*) > 5  -- Depts with > 5 employees

JOIN
    Combines rows from multiple tables.
    Example: SELECT e.name, d.dept_name FROM employees e JOIN departments d ON e.dept_id = d.id  -- Joins tables

LEFT JOIN
    Includes all rows from the left table, with NULLs where the right table has no match.
    Example: SELECT e.name, d.dept_name FROM employees e LEFT JOIN departments d ON e.dept_id = d.id  -- Left join

LIMIT
    Restricts the number of returned rows.
    Example: SELECT * FROM employees LIMIT 5  -- Returns at most 5 rows (order is undefined without ORDER BY)

INSERT INTO
    Adds new rows to a table.
    Example: INSERT INTO employees (name, age) VALUES ('Alice', 28)  -- Inserts a new employee

UPDATE
    Modifies existing rows.
    Example: UPDATE employees SET salary = 60000 WHERE name = 'Alice'  -- Updates salary

DELETE
    Removes rows from a table.
    Example: DELETE FROM employees WHERE age < 18  -- Deletes rows where age < 18

CREATE TABLE
    Creates a new table.
    Example: CREATE TABLE employees (id INT, name VARCHAR(50), age INT)  -- Creates employees table

ALTER TABLE
    Modifies a table's structure.
    Example: ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2)  -- Adds salary column

DROP TABLE
    Deletes a table.
    Example: DROP TABLE employees  -- Deletes employees table
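The statements above can be tried without a database server: Python's built-in sqlite3 module runs them against an in-memory database. The sketch below reuses the employees table from the examples; the sample rows are invented, and SQLite's type names (INTEGER, TEXT, REAL) differ slightly from the INT/VARCHAR/DECIMAL shown above.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE and ALTER TABLE, as in the examples above
cur.execute("CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER)")
cur.execute("ALTER TABLE employees ADD COLUMN salary REAL")

# INSERT INTO, using ? placeholders instead of inlined literals
rows = [(1, "Alice", 28, 60000.0), (2, "Bob", 35, 50000.0), (3, "Cara", 17, 0.0)]
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# UPDATE and DELETE
cur.execute("UPDATE employees SET salary = 65000 WHERE name = 'Alice'")
cur.execute("DELETE FROM employees WHERE age < 18")

# SELECT combining WHERE, ORDER BY, and LIMIT
cur.execute("SELECT name FROM employees WHERE age > 25 ORDER BY salary DESC LIMIT 5")
names = [r[0] for r in cur.fetchall()]
print(names)  # Prints: ['Alice', 'Bob']
```

Placeholders (?) are worth the habit even in throwaway scripts: they sidestep quoting bugs and SQL injection alike.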
Cheat Sheet Summary
Comprehensive reference for Python, PySpark, and SQL development tasks.
Version 2.0 | Updated: August 2024
Print Tip: Use Ctrl+P (Win) / Cmd+P (Mac) to save as PDF