✅ Part 1: Get the Data
Steps:
Fork and clone the repo:
bash
git clone https://github.com/your-username/forage-walmart-task-4
cd forage-walmart-task-4
You should see:
A data/ folder with Spreadsheet 0, 1, 2 (probably CSV or XLSX).
An SQLite database (likely walmart.db or similar).
Part 2: Populate the Database
Files Overview:
Spreadsheet 0: Direct data insert (likely a locations, products, or customers
table).
Spreadsheet 1: Shipment items — each row = a product in a shipment.
Spreadsheet 2: Shipping metadata — contains shipment_id, origin, destination, etc.
Python Script Template (populate_db.py)
import sqlite3
import pandas as pd
# Paths
DB_PATH = 'walmart.db'
SPREADSHEET_0 = 'data/spreadsheet_0.csv'
SPREADSHEET_1 = 'data/spreadsheet_1.csv'
SPREADSHEET_2 = 'data/spreadsheet_2.csv'
# Connect to the database
conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()
# === Step 1: Insert data from Spreadsheet 0 ===
df0 = pd.read_csv(SPREADSHEET_0)
df0.to_sql('some_table_name', conn, if_exists='append', index=False) # Update
'some_table_name'
# === Step 2: Load spreadsheet 1 and 2 ===
df1 = pd.read_csv(SPREADSHEET_1) # product-level shipment data
df2 = pd.read_csv(SPREADSHEET_2) # shipment meta (origin/destination)
# === Step 3: Merge shipment metadata ===
merged_df = df1.merge(df2, on='shipment_id')
# === Step 4: Insert into shipments and shipment_products ===
for _, row in merged_df.iterrows():
# Insert or get origin location ID
cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
(row['origin_name'], row['origin_zip']))
origin = cur.fetchone()
if not origin:
cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
(row['origin_name'], row['origin_zip']))
origin_id = cur.lastrowid
else:
origin_id = origin[0]
# Insert or get destination location ID
cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
(row['destination_name'], row['destination_zip']))
dest = cur.fetchone()
if not dest:
cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
(row['destination_name'], row['destination_zip']))
dest_id = cur.lastrowid
else:
dest_id = dest[0]
# Insert shipment
cur.execute("""
INSERT OR IGNORE INTO Shipment (shipment_id, origin_location_id,
destination_location_id, shipment_date)
VALUES (?, ?, ?, ?)
""", (row['shipment_id'], origin_id, dest_id, row['shipment_date']))
# Insert product-shipment link
cur.execute("SELECT id FROM Product WHERE name = ?", (row['product_name'],))
product_id = cur.fetchone()[0]
cur.execute("""
INSERT INTO ShipmentProduct (shipment_id, product_id, quantity)
VALUES (?, ?, ?)
""", (row['shipment_id'], product_id, row['quantity']))
# Commit and close
conn.commit()
conn.close()
print("Database populated successfully.")
Notes:
You might need to update the table names (some_table_name, etc.) to match the
actual DB schema.
Use pandas.read_excel() if the files are .xlsx.
Always inspect the first few rows with df.head() to understand column structure.