
✅ Part 1: Get the Data

Steps:
Fork and clone the repo:

bash

git clone https://github.com/your-username/forage-walmart-task-4
cd forage-walmart-task-4

You should see:

A data/ folder with Spreadsheet 0, 1, 2 (probably CSV or XLSX).

An SQLite database (likely walmart.db or similar).
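
Before touching the database, it can help to confirm which tables it actually contains, since the table names used later in this guide are only guesses. A minimal sketch, assuming the file is called walmart.db (adjust the path to whatever the repo actually ships):

python

import sqlite3

conn = sqlite3.connect('walmart.db')
cur = conn.cursor()

# List every table defined in the SQLite database
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
print([row[0] for row in cur.fetchall()])

conn.close()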

✅ Part 2: Populate the Database

Files Overview:
Spreadsheet 0: Direct data insert (likely a locations, products, or customers table).

Spreadsheet 1: Shipment items — each row = a product in a shipment.

Spreadsheet 2: Shipping metadata — contains shipment_id, origin, destination, etc.
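
The column names used in the script below (shipment_id, origin_name, product_name, etc.) are assumptions; check them against the real headers before running anything. A quick inspection with pandas:

python

import pandas as pd

# Print the column names and first rows of each spreadsheet
for path in ['data/spreadsheet_0.csv', 'data/spreadsheet_1.csv', 'data/spreadsheet_2.csv']:
    df = pd.read_csv(path)
    print(path, list(df.columns))
    print(df.head(), '\n')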

Python Script Template (populate_db.py)

import sqlite3
import pandas as pd

# Paths
DB_PATH = 'walmart.db'
SPREADSHEET_0 = 'data/spreadsheet_0.csv'
SPREADSHEET_1 = 'data/spreadsheet_1.csv'
SPREADSHEET_2 = 'data/spreadsheet_2.csv'

# Connect to the database
conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()

# === Step 1: Insert data from Spreadsheet 0 ===
df0 = pd.read_csv(SPREADSHEET_0)
df0.to_sql('some_table_name', conn, if_exists='append', index=False)  # update 'some_table_name' to match the schema

# === Step 2: Load spreadsheets 1 and 2 ===
df1 = pd.read_csv(SPREADSHEET_1)  # product-level shipment data
df2 = pd.read_csv(SPREADSHEET_2)  # shipment metadata (origin/destination)

# === Step 3: Merge shipment metadata ===
merged_df = df1.merge(df2, on='shipment_id')

# === Step 4: Insert into shipments and shipment_products ===
for _, row in merged_df.iterrows():
    # Insert or get origin location ID
    cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
                (row['origin_name'], row['origin_zip']))
    origin = cur.fetchone()
    if not origin:
        cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
                    (row['origin_name'], row['origin_zip']))
        origin_id = cur.lastrowid
    else:
        origin_id = origin[0]

    # Insert or get destination location ID
    cur.execute("SELECT id FROM Location WHERE name = ? AND zip_code = ?",
                (row['destination_name'], row['destination_zip']))
    dest = cur.fetchone()
    if not dest:
        cur.execute("INSERT INTO Location (name, zip_code) VALUES (?, ?)",
                    (row['destination_name'], row['destination_zip']))
        dest_id = cur.lastrowid
    else:
        dest_id = dest[0]

    # Insert shipment
    cur.execute("""
        INSERT OR IGNORE INTO Shipment (shipment_id, origin_location_id,
                                        destination_location_id, shipment_date)
        VALUES (?, ?, ?, ?)
    """, (row['shipment_id'], origin_id, dest_id, row['shipment_date']))

    # Insert product-shipment link (assumes the product already exists in Product)
    cur.execute("SELECT id FROM Product WHERE name = ?", (row['product_name'],))
    product_id = cur.fetchone()[0]

    cur.execute("""
        INSERT INTO ShipmentProduct (shipment_id, product_id, quantity)
        VALUES (?, ?, ?)
    """, (row['shipment_id'], product_id, row['quantity']))

# Commit and close
conn.commit()
conn.close()
print("Database populated successfully.")

Notes:
You might need to update the table names (some_table_name, etc.) to match the
actual DB schema.

Use pandas.read_excel() if the files are .xlsx.


Always inspect the first few rows with df.head() to understand column structure.
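
If the files turn out to be .xlsx, only the read calls change. A minimal sketch (the file name is illustrative, and pandas.read_excel typically needs the openpyxl package installed):

python

import pandas as pd

# Swap read_csv for read_excel when the extension is .xlsx
df1 = pd.read_excel('data/spreadsheet_1.xlsx')
print(df1.head())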
