Python Assignment: Practical Automation and Data Handling
Objective:
This assignment aims to test your skills in web scraping, automation, and data handling using
Python. You will complete three distinct tasks that involve building a web scraping tool, automating a
KPI dashboard, and automating the download and processing of reports.
Problem 1: Web Scraping Tool
Task:
Create a Python script that scrapes data from a specified website and stores it in a structured format
(CSV or JSON).
Requirements:
1. Website Selection:
- Choose a publicly accessible website that lists products, articles, or any structured data (e.g.,
e-commerce site, news site, or job listings).
2. Data Points to Scrape:
- Select at least five data points to scrape for each item (e.g., title, price, date, URL, description).
3. Pagination Handling:
- The script should handle pagination if the website displays data across multiple pages.
4. Error Handling:
- Implement basic error handling for cases like connection issues or missing data.
5. Data Storage:
- Save the scraped data into a CSV or JSON file.
Deliverables:
- Python script (`web_scraper.py`) with comments explaining the code.
- A sample output file (`scraped_data.csv` or `scraped_data.json`).
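As a rough sketch of how the Problem 1 requirements fit together, the snippet below uses `requests` with `BeautifulSoup` (one common choice; `scrapy` is another). The URL, the CSS selectors (`.item`, `.title`, `.price`, etc.), and the `?page=N` pagination scheme are hypothetical — adapt them to the site you choose.

```python
# Hypothetical paginated listing scraper: every selector and URL below is a
# placeholder to be replaced for the actual target site.
import csv

import requests
from bs4 import BeautifulSoup


def _text(card, selector):
    """Return the stripped text of a child node, or "" if it is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_items(html):
    """Extract the five required data points from one page of listings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(".item"):
        link = card.select_one("a")
        rows.append({
            "title": _text(card, ".title"),
            "price": _text(card, ".price"),
            "date": _text(card, ".date"),
            "url": link["href"] if link else "",
            "description": _text(card, ".desc"),
        })
    return rows


def scrape(base_url, max_pages=50):
    """Walk ?page=1..N with basic handling of connection errors."""
    rows = []
    for page in range(1, max_pages + 1):
        try:
            resp = requests.get(base_url, params={"page": page}, timeout=10)
            resp.raise_for_status()
        except requests.RequestException as exc:
            print(f"Skipping page {page}: {exc}")
            continue
        page_rows = parse_items(resp.text)
        if not page_rows:  # an empty page means we ran past the last one
            break
        rows.extend(page_rows)
    return rows


def save_csv(rows, path="scraped_data.csv"):
    """Write the scraped rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

For JSON output, `json.dump(rows, f, indent=2)` is the one-line alternative to `save_csv`.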
Problem 2: Automating a KPI Dashboard
Task:
Develop a Python script that automates the creation of a Key Performance Indicator (KPI)
dashboard using a given dataset.
Requirements:
1. Dataset:
- Use the provided CSV file containing sales data.
2. KPIs to Calculate:
- Calculate KPIs on a yearly basis, such as:
- Total sales/revenue per category
- Return on Marketing Spend per category
- Average order value per category
3. Data Visualization:
- Use a Python library like `matplotlib` or `plotly` to create visual representations (bar charts, line
graphs, etc.) of the calculated KPIs.
4. Dashboard Output:
- The script should generate a PDF or HTML file that includes the visualized KPIs, along with the
data used for calculations.
5. Automation:
- Set up the script to run automatically at specified intervals (e.g., daily, weekly) and save the
output to a specified directory.
Deliverables:
- Python script (`kpi_dashboard.py`) with comments explaining the code.
- Sample output file (`kpi_dashboard.pdf` or `kpi_dashboard.html`).
- A brief description of how to schedule the script using a task scheduler (like `cron` on Linux or Task Scheduler on Windows).
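A minimal sketch of the Problem 2 KPI pipeline with `pandas` and `matplotlib`. The column names (`order_date`, `category`, `order_id`, `revenue`, `marketing_spend`) are assumptions — rename them to match the provided sales CSV.

```python
# KPI-dashboard sketch; column names are assumed and should be adjusted to
# the actual sales dataset.
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. when run from cron
import matplotlib.pyplot as plt
import pandas as pd


def compute_kpis(df):
    """Yearly KPIs per category: total revenue, ROMS, average order value."""
    df = df.copy()
    df["year"] = pd.to_datetime(df["order_date"]).dt.year
    kpis = df.groupby(["year", "category"]).agg(
        total_revenue=("revenue", "sum"),
        marketing_spend=("marketing_spend", "sum"),
        orders=("order_id", "nunique"),
    )
    # Return on Marketing Spend = revenue generated per unit of spend
    kpis["roms"] = kpis["total_revenue"] / kpis["marketing_spend"]
    kpis["avg_order_value"] = kpis["total_revenue"] / kpis["orders"]
    return kpis.reset_index()


def build_dashboard(kpis, path="kpi_dashboard.pdf"):
    """One bar chart per KPI, all written to a single PDF page."""
    fig, axes = plt.subplots(3, 1, figsize=(8, 12))
    for ax, col in zip(axes, ["total_revenue", "roms", "avg_order_value"]):
        kpis.pivot(index="year", columns="category", values=col).plot(
            kind="bar", ax=ax, title=col)
    fig.tight_layout()
    fig.savefig(path)
```

`build_dashboard(compute_kpis(pd.read_csv("sales.csv")))` would then produce the PDF (filename assumed); swapping `matplotlib` for `plotly` and calling `fig.write_html(...)` gives the HTML variant.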
Problem 3: Automating Report Download and Data Processing
Task:
Create a Python script that automates the downloading of a report from a given URL, extracts data
from the report, and uploads the processed data to a Google Sheet.
Requirements:
1. Report Download:
- Use Python to download a report (e.g., an Excel or CSV file) from a specified URL.
2. Data Extraction:
- Extract specific data points from the downloaded report (e.g., total sales, top products, etc.).
3. Data Processing:
- Process the extracted data (e.g., calculate totals, filter rows by given criteria).
4. Google Sheets Integration:
- Use the Google Sheets API to create a new sheet (or update an existing one) with the processed
data.
5. Automation:
- Automate the entire process so that the script runs at regular intervals (e.g., daily or weekly) to update the Google Sheet.
Deliverables:
- Python script (`automate_report.py`) with comments explaining the code.
- Instructions for setting up Google Sheets API access, including how to create credentials and share the sheet.
- A brief description of how to schedule the script using a task scheduler.
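The download → process → upload pipeline for Problem 3 can be sketched as below. The report URL, the column names (`product`, `sales`), and the sheet and key-file names are placeholders; `gspread` is one common client for the Google Sheets API, authenticated here via a service-account key.

```python
# Pipeline sketch with placeholder names; adjust columns and identifiers to
# the actual report.
import io

import pandas as pd
import requests


def download_report(url):
    """Fetch a CSV report from the given URL into a DataFrame."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return pd.read_csv(io.StringIO(resp.text))


def process(df, top_n=5):
    """Example processing: top-N products by sales, plus a grand-total row."""
    top = (df.groupby("product")["sales"].sum()
             .sort_values(ascending=False).head(top_n))
    summary = pd.DataFrame({"product": top.index, "sales": top.values})
    summary.loc[len(summary)] = ["TOTAL", df["sales"].sum()]
    return summary


def upload(summary, sheet_name="Report", key_file="service_account.json"):
    """Overwrite the first worksheet of the named sheet with the summary."""
    import gspread  # imported here so the earlier steps run without it

    gc = gspread.service_account(filename=key_file)  # service-account auth
    ws = gc.open(sheet_name).sheet1
    ws.clear()
    ws.update([summary.columns.tolist()] + summary.values.tolist())
```

For the API setup, the usual steps are: create a Google Cloud project, enable the Sheets and Drive APIs, create a service account, download its JSON key, and share the target sheet with the service account's email address.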
Evaluation Criteria:
1. Correctness and Efficiency: Scripts should perform the tasks as expected with efficient use of
resources.
2. Code Quality: Clean, readable code with proper documentation and comments.
3. Error Handling: Adequate handling of exceptions and potential issues (e.g., network errors,
missing data).
4. Automation Setup: Proper setup and explanation of how to automate the scripts.
5. Data Handling: Accurate and effective processing and visualization of data.