Managing Data:
Compiled for Celestino
Date: 01 August 2025
Learning Objectives
- Understand the end-to-end data lifecycle, from collection to archiving.
- Differentiate between raw data and meaningful information.
- Identify and apply appropriate data types and structures.
- Explore database models and basic SQL operations.
- Utilize spreadsheet tools for validation, analysis, and reporting.
- Recognize the importance of data quality, security, and ethics.
Data vs Information
Data are unprocessed facts or measurements, such as a list of temperatures,
student names, or survey responses.
Information is the result of processing, organizing, or interpreting data to provide
context and meaning—for example, a weekly temperature trend chart or a summary
report of survey findings.
Data Lifecycle
1. Collection
- Manual entry via forms, sensors, or automated imports.
2. Storage
- Files (CSV, JSON) or databases (relational, NoSQL).
3. Processing
-Cleaning, transforming, aggregating.
4. Analysis
- Statistical functions, charts, SQL queries.
5. Archiving & Deletion
- Backup, retention policies, secure disposal.
Data Types
| Categories Type | Description | Example |
|-------------|-----------|------------
| Primitive | Text | Alphanumeric characters | “Kenya”, “A123” |
| | Number | Integers or decimals | 42, 3.14 |
| | Boolean | True/False | TRUE, FALSE |
| Composite | Date/Time | Dates, timestamps | 2024-02-15, 14:30 |
| | Binary | Raw bytes (files, images, audio) | JPEG, MP3 |
| Structured | List | Ordered collection of items | [“apple”, “banana”] |
| | Record | Grouping of related fields | {Name: “Asha”, Age: 12} |
Data Structures
- Field: a single attribute (column) in a dataset, e.g., “Price”.
- Record: a complete row combining fields, e.g., one product entry.
- Table: a collection of records sharing the same fields.
- Index: a lookup table to speed up searches by one or more fields.
Relational Databases
- Tables store structured data; relationships link tables via keys.
- Primary Key uniquely identifies records (e.g., StudentID).
- Foreign Key references primary keys in other tables.
- Normalization reduces redundancy (1NF, 2NF, 3NF).
SQL Commands
| Command | Purpose | Example |
|---------|-----------------------------
| SELECT | Retrieve specific fields from a table eg. SELECT Name, Age FROM
Students;
| WHERE | Filter records by condition eg. SELECT * FROM Sales WHERE Amount >
100; |
JOIN | Combine rows from two tables based on a key eg. SELECT * FROM
Orders JOIN Customers;
INSERT | Add a new record | INSERT INTO Products VALUES (...); |
UPDATE Modify existing records eg UPDATE Inventory SET Qty = Qty - 1; |
Spreadsheets
- Cells arranged in rows (1, 2, 3…) and columns (A, B, C…).
- Absolute vs relative references ($A$1 vs A1).
- Named ranges simplify formulas.
Key Functions
- SUM(range) – totals values.
- AVERAGE(range) – computes mean.
- VLOOKUP(key, table, col, FALSE) – finds matching data.
- COUNTIF(range, criteria) – counts entries meeting criteria.
- IF(condition, true-value, false-value) – conditional logic.
Advanced Features
- PivotTables for dynamic summarization.
- Conditional formatting to highlight patterns.
- Data import from external sources (CSV, web queries).
Data Validation Techniques
- Restrict cell values by type (whole number, date, text length).
- Custom formulas to enforce complex rules, e.g.:
- =AND(A2>=1, A2<=100) for score ranges.
- =NOT(COUNTIF($B$2:$B$100, B2)>1) to prevent duplicates.
- Dropdown lists for controlled vocabulary.
- Error alerts and input messages guide users.
Data Quality and Ethics
- Accuracy: ensure data is correct and up-to-date.
- Completeness: no missing critical fields.
- Consistency: uniform formats across records.
- Timeliness: data available when needed.
Ethical Considerations
- Obtain consent before collecting personal data.
- Anonymize sensitive information (e.g., names IDs). - Secure storage and controlled
access (passwords, encryption).
- Comply with local regulations (e.g., GDPR principles).
Importance of Data Management
Effective data management delivers:
- Enhanced Visibility: users can quickly find and trust data.
- Increased Reliability: standard processes minimize errors.
- Stronger Security: encryption and access controls protect assets.
- Improved Efficiency: streamlined workflows reduce duplication.
- Better Compliance: policies ensure regulatory adherence.
- Scalability: repeatable procedures support data growth.
Effects of Data Loss
Data loss can lead to:
- Financial Loss: recovery costs and lost revenue.
- Reputational Damage: erosion of customer trust.
- Operational Disruption: downtime and productivity setbacks. - Legal Consequences:
fines and litigation for non-compliance.
- Business Closure: small firms may not survive severe breaches.
Data Security and Backup
- User authentication and role-based permissions.
- Encryption at rest (database files) and in transit (HTTPS).
- Regular backups: full, incremental, and archival copies.
- Audit logs to track who accessed or modified data.
Data Analysis and Reporting
- Use charts (bar, line, scatter) to visualize trends.
- Calculate descriptive statistics: mean, median, mode, standard deviation.
- Perform “what-if” analysis with data tables and scenario manager.
- Generate automated reports using scripts or macros.
Real-World Examples
- School attendance tracker: collect daily marks, validate dates, summarize by
month.
- Inventory management: relational tables for products, suppliers, sales; reorder
alerts via SQL queries.
- Health survey: CSV data cleaned in spreadsheet, pivoted by age group, visualized
with pie charts.
Activities and Exercises
1. Design a small relational database on paper for a library:
- Tables: Books, Members, Loans.
- Identify fields, primary keys, and relationships.
2. Build a spreadsheet to track fitness data:
- Collect daily steps, validate with whole-number rule.
- Create a line chart showing weekly progress.
3. Write SQL queries against sample data:
- List all orders above a threshold.
- Join customers with their purchase history.
4. Implement advanced validation in a spreadsheet: