
1. Functions of the three Python packages (NumPy, Pandas, Matplotlib) - 6 marks

NumPy:

- Array Operations:

Provides support for large multi-dimensional arrays and matrices, along with a large library of high-level
mathematical functions to operate on these arrays.

- Mathematical Functions:

Includes functions for operations like statistical analysis, linear algebra, Fourier transforms, and random
number generation.

- Efficiency:

Optimized for performance, allowing operations on arrays to be performed much faster than with
standard Python lists.

Pandas:

- Data Structures:

Introduces data structures like Series (one-dimensional) and DataFrame (two-dimensional) for efficient
data manipulation and analysis.

- Data Manipulation:

Provides tools for data cleaning, merging, reshaping, and filtering.

- Handling Missing Data:

Includes functions to handle missing data, such as filling or dropping null values.

Matplotlib:

- Plotting:

Provides a comprehensive library for creating static, animated, and interactive visualizations in Python.

- Customization:

Allows for extensive customization of plots, including control over line styles, font properties, and more.

- Integration:

Works well with other libraries like NumPy and Pandas, enabling easy plotting of data stored in these
structures.
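The three packages are commonly used together; a minimal sketch (the sine-wave data is invented for the example, and the non-interactive Agg backend is selected so no display is required):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; the plot is saved, not shown
import matplotlib.pyplot as plt

# NumPy: fast vectorized math on arrays
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Pandas: tabular structure for manipulation and analysis
df = pd.DataFrame({'x': x, 'sin_x': y})

# Matplotlib: customizable plotting of data held in NumPy/Pandas structures
fig, ax = plt.subplots()
ax.plot(df['x'], df['sin_x'], linestyle='--')
ax.set_xlabel('x')
fig.savefig('sine.png')
print(df['sin_x'].max())
```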
2. Describe what the following command does - 3 marks

x <- 3 if(x>2) y else y <- 3*x

This command contains a syntax error. In R, the assignment x <- 3 and the if/else expression cannot be
placed on one line without a separator such as a semicolon or newline, and each branch needs its own
expression. Written with proper syntax:

x <- 3

if (x > 2) {

y

} else {

y <- 3 * x

}

In the corrected command:

- x is assigned the value 3.

- The if condition checks whether x is greater than 2. Since x is 3, the condition is true.

- The true branch evaluates y. Because y has not been defined beforehand, R raises an "object 'y' not
found" error unless y was assigned previously.

3. State and describe five types of data representation in a computer - 5 marks

a. Binary (Machine Code):

The most basic form of data representation, using binary digits (0s and 1s) to represent all types of data.

b. Text (ASCII/Unicode):
Characters are represented using standards like ASCII or Unicode, allowing text data to be
encoded in a binary format.
c. Integer:
Whole numbers represented in binary form, either as signed or unsigned integers.
d. Floating-point:
Numbers with fractional parts, represented using a specific format (like IEEE 754) to encode the
value in binary.
e. Boolean:
Logical data that can be either true or false, often represented as 1 or 0 in binary.
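Python's standard library can illustrate most of these representations directly (a brief sketch; the struct module packs the IEEE 754 bytes):

```python
import struct

# a/c. Binary and integer: 42 as its binary digits
print(bin(42))                       # 0b101010

# b. Text: the character 'A' as its ASCII code point
print('A'.encode('ascii')[0])        # 65

# d. Floating point: IEEE 754 single-precision bytes of 1.0
print(struct.pack('>f', 1.0).hex())  # 3f800000

# e. Boolean: true/false stored as 1/0
print(int(True), int(False))         # 1 0
```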

4. Explain the difference between supervised and unsupervised learning - 4 marks

Supervised Learning:

- Definition: Involves training a model on a labeled dataset, where the correct output is known for each
training example.

- Purpose: Used for tasks like classification and regression where the goal is to predict an output based
on input data.

- Example: Predicting house prices based on features like size, location, and number of rooms.

Unsupervised Learning:

- Definition: Involves training a model on an unlabeled dataset, where the output is not provided, and
the model tries to find patterns or structures in the data.

- Purpose: Used for tasks like clustering and dimensionality reduction.

- Example: Grouping customers into segments based on purchasing behavior.
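The distinction can be sketched with NumPy alone (a toy illustration, not a full machine-learning pipeline; the data and the one-dimensional threshold "clustering" are invented for the example):

```python
import numpy as np

# Supervised: fit a line to labeled (x, y) pairs by least squares
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])             # known outputs (labels)
A = np.hstack([X, np.ones_like(X)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Unsupervised: no labels; group points with a simple mean-threshold split
data = np.array([1.1, 0.9, 1.0, 9.8, 10.2, 10.0])
clusters = (data > data.mean()).astype(int)    # 0/1 cluster assignment
print(slope, clusters.tolist())
```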

5. Differentiate between overfitting and underfitting in data models - 4 marks

Overfitting:

- Definition: Occurs when a model learns the training data too well, including noise and outliers, leading
to poor performance on unseen data.

- Symptoms: High accuracy on training data but low accuracy on test data.

- Solution: Use techniques like cross-validation, pruning, regularization, and simplifying the model.

Underfitting:

- Definition: Occurs when a model is too simple to capture the underlying patterns in the data, leading
to poor performance on both training and test data.

- Symptoms: Low accuracy on both training and test data.


- Solution: Use more complex models, add more features, and reduce bias.
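The contrast can be demonstrated by fitting polynomials of different degrees to the same noisy sample (a sketch with synthetic data; degree 1 underfits, while degree 9 interpolates all ten training points):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy labels
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)                             # clean targets

train_err, test_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    # Mean squared error on training data vs unseen test data
    train_err[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err[degree], test_err[degree])
```

The degree-9 fit drives the training error to essentially zero (overfitting), while the degree-1 fit has high error on both sets (underfitting).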

6. Briefly describe any three problem-solving strategies - 6 marks

a. Divide and Conquer:

- Approach: Break down a large problem into smaller, more manageable sub-problems, solve each sub-
problem individually, and then combine the solutions.

- Example: Sorting algorithms like Merge Sort and Quick Sort.

b. Dynamic Programming:

- Approach: Solve complex problems by breaking them down into simpler overlapping sub-problems
and storing the results of these sub-problems to avoid redundant computations.

- Example: Fibonacci sequence calculation, shortest path algorithms like Dijkstra's.

c. Greedy Algorithm:

- Approach: Make a series of choices by selecting the best option available at each step without
reconsidering previous choices.

- Example: Coin change problem, Kruskal’s algorithm for minimum spanning trees.
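As an illustration of dynamic programming, the Fibonacci example above with memoized sub-problems (a minimal sketch using the standard-library functools cache):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each overlapping sub-problem fib(k) is computed once and cached
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040; the naive recursion would need millions of calls
```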

7. Define the following terms - 2 marks

Algorithm:

- Definition: A step-by-step procedure or formula for solving a problem, often expressed in pseudocode
or a programming language.

Debugging:

- Definition: The process of identifying, analyzing, and removing errors or bugs in a computer program to
ensure it runs as expected.

8. Write a Python code to create a data frame with appropriate headings from the list - 4 marks
Here's a Python example to create a DataFrame from a list of dictionaries:

python

import pandas as pd

# List of dictionaries
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]

# Creating DataFrame
df = pd.DataFrame(data)

# Display DataFrame
print(df)

9. Environmental data analysis - 16 marks

Preprocessing Steps (5 marks):

a. Handling Missing Data:


Identify missing values and decide whether to fill them (imputation) or remove them. For
instance, using mean/mode for imputation or dropping rows/columns with excessive missing
data.
b. Outlier Detection:
Identify and handle outliers using statistical methods or visualization techniques like box plots.
c. Normalization/Standardization:
Normalize or standardize data to bring different features onto a similar scale, which can
improve the performance of many machine learning algorithms.
d. Encoding Categorical Data:
Convert categorical variables into numerical format using techniques like one-hot encoding.
e. Data Splitting:
Split the dataset into training and testing sets to validate the model's performance on unseen
data.
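On a hypothetical pandas DataFrame (the column names emissions, pm25, and site are invented for the sketch), steps a-e might look like:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a missing value, an outlier, and a categorical column
df = pd.DataFrame({
    'emissions': [10.0, 12.0, np.nan, 11.0, 50.0],
    'pm25': [30, 35, 33, 31, 90],
    'site': ['urban', 'rural', 'urban', 'rural', 'urban'],
})

# a. Impute missing emissions with the column mean
df['emissions'] = df['emissions'].fillna(df['emissions'].mean())

# b. Drop PM2.5 outliers outside 1.5 * IQR
q1, q3 = df['pm25'].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df['pm25'] >= q1 - 1.5 * iqr) & (df['pm25'] <= q3 + 1.5 * iqr)].copy()

# c. Standardize PM2.5 (zero mean, unit variance)
df['pm25_std'] = (df['pm25'] - df['pm25'].mean()) / df['pm25'].std()

# d. One-hot encode the categorical site column
df = pd.get_dummies(df, columns=['site'])

# e. Split into training and testing sets
train, test = df.iloc[:3], df.iloc[3:]
print(train.shape, test.shape)
```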

Correlation Analysis (4 marks):

a. Calculate Correlation Coefficients:


Use methods like Pearson, Spearman, or Kendall to calculate correlation coefficients between
industrial emissions and air quality metrics.
b. Visualize Correlation:
Create correlation matrices and heatmaps to visualize the relationships between different
variables.
c. Interpret Results:
Analyze the correlation coefficients to understand the strength and direction of the
relationships.
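Steps a-c reduce to a few pandas calls (the figures below are invented toy data):

```python
import pandas as pd

# Toy emissions vs PM2.5 readings (invented for the example)
df = pd.DataFrame({
    'emissions': [10, 12, 15, 20, 25],
    'pm25': [30, 33, 38, 48, 60],
})

# a. Pearson correlation coefficient between the two variables
r = df['emissions'].corr(df['pm25'], method='pearson')

# b. Full correlation matrix, ready to render as a heatmap
matrix = df.corr()

# c. Interpret: r close to +1 means a strong positive relationship
print(round(r, 3))
```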

Variables Selection (2 marks):

- Industrial Emissions: Key variables might include emissions of specific pollutants like CO2, NOx, SOx.

- Air Quality Metrics: Include variables like PM2.5 levels, ozone levels, and other relevant air quality
indices.

- Reasoning: These variables are chosen because they directly measure the pollutants and air quality
levels which are necessary to assess the impact of industrial emissions.

Time Series Analysis (5 marks):

a. Decomposition: Decompose the time series data into trend, seasonal, and residual components to
understand the underlying patterns.

b. Visualization: Plot time series graphs to visualize trends, seasonal patterns, and anomalies over time.

c. Modeling: Apply time series models like ARIMA, SARIMA, or Exponential Smoothing to model and
forecast air quality trends.

d. Validation: Use techniques like cross-validation on time series data to ensure the model's accuracy.

e. Interpretation: Analyze the results to identify long-term trends, seasonal effects, and potential
impacts of industrial emissions on air quality.
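A simplified decomposition sketch using pandas alone, with a 12-month centered rolling mean as the trend estimate and monthly means as the seasonal component (in practice statsmodels' seasonal_decompose or an ARIMA model would be used; the monthly series here is synthetic):

```python
import numpy as np
import pandas as pd

# Synthetic monthly air-quality index: rising trend + yearly seasonality + noise
idx = pd.date_range('2020-01-01', periods=48, freq='MS')
t = np.arange(48)
rng = np.random.default_rng(1)
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 48),
                   index=idx)

# a. Trend: centered 12-month rolling mean
trend = series.rolling(window=12, center=True).mean()

# Seasonal: average detrended value for each calendar month
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform('mean')

# Residual: what trend and seasonality do not explain
residual = detrended - seasonal
print(trend.dropna().iloc[-1] - trend.dropna().iloc[0])  # positive => rising trend
```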

10. Discuss the two sources of errors in computational methods - 4 marks


a. Truncation Error:

- Definition: Arises when an infinite process is approximated by a finite one, such as truncating an
infinite series or using a finite number of terms.

- Example: Approximating the value of π using a limited number of terms in its series representation.

b. Round-off Error:

- Definition: Occurs due to the finite precision with which computers represent real numbers, leading to
small discrepancies between the true value and its computer representation.

- Example: When performing arithmetic operations on floating-point numbers, the precision limits of the
hardware can introduce small errors that accumulate over multiple operations.
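Both error types can be reproduced in a few lines (the Leibniz series for pi serves as the truncation example):

```python
import math

# Round-off: 0.1 and 0.2 have no exact binary floating-point representation
print(0.1 + 0.2 == 0.3)   # False: the sum is 0.30000000000000004

def leibniz_pi(n_terms):
    # Truncation: keep only the first n_terms of an infinite alternating series
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

error = abs(math.pi - leibniz_pi(1000))
print(error)  # roughly 1e-3; shrinks as more terms are kept
```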
