For the Amazon ML Summer School assessment, the programming questions will likely focus on
implementing and understanding key machine learning concepts. Here are some example questions
and tasks you might encounter:
Example Programming Questions
Linear Regression Implementation
Task: Implement a simple linear regression model from scratch without using any machine learning
libraries.
Input: A dataset with input features and target values.
Output: Predicted target values for a test dataset.
Example:
import numpy as np

def linear_regression(X, y):
    # Add a column of ones for the intercept term
    X = np.hstack((np.ones((X.shape[0], 1)), X))
    # Solve the normal equation with the pseudo-inverse: a plain inverse of
    # X.T @ X fails or is numerically unstable when features are collinear,
    # as they are in the example data below (second feature = first + 1)
    weights = np.linalg.pinv(X.T @ X) @ X.T @ y
    return weights

def predict(X, weights):
    # Apply the same intercept column used during fitting
    X = np.hstack((np.ones((X.shape[0], 1)), X))
    return X @ weights

# Example usage
X_train = np.array([[1, 2], [2, 3], [3, 4]])
y_train = np.array([2, 3, 4])
weights = linear_regression(X_train, y_train)
X_test = np.array([[4, 5], [5, 6]])
predictions = predict(X_test, weights)
print(predictions)
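One way to check a from-scratch linear regression is to fit noise-free data whose coefficients are known in advance and confirm they are recovered. The sketch below does that with `np.linalg.lstsq`, which solves the same least-squares problem as the Normal Equation. The helper name `fit_normal_equation` and the demo data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_normal_equation(X, y):
    # Prepend a column of ones so the first weight is the intercept
    X_b = np.hstack((np.ones((X.shape[0], 1)), X))
    # Least-squares solve; matches the normal equation when X_b has full column rank
    weights, *_ = np.linalg.lstsq(X_b, y, rcond=None)
    return weights

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1]

weights_demo = fit_normal_equation(X_demo, y_demo)
print(np.round(weights_demo, 6))  # intercept first, then one weight per feature
```

If the recovered weights differ from the generating coefficients, the usual suspects are a missing intercept column or collinear features.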
Decision Tree Classifier
Task: Implement a decision tree classifier for a given dataset.
Input: A dataset with features and labels.
Output: Predictions for a test dataset.
Example:
from sklearn.tree import DecisionTreeClassifier

def decision_tree_classifier(X_train, y_train, X_test):
    # Fit a tree on the training data, then label the test points
    clf = DecisionTreeClassifier()
    clf.fit(X_train, y_train)
    return clf.predict(X_test)

# Example usage
X_train = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_train = [0, 1, 1, 0]
X_test = [[0, 0], [1, 1]]
predictions = decision_tree_classifier(X_train, y_train, X_test)
print(predictions)
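If the question asks for a from-scratch implementation rather than a library call, the core step is choosing the split that most reduces impurity. The following is a minimal sketch of that single step using Gini impurity, not a full tree and not scikit-learn's internal algorithm. The function names `gini` and `best_split` are hypothetical.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Try every feature and observed value as a threshold; keep the split
    # with the lowest size-weighted Gini impurity of the two children
    best = (None, None, np.inf)  # (feature index, threshold, impurity)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue  # a split must send points to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X_demo = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_demo = np.array([0, 1, 1, 0])
feature, threshold, impurity = best_split(X_demo, y_demo)
print(feature, threshold, impurity)
```

On this toy data the labels equal the second feature, so splitting on feature index 1 separates the classes perfectly. A full tree would apply `best_split` recursively to each child until the nodes are pure or a depth limit is reached.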
K-means Clustering
Task: Implement the K-means clustering algorithm.
Input: A dataset and the number of clusters (K).
Output: Cluster assignments for each data point.
Example:
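import numpy as np

def kmeans(X, k, max_iters=100):
    # Initialise centroids with k distinct data points chosen at random
    centroids = X[np.random.choice(X.shape[0], k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        clusters = np.argmin(distances, axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty
        new_centroids = np.array([
            X[clusters == i].mean(axis=0) if np.any(clusters == i) else centroids[i]
            for i in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters

# Example usage
X = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])
clusters = kmeans(X, 2)
print(clusters)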
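Because K-means starts from random centroids, the cluster numbering can change between runs, so comparing label lists directly is unreliable. A common check is the within-cluster sum of squares, the quantity K-means tries to minimize. The sketch below computes it for any assignment; the function name `within_cluster_ss` and the two hand-written assignments are hypothetical.

```python
import numpy as np

def within_cluster_ss(X, labels):
    # Sum of squared distances from each point to the mean of its cluster
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return total

X_demo = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]], dtype=float)
good = within_cluster_ss(X_demo, [0, 0, 0, 1, 1, 1])  # the two visible groups
bad = within_cluster_ss(X_demo, [0, 1, 0, 1, 0, 1])   # groups mixed together
print(good, bad)
```

A lower value means tighter clusters, so the grouping that follows the two visible blobs scores better than the mixed one.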
Principal Component Analysis (PCA)
Task: Implement PCA for dimensionality reduction.
Input: A dataset and the number of principal components.
Output: Transformed dataset with reduced dimensions.
Example:
import numpy as np

def pca(X, n_components):
    # Centre the data so each feature has zero mean
    X_mean = np.mean(X, axis=0)
    X_centered = X - X_mean
    covariance_matrix = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so re-sort them descending
    eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)
    sorted_indices = np.argsort(eigenvalues)[::-1]
    sorted_eigenvectors = eigenvectors[:, sorted_indices]
    principal_components = sorted_eigenvectors[:, :n_components]
    # Project the centred data onto the top components
    return X_centered @ principal_components

# Example usage
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
X_pca = pca(X, 1)
print(X_pca)
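A natural follow-up question is how many components to keep. One common guide is the fraction of total variance each component explains, which comes straight from the eigenvalues of the covariance matrix. The sketch below computes that ratio; the function name `explained_variance_ratio` is hypothetical here and is not tied to any library's API.

```python
import numpy as np

def explained_variance_ratio(X):
    # Eigenvalues of the covariance matrix are the variances along each component
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()

X_demo = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
ratio = explained_variance_ratio(X_demo)
print(np.round(ratio, 6))
```

The four demo points lie on a single straight line, so essentially all of the variance falls on the first component and one dimension loses no information.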
Text Preprocessing for NLP
Task: Preprocess text data for further analysis.
Input: A list of text documents.
Output: Cleaned and tokenized text.
Example:
import re
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Requires NLTK data: run nltk.download('punkt') and nltk.download('stopwords')
# once beforehand (recent NLTK releases may also need 'punkt_tab')

def preprocess_text(texts):
    stop_words = set(stopwords.words('english'))
    ps = PorterStemmer()
    processed_texts = []
    for text in texts:
        # Lowercase first so capitalised stop words such as "This" are removed
        text = text.lower()
        text = re.sub(r'\W', ' ', text)   # replace non-word characters with spaces
        text = re.sub(r'\s+', ' ', text)  # collapse runs of whitespace
        tokens = word_tokenize(text)
        tokens = [ps.stem(word) for word in tokens if word not in stop_words]
        processed_texts.append(tokens)
    return processed_texts

# Example usage
texts = ["Hello, world! This is a test.", "Preprocessing text is important."]
processed_texts = preprocess_text(texts)
print(processed_texts)
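If the assessment environment does not provide NLTK or its data files, the same cleaning and tokenizing steps can be approximated with the standard library alone. The sketch below is a simplified stand-in, not an equivalent: it skips stemming, and `STOP_WORDS` is a hypothetical, deliberately tiny list used only for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"a", "an", "the", "is", "this", "are", "of", "and"}

def simple_preprocess(texts):
    processed = []
    for text in texts:
        # Lowercase, then keep runs of letters, digits, and apostrophes as tokens
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Drop stop words; no stemming is applied in this sketch
        processed.append([t for t in tokens if t not in STOP_WORDS])
    return processed

docs = ["Hello, world! This is a test.", "Preprocessing text is important."]
print(simple_preprocess(docs))
# → [['hello', 'world', 'test'], ['preprocessing', 'text', 'important']]
```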
Preparation Tips
Understand the Fundamentals: Make sure you have a solid understanding of the basic algorithms
and concepts in machine learning.
Practice Coding: Implement algorithms from scratch to deepen your understanding. Use libraries like
NumPy and pandas for data manipulation.
Work on Real Datasets: Use platforms like Kaggle to practice with real-world datasets and improve
your data preprocessing and model training skills.
Review Python Libraries: Familiarize yourself with machine learning libraries such as scikit-learn,
TensorFlow, and PyTorch.
By practicing these types of questions and understanding the underlying concepts, you'll be
well-prepared for the assessment. Good luck!