Singh_Project1_Report

Introduction to Analytics 1

College of Professional Studies

Northeastern University

ALY 6000 Introduction to Analytics

Module 1

Project 1

By: Aarushi Singh

Professor: Kayal Chandrasekaran

January 1
Introduction

This project demonstrates fundamental data manipulation and statistical analysis techniques in R. The key areas of focus are arithmetic and logical calculations, vector operations, and data extraction. The project also analyzes a dataset (ds_salaries.csv) to gain insight into salary data using various R functions. Visualizations and statistical measures are incorporated where applicable to support findings and recommendations.

1. Clean Up Canvas and Console:

The project starts by clearing the console to ensure a clean working environment. This is done with a command such as `cat("\014")`.

2. Clean Up Variables:

Variables and plots are cleared to reset the workspace.
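
The two clean-up steps can be sketched as follows. This is a minimal sketch that assumes the script runs in RStudio, where printing the form-feed character `\014` clears the console. The report does not show which call was used to clear plots, so `graphics.off()` is an assumed choice, and `scratch_value` is a hypothetical variable created only to show the effect of the clean-up.

```r
# Clear the console (effective in RStudio; elsewhere it only prints a form feed).
cat("\014")

# Hypothetical leftover variable, created only to demonstrate the clean-up.
scratch_value <- 42

# Remove every object from the current workspace.
rm(list = ls())

# Close all open graphics devices (assumed choice; harmless if none are open).
graphics.off()
```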

3. Load Libraries:

Although this step is not explicitly covered in the code provided, any required libraries (such as `testthat`, `tidyverse`, or `pacman`) would typically be loaded at this point in the script.

4. Perform Calculations:

Basic arithmetic operations are performed, such as multiplication, exponentiation, modulo, and division, along with the logical operations AND and OR. Results are computed for each expression and then documented and explained.
Problem 1. Write lines of code to compute all of the following. Include the answers in your
written report.
123 * 453
5^2 * 40
TRUE & FALSE
TRUE | FALSE
75 %% 10
75 / 10
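
The six expressions can be run directly at the console. The sketch below lists each one with the value R returns; the comments are added here for illustration and are not taken from the original script.

```r
# Multiplication of two numbers.
123 * 453      # 55719

# Exponentiation binds tighter than multiplication, so this is 25 * 40.
5^2 * 40       # 1000

# Logical AND is TRUE only when both sides are TRUE.
TRUE & FALSE   # FALSE

# Logical OR is TRUE when at least one side is TRUE.
TRUE | FALSE   # TRUE

# Modulo returns the remainder of the division.
75 %% 10       # 5

# Division returns a decimal result.
75 / 10        # 7.5
```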

5. Create Vectors:

A series of vectors is created using the `c()` function and the colon operator `:`. These vectors hold various numbers and ranges, and operations are performed on them.

Problem 2. Create a vector using the c function with the values 17, 12, -33, 5 and assign it to a
variable called first_vector.

Problem 3. Create a vector using the c function with the values 5, 10, 15, 20, 25, 30, 35 and
assign it to a variable called counting_by_fives.

Problem 4. Create a vector using the range operator (the colon), that contains the numbers
from 20 down to 1. Store the result in a variable called second_vector.
Problem 5. Create a vector using the range operator that contains the numbers from 5 to 15.
Store the result in a variable called counting_vector.

Problem 6. Create a vector with the values (96, 100, 85, 92, 81, 72). Store the result in a variable
called grades.

Problem 7. Add the number 3 to the vector grades. Store the result in a variable called
bonus_points_added.

Problem 8. Create a vector with the values 1 – 100 and store it in a variable called
one_to_one_hundred. Do not type out all 100 numbers.
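
Problems 2 through 8 can be answered with one line each. The following is a sketch of one possible solution using only the variable names given in the problem statements; the original script may differ in style.

```r
# Problem 2: combine individual values with c().
first_vector <- c(17, 12, -33, 5)

# Problem 3: the same approach for a longer list of values.
counting_by_fives <- c(5, 10, 15, 20, 25, 30, 35)

# Problem 4: the colon operator counts down when the left side is larger.
second_vector <- 20:1

# Problem 5: the colon operator counting up.
counting_vector <- 5:15

# Problem 6: exam grades stored as a numeric vector.
grades <- c(96, 100, 85, 92, 81, 72)

# Problem 7: adding a single number applies it to every element.
bonus_points_added <- grades + 3   # 99 103 88 95 84 75

# Problem 8: 1 through 100 without typing each number.
one_to_one_hundred <- 1:100
```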

Problem 9. Write each of the following lines of code. Add a one-sentence comment above each
line explaining what is computed. Include your comments in the written report.
second_vector + 20
second_vector * 20
second_vector >= 20
second_vector != 20 # != means "not equal"
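
A sketch of the commented lines for Problem 9, assuming second_vector holds 20 down to 1 as defined in Problem 4. The comment wording is illustrative and not taken from the original script.

```r
second_vector <- 20:1

# Adds 20 to every element, giving the values 40 down to 21.
second_vector + 20

# Multiplies every element by 20, giving the values 400 down to 20.
second_vector * 20

# Tests each element against 20; only the first element (20) yields TRUE.
second_vector >= 20

# Tests each element for inequality with 20; every element except the first yields TRUE.
second_vector != 20
```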

Problem 10. Using the built-in sum function, compute the sum of one_to_one_hundred. Store
the result in a variable called total.

Problem 11. Using the built-in mean function, compute the average of one_to_one_hundred. Store the result in
a variable called average_value.

Problem 12. Using the built-in median function, compute the median of one_to_one_hundred.
Store the result in a variable called median_value.

Problem 13. Using the built-in max function, compute the max of one_to_one_hundred. Store
the result in a variable called max_value.

Problem 14. Using the built-in min function, compute the min of one_to_one_hundred. Store the
result in a variable called min_value.
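
The summary statistics in Problems 10 through 14 can be sketched as below, assuming one_to_one_hundred holds the integers 1 through 100. The values in the comments follow from the arithmetic of that sequence.

```r
one_to_one_hundred <- 1:100

# Sum of 1..100, which equals 100 * 101 / 2.
total <- sum(one_to_one_hundred)             # 5050

# Mean of an evenly spaced sequence is the midpoint of its ends.
average_value <- mean(one_to_one_hundred)    # 50.5

# With an even count, the median is the average of the two middle values (50 and 51).
median_value <- median(one_to_one_hundred)   # 50.5

# Largest and smallest elements.
max_value <- max(one_to_one_hundred)         # 100
min_value <- min(one_to_one_hundred)         # 1
```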

6. Data Extraction:

Vector elements are extracted using indexing techniques. The data is filtered based on logical conditions, and subsets are created based on specific criteria, such as extracting elements larger than a certain value.

Problem 15. Using brackets, extract the first value from second_vector and store it in a variable
called first_value.
Problem 16. Using brackets, extract the first, second, and third values from second_vector. Store the result
in a variable called first_three_values.

Problem 17. Using brackets, extract the 1st, 5th, 10th, and 11th elements of second_vector. Store
the resulting vector in a variable called vector_from_brackets.

Problem 18. Use the brackets to extract elements from first_vector using the following vector
c(FALSE, TRUE, FALSE, TRUE). Store the result in a variable called
vector_from_boolean_brackets.
Explanation - The logical vector c(FALSE, TRUE, FALSE, TRUE) helps us choose specific elements from
first_vector. It picks the elements in the 2nd and 4th positions, where the logical vector has TRUE values. The
selected elements are saved in vector_from_boolean_brackets.
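
The bracket extractions in Problems 15 through 18 can be sketched as follows, assuming first_vector and second_vector are defined as in Problems 2 and 4. R indexes from 1, so position 1 is the first element.

```r
first_vector <- c(17, 12, -33, 5)
second_vector <- 20:1

# Problem 15: a single position.
first_value <- second_vector[1]                          # 20

# Problem 16: a range of positions.
first_three_values <- second_vector[1:3]                 # 20 19 18

# Problem 17: an arbitrary set of positions.
vector_from_brackets <- second_vector[c(1, 5, 10, 11)]   # 20 16 11 10

# Problem 18: a logical vector keeps the positions marked TRUE.
vector_from_boolean_brackets <- first_vector[c(FALSE, TRUE, FALSE, TRUE)]   # 12 5
```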

Problem 19. Examine the following piece of code and write a one-sentence comment explaining
what is happening. Include the answer in your written report.
second_vector >= 10
Explanation - This code checks each element of second_vector to see whether it is 10 or greater. It returns a logical vector that holds TRUE for elements that meet the condition and FALSE for those that do not.

Problem 20. Examine the following piece of code and write a one-sentence comment explaining
what is happening, assuming one_to_one_hundred was computed in the
previous problem. Include the answers in your written report.
one_to_one_hundred[one_to_one_hundred >= 20]
Explanation - This code filters the elements of one_to_one_hundred and returns only the values that are
greater than or equal to 20.
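
The two expressions from Problems 19 and 20 differ in what they return: the comparison alone gives a logical vector, while placing it inside brackets gives the matching values. A short sketch, where `is_at_least_ten` and `at_least_twenty` are hypothetical names introduced only for this illustration:

```r
second_vector <- 20:1
one_to_one_hundred <- 1:100

# Problem 19: the comparison returns one TRUE/FALSE per element.
is_at_least_ten <- second_vector >= 10
sum(is_at_least_ten)        # 11 elements (20 down to 10) are TRUE

# Problem 20: the logical vector inside brackets keeps only the TRUE positions.
at_least_twenty <- one_to_one_hundred[one_to_one_hundred >= 20]
length(at_least_twenty)     # 81 values, 20 through 100
```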

Problem 21. Using the same approach as in the previous question, create a new vector from the
grades vector with only values larger than 85. Store the result in a variable called
lowest_grades_removed.

Problem 22. Use the grades vector to create a new vector with the 3rd and 4th elements of
grades removed. Store the result in a variable called middle_grades_removed. Try
utilizing a vector of negative indexes to complete this task.

Problem 23. Use bracket notation to remove the 5th and 10th elements of second_vector. Store
the result in a variable called fifth_vector.
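
Problems 21 through 23 combine logical filtering with negative indexes, which drop the listed positions instead of selecting them. A sketch of one possible solution, assuming grades and second_vector are defined as in Problems 6 and 4:

```r
grades <- c(96, 100, 85, 92, 81, 72)
second_vector <- 20:1

# Problem 21: keep only grades strictly greater than 85 (85 itself is excluded).
lowest_grades_removed <- grades[grades > 85]       # 96 100 92

# Problem 22: negative indexes drop the 3rd and 4th elements.
middle_grades_removed <- grades[-c(3, 4)]          # 96 100 81 72

# Problem 23: drop the 5th and 10th elements (the values 16 and 11).
fifth_vector <- second_vector[-c(5, 10)]
```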

7. Random Vector Analysis:

A random vector is generated using the `runif()` function, and various statistical operations are performed on it, such as calculating the sum, cumulative sum, mean, standard deviation, rounding, and sorting.

Problem 24. Write the following code. This creates a variable called random_vector that will be
utilized in problems 25 - 30.
set.seed(5)
random_vector <- runif(n=10, min = 0, max = 1000)

Problem 25. Use the sum function to compute the total of random_vector. Store the result in a
variable called sum_vector.

Problem 26. Use the cumsum function to compute the cumulative sum of random_vector. Store
the result in a variable called cumsum_vector.
Problem 27. Use the mean function to compute the mean of random_vector. Store the result in a
variable called mean_vector.

Problem 28. Use the sd function to compute the standard deviation of random_vector. Store the
result in a variable called sd_vector.

Problem 29. Use the round function to round the values of random_vector. Store the result in a
variable called round_vector.

Problem 30. Use the sort function to sort the values of random_vector. Store the result in a
variable called sort_vector.
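
Problems 24 through 30 can be sketched in one short script. Fixing the seed makes the ten draws reproducible from run to run; the actual numbers depend on R's random number generator, so no values are listed here.

```r
# Problem 24: ten uniform draws between 0 and 1000, reproducible via the seed.
set.seed(5)
random_vector <- runif(n = 10, min = 0, max = 1000)

# Problem 25: total of all ten values.
sum_vector <- sum(random_vector)

# Problem 26: running total; its last element equals sum_vector.
cumsum_vector <- cumsum(random_vector)

# Problem 27: arithmetic mean.
mean_vector <- mean(random_vector)

# Problem 28: sample standard deviation (n - 1 in the denominator).
sd_vector <- sd(random_vector)

# Problem 29: round to the nearest whole number (digits = 0 by default).
round_vector <- round(random_vector)

# Problem 30: ascending order by default.
sort_vector <- sort(random_vector)
```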

8. Data Analysis with `ds_salaries.csv`:

This section involves reading and analyzing a CSV file (`ds_salaries.csv`) containing salary data. The dataset is read using `read.csv()`, and summary statistics are generated using the `summary()` function to understand the structure and distribution of the data.

Problem 31. Download the datafile ds_salaries.csv from Canvas. Save it on your computer in the
same folder (directory) where your .R file for this project is located.
Problem 32. Use the function read.csv to read the ds_salaries.csv file. Store the result of the
read into a variable called first_dataframe.
Problem 33. Use the summary function with first_dataframe to produce summary statistics
based on each column of the data frame.
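
Problems 32 and 33 can be sketched as below, assuming ds_salaries.csv sits in the current working directory as Problem 31 instructs. The `file.exists()` guard and the `csv_path` variable are additions for illustration so the script fails gracefully when the file is missing; they are not part of the assignment.

```r
# Path relative to the working directory (assumed location from Problem 31).
csv_path <- "ds_salaries.csv"

if (file.exists(csv_path)) {
  # Problem 32: read the CSV file into a data frame.
  first_dataframe <- read.csv(csv_path)

  # Problem 33: per-column summary (quartiles and mean for numeric columns).
  print(summary(first_dataframe))
} else {
  message("ds_salaries.csv was not found in: ", getwd())
}
```

If the file is not found, `setwd()` or the RStudio menu option "Session > Set Working Directory" can point R at the folder that holds the script and the data file.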

9. Clean Up Variables at End:

Finally, the variables are cleared using `rm(list=ls())` to ensure the workspace is clean at the end of the session.

Conclusion

This R script provides a hands-on introduction to essential data manipulation and analysis techniques, including vector creation, arithmetic operations, logical tests, and data extraction. It covers key statistical functions like sum, mean, median, max, and min, and demonstrates how to generate and analyze random data. Additionally, the script showcases how to import and analyze external datasets, generating summary statistics to derive insights. By the end of the script, the workspace is cleared, ensuring a clean environment for future tasks. This exercise equips users with foundational skills in data analysis using R.


