Use the dataset CEOSAL1.dta to answer the following questions.
1. How many variables are there in the dataset? [Stata has a command to count variables.
It also shows the count somewhere. Search for the command and use it.]
2. List all the variables. [Stata has a command to list variables. Search for the command and
use it.]
3. List salary and sales for the first 5 observations only
4. List salary and sales for the last 5 observations only
5. Report the mean and the number of observations for each variable
6. How many financial firms are included in the dataset?
7. How many non-financial firms are included in the dataset?
8. Which industry has the highest average sales value? Which command did you use?
9. Are there any missing observations in the dataset? How many? How do you know that?
10. What is the average salary of CEOs for those working in finance or utility firms?
11. What is the average salary of CEOs for those working in finance and utility firms?
12. Make a table with the number of financial and non-financial firms on the one hand and
the CEO salary on the other hand.
13. Plot the relationship between salary and sales (search for and use the Stata command for
plotting graphs)
14. Draw a histogram of salary
15. Make a table of correlations for all variables (search for and use the Stata command for
correlation)
16. What is the correlation between salary and sales?
17. What is the correlation between salary and finance?
18. Generate a dummy variable called “large” which is 1 if the CEO works in a firm whose
sales exceed the average sales value in the dataset and 0 otherwise.
19. Label the variable “large” as “=1 if the firm is large”
20. How many financial firms are “large”?
21. What is the percentage of financial firms among the “large” firms?
22. Find the percentage of financial firms and of utility firms in the dataset
23. Generate a new variable called “benefit” which is 0 for those CEOs who earn a salary
below 4,000, 1 for those who earn a salary between 4,000 and 10,000 (including 4,000),
and 2 for those who earn a salary of 10,000 or more
24. Generate a variable called “logsalary” which is the natural logarithm of the variable
“salary” and label it “log of salary”
25. Generate a variable called “logsalary” which is the natural logarithm of the variable
“pcsalary” and label it “log of pcsalary”. How many missing observations do you have?
Why?
26. Generate a variable called “roesqr” which is the square of the variable “roe”
27. Generate a variable called “ssratio” which is salary divided by sales.
28. In the dataset the first variable is “salary”. Move this variable to the end of the list.
29. Move the variable “ros” above the variable “roe”
30. Save your dataset with name CEOSAL1NEW.dta
use CEOSAL1.dta, clear // Load the dataset, replacing any data already in memory
// 1. Count the number of variables in the dataset
describe, short // Reports the number of observations and the number of variables
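// An alternative sketch for question 1: c(k) is Stata's stored count of the variables
// currently in memory, so this prints the number directly.
display c(k)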
// 2. List all the variables
ds // Display all variables in the dataset
// 3. List salary and sales for the first 5 observations only
list salary sales in 1/5
// 4. List salary and sales for the last 5 observations only
list salary sales in -5/l
// 5. Report the mean and the number of observations for each variable
summarize
// 6. Count how many financial firms are included in the dataset
count if finance == 1 // finance is a 0/1 dummy, not a string industry variable
// 7. Count how many non-financial firms are included in the dataset
count if finance == 0
// 8. Find the industry with the highest average sales value
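// Industries are coded as 0/1 dummies (assumed here to be indus, finance, consprod, and
// utility, as in the Wooldridge CEOSAL1 file). Summarizing within each one leaves the
// data in memory intact, whereas collapse would replace the dataset.
foreach v of varlist indus finance consprod utility {
    display "`v'"
    summarize sales if `v' == 1
}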
// 9. Check for missing observations in the dataset
misstable summarize // Tabulates missing values by variable; notes when none are missing
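// A hedged cross-check for question 9: count, for each observation, how many variables
// are missing, then count the observations with at least one missing value. The name
// nmiss is a hypothetical helper and is dropped so the variable count stays unchanged.
egen nmiss = rowmiss(_all)
count if nmiss > 0
drop nmiss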
// 10. Calculate the average salary of CEOs for those working in finance or utility firms
summarize salary if finance == 1 | utility == 1
// 11. Calculate the average salary of CEOs for those working in finance and utility firms
summarize salary if finance == 1 & utility == 1
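// Why question 11 may report no observations (a sketch, assuming finance and utility are
// mutually exclusive 0/1 industry dummies): no firm can satisfy both conditions at once.
// A count of zero here would confirm that.
count if finance == 1 & utility == 1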
// 12. Make a table with the number of financial and non-financial firms along with CEO salary
tabulate finance, summarize(salary) // Frequency and mean salary for finance == 0 and == 1
// 13. Plot a relationship between salary and sales
scatter salary sales
// 14. Histogram graph for salary
histogram salary
// 15. Make a table of correlations for all variables
pwcorr
// 16. Calculate the correlation between salary and sales
pwcorr salary sales
// 17. Calculate the correlation between salary and finance
pwcorr salary finance // finance is already a 0/1 dummy, so no new variable is needed
// 18. Generate a dummy variable "large" based on sales exceeding the average sales value
summarize sales
gen large = (sales > r(mean)) if !missing(sales) // Keep large missing where sales is missing
// 19. Label the variable "large"
label variable large "=1 if the firm is large"
// 20. Count how many financial firms are "large"
count if finance == 1 & large == 1
// 21. Calculate the percentage of financial firms in the "large" category
tabulate large finance, row // In the large == 1 row, the finance == 1 cell is the percentage
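// An optional way to get the question 21 percentage as a single number, assuming finance
// is a 0/1 dummy: its mean among the large firms is the share of financial firms.
summarize finance if large == 1
display 100 * r(mean)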
// 22. Calculate the percentage of financial firms and utility firms in the dataset
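tabulate finance // The Percent column for finance == 1 is the share of financial firms
tabulate utility // Likewise for utility firms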
// 23. Generate a new variable "benefit" based on CEO salary ranges
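// A salary of exactly 10,000 belongs to category 2, so the inner cutoff is strict
gen benefit = cond(salary < 4000, 0, cond(salary < 10000, 1, 2)) if !missing(salary)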
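// An optional check on the benefit coding: the minimum and maximum salary within each
// category should respect the 4,000 and 10,000 cutoffs from question 23.
tabstat salary, by(benefit) statistics(min max n)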
// 24. Generate a variable "logsalary" which is the natural log of "salary" and label it
gen logsalary = log(salary)
label variable logsalary "log of salary"
// 25. Generate a variable "logsalary" for "pcsalary" and identify missing observations
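// Named logsalary_pcsalary here because logsalary already exists from question 24
gen logsalary_pcsalary = log(pcsalary)
label variable logsalary_pcsalary "log of pcsalary"
count if missing(logsalary_pcsalary)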
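// A hedged explanation for the missing values: log() is undefined for zero or negative
// arguments, and pcsalary (presumably a percentage change) can be nonpositive. If that
// is the only cause, and pcsalary itself has no missing values, this count should match
// the one above.
count if pcsalary <= 0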
// 26. Generate a variable "roesqr" which is the square of "roe"
gen roesqr = roe^2
// 27. Generate a variable "ssratio" which is salary divided by sales
gen ssratio = salary / sales
// 28. Move the variable "salary" to the last position in the dataset
order salary, last
// 29. Move the variable "ros" above the variable "roe"
order ros, before(roe)
// 30. Save the modified dataset
save CEOSAL1NEW.dta, replace