Micron interview questions summary

# Question 1 Parsing the HTML webpages


To parse the HTML pages I used Beautiful Soup, along with pandas for the
DataFrame objects. I read the tables in the section named in your question using
Beautiful Soup's HTML parser, then found the rows and the row data associated
with each of them. To process the column headers and values, I separated the
two, stripped any stray spaces, and merged them afterwards.
Finally, as required, I packed the lists containing the column headers and
values into a dict.
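
The steps above can be sketched roughly as follows. This is a minimal illustration and not the original script: the sample HTML, the column names, and the helper name `table_to_records` are all hypothetical, and it assumes the first row of the table holds the headers.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real page structure is not shown in this summary.
SAMPLE_HTML = """
<table>
  <tr><th> Car Name </th><th> Price </th></tr>
  <tr><td> Toyota Corolla </td><td> 50000 </td></tr>
  <tr><td> Honda Civic </td><td> N.A </td></tr>
</table>
"""

def table_to_records(html):
    """Return one dict per data row, keyed by the cleaned column headers."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr")
    # Assumption: the first row contains the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Skip rows that do not line up with the headers.
        if len(values) == len(headers):
            records.append(dict(zip(headers, values)))
    return records

records = table_to_records(SAMPLE_HTML)
```

A list of dicts like this can be passed straight to `pandas.DataFrame(records)` when a DataFrame is needed.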

# Question 2
a. Check if there are new lines in the ‘NewData.csv’, and append them to the
existing ‘MasterDB.csv’, as long as the ‘Status’ in the row is ‘Available’, and
the ‘Price’ and ‘COE’ columns are not ‘N.A’ (has value in ).

First I read the data and applied the given condition to select the new rows to append:

# Keep rows that are 'Available' and have real values for 'COE' and 'Price'
rows_to_be_updated = new_data[(new_data['COE'] != "N.A") &
                              (new_data['Price'] != "N.A") &
                              (new_data['Status'] == 'Available')]

After fetching these records, any missing values have to be treated. For
comparing the fetched rows with the rows in the master, pandas has a ‘compare’
method, which I avoided because it is resource intensive and is not available
in some pandas versions (it was added in pandas 1.1), so it could become a
bottleneck. Instead, I took the last index of the master data and appended the
fetched rows after it.
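
A minimal sketch of this append step is shown below. It is an illustration under assumptions rather than the original code: the key column `ID` used to detect rows already present in the master is hypothetical, as is the helper name `append_new_rows`, and the `"N.A"` marker follows the wording of the question.

```python
import pandas as pd

def append_new_rows(master, new_data, key="ID"):
    """Append valid rows from new_data that are not yet in master."""
    valid = new_data[(new_data["Status"] == "Available")
                     & (new_data["Price"] != "N.A")
                     & (new_data["COE"] != "N.A")]
    # Hypothetical key column: rows whose key is already in master are not new.
    fresh = valid[~valid[key].isin(master[key])].copy()
    # Continue numbering from the last index of the master data.
    start = master.index.max() + 1 if len(master) else 0
    fresh.index = range(start, start + len(fresh))
    return pd.concat([master, fresh])

master = pd.DataFrame({"ID": [1, 2], "Price": ["100", "200"],
                       "COE": ["50", "60"], "Status": ["Available", "Available"]})
new_data = pd.DataFrame({"ID": [2, 3, 4, 5],
                         "Price": ["200", "300", "N.A", "500"],
                         "COE": ["60", "70", "80", "90"],
                         "Status": ["Available", "Available", "Available", "Sold"]})
updated = append_new_rows(master, new_data)
```

In this sample only the row with `ID` 3 is appended: `ID` 2 already exists, `ID` 4 has no price, and `ID` 5 is sold.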

b. For the existing lines, see if the NewData.csv, contains any changes. If
yes, update the changes in the ‘MasterDB.csv’.

I used a left outer join to compare the existing rows against the new data and
then removed the unwanted rows. Several alternative methods would also work.
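
One possible way to do the left outer join is sketched below. It assumes a hypothetical unique key column `ID` shared by both files, and the helper name `update_existing_rows` is made up for illustration; the original script may have differed.

```python
import pandas as pd

def update_existing_rows(master, new_data, key="ID"):
    """Overwrite master values with new_data values for keys present in both."""
    merged = master.merge(new_data, on=key, how="left",
                          suffixes=("", "_new"), indicator=True)
    # Rows marked "both" exist in master and in new_data.
    has_update = merged["_merge"] == "both"
    for col in master.columns:
        new_col = f"{col}_new"
        if col != key and new_col in merged.columns:
            # Take the new value where there is a match, else keep the old one.
            merged[col] = merged[new_col].where(has_update, merged[col])
    # Drop the helper columns added by the join.
    return merged[list(master.columns)]

master = pd.DataFrame({"ID": [1, 2, 3],
                       "Price": ["100", "200", "300"],
                       "Status": ["Available", "Available", "Available"]})
new_data = pd.DataFrame({"ID": [2, 4],
                         "Price": ["250", "400"],
                         "Status": ["Available", "Available"]})
updated = update_existing_rows(master, new_data)
```

Because the join is a left join on the master, keys that appear only in the new data (here `ID` 4) are ignored at this step.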

c. If the column ‘Status’ in the NewData.csv is ‘Sold’, then remove those
lines from the ‘MasterDB.csv’

I checked for rows whose status is not equal to ‘Sold’ and kept only those
rows in the master.
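
A minimal sketch of this filter, again assuming a hypothetical key column `ID` that links the two files (the helper name `remove_sold_rows` is also illustrative):

```python
import pandas as pd

def remove_sold_rows(master, new_data, key="ID"):
    """Drop master rows whose key is marked 'Sold' in new_data."""
    sold_keys = new_data.loc[new_data["Status"] == "Sold", key]
    return master[~master[key].isin(sold_keys)]

master = pd.DataFrame({"ID": [1, 2, 3], "Price": ["100", "200", "300"]})
new_data = pd.DataFrame({"ID": [2, 3], "Status": ["Sold", "Available"]})
remaining = remove_sold_rows(master, new_data)
```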

# Question 3
a. Develop a script that can split Column ‘Car Name’ to get the following
attributes

i. Car Make

ii. Car Model Name

iii. COE End Date

I used lambda functions to split the column on spaces. For the end date, I
fetched the last elements of the split and extracted the date from them. Lambda
functions can be used in plain Python as well as in Spark, which may give better
performance on larger datasets.
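
The split can be sketched as follows. The exact layout of ‘Car Name’ is not given in this summary, so the format used here (make, then model, then a trailing `(COE till MM/YYYY)`) is purely an assumption for illustration, as is the helper name `split_car_name`.

```python
import pandas as pd

def split_car_name(df):
    """Add 'Car Make', 'Car Model Name' and 'COE End Date' columns."""
    out = df.copy()
    # Assumed format: "<make> <model ...> (COE till MM/YYYY)".
    name_part = out["Car Name"].apply(lambda s: s.split("(")[0].strip())
    out["Car Make"] = name_part.apply(lambda s: s.split(" ")[0])
    out["Car Model Name"] = name_part.apply(lambda s: " ".join(s.split(" ")[1:]))
    # The last space-separated element holds the date, e.g. "05/2027)".
    last = out["Car Name"].apply(lambda s: s.split(" ")[-1].rstrip(")"))
    out["COE End Date"] = pd.to_datetime(last, format="%m/%Y", errors="coerce")
    return out

cars = pd.DataFrame({"Car Name": ["Toyota Corolla Altis 1.6A (COE till 05/2027)",
                                  "Honda Civic 1.5 (COE till 11/2030)"]})
split = split_car_name(cars)
```

With `errors="coerce"`, names that do not end in a parseable date get `NaT` instead of raising an error.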

b. Build statistical model for every car make (e.g. Toyota)

i. Mean, Median, mode

ii. +- 3 Sigma Value

To extract all of the above statistics, I first converted the data to the
appropriate dtypes. Then I used pandas groupby together with aggregation to
compute all the values.
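
A minimal sketch of the groupby and aggregation step is given below. It assumes the statistics are computed on the ‘Price’ column and that a ‘Car Make’ column already exists; the helper name `make_statistics` and the output column names are hypothetical. The standard deviation is the pandas default (sample standard deviation), and when several values tie for the mode the smallest is used.

```python
import pandas as pd

def make_statistics(df, value_col="Price"):
    """Mean, median, mode and +-3 sigma bounds of value_col per car make."""
    data = df.copy()
    # Cast to a numeric dtype; markers such as "N.A" become NaN and are ignored.
    data[value_col] = pd.to_numeric(data[value_col], errors="coerce")
    grouped = data.groupby("Car Make")[value_col]
    stats = grouped.agg(mean="mean", median="median", std="std")
    # mode() can return several values; take the first (smallest) one.
    stats["mode"] = grouped.agg(
        lambda s: s.mode().iloc[0] if not s.mode().empty else float("nan"))
    stats["lower_3sigma"] = stats["mean"] - 3 * stats["std"]
    stats["upper_3sigma"] = stats["mean"] + 3 * stats["std"]
    return stats

cars = pd.DataFrame({
    "Car Make": ["Toyota", "Toyota", "Toyota", "Toyota", "Honda", "Honda"],
    "Price": ["10", "10", "40", "N.A", "5", "7"],
})
stats = make_statistics(cars)
```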
