
Name: Nikhil Kumar Mutyala
Sr. Big Data Engineer
Email: nikhilm.dev6@gmail.com
Phone: 9803890859
LinkedIn: www.linkedin.com/in/nikhilkumar-nk

PROFILE SUMMARY:

 9+ years of professional experience in information technology, with expertise in Big Data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL development, report development, SAS, database development, and data modeling, along with strong knowledge of Oracle database architecture.
 Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spring Boot, and Spark integration with Cassandra, Solr, and ZooKeeper.
 Hands-on experience with test-driven development (TDD), behavior-driven development (BDD), and acceptance-test-driven development (ATDD) approaches.
 Strong experience in migrating other databases to Snowflake.
 Experience managing databases and Azure data platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB) as well as SQL Server, Oracle, and data warehouses; built multiple data lakes.
 Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards with tools like Tableau and Power BI.
 Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
 Proficiency in multiple databases including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server. Worked with different file formats such as delimited files, Avro, JSON, and Parquet. Docker container orchestration using ECS, ALB, and Lambda.
 Created snowflake schemas by normalizing dimension tables as appropriate and creating a sub-dimension named Demographic as a subset of the Customer dimension.
 Expertise in Java programming with a good understanding of OOP, I/O, collections, exception handling, lambda expressions, and annotations.
 Able to use Sqoop to migrate data between RDBMSs, NoSQL databases, and HDFS.
 Experience in Extraction, Transformation, and Loading (ETL) of data from various sources into data warehouses, as well as data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, Power BI, and Microsoft SSIS.
 Experience in developing custom UDFs in Python to extend Hive and Pig Latin functionality.
 Expertise in designing complex mappings, performance tuning, and slowly changing dimension and fact tables.
 Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
 Experienced in building automated regression scripts in Python for validation of ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
 Provided full life cycle support to logical/physical database design, schema management and deployment. Adept at
database deployment phase with strict configuration management and controlled coordination with different teams.
 Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy. Experience in creating and running Docker images with multiple microservices.
 Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces
of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that
drive value.
 Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem
components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka.
 Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX and Spark Streaming for processing and transforming
complex data using in-memory computing capabilities written in Scala. Worked with Spark to improve efficiency of existing
algorithms using Spark Context, Spark SQL, Spark MLlib, Data Frame, Pair RDD's and Spark YARN.
 Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
 Excellent communication skills; successfully work in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
 Developed Spark applications in Python (PySpark) on distributed environments to load large numbers of CSV files with differing schemas into Hive ORC tables (a minimal sketch of this pattern follows this list).
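
The following is a minimal, illustrative PySpark sketch of the CSV-to-Hive-ORC loading pattern mentioned above; the bucket path, database, and table names are hypothetical placeholders rather than details from any engagement.

    from pyspark.sql import SparkSession

    # Minimal sketch: load a batch of CSV files into a Hive-managed ORC table.
    # Paths and table names below are illustrative placeholders.
    spark = (SparkSession.builder
             .appName("csv_to_hive_orc")
             .enableHiveSupport()
             .getOrCreate())

    # Read the CSV files, letting Spark infer the schema of this batch.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3://example-bucket/landing/*.csv"))

    # Append the result to an ORC-backed Hive table.
    (df.write
       .format("orc")
       .mode("append")
       .saveAsTable("analytics_db.events_orc"))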

TECHNICAL EXPERIENCE:

 Big Data Tools: Hadoop Ecosystem: MapReduce, Spark 2.3, Airflow 1.10.8, NiFi 2, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
 Data Modeling Tools: Erwin Data Modeler, ER Studio v17
 Programming Languages: SQL, PL/SQL, and UNIX shell scripting
 Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
 Cloud Platform: AWS, Azure, Google Cloud, Snowflake
 Cloud Management: Amazon Web Services (AWS): EC2, EMR, S3, Redshift, Lambda, Athena
 Databases: Oracle, MS SQL Server, MySQL, MongoDB, Cassandra, HBase, Teradata R15/R14
 OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9
 ETL/Data Warehouse Tools: Informatica 9.6/9.1 and Tableau
 Operating Systems: Windows, Unix, Sun Solaris

PROJECT EXPERIENCE:

McKinsey, Charlotte, NC Sep 2021 – Present


Senior Big Data Engineer
Roles & Responsibilities:

 Expert in analyzing business requirements and contributing to solution creation, design and deployment.
 Participated in the development, improvement, and maintenance of Snowflake database applications.
 Created tables and views in Snowflake as per business needs.
 Built a proof of concept for Spark Streaming integration with Kafka (a minimal sketch of this pattern follows this list).
 Developed ETL pipelines into and out of the data warehouse using Snowflake's SnowSQL; wrote SQL queries against Snowflake.
 Worked with the Snowflake cloud data warehouse and AWS S3 buckets to integrate data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
 Responsible for building scalable distributed data solutions using Hadoop.
 Connected to Kafka brokers using SSL and SASL authentication.
 Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems.
 Worked on ETL jobs (Fivetran, Informatica) to migrate data from on-premises systems to AWS S3 and Snowflake, generating JSON and CSV files to support Catalog API integration.
 Developing and maintaining Workflow Scheduling Jobs in Oozie.
 Developed efficient MapReduce programs in Java for filtering out unstructured data.
 Initiated data governance for real-time events by designing and implementing a Schema Registry (Avro format).
 Built Snowpipe pipelines for continuous data loads between AWS S3 and the Snowflake data warehouse.
 Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
 Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
 Responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
 Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
 Used AWS Glue for data transformation, validation, and cleansing.
 Used Python (Boto3) to configure AWS services such as Glue, EC2, and S3 (see the Boto3 sketch at the end of this section).
 Developed Spark scripts using Scala shell commands as per requirements.
 Used the AWS Glue Data Catalog with crawlers to register data from S3 and perform SQL query operations.
 Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Pair RDDs.
 Monitored Kafka metrics, performance metrics, and health checks using AppDynamics.
 Log and info monitoring using Splunk.
 Built real-time dashboards using Grafana.
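
As an illustration of the Kafka POC mentioned above, here is a minimal PySpark Structured Streaming sketch; the broker address, topic name, and output paths are hypothetical placeholders (the original POC may have used the DStream API instead).

    from pyspark.sql import SparkSession

    # Minimal sketch: read a Kafka topic with Spark Structured Streaming and
    # persist the raw events to HDFS. Requires the spark-sql-kafka package.
    # Broker, topic, and paths are illustrative placeholders.
    spark = (SparkSession.builder
             .appName("kafka_streaming_poc")
             .getOrCreate())

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events-topic")
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()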

Environment: Hadoop, Spark, Kafka, Hive, AWS, Glue, EC2, S3, Lambda, Auto Scaling, CloudWatch, CloudFormation, IAM, Snowflake, SnowSQL, MapReduce, Oozie, Data Governance, Python, Sqoop, Scala, PL/SQL, Oracle 12c, MongoDB, Shell Scripting, Splunk, Grafana
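
Similarly, a minimal Boto3 sketch of the Glue/S3 configuration work referenced in this section; the crawler name, IAM role ARN, database, and bucket are hypothetical placeholders.

    import boto3

    # Minimal sketch: create a Glue crawler over an S3 prefix, start it, and
    # list the bucket to confirm source data is present. All names and ARNs
    # below are illustrative placeholders.
    glue = boto3.client("glue", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    glue.create_crawler(
        Name="example-landing-crawler",
        Role="arn:aws:iam::123456789012:role/example-glue-role",
        DatabaseName="example_catalog_db",
        Targets={"S3Targets": [{"Path": "s3://example-bucket/landing/"}]},
    )
    glue.start_crawler(Name="example-landing-crawler")

    # Quick sanity check on the source bucket.
    response = s3.list_objects_v2(Bucket="example-bucket", Prefix="landing/")
    for obj in response.get("Contents", []):
        print(obj["Key"])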

Travelport, Inglewood, CA July 2020 – Sep 2021


Sr. Big Data Engineer
Roles & Responsibilities:
 Installing, configuring and maintaining Data Pipelines
 Transforming business problems into Big Data solutions and defining Big Data strategy and roadmap.
 Designing the business requirement collection approach based on the project scope and SDLC methodology.
 Conducted performance analysis, optimized data processes, and made recommendations for continuous improvement of the data processing environment.
 Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
 Designed and implemented multiple ETL solutions across various data sources using extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools. Performed data profiling and data wrangling of XML, web feeds, and files using Python, Unix, and SQL.
 Loaded data from different sources into a data warehouse and performed data aggregations for business intelligence using Python.
 Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, and connected Tableau to HiveServer2 for generating interactive reports.
 Monitored SAS servers for resource utilization and promptly investigated and resolved incidents.
 Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labelling, and all cleaning and conforming tasks (a minimal UDF sketch follows this list).
 Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
 Used Sqoop to channel data between HDFS and RDBMS sources.
 Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
 Implemented data intelligence solutions around the Snowflake Data Warehouse.
 Used SQL Server management tools to check data in the database against the given requirements.
 Validated the test data in DB2 tables on Mainframes and on Teradata using SQL queries.
 Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
 Identified and documented Functional/Non-Functional and other related business decisions for implementing Actimize-
SAM to comply with AML Regulations.
 Creating Reports in Looker based on Snowflake Connections.
 Worked with regional and country AML Compliance leads to support the start-up of compliance-led projects at regional and country levels, including defining the subsequent phases: training, UAT, staffing to perform test scripts, data migration, the uplift strategy (updating customer information to bring it to the new KYC standards), and review of customer documentation.
 End-to-end development of Actimize models for the trading compliance solutions of the project bank.
 Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MultiLoad (MLOAD), BTEQ, and FastLoad.
 Used SSIS to build automated multi-dimensional cubes.
 Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Python, and in NoSQL databases such as HBase and Cassandra.
 Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
 Prepared and uploaded SSRS reports; managed database and SSRS permissions.
 Used Apache NiFi to copy data from local file system to HDP. Thorough understanding of various modules of AML including
Watch List Filtering, Suspicious Activity Monitoring, CTR, CDD, and EDD.
 Worked on Dimensional and Relational Data Modelling using Star and Snowflake Schemas, OLTP/OLAP system, Conceptual,
Logical and Physical data modelling using Erwin.
 Automated the data processing with Oozie to automate data loading into the Hadoop Distributed File System.
 Developed automated regression scripts in Python for validation of ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server.
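
A minimal sketch of the PySpark custom UDF pattern for cleaning and conforming columns described above; the column names, sample values, and normalization rule are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Minimal sketch: a custom UDF that conforms a free-text column.
    # Column names and sample data are illustrative placeholders.
    spark = SparkSession.builder.appName("udf_cleaning_sketch").getOrCreate()

    def normalize_city(value):
        # Trim whitespace and title-case the city name; treat blanks as nulls.
        if value is None or not value.strip():
            return None
        return value.strip().title()

    normalize_city_udf = F.udf(normalize_city, StringType())

    df = spark.createDataFrame([(" new york ",), ("CHICAGO",), (None,)], ["city"])
    cleaned = df.withColumn("city_clean", normalize_city_udf(F.col("city")))
    cleaned.show()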

Environment: Cloudera Manager (CDH5), Hadoop, Snowflake, PySpark, HDFS, NiFi, Pig, Hive, AWS, S3, Kafka, GCP, Scrum, Git, Sqoop, Oozie, Informatica, Tableau, OLTP, OLAP, HBase, Cassandra, SQL Server, Python, Shell Scripting, XML, Unix.

Tailored Brands, Fremont, CA Apr 2018 – Jun 2020


Sr. Big Data Engineer
Roles & Responsibilities:
 Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
 Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (see the DAG sketch after this list)
 Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores and master
data management
 Worked on architecting the ETL transformation layers and writing spark jobs to do the processing.
 Designed and built infrastructure for the Google Cloud environment from scratch
 Experienced in fact dimensional modeling (Star schema, Snowflake schema), transactional modeling and SCD (Slowly
changing dimension)
 Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines, such as AWS,
GCP
 Worked with Confluence and Jira
 Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built
with Python
 Strong understanding of AWS components such as EC2 and S3
 Responsible for data services and data movement infrastructures
 Experienced in ETL concepts, building ETL solutions and Data modeling
 Worked with the continuous integration tool Jenkins and automated end-of-day jar builds.
 Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
 Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
 Experience in setting up the whole app stack, and in setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
 Implemented Spark Scripts using Scala, Spark SQL to access hive tables into Spark for faster processing of data.
 Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
 Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression) and Statistical Modeling
 Compiled data from various sources to perform complex analysis for actionable results
 Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
 Wrote Flume configuration files for importing streaming log data into HBase with Flume.
 Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
 Using Flume and Spool directory for loading the data from local system (LFS) to HDFS.
 Installed and configured pig, written Pig Latin scripts to convert the data from Text file to Avro format.
 Created Partitioned Hive tables and worked on them using Hive QL.
 Tested Apache Tez for building high performance batch and interactive data processing applications on Pig and Hive jobs.
 Measured Efficiency of Hadoop/Hive environment ensuring SLA is met
 Implemented a continuous delivery pipeline with Docker, GitHub, and AWS
 Participated in the full software development lifecycle with requirements, solution design, development, QA
implementation, and product support using Scrum and other Agile methodologies
 Collaborate with team members and stakeholders in design and development of data environment
 Preparing associated documentation for specifications, requirements, and testing
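
A minimal Airflow DAG sketch of the scheduling pattern referenced in the DAG bullet above; the DAG id, schedule, and extract/load callables are hypothetical placeholders (Airflow 2.x import paths are shown).

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Minimal sketch: a daily two-step ETL DAG. The DAG id, schedule, and the
    # callables are illustrative placeholders.

    def extract():
        print("extracting source data...")

    def load():
        print("loading into the warehouse...")

    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task
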
Environment: Hadoop, Hive, AWS, HBase, Scala, Flume, Apache Tez, Cloud Shell, Azure Databricks, Docker, Jira, MySQL, Postgres, SQL Server, Python, Spark, Spark SQL

Quest Diagnostics, NJ Dec 2016 – Mar 2018


Data Engineer/ Analyst
Roles & Responsibilities:
 Responsibilities include gathering business requirements, developing strategy for data cleansing and data migration,
writing functional and technical specifications, creating source to target mapping, designing data profiling and data
validation jobs in Informatica, and creating ETL jobs in Informatica.
 Worked on Hadoop cluster which ranged from 4-8 nodes during pre-production stage and it was sometimes extended up to
24 nodes during production.
 Developed SAS table templates and coding protocol in order to process and analyze clinical trial data.
 The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop
reports and established self-service reporting model in Cognos for business users.
 Implemented bucketing, partitioning, and dynamic partitions in Hive to assist users with data analysis (a minimal sketch follows this list).
 Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
 Develop database management systems for easy access, storage, and retrieval of data.
 Perform DB activities such as indexing, performance tuning, and backup and restore.
 Expertise in writing Hadoop Jobs for analysing data using Hive QL (Queries), Pig Latin (Data flow language), and custom
MapReduce programs in Java.
 Built APIs that will allow customer service representatives to access the data and answer queries.
 Designed changes to transform current Hadoop jobs to HBase.
 Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
 Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting,
Manage and review data backups, Manage & review log files.
 Extended the functionality of Hive with custom UDFs and UDAFs.
 Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
 Expert in creating Hive UDFs using Java to analyse the data efficiently.
 Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop.
 Implemented AJAX, JSON, and JavaScript to create interactive web screens.
 Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in
NoSQL databases such as MongoDB. Involved in loading and transforming large sets of Structured, Semi-Structured and
Unstructured data and analysed them by running Hive queries. Processed the image data through the Hadoop distributed
system by using Map and Reduce then stored into HDFS.
 Created Session Beans and controller Servlets for handling HTTP requests from Talend
 Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
 Wrote documentation for each report including purpose, data source, column mapping, transformation, and user group.
 Utilized Waterfall methodology for team and project management
 Used Git for version control with Data Engineer and Data Scientist colleagues.
 Created Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc. with the Show Me functionality, and built dashboards and stories as needed using Tableau Desktop and Tableau Server.
 Performed statistical analysis using SQL, Python, R Programming and Excel.
 Worked extensively with Excel VBA Macros, Microsoft Access Forms
 Import, clean, filter and analyse data using tools such as SQL, HIVE and PIG.
 Used Python& SAS to extract, transform & load source data from transaction systems, generated reports, insights, and key
conclusions.
 Developed story telling dashboards in Tableau Desktop and published them on to Tableau Server which allowed end users
to understand the data on the fly with the usage of quick filters for on demand needed information.
 Analysed and recommended improvements for better data consistency and efficiency
 Designed and Developed data mapping procedures ETL-Data Extraction, Data Analysis and Loading process for
integrating data using R programming.
 Effectively communicated plans, project status, project risks, and project metrics to the project team, and planned test strategies in accordance with the project scope.
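
A minimal sketch of the Hive partitioning pattern above, issued here through PySpark's Hive support for consistency with the other sketches; the database, table, and column names are hypothetical (bucketing would be added with a CLUSTERED BY ... INTO N BUCKETS clause in the Hive DDL).

    from pyspark.sql import SparkSession

    # Minimal sketch: create a partitioned Hive table and load it with dynamic
    # partitioning. Database, table, and column names are illustrative placeholders.
    spark = (SparkSession.builder
             .appName("hive_partitioning_sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS clinical_db.lab_results (
            patient_id STRING,
            test_code  STRING,
            result_val DOUBLE
        )
        PARTITIONED BY (result_date STRING)
        STORED AS ORC
    """)

    # Allow the partition value to come from the data itself.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE clinical_db.lab_results PARTITION (result_date)
        SELECT patient_id, test_code, result_val, result_date
        FROM clinical_db.lab_results_staging
    """)
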
Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, HDFS, Sqoop, Impala, SQL, Tableau, Python, SAS, Flume, JavaScript, Oozie, Linux, NoSQL, MongoDB, Talend, Git.

Equifax, Hyderabad, India July 2014 – Oct 2016


Data Engineer
Roles & Responsibilities:
 Experience in Big Data Analytics and design in Hadoop ecosystem using MapReduce Programming, Spark, Hive, Pig, Sqoop,
HBase, Oozie, Impala, Kafka
 Built Oozie pipelines that perform several actions, such as file moves, Sqooping data from the source Teradata or SQL systems, exporting it into Hive staging tables, performing aggregations per business requirements, and loading into the main tables.
 Ran Apache Hadoop, CDH, and MapR distributions on Elastic MapReduce (EMR) on EC2.
 Performed forking whenever there was scope for parallel processing, to optimize data latency.
 Developed data mapping, transformation, and cleansing rules for data management involving OLTP and OLAP.
 Involved in creating UNIX shell scripts; performed defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency.
 Developed reusable objects like PL/SQL program units and libraries, database procedures and functions, database triggers
to be used by the team and satisfying the business rules.
 Used SQL Server Integrations Services (SSIS) for extraction, transformation, and loading data into target system from
multiple sources
 Developed and implemented an R and Shiny application showcasing machine learning for business forecasting. Developed predictive models using Python and R to predict customer churn and classify customers.
 Partner with infrastructure and platform teams to configure, tune tools, automate tasks and guide the evolution of internal
big data ecosystem; serve as a bridge between data scientists and infrastructure/platform teams.
 Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
 Developed a Pig script that picks data from one HDFS path, performs aggregation, and loads it into another path, which later populates another domain table; converted this script into a jar and passed it as a parameter in the Oozie script.
 Hands-on experience with Git Bash commands such as git pull to pull code from source and develop it per requirements, git add to stage files, git commit after the code builds, and git push to the pre-prod environment for code review; later used screwdriver.yaml, which builds the code and generates artifacts that are released into production.
 Created logical data model from the conceptual model and its conversion into the physical database design using Erwin.
Involved in transforming data from legacy tables to HDFS, and HBase tables using Sqoop.
 Connected to AWS Redshift through Tableau to extract live data for real time analysis.
 Data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client, with data representation of the analysis and suggested solutions for investors.
 Rapid model creation in Python using pandas, NumPy, scikit-learn, and Plotly for data visualization (a minimal sketch follows this list). These models are then implemented in SAS, where they are interfaced with MSSQL databases and scheduled to update on a timely basis.
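
A minimal scikit-learn sketch of the churn-classification pattern above; the CSV path, feature names, and label column are hypothetical placeholders, not the actual model.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    # Minimal sketch: train a churn classifier from a flat file.
    # File path and column names are illustrative placeholders.
    df = pd.read_csv("customers.csv")  # expects a binary 'churned' label column
    X = df[["tenure_months", "monthly_spend", "support_tickets"]]
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    print(classification_report(y_test, model.predict(X_test)))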

Environment: MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, Kafka, JSON, XML, PL/SQL, SQL, HDFS, Unix, Python, SAS, PySpark, Redshift, Shell Scripting.

Merrill Lynch (Bank of America), Hyderabad, India Dec 2012 – Jun 2014
Data & Reporting Analyst
Roles & Responsibilities:
 Created consumption views on top of metrics to reduce the running time for complex queries.
 Compared data at the leaf level across various databases when data transformation or data loading took place, and analyzed data quality after these loads (looking for any data loss or data corruption).
 Analysed marketing campaigns from various perspectives including CTR, conversion rates, seasonal/geographical trends,
search queries, landing page, conversion funnel, quality score, competitors, distribution channel, etc. to achieve maximum
ROI for clients.
 Worked with the business to identify gaps in mobile tracking and came up with solutions to address them.
 Analysed click events of Hybrid landing page which includes bounce rate, conversion rate, Jump back rate, List/Gallery
view, etc. and provide valuable information for landing page optimization.
 Evaluated the traffic and performance of Daily Deals PLA ads and compared those items with non-Daily-Deal items to assess the possibility of increasing ROI; suggested improvements and modified existing BI components (reports, stored procedures).
 Understood business requirements in depth and came up with a test strategy based on business rules.
 Prepared Test Plan to ensure QA and Development phases are in parallel
 Wrote and executed test cases and reviewed them with the business and development teams.
 Implemented the defect tracking process using the JIRA tool by assigning bugs to the development team.
 Automated regression with the Qute tool, reducing manual effort and increasing team productivity.
 Built S3 buckets and managed their policies; used S3 and Glacier for storage and backup on AWS.
 Created performance dashboards in Tableau/ Excel / Power point for the key stakeholders
 Incorporated predictive modelling (rule engine) to evaluate the Customer/Seller health score using python scripts,
performed computations and integrated with the Tableau viz.
 Worked with stakeholders to communicate campaign results, strategy, issues or needs.
 Involved in functional testing, integration testing, regression testing, smoke testing, and performance testing. Tested Hadoop MapReduce jobs developed in Python, Pig, and Hive.
 Generated Custom SQL to verify the dependency for the daily, Weekly, Monthly jobs.
 Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
 Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files (a minimal Spark SQL sketch follows this list).
 Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
 Closely involved in scheduling Daily, Monthly jobs with Precondition/Post condition based on the requirement.
 Monitor the Daily, Weekly, Monthly jobs and provide support in case of failures/issues.
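
A minimal PySpark sketch of the Spark SQL-on-CSV querying pattern above (the resume mentions Scala; Python is used here for consistency with the other sketches); the file path, view name, and columns are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Minimal sketch: register a CSV file as a temp view and query it with
    # Spark SQL. Path, view, and column names are illustrative placeholders.
    spark = SparkSession.builder.appName("spark_sql_csv_sketch").getOrCreate()

    jobs = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("hdfs:///data/daily_jobs.csv"))

    jobs.createOrReplaceTempView("daily_jobs")

    # Example check: count failed jobs per run date, mirroring daily job monitoring.
    spark.sql("""
        SELECT run_date, COUNT(*) AS failed_jobs
        FROM daily_jobs
        WHERE status = 'FAILED'
        GROUP BY run_date
        ORDER BY run_date
    """).show()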

Environment: Hadoop, MapReduce, AWS, AWS S3, GitHub, Service Now, HP Service Manager, Jira, EMR, Nebula, Teradata, SQL
Server, Apache Spark, Sqoop.
