Data Engineer Profiles


 3+ years of overall experience, with a minimum of 2 years' experience in GCP or
AWS; GCP preferred.
 A strong foundation in relational and non-relational databases: SQL Server, Oracle,
MySQL, PostgreSQL, MongoDB, Cassandra, etc.
 Hands-on experience with Python. This is non-negotiable.
 Demonstrable ETL experience is a must; Apache Airflow / Apache NiFi and
enterprise ETL tools like SSIS / Informatica are a bonus (a minimal Airflow orchestration
sketch follows this list)
 Experience working in data warehouse and data lake environments
 SAP exposure is a huge bonus
 Hadoop ecosystem experience: Spark SQL, Spark with Scala, Hive, Pig, etc.
 Strong programming experience on the data engineering side in Python, Scala, or
Java.
 Hands-on GCP experience with data analysis, building data pipelines, and
orchestration using GCP tools such as Dataflow, Data Fusion, Cloud Composer, and BigQuery.
 Should have worked with Git or another code repository.
 Should have worked in Agile delivery. Advanced working SQL knowledge and
experience working with relational databases, query authoring (SQL), as well as
working familiarity with a variety of databases.
 Experience building or maintaining ETL processes from a wide variety of data
sources using SQL.
 Experience building and optimizing big data pipelines, architectures, and data
sets.
 Build processes supporting data transformation, data structures, metadata, dependency
and workload management.
 Able to identify data quality issues and has experience creating frameworks to
resolve them.

Java + Microservices
 Experience in Java (5-10 years); experience in Spring Boot (2+ years)
 Good knowledge of microservice concepts. Framework: Spring Boot, Spring Security,
JAX-RS, Hystrix, Kafka. ORM: Spring Data JPA, Hibernate. Cloud services:
AWS (MSK, S3), serverless Lambda functions. Build tools: Maven, Gradle
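
A minimal sketch of the kind of pipeline orchestration the profile above asks for, using Apache Airflow (Cloud Composer is the managed equivalent on GCP). This is illustrative only: the DAG id, task names, and the extract/transform/load helpers are hypothetical, and a real pipeline would read from and write to actual source and warehouse systems.

```python
# Minimal Airflow 2.x DAG: a daily extract -> transform -> load pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw rows from a source system (stubbed out here).
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3.2"}]


def transform(ti, **context):
    # Cast amounts to float; a real pipeline would also validate and deduplicate.
    rows = ti.xcom_pull(task_ids="extract")
    return [{**r, "amount": float(r["amount"])} for r in rows]


def load(ti, **context):
    # A real pipeline would write to the warehouse (e.g. BigQuery); just log here.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```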

 7 years of recent experience in data engineering.


 A solid track record of data management showing flawless execution and attention to
detail.
 Selecting and integrating any Big Data tools and frameworks required to provide
requested capabilities
 Extensive experience with SQL, streaming, and ETL.
 Extensive experience in Kafka, Spark, Python, or Scala (see the streaming sketch after this list)
 Experience with any NoSQL database (such as HBase, Cassandra, Redis)
 Experience with integration of data from multiple data sources.
 Technical expertise regarding data models, database design and development, data mining,
and segmentation techniques
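
A minimal sketch of the streaming side of this profile, assuming PySpark Structured Streaming with the spark-sql-kafka connector on the classpath; the broker address, topic name, and event schema are hypothetical placeholders.

```python
# Read JSON events from a Kafka topic and count actions per 1-minute window.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "events")                         # hypothetical topic
    .load()
    # Kafka delivers the payload as bytes; cast to string and parse the JSON.
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

counts = events.groupBy(F.window("ts", "1 minute"), "action").count()

# Console sink is only for local experimentation; a production job would write
# to a durable sink (another Kafka topic, a table, object storage, etc.).
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```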

 Programming experience, ideally in Python or Scala


 Knowledge of data cleaning, wrangling, visualization and reporting, with an
understanding of the best, most efficient use of associated tools and applications to
complete these tasks.
 Experience in processing large amounts of structured and unstructured data, including
integrating data from multiple sources.
 A willingness to explore new alternatives or options to solve data mining issues, and
utilize a combination of industry best practices, data innovations and your experience
to get the job done.
 Experience in production support and troubleshooting.

 Azure Data Engineering


 Azure Databricks, Azure Data Factory, Hive, Spark and Python
 Unix, PySpark, Microsoft Azure

SQL
Hands-on experience using AWS core services: EC2, S3, VPC, ELB, Lambda.
Experience in building ETL routines in AWS (including Ingress and Egress)
Experience in data profiling and source system data analysis
Understand the concept and principles of data modeling

Good to have:
Experience working with scripting languages like Python, Bash, PySpark
Experience in Redshift and other PaaS databases in AWS
Experience in building Customer MDM solutions

Must have: Spark with Python as the underlying technology, and the Hadoop ecosystem: Hadoop
architecture, YARN, Sqoop, Hive, and PySpark (a minimal PySpark sketch follows this profile).

Nice to have :

 Java, AWS, SQL

 Good experience in Hadoop, Spark, Hive, shell scripting


 Development of big data applications using Spark
 Development/debugging on Linux/Unix platforms
 Good understanding of AWS ECR, Kinesis.
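
A minimal PySpark sketch of the "must have" above: a batch job that loads a file, registers it as a table, and aggregates it with Spark SQL, the same query style used against Hive tables. The input path and column names are hypothetical; enabling Hive support assumes a metastore is available.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark_hive_style_sketch")
    # .enableHiveSupport()  # uncomment when running against a Hive metastore
    .getOrCreate()
)

# Hypothetical input: one order per row with order_date and amount columns.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)
orders.createOrReplaceTempView("orders")

# Aggregate with SQL, exactly as one would against a Hive table.
daily = spark.sql("""
    SELECT order_date,
           COUNT(*)    AS order_count,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily.show()
spark.stop()
```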

5+ years of demonstrable work experience as a Data Architect using data modeling tools such as
Erwin, Toad or ER/Studio
2-3 years T-SQL coding experience on large projects working with newer versions of SQL Server (2014
and newer)
Must possess expert T-SQL programming, performance tuning and troubleshooting skills
Create database objects and T-SQL scripts
Experience designing data warehousing solutions
In-depth understanding of database structure principles
Seamlessly able to translate business requirements to technology requirements
Familiarity with Microsoft Azure, Azure SQL, Azure Data Flow & Azure Data Factory
Expertise with NoSQL databases such as Azure Cosmos DB

ETL, PySpark, Python, Redshift/Snowflake

Overall experience of 6-8 years, with 4-6 years of strong experience building ETLs and data pipelines
and optimizing them using SQL, Hive, Python, and PySpark. 2+ years of experience with AWS big data
solutions like EMR (or Hadoop-Hive), Glue, PySpark, and Redshift. 1+ years of experience in AWS
using services like S3, EC2, RDS, EMR, Redshift. Experience building and optimizing RDBMS/big data
pipelines, architectures, and data sets.
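
A minimal sketch of the kind of PySpark pipeline this profile describes, e.g. running on EMR or Glue: read raw CSV from S3, clean it, and write partitioned Parquet so downstream engines (Redshift Spectrum, Athena, Hive) can prune partitions. The bucket paths and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet_etl_sketch").getOrCreate()

# Hypothetical raw sales data with sale_id, sale_ts, and amount columns.
raw = spark.read.csv("s3://raw-bucket/sales/", header=True, inferSchema=True)

clean = (
    raw.dropDuplicates(["sale_id"])
    .withColumn("sale_date", F.to_date("sale_ts"))
    .filter(F.col("amount") > 0)
)

# Partitioning by sale_date means queries filtered on date read only the
# partitions they need, which is the main storage-side optimization here.
(clean.write
    .mode("overwrite")
    .partitionBy("sale_date")
    .parquet("s3://curated-bucket/sales/"))

spark.stop()
```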

 Looking for a tech-savvy Data Engineer to design, develop, and support ETL interfaces
of a big data marketing technology platform built on AWS. Understand the existing
landscape, document and optimize the pipelines for best performance. Interact with business
and marketing users, data scientists, and other developers.
 Prefer candidates having exposure to data science (model creation and execution).
 Good to have knowledge in ETL tools such as Glue, Spark etc.
 Strong communication and interpersonal skills

 Strong software engineering experience, with proficiency in at least one high-level
programming language (Python, Java, Scala).

 Experience and proficiency in at least one high-volume data processing environment
(Teradata, Oracle, Cloudera, BigQuery, or equivalent).

Minimum 4 years of relevant experience in data engineering, including strong experience with SQL,
Python, and PySpark.
Strong experience in any ETL tool, such as Informatica, DataStage, Alteryx, Talend, etc.
Expertise in writing complex SQL and PL/SQL queries.
Proficiency in at least one of the cloud technologies: Azure, AWS, or GCP

1. Analytical SQL query writing

2. Python (Problem solving skill)

3. Hive or Spark

4. AWS/GCP
5. Airflow

Technical and Professional Requirements:


Required
Experience in handling and maintaining ETL and Data pipelines
Working knowledge of Apache Sqoop, Flume, Spark, Airflow, Hive, etc.
Experience in configuring and managing visualization tools like Grafana/Kibana,
Prometheus, ELK
Must have good experience with system integration with cloud/On-Prem vendors.
Must have experience in designing and proposing Data solutions
Work exposure with NoSQL preferred
Good team player with exposure to managing teams and maintaining client rapport
Must have exposure to cluster-based hyperscalers.
Experience with Kafka Streaming platforms.
Good to have cloud and containerization exposure
Preferred Skills:
ETL->NoSQL
Bigdata->Spark
Bigdata->Hive

 Experience working with varied forms of data infrastructure, including relational/SQL
databases as well as Hadoop and Spark
 Proficiency in scripting languages: Python, PySpark / Spark with Scala
 Experience in Database design/data modeling.
 Must have strong experience in data warehouse concepts.
 Experience in AWS cloud
 Experience in Databricks
 Desired Skills: AWS, Cloud

 Graduate degree in Computer Science, Statistics, Informatics, Information Systems or
another quantitative field.
 7+ years of total experience in Data Engineering projects and 4+ years of relevant
experience with Azure technology services and Python
 Azure: Azure Data Factory, ADLS (Azure Data Lake Store), Azure Databricks
 Mandatory programming languages: PySpark, PL/SQL, Spark SQL
 Database: SQL DB
 Experience with Azure: ADLS, Databricks, Stream Analytics, SQL DW, COSMOS
DB, Analysis Services, Azure Functions, Serverless Architecture, ARM Templates
 Experience with relational SQL and NoSQL databases, including Postgres and
Cassandra.
 Experience with object-oriented/object function scripting languages: Python, SQL,
Scala, Spark-SQL etc.
 Data warehousing experience with strong domain knowledge

 Overall experience of 4+ years


 Experience in AWS Cloud services & solutions
 Experience working with enterprise data warehouse
 Experience as an ETL/ELT Developer using various ETL/ELT tools
 Experience in SQL/NoSQL/DWH databases across SQL DB, Managed instance &
Data warehouse
 Experience in AWS platform services such as S3, EMR, RedShift, Glue, Kinesis,
OpenSearch, Athena, QuickSight
 Experience in Apache Spark, Databricks
 Experience in creating data structures optimized for storage and various query
patterns like Parquet
 Experience in building secured visualization reports and dashboards with access
controls
 Experience in working in an Agile SDLC methodology
 Experience in DevOps Services using Git Repos, deployment artifacts and release
packages for Test & production environment
 Experience in building end-end scalable data solutions, from sourcing raw data, and
transforming data to producing analytics reports
 Should have experience in developing a complete DWH ETL lifecycle
 Experience in Data Analysis, Data Modelling and Data Mart design
 Should have experience in developing ETL processes - ETL control tables, error
logging, auditing, data quality, etc. - using ETL tools.
 Experience in Data Integrator Scripts, workflows, Dataflow, Data stores, Transforms,
and Functions.
 Should have worked on at least 2 end-to-end implementations
 Worked on Change Data Capture at both SOURCE and TARGET levels, with a good
understanding of Slowly Changing Dimensions (SCD); see the SCD sketch after this list
 Should be able to implement reusability, parameterization, workflow design
 Should have experience in interacting with customers in understanding business
requirement documents and translating them into ETL specifications and Low/High-
level design documents
 Strong database development skills like complex SQL queries, complex stored
procedures
 Able to work in an Agile framework; should have exposure to Scrum meetings.
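
A minimal sketch of the Slowly Changing Dimension Type 2 handling mentioned above, written in plain Python for clarity; the customer records and field names are hypothetical. Changed rows close out the current version and append a new one, so history is preserved rather than overwritten.

```python
from datetime import date


def scd2_merge(dimension, incoming, key, tracked, today=None):
    """Apply SCD Type 2: expire changed rows and append new versions."""
    today = today or date.today()
    current = {r[key]: r for r in dimension if r["is_current"]}
    for row in incoming:
        existing = current.get(row[key])
        if existing and all(existing[c] == row[c] for c in tracked):
            continue  # no change in tracked columns, keep the current version
        if existing:
            existing["is_current"] = False  # close out the old version
            existing["valid_to"] = today
        dimension.append({**row, "valid_from": today,
                          "valid_to": None, "is_current": True})
    return dimension


# Hypothetical example: customer 1 moves from Pune to Mumbai, producing a
# second row for that customer instead of overwriting the first.
dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
updates = [{"customer_id": 1, "city": "Mumbai"}]
for record in scd2_merge(dim, updates, key="customer_id", tracked=["city"]):
    print(record)
```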

Good to have:

 Exposure to other ETL/ELT, DWT technologies


 Hands-on with Data visualization tools like Power BI, Tableau, Qlik, QuickSight etc.
 Exposure to Python ETL and data visualization libraries

Additional Skills:

 Good Communication Skills.


 Able to deliver independently.
 Team player.
