To advance the organization by developing algorithms and building models that uncover connections and make better decisions without human intervention.
Role
The role of a Lead Data Engineer is to prepare data by fetching it from different channels and standardizing it so that it can be used easily.
Authority
Researching and testing new technologies
Collaborating with other stakeholders
Monitoring and overseeing the company's data
Managing users and user roles
Leading the team
Initiating projects and plans
Detecting, announcing, and correcting errors
Responsibility
Develops large scale data structures and pipelines to organize, collect, and standardize data that helps generate insights and addresses reporting needs.
Writes ETL processes, designs database systems, and develops tools for real-time and offline analytic processing (see the sketch after this list).
Collaborates with the data science team to transform data and integrate algorithms and models into automated processes.
Uses knowledge in Hadoop architecture, HDFS commands, and experience designing & optimizing queries to build data pipelines.
Uses strong programming skills in Python, Java, or any of the major languages to build robust data pipelines and dynamic systems.
Builds data marts and data models to support Data Science and other internal customers.
Integrates data from a variety of sources, assuring that they adhere to data quality and accessibility standards.
Analyzes current information technology environments to identify and assess critical capabilities and recommend solutions.
Experiments with available tools and advises on new tools to determine the optimal solution given the requirements dictated by the model or use case.
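By way of illustration, here is a minimal sketch of such an ETL step in Python with Pandas: extract records from a source channel, standardize them, and load them into a store. The source URL, column names, and output table are hypothetical placeholders for the example, not a reference to any particular company's stack.

```python
import sqlite3  # stand-in for a production warehouse connection

import pandas as pd

# Extract: fetch raw records from a source channel (hypothetical CSV endpoint).
raw = pd.read_csv("https://example.com/exports/orders.csv")  # placeholder URL

# Transform: standardize column names, fix types, and drop bad rows.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
raw = raw.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])

# Load: write the standardized frame to an analytics store.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("orders_clean", conn, if_exists="replace", index=False)
```

In production the SQLite target would typically be replaced by a warehouse connection (for example via SQLAlchemy), but the extract-transform-load shape stays the same.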
Requirements
A degree in computer engineering or a related field,
Expertise with different types of structured and unstructured databases, such as MySQL, Postgres, and MongoDB,
Knowledge of programming languages such as Java, C++, and Python,
Knowledge of Python libraries, especially Pandas and NumPy,
Knowledge of cloud infrastructures such as AWS, Azure, and Google Cloud,
Knowledge of Linux shell scripting,
Expertise with SQL on platforms such as Oracle, Greenplum, and Teradata,
Experience with data streaming frameworks (Kafka, NiFi, Spark Streaming, etc.; see the example after this list),
Expertise with big data technologies such as HDFS, Hive, Sqoop, Pig, Hadoop, and Spark.
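To give the streaming requirement a concrete shape, below is a minimal sketch of consuming JSON records from a Kafka topic with the kafka-python client. The topic name, broker address, and field names are assumptions made for the example.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Connect to a local broker and subscribe to a hypothetical topic.
consumer = KafkaConsumer(
    "events",                            # assumed topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Process messages as they arrive (standardize, then hand off downstream).
for message in consumer:
    record = message.value
    print(record.get("event_type"), record.get("timestamp"))
```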
Benefits
It's always a good idea to include the benefits the company will provide, such as:
Flexible hours to give you freedom and increase productivity