To advance the organization by developing algorithms to build models that uncover connections and make better decisions without human intervention.
The lead data scientist role is interdisciplinary, using scientific methods to extract insights from structured and unstructured data. Algorithms, databases, and programming languages are the main ingredients of this role.
Researching and testing new technologies
Collaborating with other stakeholders
Monitoring and overseeing the company's data
Managing users and user roles
Leading the team
Initiating projects and plans
Detecting, announcing, and correcting errors
Develops large-scale data structures and pipelines to organize, collect, and standardize data, helping generate insights and address reporting needs.
Writes ETL processes, designs database systems, and develops tools for real-time and offline analytic processing.
Collaborates with the data science team to transform data and integrate algorithms and models into automated processes.
Uses knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing queries, to build data pipelines.
Uses strong programming skills in Python, Java, or any of the major languages to build robust data pipelines and dynamic systems.
Builds data marts and data models to support Data Science and other internal customers.
Integrates data from a variety of sources, assuring that they adhere to data quality and accessibility standards.
Analyzes current information technology environments to identify and assess critical capabilities and recommend solutions.
Experiments with available tools and advises on new tools to determine the optimal solution given the requirements dictated by the model/use case.
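The ETL responsibilities above can be sketched as a minimal pipeline. This is an illustrative sketch only: the file paths, column names, and cleaning rules are all hypothetical, not part of any actual role requirement.

```python
import pandas as pd

def extract(path):
    """Read raw records from a CSV source (the path is hypothetical)."""
    return pd.read_csv(path)

def transform(df):
    """Standardize raw data: drop duplicate rows, normalize column names,
    and fill missing values with zero."""
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df.fillna(0)

def load(df, path):
    """Write the cleaned data to a destination file for downstream use."""
    df.to_csv(path, index=False)
```

In a production setting each stage would typically be scheduled and monitored by an orchestration tool rather than run as plain functions.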
A degree in computer engineering,
Expertise with different types of structured and unstructured databases, such as MySQL, Postgres, and MongoDB,
Knowledge of programming languages such as Java, C++, and Python,
Knowledge of Python libraries, especially Pandas and NumPy,
Knowledge of cloud infrastructures such as AWS, Azure, and Google Cloud,
Knowledge of Linux shell scripting,
Expertise with SQL databases such as Oracle, Greenplum, and Teradata,
Experience with data streaming frameworks (Kafka, NiFi, Spark Streaming, etc.),
Expertise with big data tools such as HDFS, Hive, Sqoop, Pig, Hadoop, and Spark.
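As a small illustration of the Pandas and NumPy skills listed above, here is the kind of aggregation a candidate would be expected to write fluently; the data and column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales records.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [120.0, 80.0, 200.0, 160.0],
})

# Aggregate with Pandas, then hand the result to NumPy
# for numeric post-processing.
totals = sales.groupby("region")["amount"].sum()
log_totals = np.log(totals.to_numpy())
```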
It's always a good idea to include the benefits the company will provide, such as:
Flexible hours to give you freedom and increase productivity