Ensure adequate data infrastructure is in place to support the availability and efficient blending of traditional, unstructured and third-party data for various analytics projects;
Work closely with business owners to prepare contextual data sets that support downstream applications, such as Tableau and SAS;
Design, develop, document and implement end-to-end data pipelines and data integration processes, both batch and real time;
Perform data analysis, data profiling, data cleansing, data lineage, data mapping and data transformation;
Develop ETL/ELT jobs and workflows, and deploy data solutions;
Monitor data quality, and recommend, develop and implement ways to improve it, including reliability, efficiency and cleanliness, in order to optimize and fine-tune processes;
Recommend, execute and deliver best practices in data management and data lifecycle processes, including modular development of data processes, coding and configuration standards, error handling and notification standards, auditing standards, and data archival standards;
Maintain awareness of industry trends in regulatory compliance, emerging threats and technologies in order to understand the risks and better safeguard the company;
Highlight any potential concerns or risks and proactively share best risk management practices.
At least a Bachelor’s degree in Computer Science, Applied Math, Statistics, or a related technical field
Experience in building and optimizing ‘big data’ data pipelines, architectures and data sets.
Experience with big data tools: Hadoop, Spark, Hive, Kafka, Flume, HBase, etc.
Experience with stream-processing systems: Spark
Advanced working knowledge of SQL and experience with relational databases and query authoring, as well as working familiarity with a variety of databases, e.g. Oracle, MariaDB, Teradata, etc.
CI/CD experience (Jenkins, GitHub)
Experience with data pipeline and workflow management tools, e.g. Airflow, Jenkins, etc.
Experience performing root cause analysis on internal and external data and processes.
Possess good communication skills to understand core business objectives and build end-to-end data-centric solutions to address them
Good critical thinking and problem-solving abilities