Data Platform Engineer
Responsibilities
Responsible for procuring and maintaining required infrastructure capabilities such as Kubernetes and storage (Hadoop/object store)
Responsible for maintaining and supporting the job scheduler required for running jobs
Responsible for supporting ADT data table creation, modification, replication, or cleanup as needed for data products or data science
Responsible for creating scripts to onboard/offboard data products to access modules like ADAM/Papyrus
Responsible for namespace administration
Responsible for CI/CD
Defines and maintains GitHub processes such as branching and PR processes to main
Defines and maintains the deployment pipeline using Rio for new data products and implements checks for dev/stage/prod
Sets up necessary access controls and creates deployment configs by environment to ensure data pipelines can run
Creates job workflows on the job scheduler and enables alerts to Slack, email, etc.
Responsible for enabling notebooks and compute needed for data science
Enables Git processes for exploratory data analysis for SQL scripts/Python code
Enables processes for running pipelines from notebooks for ad hoc analysis
Configures and maintains notebook plugins as needed for specific use cases
Supports procuring data access for DS following prescribed governance practices
Works with Amp Platform to leverage platform capabilities, for example Lakehouse, Snowflake, etc.
Strong understanding of
HDFS
ObjectStore
Containers
Kubernetes
Jenkins
Bash / Python scripting
Airflow Job Scheduler
Git and GitHub Enterprise
Working knowledge of Iceberg tables and Parquet files
Trino
Access control principles
Hive Metastores