The Data Engineer is responsible for developing life sciences content curation and delivery systems used to build the life sciences databases that power CAS proprietary life sciences search technologies. The role encompasses developing and deploying scientific software solutions in the life sciences information space to support transformational initiatives, delivering both short-term and long-term results to the business.
Develops data transformation and integration pipelines and the infrastructure foundations for life sciences content, in support of scientific databases and data curation. Combines strong software development and data engineering skills with a working knowledge of basic biology, chemistry, and physics to develop sophisticated informatics solutions that drive efficiencies in content curation and workflow processes. Applies data transformation and other data engineering capabilities to help build new scientific information management systems that support scientific database building.
Lead Data Engineer
Education/Experience
4-year degree in computer science, engineering, informatics, or equivalent experience
Minimum of 9 years of software development experience (proficiency in Python)
Competencies/Technologies
Proficiency in Python
Proficiency in Linux/Unix environments
Experience building applications for public cloud environments (AWS preferred)
Experience with data engineering tools and techniques is highly desired
Experience with AWS DevOps tools (Git, AWS Cloud Development Kit, CDK Pipelines) is highly desired
Experience building applications using AWS serverless technologies such as Lambda, SQS, Fargate, and S3 is highly desired
Experience working with XML and XPath is highly desired
Experience building containerized applications (Docker, Kubernetes) is a plus