Our mission is to help people see data in new ways, discover insights, and unlock endless possibilities. For more than four decades, we’ve delivered innovations that have helped build entire industries. The following facts and figures highlight some of the many ways we continue to deliver innovations for our customers, partners, and communities.
The Communications SRE/Ops team is an integral part of Oracle's Communications Global Industry Unit, which provides large-scale Communications Cloud solutions, such as Enterprise Communications Platform (ECP) - Realtime Communications (RTC) and Internet-of-Things (IOT) services for industry-vertical cloud services. The Communications SRE/Ops team provides the technical and operational support link between Customer Service teams, DevOps/Platform engineering teams, and SaaS/PaaS development teams. The SRE/Ops team is responsible for the overall health, performance, availability, reliability, and deployment of the Communications cloud services operating in the Oracle Cloud (PaaS/IaaS) environment. The team's mission, goals, and objectives are based on advocating and implementing Agile, DevOps, DevSecOps, Site Reliability Engineering, and Automation philosophies and best practices to achieve excellence in delivering and operating large-scale communications cloud services.
Position Description: About the Job
The Communications DevOps Engineer position provides 24x7x365 coverage and requires working in rotational shifts, including on weekends and public holidays. The primary responsibility is the daily operations and maintenance of the critical Enterprise Communications Platforms (ECP) and Communications SaaS services deployed in Oracle Cloud Infrastructure and Database environments (PaaS/IaaS). The role also participates in the design, development, and implementation of operational, CI/CD, monitoring/observability, and cloud infrastructure services and capabilities. Team members continually exercise and grow diverse skills, thrive in an agile development environment, and are given the autonomy and support to deliver the highest quality operational services possible to customers.
Roles and Responsibilities: What You'll Do
Perform day-to-day operational activities to support large-scale production communications cloud services and Enterprise Communications Platform (ECP) offerings, in adherence to Service Level Agreements and Objectives (SLA/SLO) for reliability and availability and to organizational Operational Level Agreements (OLA).
Perform duties as defined by Incident, Change, Service Request, and Problem Management processes.
Participate in the investigation, documentation, and resolution of issues affecting the cloud services and customers.
Using a data-driven process and mindset, author technical content to support the incident response process (e.g., postmortem/root cause analysis) and develop interim solutions to prevent or quickly resolve current and future issues.
When not performing 24/7 operations duties, work as a member of an Agile development team on software engineering tasks, such as designing and developing services or capabilities that increase reliability and scalability and reduce operational overhead through automation and orchestration principles.
Interact with colleagues on technical and non-functional topics related to operations and support of cloud services.
Master technical and functional areas for the communications cloud services and Enterprise Communications Platform (ECP).
Take on ongoing assignments to learn cutting-edge Communications and Cloud computing technologies, tools, and services.
Required Qualifications:
24/7/365 On-Call Shift Rotation, including on weekends and public holidays.
Required Weekday Shift assignment: Evening Shift
Occasional Travel
Career Level - IC3
Desired Qualifications: What the Perfect Candidate Will Have
Strong experience with programming in a high-level or scripting language such as Python, Java, Golang, JavaScript, or Bash.
Strong experience with telemetry, observability, and analytics tools/applications (e.g., ELK, Kibana, Grafana, Prometheus, Splunk).
Strong experience with Microservice architecture, containerization technologies (e.g., Docker), and Cloud Native principles.
Strong experience with Kafka, RabbitMQ, APIs, REST, JSON, XML or other common standards, data structures and protocols.
Strong experience with continuous integration, delivery, and deployment (CI/CD) tools and pipeline development (e.g., GitLab-CI, Jenkins, CircleCI).
Strong experience with Agile, DevOps, and DevSecOps methodologies and practices.
Strong experience with ITSM/ITIL4 and Site Reliability Engineering practices
Strong experience with web/cloud-based software version control and artifact management tools such as GitLab, GitHub, Git, Bitbucket, and Artifactory.
Strong experience with cloud orchestration tools (e.g., Kubernetes, Chef, Ansible, Puppet, Terraform, and Docker).
Experience with IOT industry technologies (MQTT, CoAP, AMQP, DDS, WebSocket, Analytics, Digital Twin, Device Mgmt., services, etc.)
Experience with networking technologies (SDN, routing, switching, IP addressing, DNS, Load balancers, etc.)
Experience with Relational Databases, SQL, Database Management tools, and Cloud Store technologies (File, Block, Object).
Experience in QA and technical problem solving, and the ability to break down complex problems across multiple domains.
Experience with operations/support processes and tools such as: Incident Management, Change Management, Problem Management, Ticketing Systems (JIRA, Service Desk), operational metrics/KPIs, Escalation processes.
Experience with major cloud platform(s): Oracle Cloud Infrastructure (OCI), Azure, Google Cloud (GCP), AWS; certification is a plus.
Excellent communication skills - verbal and written English
Motivated to learn multiple, cutting-edge technologies in the cloud industry
4-year Degree with a technical major or equivalent experience (e.g., Computer Science, Systems Engineering, Engineering, IT, etc.)