Site Reliability Developer 5 - Bozeman

  • Oracle
  • Bozeman, MT, USA
  • Feb 01, 2021
Full time Computer Science Software

Job Description

Oracle is making global investments to deliver the most open, secure, reliable, and transformative cloud platforms and services to our customers. We are reengineering our vast array of software and hardware assets and retooling our sales and engineering practices. With Oracle’s deeply innovative culture, unique track record of delivering the most popular software platforms, and unparalleled ability to keep a pulse on enterprise customers, we are the only company on the planet that is delivering the most compelling services at every layer of the cloud.

We are a fast-moving technology group with an ambitious mission inside Oracle: to fundamentally invert the man-machine equation for building, managing, and integrating Oracle’s cloud services. We are building state-of-the-art capabilities, engineering systems, support tools, and automation instrumentation that will fundamentally revolutionize how 20K developers develop services and how these services are managed, monitored, provisioned, integrated, and measured. We are building the platform that will form the foundation of all of the manageability instrumentation for Oracle’s Cloud applications and services.

We are looking for a proven, seasoned engineer with a track record of identifying and delivering systematic improvements that enhance the customer experience. You will support the organization responsible for driving increased availability of the IaaS services supporting Oracle’s portfolio of SaaS products.

About the Role:
In this role you will work as a member of the team, partnering with Technical Product Managers, technical SMEs, and business leaders to deliver a high level of cloud service availability for the Oracle SaaS business. This team dives deep into complex process areas across systems to recommend improvements and promote best practices that increase system availability across the IaaS fleet. Using your extensive experience and head for operational excellence, you will partner with teams across the SaaS and IaaS organizations to deliver an improved customer experience.

This is a senior-level role with interaction at the executive level.


  • Evaluate architectural models and perform/drive in-depth analysis of systems, data flow processes, and KPIs/metrics about the current state of IaaS systems
  • Develop an understanding of processes, data flows, telemetry, and team approaches to come up with strategies for driving operational improvements
  • Collaborate on and/or lead the development and maintenance of metrics and KPIs that speak to the operational health of services that support the SaaS cloud platform
  • Drive new ways to manage, monitor, provision, integrate, and measure SaaS service performance on cloud infrastructure
  • Develop automation around quantifying operational activities and analytics instrumentation
  • Articulate findings and supporting data to technical and non-technical audiences and executive leaders
  • Other duties as assigned

Minimum Qualifications

  • Bachelor's degree in Computer Science or related technical field or equivalent professional experience
  • 10+ years of experience in the field of compute/systems performance, including in large-scale, distributed cloud environments
  • 5+ years of experience supporting the development and/or operations of a cloud environment, or a technical background that enables you to understand cloud infrastructure (server/storage/network/cloud orchestration)
  • Demonstrated experience with troubleshooting, root cause analysis, and managing the identification and implementation of preventive measures
  • Significant experience with Linux-based systems including kernel-level troubleshooting, large-scale storage systems, and distributed computing environments in a cloud-infrastructure environment
  • Demonstrated experience working directly with business leaders to identify and translate business requirements into technical and functional requirements
  • Ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy
  • Familiarity with data visualization best practices, and experience designing, developing, troubleshooting, and maintaining visualizations and reports
  • Experience with object-oriented and/or functional programming languages, such as Python or Java
  • Experience with SQL/variants and large-scale metric/time-series data processing
  • Strong analytical, conceptual, and problem-solving abilities
  • Ability to synthesize complex elements into crisp and robust stories for audiences of variable technical levels
Oracle is an Affirmative Action-Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veterans status, age, or any other characteristic protected by law.

Detailed Description and Job Requirements
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

A BS or MS in Computer Science, or equivalent. Identifies and implements complex solutions using knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security, and compliance. Identifies and implements complex solutions using an understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting the technical architecture of complex and highly scalable products. A minimum of 8 years of experience running large-scale customer-facing web services.
