qubitsok.com
Europe, United Kingdom, Cambridge
Posted 20 days ago
🏢 Quantinuum
The System Reliability Engineer will maintain and enhance the Quantinuum Nexus cloud platform, ensuring its high performance, reliability, and security for quantum researchers. This role involves expert management of Kubernetes clusters, primarily Amazon EKS, and associated distributed systems infrastructure. Key duties include managing costs, optimizing performance through monitoring tools like OpenTelemetry and AWS CloudWatch, and collaborating with development teams to quickly resolve outages and issues.
Key Responsibilities
Manage the architecture, performance, security, and cost efficiency of managed Kubernetes instances like Amazon EKS and the distributed systems built upon them.
Collect logs, traces, and metrics using OpenTelemetry and make them available through AWS products such as X-Ray and CloudWatch to monitor Nexus performance and reliability.
Use monitoring data to ensure the Nexus platform meets high standards for performance and reliability, directing team improvements when necessary.
Report and monitor issues and outages as they occur, and diagnose their causes.
Collaborate closely with the development team, providing necessary information to quickly identify and resolve production issues.
Required Skills
Expert knowledge of managed Kubernetes instances such as Amazon EKS.
Experience working with distributed systems.
Proficiency using tools such as Helm, Karpenter, and k9s.
Experience collecting logs, traces, and metrics for distributed systems.
Experience using AWS CloudWatch to locate bugs and performance issues.
Experience improving declarative Infrastructure as Code tools such as Terraform.
Professional experience working with Python.
Nice-to-have Skills
Experience with PostgreSQL.
Experience working in a continuous deployment environment.
Experience with triaging and debugging issues in code.
Familiarity with the OpenTelemetry standard and SDKs.
Technology Tags
The role is centered on managing and maintaining the cloud-based Quantinuum Nexus platform using specific AWS products like EKS and CloudWatch.
The candidate must have professional experience working with Python for debugging and production analysis.
The role explicitly requires experience with infrastructure and observability tools like Helm, Karpenter, k9s, Terraform, and OpenTelemetry.
Expertise in working with managed Kubernetes instances and distributed systems is a core requirement for the position.
The role focuses on collecting logs, traces, and metrics to ensure the platform meets high standards for performance and reliability.
The engineer supports the Quantinuum Nexus cloud platform, which serves as an intermediary layer between researchers and quantum computers.
The role involves maintaining the performance and reliability of the platform where quantum experiments and job executions occur.