
Senior Site Reliability Engineer

Casper Labs



Full Job Description

The Senior Site Reliability Engineer will be responsible for ensuring the reliability, performance, and scalability of our blockchain platform. The ideal candidate will have extensive experience with Kubernetes, CI/CD, Terraform, and public cloud providers (AWS & IBM). This role involves collaborating with engineering teams, implementing robust infrastructure solutions, and driving continuous improvement in our operations.

Responsibilities:

  • Guidance and Mentorship: Provide technical guidance and mentorship to engineers, fostering a culture of learning and collaboration.
  • Decision Making: Assist stakeholders in making informed technical decisions that align with best practices and business goals.
  • Knowledge Sharing: Actively share knowledge and expertise with team members to enhance overall team capability.
  • Kubernetes: Manage and optimize Kubernetes clusters to ensure high availability, performance, and scalability.
  • CI/CD Pipelines: Design, implement, and maintain continuous integration and continuous deployment pipelines to streamline development and deployment processes.
  • Terraform: Utilize Terraform for infrastructure as code, ensuring consistent and repeatable infrastructure deployments.
  • AWS & IBM: Leverage public cloud services from AWS and IBM to build and maintain scalable and resilient infrastructure solutions.
  • Monitoring and Optimization: Implement monitoring and alerting systems to proactively manage and optimize cloud infrastructure.
  • Incident Management: Lead incident response efforts to quickly diagnose and resolve reliability and performance issues.
  • Continuous Improvement: Identify areas for improvement in infrastructure and operations, implementing solutions to enhance reliability and efficiency.
  • Security: Ensure infrastructure security best practices are followed and proactively address potential vulnerabilities.
  • Cross-functional Collaboration: Work closely with development, product, and operations teams to ensure alignment and effective communication.
  • Stakeholder Engagement: Engage with stakeholders to understand their needs and translate them into technical requirements and solutions.
  • Documentation: Maintain comprehensive documentation of infrastructure, processes, and procedures to ensure knowledge transfer and operational continuity.

Requirements

  • Experience: 5-15 years of experience in site reliability engineering, DevOps, or a related field.
  • Technical Skills: Proficiency in Kubernetes, CI/CD, Terraform, and public cloud providers (AWS & IBM).
  • Soft Skills: Strong communication skills, with a low-ego, approachable demeanor.
  • Problem-Solving: Excellent problem-solving skills and the ability to work independently as a self-starter.
  • Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Benefits

  • Fully remote, work-from-home environment
  • Flexible working hours
  • Paid time off
  • Periodic in-person offsites globally (travel permitting)
  • Long-term incentive programs
  • Continued education support
  • Advancement opportunities
