SRE Manager
Crusoe
This job is no longer accepting applications
Crusoe Energy is on a mission to unlock value in stranded energy resources through the power of computation.
Take a look at what we do! - https://www.youtube.com/watch?v=Rlt8k71Quqw
We aim to align the long-term interests of the climate with the future of global computing infrastructure. As data centers consume an exponentially growing amount of power to deliver technology to all connected devices, we are driven to make sure that the energy meeting that demand is sourced in an environmentally responsible fashion. Crusoe co-locates mobile data centers with stranded energy resources, like flare gas and underloaded renewables, to deliver low-cost, carbon-negative distributed computing solutions. Crusoe Cloud is a managed cloud services platform powered by stranded energy that enables climate-friendly innovation in computationally intensive fields including artificial intelligence, graphics rendering, and computational biology.
About This Role:
As the SRE Manager in Dublin, you will lead the creation and operation of a 24/7 Site Reliability Engineering team. Your primary goal is to ensure continuous availability and optimal performance of our cloud infrastructure, providing customers with uninterrupted access to their GPUs. You will design and implement advanced alerting and monitoring systems, manage incident response, and drive system improvements. Collaborating with remote teams across time zones, you will prioritize projects and streamline workflows to achieve rapid results. This role offers the opportunity to significantly impact the reliability of our cutting-edge cloud services and drive the success of our team.
A Day in the Life:
As a Site Reliability Engineering Manager at Crusoe Energy Systems, your day is a blend of people management and operational oversight. Your morning starts with one-on-one meetings and team stand-ups, focusing on guidance, support, and aligning daily goals. You'll spend about 40% of your time on team development, strategic planning, and fostering a collaborative environment.
The remaining 60% is dedicated to operational tasks, such as reviewing performance metrics, overseeing incident responses, and driving automation projects. You make sure the team meets its SLOs and that SLIs stay healthy while resolving technical issues and optimizing processes. By day's end, you review project progress and plan the next steps, maintaining a high-performing, customer-centric SRE organization.
You Will Thrive In This Role If:
You have at least 3 years of experience building and managing a 24/7 technical support team in a cloud operations environment.
You have a strong background in Linux, containerization technologies, and Kubernetes. You understand virtualization and cloud computing concepts.
You have worked with Prometheus, VictoriaMetrics, and exporters against bare-metal endpoints.
You have some experience with infrastructure as it relates to data center operations.
You’re interested in playing a key role in talent acquisition and retention. This includes diligent performance management and coaching/developing your team according to their individual needs.
You’ve developed training programs for new hires and ongoing professional development opportunities for your team members.
You like the idea of serving as a technical escalation point and ensuring the highest quality of support. You have experience implementing quality assurance measures.
You have supported, monitored, and handled Service Level Agreements (SLAs) across a variety of service categories that end customers depend on.
You have used technologies such as RabbitMQ, Kafka, Temporal, and NATS.
You can produce solid solutions in Golang or Python.
You’re strategic about tracking and reporting KPIs, with a focus on team performance and customer satisfaction. You’ve played a big part in the strategic planning for a team’s growth and scalability.
You like the idea of working with other departments to align on technical escalations, live incidents, customer needs, and feedback.
Leadership & Communication: Demonstrated leadership ability and excellent communication skills.
Problem-Solving & Adaptability: Robust problem-solving skills and adaptability in a fast-paced environment.
Project Management: Experience with project management tools and methodologies.
You must be able to pass a background check.
You embody the company values.
Benefits:
Hybrid work schedule
Industry competitive pay
Restricted Stock Units in a fast growing, well-funded technology company
Paid Parental Leave
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
Crusoe Energy is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.