High Performance Computing Manager
First Street Foundation
This job is no longer accepting applications
New York, NY, USA
Posted on Sep 22, 2024
Who we are: First Street is the industry standard for physical climate risk data. We use transparent and peer-reviewed methodologies to calculate the past, present, and future climate risk for every property in the world. We started eight years ago by working with the world’s leading climate scientists to create groundbreaking, climate-adjusted, property-specific models and haven’t stopped.
Our mission: We exist to connect climate change to financial risk.
Our data: We create physics-based, deterministic models of flooding, wildfire and hurricanes, and advanced statistical models of extreme heat, air quality, drought, hail, severe convective storms, winter storms, and more. All of this data is used to create property-level financial risk metrics and macroeconomic variables to quantify the impacts of climate change.
Our customers: We empower governments at the highest levels to make smart regulations, businesses to avoid bad investments, and everyday Americans to understand their personal risk from climate change. We are relied on every day by:
- Agencies ranging from the U.S. Department of Treasury to Fannie Mae
- The world's biggest banks such as Bank of America and Wells Fargo
- Institutional investors like Nuveen and Blackstone
- Millions of users on Redfin, Realtor.com, Homes.com, and more
Come join us and use your talents to create solutions to address humanity's biggest problem.
Position Overview: The High Performance Computing Manager will be responsible for administering and optimizing research, development, and production activities on our on-premises Linux cluster, and for managing computational workloads across various platforms, including AWS and other cloud services. This role will involve maintaining the Linux-based compute environment, installing and maintaining compute libraries and software packages, using Docker and related technologies, deploying and managing compute jobs with Slurm, developing and maintaining Bash and Python scripts, and keeping our GitHub repositories running efficiently for collaborative development.
Key Responsibilities:
- Cluster Administration: Administer and maintain an on-premises Linux cluster running Ubuntu, including system updates, performance tuning, and troubleshooting.
- Cloud Compute Management: Deploy, manage, and optimize compute jobs on AWS and other cloud platforms, ensuring seamless integration with existing workflows.
- Job Management: Utilize Slurm for job scheduling and resource management, optimizing job queues and ensuring efficient use of computational resources.
- Scripting and Automation: Develop and maintain Bash and Python scripts to automate tasks, streamline workflows, and enhance computational efficiency.
- Repository Maintenance: Oversee and manage GitHub repositories, including version control, branching strategies, and collaborative code development.
- Collaboration: Work closely with scientists, researchers and developers to understand computational needs, provide technical support, and ensure that computational resources align with project requirements.
- Documentation: Maintain comprehensive documentation for system configurations, processes, and best practices.
Qualifications:
- Education: Bachelor’s degree in Computer Science, Environmental Sciences, Applied Mathematics, or a related field. Advanced degrees or relevant certifications are a plus.
- Experience: Proven experience managing Linux clusters and commercial cloud computing platforms. Hands-on experience with Slurm job scheduling and with Bash and Python scripting is essential.
- Technical Skills:
  - Proficiency in administering Linux-based systems, specifically Ubuntu.
  - Experience with cloud computing platforms such as AWS, Azure, and/or Google Cloud.
  - Strong knowledge of Slurm for job scheduling and resource management.
  - Proficiency in Linux utilities and in Bash and Python scripting for automation and workflow optimization.
  - Experience managing GitHub repositories, including version control and collaboration tools.
- Soft Skills:
  - Strong problem-solving skills and attention to detail.
  - Excellent communication skills and the ability to work collaboratively with interdisciplinary teams.
  - Ability to manage multiple tasks and projects simultaneously in a dynamic environment.
- Nice to have skills:
  - Experience with massively parallel, cloud-based High Performance Computing.
  - Knowledge of very large datasets and of HDF/netCDF, Zarr, Xarray, and similar technologies.
  - Experience running large physics-based models, including weather forecasting (e.g. WRF) and hydrology (e.g. HEC-RAS) applications.
Our values:
- Drive: We are driven by the role we play in connecting climate change to financial risk
- Impact: We only focus on things that move the needle
- Urgency: We move quickly because the world depends on it
- Resilience: We have a growth mindset in all that we do
Benefits:
- Competitive salary commensurate with experience
- Ownership interest in the company via Employee Stock Option Plan
- Hybrid schedule with in-office work days on Monday, Wednesday, and Thursday
- 15 vacation days along with 13 company holidays and 10 sick days
- Health benefits covered at 100% for employee or a significant contribution for family plans
- Vision and dental benefits with partial employee contribution
- 12 weeks of paid parental leave
- Access to One Medical, Teladoc, Health Advocate, Kindbody, and Talkspace
- Company 401(k) program
- Commuter benefits
- Life insurance
- Tech startup environment
- Weekly team meals and an office stocked with coffee and snacks
- Working on the world’s biggest issue with other passionate professionals