Xebia

GCP HPC DevOps Engineer


The job listing is published in the following categories

  • Anywhere

    Tech Stack / Requirements

    You will be:

    • leading the migration of on-premises SLURM-based HPC (High-Performance Computing) clusters to Google Cloud Platform,
    • designing, implementing, and managing scalable and secure HPC infrastructure solutions on GCP,
    • optimizing SLURM configurations and workflows to ensure efficient use of cloud resources,
    • managing and optimizing HPC environments, focusing on workload scheduling, job efficiency, and scaling SLURM clusters,
    • automating cluster deployment, configuration, and maintenance tasks using scripting languages (Python, Bash) and automation tools (Ansible, Terraform),
    • integrating HPC software stacks using tools like Spack for dependency management and easy installation of HPC libraries and applications,
    • deploying, managing, and troubleshooting applications using MPI, OpenMP, and other parallel computing frameworks on GCP instances,
    • collaborating with engineering, support teams, and stakeholders to ensure smooth migration and ongoing operation of HPC workloads,
    • providing expert-level support for performance tuning, job scheduling, and cluster resource optimization,
    • staying current with emerging HPC technologies and GCP services to continually improve HPC cluster performance and cost efficiency.

    Your profile:

    • 5+ years of experience with HPC (High-Performance Computing) environments, including SLURM workload manager, MPI, and other HPC-related software,
    • extensive hands-on experience managing Linux-based systems, including performance tuning and troubleshooting in an HPC context,
    • proven experience migrating and managing SLURM clusters in cloud environments, preferably GCP,
    • proficiency with automation tools such as Ansible and Terraform for cluster deployment and management,
    • experience with Spack for managing and deploying HPC software stacks,
    • strong scripting skills in Python, Bash, or similar languages for automating cluster operations,
    • in-depth knowledge of GCP services relevant to HPC, such as Compute Engine (GCE), Cloud Storage, and VPC networking,
    • strong problem-solving skills with a focus on optimizing HPC workloads and resource utilization,
    • you must work from within the European Union and hold a valid work permit.

    Nice to have:

    • Google Cloud Professional DevOps Engineer or similar GCP certifications,
    • familiarity with GCP’s HPC-related offerings, such as Preemptible VMs and HPC VM images, and with cost-optimization strategies,
    • experience with performance profiling and debugging tools for HPC applications,
    • advanced knowledge of HPC data management strategies, including parallel file systems and data transfer tools,
    • understanding of container technologies (e.g., Singularity, Docker) specifically within HPC contexts,
    • experience with Spark or other big data tools in an HPC environment.

    Recruitment Process:

    CV review – HR call – Technical Interview – Client Interview – Decision