Yamasoft

Site Reliability Engineer (Senior DevOps Engineer)

The job listing is published in the following categories

  • Anywhere
    Tech Stack / Requirements

    Who We Are

    Yamasoft is a leading software solutions provider specializing in IoT & IIoT technologies powered by AI/ML. With over 25 years of experience in the software industry, our team brings expertise in building high-performing teams and delivering top-notch software solutions. We focus on delivering high-quality software that aligns seamlessly with our customers’ objectives.

    Description

    We are building a new team around a biotechnological product for infectious disease diagnostics, pharmaceutical discovery, and microbiome analysis. The team will focus on designing and implementing a distributed, cloud-based SaaS bioinformatics solution for both research and clinical diagnostics.

    As an experienced Site Reliability Engineer (SRE), you will be responsible for designing, implementing, and maintaining scalable, secure, and efficient cloud infrastructure solutions. This role involves building and managing Infrastructure as Code (IaC) systems using tools like Terraform and Pulumi, ensuring adherence to security best practices, and managing incident mitigation procedures. As a technical expert, you will optimize AWS cloud service usage and ensure system reliability and availability.

     

    Key Responsibilities

    1. Infrastructure as Code (IaC):

    • Design, develop, and maintain Infrastructure as Code solutions using Terraform / Pulumi / CloudFormation.
    • Automate infrastructure provisioning, scaling, and configuration management.
    • Establish and enforce IaC standards, ensuring the modularity, reusability, and maintainability of configurations.
    • Continuously improve IaC pipelines for efficiency, reliability, and auditability.

    2. Cloud Architecture Expertise:

    • Provide technical expertise in designing and using AWS cloud services to optimize performance and cost.
    • Design highly available and resilient infrastructure solutions across AWS services.
    • Apply AWS Well-Architected Framework principles across all architectural decisions.
    • Implement and optimize cloud resources following FinOps practices to ensure cost efficiency.

    3. DevOps & Automation:

    • Build and maintain CI/CD pipelines to support automated deployment and testing.
    • Develop automation tools and scripts to reduce manual operational overhead.
    • Collaborate with development teams to implement DevOps best practices.
    • Create and maintain runbooks and operational documentation.

    4. Monitoring & Observability:

    • Implement comprehensive monitoring solutions, including Site Performance Monitoring (SPM) and Application Performance Monitoring (APM).
    • Design and maintain alerting systems, dashboards, and SLI/SLO frameworks.
    • Perform root cause analysis and post-incident reviews to drive continuous improvement.
    • Establish monitoring best practices across development and operations teams.

    5. Network Configuration:

    • Configure and maintain secure and efficient networks, including AWS VPCs, subnets, routing, and security groups.
    • Manage VPN solutions, such as OpenVPN, for secure connectivity.

    6. Security and Compliance:

    • Implement security best practices in AWS configurations, Linux systems, and IaC pipelines.
    • Conduct regular infrastructure audits for compliance and security risks (GDPR, HIPAA).
    • Maintain compliance documentation and evidence collection.

    7. Reliability Engineering:

    • Develop, test, and manage Business Continuity Plans (BCP) and Disaster Recovery Plans (DRP).
    • Conduct reliability testing, stress tests, and performance tests.
    • Define and manage maintenance plans.
    • Lead incident response efforts to minimize downtime and impact on operations.

    8. Team Coordination (a plus):

    • Coordinate and mentor the operations team to ensure smooth execution of tasks.
    • Foster collaboration between cross-functional teams and drive process improvements.

     

    Technologies Scope

    1. Infrastructure as Code (IaC):

    • Tools: Terraform, Pulumi, CloudFormation, and associated AWS SDKs or APIs.
    • IaC pipelines integration with CI/CD platforms.

    2. Cloud Services:

    • Strong proficiency with AWS and related services: Active Directory (AD), LDAP, EC2, Auto Scaling groups (ASG), ECS, Amazon MQ, RDS, S3, VPC, subnets, security groups, etc.

    3. Operating Systems:

    • Linux: OS management, shell scripting, performance tuning, and security hardening.

    4. Automation and Containers:

    • Docker, Kubernetes (k8s)
    • Jenkins / GitHub Actions

    5. Networking:

    • Network security, proxies, load balancing, OpenVPN, etc.

    6. Observability and Monitoring:

    • Prometheus, Loki, Grafana, the ELK stack, or equivalent tools (such as Datadog, Sentry, or Zipkin).

    7. Standards and Practices:

    • Security best practices, performance optimization, and high availability principles.
    • Knowledge of GDPR/HIPAA and data privacy principles.

    8. Data Science and ML Pipelines and Tools (nice to have):

    • Apache Spark, Apache Solr, Apache Iceberg, JupyterHub, etc.

     

    Qualifications

    • 5+ years of experience in DevOps, SRE, or a similar role.
    • Proven expertise in Terraform or Pulumi for Infrastructure as Code, including advanced use of modules and state management.
    • Strong understanding of AWS cloud services, including automation, scaling, and cost optimization.
    • Solid Linux administration skills, including performance tuning and shell scripting.
    • Experience with containerization (Docker) and CI/CD tools (Jenkins).
    • Strong networking knowledge, including VPN solutions like OpenVPN.
    • Demonstrated experience with disaster recovery planning, business continuity strategies, and incident management.
    • Leadership and mentoring experience, with strong communication and collaboration skills, is a plus.

     

    What we offer

    • 25 Days Paid Time Off
    • Additional Health Insurance
    • Multisport card
    • The opportunity to be among the very first team members
    • Excellent career development opportunities
    • Attractive remuneration package

     

    If you are interested in this job offer, please send your CV in English.

    All CVs will be treated in strict confidentiality. Only shortlisted candidates will be contacted.