Yamasoft

Site Reliability Engineer (Senior DevOps Engineer)

Categories: DevOps, Infrastructure

Posted 3 weeks ago
Anywhere
Who We Are

Yamasoft is a leading software solutions provider specializing in IoT & IIoT technologies powered by AI/ML. With over 25 years of experience in the software industry, our team brings expertise in building high-performing teams and delivering top-notch software solutions. We focus on delivering high-quality software that aligns seamlessly with our customers’ objectives.

Description

We are building a new team around a biotechnology product for infectious disease diagnostics, pharmaceutical discovery, and microbiome analysis. The team will focus on designing and implementing a distributed, cloud-based SaaS bioinformatics solution for both research and clinical diagnostics.

As an experienced Site Reliability Engineer (SRE), you will be responsible for designing, implementing, and maintaining scalable, secure, and efficient cloud infrastructure solutions. This role involves building and managing Infrastructure as Code (IaC) systems using tools like Terraform and Pulumi, ensuring adherence to security best practices, and managing incident mitigation procedures. As a technical expert, you will optimize AWS cloud service usage and ensure system reliability and availability.

Key Responsibilities

1. Infrastructure as Code (IaC):

Design, develop, and maintain Infrastructure as Code solutions using Terraform / Pulumi / CloudFormation.
Automate infrastructure provisioning, scaling, and configuration management.
Establish and enforce IaC standards, ensuring the modularity, reusability, and maintainability of configurations.
Continuously improve IaC pipelines for efficiency, reliability, and auditability.

2. Cloud Architecture Expertise:

Provide technical expertise in designing and using AWS cloud services to optimize performance and cost.
Design highly available and resilient infrastructure solutions across AWS services.
Apply AWS Well-Architected Framework principles across all architectural decisions.
Provision and optimize cloud resources following FinOps practices to keep costs under control.

3. DevOps & Automation:

Build and maintain CI/CD pipelines to support automated deployment and testing.
Develop automation tools and scripts to reduce manual operational overhead.
Collaborate with development teams to implement DevOps best practices.
Create and maintain runbooks and operational documentation.

4. Monitoring & Observability:

Implement comprehensive monitoring solutions, including Site Performance Monitoring (SPM) and Application Performance Monitoring (APM).
Design and maintain alerting systems, dashboards, and SLI/SLO frameworks.
Perform root cause analysis and post-incident reviews to drive continuous improvement.
Establish monitoring best practices across development and operations teams.

5. Network Configuration:

Configure and maintain secure and efficient networks, including AWS VPCs, subnets, routing, and security groups.
Manage VPN solutions, such as OpenVPN, for secure connectivity.

6. Security and Compliance:

Implement security best practices in AWS configurations, Linux systems, and IaC pipelines.
Conduct regular infrastructure audits for compliance and security risks (GDPR, HIPAA).
Maintain compliance documentation and evidence collection.

7. Reliability Engineering:

Develop, test, and manage Business Continuity Plans (BCP) and Disaster Recovery Plans (DRP).
Conduct reliability testing, stress tests, and performance tests.
Define and manage maintenance plans.
Lead incident response efforts to minimize downtime and impact on operations.

8. Team Coordination (a plus):

Coordinate and mentor the operations team to ensure smooth execution of tasks.
Foster collaboration between cross-functional teams and drive process improvements.

Technologies Scope

1. Infrastructure as Code (IaC):

Tools: Terraform, Pulumi, CloudFormation, and associated AWS SDKs or APIs.
IaC pipelines integration with CI/CD platforms.

2. Cloud Services:

Strong proficiency with AWS services: Active Directory (AD), LDAP, EC2, ASG, ECS, MQ, RDS, S3, VPC, subnets, security groups, etc.

3. Operating Systems:

Linux: OS management, shell scripting, performance tuning, and security hardening.

4. Automation and Containers:

Docker, Kubernetes (k8s)
Jenkins / GitHub Actions

5. Networking:

Security, proxies, load balancing, OpenVPN, etc.

6. Observability and Monitoring:

Prometheus, Loki, Grafana, ELK stack, or equivalent tools (such as Datadog, Sentry, or Zipkin).

7. Standards and Practices:

Security best practices, performance optimization, and high availability principles.
Knowledge of GDPR/HIPAA and data privacy principles.

8. Nice to Have: Data Science, ML Pipelines, and Tools:

Apache Spark, Apache Solr, Apache Iceberg, JupyterHub, etc.

Qualifications

5+ years of experience in DevOps, SRE, or a similar role.
Proven expertise in Terraform or Pulumi for Infrastructure as Code, including advanced use of modules and state management.
Strong understanding of AWS cloud services, including automation, scaling, and cost optimization.
Solid Linux administration skills, including performance tuning and shell scripting.
Experience with containerization (Docker) and CI/CD tools (Jenkins).
Strong networking knowledge, including VPN solutions like OpenVPN.
Demonstrated experience with disaster recovery planning, business continuity strategies, and incident management.
Leadership and mentoring experience, with strong communication and collaboration skills, is a plus.

What we offer

25 Days Paid Time Off
Additional Health Insurance
Multisport card
The opportunity to be among the very first team members
Excellent career development opportunities
Attractive remuneration package

If you are interested in this job offer, please send your CV in English.

All CVs will be treated in strict confidentiality. Only shortlisted candidates will be contacted.

Submit your application

Company overview

Yamasoft is a wholly Bulgarian company that opens and operates R&D centers for international companies in Bulgaria and also develops a range of client projects.
