Tradu

Site Reliability Engineer

The job listing is published in the following categories

  • Anywhere

    Tech Stack / Requirements

    We are seeking an experienced Site Reliability Engineer (SRE) to join our technology team. The SRE will be responsible for ensuring the reliability, scalability, and performance of our systems and services, with a strong focus on AWS, automation, and infrastructure as code. This role blends software engineering with systems engineering, driving resilience and efficiency across our production platforms.

    Primary responsibilities (not limited to)

    • Design, build, and maintain reliable, scalable, and performant systems across AWS-based and on-premises environments, with a cloud-first approach.
    • Implement monitoring, alerting, and observability solutions to ensure visibility into system health and application performance.
    • Automate operational tasks, deployments, and configuration management to reduce manual intervention and improve efficiency.
    • Participate in incident response and postmortem processes, driving improvements to system reliability and reducing mean time to recovery (MTTR).
    • Collaborate with development teams to embed reliability, performance, and scalability into the software development lifecycle.
    • Manage capacity planning, performance tuning, and cost optimization within AWS.
    • Ensure security, compliance, and audit requirements are met in all infrastructure and operational practices.

    Requirements

    • 5+ years of hands-on experience in Site Reliability Engineering, DevOps, or related roles.
    • Strong background in Linux systems administration.
    • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.).
    • Deep experience with AWS services (EC2, ECS/EKS, RDS, S3, IAM, networking, etc.).
    • Proven expertise with tools like Puppet, Chef, or Ansible for configuration management and Terraform for infrastructure as code.
    • Strong knowledge of CI/CD pipelines and deployment automation (Jenkins, GitLab, or similar).
    • Hands-on experience with monitoring/observability tools (Prometheus, Grafana, ELK, Datadog, etc.).
    • Solid understanding of networking, load balancing, and DNS fundamentals.
    • Excellent problem-solving skills and ability to work effectively under pressure during incidents.

    Preferred Skills

    • Experience with Kubernetes or other container orchestration systems.
    • Knowledge of service-level objectives (SLOs), SLIs, and error budgeting.
    • Background in financial systems or other mission-critical, high-availability environments.

    Working Hours: 40/week, Monday–Friday. Hybrid: 3 days in-office.

    Please submit your CV in English. Only shortlisted candidates will be contacted for an interview.

    All Stratos Support EAD employees must be eligible to work in Bulgaria.

    Company Description

    Tradu is a new multi-asset global trading platform and is part of the Stratos group of companies. Tradu, built by traders for traders, provides the most sophisticated traders with a serious platform that allows them to move easily between asset classes such as stocks, CFDs and crypto, depending on the regulations that govern the trader’s market.

    Equal Opportunity Employer