Site Reliability Engineers

Remote / Greece / Limassol / Nicosia

Engineering – Cloud DevOps / Full-time / Hybrid

Site Reliability Engineers - Multiple Openings

The Role:

You will join a team working with Observability, Escalations, Post-mortems, Correction of Errors, and other practices that will contribute to the company's goal of cloud resiliency. You will be responsible for driving processes around reliability, best practices, cultural change, and enforcement of these practices.

The main responsibilities of the position include:

Honor and practice the Resiliency pillar of the Well-Architected Framework in all tasks and responsibilities
Conduct Chaos Engineering experiments and relevant exercises to improve resiliency and fault-tolerance
Research workloads for migrating to the cloud with minimal disruption and impact
Monitor cloud migration projects to ensure seamless transitions
Design, consult on, re-platform, and refactor the observability of current cloud infrastructure
Coordinate with other IT departments and teams regarding observability for both individual and organizational needs
Regularly assess cloud deployments for compliance with the company’s standards and best practices
Investigate and correct areas where observability is lagging
Stay up to date and provide training on new and current technologies, services, tools, methodologies, and practices
Occasionally participate in service capacity planning, software performance analysis, and system tuning
Mentor colleagues in technical skills and knowledge
Analyze and oversee the company’s resiliency, and remediate any gaps
Participate in 24/7 on-call support on a rotation schedule

Main requirements:

BSc/MSc degree in Computer Science or related field
5+ years of cloud services experience, with at least 3 years on AWS cloud
3+ years of experience in SRE or a similar role
Experience with monitoring, APM, logging, and notification tools
Familiarity with incident, problem and change management procedures and practices
Advanced knowledge of SRE practices and methods
Understanding and practice of Service Levels (SLIs, SLOs, and SLAs)
Strong troubleshooting skills and the ability to mentor others
Extensive experience with Kubernetes and related technologies, services, and ecosystem
Advanced knowledge of CI/CD and Infrastructure as Code (IaC) concepts and tools, especially Terraform (HCL) and AWS CloudFormation
Experience with versioning tools like Git
Strong organizational and documentation skills
Exceptional time management and research abilities
Advanced Linux, networking, and scripting skills

The following will be considered an advantage:

Experience with platforms like Kafka (MSK)
Experience with RDBMSs, particularly Postgres and MySQL
Knowledge of scripting languages such as Python or Go

Benefit from:

Attractive remuneration package and perks
Intellectually stimulating work environment
Continuous personal development and international training opportunities

The Hiring Experience: What Awaits You

Show Your Skills – Online Technical Challenge
Let’s Connect – Intro Chat with Talent Acquisition
Deep Dive – First Interview with Your Future Team
Final Connection – Final Interview

All applications will be treated with strict confidentiality!

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.
