Platform Engineer - Observability
As an SRE, you will work on reliability, availability, scalability, and efficiency for your team and the rest of the company.
What you’ll do:
- You will contribute to the architecture, design, standards, and implementation of new and existing observability systems to enhance their reliability, efficiency, and scalability with a high level of autonomy.
- You will manage existing Observability tools and help customers master them.
- You will scale systems and increase their reliability in a diverse environment, ensuring that mission-critical systems meet their 99.95% SLOs.
- You will participate in an on-call rotation with a sustainable schedule and strong tooling.
- You will be an important part of the Incident Management process and will help improve TomTom’s MTTR.
- You will support navigation tooling used around the world.
What you’ll need:
- At least 2 years of experience working in SRE/Platform/Infra organizations with cloud-native infrastructure, including Kubernetes on public cloud platforms and related technologies and tooling for cloud runtime platforms
- Experience managing large-scale vendor and/or self-hosted observability tools covering several of the following areas: logs, metrics, traces, black-box monitoring, and alerting.
- Proficiency in a modern programming or infrastructure-related language such as Python
- Experience with GitOps practices and infrastructure automation, including infrastructure as code, and with building tooling around Kubernetes (Helm/Operators/CRDs a plus)
- Experience with SRE best practices, observability, incident management, and SLIs.
- Experience with cloud-native architecture, technology, engineering, and operations (CNCF)
- Understanding of security, privacy, and compliance essentials
- Strong fundamentals in Linux, containers, networking (TCP/IP, DNS, TLS), and distributed systems
General skills and experience:
- Experience building and operating mission-critical systems.
- Product mindset
- Excellent communication and collaboration skills
Nice to have:
- CI/CD technologies, including source code management (GitHub), pipelines, secrets, artefacts.
- Experience with OpenTelemetry.
- BSc in computer engineering or equivalent
- Customer support experience
