AI Quality Analyst

Spain / Estonia / Greece / Poland / Portugal / United Kingdom / Cyprus

Operations

Full Time

Remote

About Finom

Finom is a European tech startup headquartered in Amsterdam, and we’re on a journey towards revolutionizing the financial landscape for entrepreneurs worldwide. Our mission is to develop an all-in-one financial B2B solution that integrates banking functions, accounting, financial management, and invoicing into a seamless, mobile-first platform.

We recently closed a €115 million Series C equity round (around $133 million), bringing our total funding to approximately $346 million. This significant investment follows a $105 million growth funding round from General Catalyst, a long-term backer since 2021 known for supporting companies like Airbnb, HubSpot, KAYAK, and Stripe.

Finom's platform goes beyond traditional banking, offering invoicing and a growing suite of features, including AI-enabled accounting, aiming to simplify financial management for entrepreneurs. We're actively expanding our reach across key EU markets like Germany, France, the Netherlands, Italy, and Spain.

At Finom, we’re not just redefining the entrepreneurial experience; we’re empowering our employees to make a real difference. Your work matters, and your impact extends far beyond product metrics. We nurture innovation and an inspiring work environment where bold ideas thrive.

Maintaining our start-up spirit, we prioritize thorough research, swift implementation of solutions, and ensuring that every effort we make benefits our users, employees, partners, and, of course, our business.

In this highly cross-functional role, you will be the gatekeeper of AI safety, performance, and deterministic behavior across non-deterministic multi-agent systems. You will not be filling out generic manual testing spreadsheets or operating in a vacuum.

What You Will Be Doing

Architect Automated Evaluation Frameworks: Design, implement, and maintain scalable evaluation pipelines (Evals) for LLMs and agent graphs using modern tooling like LangSmith, DeepEval, Ragas, or Opik.
Curate Ground-Truth Benchmarks: Collaborate with domain experts to build, version, and sanitize robust gold-standard datasets, synthetic evaluation profiles, and edge-case testing matrices reflecting real-world business scenarios.
Own Non-Deterministic Quality Tracking: Define, monitor, and enforce quality KPIs across multi-agent workflows—specifically focusing on tool-calling accuracy, intent-recognition safety, structured output formatting, and context-retrieval (RAG) precision.
Mitigate and Quantify Systemic Risk: Lead rigorous failure and hallucination analyses on production outputs. Implement structured LLM-as-Judge patterns, validation metrics, and guardrail heuristics while actively ensuring the judge profiles remain free of baseline evaluation bias.
Enforce CI/CD Evaluation Gates: Partner directly with MLOps and Backend Engineering teams to integrate automated testing gates into our deployment pipelines, proactively preventing regressions or behavioral drifts from reaching production runtime environments.
Drive Optimization for Latency & Cost: Regularly analyze the efficiency of prompt templates, few-shot structures, and model selections (e.g., GPT, Claude, LLaMA) to ensure a highly calibrated balance between execution throughput, sub-second latency, and platform compute costs.
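
To give a flavor of the work, here is a minimal sketch of an evaluation gate built on tool-calling accuracy. It is an illustration only, not Finom's actual pipeline: the `EvalCase` type, the tool names, the keyword-based `fake_agent` stand-in, and the 0.9 threshold are all hypothetical, and a real setup would call the deployed agent and likely use one of the frameworks named above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One gold-standard example: a user prompt and the tool the agent should call."""
    prompt: str
    expected_tool: str


def tool_call_accuracy(cases: Sequence[EvalCase],
                       predict_tool: Callable[[str], str]) -> float:
    """Return the fraction of cases where the predicted tool matches the expected one."""
    if not cases:
        raise ValueError("evaluation set must not be empty")
    hits = sum(1 for case in cases if predict_tool(case.prompt) == case.expected_tool)
    return hits / len(cases)


def evaluation_gate(accuracy: float, threshold: float = 0.9) -> bool:
    """Return True if the build may proceed, False if it should be blocked."""
    return accuracy >= threshold


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent: routes on simple keywords."""
    text = prompt.lower()
    if "invoice" in text:
        return "create_invoice"
    if "balance" in text:
        return "get_balance"
    return "fallback"


# Hypothetical gold dataset; the last case is an edge case the stand-in agent misses.
GOLD_CASES = [
    EvalCase("Send an invoice to ACME", "create_invoice"),
    EvalCase("What is my balance?", "get_balance"),
    EvalCase("Show my account balance", "get_balance"),
    EvalCase("Bill my client 200 EUR", "create_invoice"),
]

if __name__ == "__main__":
    accuracy = tool_call_accuracy(GOLD_CASES, fake_agent)
    print(f"tool-calling accuracy: {accuracy:.2f}")
    print("gate passed" if evaluation_gate(accuracy) else "gate failed")
```

In a CI pipeline, a script like this would typically exit with a non-zero status when the gate fails, so the deployment step does not run.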

Who You Are

A Data-Savvy Automation Advocate: You possess strong software engineering fundamentals and concrete Python coding experience, allowing you to seamlessly script custom evaluation routines and query multi-tenant databases.
An Analytical Thinker with an AI Lens: You understand that testing non-deterministic LLMs requires a completely different mindset than traditional QA. You possess deep intuition for token behaviors, retrieval dynamics, prompt engineering nuances, and failure states.
Radically Autonomous & Collaborative: You do not wait around for static technical specifications. You independently coordinate syncs with AI leads, domain backend engineers, and product stakeholders to identify and patch system vulnerabilities.
Rigorously Quality-Oriented: You combine a low ego with high standards for system stability. You are deeply passionate about separating market hype from practical, measurable production metrics.

What You Will Get In Return

Make a genuine impact on the product

Join our upward trajectory, and grow with us. We provide the resources and opportunities for continuous personal and professional development, empowering you to make a genuine impact on our evolving product.

Work in the EU

Embark on this exciting journey with us and enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.

Become a stock options holder

Unlock your inner entrepreneur and align your aspirations with ours through our Stock Options Program. This exciting opportunity is available to every team member, from junior team members to our founders.

Receive unwavering support and care

Finom stands by you at every step. Our commitment to your well-being and success is reflected in our modern, friendly, and eco-conscious corporate culture, and we offer constant support and care to ensure your Finom experience is successful and fulfilling.

Work & Swim program

Immerse yourself in our exclusive Work & Swim Program. Spend one month in a comfortable corporate apartment in enchanting Cyprus. It's the ideal opportunity to strike the perfect work-life balance while enjoying breathtaking Mediterranean views.

Equal Opportunity Statement

At Finom, we're an equal opportunity employer and value diversity at our company. We embrace diversity and invite applications from all walks of life. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, disability status, or other applicable legally protected characteristics.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
