How a Fortune 500 FinServ Accelerated GCP Dataproc Platform Upgrade by 6 Months
Learn how a Fortune 500 Financial Services leader accelerated its data platform upgrade by 6 months with deep data and performance observability and seamless workload validation.
80%
workload validation effort reduction
50%
faster platform upgrade
$3M
resource savings
Industry: FinTech / Digital Payments
Size: 30,000 employees
HQ: California, United States
Use Case: Platform Upgrade, Platform Modernization, Workload Validation
Data Platform: GCP Dataproc
Overview
The company operates one of the largest data platforms in the world, running thousands of Spark jobs across hundreds of teams. Over the years, this environment had grown into a massive footprint on GCP Dataproc, complemented by significant operations in BigQuery.
While data application (data engineering) teams were focused on delivering their product roadmaps and supporting business value, the data platform team faced a harsh reality – a significant portion of the platform was still running on older (legacy) Dataproc and Spark versions (e.g., Dataproc 1.x and Spark 2.x), for which Google had announced end of support.
This created a major challenge at the enterprise platform level – an elevated risk posture due to potential security vulnerabilities, meaningful platform performance limitations, and growing operational risk.
The Challenge
While the enterprise platform team needed to upgrade the platform holistically, data application teams were effectively stuck on legacy versions – there was no reliable or scalable way to validate upgrades while ensuring both data output correctness and performance parity.
First, at a foundational level, the platform lacked the monitoring and observability infrastructure needed to properly understand workload behavior. Teams had no ability to deeply profile data, pipeline execution, or performance characteristics across jobs, making it difficult to establish a baseline or evaluate the impact of any change.
Second, there was no infrastructure for safely testing workloads before production. Running the same pipeline pre- and post-upgrade on real production data in a staging environment required manual code changes – copying tables, redirecting outputs, and reconnecting inputs – a process that was tedious, error-prone, and unscalable, and that introduced risk to production systems.
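For illustration, the kind of hand-edited redirection involved might look roughly like the PySpark sketch below – the bucket paths, table names, and transformation logic are hypothetical, not the company's actual pipeline code – and every such edit had to be made, reviewed, and later reverted by hand, pipeline by pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("settlement_job_staging_run").getOrCreate()

# Reconnect inputs: read from a copied snapshot of the production table
# instead of the live source (hypothetical staging paths).
txns = spark.read.parquet("gs://staging-bucket/snapshots/transactions/")

# ...same transformation logic as the production job...
daily_totals = (
    txns.groupBy("merchant_id", "txn_date")
        .sum("amount_usd")
        .withColumnRenamed("sum(amount_usd)", "total_usd")
)

# Redirect outputs: write to a staging location so the production table is untouched.
daily_totals.write.mode("overwrite").parquet("gs://staging-bucket/outputs/daily_totals/")
```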
Third, even when teams managed to run two code versions on the same input data, there was no structured way to analyze execution differences or compare outputs across runs at a granular level. There was no visibility into how data or behavior changed at each step of the pipeline, and no way to trace issues back to a specific transformation or line of code.
So when differences did occur, debugging was slow and highly manual. Teams had no clear way to isolate whether issues were caused by engine-level changes, data discrepancies, or logic differences. Root-cause analysis often required deep investigation across multiple systems, significantly increasing time to resolution.
As a result, every upgrade became a high-risk, high-effort undertaking:
- Migrations required heavy manual validation and coordination across teams
- Engineering teams were pulled away from roadmap work to support upgrade efforts
- Timelines extended significantly due to a lack of confidence and repeatability
Without a scalable validation approach, the platform team could not confidently drive the modernization forward. What began as a necessary upgrade evolved into a strategic, enterprise-wide platform modernization effort, initially projected to take over 12 months.
Why definity
definity was introduced as the observability and validation layer for the company’s Spark and Dataproc modernization initiative, enabling teams to deeply monitor workloads and to test and compare them across platform versions in a structured and repeatable way.
At its core, definity provided a foundation of deep, out-of-the-box observability across the entire data platform. Teams gained the ability to monitor and profile behavior across data, pipelines, execution, lineage, performance, and cost – at every step of the workflow, without requiring code changes or custom instrumentation.
With definity, teams were able to take existing production Spark jobs and automatically replay them using real production inputs, while redirecting outputs to a controlled staging environment. This made it possible to run legacy and upgraded versions side-by-side without impacting production systems.
definity provided deep visibility into both executions, including:
- Granular comparison of data outputs, at every interim step
- Detailed tracking of execution behavior and performance characteristics
- Detection of schema changes, data mismatches, and logical differences
Instead of relying on manual validation, teams received automated comparative analysis between versions, including clear compatibility reporting and precise identification of deltas. When differences were detected, definity provided context to help engineers pinpoint the exact stage – down to the transformation level – and resolve issues quickly.
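For a sense of what this automation replaces, a hand-rolled version-to-version output check in plain PySpark might look roughly like the sketch below (hypothetical staging paths; this is an illustration, not definity's API):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upgrade_output_diff").getOrCreate()

# Outputs of the same pipeline run on the legacy and upgraded clusters
# (hypothetical staging paths).
legacy = spark.read.parquet("gs://staging-bucket/outputs/daily_totals_legacy/")
upgraded = spark.read.parquet("gs://staging-bucket/outputs/daily_totals_upgraded/")

# Schema changes: columns or types that differ between the two outputs.
schema_diff = set(legacy.dtypes) ^ set(upgraded.dtypes)

# Row-count parity.
count_delta = upgraded.count() - legacy.count()

# Data mismatches: rows present in one output but not the other
# (exceptAll requires matching schemas, so only check when they agree).
only_in_legacy = only_in_upgraded = None
if not schema_diff:
    only_in_legacy = legacy.exceptAll(upgraded).count()
    only_in_upgraded = upgraded.exceptAll(legacy).count()

print(f"schema diff: {schema_diff}")
print(f"row count delta: {count_delta}")
print(f"rows only in legacy: {only_in_legacy}, only in upgraded: {only_in_upgraded}")
```

Even a check like this covers only one final table of one pipeline; repeating it for every interim step across thousands of workloads, and then tracing a mismatch back to the responsible transformation, is the part that never scaled manually.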
definity transformed what was previously a long, manual, fragmented, and high-risk process into a standardized, scalable, and data-driven workflow for validating platform upgrades and code changes.
“Previously, we had to manually compare output tables to ensure correctness and then manually test performance to ensure parity. It used to take days & weeks. When you own tens of pipelines, this doesn’t scale. With definity, all we have to do is instrument the pipeline, and that's it. This is a huge step.”
Shay, Data Engineering Tech Lead
The Impact
By standardizing upgrade validation on definity, the company was able to fundamentally change how platform migrations were executed.
Platform teams could enable safe, large-scale side-by-side validation across thousands of workloads, removing the need for ad hoc testing and reducing dependency on individual teams. Data engineering groups gained confidence to upgrade without risking silent data regressions, which had previously been a major blocker.
Debugging and validation cycles were significantly reduced, allowing teams to identify and resolve issues quickly without prolonged investigation. This also simplified cross-team coordination, as validation became a shared, repeatable process rather than a fragmented effort.
Key business results included:
- 80% reduction in engineering effort required for workload validation and debugging
- Overall 50% acceleration of the Spark and Dataproc modernization program – delivered 6 months faster
- Estimated $3.1M in infrastructure and engineering resource savings
- Successful de-risking of a critical enterprise modernization initiative
By enabling reliable, repeatable validation at scale, definity allowed the company to migrate off unsupported infrastructure significantly faster – without compromising trust in data correctness or pipeline performance.
“definity helped us complete our platform upgrade 50% faster. Workload validation could not have been easier.”
Dan, Data Engineering Manager
Looking forward
With a standardized validation framework in place, the company is now evolving its data platform to support agentic upgrades and validation of any pipeline code change – enabling faster and safer data delivery at scale.
Built on definity’s seamless validation foundation, deep runtime context (MCP), and intelligent code recommendations, this new model allows teams to automatically validate changes, compare outcomes, and deploy with confidence.
In parallel, the platform is extending this approach to continuous performance and cost optimization, using the same automated recommendations with built-in validation before rollout.
“The future of our platform is AI Agents that seamlessly upgrade pipelines, validate code changes, and optimize code. definity sits at its core with its runtime MCP, pipeline control, and auto-validation.”
Prasanna, Sr Manager, Data Platform
Ready to shift to proactive observability?
Easily optimize jobs, prevent incidents in real-time, and troubleshoot issues.
No code changes. Secure in your environment.
