Case Study 04
Replacing a Legacy ETL Tool with AWS: A SaaS Migration Case Study
Flexible office marketplace SaaS · ~150 employees · Fast-growing, investment-backed · Full Pentaho to AWS migration
Results at a Glance
6 months
to fully replace Pentaho with a modern AWS stack
4 sources integrated
PostgreSQL, Pipefy, HubSpot, Segment.io
30+ pipelines
running in Airflow
The Challenge: A Legacy Tool Holding Back a Growing Data Team
The startup had closed an investment round and was scaling fast. With two data analysts on the team and five more being hired, the ambition was clear: get the whole company following KPIs and making decisions from data. Marketing, sales, product, operations: everyone needed access to reliable, centralised metrics. Pentaho was standing in the way. It was built for a different era: on-premise, rigid, expensive to maintain, and limited to the product database. It could not pull from the range of sources the team needed, and it had no room for anything more advanced. It needed to go. The goal was to replace it entirely with a modern cloud stack, without blowing the budget, and without making it so complex that only a senior engineer could keep it running.
How We Replaced Pentaho with an AWS Data Pipeline Infrastructure
We rebuilt the entire data infrastructure from scratch on AWS, removing Pentaho completely and replacing it with a stack built around Airflow for scheduling, AWS Lambda for data processing, and AWS S3 for storage, with data laid out as partitioned Parquet files.
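To make the storage layout concrete: a common approach on this kind of stack is to have each Lambda write its Parquet output under Hive-style partition keys, so query engines can prune partitions automatically. The sketch below shows only the key-building logic; the bucket name, prefix, and partition scheme are illustrative assumptions, not the exact layout used in this engagement.

```python
from datetime import date

def partition_key(source: str, run_date: date,
                  filename: str = "part-000.parquet") -> str:
    """Build a Hive-style partitioned S3 key (source/year/month/day),
    the layout that engines like Athena or Glue can discover."""
    return (
        f"lake/source={source}/"
        f"year={run_date.year}/month={run_date.month:02d}/day={run_date.day:02d}/"
        f"{filename}"
    )

# A Lambda handler would write its Parquet bytes under this key, e.g.:
#   s3.put_object(Bucket="data-lake", Key=partition_key("hubspot", today), Body=buf)
print(partition_key("hubspot", date(2024, 1, 15)))
# → lake/source=hubspot/year=2024/month=01/day=15/part-000.parquet
```

Keeping every source under the same `source=.../year=.../month=.../day=...` scheme is what makes the data "consistent and queryable" across integrations: one table definition per source, partition pruning for free.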
The first step was integrating all the sources the business actually ran on: PostgreSQL for the product, Pipefy for the sales funnel, HubSpot for marketing, and Segment.io for product analytics and user behaviour. Each source had its own shape, its own quirks, and its own data quality challenges. We worked through each one, modelled the data properly, and made it available in a consistent, queryable format.
A key part of the architecture was making it straightforward to add new sources going forward. The pipeline structure was designed to be reusable, so onboarding a new integration did not mean starting from scratch every time.
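The reusable structure described above typically takes the shape of a config-driven pipeline factory: each new source declares only its extract and transform logic, and the factory wires it into the standard pipeline. The sketch below is framework-agnostic (in Airflow this pattern usually becomes a DAG factory); all names and the sample HubSpot record are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SourceConfig:
    """Everything a new integration declares; field names are illustrative."""
    name: str
    extract: Callable[[], list[dict[str, Any]]]
    transform: Callable[[dict[str, Any]], dict[str, Any]]

def build_pipeline(cfg: SourceConfig) -> Callable[[], list[dict[str, Any]]]:
    """Return a runnable extract -> transform pipeline for one source.
    In Airflow, this is the point where a factory would emit one DAG per config."""
    def run() -> list[dict[str, Any]]:
        return [cfg.transform(record) for record in cfg.extract()]
    return run

# Onboarding a new source is just a new config, not a new pipeline from scratch:
hubspot = SourceConfig(
    name="hubspot",
    extract=lambda: [{"email": "A@Example.com "}],          # stub API call
    transform=lambda r: {"email": r["email"].strip().lower()},
)
print(build_pipeline(hubspot)())  # → [{'email': 'a@example.com'}]
```

The design choice this encodes is the one the paragraph describes: the pipeline skeleton (scheduling, loading, partitioned output) is written once, and each of the 30+ pipelines differs only in its source-specific config.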
Beyond the infrastructure itself, we worked closely with the analytics team to help them understand each source: what the data points meant, where they came from, how to interpret them correctly. We also built calculated fields to give the 7 analysts the tools they needed for deeper analysis, and were heavily involved in data collection decisions alongside the development team.
The angle throughout was the same: complex, multi-source analytics on a stack that did not require an enterprise budget to build or maintain. The Segment.io pipeline, the first one built on this new foundation, is covered in its own case study here.
Results
- Pentaho fully replaced with a modern AWS stack in 6 months
- 4 sources integrated into a single, consistent infrastructure: PostgreSQL, Pipefy, HubSpot, Segment.io
- 30+ Airflow pipelines running reliably across all sources
- 7 analysts supported by infrastructure that did not exist when the engagement started
- Reusable pipeline structure makes adding new sources straightforward without rebuilding from scratch
- Analytics team educated on every source so they could drive real value from the data
- Complex, multi-source analytics delivered without enterprise tooling costs
What's this costing your company?
Run our 2-minute calculator and find out where your company stands.
Working with a similar challenge?
Book a free 1-hour audit call and we'll tell you exactly what we'd build and why.
Book Your Free Audit Call →