Scaling Data Solutions the Right Way
Disclaimer: The views and opinions expressed in this article are those of the author and may not reflect the perspectives of IIBA.
Highlights
- Scaling is not doing the same work in more countries; it is building processes strong enough to survive the volume.
- At scale, undocumented decisions become risks.
- Speed without structure is just organised chaos.
At one point in our project, we had over 50 source-to-target mapping documents — one for each data source across multiple countries. Each document described field mappings, transformation rules, and ETL logic. Maintaining them by hand in Excel was becoming impossible. Things got out of sync. Mistakes appeared. The team was spending more time managing documents than doing analysis.
That moment forced us to change how we worked. We moved from Excel-based mappings to YAML format and built a small framework in Python and Jinja to generate documentation automatically from templates. By moving to a 'Docs-as-Code' approach, we ensured that a change in a global business rule was automatically reflected across all 50+ specifications, eliminating the risk of requirements drift. The time needed to create a source-to-target mapping document dropped from roughly five days to two. When transformation logic changed, we updated one file and regenerated every document instead of editing each one by hand.
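To make the idea concrete, here is a minimal sketch of the pattern, not the framework we actually built. It assumes the mappings have already been parsed into Python dictionaries (in our case they lived in YAML files), and it uses the standard library's `string.Template` as a stand-in for Jinja so the example runs without extra packages. All source names, field names, and rule texts are hypothetical.

```python
from string import Template

# Hypothetical shared rule. In a YAML-based setup this would live in one file
# that every specification refers to.
GLOBAL_RULES = {"currency_rule": "Convert all amounts to EUR using the daily rate."}

# Hypothetical per-source mappings, shown here as already-parsed data.
MAPPINGS = {
    "pos_spain": [
        {"source": "TICKET_AMT", "target": "sales_amount", "rule": "currency_rule"},
        {"source": "TICKET_DT", "target": "sale_date", "rule": None},
    ],
    "pos_poland": [
        {"source": "KWOTA", "target": "sales_amount", "rule": "currency_rule"},
    ],
}

# One row of the generated document (stand-in for a Jinja template).
ROW = Template("$source -> $target | $logic")


def render_spec(source_name, fields, rules):
    """Render one source-to-target mapping document as plain text."""
    lines = [f"Source-to-target mapping: {source_name}"]
    for field in fields:
        # Fields without a named rule are treated as straight copies.
        logic = rules[field["rule"]] if field["rule"] else "Direct copy."
        lines.append(
            ROW.substitute(source=field["source"], target=field["target"], logic=logic)
        )
    return "\n".join(lines)


def render_all(mappings, rules):
    """Regenerate every document from the same shared rules."""
    return {name: render_spec(name, fields, rules) for name, fields in mappings.items()}


docs = render_all(MAPPINGS, GLOBAL_RULES)
print(docs["pos_spain"])
```

Because each mapping refers to the rule by name rather than copying its text, changing the rule in one place and calling `render_all` again updates every generated document.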
Scaling is not doing the same thing in more countries. It is doing everything at a volume where informal processes break down and only structured ones survive.
I went through this rolling out a data solution for a global cosmetics company across multiple countries over several years. Here is what we learned.
1. Define Scope Before You Start — Properly
Before any meeting is scheduled or pipeline touched, you need a clear picture of what you are scaling into: what systems each country uses, and how their business actually works.
The first part is a quick check — does this country use the same system for processing transactions as the MVP country, or something different? Are there additional systems to include? Not a deep technical review — just enough to know whether you are working with something familiar or something new.
The second part is defining each country's edge cases. Every country has business rules you will not see until you ask the right questions. In our case, some countries had loyalty programs where customers earned bonuses they could spend on future purchases — a concept that did not exist in the MVP country at all. Some markets allowed partial returns of goods and others did not. Tax handling was different: in some countries tax was included in the product price, in others it was calculated separately. Each of these differences meant a different data model, different transformation logic, and different validation rules. None of them were visible from a system comparison alone.
Every issue found at this stage became a clear requirement: a change to the data model, a mapping rule, extra transformation logic, or a documented decision to leave it out of scope. Finding these things early costs hours. Finding them in testing costs weeks.
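The tax difference is a good illustration of how one edge case turns into transformation logic. The sketch below is illustrative only: the country codes, tax rates, and function name are hypothetical and not taken from the project. It shows a per-country rule deciding whether tax has to be extracted from the price or calculated on top of it.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical country rules. Codes and rates are placeholders for illustration.
COUNTRY_RULES = {
    "AA": {"tax_included": True, "tax_rate": Decimal("0.20")},
    "BB": {"tax_included": False, "tax_rate": Decimal("0.10")},
}

CENT = Decimal("0.01")


def net_and_tax(country, price):
    """Return (net_amount, tax_amount) for a source price, per country rule."""
    # A missing country raises KeyError on purpose: an undefined rule should
    # become a question for the business, not a silent default.
    rule = COUNTRY_RULES[country]
    price = Decimal(price)
    rate = rule["tax_rate"]
    if rule["tax_included"]:
        # Price already contains tax: back it out.
        net = (price / (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)
        tax = price - net
    else:
        # Price is net: tax is calculated separately.
        net = price
        tax = (price * rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return net, tax


print(net_and_tax("AA", "120.00"))
print(net_and_tax("BB", "100.00"))
```

Keeping the rule in configuration rather than in the pipeline code means that a new country with a known tax model is a new entry, not a new branch.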
2. Agree on a BA Approach Before Any Analysis Starts
Before any analysis started, I put together a Business Analysis Approach document — tasks, deliverables, inputs needed, and who signs off on what. I shared it with the client and we agreed on it before touching anything.
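I also put together a SIPOC (Suppliers, Inputs, Process, Outputs, Customers) diagram: one PowerPoint slide covering inputs, process steps, outputs, and who owns what. For the data ingestion preparation phase, it looked something like this:

[Figure: SIPOC diagram for the data ingestion preparation phase]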
It removes ambiguity about who provides what and who approves what — exactly the kind of ambiguity that causes delays in a multi-country rollout.
When you work across multiple countries with different local contacts, you cannot rely on memory or verbal agreements. People join late, contacts change, decisions get forgotten. This document became the reference point everyone could go back to. With one country, a missing agreement is a conversation. With two countries in parallel, it is already a coordination failure.
3. Build Operational Infrastructure Before You Scale
Three things made our work faster and more consistent across every country we onboarded:
Templates and guidelines. We created standard templates for every document — data gap analysis, source-to-target mapping, data catalog — each with step-by-step instructions. When the team grows as you add countries, quality stops depending on one person and starts coming from the process.
Automated data profiling. Checking data quality manually across multiple countries takes too long and introduces inconsistency. We built PySpark scripts in Azure Databricks to check each new dataset automatically. The scripts covered completeness (missing fields), uniqueness (fields that should never repeat), consistency, and specific business logic rules. For example: a customer paying a negative amount for goods — something to investigate. A product priced at $0.001 or $100,000 — also suspicious. A customer birth date from 1901, or a membership expiration date set before the issue date — clear data errors. These checks ran automatically for every new country dataset and produced a profiling report the BA team used to raise issues with stakeholders before any pipeline work began. Manual profiling previously took two to three days per dataset. After automation, the team spent one to two days reviewing results and writing the report — the profiling itself was done in minutes.
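The checks described above can be sketched as follows. Our scripts were written in PySpark on Azure Databricks; this version uses plain Python over a list of records so that the rule logic is visible on its own. The field names and thresholds are hypothetical, and real limits would be agreed with the business.

```python
from datetime import date

# Hypothetical thresholds, for illustration only.
MIN_PRICE, MAX_PRICE = 0.01, 10_000.00
EARLIEST_BIRTH_DATE = date(1910, 1, 1)


def profile(rows, required, unique_key):
    """Return a list of (row_index, issue) tuples for a list of dict records."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Completeness: required fields must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))
        # Uniqueness: the key must not repeat.
        key = row.get(unique_key)
        if key is not None:
            if key in seen:
                issues.append((i, f"duplicate {unique_key}"))
            seen.add(key)
        # Business rules: each one flags a row for investigation.
        amount = row.get("amount")
        if amount is not None and amount < 0:
            issues.append((i, "negative amount"))
        price = row.get("price")
        if price is not None and not (MIN_PRICE <= price <= MAX_PRICE):
            issues.append((i, "price out of range"))
        birth = row.get("birth_date")
        if birth is not None and birth < EARLIEST_BIRTH_DATE:
            issues.append((i, "implausible birth date"))
        issued, expires = row.get("issue_date"), row.get("expiry_date")
        if issued and expires and expires < issued:
            issues.append((i, "expiry before issue"))
    return issues


sample = [
    {"id": 1, "amount": 25.0, "price": 25.0, "birth_date": date(1985, 5, 1),
     "issue_date": date(2023, 1, 1), "expiry_date": date(2024, 1, 1)},
    {"id": 1, "amount": -5.0, "price": 0.001, "birth_date": date(1901, 1, 1),
     "issue_date": date(2023, 1, 1), "expiry_date": date(2022, 1, 1)},
    {"id": 2, "amount": None, "price": 100000.0},
]
for index, issue in profile(sample, required=["id", "amount"], unique_key="id"):
    print(index, issue)
```

The output is a flat list of findings per row, which is the raw material for a profiling report: the analyst's time goes into interpreting the findings, not producing them.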
A question and decision log. We kept a shared file with every open question and every agreed decision. When working across several countries in parallel, a decision made on Tuesday cannot be forgotten by Thursday.
4. Follow the Delivery Order — It Matters More Than You Think
Each deliverable feeds the next. Produce them out of order and you go back to redo work. The sequence we followed for every country:
- Data Gap Analysis — identify which business requirements can be met by the available source data, and which cannot.
- Data Ingestion Requirements — specify what entities to load, how to load them, and how often, before anything reaches the staging area.
- Data Profiling — check the quality of ingested data before it reaches the target model.
- Data Asset Catalog — describe and classify all ingested datasets for the team and stakeholders.
- Source-to-Target Mapping — define how each source field maps to the target model, including transformation logic.
- Reference Data Analysis — confirm that all reference data required by the model exists in the MDM system.
We tracked all the deliverables in a shared Excel file per country, open to the client at any time. They could see what was done, in progress, or blocked — which significantly reduced status update requests.
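As an illustration of how the delivery order and the tracker fit together, here is a small sketch, not the Excel file we used. It assumes a strictly linear dependency (each deliverable waits for the previous one), which is a simplification, and the status value "not started" is added here alongside the "done", "in progress", and "blocked" states the tracker showed.

```python
# Delivery order as listed above. The strict one-after-another dependency
# is an assumption made for this sketch.
DELIVERY_ORDER = [
    "Data Gap Analysis",
    "Data Ingestion Requirements",
    "Data Profiling",
    "Data Asset Catalog",
    "Source-to-Target Mapping",
    "Reference Data Analysis",
]


def next_deliverable(status):
    """Return the first deliverable not marked 'done', or None if all are done."""
    for name in DELIVERY_ORDER:
        if status.get(name) != "done":
            return name
    return None


def out_of_order(status):
    """List deliverables that were started while an earlier one is not done."""
    flagged = []
    earlier_all_done = True
    for name in DELIVERY_ORDER:
        state = status.get(name, "not started")
        if state != "not started" and not earlier_all_done:
            flagged.append(name)
        if state != "done":
            earlier_all_done = False
    return flagged


# Hypothetical status for one country.
country_status = {
    "Data Gap Analysis": "done",
    "Data Ingestion Requirements": "in progress",
    "Source-to-Target Mapping": "in progress",
}
print(next_deliverable(country_status))
print(out_of_order(country_status))
```

A check like `out_of_order` flags the situation the sequence is meant to prevent: mapping work that has started before its inputs are finished and is therefore likely to be redone.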
5. Two Problems We Did Not Anticipate
These came up repeatedly. Worth knowing before they happen to you.
Missing data for key metrics. Some systems did not contain the data needed for agreed KPIs. This led to long discussions. The fix: always come with options, not just the problem. Add the data at the source, pull it from another system, use a default value, or remove the metric from scope. A clear set of options makes these conversations much shorter.
System changes during the project. Because the rollout ran for several years, some countries changed their transaction systems while we were still working. Systems we had already analysed were replaced or upgraded mid-project, and we had to restart the analysis from scratch. This consumed resources we had not planned for. If your project runs for more than a year, plan for this explicitly. Decide in advance what the team will do when a source system changes — do not wait until it happens.
6. Do Not Skip User Acceptance Testing
When the pipelines were ready, we ran UAT in the live environment with limited access. Each test case defined what to check, what the expected result was, and how to flag a problem.
Automated checks validate structure and rules. They cannot validate meaning. UAT is where a local business contact looks at the numbers and says — this does not look right. That conversation, before go-live, is worth more than any test script.
What It All Comes Down To
Scaling is not a technical problem. It is a process and management problem.
The tools, the systems, the pipelines — those are solvable. What breaks at scale is everything that was never written down: the verbal agreements, the assumptions carried over from the MVP, the documents no one owns anymore. When you add countries, you add people, and every undocumented decision becomes a risk.
What worked for one country becomes a liability at five. The only way to stay in control is to make your processes explicit before the pressure hits — not after. Templates, delivery order, decision logs, client visibility: none of these are overhead. They are what lets the work move at all.
The pressure to move fast is always there. But speed without structure, at scale, is just organised chaos.
About the Author
Dzianis Kuziomkin is a Lead Business Intelligence Analyst with 8 years of experience in enterprise data projects. He specialises in data platform delivery and business analysis. He holds IIBA's CBAP and CBDA certifications and currently works at EPAM, based in Málaga, Spain.