Replica brands

Why DTC brands need to add a layer of trust to their data stack

The modern data stack is not dead, despite the many conversations circulating on the internet lately. Complicated? For sure. Bloated? Yes. Disconnected from reality? Most of the time. But consider the purpose of these combined technologies, which is to use data for analysis and activation: the need is greater than ever. Maybe we’ll start to call it something else, but this set of technologies and the purposes they serve still represent a major growth opportunity for most companies today, especially for brands in the DTC space, which have grown rapidly without having the time to set up the right infrastructure.

Yes, data is complex. Data modeling is complex. Finding the right tools to enhance data is… complex. Most of the time it looks more like a swamp than a stack. Either way, knowing what customers want and what they will want in the future is as important as ever (perhaps more so). Analytics are invaluable for making critical decisions, and activating data remains the best way for operations and marketing to become more efficient.

How Data Trust Happens Today

For most DTC brands today, data trust is so far down the to-do list that it’s barely visible on the horizon.

The mature stack you see today typically looks like this: data sources > data integration > data lake/warehouse > reverse ETL > destination. A Customer Data Platform (CDP) may take the place of the data lake/warehouse. Unless you work at a large company with sophisticated technical capabilities, your stack most likely looks a lot like this. Very occasionally, you’ll see a “data quality” component included, which typically tests for and documents data issues for the internal team to resolve. Data trust is an afterthought, if it is considered at all.

Depending on the solution used for each of these components, there may be some level of data cleansing or even identity resolution built into one or more of these layers, which can lead you to believe that the data your business is built on is “pretty good.”

Maybe you have a data monitoring solution that filters data that doesn’t meet certain criteria into a separate workflow, or maybe you even spent time writing the code to add one to your data storage. Maybe you have someone in-house who manually cleans the data or has spent time building a custom solution. You may know the data is a mess, but there are too many other customer-facing priorities, and cleaning it up never feels urgent. With the current talent shortage and lack of expertise available in this area, it’s easy to deprioritize.

For DTC brands trying to scale in the midst of stiff competition, data trust might just be the edge that keeps you ahead. According to Gartner, “Each year, poor data quality costs businesses an average of $12.9 million. Besides the immediate impact on revenue, in the long term, poor quality data increases the complexity of data ecosystems and leads to poor decision making.”

Why is quality even an issue with first-party data?

At first glance, it might seem that first-party raw data would give a complete and accurate picture of what is happening for a business – that each data point would represent something real that happened and could therefore be transformed into information by asking the right questions. In truth, the data is messy and full of errors.

In the DTC e-commerce industry, there are over 75 data elements associated with the average customer. These are a mix of correct, valid data and inaccurate or invalid data; more than two-thirds of them usually require cleaning or validation. That share is even higher for businesses that rely on many promotions, have changed vendors, or run multiple marketing campaigns at once. And when data is spread across systems that don’t integrate with each other, it’s no surprise that a first-party data set can have major trust issues.

Duplicate data is a small problem that can have a major business impact. Lifetime value is an important metric for DTC brands, and even a 3% rate of duplicate customer records can throw the calculation off. This can affect valuation, profitability calculations, and other very important metrics. There are endless examples like this.
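To make the duplicate problem concrete, here is a minimal sketch in Python. The order data, the record IDs, and the `canonical` mapping are all hypothetical and exist only for illustration. It assumes lifetime value is computed as average revenue per distinct customer, which is one common simplification rather than the only definition.

```python
from collections import defaultdict

def average_ltv(orders, customer_key):
    """Average revenue per distinct customer; customer_key maps an order to a customer ID."""
    revenue = defaultdict(float)
    for order in orders:
        revenue[customer_key(order)] += order["amount"]
    return sum(revenue.values()) / len(revenue)

# Hypothetical orders: the real customer "c1" also exists as the duplicate record "c1-dup".
orders = [
    {"record_id": "c1", "amount": 60.0},
    {"record_id": "c1-dup", "amount": 40.0},
    {"record_id": "c2", "amount": 50.0},
    {"record_id": "c3", "amount": 50.0},
]

# Hypothetical mapping from duplicate records back to the real customer.
canonical = {"c1-dup": "c1"}

# Treating every record as its own customer splits c1's revenue across two "customers".
naive_ltv = average_ltv(orders, lambda o: o["record_id"])

# Resolving duplicates first attributes all of c1's revenue to one customer.
deduped_ltv = average_ltv(orders, lambda o: canonical.get(o["record_id"], o["record_id"]))

print(f"naive: {naive_ltv:.2f}, deduplicated: {deduped_ltv:.2f}")
```

In this toy data set the duplicate inflates the customer count, so the naive figure understates what each real customer is worth. The size of the error in practice depends on how much revenue sits on the duplicated records.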

Autonomous data activation is simpler, faster and more efficient

For most companies that have a mature data stack, data is cleaned by some sort of rule-based filtering after it is ingested into the data warehouse or lake. But data cleaning is complicated, and there are hundreds of specific scenarios that data scientists need to plan for just to make the data accurate. After that, the data still needs to be verified, linked across data sources, and unified around real customers before it can return useful analytics that translate into effective decisions.

If these steps are performed at all, they are usually handled by an internal resource. The first step is to manually check each data point, which can be a never-ending task. (Do you know if that phone number belongs to the White House, or if the area code is even real?) Next, they might write code that matches unique identifiers such as email or phone across sources, and maybe even outsource identity resolution if they can afford it.
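The hand-built approach described above might look something like the following Python sketch. Everything here is an assumption made for illustration: the record layout, the function names, the deliberately loose email pattern, and the one-entry placeholder blocklist. It only handles US-style numbers, relying on the fact that North American area codes never begin with 0 or 1.

```python
import re

# Hypothetical blocklist of placeholder numbers; a real one would be far longer.
PLACEHOLDER_PHONES = {"5555555555"}

def normalize_email(raw):
    """Lowercase and trim an email; return None if it is not shaped like an address."""
    email = raw.strip().lower()
    return email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None

def normalize_phone(raw):
    """Reduce a US-style phone number to 10 digits; return None if it looks invalid."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    # North American area codes never start with 0 or 1.
    if len(digits) != 10 or digits[0] in "01" or digits in PLACEHOLDER_PHONES:
        return None
    return digits

def record_keys(rec):
    """Return the valid, normalized identifiers found on a record."""
    keys = (normalize_email(rec.get("email", "")), normalize_phone(rec.get("phone", "")))
    return [k for k in keys if k]

def match_across_sources(source_a, source_b):
    """Pair records from two sources that share a normalized email or phone."""
    index = {}
    for rec in source_a:
        for key in record_keys(rec):
            index.setdefault(key, rec["id"])
    matches = set()
    for rec in source_b:
        for key in record_keys(rec):
            if key in index:
                matches.add((index[key], rec["id"]))
    return matches

# Hypothetical records from a storefront and an email platform.
shop = [
    {"id": "s1", "email": " Ana@Example.com ", "phone": "(415) 555-0100"},
    {"id": "s2", "email": "", "phone": "555-555-5555"},
]
esp = [
    {"id": "e1", "email": "ana@example.com"},
    {"id": "e2", "phone": "+1 415 555 0100"},
    {"id": "e3", "phone": "5555555555"},
]

matches = match_across_sources(shop, esp)
print(sorted(matches))
```

Even this small sketch needs separate rules for formatting, country codes, and placeholder values, which hints at why the number of scenarios to plan for grows so quickly.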

A data trust layer completely removes the hands-on time (weeks or even months) spent by employees, as well as the margin for error that comes with building something from scratch. It isn’t tied to a legacy system, isn’t disrupted when you change or add solutions as the business grows, and doesn’t require a data scientist to understand or operate it. This means the marketing department can get its data cleaned up before the next big campaign, without IT having to think twice.

Deterministic identity resolution is not enough

If your company uses a CDP to activate data (an expensive option that only pays off if you use all of its features), there is likely a level of identity resolution based on deterministic matching. While matching on exact information is an important first step in identity resolution, it leaves a lot on the table. Yes, there are unique phone numbers and email addresses across all systems that can be easily assigned to specific users, and it is essential to do this step. But if your CDP only matches deterministically, a lot of data remains unresolved.

One of the most obvious reasons deterministic identity resolution fails to find matches is that customers use different phone numbers or email addresses (or make mistakes) when interacting with a brand. Deterministic matching can also create incorrect matches if the supposedly unique identifiers are not valid. For example, many people use fake or public phone numbers or email addresses, and unrelated people who enter the same one can then be merged under a common user ID.
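The false-merge problem can be illustrated with a short Python sketch. The records, the throwaway address, and the `ignored` set are hypothetical, and the grouping uses a standard union-find structure as a stand-in for whatever a given CDP actually does internally.

```python
def cluster_by_shared_identifier(records, ignored=frozenset()):
    """Group record IDs that share any identifier value, skipping values in `ignored`."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for rec in records:
        for value in rec["identifiers"]:
            if value in ignored:
                continue
            if value in first_seen:
                # Same identifier seen before: merge the two clusters.
                parent[find(rec["id"])] = find(first_seen[value])
            else:
                first_seen[value] = rec["id"]

    clusters = {}
    for rec in records:
        clusters.setdefault(find(rec["id"]), set()).add(rec["id"])
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical records: r1 and r2 are the same shopper; r3 and r4 are strangers
# who both typed the same throwaway address at checkout.
records = [
    {"id": "r1", "identifiers": ["ana@example.com", "4155550100"]},
    {"id": "r2", "identifiers": ["ana@example.com"]},
    {"id": "r3", "identifiers": ["none@example.com"]},
    {"id": "r4", "identifiers": ["none@example.com", "2125550199"]},
]

naive = cluster_by_shared_identifier(records)
filtered = cluster_by_shared_identifier(records, ignored={"none@example.com"})
print(naive)
print(filtered)
```

With no validation, the shared throwaway address pulls two unrelated shoppers into one profile. Excluding identifiers known to be invalid keeps them apart, but only if someone has already identified which values are invalid.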

Although relying exclusively on deterministic matching has obvious problems, many probabilistic matching algorithms are not the solution either. Many of them only allow data to be compared within the same source. They are also very computationally intensive, may require manual labeling, and need frequent fine-tuning.

For identity resolution to create a complete and accurate picture of the customer, it requires both types of data matching: first matching on deterministic data, then applying advanced machine learning to perform probabilistic matching. This is not offered by any warehouse today, and fewer than a handful of CDPs provide the solution at an affordable price. Decoupling these solutions is likely a cheaper and more effective option for your brand.
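The two-stage idea can be sketched in Python as follows. This is only an illustration under stated assumptions: a simple string-similarity score from the standard library's `difflib` stands in for the machine learning model, and the record fields, the 0.85 threshold, and the rule of only comparing records within the same postal code are all hypothetical choices.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a trained matching model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve(record, known_customers, threshold=0.85):
    """Return (customer_id, method) for a record, or (None, 'unmatched')."""
    # Stage 1: deterministic match on an exact, normalized email.
    email = record.get("email", "").strip().lower()
    for cust in known_customers:
        if email and email == cust["email"]:
            return cust["id"], "deterministic"

    # Stage 2: fuzzy match on name when no exact key matches.
    best_id, best_score = None, 0.0
    for cust in known_customers:
        if record.get("zip") != cust["zip"]:
            continue  # hypothetical blocking rule: only compare within a postal code
        score = similarity(record.get("name", ""), cust["name"])
        if score > best_score:
            best_id, best_score = cust["id"], score
    if best_score >= threshold:
        return best_id, "probabilistic"
    return None, "unmatched"

# Hypothetical known customers.
customers = [
    {"id": "c1", "name": "Jonathan Smith", "email": "jon@example.com", "zip": "94107"},
    {"id": "c2", "name": "Maria Garcia", "email": "maria@example.com", "zip": "10001"},
]

exact = resolve({"name": "J. Smith", "email": "JON@example.com", "zip": "94107"}, customers)
fuzzy = resolve({"name": "Jonathon Smith", "email": "jsmith@work.example", "zip": "94107"}, customers)
nobody = resolve({"name": "Priya Patel", "email": "priya@example.com", "zip": "94107"}, customers)
print(exact, fuzzy, nobody)
```

Running the deterministic stage first keeps the cheap, high-confidence matches out of the more expensive fuzzy comparison, and the threshold controls how much risk of a wrong merge the brand is willing to accept.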

Trust in data drives decisions that result in growth

No matter where the modern data stack goes, trust is an essential piece of the puzzle for DTC brands. Given the purpose of modern data – analysis and activation – there’s really no point if the data doesn’t represent what’s actually happening IRL. If your brand doesn’t fully understand the customer journey, making decisions that result in sustainable, scalable growth is a shot in the dark.


Orita Co-founder and CEO Daniel Brady (DB) loves a challenge, as evidenced by his doctorate in neurobiology from Harvard University. After solving the messy data problem for e-commerce brands time and time again, he set out with partner Zack Gow to solve it once and for all. Using advanced machine learning technology and extensive experience gained the hard way, they have made it easier and cheaper than ever to clean, verify and unify customer data around IRL customers. Today, DB spends his time helping DTC brands get the clean data they need to make better decisions.