Fintech: Data Standardization is Your Next Competitive Advantage

When I took my first computer science courses in college, my professor often cited the old adage “Garbage in, Garbage out”. Most of the students in the classroom chuckled. For most of us, that was the first time we heard that expression.

What he meant was that the accuracy and completeness of the input data is the most important aspect of any analysis or experimental design. Most of us didn’t think too much of it, given that our projects, reports and dissertations all had nicely formatted and packaged information as their input, and often the data was biased to help us understand the algorithm or theory at hand.

The Struggle

Fast forward to my career at Intel and JPMorgan Chase, where I spent 80% of my time cleaning data. I never truly appreciated the importance of having accurate data until I worked overnight until 5am to get the latest manufacturing master planning schedule out at Intel, and worked for 48 hours straight at Chase to get out the latest Checking & Savings Account Profitability reports that shaped our marketing and pricing decisions.

The Struggle Continues

Recently, I found myself working at some of the most advanced data-driven Fintech companies, such as Kabbage, LoanDepot and RocketLoans. What these firms have in common is that almost all their major decisions in Marketing, Product Management, Operations and Compliance rely on accurate data.

Most of the Fintech companies in marketplace lending also report their customer data, payment information and loan performance status to centralized databases such as CRAs (Consumer Reporting Agencies), which include Consumer Credit Bureaus, Employee Screening Bureaus, Tenant Screening, Insurance and Medical databases; the list goes on and on.

Although CRAs enforce strict reporting and data formats on FinTech companies, the derivation and meaning behind each data element differ from company to company. Some examples and side effects are documented in Orchard Platform’s white paper “Making Loan Data Actionable: Transforming, translating, and assuring data quality and consistency across originators.”

The Importance of Good Data

Orchard Platform’s white paper shows us how differently one lender reports payment cycles and defaults from another. These differences make it difficult to understand the fundamentals of asset performance, and nearly impossible to compare the performance of one lender to another. To create a more efficient secondary market for marketplace lenders, we must work together to standardize loan performance definitions and the policies and procedures around non-performing loans.

Having an industry-wide standardized reporting method is the only way to differentiate and highlight the advantages of each originator. Does one originator truly have better marketing, underwriting, collections and cross-sell capabilities than the rest? We can only answer these questions once we standardize our loan performance reporting.

“Garbage In, Garbage Out”

I would like to follow up on Orchard Platform’s white paper on data standardization with a few more practical issues I’ve encountered in the field of credit risk statistical modeling. Most originators have statistical models governing their marketing, fraud, credit risk and operational activities. Sophisticated firms with access to vast amounts of data can have a real advantage over their competitors in anything from fraud detection to pricing guidelines.

However, let’s recall what my computer science professor once said: “Garbage in, garbage out.”

If an originator uses its own loan performance to build a strategy or a model to reduce fraud issues as simple as first payment default, one may argue that it is a fairly easy exercise. Everyone knows what a first payment default looks like. But do we?

Does it depend on the intent of the modeling exercise? Do we want to deflect borrowers with a tendency toward first payment default, and at what probability or degree of that tendency? If a borrower misses their first payment, do we wait 60, 90 or 120 days before the originator calls it a first payment default? Do we start counting after the payment due date, or after the grace period? Do we know whether this first payment default was due to inability to pay or to first-party fraud?
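To make the ambiguity concrete, here is a minimal Python sketch of how the same loan can be a first payment default under one definition and not under another. The `Loan` class, the `is_first_payment_default` function and the sample dates are all hypothetical illustrations, not any originator's or CRA's actual rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Loan:
    first_due_date: date
    first_payment_date: Optional[date]  # None if no payment has been received


def is_first_payment_default(loan: Loan, as_of: date,
                             days_threshold: int,
                             grace_days: int = 0) -> bool:
    """Flag a loan as FPD under one particular (assumed) definition.

    The clock starts at the first due date plus any grace period. The loan
    is flagged once the cutoff has passed and the first payment was not
    received on or before the cutoff.
    """
    clock_start = loan.first_due_date + timedelta(days=grace_days)
    cutoff = clock_start + timedelta(days=days_threshold)
    if as_of <= cutoff:
        return False  # too early to call under this definition
    paid = loan.first_payment_date
    return paid is None or paid > cutoff


# One borrower who paid the first installment 79 days late.
loan = Loan(first_due_date=date(2024, 1, 1),
            first_payment_date=date(2024, 3, 20))
as_of = date(2024, 6, 30)

print(is_first_payment_default(loan, as_of, days_threshold=60))   # True
print(is_first_payment_default(loan, as_of, days_threshold=90))   # False
print(is_first_payment_default(loan, as_of, days_threshold=60,
                               grace_days=30))                    # False
```

The borrower's behavior is identical in all three calls; only the definition changes. Two originators reporting this loan under different definitions would hand a CRA contradictory labels for the same event.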

These idiosyncrasies are often redefined depending on the acumen and experience of leadership. Even worse, the idiosyncrasies are ignored completely and the data is reported to various CRAs as the gold standard. When the next generation of originators samples data from CRAs to build its own underwriting strategy, these inaccuracies are perpetuated further, and newer originators inadvertently set incorrect loss-adjusted yield strategies, concentration limits and other charge-off and default covenants with their secondary market partners, resulting in headline risk.

Data is your competitive advantage

I am always in the camp of saving as much data as you can from day one. Many of the CEOs and CROs (Chief Risk Officers) that I know struggle with data. It is not because they don’t appreciate the value of data and information. It is because data, and especially the standardization of data, is always an afterthought. The “let the new guy figure it out” mentality runs rampant everywhere from multi-billion dollar marketplace lending companies to FinTech startups.

What do I mean by standardized data? It begins with a deep understanding of all your data needs in the next 100 years. That might be impossible, but let’s begin with a few of them as a start.

First, understand your customer performance. Whether you are running a lending or a payments company, truly understanding marketing, underwriting, transactional performance and operations efforts begins with saving all elements of each interaction, time stamped from beginning to end.

In lending, a key aspect of measuring portfolio performance is the ability to properly describe what are called “transition metrics”, that is, how a customer’s loan transitions from one account status to another over time. Most firms do not save any of this transitory information, so your $200k a year analyst can’t do much: his or her hands are tied when it comes to properly describing what is truly happening in the system. That is to say, no one will ever know how your 1-30 “days past due” customers transition into 30-60 “days past due” customers over time.

One may ask why that is important to eventual securitization efforts. To start, properly disclosing your lending portfolio doesn’t stop at a report that shows your investors the average FICO, DTI (Debt to Income) ratio, utilization rate and/or percentage of charge-offs.

If you can show your investors that the rate at which 1-30 DPD customers transition into 30-60 DPD is slowing down while your book of loans is increasing, it gives them a lot of comfort that not only are you able to grow your portfolio, you are also properly managing collections, and that this is measurable.
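As an illustration of why the saved status history matters, the sketch below builds a simple status-to-status transition table (commonly called a transition matrix or roll-rate matrix) from month-end snapshots. The bucket labels, the `transition_matrix` function and the four sample loans are assumptions made up for this example; real bucket definitions vary by originator.

```python
from collections import Counter, defaultdict

# Hypothetical month-end status history per loan, oldest first.
snapshots = {
    "A": ["current", "1-30", "30-60", "60-90"],
    "B": ["current", "1-30", "current", "current"],
    "C": ["current", "current", "1-30", "30-60"],
    "D": ["1-30", "current", "current", "current"],
}


def transition_matrix(history):
    """Return {from_status: {to_status: share}} over all consecutive months."""
    counts = defaultdict(Counter)
    for statuses in history.values():
        # Pair each month-end status with the following month's status.
        for prev, nxt in zip(statuses, statuses[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {nxt: n / total for nxt, n in row.items()}
    return matrix


matrix = transition_matrix(snapshots)
# Share of 1-30 DPD loan-months that rolled into 30-60 DPD the next month.
print(matrix["1-30"]["30-60"])  # 0.5
```

None of this can be computed if only the latest status of each loan is stored. The calculation depends entirely on keeping every time-stamped status, which is the point of saving the data from day one.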

Default happens to all portfolios, no matter how many versions of your credit risk underwriting model you have released in the past year. However, tracking defaults in a meaningful way, such as the rate at which your portfolio transitions from 60 to 90 to 120 days past due, is critical to understanding when you will break your covenants or concentration limits with your investors.
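A rough sketch of that kind of tracking: given the balances sitting in the late buckets today and assumed roll rates between them, one can estimate how much is headed for charge-off and compare it to a covenant limit. Every number here (balances, roll rates, the 3.5% limit) and the `projected_charge_off_rate` function are hypothetical, and the model deliberately ignores cures, recoveries and new originations.

```python
# Late-stage buckets in order of severity; balance leaving the last one
# is assumed to be charged off (120 days past due).
LATE_BUCKETS = ["60-90", "90-120"]


def projected_charge_off_rate(balances, roll_rates, total_balance):
    """Estimate the share of the portfolio expected to charge off.

    balances:   outstanding balance currently in each late bucket
    roll_rates: assumed probability of rolling from each bucket to the
                next worse stage
    """
    expected = 0.0
    for i, bucket in enumerate(LATE_BUCKETS):
        flow = balances.get(bucket, 0.0)
        # Push this bucket's balance through every remaining stage.
        for later in LATE_BUCKETS[i:]:
            flow *= roll_rates[later]
        expected += flow
    return expected / total_balance


balances = {"60-90": 400_000, "90-120": 250_000}
roll_rates = {"60-90": 0.6, "90-120": 0.8}   # assumed, from past transitions
covenant_limit = 0.035                        # hypothetical 3.5% limit

rate = projected_charge_off_rate(balances, roll_rates, 10_000_000)
print(f"projected charge-off rate: {rate:.2%}")
print("covenant at risk:", rate > covenant_limit)
```

The design choice is to keep the arithmetic transparent rather than precise: a warning that arrives while loans are still at 60 days past due is more useful than an exact figure at 120.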

We’ve all read the news articles reporting that certain originators have breached their covenants (risk tolerance) with their debt buyers 12 to 18 months into their agreements and a few months into their securitization efforts. These breaches can easily be prevented and corrected if you have data that reports accurate transition metrics. For example, if your 1-30 DPD customers are transitioning into 30-60 DPD at an increasing speed and you’ve recently changed your marketing, underwriting or even pricing structure, you probably want to revert to your old approach until you figure out what’s going on with your portfolio. If you instead let these credit issues flow all the way into charge-off (120 days, 4 months), it is far too late, and the conversation with your investors will not be fun.

Of course, making the aforementioned underwriting changes requires your investors’ agreement, but they make decisions based on the data you have provided to them, even if they have their own analytics team churning through this (sometimes faulty) information.

The other competitive advantage of having this information is that your analytics and modeling team now has the power to create statistical models that help your marketing, underwriting and operations teams focus on the prospects that are most responsive to your mailer, chase the customers you must reach as soon as there is a payment issue, and sell off debt that is statistically impossible to collect no matter how many man-hours you spend on the phone.

Having consulted at various large and small lending firms, I have found that fewer than a handful truly look at data as their competitive advantage. After I helped them understand how valuable their data is, their origination volumes increased and their default rates went down. Most importantly, they started driving their businesses with meaningful information from standardized data.
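The early-warning check described above can be sketched in a few lines of Python. The `roll_rate` and `is_accelerating` functions, the three-month window and the sample figures are assumptions chosen for illustration; a real monitoring rule would need thresholds agreed with your risk team.

```python
def roll_rate(prev_month, next_month, from_bucket, to_bucket):
    """Share of loans in `from_bucket` last month that sit in `to_bucket` now.

    Both month arguments map loan_id -> status. Loans missing from
    `next_month` are left out of the denominator.
    """
    at_risk = [lid for lid, status in prev_month.items()
               if status == from_bucket and lid in next_month]
    if not at_risk:
        return None
    rolled = sum(1 for lid in at_risk if next_month[lid] == to_bucket)
    return rolled / len(at_risk)


def is_accelerating(rates, periods=3):
    """True when the last `periods` observed roll rates rise every month."""
    recent = [r for r in rates if r is not None][-periods:]
    return (len(recent) == periods
            and all(a < b for a, b in zip(recent, recent[1:])))


# Hypothetical statuses for two consecutive month-ends.
january = {"A": "1-30", "B": "1-30", "C": "current", "D": "1-30"}
february = {"A": "30-60", "B": "current", "C": "current", "D": "30-60"}
latest = roll_rate(january, february, "1-30", "30-60")  # 2 of 3 loans rolled

# Hypothetical monthly history of the same 1-30 -> 30-60 roll rate.
history = [0.20, 0.19, 0.22, 0.26, 0.31]
print(is_accelerating(history))  # True
```

A `True` here, shortly after a change to marketing, underwriting or pricing, is the signal to investigate while the affected loans are still only 30 to 60 days past due.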

Yes, we can

I applaud Orchard Platform for starting the conversation and leading the charge in championing data standardization. Data standardization has far-reaching consequences beyond building a more efficient secondary market. If we have a common definition of first payment default, we can build a better consortium, like the Marketplace Lending Association, to protect borrowers from identity fraud.

Committing to standardizing the way we define and report loan performance will benefit marketing, underwriting strategies and Consumer Reporting Agencies. In turn, this will help all of us. It will bring the consistency and transparency needed to create a more efficient secondary market, and encourage investment in this asset class. It will help your organization measure itself against industry benchmarks and improve upon your existing processes and strategies. Most importantly, we will provide the best-priced product for our borrowers, which is our original mission.

I’d like to ask you, my readers, to restart the data conversation within your respective organizations and put data accuracy and data standardization first in your next discussion.


Timothy Li is founder and CEO of Kuber Financial. He has over 12 years of experience in Finance, Technology, Risk Management and is passionate about changing the finance and banking landscape. Li is also the creator of Fluid App, the next generation of credit products for Generation Z, and co-founder and President of P2P Protect, a company that offers P2P insurance products in the US.
