Fintech firm Plaid noted in a blog post that credit underwriting has long relied on credit bureau data, which provides many years of structured, standardized information on loans, repayments, and delinquencies. However, Plaid added that these files have “blind spots, saying little about individuals without significant credit exposure and missing early warning signals that might be seen in other financial behaviors.”
By contrast, Plaid stated that “bank transaction data, or cash flow data, tells a richer story.”
Cash flow data “reflects how consumers earn, spend, save, and manage money day to day,” the financial infrastructure provider explained, and it captures “dynamic, real-time, and much broader financial behaviors.”
When used alongside bureau data, cash flow data can “enable lenders to spot hidden risk, identify missed opportunities, and more accurately price loans.”
However, cash flow data introduces new complexities:
- Unstructured data: Transactions come as text strings and amounts that lack consistent formatting across institutions or even across time.
- High variability: Incomes fluctuate, discretionary spending shifts seasonally, and behaviors evolve.
- Regulatory transparency: Cash flow data exposes lenders to Fair Lending and explainability concerns, which require careful treatment to avoid regulatory pitfalls.
Building a credit risk model on such data means “solving problems across data processing, feature generation, and decisioning—all while meeting compliance and interpretability expectations.”
And that’s where LendScore comes in.
Plaid explained that it set out to create a “solution that distills complex data to a single market-ready score to improve credit risk assessment when used alongside traditional data.”
At the foundation of LendScore is Plaid’s consumer-permissioned transaction data, described as “billions of records representing consumers’ inflows, outflows, and balances over time.”
According to the update, the first challenge was to “transform this unstructured data into a stable analytical base.”
Plaid also aggregates certain information into higher-level categories like “essential/discretionary spending or credit repayment,” and clusters transaction streams “to separate recurring and one-off expenses.”
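To make the categorization and clustering step concrete, here is a minimal sketch in Python. It is not Plaid’s implementation: the keyword map, the `normalize`, `categorize`, `is_recurring`, and `split_streams` helpers, and the three-day tolerance are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from datetime import date
from statistics import pstdev

# Hypothetical keyword map; a production taxonomy would be far richer.
CATEGORY_KEYWORDS = {
    "essential": ("rent", "grocery", "electric", "pharmacy"),
    "discretionary": ("cinema", "restaurant", "streaming"),
    "credit_repayment": ("loan pmt", "card payment"),
}

def normalize(description):
    """Lowercase, replace digits and punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z ]+", " ", description.lower())
    return " ".join(text.split())

def categorize(description):
    """Return the first category whose keyword appears in the cleaned text."""
    text = normalize(description)
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"

def is_recurring(dates, tolerance_days=3.0):
    """Assumed rule: at least 3 payments whose gaps vary by a few days at most."""
    if len(dates) < 3:
        return False
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return pstdev(gaps) <= tolerance_days

def split_streams(transactions):
    """Group (description, posted_date) pairs by cleaned description and
    flag each resulting stream as recurring (True) or one-off (False)."""
    streams = defaultdict(list)
    for description, posted in transactions:
        streams[normalize(description)].append(posted)
    return {name: is_recurring(dates) for name, dates in streams.items()}
```

In this sketch, stripping digits lets “GYM CLUB 001” and “GYM CLUB 002” fall into the same stream, which is one simple way to cope with descriptions that vary across statements.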
Once transactions are categorized, Plaid aggregates them across “time windows, e.g., 30, 60, and 90 days, to create features such as average monthly inflows.”
This aggregation turns unorganized transaction streams “into structured, behavioral signals suitable for modeling.”
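The windowed aggregation described above can be sketched as follows. This is an illustrative example rather than Plaid’s code: the `window_features` function, the feature names, the sign convention (positive amounts are inflows), and the 30-day month are assumptions.

```python
from datetime import date, timedelta

def window_features(transactions, as_of, windows=(30, 60, 90)):
    """Aggregate (posted_date, amount) pairs over trailing windows.

    Assumed convention: positive amounts are inflows, negative are outflows.
    """
    features = {}
    for days in windows:
        start = as_of - timedelta(days=days)
        in_window = [amt for posted, amt in transactions if start < posted <= as_of]
        inflows = sum(amt for amt in in_window if amt > 0)
        outflows = -sum(amt for amt in in_window if amt < 0)
        months = days / 30  # assumption: one "month" is 30 days
        features[f"avg_monthly_inflow_{days}d"] = inflows / months
        features[f"avg_monthly_outflow_{days}d"] = outflows / months
        features[f"net_cash_flow_{days}d"] = inflows - outflows
    return features
```

Computing the same quantity over several windows lets a model compare recent behavior with a longer baseline, for example a 30-day inflow average that has dropped well below the 90-day figure.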
From these structured signals, Plaid said it engineered hundreds of attributes and chose the 145 most predictive ones for the score “to balance the trade-off between model lift and complexity, reducing any concerns about explainability.”
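Plaid does not say how it ranked attributes, so the following is only a sketch of one common approach: score each candidate feature by the absolute value of its Pearson correlation with a binary repayment outcome and keep the top k. The `pearson` and `select_top_k` helpers and the choice of correlation as the ranking metric are assumptions, not the method behind LendScore.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_top_k(feature_columns, outcomes, k):
    """Rank features by |correlation| with the outcome and keep the k strongest.

    feature_columns maps a feature name to its list of values, aligned with
    outcomes (assumed encoding: 1 = default, 0 = repaid).
    """
    ranked = sorted(
        feature_columns,
        key=lambda name: abs(pearson(feature_columns[name], outcomes)),
        reverse=True,
    )
    return ranked[:k]
```

Capping the number of features, as Plaid describes, trades a small amount of lift for a model that is easier to document and explain.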
Plaid further stated that the vast majority, “about 81% of the predictive power comes from cash flow features, while the remaining 19% originate from Plaid’s Network Insights.”
Plaid’s network data thus offers an additional layer of “predictive power.”
Since roughly 1 in 2 US consumers connect their financial accounts through Plaid (nearly 1 million connections daily), the company can “observe network-level patterns that correlate with repayment outcomes, while keeping track of consents to ensure that consumers understand how their data is being used.”
Network Insights includes attributes such as:
- The number of connections to certain types of financial apps (e.g., lenders, savings tools, earned wage access products). For example, a higher number of lending account connections is associated with higher risk.
- Patterns of connection that distinguish between recent and historic interactions and differentiate stable relationships from risk-seeking behaviors. For example, recent lending connections are riskier than long-standing ones.
Plaid describes these signals as Plaid-only features “that come from their network of 12,000 FIs and 7,000 apps, and are not available through traditional credit bureaus or other cash flow underwriting data providers.”
They help differentiate risk among consumers who “look identical based on traditional credit or cash flow data.”
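The Network Insights attributes listed above, connection counts by app type and the split between recent and historic connections, could be derived along these lines. This is a hedged sketch: the `network_attributes` function, the attribute names, and the 90-day definition of “recent” are assumptions, since Plaid has not published the actual definitions.

```python
from collections import Counter
from datetime import date, timedelta

def network_attributes(connections, as_of, recent_days=90):
    """Summarize (app_type, connected_on) pairs into per-type counts.

    A connection counts as "recent" if it was made within the last
    `recent_days` days (an assumed threshold), otherwise as "historic".
    """
    cutoff = as_of - timedelta(days=recent_days)
    totals, recent = Counter(), Counter()
    for app_type, connected_on in connections:
        totals[app_type] += 1
        if connected_on > cutoff:
            recent[app_type] += 1
    attributes = {}
    for app_type, count in totals.items():
        attributes[f"{app_type}_connections"] = count
        attributes[f"{app_type}_recent_connections"] = recent[app_type]
        attributes[f"{app_type}_historic_connections"] = count - recent[app_type]
    return attributes
```

Under this sketch, two consumers with identical cash flow features could still differ on `lender_recent_connections`, which is the kind of separation the article attributes to network data.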