For consumers and small businesses, access to affordable credit plays a crucial role in bridging the gap between short-term cash inflows and outflows as well as supporting long-term financial health.
Traditionally, access to credit has depended on the credit reporting and scoring systems that most financial institutions use to assess applications. This leaves out consumers who may have the ability to pay but are poorly rated, as well as younger consumers who haven’t had enough time to build credit. Small business credit often depends on the creditworthiness of the business owner rather than the potential of the business, which can limit the business’s access to credit.
With the data boom of the last decade, financial institutions now have access to much more information about their consumers. Using this non-bureau “alternative” credit data can help lenders better understand a consumer’s risk. This data can also help reduce the operational expense of collecting and verifying consumer documents. For example, access to a consumer’s bank transactions can help verify a pay stub when the lender can see income deposits arriving every two weeks.
Lenders may have access to a variety of alternative data about their potential customers. These data fall roughly into two groups: data from providers and data generated by lenders. Lenders can generate data about their consumers in several ways. They have visibility into the source of solicitation, such as offers by mail or direct visits. Cookies can help identify where the consumer is coming from; for example, a consumer reads a newspaper article about loan consolidation and clicks on an advertisement posted by the lender. Lenders may also use information about how a consumer interacts with the online application and how much time they spend on it.
Provider-based data sources can supply much more detailed consumer information. Information about utility bills and rent payments can be very helpful in predicting a consumer’s creditworthiness. Educational history can be an indicator of a consumer’s future ability to repay loans. Providers can also give access to a consumer’s online activity, which can be an indicator of lifestyle and social choices. Device types and location information derived from IP addresses can help personalize user experiences, but can also be a signal of creditworthiness. Finally, lenders can access consumer cash flow and bank account information through providers. This cash flow data provides signals that are not present in traditional bureau scores and, more importantly, it is continuous and available in real time.
Many fintechs have entered the credit market over the past decade. They operate almost exclusively online and rely on automated underwriting models and algorithms. Many fintechs issue and hold these loans, while several operate as service providers for traditional banks. Lenders often use the services of other technology companies operating in payments, accounting, and data transfer. All of this information needs to feed into the lender’s automated underwriting algorithms so that the lender can make an offer to a consumer in real time.
Use of alternative data for underwriting
Alternative data can be used to support several aspects of loan underwriting, including the calculation of intent and price elasticity, advanced credit scoring, and fraud detection. Different data sources can help with one or more of these tasks.
Lenders can make competitive loan offers if they can generate measures of consumer “intent”, that is, the likelihood that a consumer will accept a given offer. In addition, offers can be adjusted if the consumer’s price “elasticity” is known (price elasticity gives us the percentage change in intent relative to a unit change in the offer).
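As a sketch of how these two measures relate, they can be written as follows. The notation is ours and is assumed for illustration: $p$ stands for the offer terms (for instance the quoted rate), $I(p)$ for intent, and $\varepsilon(p)$ for elasticity as defined in the parenthesis above.

$$
I(p) = \Pr(\text{consumer accepts} \mid \text{offer } p),
\qquad
\varepsilon(p) = \frac{100}{I(p)}\,\frac{dI(p)}{dp} \;\approx\; 100 \cdot \frac{I(p+1) - I(p)}{I(p)}
$$

For example, if intent falls from 0.40 to 0.36 when the offer moves by one unit, then $\varepsilon \approx 100 \cdot (0.36 - 0.40)/0.40 = -10\%$.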
Intent can be estimated using several different data sources. In-session signals that track where the consumer came from are very useful. For example, a consumer who arrives after specifically searching for “loan consolidation” is more likely to accept an offer. Consumers can also arrive from blog posts or other content published by a lender or its affiliates, or after clicking on advertisements.
Clickstream data can help a lender understand a consumer’s price elasticity. Visiting a competitor or applying for a loan there not only indicates high intent, but also that the consumer is “shopping around” for a deal. A lender can then make a better offer or propose alternative plans. Other variables that can be used to calculate intent and elasticity include educational background, cash flow, income, and other signals generated during the session while the consumer completes the application.
Lenders should add intent/elasticity modules to their automated underwriting algorithms only if they see significant improvement potential for their business. To assess that potential, lenders must record all of these variables for both issued and rejected loans. Once they have a reasonably large sample, simple statistical tests can show whether the potential is there. Intent and elasticity metrics can be generated using models based on logistic regression. Other, more advanced methods or algorithms can also be used depending on the use case.
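A minimal sketch of a logistic-regression intent model follows, assuming a toy history of past offers. The two features (whether the consumer arrived from a search, and the offered rate relative to 10%), the data, and all function names are hypothetical; a production model would use a tested statistics library and far more data.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict_intent(w, x):
    """Estimated probability that the consumer accepts the offer."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def elasticity_pct(w, x, rate_index=1):
    """Percentage change in intent for a one-unit increase in the offered rate."""
    bumped = list(x)
    bumped[rate_index] += 1.0
    base = predict_intent(w, x)
    return 100.0 * (predict_intent(w, bumped) - base) / base

# Hypothetical history: [arrived_from_search (0/1), offered_rate_minus_10pct]
offers = [[1, -2.0], [1, -1.0], [1, 0.0], [1, 2.0],
          [0, -2.0], [0, -1.0], [0, 0.0], [0, 2.0]]
accepted = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = offer accepted

weights = fit_logistic(offers, accepted)
intent_search = predict_intent(weights, [1, 0.0])
intent_direct = predict_intent(weights, [0, 0.0])
```

On this toy data the fitted model gives higher intent to the search-driven visitor than to the direct visitor at the same rate, and `elasticity_pct` is negative, since a higher rate lowers the estimated acceptance probability.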
Cash flow data acquired from transactions in a consumer’s bank account can be used for two specific purposes in underwriting. It is often used to verify the information provided by the consumer in the loan application, but it can also be used to “boost” traditional credit scores. Transaction data allows lenders to take a “second look” at a loan application when the standard approval process, based on traditional bureau variables, would reject the loan. This allows lenders to extend credit to consumers who otherwise would not be served.
Lenders can access this transactional information with the consumer’s approval. Typically, the lender wants to verify the information the consumer enters in the loan application, such as identity, income, and employment. The verification succeeds when we observe a series of bi-weekly or monthly deposits that add up to the annual income entered by the consumer. The source of these deposits can verify employment information. Integrating this check into an automated underwriting algorithm is fairly straightforward.
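The income check described above can be sketched as follows. This is an illustration under stated assumptions: the function name, the 10% tolerance, and the day ranges used to recognize bi-weekly and monthly pay cycles are hypothetical, and the sketch ignores the difference between gross stated income and net deposits.

```python
from datetime import date

def verify_income(deposits, stated_annual_income, tolerance=0.10):
    """Return True if payroll-like deposits annualize to roughly the stated income.

    deposits: list of (date, amount) tuples for recurring income deposits.
    """
    if len(deposits) < 2 or stated_annual_income <= 0:
        return False
    deposits = sorted(deposits)
    # Days between consecutive deposits reveal the pay cycle.
    gaps = [(later[0] - earlier[0]).days
            for earlier, later in zip(deposits, deposits[1:])]
    typical_gap = sorted(gaps)[len(gaps) // 2]
    if 13 <= typical_gap <= 15:
        periods_per_year = 26   # bi-weekly pay
    elif 28 <= typical_gap <= 31:
        periods_per_year = 12   # monthly pay
    else:
        return False            # no recognizable pay cycle
    average_deposit = sum(amount for _, amount in deposits) / len(deposits)
    annualized = average_deposit * periods_per_year
    return abs(annualized - stated_annual_income) <= tolerance * stated_annual_income

# Hypothetical applicant paid 2,000 every two weeks.
paychecks = [
    (date(2025, 1, 3), 2000.0),
    (date(2025, 1, 17), 2000.0),
    (date(2025, 1, 31), 2000.0),
    (date(2025, 2, 14), 2000.0),
]
income_verified = verify_income(paychecks, stated_annual_income=52000.0)
```

Here four deposits of 2,000 spaced 14 days apart annualize to 52,000, so a stated income of 52,000 passes while a stated income of 80,000 would not.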
Cash flow data can also be used to “boost” credit scores. The bureaus report credit scores, but these are often static in nature and do not take into account all of a consumer’s financial activities. These activities include rent or utility payments (which are not always reported to consumer reporting agencies), income, and expenses. For example, a recent graduate without a significant credit history will receive a low score, but may have a stable income, strong cash flow, and a consistently positive balance history. Such a consumer is likely to behave like a consumer with a much higher credit score. On the other hand, a consumer with a small gap between income and expenses may not have the ability to repay a loan that their credit score would otherwise suggest. Thus, transactional data can be used to adjust the assessment of a consumer’s ability to repay a loan either upward or downward.
Calculating these score “differentials” and integrating them into an automated underwriting algorithm is not trivial. Risk models have to be developed and tested empirically to measure their predictive power. This requires a reasonably sized sample of loan applications coupled with detailed cash flow data and loan repayment history. These models can be simple classification models that bucket scores, or more advanced machine learning algorithms when enough data is available. Lenders can build internal teams to develop these models. Alternatively, they may contract with third-party vendors and use their proprietary algorithms.
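To make the idea of a score differential concrete, here is a deliberately simple, rule-based sketch. Everything in it is assumed for illustration: the feature names, the point adjustments, and the 660 cutoff are placeholders, not calibrated values. As noted above, a real differential would come from an empirically tested risk model.

```python
from statistics import mean, pstdev

def cash_flow_features(monthly_inflows, monthly_outflows):
    """Summarize monthly bank-account inflows and outflows into a few features."""
    net = [i - o for i, o in zip(monthly_inflows, monthly_outflows)]
    avg_inflow = mean(monthly_inflows)
    return {
        "net_to_income": mean(net) / avg_inflow if avg_inflow else 0.0,
        "income_volatility": pstdev(monthly_inflows) / avg_inflow if avg_inflow else 0.0,
        "negative_months": sum(1 for n in net if n < 0),
    }

def score_differential(features):
    """Placeholder point adjustments; illustrative only, not a fitted model."""
    diff = 0
    if features["net_to_income"] >= 0.20 and features["negative_months"] == 0:
        diff += 30   # comfortable income-expense differential
    if features["income_volatility"] <= 0.10:
        diff += 10   # stable income
    if features["net_to_income"] < 0.05 or features["negative_months"] >= 2:
        diff -= 30   # little room left to service a new loan
    return diff

def passes_with_cash_flow(bureau_score, features, cutoff=660):
    """Apply the differential to the bureau score and compare with the cutoff."""
    return bureau_score + score_differential(features) >= cutoff

# Hypothetical recent graduate: thin file, but stable income and healthy surplus.
graduate = cash_flow_features([3000] * 6, [2200] * 6)
# Hypothetical consumer with a decent score but almost no monthly surplus.
stretched = cash_flow_features([3000] * 6, [2950] * 6)

graduate_approved = passes_with_cash_flow(630, graduate)
stretched_approved = passes_with_cash_flow(670, stretched)
```

In this toy example the graduate’s 630 score is lifted by 40 points and clears the cutoff on the second look, while the stretched consumer’s 670 score is lowered by 20 points and does not.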
Data infrastructure for the use of alternative data
Thanks to advancements in cloud computing, we can now ingest and process data, and send triggers back to online applications, in seconds. An illustration of the data flow is shown above. The data warehouse serves as a central repository for all current and historical applicant information, which can then be fed into online and batch systems.
To better illustrate the workflow, we consider the following use case. A customer with an existing banking relationship wants to apply for a personal loan. She received a pre-approved offer, so she has an invite code. Her FICO score is marginal, but she has a direct deposit with us and has stable cash flow.
Using the power of cloud computing, we can enhance our traditional underwriting models to include the other relationships the customer has with us. Also, as she is already a client, we can initiate an “in-session” conversation to request any additional information required, so that we can make a decision while the client is available. Let’s take a closer look at how this works from a data perspective.
- The customer enters her invitation code on the start page of the application.
- This information is transmitted to our application service in the middleware layer, which returns the client identifier associated with the invitation code.
- We then ping the pHub (People Hub) to pre-populate other information associated with the customer.
- This allows the client to confirm that it is indeed her and that the information provided is correct.
- Once she is authenticated, we have the following client information (all linked via client ID):
  - Risk profile
    - Her credit profile, using traditional credit data
    - Her cash flow, using banking information
    - All potential regulatory indicators, if any
  - Offer preferences
    - Her primary channel of choice for inbound and outbound communication
    - Any friction points she faces throughout the application flow, so that we can engage her “in session” or “out of session”, as appropriate
With the above information, we can now adjust the loan decision in real time with a personalized message, increasing customer engagement.
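The lookup steps listed above can be sketched as follows. This is a minimal illustration in which in-memory dictionaries stand in for the application service, the pHub, and the data warehouse; all identifiers, field names, and values are hypothetical.

```python
# Stand-ins for the real services (hypothetical data).
INVITE_CODES = {"INV-1001": "client-42"}                      # application service
PEOPLE_HUB = {"client-42": {"name": "A. Customer"}}           # pHub pre-fill data
RISK_PROFILES = {"client-42": {"bureau_score": 640,
                               "stable_direct_deposit": True,
                               "regulatory_flags": []}}
OFFER_PREFERENCES = {"client-42": {"channel": "in_app_chat",
                                   "friction_points": []}}

def resolve_client(invite_code):
    """Application service: map an invitation code to a client ID."""
    client_id = INVITE_CODES.get(invite_code)
    if client_id is None:
        raise KeyError("unknown invitation code")
    return client_id

def build_application_context(invite_code, confirmed_by_client):
    """Assemble the data linked to the client ID for the underwriting decision."""
    client_id = resolve_client(invite_code)
    context = {"client_id": client_id, "prefill": PEOPLE_HUB[client_id]}
    if not confirmed_by_client:
        # The client has not yet confirmed the pre-populated details.
        context["status"] = "awaiting_confirmation"
        return context
    context["status"] = "authenticated"
    context["risk_profile"] = RISK_PROFILES[client_id]
    context["offer_preferences"] = OFFER_PREFERENCES[client_id]
    return context

context = build_application_context("INV-1001", confirmed_by_client=True)
```

Keeping every record keyed by the client ID is what lets the risk profile and offer preferences be joined in a single step once the client is authenticated.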
As can be seen in the flow above, the enhanced underwriting package is modular and can include various other data sources and relationships. This greatly improves the ability to make decisions at the margin and to look at data beyond traditional bureau data.