cap hpi – Large-Scale Data Processing Platform

Creating a new way to generate retail valuations

cap hpi is a data business supplying the automotive industry with used car pricing and technical information. cap’s Retail Valuation (i.e. the value at which a used car is advertised on the forecourt) has long served the automotive industry as a guide to the potential margin available on used vehicles.

With dealers’ margins coming under increasing pressure, cap wanted to create a completely new methodology for generating their published Retail Values – one which responded to the used car market more closely, and would assist dealers in finding the pricing ‘sweet spot’.

Working closely with cap’s operational, technical and research teams, Hippo developed a solution which takes in a daily feed of adverts from multiple sources and applies a set of data quality and validation processes. It then feeds the cleaned adverts through multiple data mining algorithms to output a value for every possible vehicle (there are around 65,000 vehicle variations), across 10 different plates and 6 mileage points.

The key to this project was to find a way to incorporate existing industry knowledge into a technical data solution. Often, the data would suggest one conclusion and industry knowledge would suggest another. Ensuring that the output of the models reflected the market while also incorporating years of experience was particularly challenging.

The eventual solution was an automated daily feed into cap’s core product, and included multiple processing steps:

  • Clean & validate data
  • Clustering algorithm to define broad segments of vehicles
  • Decision tree to predict Retail Value
  • Evaluate all scored values against 20+ business rules
  • Adjust values where significant variance from business rules occurs
  • Publish values to daily product
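The "evaluate" and "adjust" steps above can be illustrated with a minimal sketch. This is not cap's or Hippo's implementation (the platform was built on the Microsoft BI stack); it is a Python illustration under stated assumptions. The `PricePoint` fields, the two example rules and their thresholds are all hypothetical, chosen only to show how a scored value might be checked against rules and pulled back into range when it strays too far.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class PricePoint:
    # All field names here are hypothetical, for illustration only.
    vehicle_id: str
    plate: str
    mileage: int
    value: float                             # value scored by the model
    previous_value: Optional[float] = None   # last published value, if any


# A rule maps a price point to the (low, high) range it considers acceptable.
Rule = Callable[[PricePoint], Tuple[float, float]]


def max_daily_movement(limit: float = 0.05) -> Rule:
    """Hypothetical rule: stay within `limit` of the last published value."""
    def rule(point: PricePoint) -> Tuple[float, float]:
        if point.previous_value is None:
            return (0.0, float("inf"))       # no history, so no constraint
        return (point.previous_value * (1 - limit),
                point.previous_value * (1 + limit))
    return rule


def minimum_value(floor: float = 100.0) -> Rule:
    """Hypothetical rule: never publish a value below `floor`."""
    def rule(point: PricePoint) -> Tuple[float, float]:
        return (floor, float("inf"))
    return rule


def apply_rules(point: PricePoint, rules: List[Rule]) -> Tuple[PricePoint, bool]:
    """Clamp the scored value into the range allowed by every rule.

    Returns the (possibly adjusted) price point and whether it was changed.
    """
    ranges = [rule(point) for rule in rules]
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    if low > high:
        raise ValueError(f"Rules conflict for {point.vehicle_id}")
    adjusted = min(max(point.value, low), high)
    return replace(point, value=adjusted), adjusted != point.value
```

In this sketch each rule only describes an acceptable range, so new rules can be added without touching the adjustment logic. A real rule set would likely also need an explicit priority order for cases where rules conflict.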

Three consultants delivered a large-scale data processing platform using the Microsoft BI stack, comprising data mining structures, reporting cubes and an application to manage the process and analyse the output.


Every day, the solution pulls and processes around 700,000 live adverts and produces approximately 4 million price points.
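The figure of roughly 4 million price points is consistent with the dimensions quoted earlier (around 65,000 vehicle variations, 10 plates and 6 mileage points). A quick check, assuming every combination is valued:

```python
# Dimensions as quoted in the case study; the variable names are illustrative.
vehicle_variations = 65_000
plates = 10
mileage_points = 6

price_points = vehicle_variations * plates * mileage_points
print(f"{price_points:,}")  # 3,900,000
```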


Price points are rigorously tested against a wide range of business rules, adjusted where necessary and then published into cap’s Black Book products.

Delivering the project, although critical, wasn’t the end of it. We also helped cap find and recruit analysts, and set up an analysis team to support the ongoing maintenance and extension of the solution.