Imagine you’re trying to predict someone’s income based on their education and work experience. In a perfect world, you’d have exact salary figures for everyone. But in reality, many people are uncomfortable sharing precise numbers – they might tell you they earn ‘between £35,000 and £45,000’ instead of their exact salary.
This is the challenge that economists and data scientists face every day: how do you make predictions when your data come in ranges rather than precise values? In recent research with Weiguang Liu (UCL) and Elie Tamer (Harvard University), we tackle this thorny problem with an innovative approach that could change how we handle uncertain data (Liu et al, 2025).
The range problem
Traditional prediction methods assume we have exact measurements. If you want to predict house prices, you’d typically use a dataset with precise sale prices like £234,567. But what if you only knew that houses sold for ‘between £200,000 and £250,000’?
These ‘interval data’ happen more often than you might think:
- Survey respondents report income in brackets rather than exact amounts.
- Medical studies track survival times only up to a certain point in time (‘censoring’).
- Environmental sensors may only detect pollution levels above or below certain thresholds.
- Job adverts often report salaries in ranges.
- Auction houses advertise lowest and highest estimated valuations.
When faced with such data, researchers have traditionally taken one of two approaches: throw out the imprecise data (which is wasteful); or guess the exact values within each range (which is potentially misleading). Both approaches have serious drawbacks.
Figure 1: Examples of ranges found in internet searches
Panel A: prices on auction house website

Panel B: salaries listed on academic jobs posting website

Source: Author’s search results
A smarter approach
Our study proposes a fundamentally different solution. Instead of trying to pin down exact predictions, why not embrace uncertainty and create ‘prediction sets’ – ranges of likely outcomes that account for all the uncertainty in our data and estimation strategies?
Our method adapts available ones to accommodate ‘range data’ and produces the smallest possible range that still captures the true outcome with a specified level of confidence. Think of it as the Goldilocks solution: not too wide (uninformative) and not too narrow (unreliable), but just right.
Here’s how it works in practice: instead of predicting ‘John will earn exactly £65,000,’ the method might predict that ‘John will earn between £58,000 and £72,000 with 90% confidence’. This range accounts for the inherent uncertainty in predicting human outcomes, uncertainty from the estimation and the additional uncertainty introduced by having range data.
Making it work in practice
The technical innovation lies in how we handle the mathematical complexity of partial information. When you only know that someone earns ‘between X and Y’, there are infinitely many possible exact salaries that they could have. The challenge is finding prediction methods that work reliably across all these possibilities.
Our solution adjusts a technique called ‘conformal inference’ – a powerful statistical method that provides guarantees about prediction accuracy even when you don’t know the underlying data distribution – to the setting where outcomes of interest are recorded in ranges.
Conformal inference protocols typically deal with exact values although some references allow for particular types of range data like the ‘censoring’ case. It’s like having a quality control system that ensures that your predictions are reliable, regardless of whether incomes follow a bell curve, have multiple peaks or follow any other pattern.
Real-world validation
To test our approach, we apply it to data from the US Current Population Survey, where just over a fifth of respondents report their income in ranges rather than exact figures. Our method consistently outperforms traditional approaches that either ignore the range data or try to guess exact values within ranges.
In one test, we compare our ‘range-aware’ predictions against methods that fill in missing data using common statistical techniques. The results are striking: our approach achieves much better coverage of actual outcomes, meaning that the prediction ranges are more likely to contain the true values.
Beyond income: broader applications
While our study focuses on income prediction, the implications stretch far beyond economics:
- Healthcare: predicting treatment outcomes when patient data include ranges (blood pressure readings, symptom severity scales, etc.)
- Environmental science: forecasting pollution levels when sensors only detect ranges.
- Market research: predicting consumer behaviour when survey responses use scales rather than precise measurements.
- Quality control: manufacturing settings where measurements come with inherent uncertainty ranges.
The bigger picture
Our research represents a shift toward ‘uncertainty-aware’ machine learning – acknowledging that perfect data are rare and building methods that work well with the messy, incomplete information that we actually have.
Traditional statistics often assumes that we can measure everything precisely. But in our complex world, uncertainty is the norm, not the exception. Methods that embrace this uncertainty, rather than trying to eliminate it, often prove more robust and reliable.
What this means for you
If you work with data – whether you’re a business analyst, researcher or policy-maker – this approach offers a more honest way to handle uncertain information. Instead of pretending that we know things precisely when we don’t, we can quantify our uncertainty and make better decisions accordingly.
For the general public, this research is part of a broader movement towards more transparent and reliable data science. When news reports cite predictions or when companies make claims based on data analysis, approaches like this help to ensure that those conclusions account for real-world uncertainty rather than assuming perfect information.
The days of forcing messy real-world data into oversimplified models may be numbered. Instead, we’re moving toward sophisticated methods that embrace uncertainty while still providing useful and reliable insights. In a world awash with data but starved of wisdom, that’s exactly the kind of progress we need.




