When economists try to measure how one thing affects another – for example, how much a £1 increase in government spending contributes to growth of the economy – they often face a major hurdle: real data exhibit a complicated web of cause and effect. A simple comparison might show that higher spending correlates with economic growth, but does the spending cause the growth, or does the government simply spend more when the economy is already doing well?
To solve this analytical challenge, researchers use ‘instruments’: external factors that shift the variable of interest (such as government spending) but affect the outcome only through that variable. This provides a clean way to isolate the true cause-and-effect relationship. But if these instruments are only weakly connected to the variable whose effect is being measured, the resulting estimates can be highly misleading – a problem known as ‘weak instruments’.
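To make the problem concrete, here is a minimal Monte Carlo sketch in Python using NumPy. The data-generating process, the parameter values and the function name `simulate_iv` are hypothetical choices made for illustration. In this setup, a shared unobserved shock moves both the regressor `x` and the outcome `y`, so a simple regression (OLS) is biased. A single instrument `z`, whose first-stage coefficient `pi` controls how strongly it is connected to `x`, is then used to try to recover the true effect `beta = 1`.

```python
import numpy as np


def simulate_iv(pi, n=500, reps=2000, beta=1.0, rho=0.8, seed=0):
    """Median OLS and IV estimates of beta across simulated samples.

    Hypothetical data-generating process (an illustrative assumption):
        x = pi * z + v     first stage; pi sets instrument strength
        y = beta * x + u   u and v are correlated, so x is endogenous
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])  # joint covariance of (u, v)
    ols, iv = [], []
    for _ in range(reps):
        z = rng.standard_normal(n)
        u, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        x = pi * z + v
        y = beta * x + u
        ols.append((x @ y) / (x @ x))  # OLS slope through the origin (all variables have mean zero)
        iv.append((z @ y) / (z @ x))   # just-identified IV estimate
    # Medians, because just-identified IV estimates have very heavy tails
    return np.median(ols), np.median(iv)


for label, pi in [("strong", 1.0), ("weak", 0.01)]:
    ols_med, iv_med = simulate_iv(pi)
    print(f"{label} instrument: median OLS = {ols_med:.2f}, median IV = {iv_med:.2f}")
```

With the strong instrument, the median IV estimate should sit close to the true value of 1. With the weak instrument, it typically drifts back towards the biased OLS value, even though IV is meant to correct that bias.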
Our study, published recently in the Review of Economic Studies, provides a powerful new testing method that helps researchers check that their work does not fall into this trap (Lewis and Mertens, 2026).
The problem: when traditional tests fail
For decades, researchers have relied on standard tests to check whether their instruments are strong enough. But these older tests have a major ‘blind spot’: they assume that the data are relatively simple and uniform. In particular, they assume that observations are unrelated to one another and that their basic statistical properties, such as how much they vary, are the same for each observation.
In reality, most modern economic data are complex. They often have ‘clusters’ (where data points are related, like people in the same city) or correlation over time (where data from one month are related to the next). When researchers apply the standard tests to such complex data, the results are often invalid.
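The Python sketch below shows how ignoring clusters can distort a strength check. It uses NumPy with a hypothetical simulated design: 50 clusters of 40 observations, an instrument that varies only between clusters, and a shock shared within each cluster. It compares the conventional first-stage F-statistic, which treats all observations as independent, with a cluster-robust Wald statistic. This is a textbook comparison, not the test described in this article. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np


def first_stage_stats(n_clusters=50, cluster_size=40, pi=0.3, seed=1):
    """Conventional vs cluster-robust first-stage statistics (hypothetical design)."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(n_clusters), cluster_size)   # cluster label of each observation
    z = rng.standard_normal(n_clusters)[g]                # instrument varies only across clusters
    v = rng.standard_normal(n_clusters)[g] + rng.standard_normal(g.size)  # shared cluster shock + idiosyncratic noise
    x = pi * z + v                                        # first stage

    X = np.column_stack([np.ones(g.size), z])             # intercept and instrument
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ x
    e = x - X @ coef                                      # first-stage residuals

    # Conventional variance: assumes independent errors with equal variance
    sigma2 = e @ e / (g.size - X.shape[1])
    var_conv = sigma2 * XtX_inv[1, 1]

    # Cluster-robust (CR0) variance: sums the scores within each cluster first
    scores = np.zeros((n_clusters, X.shape[1]))
    np.add.at(scores, g, X * e[:, None])
    var_cl = (XtX_inv @ (scores.T @ scores) @ XtX_inv)[1, 1]

    # With a single instrument, each statistic is the squared t-statistic on pi
    return coef[1] ** 2 / var_conv, coef[1] ** 2 / var_cl


conv_F, cluster_F = first_stage_stats()
print(f"conventional F = {conv_F:.1f}, cluster-robust F = {cluster_F:.1f}")
```

In this design, the conventional statistic should come out many times larger than the cluster-robust one. It treats the 2,000 observations as independent pieces of evidence, when the instrument really varies across only 50 clusters.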
Furthermore, previous attempts to generalise these tests could only handle scenarios where there was just one factor being studied. If a researcher wanted to look at multiple factors at once – such as how government spending and taxes both affect the economy – there was no reliable way to check whether the instruments were strong enough.
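With several factors studied at once, checking each one's instruments separately can mislead. The Python sketch below uses NumPy with a hypothetical design and function name. It builds two regressors that both depend on the same combination of two instruments. Each individual first-stage F-statistic looks very large, yet the instruments cannot tell the two regressors apart. The conventional minimum-eigenvalue (Cragg-Donald) statistic detects this. That statistic assumes independent, equally variable errors, so it shares the blind spot described above. It is shown only to illustrate the multi-factor problem, not as the test in this article.

```python
import numpy as np


def first_stage_diagnostics(n=1000, seed=2):
    """Individual first-stage F-statistics and the Cragg-Donald statistic (hypothetical design)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, 2))                 # two instruments
    common = Z[:, 0] + Z[:, 1]                      # both regressors load on the same combination
    Y = np.column_stack([common + rng.standard_normal(n),
                         common + rng.standard_normal(n)])  # two endogenous regressors
    k = Z.shape[1]

    Pi_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]   # first-stage coefficients, shape (k, 2)
    V_hat = Y - Z @ Pi_hat                          # first-stage residuals
    Sigma_vv = V_hat.T @ V_hat / (n - k)            # residual covariance matrix

    # Individual first-stage F-statistics (homoskedastic formula, no intercept)
    explained = ((Z @ Pi_hat) ** 2).sum(axis=0)
    individual_F = (explained / k) / np.diag(Sigma_vv)

    # Cragg-Donald: smallest eigenvalue of Sigma^{-1/2} Pi' Z'Z Pi Sigma^{-1/2} / k,
    # computed with a Cholesky factor, which gives the same eigenvalues
    L_inv = np.linalg.inv(np.linalg.cholesky(Sigma_vv))
    G = L_inv @ (Pi_hat.T @ Z.T @ Z @ Pi_hat) @ L_inv.T / k
    cragg_donald = np.linalg.eigvalsh(G).min()
    return individual_F, cragg_donald


F, cd = first_stage_diagnostics()
print(f"individual first-stage F: {np.round(F, 1)}, Cragg-Donald statistic: {cd:.2f}")
```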
What’s new: a robust solution for complex models
We have developed a test that fills this gap. Our method is designed to work even when the data exhibit complicated properties and when researchers are studying multiple variables at the same time.
The key breakthrough is that this test can account for the patterns commonly present in real-world data, such as clustering and correlation over time. The test provides a way to measure the ‘bias’ that weak instruments introduce into an estimate – the risk that the result is systematically wrong because the evidence is too thin. Researchers can set a threshold for the maximum bias they are willing to tolerate, and then use this new test to check whether their instruments meet that standard.
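The idea of a bias tolerance can be sketched by simulation. The Python example below uses NumPy. As an illustrative stand-in for a formal bias measure, it uses relative median bias: the median bias of a two-stage least squares (2SLS) estimate divided by that of OLS. This may well differ from the bias definition used in our paper. The design (three instruments of equal strength `pi`), the 10% tolerance `tau` and the function name are all hypothetical.

```python
import numpy as np


def relative_median_bias(pi, n=500, k=3, reps=1000, beta=1.0, rho=0.8, seed=0):
    """Median 2SLS bias as a share of median OLS bias (hypothetical design)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])  # joint covariance of (u, v)
    ols, tsls = [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, k))
        u, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        x = Z @ np.full(k, pi) + v                          # first stage with k equally strong instruments
        y = beta * x + u
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fitted values
        tsls.append((x_hat @ y) / (x_hat @ x))             # 2SLS with one endogenous regressor
        ols.append((x @ y) / (x @ x))                      # OLS slope through the origin
    return (np.median(tsls) - beta) / (np.median(ols) - beta)


tau = 0.10  # hypothetical tolerance: 2SLS bias at most 10% of OLS bias
for pi in (0.02, 0.1, 0.3):
    rb = relative_median_bias(pi)
    verdict = "meets" if abs(rb) <= tau else "fails"
    print(f"pi = {pi:.2f}: relative median bias = {rb:.2f} ({verdict} the {tau:.0%} tolerance)")
```

In practice, the true bias cannot be simulated like this, because the data-generating process is unknown. That is why a formal test is needed to infer from the observed data whether the tolerance is likely to be met.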
Putting it to the test: government spending
To show why this matters, we apply our test to a well-known study of government spending (Ramey and Zubairy, 2018). The original study looks at how ‘multipliers’ (the change in economic output generated by each dollar of government spending) differ depending on whether the economy is in a recession.
When we use our new, more rigorous test, we find that the instruments are sometimes weaker than previously thought. This doesn’t necessarily mean that the original findings were wrong, but it highlights a critical risk: without the right tools to test instrument strength, economists might be building their conclusions on a shaky foundation.
Why this matters for policy and public debate
Economic research often informs major policy decisions, from tax changes to stimulus packages. If the underlying research is based on unreliable instruments, the resulting policy advice could be based on a statistical fluke rather than a real-world effect.
By providing researchers with more accurate and flexible tools, we are helping to bridge the gap between complex academic theory and the practical need for reliable evidence. Our method helps to ensure that when an economist says ‘Factor A causes Result B’, the evidence is strong enough actually to support that claim, even when the data are complex.
For the public and for policy-makers, this means more dependable research and, ultimately, better-informed decisions. To encourage wide adoption, we have released free software that allows other researchers easily to apply these rigorous new standards to their own work.