Beyond Backtesting

The Case for a Simulation Framework in Long-Term Investing

Written by Sabr Research · August 2025

In our previous blog post, Backtesting at Scale, we discussed why building a fast, modular, and reproducible backtesting infrastructure is a competitive advantage for research and deployment. But even the best backtesting framework has limits: it can only tell us how a strategy would have performed on one realized historical path. In this post, we go a step further and explore how a simulation framework lets us move beyond backtesting by generating many plausible market universes. Such an approach allows us to measure robustness, quantify uncertainty, and build genuine confidence in a strategy’s resilience across hypothetical but plausible scenarios. It also lets us place confidence bounds on key metrics such as maximum drawdown, volatility, or Sharpe ratio, moving from point estimates to a deeper understanding of risk.

A Trust Gap

In most high-stakes industries, rigorous simulation is non-negotiable. No aircraft is certified to fly without exhaustive stress tests and digital simulations; no new drug reaches patients without being validated across countless scenarios. Failure in these fields is unacceptable, and simulation is one of the main safeguards.

The financial sector, interestingly, often takes the opposite approach. Despite the fact that failure here may cost fortunes, robust simulations are rare. The most traditional form of evaluation, backtesting, is frequently treated with suspicion — dismissed as unreliable, or conducted so poorly that managers fall back on intuition, incremental live testing with small capital, or even trial by fire in the market itself. In effect, vast amounts of capital are deployed without the simulation standards that are taken for granted in most industries.

At Sabr, we believe the financial sector should meet the same scientific standards as other high-stakes industries. Before capital is exposed, strategies deserve the same level of rigorous, scenario-based testing that engineers demand in aviation or pharmaceuticals. This means establishing a structured process where strategies are stress-tested against diverse market conditions, rare but impactful events, and long-horizon dynamics.

Market Simulation Techniques in Modern Investing

Simulation approaches that go beyond the static record of historical data, and therefore beyond traditional backtesting, fall into two distinct camps that reflect different use cases: one for high-frequency trading and another for long-term asset management.

On one end are high-frequency traders, whose focus is the mechanics of market microstructure: order placement, latency, and queue dynamics. They simulate the limit order book at the microsecond level to estimate how long quotes may sit in the queue and the probability of execution. Some go further, building agent-based models with virtual “market makers,” “informed traders,” and “noise traders” interacting in a synthetic market. The aim is to replicate the constant push and pull of liquidity provision and informed order flow. In this context, simulation tests outcomes where profits hinge on fractions of a cent and decisions occur in microseconds.

Our focus in this blog, however, lies at the other end of the spectrum: the long-term horizon of asset management. Here, simulation relies heavily on factor models to explain sources of return and risk. Managers ask: What is my portfolio’s exposure to value, momentum, or quality factors? They then run stress tests: What happens if interest rates rise by 200 basis points? If the euro weakens sharply against the dollar? If momentum collapses as it did in 2009? These exercises help identify vulnerabilities to known systematic risks.

Yet this framework is still insufficient. Factor models remain anchored in historical covariances and sensitivities; they can highlight how portfolios react to shocks resembling the past, but they struggle to illuminate how strategies might behave if the future unfolds in fundamentally new ways. In other words, the long-term simulation paradigm itself must evolve beyond static assumptions to capture worlds we have not yet observed.

An Alternative Approach: Model-Based Scenario Generation

At Sabr, we believe that the models driving investment decisions should be held to the same scientific rigor applied in other critical fields: they must undergo thorough testing. Traditional backtesting offers one perspective, but a single historical scenario cannot capture the full range of conditions a system may encounter over time. Our work focuses on building robust models that explore a variety of potential futures, providing a foundation for informed, long-term quantitative analysis.

Instead of relying on a single historical path, we simulate thousands of alternative, yet plausible, scenarios and evaluate how a strategy performs across each of them. This transforms risk assessment into a distribution of potential outcomes, allowing us to produce confidence intervals for metrics such as maximum drawdown, Sharpe ratio, or volatility that better reflect real-world uncertainty. These distributions reveal whether a strategy’s performance is resilient across different conditions or sensitive to small changes in assumptions.
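To make the idea of a distribution of outcomes concrete, here is a minimal sketch in Python. It assumes the strategy has already been run on each simulated scenario, producing a matrix of per-period returns. The scenario generator is a deliberately naive placeholder (i.i.d. normal daily returns), the function names are hypothetical, and the bands reported are simple percentile ranges across scenarios. It illustrates the bookkeeping only and is not a description of our production framework.

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a positive fraction."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))  # start at 1.0
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))

def annualized_vol(returns, periods_per_year=252):
    """Sample standard deviation of per-period returns, annualized."""
    return float(np.std(returns, ddof=1) * np.sqrt(periods_per_year))

def annualized_sharpe(returns, periods_per_year=252):
    """Mean over standard deviation, annualized (risk-free rate assumed to be 0)."""
    return float(np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(periods_per_year))

def metric_intervals(scenario_returns, lower=5, upper=95):
    """Percentile bands per metric.

    scenario_returns: array of shape (n_scenarios, n_periods).
    Returns {metric name: (lower percentile, median, upper percentile)}.
    """
    metrics = {
        "max_drawdown": [max_drawdown(r) for r in scenario_returns],
        "volatility": [annualized_vol(r) for r in scenario_returns],
        "sharpe": [annualized_sharpe(r) for r in scenario_returns],
    }
    return {
        name: tuple(np.percentile(values, [lower, 50, upper]))
        for name, values in metrics.items()
    }

# Placeholder scenarios (assumption): 2,000 five-year paths of i.i.d. normal daily returns.
rng = np.random.default_rng(0)
scenario_returns = rng.normal(loc=0.0004, scale=0.01, size=(2000, 252 * 5))

intervals = metric_intervals(scenario_returns)
for name, (low, median, high) in intervals.items():
    print(f"{name}: 5th pct = {low:.3f}, median = {median:.3f}, 95th pct = {high:.3f}")
```

A wide band on a metric such as maximum drawdown is itself informative: it signals that a single backtested figure says little about what the strategy could plausibly experience.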

Figure 1: Model-Based Simulation Framework

Achieving this requires powerful scenario-generation tools. One approach is to train models to learn the joint distribution of key market variables (such as fundamentals, prices, and volumes across many stocks) and then sample scenarios from the learned joint distribution. These synthetic, yet plausible, scenarios behave like realistic markets without being exact replays of history. They allow us to explore counterfactuals: what if fundamentals evolved differently, volatility clustered more severely, or correlations spiked across sectors? Figure 1 illustrates the pipeline: historical data first feeds into a joint-distribution fitting stage, where relationships among variables are modeled. Thousands of alternative scenarios are then sampled and run through repeated model executions, with each run generating performance outcomes that are subsequently aggregated into a robustness evaluation.
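The stages in Figure 1 can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: the "historical" data is synthetic, the joint distribution is a plain multivariate Gaussian over daily log returns of four assets, and the strategy is an equal-weight portfolio. A realistic generator would need a far richer model, since an i.i.d. Gaussian has no volatility clustering, no fat tails, and no fundamentals or volumes. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for historical data (assumption): 1,000 days of log returns for 4 assets,
# each with about 1% daily volatility and pairwise correlation of 0.5.
n_days, n_assets = 1000, 4
true_cov = 0.0001 * (0.5 * np.eye(n_assets) + 0.5)
historical = rng.multivariate_normal(np.full(n_assets, 0.0003), true_cov, size=n_days)

# Stage 1: fit the joint distribution (here just a mean vector and covariance matrix).
mu_hat = historical.mean(axis=0)
cov_hat = np.cov(historical, rowvar=False)

# Stage 2: sample alternative scenarios from the fitted distribution.
n_scenarios, horizon = 1000, 252
scenarios = rng.multivariate_normal(mu_hat, cov_hat, size=(n_scenarios, horizon))
# scenarios has shape (n_scenarios, horizon, n_assets)

# Stage 3: run a toy strategy on every scenario (equal weight, rebalanced each period).
weights = np.full(n_assets, 1.0 / n_assets)
simple_returns = np.expm1(scenarios)            # convert log returns to simple returns
portfolio_returns = simple_returns @ weights    # shape (n_scenarios, horizon)

# Stage 4: aggregate outcomes across scenarios.
terminal_wealth = np.prod(1.0 + portfolio_returns, axis=1)
print("5th / 50th / 95th percentile of terminal wealth:",
      np.round(np.percentile(terminal_wealth, [5, 50, 95]), 3))
```

In this simplified setting, a counterfactual such as a correlation spike could be expressed by modifying `cov_hat` before sampling, which is one reason to separate the fitting stage from the sampling stage.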

The result is not a single number but a probabilistic picture of potential outcomes. For example, a model might outperform its benchmark in 70% of simulated worlds, underperform meaningfully in 10%, and fall somewhere in between for the remainder. Strong performance across simulated scenarios not only supports the model’s predictive capabilities but also provides evidence that it has learned a robust set of underlying market patterns, rather than simply overfitting to historical data. Moreover, this approach makes it possible to understand the specific conditions under which the model is likely to perform well versus poorly, providing deeper insight into its strengths and vulnerabilities.
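A breakdown like the 70% / 10% / remainder example can be computed directly from per-scenario results. The sketch below is illustrative only: the function name, the `margin` threshold that defines a "meaningful" gap, and the synthetic strategy and benchmark returns are all assumptions introduced for the example.

```python
import numpy as np

def outcome_breakdown(strategy_returns, benchmark_returns, margin=0.02):
    """Share of scenarios in which the strategy wins, loses, or lands in between.

    Both inputs have shape (n_scenarios, n_periods). `margin` is a hypothetical
    threshold on cumulative excess return: above +margin counts as outperformance,
    below -margin as meaningful underperformance.
    """
    strategy_total = np.prod(1.0 + strategy_returns, axis=1) - 1.0
    benchmark_total = np.prod(1.0 + benchmark_returns, axis=1) - 1.0
    excess = strategy_total - benchmark_total
    outperform = float(np.mean(excess > margin))
    underperform = float(np.mean(excess < -margin))
    return {
        "outperform": outperform,
        "underperform_meaningfully": underperform,
        "in_between": 1.0 - outperform - underperform,
    }

# Synthetic example (assumption): the strategy tracks the benchmark plus a small
# positive drift and independent noise.
rng = np.random.default_rng(1)
benchmark = rng.normal(0.0003, 0.01, size=(5000, 252))
strategy = benchmark + rng.normal(0.0001, 0.003, size=(5000, 252))

for label, share in outcome_breakdown(strategy, benchmark).items():
    print(f"{label}: {share:.1%}")
```

Grouping the same excess returns by scenario characteristics (for example, high versus low realized volatility) is one simple way to see where a model tends to do well or poorly.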

A Philosophical Shift

Ultimately, moving beyond backtesting toward model-based scenario generation aligns financial research with the scientific standards of other high-stakes industries. Just as aviation and pharmaceuticals rely on rigorous simulations before exposing lives to risk, asset managers can rely on these tools to safeguard capital and improve decision-making. At Sabr, we see this as more than a methodological upgrade—it is a philosophical one: treating financial systems with the same discipline, humility, and rigor that complex, high-stakes systems demand.