Numerical Methods: Theoretical Background¶
This document provides rigorous mathematical foundations for numerical methods used in derivatives pricing and risk management.
Table of Contents¶
- Monte Carlo Methods
- Variance Reduction Techniques
- Quasi-Monte Carlo Methods
- Multi-Level Monte Carlo
- American Options: Longstaff-Schwartz
- PDE Methods
- Fourier Methods
- Greeks Computation
Monte Carlo Methods¶
Fundamental Theorem¶
Monte Carlo pricing is based on the Law of Large Numbers and risk-neutral pricing:
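$$ V_0 = e^{-rT}\,\mathbb{E}^{\mathbb{Q}}\!\left[g(S_T)\right] \approx \hat{V}_N = \frac{e^{-rT}}{N} \sum_{i=1}^{N} g\!\left(S_T^{(i)}\right) $$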
where \(S_T^{(i)}\) are independent samples from the risk-neutral distribution.
Convergence: By the Central Limit Theorem:
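$$ \sqrt{N}\left(\hat{V}_N - V_0\right) \xrightarrow{d} \mathcal{N}\!\left(0, \sigma^2\right), \qquad \sigma^2 = \text{Var}\!\left(e^{-rT} g(S_T)\right) $$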
Convergence rate: \(O(N^{-1/2})\) (independent of dimension - key advantage)
Reference: [Glasserman, 2004], Chapter 2; [Shreve, 2004], Chapter 8
Implementation: src/neutryx/engines/mc.py
Standard Error¶
The standard error of the Monte Carlo estimator is:
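$$ \text{SE} = \frac{\hat{\sigma}}{\sqrt{N}}, \qquad \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(e^{-rT} g(S_T^{(i)}) - \hat{V}_N\right)^2 $$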
95% confidence interval:
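$$ \hat{V}_N \pm 1.96\, \frac{\hat{\sigma}}{\sqrt{N}} $$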
To reduce the SE by a factor of 10: requires \(100\times\) more samples (expensive!)
Solution: Variance reduction techniques
Reference: [Glasserman, 2004], Section 2.3
Path Simulation¶
Discretization Schemes¶
For SDE \(dX_t = \mu(X_t, t) \, dt + \sigma(X_t, t) \, dW_t\), discretize on \([0, T]\) with time step \(\Delta t = T/M\):
Euler-Maruyama Scheme:
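$$ X_{t+\Delta t} = X_t + \mu(X_t, t)\, \Delta t + \sigma(X_t, t)\, \sqrt{\Delta t}\, Z $$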
where \(Z \sim \mathcal{N}(0, 1)\).
Properties:
- Strong convergence: \(O(\Delta t^{1/2})\) (path-wise error)
- Weak convergence: \(O(\Delta t)\) (distribution/expectation error)
Milstein Scheme (higher-order):
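$$ X_{t+\Delta t} = X_t + \mu\, \Delta t + \sigma\, \sqrt{\Delta t}\, Z + \tfrac{1}{2}\, \sigma\, \frac{\partial \sigma}{\partial x}\, \Delta t \left(Z^2 - 1\right) $$
with \(\mu\) and \(\sigma\) evaluated at \((X_t, t)\).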
Properties:
- Strong convergence: \(O(\Delta t)\)
- Weak convergence: \(O(\Delta t)\)
Reference: [Glasserman, 2004], Chapter 6; [Kloeden & Platen, 1992]
Exact Simulation¶
For some models, exact sampling is possible (no discretization error):
- Geometric Brownian Motion: $$ S_T = S_0 \exp\left((r - \sigma^2/2)T + \sigma\sqrt{T} Z\right) $$
- Vasicek/Ornstein-Uhlenbeck: $$ r_T = r_0 e^{-\kappa T} + \theta(1 - e^{-\kappa T}) + \sigma\sqrt{\frac{1-e^{-2\kappa T}}{2\kappa}} Z $$
- CIR: Non-central chi-squared distribution
Advantage: Eliminates discretization bias, faster (fewer time steps)
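For example, a minimal NumPy sketch of exact GBM sampling used to price a European call (function name and parameter values are illustrative, not the Neutryx API):

```python
import numpy as np

def simulate_gbm_exact(S0, r, sigma, T, n_paths, seed=0):
    """Draw terminal prices S_T exactly under GBM (no time-stepping)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    return S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

# Discounted average payoff gives the Monte Carlo price of a European call
S_T = simulate_gbm_exact(S0=100.0, r=0.05, sigma=0.2, T=1.0, n_paths=1_000_000)
price = np.exp(-0.05 * 1.0) * np.maximum(S_T - 100.0, 0.0).mean()
```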
Reference: [Glasserman, 2004], Section 3.3
Variance Reduction Techniques¶
Antithetic Variates¶
Idea: For every path using \(Z \sim \mathcal{N}(0, 1)\), also simulate using \(-Z\).
Estimator:
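$$ \hat{V}_{\text{AV}} = \frac{e^{-rT}}{N} \sum_{i=1}^{N} \frac{g\!\left(S_T(Z_i)\right) + g\!\left(S_T(-Z_i)\right)}{2} $$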
Variance reduction: If \(g\) is monotonic in \(Z\):
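$$ \text{Cov}\!\left(g(Z),\, g(-Z)\right) \leq 0 \quad\Longrightarrow\quad \text{Var}\!\left(\frac{g(Z) + g(-Z)}{2}\right) \leq \frac{1}{2}\, \text{Var}\!\left(g(Z)\right) $$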
Typical reduction: 30-50% for European options
Reference: [Glasserman, 2004], Section 4.1; [Clewlow & Carverhill, 1994]
Implementation: src/neutryx/engines/variance_reduction.py:antithetic_variates()
Control Variates¶
Idea: Use a correlated variable \(Y\) with known expectation \(\mathbb{E}[Y] = \mu_Y\).
Estimator:
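$$ \hat{V}_{\text{CV}} = \hat{V} - \beta\left(\hat{Y} - \mu_Y\right) $$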
Optimal coefficient:
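$$ \beta^* = \frac{\text{Cov}(V, Y)}{\text{Var}(Y)} $$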
Variance reduction:
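$$ \text{Var}\!\left(\hat{V}_{\text{CV}}\right) = \left(1 - \rho^2\right) \text{Var}\!\left(\hat{V}\right) $$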
where \(\rho = \text{Corr}(V, Y)\) is the correlation.
Example: Pricing Asian option
- \(V\): Asian option (unknown)
- \(Y\): Geometric Asian option (analytical formula available)
- Often \(\rho > 0.9\) → variance reduction \(> 80\%\)
Reference: [Glasserman, 2004], Section 4.2; [Lavenberg & Welch, 1981]
Implementation: src/neutryx/engines/variance_reduction.py:control_variate()
Importance Sampling¶
Idea: Sample from a different distribution \(\mathbb{Q}\) to reduce variance in the tail.
Estimator:
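$$ \hat{V}_{\text{IS}} = \frac{e^{-rT}}{N} \sum_{i=1}^{N} g\!\left(S_T^{(i)}\right) \frac{d\mathbb{P}}{d\mathbb{Q}}\!\left(S_T^{(i)}\right), \qquad S_T^{(i)} \sim \mathbb{Q} $$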
Optimal measure: \(\mathbb{Q}^*\) such that \(X \frac{d\mathbb{P}}{d\mathbb{Q}}\) is constant (zero-variance).
Application: Deep out-of-the-money options, rare events (VaR)
Example: For OTM call, shift drift to increase probability of \(S_T > K\).
Radon-Nikodym derivative (Girsanov):
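$$ \frac{d\mathbb{P}}{d\mathbb{Q}}\bigg|_{\mathcal{F}_T} = \exp\!\left(-\int_0^T \theta_t\, dW_t^{\mathbb{Q}} - \frac{1}{2} \int_0^T \theta_t^2\, dt\right) $$
where \(\theta_t\) is the drift added under \(\mathbb{Q}\) and \(W^{\mathbb{Q}}\) is the \(\mathbb{Q}\)-Brownian motion (sign conventions vary between references).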
Reference: [Glasserman, 2004], Chapter 5; [Glasserman et al., 1999]
Implementation: src/neutryx/engines/variance_reduction.py:importance_sampling()
Stratified Sampling¶
Idea: Partition sample space into strata, sample proportionally from each.
Estimator:
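$$ \hat{V}_{\text{strat}} = \sum_{j=1}^{J} p_j\, \hat{V}_j $$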
where \(p_j\) is the probability of stratum \(j\), and \(\hat{V}_j\) is the estimate from stratum \(j\).
Variance:
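$$ \text{Var}\!\left(\hat{V}_{\text{strat}}\right) = \sum_{j=1}^{J} \frac{p_j^2\, \sigma_j^2}{N_j} $$
where \(\sigma_j^2\) is the payoff variance within stratum \(j\) and \(N_j\) the number of samples allocated to it.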
Optimal allocation (Neyman):
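$$ N_j = N\, \frac{p_j\, \sigma_j}{\sum_{k=1}^{J} p_k\, \sigma_k} $$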
Application: Digital options, barrier options
Reference: [Glasserman, 2004], Section 4.6
Implementation: src/neutryx/engines/variance_reduction.py:stratified_sampling()
Moment Matching¶
Idea: Adjust samples to match theoretical moments (mean, variance).
Procedure:
1. Generate samples \(Z_1, \ldots, Z_N \sim \mathcal{N}(0, 1)\)
2. Transform: \(Z_i' = \frac{Z_i - \bar{Z}}{\hat{\sigma}_Z}\) (force mean = 0, variance = 1)
Benefit: Eliminates sampling error in first two moments (especially helpful for small \(N\)).
Reference: [Glasserman, 2004], Section 4.5
Implementation: src/neutryx/engines/variance_reduction.py:moment_matching()
Conditional Monte Carlo¶
Idea: Condition on a subset of random variables to reduce variance.
Formula:
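$$ \mathbb{E}[X] = \mathbb{E}\bigl[\mathbb{E}[X \mid Y]\bigr], \qquad \text{Var}(X) = \mathbb{E}\bigl[\text{Var}(X \mid Y)\bigr] + \text{Var}\bigl(\mathbb{E}[X \mid Y]\bigr) $$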
Since \(\text{Var}(\mathbb{E}[X | Y]) \leq \text{Var}(X)\), estimating \(\mathbb{E}[X | Y]\) is more efficient.
Example: Asian option
Condition on geometric mean (analytical), reduce variance of arithmetic mean estimation.
Reference: [Glasserman, 2004], Section 4.3; [Curran, 1994]
Implementation: src/neutryx/engines/variance_reduction.py:conditional_mc()
Quasi-Monte Carlo Methods¶
Motivation¶
Monte Carlo uses pseudo-random numbers → \(O(N^{-1/2})\) convergence.
Quasi-Monte Carlo (QMC) uses low-discrepancy sequences → \(O((\log N)^d / N)\) convergence (for smooth integrands).
Key idea: Fill space more uniformly than random points.
Reference: [Glasserman, 2004], Chapter 5; [Niederreiter, 1992]
Implementation: src/neutryx/engines/qmc.py
Discrepancy¶
Discrepancy measures how uniformly points \(\{x_1, \ldots, x_N\}\) fill \([0,1]^d\):
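$$ D_N^* = \sup_{\mathbf{u} \in [0,1]^d} \left| \frac{\#\{i : x_i \in [\mathbf{0}, \mathbf{u})\}}{N} - \prod_{j=1}^{d} u_j \right| $$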
Koksma-Hlawka inequality:
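$$ \left| \frac{1}{N} \sum_{i=1}^{N} f(x_i) - \int_{[0,1]^d} f(u)\, du \right| \leq V(f)\, D_N^* $$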
where \(V(f)\) is the variation of \(f\) (Hardy-Krause sense).
Low-discrepancy sequences: \(D_N^* = O((\log N)^d / N)\)
Reference: [Niederreiter, 1992]; [Glasserman, 2004], Section 5.1
Sobol Sequences¶
Sobol sequences are \((t,s)\)-sequences with excellent discrepancy properties.
Construction:
- Based on digital nets and direction numbers
- Use Gray code for efficient generation
- Scrambling (Owen, 1995) improves convergence
Properties:
- Dimension: Up to \(d = 1000+\) (practical limit)
- Discrepancy: \(D_N^* = O((\log N)^d / N)\)
- Effective dimensions: Finance problems often have low effective dimension
Reference: [Sobol, 1967]; [Glasserman, 2004], Section 5.2
Implementation: src/neutryx/engines/qmc.py:sobol_sequence()
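As an illustration, scrambled Sobol points can be drawn with SciPy's qmc module and mapped to standard normal variates (a minimal sketch of the typical workflow, not the qmc.py implementation):

```python
import numpy as np
from scipy.stats import norm, qmc

sampler = qmc.Sobol(d=4, scramble=True, seed=0)  # 4-dimensional scrambled Sobol sequence
u = sampler.random(n=2**12)                      # powers of two preserve balance properties
z = norm.ppf(u)                                  # map uniforms to standard normals for path construction
```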
Halton Sequences¶
Halton sequences use van der Corput sequences in different prime bases.
Construction: For base \(b\), write \(n\) in base \(b\) and reflect its digits about the radix point (the radical-inverse function).
Example: Base 2
| \(n\) | Binary | Halton |
|---|---|---|
| 1 | 1 | 0.1 = 0.5 |
| 2 | 10 | 0.01 = 0.25 |
| 3 | 11 | 0.11 = 0.75 |
| 4 | 100 | 0.001 = 0.125 |
Multidimensional: Use different primes for each dimension.
Disadvantage: Correlation between dimensions for high \(d\) (use scrambling).
Reference: [Halton, 1960]; [Glasserman, 2004], Section 5.2
Implementation: src/neutryx/engines/qmc.py:halton_sequence()
Randomized QMC¶
Randomized QMC combines benefits of QMC (fast convergence) and MC (error estimates):
- Generate QMC sequence
- Apply random shift (modulo 1)
- Repeat \(M\) times to estimate variance
Convergence: \(O(N^{-3/2})\) for smooth problems (better than both QMC and MC alone!)
Reference: [Owen, 1995]; [L'Ecuyer & Lemieux, 2002]
Multi-Level Monte Carlo¶
Motivation¶
For path-dependent options, we discretize with \(M\) time steps → discretization bias \(O(M^{-1})\).
Problem: Fine discretization (large \(M\)) is expensive.
Multi-Level MC (MLMC) [Giles, 2008]: Use multiple levels of refinement, allocate samples optimally.
Reference: [Giles, 2008]; [Glasserman, 2004], Section 8.4
Implementation: src/neutryx/engines/qmc.py:multilevel_mc()
Telescoping Sum¶
Idea: Estimate expectation at finest level \(L\) as a telescoping sum:
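$$ \mathbb{E}[V_L] = \mathbb{E}[V_0] + \sum_{\ell=1}^{L} \mathbb{E}\left[V_\ell - V_{\ell-1}\right] $$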
where \(V_\ell\) is the estimator with \(M_\ell = 2^\ell\) time steps.
Key insight: \(\text{Var}(V_\ell - V_{\ell-1}) \ll \text{Var}(V_\ell)\) (successive levels are highly correlated).
MLMC estimator:
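$$ \hat{V}_{\text{MLMC}} = \frac{1}{N_0} \sum_{i=1}^{N_0} V_0^{(i)} + \sum_{\ell=1}^{L} \frac{1}{N_\ell} \sum_{i=1}^{N_\ell} \left(V_\ell^{(i)} - V_{\ell-1}^{(i)}\right) $$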
Coupled paths: Use the same Brownian increments for \(V_\ell\) and \(V_{\ell-1}\) (maximizes correlation).
Reference: [Giles, 2008], Section 2
Complexity Analysis¶
Standard MC to achieve MSE \(\varepsilon^2\):
- Need \(M = O(\varepsilon^{-1})\) time steps (discretization)
- Need \(N = O(\varepsilon^{-2})\) samples (variance)
- Cost: \(O(\varepsilon^{-3})\)
MLMC with optimal allocation:
- Cost: \(O(\varepsilon^{-2} (\log \varepsilon)^2)\) (Lipschitz payoffs)
- Cost: \(O(\varepsilon^{-2})\) (smooth payoffs)
Speedup: Factor of \(\varepsilon^{-1}\) or more!
Optimal sample allocation:
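$$ N_\ell \propto \sqrt{\frac{\mathcal{V}_\ell}{C_\ell}}, \qquad \mathcal{V}_\ell = \text{Var}\!\left(V_\ell - V_{\ell-1}\right) $$
where \(C_\ell\) is the cost per sample at level \(\ell\).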
Reference: [Giles, 2008], Theorem 1
Applications¶
MLMC is highly effective for:
- Path-dependent options (Asians, barriers)
- SDEs with small diffusion coefficient
- PDEs with stochastic coefficients
Extension: Multi-index MC (multiple discretization parameters)
Reference: [Giles, 2015]
American Options: Longstaff-Schwartz¶
Problem Formulation¶
American option: Holder can exercise at any time \(\tau \in [0, T]\).
Value:
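$$ V_0 = \sup_{\tau \in \mathcal{T}} \mathbb{E}^{\mathbb{Q}}\!\left[e^{-r\tau}\, g(S_\tau)\right] $$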
where \(\mathcal{T}\) is the set of stopping times.
Dynamic programming (Bellman):
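$$ V_t(S_t) = \max\!\left(g(S_t),\; e^{-r\Delta t}\, \mathbb{E}^{\mathbb{Q}}\!\left[V_{t+\Delta t} \mid S_t\right]\right) $$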
Continuation value: \(C(S_t) = e^{-r\Delta t} \mathbb{E}^{\mathbb{Q}}[V_{t+\Delta t} | S_t]\)
Optimal exercise: Exercise if \(g(S_t) \geq C(S_t)\)
Challenge: \(C(S_t)\) is unknown → estimate via regression.
Reference: [Longstaff & Schwartz, 2001]; [Glasserman, 2004], Section 8.6
Implementation: src/neutryx/engines/longstaff_schwartz.py
LSM Algorithm¶
Longstaff-Schwartz (LSM) algorithm [Longstaff & Schwartz, 2001]:
- Simulate paths: Generate \(N\) paths \(\{S_{t_m}^{(i)}\}_{i=1}^N\) forward in time.
- Backward induction: Start at maturity \(T\), work backward.
- At each time \(t_m\):
  - In-the-money paths: restrict to paths where \(g(S_{t_m}^{(i)}) > 0\)
  - Regress: \(V_{t_{m+1}}^{(i)} \sim \sum_{k=1}^K \beta_k \phi_k(S_{t_m}^{(i)})\) (basis functions \(\phi_k\))
  - Continuation value: \(C(S_{t_m}^{(i)}) = \sum_{k=1}^K \hat{\beta}_k \phi_k(S_{t_m}^{(i)})\)
  - Decision: Exercise if \(g(S_{t_m}^{(i)}) \geq C(S_{t_m}^{(i)})\)
- Forward pass: Compute the payoff along each path using the optimal exercise times.
- Estimate: \(\hat{V}_0 = \frac{1}{N} \sum_{i=1}^N e^{-r \tau^{(i)}}\, g\big(S_{\tau^{(i)}}^{(i)}\big)\) (average of discounted payoffs at each path's exercise time)
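For concreteness, a minimal NumPy sketch of the procedure above for an American put under GBM (the function name, polynomial basis, and parameters are illustrative, not the Neutryx implementation in longstaff_schwartz.py):

```python
import numpy as np

def lsm_american_put(S0, K, r, sigma, T, n_steps=50, n_paths=100_000, seed=0):
    """Minimal Longstaff-Schwartz sketch for an American put under GBM."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # 1. Simulate GBM paths forward in time (exact per-step increments)
    Z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    S = S0 * np.exp(np.cumsum(log_inc, axis=1))
    S = np.hstack([np.full((n_paths, 1), S0), S])        # prepend the t = 0 column

    payoff = lambda s: np.maximum(K - s, 0.0)
    cashflow = payoff(S[:, -1])                          # exercise value at maturity
    # 2. Backward induction, regressing on in-the-money paths only
    for m in range(n_steps - 1, 0, -1):
        cashflow *= np.exp(-r * dt)                      # discount one step back to t_m
        itm = payoff(S[:, m]) > 0.0
        if not np.any(itm):
            continue
        x = S[itm, m]
        basis = np.vander(x, 4)                          # polynomial basis: x^3, x^2, x, 1
        beta, *_ = np.linalg.lstsq(basis, cashflow[itm], rcond=None)
        continuation = basis @ beta
        exercise = payoff(x)
        exercise_now = exercise >= continuation
        idx = np.flatnonzero(itm)[exercise_now]
        cashflow[idx] = exercise[exercise_now]           # exercise: overwrite continuation cashflow
    # 3. Discount from the first exercise date back to t = 0 and average
    return np.exp(-r * dt) * cashflow.mean()

# Example usage (parameter values are illustrative)
print(lsm_american_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0))
```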
Reference: [Longstaff & Schwartz, 2001]; [Glasserman, 2004], Section 8.6
Basis Functions¶
Common choices for \(\phi_k(S)\):
- Power polynomials: \(1, S, S^2, S^3, \ldots\)
- Laguerre polynomials: \(L_0(S), L_1(S), L_2(S), \ldots\) $$ L_n(x) = \frac{e^x}{n!} \frac{d^n}{dx^n}(x^n e^{-x}) $$
- Hermite polynomials: \(H_0(S), H_1(S), H_2(S), \ldots\)
- Chebyshev polynomials: \(T_0(S), T_1(S), T_2(S), \ldots\)
Guideline: 3-5 basis functions typically sufficient.
Reference: [Longstaff & Schwartz, 2001], Section 2
Convergence and Bias¶
Low bias: \(\hat{V}_{\text{LSM}} \leq V_{\text{true}}\) (the estimated exercise rule is suboptimal, so LSM yields a lower bound)
Bias correction [Andersen & Broadie, 2004]:
- Compute upper bound via dual formulation
- Bracket true value: \(\text{Lower bound} \leq V \leq \text{Upper bound}\)
Convergence: As \(N, K \to \infty\), \(\hat{V}_{\text{LSM}} \to V\)
Reference: [Longstaff & Schwartz, 2001]; [Andersen & Broadie, 2004]
PDE Methods¶
Black-Scholes PDE¶
Recall the Black-Scholes PDE:
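$$ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0 $$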
Terminal condition: \(V(S, T) = g(S)\)
Boundary conditions: Depend on option type (e.g., \(V(0, t) = 0\) for call)
Reference: [Wilmott et al., 1995]; [Tavella & Randall, 2000]
Implementation: src/neutryx/models/pde.py
Finite Difference Discretization¶
Grid:
- Space: \(S_{\min} = S_0 < S_1 < \cdots < S_N = S_{\max}\)
- Time: \(0 = t_0 < t_1 < \cdots < t_M = T\)
Finite difference approximations:
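$$ \frac{\partial V}{\partial S} \approx \frac{V_{i+1} - V_{i-1}}{2\,\Delta S}, \qquad \frac{\partial^2 V}{\partial S^2} \approx \frac{V_{i+1} - 2 V_i + V_{i-1}}{(\Delta S)^2} $$
where \(V_i \approx V(S_i, t_j)\).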
Reference: [Wilmott et al., 1995], Chapter 10
Explicit Euler Scheme¶
Discretize in time (forward difference):
Explicit update:
Stability: Requires \(\Delta t \leq \frac{(\Delta S)^2}{\sigma^2 S_{\max}^2}\) (CFL condition) - very restrictive!
Advantage: Simple, no matrix inversion
Disadvantage: Conditionally stable (small time steps required)
Reference: [Wilmott et al., 1995], Section 10.4
Implicit Euler Scheme¶
Discretize using backward differences (implicit):
Matrix form: \(A \mathbf{V}_{j+1} = \mathbf{V}_j\)
where \(A\) is a tridiagonal matrix (Thomas algorithm: \(O(N)\)).
Stability: Unconditionally stable (can use large time steps)
Disadvantage: Requires matrix solve at each step
Reference: [Wilmott et al., 1995], Section 10.5; [Tavella & Randall, 2000]
Implementation: src/neutryx/models/pde.py:solve_pde_implicit()
Crank-Nicolson Scheme¶
Crank-Nicolson averages explicit and implicit (theta-scheme with \(\theta = 1/2\)):
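$$ \frac{V^{j+1} - V^{j}}{\Delta t} = \frac{1}{2}\left( \mathcal{L} V^{j+1} + \mathcal{L} V^{j} \right) $$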
where \(\mathcal{L}\) is the spatial operator.
Properties:
- Second-order accurate in both time and space: \(O(\Delta t^2, \Delta S^2)\)
- Unconditionally stable
- No oscillations (for smooth payoffs)
Standard for European options
Reference: [Wilmott et al., 1995], Section 10.6; [Tavella & Randall, 2000]
Implementation: src/neutryx/models/pde.py:crank_nicolson()
American Options (Free Boundary)¶
Constraint: \(V(S, t) \geq g(S)\) (immediate exercise value)
Discretization: At each time step, enforce:
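$$ V_i^{j} = \max\!\left( \tilde{V}_i^{j},\; g(S_i) \right) $$
where \(\tilde{V}_i^{j}\) is the value produced by the unconstrained PDE step.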
Projected SOR: Solve PDE subject to constraint using successive over-relaxation.
Reference: [Wilmott et al., 1995], Chapter 11; [Tavella & Randall, 2000], Chapter 5
Implementation: src/neutryx/models/pde.py:american_option_pde()
Multidimensional PDEs¶
For basket options or multi-asset derivatives:
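$$ \frac{\partial V}{\partial t} + \frac{1}{2} \sum_{i=1}^{d} \sum_{j=1}^{d} \rho_{ij}\, \sigma_i \sigma_j\, S_i S_j\, \frac{\partial^2 V}{\partial S_i \partial S_j} + \sum_{i=1}^{d} r S_i \frac{\partial V}{\partial S_i} - r V = 0 $$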
Curse of dimensionality: Grid size grows exponentially (\(N^d\) for \(d\) assets).
ADI (Alternating Direction Implicit): Splits multi-D problem into sequence of 1D problems.
Complexity: \(O(MN^d)\) instead of \(O(MN^{2d})\) (huge savings)
Reference: [Wilmott et al., 1995], Chapter 12; [Tavella & Randall, 2000], Chapter 6
Implementation: src/neutryx/models/pde.py:adi_2d()
Fourier Methods¶
Carr-Madan Formula¶
Carr and Madan (1999) derived an FFT-based pricing formula using the characteristic function.
Modified call price (with damping \(\alpha > 0\)):
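$$ c_T(k) = e^{\alpha k}\, C_T(k), \qquad k = \log K $$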
Fourier transform:
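$$ \psi_T(v) = \int_{-\infty}^{\infty} e^{i v k}\, c_T(k)\, dk = \frac{e^{-rT}\, \phi_T\!\left(v - (\alpha + 1)i\right)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v} $$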
where \(\phi_T(u) = \mathbb{E}[e^{iu \log S_T}]\) is the characteristic function.
Inverse Fourier transform:
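$$ C_T(k) = \frac{e^{-\alpha k}}{\pi} \int_{0}^{\infty} e^{-i v k}\, \psi_T(v)\, dv $$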
FFT: Discretize and apply FFT → \(O(N \log N)\) for \(N\) strikes.
Reference: [Carr & Madan, 1999]; [Gatheral, 2006], Chapter 2
Implementation: src/neutryx/engines/fourier.py:carr_madan_fft()
COS Method¶
Fang and Oosterlee (2008) developed the Fourier-cosine (COS) method with superior stability.
Idea: Expand the density in cosine series:
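$$ f(y) \approx {\sum_{k=0}^{N-1}}{}' A_k \cos\!\left( k\pi \frac{y - a}{b - a} \right), \qquad A_k \approx \frac{2}{b - a}\, \text{Re}\!\left[ \phi_T\!\left( \frac{k\pi}{b - a} \right) e^{-i k\pi \frac{a}{b - a}} \right] $$
with \(y = \log S_T\) and \(\sum'\) indicating that the \(k=0\) term is halved.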
Option price:
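$$ V_0 \approx e^{-rT} {\sum_{k=0}^{N-1}}{}' \text{Re}\!\left[ \phi_T\!\left( \frac{k\pi}{b - a} \right) e^{-i k\pi \frac{a}{b - a}} \right] F_k $$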
where \(F_k\) are known analytically for European options.
Truncation: \([a, b]\) chosen based on cumulants:
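$$ [a, b] = \left[ c_1 - L\sqrt{c_2 + \sqrt{c_4}},\;\; c_1 + L\sqrt{c_2 + \sqrt{c_4}} \right] $$
where \(c_n\) are the cumulants of \(\log S_T\).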
(\(L = 10\) typically)
Advantages:
- Fewer evaluations than Carr-Madan
- Better numerical stability
- Easier to implement
Reference: [Fang & Oosterlee, 2008]; [Fang & Oosterlee, 2009]
Implementation: src/neutryx/engines/fourier.py:cos_method()
Applicability¶
Fourier methods work for any model with known characteristic function:
- Heston, SABR (via numerical integration)
- Merton, Kou, Variance Gamma
- Affine jump-diffusions
- Time-changed Lévy processes
Limitation: Requires smooth characteristic function (no path-dependence in payoff).
Reference: [Carr & Madan, 1999]; [Fang & Oosterlee, 2008]
Greeks Computation¶
Finite Differences (Bump-and-Revalue)¶
Delta (first derivative):
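$$ \Delta \approx \frac{V(S_0 + h) - V(S_0 - h)}{2h} $$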
Gamma (second derivative):
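$$ \Gamma \approx \frac{V(S_0 + h) - 2 V(S_0) + V(S_0 - h)}{h^2} $$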
Choice of \(h\): Balance truncation error (\(O(h^2)\)) against floating-point rounding error (\(O(\varepsilon/h)\) for delta, \(O(\varepsilon/h^2)\) for gamma).
Optimal \(h\): \(h \approx \varepsilon^{1/3} S_0\) for delta (where \(\varepsilon\) is machine precision).
Disadvantages:
- Multiple pricing evaluations (expensive for MC)
- Noisy for MC (requires variance reduction)
Reference: [Glasserman, 2004], Chapter 7
Pathwise Derivatives¶
Pathwise method differentiates the simulated path with respect to parameters.
Assumption: The payoff \(g(S_T)\) is Lipschitz continuous and differentiable almost everywhere (no discontinuities).
Formula:
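$$ \frac{\partial V}{\partial \theta} = \mathbb{E}\!\left[ e^{-rT}\, g'(S_T)\, \frac{\partial S_T}{\partial \theta} \right] $$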
For GBM:
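$$ \frac{\partial S_T}{\partial S_0} = \frac{S_T}{S_0} $$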
Delta:
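$$ \Delta = \mathbb{E}\!\left[ e^{-rT}\, g'(S_T)\, \frac{S_T}{S_0} \right] \approx \frac{e^{-rT}}{N} \sum_{i=1}^{N} \mathbf{1}\{S_T^{(i)} > K\}\, \frac{S_T^{(i)}}{S_0} $$
(the last expression is for a call payoff, where \(g'(S) = \mathbf{1}\{S > K\}\)).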
Advantages:
- Unbiased (if payoff is smooth)
- Low variance (same simulation as pricing)
- No additional samples required
Disadvantages:
- Requires smooth payoff (fails for digitals, barriers)
Reference: [Glasserman, 2004], Section 7.2; [Broadie & Glasserman, 1996]
Implementation: src/neutryx/engines/pathwise.py
Likelihood Ratio Method (LRM)¶
LRM (also called score function method) works for discontinuous payoffs.
Idea: Differentiate the density instead of the payoff.
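$$ \frac{\partial V}{\partial \theta} = \frac{\partial}{\partial \theta} \int e^{-rT} g(s)\, f(s; \theta)\, ds = \mathbb{E}\!\left[ e^{-rT}\, g(S_T)\, \frac{\partial \log f(S_T; \theta)}{\partial \theta} \right] $$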
where \(f(S_T; \theta)\) is the density of \(S_T\).
Score function: \(\psi(\theta) = \frac{\partial \log f}{\partial \theta}\)
Example (GBM): For \(\theta = \sigma\) (vega):
Advantages:
- Works for discontinuous payoffs (digitals, barriers)
- Unbiased
Disadvantages:
- High variance (weights can be large)
- Requires knowledge of density
Reference: [Glasserman, 2004], Section 7.3; [Broadie & Glasserman, 1996]
Automatic Differentiation (JAX)¶
JAX provides automatic differentiation (autodiff) via:
- Forward mode (tangent): Efficient for \(\mathbb{R} \to \mathbb{R}^m\)
- Reverse mode (adjoint): Efficient for \(\mathbb{R}^n \to \mathbb{R}\)
Greeks via autodiff:
```python
import jax

# price_function(S0) is assumed to be a JAX-traceable pricer returning a scalar present value
delta = jax.grad(price_function, argnums=0)(S0)   # ∂V/∂S0
gamma = jax.grad(jax.grad(price_function))(S0)    # ∂²V/∂S0²
```
Advantages:
- Exact derivatives (up to numerical precision)
- Automatic (no manual derivation)
- Fast (same complexity as function evaluation, up to small constant)
Disadvantages:
- Requires differentiable code (no if/else based on computed values)
- Memory overhead for reverse mode
Reference: [Bradbury et al., 2018]; [Griewank & Walther, 2008]
Implementation: All Greeks in src/neutryx/valuations/greeks/ use JAX autodiff
Summary¶
This document covered:
- Monte Carlo: Fundamental MC, convergence, standard error
- Variance Reduction: Antithetic, control variates, importance sampling, stratified, moment matching, conditional MC
- Quasi-Monte Carlo: Low-discrepancy sequences (Sobol, Halton), randomized QMC
- MLMC: Telescoping estimator, optimal allocation, complexity reduction
- LSM: Regression-based American option pricing, basis functions
- PDE: Finite differences, Crank-Nicolson, ADI for multi-asset
- Fourier: Carr-Madan FFT, COS method, characteristic functions
- Greeks: Finite differences, pathwise, LRM, automatic differentiation
All methods are rigorously implemented in Neutryx with extensive validation.
Next: Calibration Theory
References: See Bibliography for complete citations.