Project 4: Building a Universal Inference Engine
COMP 536 | Short Projects
Due: Friday April 17, 2026 by 11:59 PM PT
Collaboration Guidelines
Core inference algorithms should be independently implemented. You may absolutely talk with classmates about debugging strategies, test design, plotting choices, memo organization, and scientific interpretation. The point of this project is not to struggle alone. The point is to build a working inference pipeline that you understand.
Review the Project Submission Guide for collaboration and submission expectations.
You are building one reusable Bayesian pipeline. The cosmology model in this project is only the first target. Once your sampler and diagnostics work, you can swap in a different forward model and likelihood and use the same machinery on a completely different scientific problem.
The common baseline in this project is a validated Metropolis-Hastings inference pipeline. If you are enrolled for graduate credit, you must also implement Hamiltonian Monte Carlo (HMC) and compare it to your MCMC baseline. If you are enrolled for undergraduate credit, HMC is optional but strongly encouraged.
Start Here
If the full project feels large, do these four things in order.
- Read the Science Background so you understand what \(\Omega_m\) and \(h\) mean physically and why the supernova data constrains them.
- Implement and validate your cosmology forward model on the worked example: \(z = 0.5\), \(\Omega_m = 0.3\), \(h = 0.7\) should give \(\mu \approx 42.26\) mag.
- Implement Metropolis-Hastings and test it on a known 2D Gaussian before connecting it to the supernova likelihood.
- Only after the Gaussian test works, run inference on the JLA supernova data with multiple chains and diagnostics.
Project 4 Timeline
You have 3 weeks for a reason. This project is designed to be completed in layers, and the layers matter.
Week 1: Build the posterior ingredients
By the end of Week 1, you should have:
- read the science background
- implemented a forward model
- loaded the JLA data and covariance matrix
- passed the worked example and basic forward-model sanity checks
If you finish Week 1 without a trustworthy forward model, you are already behind.
Week 2: Validate the sampler before doing science
By the end of Week 2, you should have:
- implemented Metropolis-Hastings
- tuned a basic proposal distribution
- passed the canonical 2D Gaussian validation test
- generated trace plots and an acceptance-rate summary for that toy problem
If you reach the start of Week 3 without passing the Gaussian validation, stop expanding scope and come get help.
Week 3: Run the real analysis and write it up
By the final week, you should be doing:
- multi-chain JLA inference
- final figures
- memo writing
- graduate students only: HMC implementation and comparison
Week 3 is for analysis and polish. It is not the week to discover that your sampler has never been validated.
Learning Objectives
By the end of this project, you should be able to:
- Explain how a forward model, a likelihood, and a prior combine to define a posterior distribution.
- Implement a reusable Metropolis-Hastings sampler that works on both toy problems and real data.
- Validate an inference code in stages: forward model first, sampler second, science analysis third.
- Diagnose whether chains are mixing and converging using trace plots, acceptance rates, autocorrelation, effective sample size, and multi-chain checks.
- Interpret posterior constraints on \(\Omega_m\) and \(h\) in physical language, not just as numbers on a plot.
- Connect Project 2’s leapfrog ideas to HMC if you complete the graduate extension.
Project Overview
This project is the capstone of the short-project sequence because it asks you to do something deeper than “fit a dataset.” You are building inference machinery. The same architecture you build here could later be used for exoplanet transits, stellar population fitting, dark matter halo models, or any other problem where you have a forward model and uncertain data.
For this project, the scientific application is Type Ia supernova cosmology. Your data are binned JLA distance modulus measurements with a full covariance matrix. Your job is to infer the posterior distribution of \((\Omega_m, h)\) in a flat universe. To see why this matters physically, start with the Science Background.
The two-layer architecture
The project is easiest to manage if you keep the code modular.
- Problem-specific layer: `cosmology.py` and `likelihood.py`
- Reusable inference layer: `mcmc.py`, `diagnostics.py`, and `hmc.py` if you complete the graduate lane
That separation is the big idea. The cosmology pieces tell the code what the universe predicts. The sampler and diagnostics tell the code how to explore a posterior distribution.
Common Core for Everyone
Every student should complete the following baseline.
| Module | What it does | Required for everyone? |
|---|---|---|
| `cosmology.py` | Computes luminosity distance and distance modulus | Yes |
| `likelihood.py` | Loads the JLA data and evaluates \(\log p(\theta \mid D)\) | Yes |
| `mcmc.py` | Implements Metropolis-Hastings | Yes |
| `diagnostics.py` | Computes core chain diagnostics | Yes |
| `hmc.py` | Implements Hamiltonian Monte Carlo | Graduate required; optional for undergraduates |
Phase 1: Forward Model and Likelihood
Before you do any sampling, you need a trustworthy posterior.
Provided data files
The supernova dataset for this project is already included in this folder, so you do not need to hunt it down elsewhere. Download or inspect these files directly:
- jla_mub.txt — binned redshift and distance-modulus data
- jla_mub_covmatrix.txt — the full \(31 \times 31\) covariance matrix
To see why both files matter physically, revisit the Science Background, especially the discussion of correlated uncertainties in the JLA sample.
1.1 Forward model
Implement a function that predicts the distance modulus \(\mu(z; \Omega_m, h)\) for a flat universe. Your production implementation may use:
- direct numerical integration of \[ D_L(z) = \frac{c(1+z)}{H_0}\int_0^z \frac{dz'}{\sqrt{\Omega_m(1+z')^3 + (1-\Omega_m)}} \]
- or the Pen (1999) approximation discussed in the draft materials
For Project 4, you need one correct forward-model implementation first. Do not delay the rest of the project because you are trying to perfect two versions at once.
- Required for everyone: one working flat-universe implementation that passes the worked example
- Recommended for stronger validation: compare numerical integration to the Pen approximation after the first version works
- Good A-level / graduate depth: implement both and compare speed and accuracy
At minimum, your forward model should satisfy:
- \(D_L(0) = 0\)
- the worked example \(z = 0.5\), \(\Omega_m = 0.3\), \(h = 0.7\) gives \(\mu \approx 42.26 \pm 0.02\) mag
- predicted \(\mu(z)\) increases smoothly with redshift over the JLA range
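As one way to sanity-check the worked example, here is a minimal sketch of a numerically integrated forward model. The function name and signature are illustrative, not a required interface for `cosmology.py`; it uses a plain trapezoid rule for the comoving integral.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def distance_modulus(z, omega_m, h, n_grid=2048):
    """Distance modulus mu(z) [mag] for a flat universe (illustrative sketch)."""
    H0 = 100.0 * h  # Hubble constant in km/s/Mpc
    zp = np.linspace(0.0, z, n_grid)
    integrand = 1.0 / np.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))
    # trapezoid rule for the dimensionless comoving integral
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zp))
    d_L = C_KM_S * (1.0 + z) / H0 * integral  # luminosity distance in Mpc
    return 5.0 * np.log10(d_L) + 25.0  # mu = 5 log10(d_L / 10 pc)

print(round(distance_modulus(0.5, 0.3, 0.7), 2))  # should be close to 42.26
```

A production version would vectorize over the full JLA redshift grid, but this is enough to hit the worked example.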
1.2 Likelihood and prior
Implement the log-likelihood using the full covariance matrix:
\[ \ln \mathcal{L}(\theta) = -\frac{1}{2}\mathbf{r}^\top \mathbf{C}^{-1}\mathbf{r} \]
with residual vector
\[ r_i = \mu_i^{\mathrm{obs}} - \mu_i^{\mathrm{theory}}(z_i; \theta). \]
Use Cholesky factorization rather than explicitly inverting \(\mathbf{C}\). That is both more stable and more professional.
Use flat priors with physically sensible bounds:
- \(\Omega_m \in [0.0, 1.2]\)
- \(h \in [0.5, 0.9]\)
Return `-np.inf` from the log-posterior outside the allowed region so your sampler automatically rejects those proposals.
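A minimal sketch of how the pieces fit together, assuming a hypothetical vectorized forward model `mu_model(z_array, omega_m, h)` supplied by the caller. It factors \(\mathbf{C}\) once with a Cholesky decomposition and reuses the factorization on every likelihood evaluation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_log_posterior(z_obs, mu_obs, cov, mu_model):
    """Build a log-posterior with flat priors and a Cholesky-based likelihood.

    Illustrative sketch: mu_model(z_array, omega_m, h) is a hypothetical
    forward-model callable, not a required interface.
    """
    cho = cho_factor(cov)  # factor C once; reused for every evaluation

    def log_posterior(theta):
        omega_m, h = theta
        # flat priors with hard bounds
        if not (0.0 <= omega_m <= 1.2 and 0.5 <= h <= 0.9):
            return -np.inf
        r = mu_obs - mu_model(z_obs, omega_m, h)  # residual vector
        # chi^2 = r^T C^{-1} r without ever forming C^{-1} explicitly
        return -0.5 * r @ cho_solve(cho, r)

    return log_posterior
```

Closing over the factored matrix like this keeps the per-step cost at one triangular solve rather than a full inversion.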
If you replace the full covariance matrix with a diagonal approximation, your uncertainty estimates will be wrong. This project is partly about learning what a real scientific likelihood looks like.
Phase 2: Validate Metropolis-Hastings on a Toy Gaussian
Do not connect your sampler to the supernova data first. That makes debugging much harder because you would be debugging the sampler and the cosmology model at the same time.
Instead, define a known 2D Gaussian target distribution with a specified mean and covariance. Your Metropolis-Hastings implementation should recover:
- the correct sample mean
- the correct sample covariance
- a reasonable acceptance rate after tuning, typically in the 20% to 50% range
This is your first major validation gate. If your toy Gaussian test fails, stop there and fix it before moving on.
Canonical Gaussian test case
Use this exact toy problem unless you have a strong reason not to. The point is to remove ambiguity, not to make you invent your own debugging target.
Use
\[ \boldsymbol{\mu} = \begin{pmatrix} 0.30 \\ 0.70 \end{pmatrix} \]
and
\[ \Sigma = \begin{pmatrix} 0.04^2 & -0.00048 \\ -0.00048 & 0.02^2 \end{pmatrix}. \]
This gives you a moderately negatively correlated 2D Gaussian, which is useful because it previews the kind of posterior geometry you will see in the real JLA problem.
Recommended starting choices:
- initial point: \(\theta^{(0)} = (0.50, 0.60)\)
- initial proposal covariance: \(\mathrm{diag}(0.02^2, 0.01^2)\)
- short tuning run: 2,000 steps
- production validation run: 20,000 steps
Treat the Gaussian validation as successful when all of the following are true:
- the sample mean is within about 0.01 of the true mean in each parameter
- the sample covariance is within about 20% of the target covariance
- the acceptance rate lands in the 20% to 50% range after tuning
- the trace plots look stationary rather than sticky or trending
If you cannot recover this Gaussian, you are not ready to run the supernova posterior yet.
Required Metropolis-Hastings interface
Your `mcmc.py` module should expose a reusable sampler, for example:
`metropolis_hastings(log_prob_fn, theta_init, proposal_cov, n_steps, args=())`
The sampler should return:
- the full chain
- an acceptance-rate summary
You already have the mathematical ingredients from Module 5. The hard part here is software discipline: keep the sampler general and keep the problem-specific logic outside it.
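One possible realization of that interface, wired to the canonical Gaussian test, is sketched below. The extra `seed` keyword is an illustrative addition for reproducibility, not part of the required signature:

```python
import numpy as np

def metropolis_hastings(log_prob_fn, theta_init, proposal_cov, n_steps,
                        args=(), seed=0):
    """Random-walk Metropolis-Hastings with a fixed Gaussian proposal (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta_init, dtype=float))
    chain = np.empty((n_steps, theta.size))
    log_p = log_prob_fn(theta, *args)
    n_accept = 0
    for i in range(n_steps):
        proposal = rng.multivariate_normal(theta, proposal_cov)
        log_p_new = log_prob_fn(proposal, *args)
        # accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_steps

# canonical 2D Gaussian validation target from the project spec
mu = np.array([0.30, 0.70])
cov = np.array([[0.04**2, -0.00048], [-0.00048, 0.02**2]])
cov_inv = np.linalg.inv(cov)
log_gauss = lambda t: -0.5 * (t - mu) @ cov_inv @ (t - mu)

chain, acc = metropolis_hastings(log_gauss, [0.50, 0.60],
                                 np.diag([0.02**2, 0.01**2]), 20_000)
```

Note that the sampler knows nothing about cosmology: the target distribution arrives only through `log_prob_fn`, which is exactly the separation the two-layer architecture asks for.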
Phase 3: Run JLA Inference with MCMC
Once the Gaussian test works, connect the sampler to your cosmology posterior and infer \((\Omega_m, h)\) from the JLA sample.
If you reach the start of Week 3 without passing the canonical Gaussian validation, you are behind. Do not jump to HMC, do not polish figures, and do not keep guessing. Come to office hours or lab and fix the sampler first.
Minimum MCMC workflow
Run at least 3 to 4 independent chains with different random seeds. For each chain:
- start from a physically plausible point
- tune the proposal covariance on short runs
- run a production chain
- discard burn-in before computing final summaries
Core diagnostics
Your baseline diagnostics should include:
- trace plots for both parameters
- acceptance rate
- autocorrelation or effective sample size
- one multi-chain convergence check, preferably split-\(\hat{R}\)
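For the multi-chain check, a minimal split-\(\hat{R}\) for a single parameter can be computed as follows. This is a sketch of the standard Gelman-Rubin recipe (split each chain in half, compare within- and between-sequence variance); the function name is illustrative:

```python
import numpy as np

def split_rhat(chains):
    """Split-Rhat for one parameter; `chains` has shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1] // 2
    # splitting each chain in half doubles the number of sequences
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    within = halves.var(axis=1, ddof=1).mean()      # W: mean within-sequence variance
    between = n * halves.mean(axis=1).var(ddof=1)   # B: n * variance of sequence means
    var_plus = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(var_plus / within)
```

Values very close to 1 (commonly below about 1.01 to 1.05) indicate the chains agree; clearly larger values mean at least one chain has not found, or has not mixed through, the same distribution as the others.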
The goal is not to collect diagnostics for their own sake. The goal is to justify that your posterior summaries come from chains that actually explored the target distribution.
Phase 4: Analyze the Posterior
Use your converged MCMC chains to answer the scientific question: what does the supernova dataset say about \((\Omega_m, h)\)?
Your analysis should include:
- posterior summaries for \(\Omega_m\) and \(h\)
- a corner plot showing marginal and joint constraints
- a plot of \(\mu(z)\) with the data and posterior model realizations
- a short discussion of the correlation between \(\Omega_m\) and \(h\)
- a comparison to literature values at the level of scientific interpretation, not heroics
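One simple way to produce those posterior summaries is percentile-based: report the median and the central 68% credible interval from the post-burn-in samples. The helper name and the chain layout below are illustrative assumptions:

```python
import numpy as np

def summarize(samples, name):
    """Median and central 68% credible interval from a 1-D sample array."""
    lo, med, hi = np.percentile(samples, [16, 50, 84])
    print(f"{name} = {med:.3f} (+{hi - med:.3f} / -{med - lo:.3f})")

# usage with a hypothetical post-burn-in chain of shape (n_samples, 2):
# summarize(chain[:, 0], "Omega_m")
# summarize(chain[:, 1], "h")
# corr = np.corrcoef(chain[:, 0], chain[:, 1])[0, 1]  # Omega_m-h correlation
```

The sample correlation coefficient gives you a single number to anchor the discussion of the \(\Omega_m\)-\(h\) degeneracy alongside the corner plot.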
You are not expected to resolve the Hubble Tension. You are expected to explain what your posterior says, how uncertain it is, and why the result is scientifically interesting.
Graduate HMC Lane
If you are taking the course for graduate credit, continue after the MCMC baseline works.
Implement HMC in hmc.py using the same posterior and diagnostics pipeline. This is where the connection to Project 2 becomes explicit: HMC uses leapfrog integration in parameter space.
Your graduate HMC work must include:
- a gradient implementation, usually finite differences unless you choose an autodiff extension
- basic HMC tuning of step size \(\epsilon\) and trajectory length \(L\)
- an energy check such as a \(\Delta H\) histogram or summary
- a direct comparison between MCMC and HMC on the same posterior
That comparison should discuss mixing and efficiency, not just whether the two methods land in the same region.
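The leapfrog and energy-check ingredients can be sketched as below, here exercised on a standard normal target rather than the cosmology posterior. The helper names and the finite-difference step are illustrative assumptions; in your `hmc.py` the same pieces would sit inside a full accept/reject HMC step:

```python
import numpy as np

def grad_fd(log_prob, theta, h=1e-5):
    """Central finite-difference gradient of log_prob (illustrative helper)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (log_prob(theta + e) - log_prob(theta - e)) / (2.0 * h)
    return g

def leapfrog(log_prob, theta, p, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with a unit mass matrix."""
    theta, p = theta.copy(), p.copy()
    p = p + 0.5 * step * grad_fd(log_prob, theta)  # initial half kick
    for _ in range(n_steps):
        theta = theta + step * p                   # drift
        g = grad_fd(log_prob, theta)
        p = p + step * g                           # full kick ...
    p = p - 0.5 * step * g                         # ... minus the extra half
    return theta, p

# Energy check on a standard normal target: |Delta H| should be small, O(step^2)
log_prob = lambda t: -0.5 * np.sum(t ** 2)
H = lambda t, p: -log_prob(t) + 0.5 * np.sum(p ** 2)
theta0, p0 = np.array([1.0]), np.array([0.5])
theta1, p1 = leapfrog(log_prob, theta0, p0, step=0.1, n_steps=20)
print(abs(H(theta1, p1) - H(theta0, p0)))
```

Collecting \(\Delta H\) over many trajectories gives exactly the histogram the energy-check deliverable asks for: values clustered near zero indicate a well-tuned step size.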
If you are an undergraduate student and want an extension, HMC is the most natural next step. It is absolutely worth trying once your MCMC pipeline is working.
Deliverables
Your submission should include the following pieces.
Code repository
Your repo should follow the contract in the Starter Code Structure page. At minimum, it must include:
- a root-level `run.py`
- modular source files
- a small test suite
- generated figures in `outputs/`
- a `README.md` explaining how to reproduce your work
Research memo
Write a technical memo that explains:
- how you validated the forward model
- how you validated the sampler
- what posterior constraints you found
- what those constraints mean physically
- for graduate students, how HMC compares to MCMC
This is a computational science memo. I care about method, evidence, and interpretation more than polished rhetoric.
Growth memo
Submit the standard growth memo for the course. Be specific about what was difficult, what clicked, and which debugging habits actually helped you.
Required Figures
Your final submission should include enough figures to prove that the pipeline works. For most students, that means at least:
- one validation figure or table for the forward model or toy Gaussian test
- trace plots for the production MCMC chains
- a corner plot for \((\Omega_m, h)\)
- a data vs. model plot for \(\mu(z)\)
Graduate students should also include an HMC-specific validation or comparison figure, such as:
- \(\Delta H\) distribution
- side-by-side trace plots
- autocorrelation comparison
- ESS-per-second comparison
See the Expectations & Grading page for what stronger or weaker evidence looks like.
Validation Gates
Do not keep piling on complexity if one of these fails.
- Gate 1: Forward model passes the worked example and basic sanity checks.
- Gate 2: Metropolis-Hastings recovers a known 2D Gaussian.
- Gate 3: JLA chains mix and produce finite, stable posterior summaries.
- Gate 4: Graduate students only: HMC shows reasonable energy behavior and comparable posterior results.
This project is much more manageable if you respect those gates. The students who get into trouble are usually the ones who skip validation and try to debug the entire pipeline at once.
Extension Ideas
Once the baseline is working, you can push further. Good extensions include:
- JAX autodiff for gradients
- non-flat cosmology
- NUTS
- importance reweighting with an informative prior on \(h\)
- your own scientifically motivated comparison
For undergraduates, extensions are optional. For graduate students, the HMC lane already counts as the required advanced component, and any work beyond that is extra depth.
Closing Advice
This project rewards steady progress much more than last-minute heroics. If you can make the forward model trustworthy and the toy Gaussian sampler work in the first week, the rest of the project becomes much more manageable. If you skip those steps, everything later becomes harder than it needs to be.
Build the inference engine in layers. Validate each layer. Then let the science question come into focus.