Project 4: Building a Universal Inference Engine
COMP 536 | Short Projects
Due: Friday April 17, 2026 by 11:59 PM PT
Collaboration Guidelines
Core inference algorithms should be independently implemented. You may absolutely talk with classmates about debugging strategies, test design, plotting choices, memo organization, and scientific interpretation. The point of this project is not to struggle alone. The point is to build a working inference pipeline that you understand.
Review the Project Submission Guide for collaboration and submission expectations.
You are building one reusable Bayesian pipeline. The cosmology model in this project is only the first target. Once your sampler and diagnostics work, you can swap in a different forward model and likelihood and use the same machinery on a completely different scientific problem.
The common baseline in this project is a validated Metropolis-Hastings inference pipeline. If you are enrolled for graduate credit, you must also implement Hamiltonian Monte Carlo (HMC) and compare it to your MCMC baseline. If you are enrolled for undergraduate credit, HMC is optional but strongly encouraged.
Start Here
If the full project feels large, do these four things in order.
- Read the Science Background so you understand what \(\Omega_m\) and \(h\) mean physically and why the supernova data constrains them.
- Implement and validate your cosmology forward model on the worked example: \(z = 0.5\), \(\Omega_m = 0.3\), \(h = 0.7\) should give \(\mu \approx 42.26\) mag.
- Implement Metropolis-Hastings and test it on a known 2D Gaussian before connecting it to the supernova likelihood.
- Only after the Gaussian test works, run inference on the JLA supernova data with multiple chains and diagnostics.
Project 4 Timeline
You have 3 weeks for a reason. This project is designed to be completed in layers, and the layers matter.
Week 1: Build the posterior ingredients
By the end of Week 1, you should have:
- read the science background
- implemented a forward model
- loaded the JLA data and covariance matrix
- passed the worked example and basic forward-model sanity checks
If you finish Week 1 without a trustworthy forward model, you are already behind.
Week 2: Validate the sampler before doing science
By the end of Week 2, you should have:
- implemented Metropolis-Hastings
- tuned a basic proposal distribution
- passed the canonical 2D Gaussian validation test
- generated trace plots and an acceptance-rate summary for that toy problem
If you reach the start of Week 3 without passing the Gaussian validation, stop expanding scope and come get help.
Week 3: Run the real analysis and write it up
By the final week, you should be doing:
- multi-chain JLA inference
- final figures
- memo writing
- graduate students only: HMC implementation and comparison
Week 3 is for analysis and polish. It is not the week to discover that your sampler has never been validated.
Learning Objectives
By the end of this project, you should be able to:
- Explain how a forward model, a likelihood, and a prior combine to define a posterior distribution.
- Implement a reusable Metropolis-Hastings sampler that works on both toy problems and real data.
- Validate an inference code in stages: forward model first, sampler second, science analysis third.
- Diagnose whether chains are mixing and converging using trace plots, acceptance rates, autocorrelation, effective sample size, and multi-chain checks.
- Interpret posterior constraints on \(\Omega_m\) and \(h\) in physical language, not just as numbers on a plot.
- Connect Project 2’s leapfrog ideas to HMC if you complete the graduate extension.
Project Overview
This project is the capstone of the short-project sequence because it asks you to do something deeper than “fit a dataset.” You are building inference machinery. The same architecture you build here could later be used for exoplanet transits, stellar population fitting, dark matter halo models, or any other problem where you have a forward model and uncertain data.
For this project, the scientific application is Type Ia supernova cosmology. Your data are binned JLA distance modulus measurements with a full covariance matrix. Your job is to infer the posterior distribution of \((\Omega_m, h)\) in a flat universe. To see why this matters physically, start with the Science Background.
The two-layer architecture
The project is easiest to manage if you keep the code modular.
- Problem-specific layer: `cosmology.py` and `likelihood.py`
- Reusable inference layer: `mcmc.py`, `diagnostics.py`, and `hmc.py` if you complete the graduate lane
That separation is the big idea. The cosmology pieces tell the code what the universe predicts. The sampler and diagnostics tell the code how to explore a posterior distribution.
Common Core for Everyone
Every student should complete the following baseline.
| Module | What it does | Required for everyone? |
|---|---|---|
| `cosmology.py` | Computes luminosity distance and distance modulus | Yes |
| `likelihood.py` | Loads the JLA data and evaluates \(\log p(\theta \mid D)\) | Yes |
| `mcmc.py` | Implements Metropolis-Hastings | Yes |
| `diagnostics.py` | Computes core chain diagnostics | Yes |
| `hmc.py` | Implements Hamiltonian Monte Carlo | Graduate required; optional for undergraduates |
Phase 1: Forward Model and Likelihood
Before you do any sampling, you need a trustworthy posterior.
Provided data files
The supernova dataset for this project is already included in this folder, so you do not need to hunt it down elsewhere. Download or inspect these files directly:
- jla_mub.txt — binned redshift and distance-modulus data
- jla_mub_covmatrix.txt — the full \(31 \times 31\) covariance matrix
To see why both files matter physically, revisit the Science Background, especially the discussion of correlated uncertainties in the JLA sample.
1.1 Forward model
Implement a function that predicts the distance modulus \(\mu(z; \Omega_m, h)\) for a flat universe. Your production implementation may use:
- direct numerical integration of \[ D_L(z) = \frac{c(1+z)}{H_0}\int_0^z \frac{dz'}{\sqrt{\Omega_m(1+z')^3 + (1-\Omega_m)}} \]
- or the Pen (1999) approximation discussed in the draft materials
For Project 4, you need one correct forward-model implementation first. Do not delay the rest of the project because you are trying to perfect two versions at once.
- Required for everyone: one working flat-universe implementation that passes the worked example
- Recommended for stronger validation: compare numerical integration to the Pen approximation after the first version works
- Good A-level / graduate depth: implement both and compare speed and accuracy
At minimum, your forward model should satisfy:
- \(D_L(0) = 0\)
- the worked example \(z = 0.5\), \(\Omega_m = 0.3\), \(h = 0.7\) gives \(\mu \approx 42.26 \pm 0.02\) mag
- predicted \(\mu(z)\) increases smoothly with redshift over the JLA range
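As one way to sanity-check the worked example, here is a minimal sketch of a numerically integrated forward model. The function name and signature are illustrative, not a required interface for `cosmology.py`; it uses a plain trapezoid rule for the comoving integral.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def distance_modulus(z, omega_m, h, n_grid=2048):
    """Distance modulus mu(z) [mag] for a flat universe (illustrative sketch)."""
    H0 = 100.0 * h  # Hubble constant in km/s/Mpc
    zp = np.linspace(0.0, z, n_grid)
    integrand = 1.0 / np.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))
    # trapezoid rule for the dimensionless comoving integral
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zp))
    d_L = C_KM_S * (1.0 + z) / H0 * integral  # luminosity distance in Mpc
    return 5.0 * np.log10(d_L) + 25.0  # mu = 5 log10(d_L / 10 pc)

print(round(distance_modulus(0.5, 0.3, 0.7), 2))  # should be close to 42.26
```

A production version would vectorize over the full JLA redshift grid, but this is enough to hit the worked example.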
1.2 Likelihood and prior
Implement the log-likelihood using the full covariance matrix:
\[ \ln \mathcal{L}(\theta) = -\frac{1}{2}\mathbf{r}^\top \mathbf{C}^{-1}\mathbf{r} \]
with residual vector
\[ r_i = \mu_i^{\mathrm{obs}} - \mu_i^{\mathrm{theory}}(z_i; \theta). \]
Use Cholesky factorization rather than explicitly inverting \(\mathbf{C}\). That is both more stable and more professional.
Use flat priors with physically sensible bounds:
- \(\Omega_m \in [0.0, 1.2]\)
- \(h \in [0.5, 0.9]\)
Return `-np.inf` from the log-posterior outside the allowed region so your sampler automatically rejects those proposals.
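A minimal sketch of how the pieces fit together, assuming a hypothetical vectorized forward model `mu_model(z_array, omega_m, h)` supplied by the caller. It factors \(\mathbf{C}\) once with a Cholesky decomposition and reuses the factorization on every likelihood evaluation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_log_posterior(z_obs, mu_obs, cov, mu_model):
    """Build a log-posterior with flat priors and a Cholesky-based likelihood.

    Illustrative sketch: mu_model(z_array, omega_m, h) is a hypothetical
    forward-model callable, not a required interface.
    """
    cho = cho_factor(cov)  # factor C once; reused for every evaluation

    def log_posterior(theta):
        omega_m, h = theta
        # flat priors with hard bounds
        if not (0.0 <= omega_m <= 1.2 and 0.5 <= h <= 0.9):
            return -np.inf
        r = mu_obs - mu_model(z_obs, omega_m, h)  # residual vector
        # chi^2 = r^T C^{-1} r without ever forming C^{-1} explicitly
        return -0.5 * r @ cho_solve(cho, r)

    return log_posterior
```

Closing over the factored matrix like this keeps the per-step cost at one triangular solve rather than a full inversion.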
If you replace the full covariance matrix with a diagonal approximation, your uncertainty estimates will be wrong. This project is partly about learning what a real scientific likelihood looks like.
Phase 2: Validate Metropolis-Hastings on a Toy Gaussian
Do not connect your sampler to the supernova data first. That makes debugging much harder because you would be debugging the sampler and the cosmology model at the same time.
Instead, define a known 2D Gaussian target distribution with a specified mean and covariance. Your Metropolis-Hastings implementation should recover:
- the correct sample mean
- the correct sample covariance
- a reasonable acceptance rate after tuning, typically in the 20% to 50% range
This is your first major validation gate. If your toy Gaussian test fails, stop there and fix it before moving on.
Canonical Gaussian test case
Use this exact toy problem unless you have a strong reason not to. The point is to remove ambiguity, not to make you invent your own debugging target.
Use
\[ \boldsymbol{\mu} = \begin{pmatrix} 0.30 \\ 0.70 \end{pmatrix} \]
and
\[ \Sigma = \begin{pmatrix} 0.04^2 & -0.00048 \\ -0.00048 & 0.02^2 \end{pmatrix}. \]
This gives you a moderately negatively correlated 2D Gaussian, which is useful because it previews the kind of posterior geometry you will see in the real JLA problem.
Recommended starting choices:
- initial point: \(\theta^{(0)} = (0.50, 0.60)\)
- initial proposal covariance: \(\mathrm{diag}(0.02^2, 0.01^2)\)
- short tuning run: 2,000 steps
- production validation run: 20,000 steps
Treat the Gaussian validation as successful when all of the following are true:
- the sample mean is within about 0.01 of the true mean in each parameter
- the sample covariance is within about 20% of the target covariance
- the acceptance rate lands in the 20% to 50% range after tuning
- the trace plots look stationary rather than sticky or trending
If you cannot recover this Gaussian, you are not ready to run the supernova posterior yet.
Required Metropolis-Hastings interface
Your `mcmc.py` module should expose a reusable sampler, for example:
`metropolis_hastings(log_prob_fn, theta_init, proposal_cov, n_steps, args=())`
The sampler should return:
- the full chain
- an acceptance-rate summary
You already have the mathematical ingredients from Module 5. The hard part here is software discipline: keep the sampler general and keep the problem-specific logic outside it.
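One possible realization of that interface, wired to the canonical Gaussian test, is sketched below. The extra `seed` keyword is an illustrative addition for reproducibility, not part of the required signature:

```python
import numpy as np

def metropolis_hastings(log_prob_fn, theta_init, proposal_cov, n_steps,
                        args=(), seed=0):
    """Random-walk Metropolis-Hastings with a fixed Gaussian proposal (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta_init, dtype=float))
    chain = np.empty((n_steps, theta.size))
    log_p = log_prob_fn(theta, *args)
    n_accept = 0
    for i in range(n_steps):
        proposal = rng.multivariate_normal(theta, proposal_cov)
        log_p_new = log_prob_fn(proposal, *args)
        # accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_steps

# canonical 2D Gaussian validation target from the project spec
mu = np.array([0.30, 0.70])
cov = np.array([[0.04**2, -0.00048], [-0.00048, 0.02**2]])
cov_inv = np.linalg.inv(cov)
log_gauss = lambda t: -0.5 * (t - mu) @ cov_inv @ (t - mu)

chain, acc = metropolis_hastings(log_gauss, [0.50, 0.60],
                                 np.diag([0.02**2, 0.01**2]), 20_000)
```

Note that the sampler knows nothing about cosmology: the target distribution arrives only through `log_prob_fn`, which is exactly the separation the two-layer architecture asks for.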
Phase 3: Run JLA Inference with MCMC
Once the Gaussian test works, connect the sampler to your cosmology posterior and infer \((\Omega_m, h)\) from the JLA sample.
If you reach the start of Week 3 without passing the canonical Gaussian validation, you are behind. Do not jump to HMC, do not polish figures, and do not keep guessing. Come to office hours or lab and fix the sampler first.
Minimum MCMC workflow
Run at least 3 to 4 independent chains with different random seeds. For each chain:
- start from a physically plausible point
- tune the proposal covariance on short runs
- run a production chain
- discard burn-in before computing final summaries
Core diagnostics
Your baseline diagnostics should include:
- trace plots for both parameters
- acceptance rate
- autocorrelation or effective sample size
- one multi-chain convergence check, preferably split-\(\hat{R}\)
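For the multi-chain check, a minimal split-\(\hat{R}\) for a single parameter can be computed as follows. This is a sketch of the standard Gelman-Rubin recipe (split each chain in half, compare within- and between-sequence variance); the function name is illustrative:

```python
import numpy as np

def split_rhat(chains):
    """Split-Rhat for one parameter; `chains` has shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1] // 2
    # splitting each chain in half doubles the number of sequences
    halves = np.concatenate([chains[:, :n], chains[:, n:2 * n]], axis=0)
    within = halves.var(axis=1, ddof=1).mean()      # W: mean within-sequence variance
    between = n * halves.mean(axis=1).var(ddof=1)   # B: n * variance of sequence means
    var_plus = (n - 1) / n * within + between / n   # pooled variance estimate
    return np.sqrt(var_plus / within)
```

Values very close to 1 (commonly below about 1.01 to 1.05) indicate the chains agree; clearly larger values mean at least one chain has not found, or has not mixed through, the same distribution as the others.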
The goal is not to collect diagnostics for their own sake. The goal is to justify that your posterior summaries come from chains that actually explored the target distribution.
Phase 4: Analyze the Posterior
Use your converged MCMC chains to answer the scientific question: what does the supernova dataset say about \((\Omega_m, h)\)?
Your analysis should include:
- posterior summaries for \(\Omega_m\) and \(h\)
- a corner plot showing marginal and joint constraints
- a plot of \(\mu(z)\) with the data and posterior model realizations
- a short discussion of the correlation between \(\Omega_m\) and \(h\)
- a comparison to literature values at the level of scientific interpretation, not heroics
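One simple way to produce those posterior summaries is percentile-based: report the median and the central 68% credible interval from the post-burn-in samples. The helper name and the chain layout below are illustrative assumptions:

```python
import numpy as np

def summarize(samples, name):
    """Median and central 68% credible interval from a 1-D sample array."""
    lo, med, hi = np.percentile(samples, [16, 50, 84])
    print(f"{name} = {med:.3f} (+{hi - med:.3f} / -{med - lo:.3f})")

# usage with a hypothetical post-burn-in chain of shape (n_samples, 2):
# summarize(chain[:, 0], "Omega_m")
# summarize(chain[:, 1], "h")
# corr = np.corrcoef(chain[:, 0], chain[:, 1])[0, 1]  # Omega_m-h correlation
```

The sample correlation coefficient gives you a single number to anchor the discussion of the \(\Omega_m\)-\(h\) degeneracy alongside the corner plot.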
You are not expected to resolve the Hubble Tension. You are expected to explain what your posterior says, how uncertain it is, and why the result is scientifically interesting.
Graduate HMC Lane
If you are taking the course for graduate credit, continue after the MCMC baseline works.
Implement HMC in hmc.py using the same posterior and diagnostics pipeline. This is where the connection to Project 2 becomes explicit: HMC uses leapfrog integration in parameter space.
Your graduate HMC work must include:
- a gradient implementation, usually finite differences unless you choose an autodiff extension
- basic HMC tuning of step size \(\epsilon\) and trajectory length \(L\)
- an energy check such as a \(\Delta H\) histogram or summary
- a direct comparison between MCMC and HMC on the same posterior
That comparison should discuss mixing and efficiency, not just whether the two methods land in the same region.
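The leapfrog and energy-check ingredients can be sketched as below, here exercised on a standard normal target rather than the cosmology posterior. The helper names and the finite-difference step are illustrative assumptions; in your `hmc.py` the same pieces would sit inside a full accept/reject HMC step:

```python
import numpy as np

def grad_fd(log_prob, theta, h=1e-5):
    """Central finite-difference gradient of log_prob (illustrative helper)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (log_prob(theta + e) - log_prob(theta - e)) / (2.0 * h)
    return g

def leapfrog(log_prob, theta, p, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with a unit mass matrix."""
    theta, p = theta.copy(), p.copy()
    p = p + 0.5 * step * grad_fd(log_prob, theta)  # initial half kick
    for _ in range(n_steps):
        theta = theta + step * p                   # drift
        g = grad_fd(log_prob, theta)
        p = p + step * g                           # full kick ...
    p = p - 0.5 * step * g                         # ... minus the extra half
    return theta, p

# Energy check on a standard normal target: |Delta H| should be small, O(step^2)
log_prob = lambda t: -0.5 * np.sum(t ** 2)
H = lambda t, p: -log_prob(t) + 0.5 * np.sum(p ** 2)
theta0, p0 = np.array([1.0]), np.array([0.5])
theta1, p1 = leapfrog(log_prob, theta0, p0, step=0.1, n_steps=20)
print(abs(H(theta1, p1) - H(theta0, p0)))
```

Collecting \(\Delta H\) over many trajectories gives exactly the histogram the energy-check deliverable asks for: values clustered near zero indicate a well-tuned step size.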
If you are an undergraduate student and want an extension, HMC is the most natural next step. It is absolutely worth trying once your MCMC pipeline is working.
Deliverables
Your submission should include the following pieces.
Code repository
Your repo should follow the contract in the Starter Code Structure page. At minimum, it must include:
- a root-level `run.py`
- modular source files
- a small test suite
- generated figures in `outputs/`
- a `README.md` explaining how to reproduce your work
Research memo
Write a technical memo that explains:
- how you validated the forward model
- how you validated the sampler
- what posterior constraints you found
- what those constraints mean physically
- for graduate students, how HMC compares to MCMC
This is a computational science memo. I care about method, evidence, and interpretation more than polished rhetoric.
Growth memo
Submit the standard growth memo for the course. Be specific about what was difficult, what clicked, and which debugging habits actually helped you.
Required Figures
Your final submission should include enough figures to prove that the pipeline works. For most students, that means at least:
- one validation figure or table for the forward model or toy Gaussian test
- trace plots for the production MCMC chains
- a corner plot for \((\Omega_m, h)\)
- a data vs. model plot for \(\mu(z)\)
Graduate students should also include an HMC-specific validation or comparison figure, such as:
- \(\Delta H\) distribution
- side-by-side trace plots
- autocorrelation comparison
- ESS-per-second comparison
See the Expectations & Grading page for what stronger or weaker evidence looks like.
Validation Gates
Do not keep piling on complexity if one of these fails.
- Gate 1: Forward model passes the worked example and basic sanity checks.
- Gate 2: Metropolis-Hastings recovers a known 2D Gaussian.
- Gate 3: JLA chains mix and produce finite, stable posterior summaries.
- Gate 4: Graduate students only: HMC shows reasonable energy behavior and comparable posterior results.
This project is much more manageable if you respect those gates. The students who get into trouble are usually the ones who skip validation and try to debug the entire pipeline at once.
Extension Ideas
Once the baseline is working, you can push further. Good extensions include:
- JAX autodiff for gradients
- non-flat cosmology
- NUTS
- importance reweighting with an informative prior on \(h\)
- your own scientifically motivated comparison
For undergraduates, extensions are optional. For graduate students, the HMC lane already counts as the required advanced component, and any work beyond that is extra depth.
Closing Advice
This project rewards steady progress much more than last-minute heroics. If you can make the forward model trustworthy and the toy Gaussian sampler work in the first week, the rest of the project becomes much more manageable. If you skip those steps, everything later becomes harder than it needs to be.
Build the inference engine in layers. Validate each layer. Then let the science question come into focus.