Project 4: Expectations & Grading

COMP 536 | Short Projects

Author

Dr. Anna Rosen

Published

April 22, 2026

How This Project Is Graded

There is no point-by-point rubric. Your grade reflects what you demonstrated you can do and how convincingly you demonstrated it. Scope matters, but scope is not the whole story. Clean validation, correct results, thoughtful interpretation, readable code, and strong figures matter just as much as how many methods you implemented.

The tiers below describe the scope ceiling for the work you completed. Within a tier, quality still matters:

  • a technically correct but lightly interpreted submission can land lower in the tier
  • a carefully validated and clearly explained submission can land higher in the tier
  • incorrect results pull the grade down quickly, even if the repo looks ambitious

The tiers are cumulative: B includes the foundations from C, and A includes the expectations from B.

Important: Graduate overlay

The tier descriptions below describe the shared Project 4 baseline. If you are taking the course for graduate credit, you must also complete the HMC lane described on this page. A graduate submission that omits HMC is missing a required part of the project, even if the MCMC baseline is strong.


C — Satisfactory

Scope: a validated inference foundation that actually works.

This means:

  • your forward model produces sensible \(\mu(z)\) values and passes the worked example at \(z = 0.5\), \(\Omega_m = 0.3\), \(h = 0.7\)
  • your likelihood uses the full covariance matrix with a stable solve such as Cholesky factorization
  • your Metropolis-Hastings sampler is validated on a known 2D Gaussian before you use it on the supernova problem
  • you run the JLA likelihood successfully and obtain a sensible posterior region for \((\Omega_m, h)\)
  • your repo is organized enough that I can run the project and understand what the main pieces do
  • your memo shows the validation evidence and explains what the posterior is telling you at a basic scientific level
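The worked-example check in the first bullet can be sketched as follows. This is a minimal sketch assuming a flat \(\Lambda\)CDM forward model with \(\Omega_\Lambda = 1 - \Omega_m\); the handout's worked value is authoritative, and the function names here are illustrative:

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light, km/s


def distance_modulus(z, omega_m, h):
    """Distance modulus mu(z) for a flat LCDM cosmology (sketch).

    mu = 5 log10(d_L / 10 pc), with d_L = (1 + z) * comoving distance.
    """
    H0 = 100.0 * h  # Hubble constant, km/s/Mpc
    E = lambda zp: np.sqrt(omega_m * (1 + zp) ** 3 + (1 - omega_m))
    integral, _ = quad(lambda zp: 1.0 / E(zp), 0.0, z)
    d_c = (C_KM_S / H0) * integral   # comoving distance, Mpc
    d_l = (1 + z) * d_c              # luminosity distance, Mpc
    return 5.0 * np.log10(d_l) + 25.0  # d_L in Mpc -> mu


# Sanity check at the worked-example point (z = 0.5, Omega_m = 0.3, h = 0.7):
mu = distance_modulus(0.5, 0.3, 0.7)
```

Under these assumptions the result lands in the low 42s; whatever your implementation, it should reproduce the handout's worked value before you touch the likelihood.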

What this signals: You built the core inference machinery and proved that it is doing something real rather than producing arbitrary chains.

What moves you within this tier: Whether your validation is quantitative instead of hand-wavy. Whether your figures actually show that the sampler works. Whether your memo explains why the covariance matrix matters, why the toy Gaussian test comes first, and what your posterior means physically.
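One way the covariance-aware likelihood from the checklist above can be sketched, using a Cholesky factorization so the covariance is never explicitly inverted (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def gaussian_loglike(mu_model, mu_data, cov):
    """Multivariate Gaussian log-likelihood via a Cholesky solve (sketch).

    Factoring cov once and solving triangular systems is faster and
    numerically safer than forming cov^-1 explicitly.
    """
    resid = mu_data - mu_model
    c, low = cho_factor(cov)
    # log|cov| comes free from the Cholesky factor's diagonal
    log_det = 2.0 * np.sum(np.log(np.diag(c)))
    chi2 = resid @ cho_solve((c, low), resid)
    return -0.5 * (chi2 + log_det + resid.size * np.log(2.0 * np.pi))
```

In your sampler, only the residual changes between proposals, so the factorization can be computed once and reused.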


B — Good

Scope: a complete MCMC analysis of the JLA problem with convincing diagnostics.

Everything in C, plus:

  • multiple JLA chains with different random seeds
  • tuned proposal covariance with a justified acceptance-rate range
  • trace plots for both parameters
  • a corner plot showing joint and marginal posterior structure
  • a data-versus-model plot showing the supernova data against posterior predictions
  • quantitative diagnostics such as autocorrelation, effective sample size, and a multi-chain convergence check like split-\(\hat{R}\)
  • posterior summaries with credible intervals and a short discussion of the \(\Omega_m\)-\(h\) correlation
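The split-\(\hat{R}\) check in the diagnostics bullet can be sketched for a single parameter, assuming your chains are stored as an array of shape `(n_chains, n_samples)` (a minimal sketch, not a reference implementation):

```python
import numpy as np


def split_rhat(chains):
    """Split-Rhat for one parameter; chains has shape (n_chains, n_samples).

    Each chain is split in half and the between-split variance is
    compared with the within-split variance. Values near 1.0 mean
    the halves agree; values well above 1.0 signal non-convergence.
    """
    n_chains, n = chains.shape
    half = n // 2
    splits = chains[:, : 2 * half].reshape(2 * n_chains, half)
    s = splits.shape[1]
    means = splits.mean(axis=1)
    W = splits.var(axis=1, ddof=1).mean()  # within-split variance
    B = s * means.var(ddof=1)              # between-split variance
    var_plus = (s - 1) / s * W + B / s
    return np.sqrt(var_plus / W)
```

Run it per parameter on your JLA chains; a common rule of thumb is to worry when \(\hat{R}\) exceeds about 1.01 to 1.05.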

What this signals: You moved from “the code runs” to “the inference is defensible.” You are treating the chains as scientific evidence, not as decorative output.

What moves you within this tier: Clean diagnostics, stronger figures, better memo structure, and clearer reasoning about why your posterior looks the way it does. A good B-level submission makes it easy for a reader to see that the pipeline is reproducible and that the analysis choices were intentional.


A — Excellent

Scope: the full MCMC project executed with care, validation discipline, and scientific clarity.

Everything in B, plus:

  • especially strong validation evidence for the forward model and the sampler
  • careful explanation of tuning choices and convergence decisions
  • polished, readable figures with informative captions
  • a memo that interprets the results rather than merely narrating the workflow
  • evidence that you understand the limitations of the analysis as well as its conclusions

Examples of work that often strengthens an A-level submission:

  • comparing two forward-model implementations, such as numerical integration versus the Pen approximation
  • a sharper efficiency discussion using ESS, autocorrelation length, or wall-clock comparisons within the MCMC lane
  • especially clear discussion of why \(\Omega_m\) and \(h\) are correlated in this dataset

What this signals: You built a trustworthy inference instrument, not just a script that produced plots.

For undergraduates, an excellent MCMC baseline can absolutely earn an A without HMC. For graduate students, the HMC lane below is part of the required scope.


Graduate Students: Required HMC Lane

If you are enrolled for graduate credit, you must go beyond the MCMC baseline.

Your graduate submission must also include:

  • an HMC implementation that samples the same posterior as your MCMC code
  • a gradient strategy that you can explain and defend
  • evidence that you checked HMC behavior, such as \(\Delta H\) summaries or an energy histogram
  • a direct MCMC-versus-HMC comparison supported by at least one of mixing behavior, autocorrelation, or ESS-style evidence
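One HMC transition with the \(\Delta H\) record for the energy check above can be sketched as follows, assuming a standard Gaussian kinetic energy (a minimal sketch; your gradient strategy and tuning are up to you):

```python
import numpy as np


def leapfrog(q, p, grad_logp, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics (sketch)."""
    p = p + 0.5 * eps * grad_logp(q)       # initial half-step in momentum
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p + eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)       # final half-step in momentum
    return q, p


def hmc_step(q, logp, grad_logp, eps, n_steps, rng):
    """One HMC transition; returns new state, Delta H, and acceptance flag."""
    p0 = rng.standard_normal(q.shape)
    H0 = -logp(q) + 0.5 * p0 @ p0
    q1, p1 = leapfrog(q, p0, grad_logp, eps, n_steps)
    H1 = -logp(q1) + 0.5 * p1 @ p1
    dH = H1 - H0
    accept = np.log(rng.uniform()) < -dH   # Metropolis correction
    return (q1 if accept else q), dH, accept
```

Collecting the `dH` values across transitions gives you exactly the \(\Delta H\) summary or energy histogram asked for above: a well-tuned step size keeps \(|\Delta H|\) small and acceptance high.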

I do not expect perfect industrial-strength HMC. I do expect a serious attempt that shows you understand why HMC exists and how it differs from random-walk Metropolis-Hastings.

What strong graduate work looks like: HMC reaches the same posterior region as MCMC, mixes more efficiently on the same problem, and the memo explains both the benefits and the tuning pain points.


Beyond A — Extensions

Once the baseline is working, there is plenty of room to explore further. Strong extensions include:

  • JAX autodiff for gradients
  • non-flat cosmology
  • NUTS
  • informative-prior reweighting
  • your own scientifically motivated experiment

For undergraduates, these are optional enrichment. For graduate students, these are additional extensions beyond the required HMC lane.


How to Succeed

The students who struggle most on this project usually make the same mistake: they try to debug the cosmology model, the likelihood, and the sampler all at once. Do not do that.

Project 4 Pacing Checkpoints

You have 3 weeks because this project has three distinct phases. Treat them that way.

End of Week 1

By the end of Week 1, you should have:

  • a working forward model
  • the JLA data and covariance matrix loading correctly
  • the worked example and basic forward-model sanity checks passing

If you do not have that by the end of Week 1, you are behind.

End of Week 2

By the end of Week 2, you should have:

  • a working Metropolis-Hastings sampler
  • a tuned proposal distribution
  • the canonical 2D Gaussian validation test passing
  • toy-problem trace plots that look usable
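The canonical Gaussian test can be sketched like this: a minimal random-walk Metropolis-Hastings sampler run against a 2D Gaussian with known mean, so the recovered moments can be checked directly (names and tuning values here are illustrative, not a prescription):

```python
import numpy as np


def metropolis(logp, x0, prop_cov, n_samples, rng):
    """Random-walk Metropolis-Hastings with a Gaussian proposal (sketch)."""
    x = np.asarray(x0, dtype=float)
    chol = np.linalg.cholesky(prop_cov)    # correlated proposal draws
    samples = np.empty((n_samples, x.size))
    lp = logp(x)
    n_accept = 0
    for i in range(n_samples):
        prop = x + chol @ rng.standard_normal(x.size)
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_samples


# Known target: 2D Gaussian, mean (1, -1), unit variances.
target_mean = np.array([1.0, -1.0])
logp = lambda x: -0.5 * np.sum((x - target_mean) ** 2)
rng = np.random.default_rng(0)
samples, acc_rate = metropolis(logp, np.zeros(2), 0.8 * np.eye(2), 20000, rng)
```

If the sample mean and standard deviations do not recover the known target within sampling error, stop and debug the sampler before going anywhere near the JLA likelihood.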

If you reach the start of Week 3 without passing the Gaussian test, you are behind. That is the moment to get help, not the moment to add HMC or polish figures.

Week 3

Week 3 should be for:

  • multi-chain JLA inference
  • figure generation
  • memo writing
  • graduate students only: HMC implementation and comparison

Week 3 is not the week to discover whether your sampler works.

Build the project in layers:

  • first make the forward model trustworthy
  • then make Metropolis-Hastings work on a toy Gaussian
  • then connect to the JLA likelihood
  • then add multi-chain diagnostics
  • then, for graduate students, move to HMC

If one layer is not validated, do not move on. That is not caution. That is time management.