Final Project Getting Started Guide

COMP 536 | Day-One Checklist

Author

Dr. Anna Rosen

Published

April 28, 2026

Purpose

Use this page on the first day you work on the final project. The goal is not to finish the whole pipeline immediately. The goal is to make the baseline path real enough that every later step has somewhere to attach.

Baseline before stretch goals

Keep the baseline intact:

validated JAX-native Leapfrog simulator,
small reproducible dataset over \((Q_0, a)\),
baseline emulator with train-only normalization,
held-out evaluation,
one parameter-recovery example.

If you get behind, cut optional complexity before cutting any of those five pieces.

Step 1: Choose The Scientific Lane

Write down the version of the problem you are solving:

Which parameters vary? The recommended default is \((Q_0, a)\).
Which quantities are fixed? Usually particle count \(N\), mass model, softening, timestep policy, and integration time.
Which summary statistics will your emulator predict? A strong default is \(f_{\rm bound}\) (bound fraction), \(\sigma_v\) (velocity dispersion), and \(r_h\) (half-mass radius).

Do this before writing new code. It prevents you from building a simulator whose outputs do not match what the emulator and the inference step need.
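One way to make this decision concrete is to record it in code and save it next to every dataset you generate. This is a minimal sketch assuming a Python dataclass; the class name `ScientificLane` and every numeric or string value below are hypothetical placeholders, not recommended settings.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScientificLane:
    """Hypothetical record of the problem definition; every value is a placeholder."""
    varied: tuple = ("Q0", "a")                       # parameters the dataset will sample
    n_particles: int = 64                             # fixed: particle count N
    mass_model: str = "placeholder"                   # fixed: mass model
    softening: float = 0.05                           # fixed: softening length (code units)
    timestep_policy: str = "fixed dt = 0.01"          # fixed: timestep policy
    t_end: float = 10.0                               # fixed: integration time (code units)
    summaries: tuple = ("f_bound", "sigma_v", "r_h")  # emulator outputs


lane = ScientificLane()
print(json.dumps(asdict(lane), indent=2))  # save this alongside every dataset you generate
```

Making the record frozen means nothing downstream can quietly change the lane mid-run.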

Step 2: Rebuild The Project 2 Core In JAX

Start with the smallest trustworthy version of your old Project 2 model.

Project 2 idea	Day-one JAX target
Positions, velocities, masses stored in objects or lists	Arrays inside a simple state representation
Leapfrog logic inside a mutable simulation object	Pure function: `new_state = leapfrog_step(state, params)`
Force loop that is easy to read	JAX array version that you can compare against the old result
Notebook-only diagnostics	Reusable validation function or script

Do not begin with performance tuning. First make the state, acceleration, and Leapfrog step readable and testable.
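A minimal sketch of the day-one JAX target, assuming code units with G = 1, softened direct-summation gravity, and kick-drift-kick Leapfrog. Only the `leapfrog_step(state, params)` signature comes from the table above; `State`, `Params`, and `acceleration` are hypothetical names.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Simulation state as plain arrays (a NamedTuple is a JAX pytree)."""
    pos: jnp.ndarray   # (N, 3) positions
    vel: jnp.ndarray   # (N, 3) velocities
    mass: jnp.ndarray  # (N,) masses


class Params(NamedTuple):
    dt: float       # fixed timestep
    eps: float      # softening length; must be > 0 in this sketch
    G: float = 1.0  # gravitational constant in code units (assumed G = 1)


def acceleration(pos, mass, eps, G):
    """Softened direct-summation accelerations, O(N^2) with no Python loops."""
    dx = pos[None, :, :] - pos[:, None, :]  # dx[i, j] = pos[j] - pos[i], shape (N, N, 3)
    r2 = jnp.sum(dx**2, axis=-1) + eps**2   # softened squared separations, shape (N, N)
    # The i == j term vanishes because dx[i, i] = 0, and eps > 0 keeps r2 finite there.
    return G * jnp.einsum("ij,ijk->ik", mass[None, :] * r2**-1.5, dx)


def leapfrog_step(state, params):
    """One kick-drift-kick step; returns a new State and never mutates the input."""
    a0 = acceleration(state.pos, state.mass, params.eps, params.G)
    vel_half = state.vel + 0.5 * params.dt * a0            # half kick
    pos_new = state.pos + params.dt * vel_half              # full drift
    a1 = acceleration(pos_new, state.mass, params.eps, params.G)
    vel_new = vel_half + 0.5 * params.dt * a1               # half kick
    return State(pos_new, vel_new, state.mass)


step = jax.jit(leapfrog_step)

# Two equal masses on a circular orbit (G = 1, separation 1, speed 0.5 each).
state = State(
    pos=jnp.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]),
    vel=jnp.array([[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]]),
    mass=jnp.array([0.5, 0.5]),
)
new_state = step(state, Params(dt=0.01, eps=1e-3))
```

Because the step is a pure function of arrays, it can later be wrapped in `jax.lax.scan` or `jax.vmap` without restructuring.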

Step 3: Validate Before Data Generation

Before training an emulator, produce at least one small validation artifact:

a simple small-\(N\) trajectory or interaction that behaves qualitatively as expected,
a plot or table showing that the relative energy error stays bounded,
a timestep comparison using at least two fixed timestep choices,
a command or script that regenerates the evidence.

You do not need adaptive timestepping. A fixed timestep is acceptable if your validation supports it.
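The script below sketches how the energy and timestep evidence could be regenerated with one command. It assumes G = 1, an isolated eccentric two-body orbit as the small-\(N\) test (so softening is set to zero), and float64 precision; all function names are hypothetical.

```python
import jax
import jax.numpy as jnp

# Use float64 so the measured energy error reflects the integrator, not float32 round-off.
jax.config.update("jax_enable_x64", True)


def acceleration(pos, mass, eps):
    """Softened direct-summation accelerations with G = 1 (assumed code units)."""
    dx = pos[None, :, :] - pos[:, None, :]  # dx[i, j] = pos[j] - pos[i]
    r2 = jnp.sum(dx**2, axis=-1) + eps**2
    r2 = jnp.where(jnp.eye(pos.shape[0], dtype=bool), 1.0, r2)  # dummy diagonal avoids 1/0
    return jnp.einsum("ij,ijk->ik", mass[None, :] * r2**-1.5, dx)  # dx[i, i] = 0: no self-force


def total_energy(pos, vel, mass, eps):
    """Kinetic plus softened pairwise potential energy (G = 1)."""
    kinetic = 0.5 * jnp.sum(mass * jnp.sum(vel**2, axis=-1))
    i, j = jnp.triu_indices(pos.shape[0], k=1)  # each pair counted once
    r = jnp.sqrt(jnp.sum((pos[i] - pos[j]) ** 2, axis=-1) + eps**2)
    return kinetic - jnp.sum(mass[i] * mass[j] / r)


def energy_history(pos, vel, mass, dt, t_end, eps=0.0):
    """Integrate with fixed-step kick-drift-kick Leapfrog and record E after every step."""
    n_steps = int(round(t_end / dt))

    def step(carry, _):
        pos, vel = carry
        vel = vel + 0.5 * dt * acceleration(pos, mass, eps)
        pos = pos + dt * vel
        vel = vel + 0.5 * dt * acceleration(pos, mass, eps)
        return (pos, vel), total_energy(pos, vel, mass, eps)

    _, energies = jax.lax.scan(step, (pos, vel), None, length=n_steps)
    return energies


def timestep_comparison(dts=(0.01, 0.005), t_end=10.0):
    """Maximum relative energy error over the run for each fixed timestep."""
    pos = jnp.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
    vel = jnp.array([[0.0, -0.4, 0.0], [0.0, 0.4, 0.0]])  # below circular speed: eccentric orbit
    mass = jnp.array([0.5, 0.5])
    e0 = total_energy(pos, vel, mass, 0.0)
    return {dt: float(jnp.max(jnp.abs((energy_history(pos, vel, mass, dt, t_end) - e0) / e0)))
            for dt in dts}


if __name__ == "__main__":
    for dt, err in timestep_comparison().items():
        print(f"dt = {dt:<6}  max |dE/E0| = {err:.2e}")
```

For a second-order method like Leapfrog, halving `dt` should shrink the maximum energy error by roughly a factor of four; a ratio far from that is worth investigating before you generate data.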

Step 4: Make A Tiny Debug Dataset

Before running a full Latin Hypercube design, generate a tiny end-to-end dataset with about 5 simulations. The goal is to test the plumbing:

simulator runs,
summary statistics are finite and physically plausible,
dataset saves and reloads,
emulator code can read the file.

This debug dataset is allowed to be too small for science. It is a systems check.
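A minimal plumbing check might look like the sketch below. It uses only the standard library, and `simulate_summaries` is a hypothetical stand-in that returns fake numbers so the save/reload path can be tested before the real simulator is wired in; the parameter ranges are placeholders too.

```python
import csv
import math
import random
from pathlib import Path

FIELDS = ["run_id", "Q0", "a", "seed", "f_bound", "sigma_v", "r_h", "status"]


def simulate_summaries(q0, a, seed):
    """HYPOTHETICAL stand-in: replace with your validated simulator and summary code."""
    rng = random.Random(seed)
    return {
        "f_bound": 0.9 - 0.2 * (q0 - 0.5) + 0.01 * rng.random(),
        "sigma_v": math.sqrt(q0) / a,
        "r_h": 1.3 * a,
    }


def is_plausible(summary):
    """Finite values, a bound fraction in [0, 1], and positive scales."""
    if not all(math.isfinite(v) for v in summary.values()):
        return False
    return 0.0 <= summary["f_bound"] <= 1.0 and summary["sigma_v"] > 0 and summary["r_h"] > 0


def make_debug_dataset(path, n_runs=5, seed=0):
    """Run a handful of simulations, flag implausible ones, and write a CSV."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_runs):
        q0, a = rng.uniform(0.5, 1.5), rng.uniform(0.5, 2.0)  # placeholder ranges
        run_seed = 1000 + i
        summary = simulate_summaries(q0, a, run_seed)
        status = "ok" if is_plausible(summary) else "implausible"
        rows.append({"run_id": f"debug_{i:04d}", "Q0": q0, "a": a, "seed": run_seed,
                     **summary, "status": status})
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


def reload_dataset(path):
    """Read the CSV back with numeric columns converted, as the emulator code would."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for key in ("Q0", "a", "f_bound", "sigma_v", "r_h"):
            row[key] = float(row[key])
        row["seed"] = int(row["seed"])
    return rows


path = Path("debug_dataset.csv")
written = make_debug_dataset(path)
reloaded = reload_dataset(path)
assert len(reloaded) == len(written) == 5
assert all(math.isclose(w["sigma_v"], r["sigma_v"]) for w, r in zip(written, reloaded))
```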

Step 5: Plan The Real Dataset

For the real emulator dataset, use Latin Hypercube Sampling or another defensible space-filling design over \((Q_0, a)\).

Keep the split roles separate:

Training: fit emulator parameters.
Validation/calibration: choose training settings, estimate likelihood widths, and check uncertainty behavior.
Held-out test: final reporting and parameter-recovery examples after choices are fixed.
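A space-filling design and a split assignment can both be a few lines of NumPy. The sketch below hand-rolls basic Latin Hypercube Sampling (one point per stratum in each parameter) rather than calling a library; the bounds in `BOUNDS`, the 70/15/15 fractions, and the split labels `train`/`val`/`test` are hypothetical placeholders.

```python
import numpy as np

# HYPOTHETICAL prior box; replace with your own defensible ranges.
BOUNDS = {"Q0": (0.5, 1.5), "a": (0.5, 2.0)}


def latin_hypercube(n, bounds, rng):
    """n points; each parameter range is cut into n equal strata with exactly one point per stratum."""
    lo = np.array([b[0] for b in bounds.values()], dtype=float)
    hi = np.array([b[1] for b in bounds.values()], dtype=float)
    d = len(lo)
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])  # independent shuffle per dimension
    u = (strata + rng.random((n, d))) / n                              # jitter within each stratum
    return lo + u * (hi - lo)


def assign_splits(n, rng, frac_train=0.7, frac_val=0.15):
    """Randomly label runs train / val / test; the fractions are placeholders."""
    order = rng.permutation(n)
    n_train = int(round(frac_train * n))
    n_val = int(round(frac_val * n))
    split = np.empty(n, dtype=object)
    split[order[:n_train]] = "train"
    split[order[n_train:n_train + n_val]] = "val"
    split[order[n_train + n_val:]] = "test"
    return split


rng = np.random.default_rng(4301)
theta = latin_hypercube(100, BOUNDS, rng)  # columns: Q0, a
split = assign_splits(len(theta), rng)
```

Fixing the generator seed and saving it with the dataset makes the design itself reproducible.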

Save a table with at least:

Field	Example
`run_id`	`run_0001`
`Q0`	`0.85`
`a`	`1.20`
`seed`	`4301`
`split`	`train`
`f_bound`	`0.72`
`sigma_v`	`1.84`
`r_h`	`3.15`
`status`	`ok`

Also record units, particle count, timestep, integration time, softening, and the code version used to generate the dataset.
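One possible layout is a per-run CSV plus a JSON sidecar for the settings that are the same for every run. This is a sketch assuming the project lives in a git repository (the code version falls back to `"unknown"` otherwise); the file stem `dataset_v1` and all metadata values are hypothetical.

```python
import csv
import json
import subprocess
from pathlib import Path

REQUIRED_FIELDS = ["run_id", "Q0", "a", "seed", "split", "f_bound", "sigma_v", "r_h", "status"]


def code_version():
    """Short git commit hash of the working tree, or 'unknown' outside a git repo."""
    try:
        result = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def save_dataset(rows, metadata, stem):
    """Write <stem>.csv (one row per run) and <stem>.json (run-independent settings)."""
    missing = sorted({k for row in rows for k in REQUIRED_FIELDS if k not in row})
    if missing:
        raise ValueError(f"rows are missing required fields: {missing}")
    extras = sorted({k for row in rows for k in row} - set(REQUIRED_FIELDS))
    stem = Path(stem)
    with open(stem.with_suffix(".csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_FIELDS + extras)
        writer.writeheader()
        writer.writerows(rows)
    meta = {**metadata, "code_version": code_version(), "fields": REQUIRED_FIELDS + extras}
    stem.with_suffix(".json").write_text(json.dumps(meta, indent=2))


rows = [{"run_id": "run_0001", "Q0": 0.85, "a": 1.20, "seed": 4301, "split": "train",
         "f_bound": 0.72, "sigma_v": 1.84, "r_h": 3.15, "status": "ok"}]
metadata = {  # HYPOTHETICAL settings; record your real ones
    "units": "N-body code units, G = 1",
    "n_particles": 64, "dt": 0.01, "t_end": 10.0, "softening": 0.05,
}
save_dataset(rows, metadata, "dataset_v1")
```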

Step 6: Train One Tiny Emulator

Start with the smallest working neural emulator:

inputs: \((Q_0, a)\),
outputs: chosen summary statistics,
model: small MLP in Equinox,
optimizer: Optax Adam,
normalization: compute means and standard deviations from training data only.
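Train-only normalization amounts to computing statistics on one split and reusing them everywhere else. A minimal NumPy sketch, where `Standardizer` is a hypothetical helper and the arrays are placeholder data:

```python
import numpy as np


class Standardizer:
    """Hypothetical helper: z-score columns using statistics from the training split only."""

    def __init__(self, train_array):
        self.mean = train_array.mean(axis=0)
        std = train_array.std(axis=0)
        self.std = np.where(std > 0, std, 1.0)  # guard against a constant column

    def transform(self, x):
        return (x - self.mean) / self.std

    def inverse(self, z):
        return z * self.std + self.mean


rng = np.random.default_rng(0)
x_all = rng.uniform([0.5, 0.5], [1.5, 2.0], size=(100, 2))           # placeholder (Q0, a)
y_all = rng.normal([0.7, 1.8, 3.0], [0.1, 0.3, 0.5], size=(100, 3))  # placeholder summaries
is_train = np.arange(100) < 70                                       # placeholder split mask

x_norm = Standardizer(x_all[is_train])  # fit on training rows only ...
y_norm = Standardizer(y_all[is_train])
x_test_z = x_norm.transform(x_all[~is_train])  # ... then apply the same transform everywhere
```

Save `mean` and `std` with the trained model: the inference step must feed the emulator inputs transformed the same way and map its outputs back with `inverse`.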

Your first goal is not to win. Your first goal is to beat a simple baseline and produce a predicted-vs-true plot on data that was not used for training.
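Two common choices of simple baseline are predicting the training mean and a linear least-squares fit in \((Q_0, a)\). The sketch below compares them by held-out RMSE on placeholder data; the function names are hypothetical, and your neural emulator's held-out RMSE would go in the same comparison.

```python
import numpy as np


def fit_mean_baseline(y_train):
    """Baseline 1: always predict the training mean of each statistic."""
    mu = y_train.mean(axis=0)

    def predict(x):
        return np.tile(mu, (len(x), 1))

    return predict


def fit_linear_baseline(x_train, y_train):
    """Baseline 2: least-squares y = c0 + c1*Q0 + c2*a, fitted per statistic."""
    def design(x):
        return np.column_stack([np.ones(len(x)), x])

    coef, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)

    def predict(x):
        return design(x) @ coef

    return predict


def rmse(y_pred, y_true):
    """Root-mean-square error, one value per summary statistic."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2, axis=0))


# Placeholder data standing in for your (Q0, a) -> summaries table.
rng = np.random.default_rng(0)
x = rng.uniform([0.5, 0.5], [1.5, 2.0], size=(300, 2))
y = np.column_stack([2 * x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]]) + 0.01 * rng.normal(size=(300, 2))
x_tr, y_tr, x_te, y_te = x[:200], y[:200], x[200:], y[200:]

for name, predict in [("mean", fit_mean_baseline(y_tr)), ("linear", fit_linear_baseline(x_tr, y_tr))]:
    print(f"{name:>6} held-out RMSE per statistic: {rmse(predict(x_te), y_te)}")
```

The same `predict(x_te)` and `y_te` arrays are what a predicted-vs-true plot would show.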

Step 7: Add Inference Only After Evaluation

Use NumPyro only after the emulator is credible enough to be a forward model.

Before running NUTS, you should know:

what prior range you are using for \((Q_0, a)\),
what summary statistics count as the observation,
what likelihood width you are using and how it was calibrated,
which held-out case has known true parameters.
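One defensible way to set and document the likelihood width, assuming emulator error dominates any observational noise, is to use the RMS residual of each summary statistic on the validation split, then check how often residuals fall within ±1σ and ±2σ. A NumPy sketch with hypothetical function names and placeholder validation data:

```python
import numpy as np


def calibrate_sigma(y_pred_val, y_true_val, floor=1e-8):
    """Likelihood width per statistic: RMS emulator residual on the validation split."""
    resid = y_pred_val - y_true_val
    return np.maximum(np.sqrt(np.mean(resid**2, axis=0)), floor)


def coverage(y_pred, y_true, sigma, k=1.0):
    """Fraction of |residual| <= k * sigma; near 0.68 for k=1 and 0.95 for k=2 if residuals are Gaussian."""
    return np.mean(np.abs(y_pred - y_true) <= k * sigma, axis=0)


# Placeholder validation predictions with known error scales per statistic.
rng = np.random.default_rng(1)
y_true_val = rng.normal(size=(500, 3))
y_pred_val = y_true_val + rng.normal(scale=[0.02, 0.1, 0.05], size=(500, 3))

sigma = calibrate_sigma(y_pred_val, y_true_val)
print("sigma:", sigma)
print("1-sigma coverage:", coverage(y_pred_val, y_true_val, sigma, k=1.0))
print("2-sigma coverage:", coverage(y_pred_val, y_true_val, sigma, k=2.0))
```

In NumPyro, these widths would become the scale of a Gaussian likelihood, for example `numpyro.sample("obs", dist.Normal(pred, sigma), obs=y_obs)`, where `pred` comes from your emulator and the names `pred` and `y_obs` are illustrative.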

If the posterior misses the truth, debug in this order: simulator, dataset, emulator, likelihood width, summary statistics, sampler settings.

Day-One Checklist

I can describe the scientific lane in 3–5 sentences.
I know which Project 2 physics and validation ideas I am carrying forward.
I have a JAX-native state representation plan.
I have one simulator validation artifact planned.
I have a tiny debug dataset plan.
I have chosen Latin Hypercube Sampling or another space-filling design for the real dataset.
I know the fields my dataset table will contain.
I know which simple baseline the neural emulator must beat.
I will not run inference until the simulator and emulator have earned trust.

Good First Commands

Your exact commands may differ, but a strong project usually supports something like:

python run.py --help
python run.py validate
python run.py generate-debug-data
python run.py train-emulator
python run.py make-figures

The names matter less than the principle: a reader should not need to reverse-engineer your workflow from notebook cells.
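A minimal `argparse` sketch of a `run.py` that exposes these commands as subcommands. Each handler is a hypothetical placeholder that you would replace with calls into your own simulator, dataset, emulator, and plotting code; `python run.py --help` comes for free from `argparse`.

```python
import argparse


def validate(args):
    """Regenerate simulator validation evidence (energy error, timestep comparison)."""
    print("validate: placeholder")
    return 0


def generate_debug_data(args):
    """Run the tiny end-to-end debug dataset."""
    print("generate-debug-data: placeholder")
    return 0


def train_emulator(args):
    """Fit the emulator on the training split and report held-out metrics."""
    print("train-emulator: placeholder")
    return 0


def make_figures(args):
    """Rebuild every figure from saved data and models."""
    print("make-figures: placeholder")
    return 0


COMMANDS = {
    "validate": validate,
    "generate-debug-data": generate_debug_data,
    "train-emulator": train_emulator,
    "make-figures": make_figures,
}


def build_parser():
    parser = argparse.ArgumentParser(prog="run.py", description="Final project workflow.")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, func in COMMANDS.items():
        subparsers.add_parser(name, help=func.__doc__).set_defaults(func=func)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    raise SystemExit(main())
```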