Final Project Getting Started Guide
COMP 536 | Day-One Checklist
Purpose
Use this page on the first day you work on the final project. The goal is not to finish the whole pipeline immediately. The goal is to make the baseline path real enough that every later step has somewhere to attach.
Keep the baseline intact:
- validated JAX-native Leapfrog simulator,
- small reproducible dataset over \((Q_0, a)\),
- baseline emulator with train-only normalization,
- held-out evaluation,
- one parameter-recovery example.
If you get behind, cut optional complexity before cutting any of those five pieces.
Step 1: Choose The Scientific Lane
Write down the version of the problem you are solving:
- Which parameters vary? The recommended default is \((Q_0, a)\).
- Which quantities are fixed? Usually particle count \(N\), mass model, softening, timestep policy, and integration time.
- Which summary statistics will your emulator predict? A strong default is \(f_{\rm bound}\), \(\sigma_v\), and \(r_h\).
Do this before writing new code. It prevents you from building a simulator whose outputs do not match the emulator and inference task.
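One way to make this decision concrete is to write it down as a small, immutable configuration object that the simulator, dataset, and emulator code can all import. The sketch below is only an illustration: the class name `ProjectSpec`, its field names, and every numeric value are placeholders, not course requirements.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProjectSpec:
    """Hypothetical record of the scientific lane. All values are placeholders."""

    varied: tuple = ("Q0", "a")                      # parameters the emulator takes as input
    summaries: tuple = ("f_bound", "sigma_v", "r_h")  # statistics the emulator predicts
    n_particles: int = 64       # placeholder particle count N
    mass_model: str = "equal"   # placeholder label for the mass model
    softening: float = 0.05     # placeholder softening length
    dt: float = 0.01            # placeholder fixed timestep
    t_end: float = 10.0         # placeholder integration time

    def n_steps(self) -> int:
        """Number of fixed-size steps implied by dt and t_end."""
        return round(self.t_end / self.dt)


spec = ProjectSpec()
print(asdict(spec))
print("steps per run:", spec.n_steps())
```

Because the dataclass is frozen, an accidental change to a "fixed" quantity halfway through the pipeline raises an error instead of silently producing an inconsistent dataset.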
Step 2: Rebuild The Project 2 Core In JAX
Start with the smallest trustworthy version of your old Project 2 model.
| Project 2 idea | Day-one JAX target |
|---|---|
| Positions, velocities, masses stored in objects or lists | Arrays inside a simple state representation |
| Leapfrog logic inside a mutable simulation object | Pure function: new_state = leapfrog_step(state, params) |
| Force loop that is easy to read | JAX array version that you can compare against the old result |
| Notebook-only diagnostics | Reusable validation function or script |
Do not begin with performance tuning. First make the state, acceleration, and Leapfrog step readable and testable.
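The table's pure-function target can be sketched as follows. This is a minimal example under stated assumptions, not the required design: the `State` and `Params` containers, the Plummer-softened direct-sum acceleration, \(G = 1\) units, and the kick-drift-kick form of Leapfrog are all illustrative choices, and your Project 2 model may differ.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    pos: jnp.ndarray   # shape (N, 3)
    vel: jnp.ndarray   # shape (N, 3)
    mass: jnp.ndarray  # shape (N,)


class Params(NamedTuple):
    dt: float          # fixed timestep
    eps: float         # softening length; must be > 0 in this sketch
    G: float = 1.0     # gravitational constant (assumed unit system)


def acceleration(pos, mass, eps, G=1.0):
    """Direct-sum softened gravity. The i == j term is zero because dr is zero."""
    dr = pos[None, :, :] - pos[:, None, :]        # dr[i, j] = pos[j] - pos[i]
    r2 = jnp.sum(dr**2, axis=-1) + eps**2         # softened squared distance
    inv_r3 = r2**-1.5
    return G * jnp.sum(mass[None, :, None] * dr * inv_r3[:, :, None], axis=1)


@jax.jit
def leapfrog_step(state, params):
    """One kick-drift-kick step. Returns a new State; the input is not modified."""
    a0 = acceleration(state.pos, state.mass, params.eps, params.G)
    v_half = state.vel + 0.5 * params.dt * a0
    pos = state.pos + params.dt * v_half
    a1 = acceleration(pos, state.mass, params.eps, params.G)
    vel = v_half + 0.5 * params.dt * a1
    return State(pos, vel, state.mass)


# Two equal-mass bodies on a bound orbit (made-up initial conditions).
state = State(
    pos=jnp.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]),
    vel=jnp.array([[0.0, -0.4, 0.0], [0.0, 0.4, 0.0]]),
    mass=jnp.array([0.5, 0.5]),
)
params = Params(dt=0.01, eps=0.01)
new_state = leapfrog_step(state, params)
print(new_state.pos)
```

The all-pairs array form costs \(O(N^2)\) memory, which is fine for a readable day-one version. Comparing `acceleration` against your old force loop on the same small input is a quick correctness check before any tuning.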
Step 3: Validate Before Data Generation
Before training an emulator, produce at least one small validation artifact:
- a simple small-\(N\) trajectory or interaction that behaves qualitatively as expected,
- an energy plot or table showing that the total-energy error stays bounded,
- a timestep comparison using at least two fixed timestep choices,
- a command or script that regenerates the evidence.
You do not need adaptive timestepping. A fixed timestep is acceptable if your validation supports it.
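A timestep comparison of the kind listed above might look like the sketch below. It assumes a softened direct-sum force, \(G = 1\), a kick-drift-kick Leapfrog step, and a made-up two-body test orbit; the function names and the two timestep values are illustrative only. The energy uses the same softened potential as the force so the two are consistent.

```python
import jax
import jax.numpy as jnp


def accel(pos, mass, eps):
    """Softened direct-sum acceleration with G = 1 (assumed units)."""
    dr = pos[None, :, :] - pos[:, None, :]
    r2 = jnp.sum(dr**2, axis=-1) + eps**2
    return jnp.sum(mass[None, :, None] * dr * r2[:, :, None] ** -1.5, axis=1)


def total_energy(pos, vel, mass, eps):
    """Kinetic plus softened potential energy, counting each pair once."""
    kinetic = 0.5 * jnp.sum(mass * jnp.sum(vel**2, axis=-1))
    dr = pos[None, :, :] - pos[:, None, :]
    r = jnp.sqrt(jnp.sum(dr**2, axis=-1) + eps**2)
    pair = -mass[:, None] * mass[None, :] / r
    off_diag = 1.0 - jnp.eye(mass.shape[0])       # drop the i == j terms
    return kinetic + 0.5 * jnp.sum(pair * off_diag)


def max_energy_drift(pos, vel, mass, dt, t_end, eps=0.01):
    """Largest |E(t) - E(0)| / |E(0)| over a fixed-timestep Leapfrog run."""
    n_steps = int(round(t_end / dt))
    e0 = total_energy(pos, vel, mass, eps)

    def step(carry, _):
        p, v = carry
        v_half = v + 0.5 * dt * accel(p, mass, eps)
        p = p + dt * v_half
        v = v_half + 0.5 * dt * accel(p, mass, eps)
        drift = jnp.abs((total_energy(p, v, mass, eps) - e0) / e0)
        return (p, v), drift

    _, drift_history = jax.lax.scan(step, (pos, vel), None, length=n_steps)
    return float(jnp.max(drift_history))


# Made-up bound two-body orbit used only as a test case.
pos = jnp.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
vel = jnp.array([[0.0, -0.4, 0.0], [0.0, 0.4, 0.0]])
mass = jnp.array([0.5, 0.5])

drifts = {dt: max_energy_drift(pos, vel, mass, dt=dt, t_end=5.0) for dt in (0.02, 0.01)}
for dt, drift in drifts.items():
    print(f"dt={dt}: max relative energy drift = {drift:.2e}")
```

For a second-order integrator, halving the timestep should shrink the energy error noticeably. If the two numbers are about the same, or the smaller timestep is worse, that points at a bug or at floating-point noise rather than at the integrator's truncation error.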
Step 4: Make A Tiny Debug Dataset
Before running a full Latin Hypercube design, generate a tiny end-to-end dataset with about 5 simulations. The goal is to test the plumbing:
- simulator runs,
- summary statistics are finite and physically plausible,
- dataset saves and reloads,
- emulator code can read the file.
This debug dataset is allowed to be too small for science. It is a systems check.
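The plumbing test can be as small as the sketch below. Everything in it is a stand-in: `fake_simulator` returns made-up formulas instead of running an N-body integration, the five \((Q_0, a)\) values are arbitrary, and the `.npz` layout is just one possible file format. The point is the shape of the check, not the numbers.

```python
import os
import tempfile

import numpy as np

SUMMARY_NAMES = ("f_bound", "sigma_v", "r_h")


def fake_simulator(q0, a):
    """Stand-in for the real simulator; the formulas are invented placeholders."""
    f_bound = 1.0 / (1.0 + q0)
    sigma_v = 1.0 / np.sqrt(a)
    r_h = 1.3 * a
    return np.array([f_bound, sigma_v, r_h])


def check_summaries(y):
    """Fail loudly if any summary is non-finite or outside a loose physical range."""
    assert np.all(np.isfinite(y)), "non-finite summary statistic"
    assert np.all((y[:, 0] >= 0.0) & (y[:, 0] <= 1.0)), "f_bound outside [0, 1]"
    assert np.all(y[:, 1] > 0.0), "sigma_v must be positive"
    assert np.all(y[:, 2] > 0.0), "r_h must be positive"


# Five arbitrary (Q0, a) points, enough to exercise the pipeline end to end.
theta = np.array([[0.5, 0.8], [0.8, 1.0], [1.0, 1.2], [1.2, 1.5], [1.5, 2.0]])
y = np.stack([fake_simulator(q0, a) for q0, a in theta])
check_summaries(y)

# Save, reload, and confirm the emulator code would see the same arrays.
path = os.path.join(tempfile.mkdtemp(), "debug_dataset.npz")
np.savez(path, theta=theta, y=y, summary_names=np.array(SUMMARY_NAMES))
with np.load(path) as data:
    theta_back = data["theta"]
    y_back = data["y"]
    names_back = tuple(data["summary_names"].tolist())

check_summaries(y_back)
print("reloaded", y_back.shape, names_back)
```

Swapping `fake_simulator` for the real simulator call should be the only change needed once validation passes.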
Step 5: Plan The Real Dataset
For the real emulator dataset, use Latin Hypercube Sampling or another defensible space-filling design over \((Q_0, a)\).
Keep the split roles separate:
- Training: fit emulator parameters.
- Validation/calibration: choose training settings, estimate likelihood widths, and check uncertainty behavior.
- Held-out test: final reporting and parameter-recovery examples after choices are fixed.
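A Latin Hypercube design and a fixed split assignment can be sketched in a few lines of NumPy. The parameter ranges, the design size, the split fractions, and the split labels below are assumptions chosen for illustration. SciPy's `scipy.stats.qmc.LatinHypercube` is a ready-made alternative to the hand-written sampler.

```python
import numpy as np


def latin_hypercube(n, bounds, seed):
    """Return n points with exactly one point per stratum in every dimension."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)      # shape (d, 2): (low, high) per dimension
    d = bounds.shape[0]
    # One jittered position inside each of the n equal-width strata on [0, 1).
    unit = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # Shuffle the strata independently in each dimension.
    for j in range(d):
        unit[:, j] = rng.permutation(unit[:, j])
    return bounds[:, 0] + unit * (bounds[:, 1] - bounds[:, 0])


def assign_splits(n, train_frac, val_frac, seed):
    """Randomly label n runs as train, val, or test with fixed counts."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    n_test = n - n_train - n_val
    labels = np.array(["train"] * n_train + ["val"] * n_val + ["test"] * n_test)
    return rng.permutation(labels)


bounds = [(0.5, 1.5), (0.5, 2.0)]   # hypothetical ranges for (Q0, a)
design = latin_hypercube(100, bounds, seed=0)
split = assign_splits(100, train_frac=0.70, val_frac=0.15, seed=1)

for name in ("train", "val", "test"):
    print(name, int((split == name).sum()))
```

Assigning the split once, with a recorded seed, and storing it next to each run keeps later training code from reshuffling the data and leaking test points into training.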
Save a table with at least:
| Field | Example |
|---|---|
| run_id | run_0001 |
| Q0 | 0.85 |
| a | 1.20 |
| seed | 4301 |
| split | train |
| f_bound | 0.72 |
| sigma_v | 1.84 |
| r_h | 3.15 |
| status | ok |
Also record units, particle count, timestep, integration time, softening, and the code version used to generate the dataset.
Step 6: Train One Tiny Emulator
Start with the smallest working neural emulator:
- inputs: \((Q_0, a)\),
- outputs: chosen summary statistics,
- model: small MLP in Equinox,
- optimizer: Optax Adam,
- normalization: compute means and standard deviations from training data only.
Your first goal is not to win. Your first goal is to beat a simple baseline and produce a predicted-vs-true plot on data that was not used for training.
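Train-only normalization and a simple baseline are easy to get subtly wrong, so a sketch may help. It uses NumPy for brevity, although the same calls exist in `jax.numpy`. The dataset is a made-up stand-in, and the "predict the training mean" baseline is only one possible choice of simple baseline.

```python
import numpy as np


def fit_normalizer(x_train):
    """Means and standard deviations computed from the training rows only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    return mean, np.where(std > 0.0, std, 1.0)    # guard against constant columns


def normalize(x, mean, std):
    return (x - mean) / std


def denormalize(z, mean, std):
    return z * std + mean


def rmse(pred, true):
    """Root-mean-square error per output column."""
    return np.sqrt(np.mean((pred - true) ** 2, axis=0))


# Made-up stand-in dataset: inputs (Q0, a) and three invented summary columns.
rng = np.random.default_rng(0)
theta = rng.uniform([0.5, 0.5], [1.5, 2.0], size=(40, 2))
y = np.column_stack([1.0 / (1.0 + theta[:, 0]), 1.0 / np.sqrt(theta[:, 1]), 1.3 * theta[:, 1]])

theta_train, theta_test = theta[:30], theta[30:]
y_train, y_test = y[:30], y[30:]

# Statistics come from the training split and are reused unchanged on the test split.
x_mean, x_std = fit_normalizer(theta_train)
y_mean, y_std = fit_normalizer(y_train)
x_train_n = normalize(theta_train, x_mean, x_std)
x_test_n = normalize(theta_test, x_mean, x_std)

# Baseline: predict the training mean of each summary for every held-out point.
baseline_pred = np.broadcast_to(y_mean, y_test.shape)
baseline_rmse = rmse(baseline_pred, y_test)
print("baseline RMSE per summary:", baseline_rmse)
```

An emulator trained on `x_train_n` and normalized targets would be evaluated by mapping its outputs through `denormalize(..., y_mean, y_std)` and comparing its per-summary RMSE on the held-out rows against `baseline_rmse`.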
Step 7: Add Inference Only After Evaluation
Use NumPyro only after the emulator is credible enough to be a forward model.
Before running NUTS, you should know:
- what prior range you are using for \((Q_0, a)\),
- what summary statistics count as the observation,
- what likelihood width you are using and how it was calibrated,
- which held-out case has known true parameters.
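The likelihood-width question can be made concrete before any sampler is involved. The sketch below assumes one possible calibration rule, the per-statistic RMS emulator residual on the validation split, together with an independent Gaussian likelihood. Both are illustrative choices, the function names are hypothetical, and all numbers are made up.

```python
import numpy as np


def calibrate_widths(pred_val, true_val):
    """Per-statistic width: RMS emulator residual over the validation split."""
    return np.sqrt(np.mean((pred_val - true_val) ** 2, axis=0))


def gaussian_loglike(pred, obs, sigma):
    """Independent Gaussian log-likelihood, summed over summary statistics."""
    z = (obs - pred) / sigma
    return float(np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))


# Made-up validation predictions and simulator truths for (f_bound, sigma_v, r_h).
pred_val = np.array([[0.70, 1.80, 3.10], [0.61, 1.92, 2.95], [0.80, 1.70, 3.30]])
true_val = np.array([[0.72, 1.84, 3.15], [0.60, 1.90, 3.00], [0.78, 1.73, 3.22]])
sigma = calibrate_widths(pred_val, true_val)

# Made-up observation and emulator prediction at one candidate (Q0, a).
obs = np.array([0.72, 1.84, 3.15])
pred = np.array([0.71, 1.86, 3.11])
print("widths:", sigma)
print("log-likelihood:", gaussian_loglike(pred, obs, sigma))
```

The same `sigma` array would then be passed as the scale of the observation distribution in the NumPyro model, so that the width used by NUTS is traceable to a specific calculation on the validation split.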
If the posterior misses the truth, debug in this order: simulator, dataset, emulator, likelihood width, summary statistics, sampler settings.
Day-One Checklist
Good First Commands
Your exact commands may differ, but a strong project usually supports something like:
python run.py --help
python run.py validate
python run.py generate-debug-data
python run.py train-emulator
python run.py make-figures
The names matter less than the principle: a reader should not need to reverse-engineer your workflow from notebook cells.
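A command-line entry point supporting the subcommands listed above could start from a skeleton like this one. It is a sketch using the standard-library `argparse` module; the help strings and the placeholder handlers are assumptions, and each handler would be replaced by a call into your own validation, data-generation, training, or plotting code.

```python
import argparse

# Subcommand names follow the example commands; help strings are placeholders.
COMMANDS = {
    "validate": "Run simulator validation checks",
    "generate-debug-data": "Generate the tiny end-to-end debug dataset",
    "train-emulator": "Train the baseline emulator",
    "make-figures": "Regenerate report figures",
}


def make_placeholder(name):
    """Build a handler that only reports which stage would run."""

    def handler(args):
        print(f"[{name}] not implemented yet")
        return name

    return handler


def build_parser():
    parser = argparse.ArgumentParser(prog="run.py", description="Final project pipeline")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        sub.add_parser(name, help=help_text).set_defaults(handler=make_placeholder(name))
    return parser


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and dispatch to the chosen handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

In a real `run.py`, the file would end with `if __name__ == "__main__": main()`, and `python run.py --help` would then list all four stages automatically.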