---
title: "Chapter 5: Functions & Modules - Building Reusable Scientific Code"
subtitle: "COMP 536 | Python Fundamentals"
author: "Anna Rosen"
draft: false
execute:
  freeze: auto
  echo: true
  warning: true
  error: false
format:
  html:
    toc: true
    code-fold: true
    code-summary: "Show code"
---
Learning Objectives
By the end of this chapter, you will be able to:
Prerequisites Check
Before starting this chapter, verify you can:
Chapter Overview
Function A reusable block of code that performs a specific task, taking inputs (parameters) and optionally returning outputs. Functions encapsulate logic and create abstractions.
Functions are the fundamental building blocks of organized code. Without functions, you’d be copying and pasting the same code repeatedly, making bugs harder to fix and improvements impossible to maintain. But functions are more than just a way to avoid repetition—they’re how we create abstractions, manage complexity, and build reliable software. Whether you’re calculating statistical measures, simulating star cluster dynamics, or processing multi-wavelength observational data, every computational project starts with well-designed functions.
This chapter teaches you to think about functions as contracts between different parts of your code. When you write a function that calculates energy or performs numerical integration, you’re creating a promise: given valid input, the function will reliably return the correct output. This contract mindset helps you write functions that others (including future you) can trust and use effectively. You’ll learn how to choose between positional and keyword arguments, when to use default values, and how to design interfaces that are both flexible and clear.
Abstraction A simplified interface that hides complex implementation details, allowing users to work with concepts rather than low-level operations. Functions are the fundamental abstraction mechanism in programming.
We’ll explore Python’s scope rules, which determine where variables can be accessed, and learn how seemingly simple concepts like default arguments can create subtle bugs that have plagued even major scientific software packages. You’ll discover how Python’s flexible parameter system enables powerful interfaces, and how functional programming concepts prepare you for modern scientific computing frameworks. By the end, you’ll be organizing your code into modules that can be shared, tested, and maintained professionally—essential skills for collaborative scientific research.
In production code, Python modules should be imported at the top of files for clarity and efficiency. However, in these educational examples, we sometimes import inside functions for pedagogical clarity — to show exactly what each function needs. In your own code, always prefer top-level imports unless you have a specific reason to import locally (such as avoiding circular dependencies or optional dependencies).
Functions are how you turn a one-off script into a reusable scientific tool.
In this course, functions are not just “avoiding repetition.” They are how you:
- make your logic testable (so you trust results)
- isolate assumptions (units, tolerances, edge cases)
- reuse code safely across projects
- collaborate without breaking each other’s work
If you invest in this chapter, you’ll write less code—and get more correct results.
COMP 536 uses scripts + IPython (not Jupyter notebooks). Functions are the unit of reuse in script-based workflows: they make your code modular, debuggable, and reproducible.
This chapter is a reference. You are not expected to memorize every parameter rule on first read.
Come back to it when you:
- are copy/pasting the same code twice
- can’t explain what a block of code is supposed to do
- need to test a calculation in isolation
- are ready to organize a project into modules
5.1 Defining Functions: The Basics
A function encapsulates a piece of logic that transforms inputs into outputs. Think of a function as a machine: you feed it raw materials (inputs), it performs some process (the function body), and it produces a product (output). In scientific computing, a function might calculate statistical measures, integrate equations, or transform data — each function performs one clear task that can be tested and trusted.
Your First Function
Let’s start with something every scientist needs — calculating mean and standard deviation:
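The code cell itself isn't reproduced here; a minimal version consistent with the walkthrough that follows might look like this (the specific `measurements` values are hypothetical):

```python
def calculate_mean(values):
    """Calculate the arithmetic mean of a list of values."""
    return sum(values) / len(values)

# Hypothetical temperature readings in degC
measurements = [23.5, 24.1, 23.8, 24.3]
print(f"Mean temperature: {calculate_mean(measurements):.2f} degC")
```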
Let’s break down exactly how this works:
- `def` keyword: Tells Python we're defining a function.
- Function name (`calculate_mean`): Follows snake_case convention, describes what it does.
- Parameters (`values`): Variables that receive values when the function is called.
- Docstring: Brief description of what the function does (always include this!).
- Function body: Indented code that does the actual work.
- `return` statement: Sends a value back to whoever called the function.
When Python executes calculate_mean(measurements), it creates a temporary namespace where values = [23.5, 24.1, ...], runs the function body, and returns the result.
What will this code print?
def calculate_product(x, y):
product = x * y
# Oops, forgot the return statement!
result = calculate_product(3.0, 4.0)
print(f"Product: {result}")

It prints `Product: None`. The function calculates `product` but doesn't return it. Without an explicit `return` statement, Python functions return `None`. This is a common bug in scientific code!
To fix it:
def calculate_product(x, y):
product = x * y
return product  # Now it returns the value

Functions Without Return Values
Not all functions return values. Some perform actions like printing results or saving data:
Statistics for Temperature ( degC):
Mean: 20.920
Range: [19.800, 22.300]
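The cell that produced this output isn't shown; a sketch consistent with it (the readings are hypothetical) could be:

```python
def report_statistics(name, values, units=""):
    """Print summary statistics. Impure: its effect is the printout."""
    mean = sum(values) / len(values)
    print(f"Statistics for {name} ({units}):")
    print(f"  Mean: {mean:.3f}")
    print(f"  Range: [{min(values):.3f}, {max(values):.3f}]")

temps = [20.1, 19.8, 22.3, 21.5, 20.9]  # hypothetical readings in degC
report_statistics("Temperature", temps, units="degC")
```

Note that `report_statistics` returns `None`; its whole purpose is the side effect of printing.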
- A pure function returns a value and doesn’t modify outside state (easier to test, cache, parallelize, and reuse).
- An impure function has side effects (prints, writes files, mutates globals); that can be useful, but it’s harder to test.
In this chapter, kinetic_energy_cgs(...) is pure, while report_statistics(...) is intentionally impure because it prints.
Returning Multiple Values
Python functions can return multiple values using tuples — perfect for calculations that produce related results:
Mean: 9.950
Std Dev: 0.171
Range: [9.7, 10.2]
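A function along these lines (the sample data is hypothetical) returns several related results as a tuple, which the caller unpacks in one assignment:

```python
def summarize(values):
    """Return (mean, std, min, max) as a single tuple."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population std
    return mean, std, min(values), max(values)

masses = [10.1, 9.9, 10.2, 9.7, 9.85, 9.95]  # hypothetical measurements
mean, std, lo, hi = summarize(masses)         # tuple unpacking
```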
Kinetic Energy in CGS Units
Now let’s calculate kinetic energy using CGS units (centimeters, grams, seconds):
Electron kinetic energy: 4.09e-11 ergs
In eV: 25.55 eV
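A sketch consistent with the output above, assuming an electron moving at 1% of light speed (the constant values are standard CGS figures):

```python
ELECTRON_MASS_G = 9.109e-28   # electron mass in grams
C_CMS = 2.998e10              # speed of light in cm/s
ERG_PER_EV = 1.602e-12        # ergs per electron-volt

def kinetic_energy_cgs(mass_g, velocity_cms):
    """Classical kinetic energy KE = (1/2) m v^2 in ergs."""
    return 0.5 * mass_g * velocity_cms**2

v = 0.01 * C_CMS  # 1% of light speed (assumption)
ke = kinetic_energy_cgs(ELECTRON_MASS_G, v)
print(f"Electron kinetic energy: {ke:.2e} ergs")
print(f"In eV: {ke / ERG_PER_EV:.2f} eV")
```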
The Design Process: From Problem to Function
Before writing any function, design it first. This prevents the common mistake of coding yourself into a corner:
Energy = 5000.0 ergs
Valid (positive)? True
Valid (in range)? True
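A sketch of the design-first idea, with hypothetical helper names and input values chosen to match the printed energy:

```python
def kinetic_energy_checked(mass_g, velocity_cms):
    """Kinetic energy in ergs, with an explicit precondition check."""
    if mass_g <= 0:
        raise ValueError(f"mass must be positive, got {mass_g}")
    return 0.5 * mass_g * velocity_cms**2

def is_valid_energy(energy_ergs, max_ergs=1e10):
    """Postcondition check: positive and within an expected range (assumed bound)."""
    return 0 < energy_ergs <= max_ergs

energy = kinetic_energy_checked(1.0, 100.0)   # hypothetical: 1 g at 100 cm/s
print(f"Energy = {energy} ergs")
print(f"Valid (positive)? {energy > 0}")
print(f"Valid (in range)? {is_valid_energy(energy)}")
```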
Between 1985 and 1987, the Therac-25 radiation therapy machine caused at least six accidents where patients received massive radiation overdoses — up to 100 times the intended dose. Three patients died directly from the overdoses, and three others suffered serious injuries including paralysis.
The root cause was a software function that lacked proper validation. The machine could be configured in two modes: low-power electron beam (direct) or high-power X-ray mode (with a metal target to convert electrons to X-rays). A race condition in the control software meant that if an operator typed too quickly when changing modes, the validation function would pass but leave the machine in a lethal state: high power WITHOUT the metal target in place, delivering concentrated electron beams directly to patients.
#| eval: false
# Simplified illustration of the type of bug:
def configure_beam(mode, power_level):
# FATAL FLAW: No validation that configuration is complete
# Race condition: if operator types fast, these can be inconsistent
set_mode(mode) # Could be "electron" mode
set_power(power_level) # Could be set to X-ray power level!
# No check that mode and power are compatible
# What it should have been:
def configure_beam_safe(mode, power_level):
# Validate configuration before allowing beam
if mode == "electron" and power_level > ELECTRON_MAX:
raise ValueError("Power too high for electron mode!")
if mode == "xray" and not target_in_place():
raise ValueError("X-ray mode requires target!")
# Only proceed if validation passes
set_mode(mode)
set_power(power_level)

The tragedy led to fundamental changes in medical device software. Today, every critical function must validate inputs, check system state, and verify outputs. The few lines of validation code you write aren't bureaucratic overhead—they're the difference between healing and harm. Every time you add validation to your functions, you're following safety practices written in the aftermath of preventable deaths.
[Source: Leveson, N.G. & Turner, C.S. (1993). “An Investigation of the Therac-25 Accidents”. IEEE Computer, 26(7), 18-41.]
Every well-designed function follows a contract pattern that applies across all programming:
CONTRACT PATTERN:
1. **Preconditions:** What must be true before calling
2. **Postconditions:** What will be true after calling
3. **Invariants:** What stays unchanged
4. **Side effects:** What else happens
Example for kinetic_energy_cgs():
- **Precondition:** mass > 0, velocity is numeric
- **Postcondition:** returns positive energy value
- **Invariant:** input values unchanged
- **Side effects:** none (pure function)
This pattern appears in:
- Database transactions (ACID properties)
- API design (REST contracts)
- Parallel computing (thread safety)
- Unit testing (test contracts)

Before you write a function, answer these:
- Inputs: What are the inputs? What are their units and valid ranges?
- Outputs: What do you return? In what units?
- Preconditions: What must be true before calling?
- Failure modes: What can go wrong (empty data, non-finite values, invalid ranges)?
- Side effects: Does it print, write files, mutate inputs, or depend on global state?
A good function makes assumptions explicit and failures loud.
5.2 Function Arguments In-Depth
Parameter A variable in a function definition that receives a value when the function is called.
Python provides flexible ways to handle function parameters. Understanding the distinction between positional arguments, keyword arguments, and default values is crucial for designing clear, flexible interfaces. Let’s explore this flexibility through progressive examples.
Argument The actual value passed to a function when calling it.
Positional vs Keyword Arguments
Correct escape velocity: 1.12e+06 cm/s = 11.2 km/s
Wrong (args reversed): 1.19e-13 cm/s
The wrong value is about 9.4e+18 times too small!
With keywords: 1.12e+06 cm/s
Default Arguments Make Functions Flexible
Default Argument A parameter value specified in the function definition that is used when no argument is provided for that parameter during the function call.
Default arguments let users omit parameters when standard values suffice:
At t=0: 1000000 atoms
After 5730 years: 500000 atoms
U-238 after 1 billion years: 857244 atoms
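A sketch consistent with the output above, using carbon-14's half-life as the default and a rounded U-238 half-life of 4.5 billion years (the function name is an assumption):

```python
def atoms_remaining(n0, time_years, half_life_years=5730.0):
    """Radioactive decay N = N0 * (1/2)^(t / t_half).

    The default half-life is carbon-14 (5730 years); pass
    half_life_years explicitly for other isotopes.
    """
    return n0 * 0.5 ** (time_years / half_life_years)

n0 = 1_000_000
print(f"At t=0: {atoms_remaining(n0, 0):.0f} atoms")
print(f"After 5730 years: {atoms_remaining(n0, 5730):.0f} atoms")
# Override the default for uranium-238 (~4.5e9 year half-life)
print(f"U-238 after 1 billion years: "
      f"{atoms_remaining(n0, 1e9, half_life_years=4.5e9):.0f} atoms")
```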
Use Positional Arguments When:
- The meaning is obvious from context (`power(base, exponent)`)
- There are only 1-2 required parameters
- The order is natural and memorable
Use Keyword Arguments When:
- There are many parameters (>3)
- Parameters are optional
- The meaning isn't obvious (`process(True, False, 5)` vs `process(verbose=True, cache=False, retries=5)`)
Use Default Arguments When:
- There’s a sensible standard value
- Most calls use the same value
- You want backward compatibility when adding features
Best Practice Progression:
- Start with required positional arguments
- Add defaults for optional behaviors
- Use keyword-only arguments for clarity (see Advanced section)
The Mutable Default Trap
Here’s a critical bug that has appeared in major scientific software:
Day 1: [23.5]
Day 2: [23.5, 24.1]
Same list? True
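A minimal demonstration of the trap (the function name and readings are hypothetical):

```python
def add_reading_buggy(value, readings=[]):   # BUG: one shared list, created once
    readings.append(value)
    return readings

day1 = add_reading_buggy(23.5)
day2 = add_reading_buggy(24.1)
print("Day 1:", day1)                # both names point at the SAME list
print("Day 2:", day2)
print("Same list?", day1 is day2)    # True
```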
This bug has appeared in:
- CERN analysis scripts (accumulated all runs’ data)
- NASA trajectory calculations (mixed mission parameters)
- Weather prediction models (combined different forecasts)
The symptom: data from previous runs mysteriously appears in new analyses.
The cause: Python evaluates default arguments once when the function is defined, not each time it’s called.
The fix: always use None as default for mutable arguments.
Here’s the correct pattern:
Day 1: [23.5]
Day 2: [24.1]
Combined: [23.5, 24.1]
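A sketch of the sentinel pattern consistent with the output above (names are hypothetical):

```python
def add_reading(value, readings=None):
    """Append value to readings; create a fresh list per call if omitted."""
    if readings is None:
        readings = []          # new list on every call
    readings.append(value)
    return readings

day1 = add_reading(23.5)
day2 = add_reading(24.1)
print("Day 1:", day1)          # [23.5]
print("Day 2:", day2)          # [24.1]
print("Combined:", day1 + day2)
```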
Sentinel Value A special value (like None) used to signal a particular condition, often used as a default for mutable arguments.
Variable Arguments (*args and **kwargs)
*args Collects extra positional arguments into a tuple.
**kwargs Collects extra keyword arguments into a dictionary.
{'n': 5, 'mean': 3.0, 'std': 1.4142135623730951, 'min': 1, 'max': 5}
Mean only: 2.47
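A `*args` function consistent with the dictionary output above (the function name is an assumption):

```python
def summarize_data(*values):
    """Accept any number of positional arguments; they arrive as a tuple."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population std
    return {"n": n, "mean": mean, "std": std,
            "min": min(values), "max": max(values)}

print(summarize_data(1, 2, 3, 4, 5))
```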
=== Experiment: Test A ===
Configuration:
temperature_k: 293.15
pressure_dynes_cm2: 1013250.0
trials: 10
=== Experiment: Test B ===
Configuration:
temperature_k: 350
pressure_dynes_cm2: 2000000.0
trials: 50
catalyst: Platinum
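A `**kwargs` sketch consistent with the experiment output above; the default settings and the merge strategy are assumptions:

```python
def run_experiment(name, **config):
    """Collect arbitrary keyword settings into a dict and report them."""
    defaults = {"temperature_k": 293.15,
                "pressure_dynes_cm2": 1013250.0,
                "trials": 10}
    settings = {**defaults, **config}   # caller-supplied keywords override defaults
    print(f"=== Experiment: {name} ===")
    print("Configuration:")
    for key, value in settings.items():
        print(f"  {key}: {value}")
    return settings

a = run_experiment("Test A")
b = run_experiment("Test B", temperature_k=350, pressure_dynes_cm2=2e6,
                   trials=50, catalyst="Platinum")
```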
Parameter Ordering Rules
Python enforces specific rules about parameter order in function definitions. These rules exist to make functions clearer and prevent ambiguous calls. Let’s build up from simple to complex.
Basic Rule: Required Before Optional
The fundamental rule is that parameters with defaults must come after parameters without defaults:
15
25
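A minimal illustration of the rule (the function name, default, and call values are assumptions chosen to match the output above):

```python
def scale(value, factor=3):
    """'factor' has a default, so it must come after the required 'value'."""
    return value * factor

print(scale(5))        # uses the default factor=3 -> 15
print(scale(5, 5))     # explicit factor -> 25

# def bad(factor=3, value):  # SyntaxError: default before required parameter
```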
The Complete Parameter Order
Python supports different parameter types that must appear in a specific order. Here’s the complete hierarchy:
| Order | Type | Syntax Example | How to Call |
|---|---|---|---|
| 1 | Positional-only | `def f(x, y, /):` | Must use position: `f(1, 2)` |
| 2 | Standard | `def f(a, b):` | Position or keyword: `f(1, 2)` or `f(a=1, b=2)` |
| 3 | Default values | `def f(x=10):` | Optional: `f()` or `f(5)` |
| 4 | `*args` | `def f(*args):` | Collects extras: `f(1, 2, 3)` \(\to\) `args=(1, 2, 3)` |
| 5 | Keyword-only (required) | `def f(*, x):` | Must use name: `f(x=5)` (required) |
| 6 | Keyword-only with default | `def f(*, x=10):` | Optional keyword: `f()` or `f(x=5)` |
| 7 | `**kwargs` | `def f(**kwargs):` | Collects extra keywords: `f(a=1, b=2)` |
Positional-Only Parameter A parameter that can only be passed by position, not by name. Marked with / after such parameters (Python 3.8+).
You rarely need all types in one function. Here are practical examples showing common combinations:
Keyword-Only Parameter A parameter that can only be passed by name, not by position. Created by placing parameters after *args or a bare *.
3.33
3.3333
Processing 4 points
Normalize: False
Options: {'threshold': 0.5, 'verbose': True}
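A sketch of two common combinations, consistent with the output above (both function names are assumptions):

```python
def round_value(value, ndigits=2, /):
    """Everything before '/' is positional-only."""
    return round(value, ndigits)

def process_points(*points, normalize=False, **options):
    """'normalize' is keyword-only because it follows *points."""
    print(f"Processing {len(points)} points")
    print(f"Normalize: {normalize}")
    print(f"Options: {options}")

print(round_value(10 / 3))       # 3.33
print(round_value(10 / 3, 4))    # 3.3333
process_points((0, 0), (1, 1), (2, 4), (3, 9),
               threshold=0.5, verbose=True)
```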
Understanding Keyword-Only Parameters
Any parameter that comes AFTER *args (or a bare *) can ONLY be passed using its name. This feature makes function calls self-documenting:
If a parameter is a boolean (e.g., normalize=True) or has units/meaning that isn’t obvious at the call site, make it keyword-only.
Common Patterns You’ll Use
Average of 2, 4, 6, 8: 5.0
Plotting 3 points
color: red
linewidth: 2
marker: o
Understanding the Special Markers
/ Parameter Marker Forces all parameters before it to be positional-only. Parameters after / can be passed by position or keyword. Python 3.8+.
Python uses two special markers to control how arguments can be passed:
* Parameter Marker Forces all parameters after it to be keyword-only. Can be used alone (*, x) or with *args to collect positional arguments.
5.0
Configuring MyApp
Debug mode enabled
Which of these function definitions are valid, and how would you call each?
# Definition A
def func_a(x, y=1, *args, z):
pass
# Definition B
def func_b(x, *args, z, y=1):
pass
# Definition C
def func_c(x, y=1, z=2, *args):
pass
# Definition D
def func_d(x, /, y, *, z):
pass

A: VALID - `z` is keyword-only (required) because it comes after `*args`
func_a(1, 2, 3, 4, z=5) # x=1, y=2, args=(3,4), z=5
func_a(1, z=5) # x=1, y=1 (default), args=(), z=5
func_a(1, 2, 3)     # ERROR: missing required keyword argument 'z'

B: VALID - Both `z` (required) and `y` (optional) are keyword-only
func_b(1, 2, 3, z=5) # x=1, args=(2,3), z=5, y=1 (default)
func_b(1, z=5, y=4) # x=1, args=(), z=5, y=4
func_b(1, 2, 3)     # ERROR: missing required keyword 'z'

C: VALID - Standard pattern, `*args` collects extras at the end
func_c(1, 2, 3, 4, 5) # x=1, y=2, z=3, args=(4,5)
func_c(1) # x=1, y=1 (default), z=2 (default), args=()
func_c(1, 10)       # x=1, y=10, z=2 (default), args=()

D: VALID (Python 3.8+) - Mixed positioning rules
- `x` must be passed positionally (before `/`)
- `y` can be positional or keyword (between `/` and `*`)
- `z` must be passed as keyword (after `*`)
func_d(1, 2, z=3) # Valid: positional x, positional y, keyword z
func_d(1, y=2, z=3) # Valid: positional x, keyword y, keyword z
func_d(x=1, y=2, z=3) # ERROR: x must be positional (not keyword)
func_d(1, 2, 3) # ERROR: z must be keyword (not positional)Key Patterns to Remember:
- Parameters after `*args` or `*` are keyword-only
- Parameters before `/` are positional-only (Python 3.8+)
- Required parameters must come before optional ones in each section
5.3 Scope and Namespaces
Scope The region of a program where a variable is accessible.
Understanding scope—where variables can be accessed—is crucial for writing bug-free code. Python’s scope rules determine which variables are visible at any point in your program. Without understanding scope, you’ll encounter confusing bugs where variables don’t have the values you expect, or worse, where changing a variable in one place mysteriously affects code elsewhere.
Namespace A container that holds a set of identifiers and their associated objects.
The LEGB Rule
LEGB Local, Enclosing, Global, Built-in - Python’s scope resolution order.
Python resolves variable names using the LEGB rule, searching in this order:
- Local: Inside the current function
- Enclosing: In the enclosing function (for nested functions)
- Global: At the top level of the module
- Built-in: In the built-in namespace (print, len, etc.)
Inside relativistic: beta = 0.500, gamma = 1.155
Inside calculate: E = 9.45e-07 ergs
Global scope: c = 3.00e+10 cm/s
Final energy: 9.45e-07 ergs = 590120.65 eV
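A sketch consistent with the output above, showing each LEGB level in action (the function names and structure are assumptions; the physics is relativistic total energy \(E = \gamma m c^2\) for an electron at \(\beta = 0.5\)):

```python
C_CMS = 2.998e10   # Global scope: speed of light in cm/s

def total_energy(mass_g, beta):
    """Relativistic total energy E = gamma * m * c^2, in ergs."""
    def gamma_factor():
        # 'beta' is found in the Enclosing scope; 'C_CMS' in the Global scope;
        # built-in names like 'print' and 'len' are resolved last.
        return 1.0 / (1.0 - beta**2) ** 0.5

    gamma = gamma_factor()      # Local variable of total_energy
    return gamma * mass_g * C_CMS**2

E = total_energy(9.109e-28, 0.5)   # electron at beta = 0.5
```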
The UnboundLocalError Trap
Count: 2
Global counter: 2
UnboundLocalError
The error happens because Python sees you’re assigning to counter, assumes it’s local, but then can’t find a local value to increment.
Symptoms:
- Variable works fine when reading
- Crashes when trying to modify
- Error message mentions “referenced before assignment”
Fix: Either use global keyword or (better) pass the value explicitly.
Real disaster: A climate model that tried to update global temperature but created local variables instead, producing nonsense results for months before discovery.
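A minimal reproduction of the trap and the recommended fix (the counter scenario is hypothetical):

```python
counter = 0

def read_counter():
    return counter              # READING a global inside a function is fine

def increment_buggy():
    counter = counter + 1       # assignment makes 'counter' local -> UnboundLocalError

def increment_fixed(count):
    """Better than 'global': pass state in and return the new value."""
    return count + 1

counter = increment_fixed(increment_fixed(counter))
print("Count:", read_counter())       # 2
print("Global counter:", counter)     # 2

try:
    increment_buggy()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```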
Closures: Functions That Remember
Closure A function that remembers variables from its enclosing scope even after that scope has finished.
Rectangle rule: 0.328350
Midpoint rule: 0.333325
Exact: 0.333333
Rectangle error: 0.004983
Midpoint error: 0.000008
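A closure-based sketch consistent with the numbers above: the inner rules remember the integrand `f` after the factory returns (the factory name is an assumption; the results match \(\int_0^1 x^2\,dx\) with `n=100`):

```python
def make_integrator(f):
    """Return integration rules that 'remember' f via a closure."""
    def rectangle(a, b, n=100):
        h = (b - a) / n
        return sum(f(a + i * h) for i in range(n)) * h          # left endpoints

    def midpoint(a, b, n=100):
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h  # midpoints

    return rectangle, midpoint

rect, mid = make_integrator(lambda x: x**2)
print(f"Rectangle rule: {rect(0, 1):.6f}")   # 0.328350
print(f"Midpoint rule: {mid(0, 1):.6f}")     # 0.333325
```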
What happens if both inner and outer functions define the same variable name?
Inner sees: inner
Outer sees: outer
Each function sees its own local variable. The inner function’s x doesn’t affect the outer function’s x. They’re in different namespaces!
Output:
Inner sees: inner
Outer sees: outer

If the inner function didn't define its own x, it would see the outer's x due to the LEGB rule (finding it in the Enclosing scope).
Global variables violate the “principle of least surprise” in computing:
HIDDEN STATE ANTI-PATTERN:
- Function behavior depends on external state
- Can't understand function in isolation
- Testing requires global setup
- Parallel processing becomes impossible
Real disaster: Mars Climate Orbiter (1999)
- Global unit variable (metric vs imperial)
- One module changed it to imperial
- Another module read it expecting metric
- $327 million spacecraft crashed into Mars
Better pattern: Explicit State Passing
BAD: current_units = 'metric'; calculate()
GOOD: calculate(units='metric')
5.4 Functional Programming Elements
Side Effect Any state change that occurs beyond returning a value from a function, such as modifying global variables or printing output.
Pure Function A function that always returns the same output for the same input with no side effects.
Python supports functional programming—a style that treats computation as the evaluation of mathematical functions. These concepts are essential for modern scientific frameworks like JAX and lead to cleaner, more testable code.
Lambda Functions
Lambda An anonymous function defined inline using the lambda keyword.
Lambda functions are small, anonymous functions defined inline:
Traditional: 25
Lambda: 25
Sorted by x:
x=0.8, y=1.2
x=1.5, y=2.3
x=2.1, y=4.7
Sorted by relative error:
x=2.1, relative error=1.1%
x=1.5, relative error=4.3%
x=0.8, relative error=16.7%
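A sketch of both uses shown above: a lambda as a tiny named function, and a lambda as a `sorted` key (the point data is hypothetical):

```python
square = lambda x: x**2          # equivalent to a one-line def
print("Lambda:", square(5))      # 25

points = [(2.1, 4.7), (0.8, 1.2), (1.5, 2.3)]
for x, y in sorted(points, key=lambda p: p[0]):   # sort by x coordinate
    print(f"x={x}, y={y}")
```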
Map, Filter, and Reduce
These functional tools transform how you process data:
- List comprehension / generator expression (default)
- `map`/`filter` (fine, but can be harder to read)
- `reduce` (rare in Python; prefer `sum`, `min`, `max`, `any`, `all`)
Original: [981, 979, 983, 9900, 980, 982, 978]
Filtered: [981, 979, 983, 980, 982, 978]
Removed 1 outliers
Calibrated: ['983.0', '981.0', '985.0', '982.0', '984.0', '980.0']
Final mean: 982.5 cm/s^2
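A pipeline consistent with the output above; the outlier threshold and the +2.0 calibration offset are assumptions:

```python
from functools import reduce

readings = [981, 979, 983, 9900, 980, 982, 978]   # cm/s^2; one obvious outlier

# Filter: drop non-physical values (threshold is an assumption)
filtered = list(filter(lambda g: 900 < g < 1100, readings))

# Map: apply a hypothetical +2.0 calibration offset
calibrated = list(map(lambda g: g + 2.0, filtered))

# Reduce: sum the values (in real code, prefer the built-in sum)
total = reduce(lambda a, b: a + b, calibrated)
print(f"Final mean: {total / len(calibrated):.1f} cm/s^2")   # 982.5
```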
Rewrite this loop using functional programming:
#| eval: false
# Square all positive values
results = []
for x in data:
if x > 0:
results.append(x**2)

Two equivalent functional approaches:
#| eval: false
# Using filter and map
results = list(map(
lambda x: x**2,
filter(lambda x: x > 0, data)
))
# Using list comprehension (more Pythonic)
results = [x**2 for x in data if x > 0]

The list comprehension is generally preferred in Python for readability, but understanding the functional approach prepares you for libraries like JAX.
Functions as First-Class Objects
In Python, functions are objects you can pass around:
Grams: [100, 200, 300, 400]
Kilograms: [0.1, 0.2, 0.3, 0.4]
Dynes: [100000.0, 200000.0, 300000.0, 400000.0]
Newtons: [1.0, 2.0, 3.0, 4.0]
Original: [2, 4, 6, 8, 10]
Normalized: ['-1.41', '-0.71', '0.00', '0.71', '1.41']
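The core idea behind both outputs above is that a function can be passed as an argument like any other object; a minimal sketch (the helper names are assumptions):

```python
def convert_all(values, conversion):
    """Apply any conversion function to every value."""
    return [conversion(v) for v in values]

def grams_to_kg(m_g):
    return m_g / 1000.0

masses = [100, 200, 300, 400]
# grams_to_kg is passed as an object, not called here
print("Kilograms:", convert_all(masses, grams_to_kg))
```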
Functional programming isn’t just academic—it’s essential for modern scientific computing:
- JAX (Google’s NumPy replacement) requires pure functions for automatic differentiation
- Parallel processing works best with stateless functions
- Testing is trivial when functions have no side effects
- GPU computing maps naturally to functional operations
Example: In JAX, you can automatically differentiate through an entire physics simulation if it’s written functionally. This enables techniques like physics-informed neural networks that would be impossibly complex to implement manually.
5.5 Modules and Packages
Module A Python file containing definitions and statements that can be imported.
As your analysis grows from scripts to projects, organization becomes critical. Modules let you organize related functions together and reuse them across projects.
Package A directory containing multiple Python modules and an __init__.py file.
Creating Your First Module
You can create module files two ways:

1. Recommended: Use your text editor to create a new .py file
2. Optional: Programmatically write files (not recommended; disabled in the rendered book)
Save this as statistics_tools.py in your current directory:
#| eval: false
"""
statistics_tools.py - Basic statistical calculations.
This module provides fundamental statistical functions.
"""
import math
def mean(data):
"""Arithmetic mean (uses math.fsum for better numerical stability)."""
if len(data) == 0:
raise ValueError("mean() requires at least one value")
return math.fsum(data) / len(data)
def variance(data, ddof=0):
"""
Variance with optional degrees of freedom adjustment.
ddof=0 gives population variance (divide by n).
ddof=1 gives sample variance (divide by n-1).
"""
n = len(data)
if n - ddof <= 0:
raise ValueError(f"variance() requires n > ddof (n={n}, ddof={ddof})")
m = mean(data)
return math.fsum((x - m)**2 for x in data) / (n - ddof)
def std_dev(data, ddof=0):
"""Standard deviation corresponding to variance(ddof=...)."""
return math.sqrt(variance(data, ddof=ddof))
def std_error(data):
"""Standard error of the mean (uses sample std: ddof=1)."""
return std_dev(data, ddof=1) / math.sqrt(len(data))
if __name__ == "__main__":
test_data = [1, 2, 3, 4, 5]
print(f"Test mean: {mean(test_data)}")
print(f"Test sample std dev: {std_dev(test_data, ddof=1):.3f}")

Using Your Module
#| eval: false
# Different import methods
# Method 1: Import entire module
import statistics_tools
data = [10, 12, 11, 13, 10, 11, 12]
m = statistics_tools.mean(data)
s = statistics_tools.std_dev(data)
print(f"Method 1 - Mean: {m:.2f}, Std: {s:.2f}")
# Method 2: Import specific functions
from statistics_tools import mean, std_error
se = std_error(data)
print(f"Method 2 - Standard error: {se:.3f}")
# Method 3: Import with alias
import statistics_tools as stats
var = stats.variance(data)
print(f"Method 3 - Variance: {var:.3f}")

Reloading Modules with importlib.reload
If you edit statistics_tools.py (or any module) and your changes “don’t show up”, you’re probably seeing import caching: Python only imports a module once per session.
#| eval: false
import importlib
import statistics_tools
importlib.reload(statistics_tools)

Building a Physics Module
Let’s create a comprehensive module for physics calculations:
#| eval: false
"""
physics_cgs.py - Physics calculations in CGS units.
All calculations use CGS (centimeter-gram-second) units.
"""
# Physical constants in CGS
SPEED_OF_LIGHT = 2.998e10 # cm/s
PLANCK_CONSTANT = 6.626e-27 # erg*s
BOLTZMANN = 1.381e-16 # erg/K
ELECTRON_MASS = 9.109e-28 # g
PROTON_MASS = 1.673e-24 # g
GRAVITATIONAL_CONSTANT = 6.674e-8 # cm^3/(g*s^2)
def kinetic_energy(mass_g, velocity_cms):
"""KE = (1/2) m v^2 in ergs."""
return 0.5 * mass_g * velocity_cms**2
def momentum(mass_g, velocity_cms):
"""p = mv in g*cm/s."""
return mass_g * velocity_cms
def de_broglie_wavelength(mass_g, velocity_cms):
"""lambda = h/p in cm."""
p = momentum(mass_g, velocity_cms)
return PLANCK_CONSTANT / p if p != 0 else float('inf')
def thermal_velocity(temp_k, mass_g):
"""RMS thermal velocity in cm/s."""
import math
return math.sqrt(3 * BOLTZMANN * temp_k / mass_g)
def photon_energy(wavelength_cm):
"""E = hc/lambda in ergs."""
return PLANCK_CONSTANT * SPEED_OF_LIGHT / wavelength_cm
def gravitational_force(m1_g, m2_g, distance_cm):
"""F = Gm_1m_2/r^2 in dynes."""
if distance_cm == 0:
return float('inf')
return GRAVITATIONAL_CONSTANT * m1_g * m2_g / distance_cm**2
class Particle:
"""Preview of Chapter 6: a simple particle with physics methods."""
def __init__(self, mass_g, velocity_cms):
self.mass = mass_g
self.velocity = velocity_cms
def kinetic_energy(self):
return kinetic_energy(self.mass, self.velocity)
def wavelength(self):
return de_broglie_wavelength(self.mass, self.velocity)
if __name__ == "__main__":
# Tiny smoke test
v_electron = 0.01 * SPEED_OF_LIGHT # cm/s
ke = kinetic_energy(ELECTRON_MASS, v_electron)
wavelength = de_broglie_wavelength(ELECTRON_MASS, v_electron)
print(f"KE={ke:.2e} ergs, lambda={wavelength:.2e} cm")

After saving that as physics_cgs.py, you can import and use it:
#| eval: false
import physics_cgs
v_electron = 0.01 * physics_cgs.SPEED_OF_LIGHT # cm/s
ke = physics_cgs.kinetic_energy(physics_cgs.ELECTRON_MASS, v_electron)
wavelength = physics_cgs.de_broglie_wavelength(physics_cgs.ELECTRON_MASS, v_electron)
print(f"Electron at 1% c:")
print(f" Velocity = {v_electron:.2e} cm/s")
print(f" KE = {ke:.2e} ergs")
print(f"  de Broglie lambda = {wavelength:.2e} cm")

Import Best Practices
#| eval: false
# GOOD: Clear, explicit imports
import numpy as np
import matplotlib.pyplot as plt
from math import sin, cos, pi
# BAD: Wildcard imports pollute namespace
# from numpy import * # Adds 600+ names!
# from math import * # Conflicts with numpy!
# Example of namespace collision
import math
import cmath # Complex math module (introduced in Chapter 2)
# Clear which log we're using
real_log = math.log(10) # Natural log of real number
complex_log = cmath.log(-1) # Can handle complex numbers
print(f"math.log(10) = {real_log:.3f}")
print(f"cmath.log(-1) = {complex_log}") # Returns complex number
# If we had used 'from math import *':
# log(10)  # Which log? math or cmath? Unclear!

Why is from module import * dangerous in scientific code?
Wildcard imports are dangerous because:
- Name collisions: Multiple libraries have functions with same names (log, sqrt, sin)
- Hidden overwrites: Later imports silently replace earlier ones
- Unclear source: Can’t tell where a function comes from
- Namespace pollution: Hundreds of unwanted names
Real example that caused wrong results:
from numpy import * # Has log (natural log)
from math import * # Also has log, overwrites numpy's
from sympy import * # Has symbolic log, overwrites again
# Which log is this?
result = log(data)  # Could be any of the three!

Always use explicit imports for clarity and safety!
On September 14, 2015, at 09:50:45 UTC, the Laser Interferometer Gravitational-Wave Observatory (LIGO) detected humanity’s first gravitational wave signal, named GW150914. This discovery, which earned the 2017 Nobel Prize in Physics, was only possible because of meticulously modular software architecture.
The LIGO analysis pipeline consisted of hundreds of independent Python modules, each performing one specific task:
#| eval: false
# Simplified LIGO pipeline structure
def calibrate_strain(raw_data, calibration_data):
"""Convert detector output to strain."""
# Complex calibration involving transfer functions
def remove_lines(strain_data, line_frequencies):
"""Remove power line interference and harmonics."""
# Notch filters at 60Hz, 120Hz, etc.
def whiten_data(strain, psd):
"""Normalize frequency response."""
# Spectral whitening for uniform sensitivity
def matched_filter(data, template_bank):
"""Search for gravitational wave patterns."""
# Cross-correlation with theoretical waveforms

When the first signal appeared, this modular design enabled unprecedented rapid verification. Within minutes, automated pipelines had processed the data through dozens of independent checks. The modularity allowed the 1000+ member collaboration to:
- Test each processing step with simulated signals
- Run multiple independent analysis pipelines in parallel
- Swap alternative algorithms to verify robustness
- Complete peer review in record time
The discovery paper lists over 30 independent software modules, each thoroughly tested and documented. This modular approach—exactly what you’re learning in this chapter—enabled one of physics’ greatest discoveries!
On June 4, 1996, the maiden flight of Ariane 5 ended in spectacular failure just 37 seconds after liftoff. The rocket veered off course and self-destructed, destroying four Cluster satellites worth $370 million. This disaster teaches us a crucial lesson about function design and validation.
The root cause traced back to a single function in the inertial reference system, originally written for Ariane 4. This function converted a 64-bit floating-point horizontal velocity value to a 16-bit signed integer. In Ariane 4’s slower flight profile, this worked perfectly. But Ariane 5 was more powerful. At T+37 seconds, the horizontal velocity exceeded 32,767—the maximum value for a 16-bit signed integer.
The conversion function essentially looked like this conceptually:
#| eval: false
def convert_velocity(velocity_64bit):
# Direct conversion without validation
return int(velocity_64bit) # Assumes it fits in 16 bits!What they needed was:
#| eval: false
def convert_velocity_safe(velocity_64bit):
MAX_INT16 = 32767
MIN_INT16 = -32768
if velocity_64bit > MAX_INT16:
# Handle overflow appropriately
log_error(f"Velocity {velocity_64bit} exceeds 16-bit range")
return MAX_INT16 # Or raise exception with proper handling
elif velocity_64bit < MIN_INT16:
return MIN_INT16
return int(velocity_64bit)

The overflow caused an exception in both the primary and backup systems (which ran identical code). The main flight computer, suddenly receiving diagnostic data instead of attitude information, interpreted it as actual flight data and commanded a violent course correction. The resulting aerodynamic forces exceeded design limits, triggering automatic self-destruct.
The investigation revealed four critical lessons about function design:

1. Always validate input ranges, especially when converting between data types
2. Never assume code reuse is safe without checking new operating conditions
3. Design for graceful degradation rather than complete failure
4. Test with realistic data ranges that match actual operating conditions
Every validation function you write, every bounds check you add, every type conversion you verify—these aren’t bureaucratic overhead. They’re the difference between mission success and catastrophic failure. This disaster led directly to modern software engineering practices that require explicit contracts for all functions, comprehensive range checking, and formal verification of critical code paths.
5.6 Documentation and Testing
Good documentation and testing make your functions trustworthy and reusable. Professional scientific code requires both to ensure reproducibility and reliability.
Documentation Levels
Integral of x^2 from 0 to 2: 2.750
Exact answer: 2.667
Error: 0.083
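The code that produced this output is not shown above; here is a minimal sketch that reproduces it, assuming a trapezoidal-rule integrator with four subintervals. The function name `trapezoid_integrate` and the NumPy-based implementation are illustrative choices, and the docstring shows the NumPy documentation style discussed in this section:

```python
import numpy as np

def trapezoid_integrate(f, a, b, n=4):
    """
    Approximate the integral of f from a to b using the trapezoidal rule.

    Parameters
    ----------
    f : callable
        Function to integrate; must accept a NumPy array.
    a, b : float
        Lower and upper integration limits.
    n : int, optional
        Number of subintervals (default 4).

    Returns
    -------
    float
        Approximate value of the integral.
    """
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    # Trapezoidal rule: half-weight the endpoints, full weight interior points
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

result = trapezoid_integrate(lambda x: x**2, 0.0, 2.0, n=4)
exact = 8.0 / 3.0
print(f"Integral of x^2 from 0 to 2: {result:.3f}")  # 2.750
print(f"Exact answer: {exact:.3f}")                  # 2.667
print(f"Error: {result - exact:.3f}")                # 0.083
```

With only four subintervals the rule overestimates the convex integrand, which is exactly the 0.083 error shown above.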
Testing Your Functions
Test 1: Simple roots - PASSED
Test 2: No real roots - PASSED
Test 3: Repeated root - PASSED
All tests passed!
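The test run above could come from a simple test harness like the following sketch. The quadratic solver and the specific test cases are illustrative assumptions, chosen to match the three printed outcomes:

```python
import math

def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, or None if there are none."""
    disc = b**2 - 4 * a * c
    if disc < 0:
        return None
    sqrt_d = math.sqrt(disc)
    return ((-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a))

def test_solve_quadratic():
    # Test 1: x^2 - 5x + 6 = 0 has roots 3 and 2
    roots = solve_quadratic(1, -5, 6)
    assert roots is not None and abs(roots[0] - 3) < 1e-12 and abs(roots[1] - 2) < 1e-12
    print("Test 1: Simple roots - PASSED")
    # Test 2: x^2 + 1 = 0 has no real roots
    assert solve_quadratic(1, 0, 1) is None
    print("Test 2: No real roots - PASSED")
    # Test 3: x^2 - 2x + 1 = 0 has the repeated root 1
    roots = solve_quadratic(1, -2, 1)
    assert roots is not None and abs(roots[0] - roots[1]) < 1e-12
    print("Test 3: Repeated root - PASSED")
    print("All tests passed!")

test_solve_quadratic()
```

Plain `assert` statements like these are the seed of real test suites; frameworks such as pytest discover and run functions named `test_*` automatically.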
Introduction to Type Hints (Optional Enhancement)
Type hints are an optional Python feature introduced in Python 3.5. They document expected types but Python does not enforce them - they’re primarily for documentation and IDE support.
For this course: Type hints are not required but can help make your code clearer. Focus on writing correct functions first, then add type hints if desired.
Python supports type hints to document expected types:
Energy: 4.10e-09 ergs
Integer inputs still work: 5000.0 ergs
Mean: 3.0, Std: 1.41, Median: 3.0
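A sketch of type-hinted functions consistent with the outputs above. The specific inputs (an electron mass in grams, a velocity of 0.1c in cm/s, and the sample list) are illustrative assumptions; note that passing plain integers works fine because Python does not enforce the hints:

```python
from statistics import mean, median, pstdev

def kinetic_energy(mass: float, velocity: float) -> float:
    """Kinetic energy in ergs for mass in g and velocity in cm/s (CGS)."""
    return 0.5 * mass * velocity**2

def describe(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, population std, median) of a list of numbers."""
    return mean(values), pstdev(values), median(values)

m_electron = 9.109e-28  # g (assumed example value)
v = 3.0e9               # cm/s, about 0.1c (assumed example value)
print(f"Energy: {kinetic_energy(m_electron, v):.2e} ergs")   # Energy: 4.10e-09 ergs

# Hints are documentation only: int arguments are not rejected
print(f"Integer inputs still work: {kinetic_energy(1, 100)} ergs")  # 5000.0 ergs

m, s, md = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"Mean: {m}, Std: {s:.2f}, Median: {md}")  # Mean: 3.0, Std: 1.41, Median: 3.0
```

The `list[float]` and `tuple[...]` hint syntax requires Python 3.9 or later; older code uses `typing.List` and `typing.Tuple`.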
This function has a subtle bug. Can you spot it?
```{python}
def find_peaks(data, threshold):
    """Find all peaks above threshold - HAS BUG."""
    peaks = []
    peak_indices = []
    for i in range(len(data)):
        if data[i] > threshold:
            # Check if it's a local maximum
            if i == 0 or data[i] > data[i-1]:
                if i == len(data)-1 or data[i] > data[i+1]:
                    peaks.append(data[i])
                    peak_indices.append(i)
    return peaks, peak_indices
```

Bug: The strict `>` comparisons miss flat-topped ("plateau") peaks where adjacent values are equal. A naive switch to `>=` can create a new problem: it may label every point on a plateau as a separate peak.
Fix:
```{python}
def find_peaks_fixed(data, threshold):
    """
    Find local maxima above threshold.

    Plateau rule (simple heuristic): for a flat-topped peak, return the center index.
    """
    peaks = []
    peak_indices = []
    i = 0
    n = len(data)
    while i < n:
        if data[i] <= threshold:
            i += 1
            continue
        # Expand to full plateau [left, right] where values are equal
        left = i
        while left > 0 and data[left - 1] == data[i]:
            left -= 1
        right = i
        while right < n - 1 and data[right + 1] == data[i]:
            right += 1
        # Peak if plateau is higher than neighbors outside the plateau
        left_ok = (left == 0) or (data[left] > data[left - 1])
        right_ok = (right == n - 1) or (data[right] > data[right + 1])
        if left_ok and right_ok:
            idx = (left + right) // 2
            peaks.append(data[idx])
            peak_indices.append(idx)
        i = right + 1  # skip past the plateau
    return peaks, peak_indices
```

Peak definitions are domain-specific. This heuristic is a reasonable default, and it avoids marking every plateau point as a separate peak.
5.7 Performance Considerations
Understanding function overhead helps you write efficient code. Let’s measure and optimize!
Function Call Overhead
Function Call Analysis:
Simple function: 5.0 ms
Inline addition: 0.8 ms
Overhead factor: 6.0x
Complex function: 26.2 ms
Lesson: Overhead only matters for trivial operations!
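The measurement code behind these numbers is not shown; a sketch of how such timings might be made with `timeit` (the trivial `add` function is an assumed stand-in, and the absolute numbers will differ on your machine):

```python
import timeit

def add(a, b):
    """Trivial one-line function: call overhead dominates the actual work."""
    return a + b

n = 1_000_000
# Time one million calls to the function vs. the same addition inline
t_func = timeit.timeit("add(1, 2)", globals=globals(), number=n)
t_inline = timeit.timeit("1 + 2", number=n)

print(f"Simple function: {t_func * 1e3:.1f} ms")
print(f"Inline addition: {t_inline * 1e3:.1f} ms")
print(f"Overhead factor: {t_func / t_inline:.1f}x")
```

The inline version is faster mostly because each function call must create a stack frame and bind arguments; once the body does real work, that fixed cost becomes negligible.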
Vectorization (A Quick Preview)
Vectorization appears in this chapter's learning objectives; here's what it looks like in practice. The core idea: push work into NumPy's compiled loops instead of Python loops.
```{python}
#| eval: false
import numpy as np
import timeit

rng = np.random.default_rng(0)
n = 200_000
masses = rng.uniform(0.5, 2.0, size=n)       # g (toy)
velocities = rng.normal(0.0, 1.0e7, size=n)  # cm/s (toy)

def ke_loop(m, v):
    out = []
    for mi, vi in zip(m, v):
        out.append(0.5 * mi * vi**2)
    return out

def ke_vec(m, v):
    return 0.5 * m * v**2

print("Loop:", timeit.timeit("ke_loop(masses, velocities)", globals=globals(), number=3))
print("Vec :", timeit.timeit("ke_vec(masses, velocities)", globals=globals(), number=50))
```

Memoization for Expensive Calculations
Memoization Caching function results to avoid recomputing expensive operations.
Fibonacci(28) = 317811
Without memoization: 33.1 ms
With memoization: 0.113 ms
Speedup: 293x
Cache info: CacheInfo(hits=26, misses=29, maxsize=128, currsize=29)
Planck function at 500nm, 5778K: 2.64e+14 erg/(s cm^2 cm sr)
Cache info: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
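The code behind these outputs is not shown; a sketch using `functools.lru_cache` that reproduces the Fibonacci result and the Planck value (the CGS constants and the recursive structure are standard, but treat the exact function names as illustrative):

```python
import math
import timeit
from functools import lru_cache

def fib_plain(n):
    """Naive recursion: exponential number of calls."""
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=128)
def fib_cached(n):
    """Same recursion, but every result is cached after first computation."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(f"Fibonacci(28) = {fib_cached(28)}")  # Fibonacci(28) = 317811
print("Without memoization:", timeit.timeit(lambda: fib_plain(28), number=1))
# For a fresh cache: 29 misses (n = 0..28 each computed once), 26 hits
print("Cache info:", fib_cached.cache_info())

@lru_cache(maxsize=128)
def planck_lambda(wavelength, temperature):
    """Planck spectral radiance B_lambda in erg/(s cm^2 cm sr); CGS inputs."""
    h = 6.626e-27  # erg s
    c = 2.998e10   # cm/s
    k = 1.381e-16  # erg/K
    prefactor = 2.0 * h * c**2 / wavelength**5
    return prefactor / (math.exp(h * c / (wavelength * k * temperature)) - 1.0)

print(f"Planck at 500nm, 5778K: {planck_lambda(5.0e-5, 5778):.2e} erg/(s cm^2 cm sr)")
```

Caching pays off whenever the same arguments recur, as in the recursive Fibonacci; for the Planck function a fresh cache shows one miss per distinct (wavelength, temperature) pair, matching the `CacheInfo` line above.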
Main Takeaways
Functions transform programming from repetitive scripting into modular, maintainable software engineering. When you encapsulate logic in well-designed functions, you create building blocks that can be tested independently, shared with collaborators, and combined into complex analysis pipelines. The progression from simple functions to modules to packages mirrors how scientific software naturally grows—what starts as a quick calculation evolves into a shared tool used by entire research communities.
The distinction between positional, keyword, and default arguments gives you the flexibility to design interfaces that are both powerful and intuitive. Positional arguments work well for obvious parameters like power(base, exponent), while keyword arguments with defaults enable complex functions that remain simple for common cases. Understanding when to use each type—and the critical danger of mutable default arguments—prevents the subtle bugs that have plagued major scientific packages.
The scope rules and namespace concepts you’ve learned explain why variables sometimes behave unexpectedly in complex programs. Understanding the LEGB rule prevents frustrating bugs where variables have unexpected values or modifications in one place affect seemingly unrelated code. The mutable default argument trap demonstrates why understanding Python’s evaluation model is crucial for writing reliable code. These aren’t just academic concepts—they’ve caused real disasters in production systems.
Functional programming concepts like map, filter, and pure functions prepare you for modern scientific computing frameworks. JAX requires functional style for automatic differentiation, parallel processing works best with stateless functions, and testing becomes trivial when functions have no side effects. The ability to pass functions as arguments and return them from other functions enables powerful patterns like the specialized integrators we created with closures.
The performance measurements showed that function call overhead only matters for trivial operations in tight loops—exactly where you’ll want to use NumPy’s vectorized operations (Chapter 7) instead. For complex calculations, overhead is negligible compared to computation time. Memoization can provide dramatic speedups when expensive calculations repeat, as often happens in optimization and parameter searching.
Looking forward, the functions you’ve learned to write here form the foundation for object-oriented programming in Chapter 6, where functions become methods attached to objects. The module organization skills prepare you for building larger scientific packages, while the documentation practices ensure your code can be understood and maintained by others. Most importantly, thinking in terms of functional contracts and clear interfaces will make you a better computational scientist, capable of building the robust, efficient tools that modern research demands.
With functions and modules, you can hide complexity behind clean interfaces. In Chapter 6, you’ll go one step further: you’ll bundle data + behavior into objects, which is often the most natural way to model scientific systems.
Definitions
argument - The actual value passed to a function when calling it (e.g., in f(5), 5 is an argument)
closure - A function that remembers variables from its enclosing scope even after that scope has finished executing
decorator - A function that modifies another function’s behavior without changing its code
default argument - A parameter value used when no argument is provided during function call
docstring - A string literal that appears as the first statement in a function, module, or class to document its purpose
function - A reusable block of code that performs a specific task, taking inputs and optionally returning outputs
global - A keyword that allows a function to modify a variable in the global scope
keyword argument - An argument passed to a function by explicitly naming the parameter
lambda - An anonymous function defined inline using the lambda keyword
LEGB - The order Python searches for variables: Local, Enclosing, Global, Built-in
memoization - Caching function results to avoid recomputing expensive operations
module - A Python file containing definitions and statements that can be imported and reused
namespace - A container that holds a set of identifiers and their associated objects
package - A directory containing multiple Python modules and an __init__.py file
parameter - A variable in a function definition that receives a value when the function is called
positional argument - An argument passed to a function based on its position in the parameter list
pure function - A function that always returns the same output for the same input with no side effects
return value - The result that a function sends back to the code that called it
scope - The region of a program where a variable is accessible
sentinel value - A special value (like None) used to signal a particular condition, often as default for mutable arguments
side effect - Any state change that occurs beyond returning a value from a function
`*args` - Syntax for collecting variable positional arguments into a tuple
`**kwargs` - Syntax for collecting variable keyword arguments into a dictionary
Key Takeaways
- Functions are contracts: they promise specific outputs for given inputs
- Choose positional arguments for obvious parameters, keyword arguments for optional ones
- The mutable default argument trap occurs because defaults are evaluated once at definition time
- Always use `None` as a sentinel for mutable default arguments
- Python searches for variables using LEGB: Local, Enclosing, Global, Built-in
- Global variables make code hard to test, debug, and parallelize
- Lambda functions are useful for simple operations but limited to single expressions
- Functional programming concepts (map, filter, reduce) prepare you for modern frameworks
- The `if __name__ == "__main__"` pattern makes modules both importable and executable
- Never use `from module import *` in production code—it causes namespace pollution
- Docstrings are essential for scientific code that others will use and maintain
- Function call overhead matters only in tight loops with trivial operations
- Memoization can dramatically speed up expensive repeated calculations
- Testing and validation functions are as important as calculation functions
- Performance optimization should follow: algorithm \(\to\) vectorization \(\to\) caching \(\to\) parallelization
Quick Reference Tables
Function Definition Patterns
| Pattern | Syntax | Use Case |
|---|---|---|
| Basic function | `def func(x, y):` | Simple operations |
| Positional only | `def func(a, b, /):` | Force positional |
| Default arguments | `def func(x, y=10):` | Optional parameters |
| Keyword only | `def func(*, x, y):` | Force keywords |
| Variable args | `def func(*args):` | Unknown number of inputs |
| Keyword args | `def func(**kwargs):` | Flexible options |
| Combined | `def func(a, *args, x=1, **kwargs):` | Maximum flexibility |
| Lambda | `lambda x: x**2` | Simple inline functions |
When to Use Different Argument Types
| Argument Type | When to Use | Example |
|---|---|---|
| Positional | Obvious meaning, 1-3 params | power(2, 3) |
| Keyword | Many params, optional | plot(data, color='red') |
| Default | Common values | round(3.14, ndigits=2) |
| *args | Variable inputs | maximum(1, 2, 3, 4) |
| **kwargs | Configuration options | setup(debug=True, verbose=False) |
Module Import Patterns
| Pattern | Example | When to Use |
|---|---|---|
| Import module | `import numpy` | Use many functions |
| Import with alias | `import numpy as np` | Standard abbreviations |
| Import specific | `from math import sin, cos` | Few specific functions |
| Import all (avoid!) | `from math import *` | Never in production |
| Relative import | `from . import module` | Within packages |
Common Function Bugs and Fixes
| Problem | Symptom | Fix |
|---|---|---|
| Mutable default | Data persists between calls | Use None sentinel |
| UnboundLocalError | Can’t modify global | Use global or pass value |
| Missing return | Function returns None | Add return statement |
| Namespace pollution | Name conflicts | Avoid wildcard imports |
| Slow recursion | Exponential time | Add memoization |
| Type confusion | Unexpected types | Add type hints/validation |
Next Chapter Preview
With functions and modules mastered, Chapter 6 introduces Object-Oriented Programming (OOP)—a paradigm that bundles data and behavior together. You’ll learn to create classes that model physical systems naturally: a Particle class with position and velocity attributes, methods to calculate energy and momentum, and special methods that make your objects work seamlessly with Python’s built-in functions.
The functional programming concepts from this chapter provide essential background for OOP. Methods are just functions attached to objects, and understanding scope prepares you for the self parameter that confuses many beginners. The module organization skills you’ve developed will expand to organizing classes and building object hierarchies. Most importantly, the design thinking you’ve practiced—creating clean interfaces and thinking about contracts—directly applies to designing effective classes that model the complex systems you’ll encounter in computational physics.
References
Historical Events and Technical Details
- Ariane 5 Flight 501 Failure (1996)
- Lions, J. L. et al. (1996). “ARIANE 5 Flight 501 Failure: Report by the Inquiry Board.” European Space Agency. Paris: ESA.
- Gleick, J. (1996). “A Bug and A Crash.” The New York Times Magazine, December 1, 1996.
- Nuseibeh, B. (1997). “Ariane 5: Who Dunnit?” IEEE Software, 14(3), 15-16.
- LIGO Gravitational Wave Detection (2015)
- Abbott, B. P. et al. (LIGO Scientific Collaboration) (2016). “Observation of Gravitational Waves from a Binary Black Hole Merger.” Physical Review Letters, 116(6), 061102.
- Abbott, B. P. et al. (2016). “GW150914: The Advanced LIGO Detectors in the Era of First Discoveries.” Physical Review Letters, 116(13), 131103.
- LIGO Scientific Collaboration. (2021). “LIGO Algorithm Library - LALSuite.” Available at: https://lscsoft.docs.ligo.org/lalsuite/
- LIGO Open Science Center. (2024). “LIGO Open Data.” Available at: https://www.gw-openscience.org/
- Therac-25 Radiation Overdose Disasters (1985-1987)
- Leveson, N.G. & Turner, C.S. (1993). “An Investigation of the Therac-25 Accidents.” IEEE Computer, 26(7), 18-41.
- “Therac-25.” Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. Retrieved from: https://en.wikipedia.org/wiki/Therac-25
- Mars Climate Orbiter Loss (1999)
- Stephenson, A. G. et al. (1999). “Mars Climate Orbiter Mishap Investigation Board Report.” NASA.
- Oberg, J. (1999). “Why the Mars Probe Went Off Course.” IEEE Spectrum, 36(12), 34-39.
Python Documentation
- Python Language Reference
- Van Rossum, G., & Drake, F. L. (2024). “Python Language Reference, version 3.12.” Python Software Foundation. Available at: https://docs.python.org/3/reference/
- Scientific Python Resources
- Harris, C. R. et al. (2020). “Array programming with NumPy.” Nature, 585(7825), 357-362.
- VanderPlas, J. (2016). Python Data Science Handbook. O’Reilly Media. ISBN: 978-1491912058.