Senior Project Blog  ·  Week 2  ·  March 10, 2026

The Math Behind
the Slingshot

Diving into orbital mechanics, Lambert's problem, and what autonomous gravity-assist control research tells us about where reinforcement learning needs to go next.

01 From Algorithms to Astrodynamics

Welcome back. Last week I covered the learning side of the problem: Markov decision processes and policy gradients. This week I pivoted entirely to the physics and mathematics of the trajectories themselves. To build a good environment for an RL agent to learn in, I need to understand the dynamics the environment is supposed to model, so this week was about building that understanding from first principles.

The reading covered orbital mechanics from Schaub and Junkins' Analytical Mechanics of Space Systems, then shifted to specialized literature on gravity-assist trajectory design and Lambert's problem, and finally to an investigation of autonomous flyby control by Negri and Prado (2023), which highlighted a major challenge for my project. What follows is a walkthrough of the key ideas and how they fit together.

— Reading Log, Week 2 —
Analytical Mechanics of Space Systems (3rd ed.)
Schaub & Junkins, 2014 — Chapters 1, 2, 9
Two-body dynamics, orbital elements, conics, Lambert's problem
Revisiting Lambert's Problem
Izzo, D., 2015 — Celestial Mechanics and Dynamical Astronomy
Efficient algorithm for solving orbital boundary-value problems
Study on Autonomous Gravity-Assists with a Path-Following Control
Negri & Prado, 2023 — arXiv:2307.06185
Sliding-mode flyby control, Monte Carlo validation, Jovian tours
Interplanetary Gravity-Assist Fuel-Optimal Trajectory Optimization
Sarli & Taheri, 2020 — Celestial Mechanics and Dynamical Astronomy
Optimal control for gravity-assist sequences with perturbations

02 The Two-Body Problem: Where Everything Starts

Schaub and Junkins ground everything in the two-body problem, which consists of a spacecraft of negligible mass orbiting a single gravitational body. In this idealized case, the equations of motion reduce to a very compact form. The spacecraft's acceleration is entirely determined by the gravitational parameter $\mu = GM$ of the central body and the spacecraft's distance $r$ from it:

# Two-body equation of motion (Schaub & Junkins, Ch. 2)
$$\ddot{\mathbf{r}} = -\frac{\mu}{r^3}\mathbf{r}$$

# Vis-viva equation: speed at any point on a conic orbit
# $a$ = semi-major axis; $a > 0$ ellipse, $a < 0$ hyperbola, $a \to \infty$ parabola
$$v^2 = \mu\!\left(\frac{2}{r} - \frac{1}{a}\right)$$

The vis-viva equation is the single most useful result in orbital mechanics. It tells you the spacecraft's speed at any point on any conic orbit, given only the orbit's semi-major axis and the current distance from the gravitational body. For gravity-assist design, it is essential because the approach speed at a planet, the speed through periapsis of the flyby hyperbola, and the departure speed are all just applications of vis-viva.
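To make the formula concrete, here is a minimal vis-viva sketch in Python. The constants and the Earth-orbit example are my own illustration, not taken from the reading:

```python
import math

MU_SUN = 1.32712440018e11  # Sun's gravitational parameter mu = GM, km^3/s^2
AU = 1.495978707e8         # astronomical unit, km

def vis_viva_speed(mu: float, r: float, a: float) -> float:
    """Speed at distance r on a conic orbit with semi-major axis a.
    a > 0: ellipse, a < 0: hyperbola (the same formula covers both)."""
    return math.sqrt(mu * (2.0 / r - 1.0 / a))

# Earth's roughly circular heliocentric orbit: r = a = 1 AU -> about 29.8 km/s
v_earth = vis_viva_speed(MU_SUN, AU, AU)
```

The same call with a planetocentric $\mu$ and a negative $a$ gives the speed along a flyby hyperbola, which is exactly the "applications of vis-viva" point above.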

The two-body problem's solutions are conic sections: circles, ellipses, parabolas, and hyperbolas. Any such orbit can be described by six numbers called the classical orbital elements: the semi-major axis $a$, eccentricity $e$, inclination $i$, longitude of the ascending node $\Omega$, argument of periapsis $\omega$, and true anomaly $\nu$. These six numbers completely describe where a spacecraft is and where it is going in a two-body gravitational field. Schaub and Junkins spend considerable effort showing how to convert between these elements and the Cartesian position-velocity state vector, which is what a simulation like Basilisk actually propagates.

Diagram of Keplerian orbital elements
The six classical Keplerian orbital elements: semi-major axis $a$, eccentricity $e$, inclination $i$, longitude of the ascending node $\Omega$, argument of periapsis $\omega$, and true anomaly $\nu$.

For my RL environment, the spacecraft's state needs to encode all information relevant to making good decisions. The Cartesian state vector (position $\mathbf{r}$, velocity $\mathbf{v}$, mass $m$) is the natural choice because Basilisk propagates in Cartesian coordinates. However, the orbital elements are far more interpretable since an RL agent that can effectively "see" its current orbit shape and orientation relative to its target may learn faster and generalize better than one that only sees raw coordinates. How to represent state is one of the design choices I'll need to experiment with.
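As a sketch of what an element-based observation could look like, here is a minimal Cartesian-to-elements conversion covering only $a$, $e$, and $i$. The function name and NumPy implementation are my own illustration, not Basilisk's API:

```python
import numpy as np

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def cartesian_to_elements(r_vec, v_vec, mu=MU_EARTH):
    """Map a Cartesian state (r, v) to a subset of the classical elements:
    semi-major axis a, eccentricity e, inclination i (radians)."""
    r_vec = np.asarray(r_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    r = np.linalg.norm(r_vec)
    v = np.linalg.norm(v_vec)
    h_vec = np.cross(r_vec, v_vec)                     # specific angular momentum
    energy = 0.5 * v**2 - mu / r                       # specific orbital energy
    a = -mu / (2.0 * energy)                           # negative for a hyperbola
    e_vec = np.cross(v_vec, h_vec) / mu - r_vec / r    # eccentricity vector
    inc = np.arccos(h_vec[2] / np.linalg.norm(h_vec))  # inclination
    return a, np.linalg.norm(e_vec), inc

# Circular equatorial orbit at 7000 km: expect a = 7000, e = 0, i = 0
a, e, inc = cartesian_to_elements([7000.0, 0, 0],
                                  [0, np.sqrt(MU_EARTH / 7000.0), 0])
```

Note that $\Omega$ and $\omega$ are exactly the elements that become singular for $e \to 0$ or $i \to 0$, which is why I stopped at these three for the sketch.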

Design Question #1

Should the RL agent observe Cartesian state vectors $(\mathbf{r}, \mathbf{v}, m)$ or Keplerian orbital elements $(a, e, i, \Omega, \omega, \nu, m)$? Or both? The orbital elements representation may help the agent learn faster by providing a physically meaningful structure, but it also introduces singularities at zero eccentricity and zero inclination that need to be handled carefully.

03 Hyperbolic Orbits: The Grammar of the Gravity Assist

Much of orbital mechanics focuses on elliptical orbits, which are the closed paths that planets and satellites follow. But gravity assists happen on hyperbolic orbits, the curves traced by objects that have more than enough energy to escape a gravitational body. Understanding hyperbolic orbits in detail was one of the most important things I did this week because they are the fundamental unit of every gravity-assist maneuver.

On a hyperbolic orbit, the spacecraft arrives from effectively infinite distance with a nonzero speed relative to the planet, called the hyperbolic excess velocity $v_\infty$. It swings around the planet along a hyperbola, getting as close as the periapsis radius $r_p$, and then departs in a completely different direction, again with speed $v_\infty$ in the planet's reference frame. Schaub and Junkins derive from the vis-viva equation that the magnitude of $v_\infty$ is conserved through the flyby. A gravity assist therefore changes the direction of $v_\infty$, not its magnitude, in the planet's frame.

Diagram of gravity-assist geometry
The geometry of a gravitational slingshot: in the planet's reference frame, the spacecraft's speed is unchanged but its direction is rotated by the turning angle $2\delta$. In the Sun's frame, this directional change translates to a net change in heliocentric speed, enabling propellant-free acceleration or deceleration.
# Hyperbolic orbit (Schaub & Junkins, Ch. 9)
$$v_\infty^2 = \mu\!\left(-\frac{1}{a}\right) \quad (a < 0 \text{ for hyperbola})$$

# Half-turning angle: how much the flyby deflects the trajectory
# Total turning angle is $2\delta$; smaller $r_p$ → larger deflection
$$\sin\delta = \frac{\mu}{\mu + r_p v_\infty^2}$$

$$\Delta v_\infty = 2 v_\infty \sin\delta \quad \text{(velocity change in Sun's frame)}$$

The turning angle $2\delta$ determines how much the flyby redirects the spacecraft's heliocentric velocity. A lower periapsis means a tighter hyperbola, a larger turning angle, and a more powerful assist. But the periapsis must stay above the planet's surface (and ideally well above its atmosphere), so the design has to balance large deflections against the physical constraints of the flyby geometry.
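The periapsis trade-off is easy to see numerically. A small sketch using the two formulas above (the Jupiter flyby numbers are assumed for illustration):

```python
import math

MU_JUPITER = 1.26686534e8  # Jupiter's gravitational parameter, km^3/s^2
R_JUPITER = 71492.0        # equatorial radius, km

def half_turning_angle(mu, r_p, v_inf):
    """Half-turning angle delta: sin(delta) = mu / (mu + r_p * v_inf^2)."""
    return math.asin(mu / (mu + r_p * v_inf**2))

def heliocentric_dv(mu, r_p, v_inf):
    """Magnitude of the heliocentric velocity change: 2 * v_inf * sin(delta)."""
    return 2.0 * v_inf * math.sin(half_turning_angle(mu, r_p, v_inf))

# Lower periapsis -> tighter hyperbola -> bigger assist (v_inf = 6 km/s assumed)
dv_low = heliocentric_dv(MU_JUPITER, 2.0 * R_JUPITER, 6.0)
dv_high = heliocentric_dv(MU_JUPITER, 10.0 * R_JUPITER, 6.0)
```

Sweeping `r_p` in this sketch traces out exactly the deflection-versus-safety trade discussed above.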

What surprised me in the Sarli and Taheri (2020) paper was how much perturbations complicate this clean picture. They model solar radiation pressure on the spacecraft alongside the planetary gravity assists themselves, and show that the optimal trajectory must account for these forces continuously throughout the transfer and not just at the flyby encounters. The perturbations are relatively small, but over months-long transfers to the outer planets they accumulate into significant trajectory deviations. A trained RL policy that can correct continuously for small errors is therefore far more valuable than a pre-computed optimal trajectory that assumes a clean dynamical model.

Key Insight from Sarli & Taheri

Gravity-assist trajectories are typically designed using the patched-conics approximation, which treats each gravitational body in isolation and stitches together two-body solutions at the boundaries of each planet's sphere of influence. While this greatly simplifies the math, it introduces errors that must be corrected with additional delta-v. Higher-fidelity N-body simulations reveal that these patched-conics solutions can diverge substantially from real trajectories, especially in the complex gravitational environments of the outer planetary systems.

04 Lambert's Problem: Connecting Two Points in Space

Lambert's problem asks: given two positions in space and a specified travel time, what is the orbit connecting them? This is a boundary-value problem that appears everywhere in trajectory design. Any time you want to know "what velocity must I leave point A with to reach point B in exactly $\Delta t$ seconds?", you are solving Lambert's problem.

The mathematical elegance of Izzo's (2015) treatment was something I genuinely was not expecting from what I assumed would be a dry algorithmic paper. Izzo builds on Lancaster and Blanchard's classical formulation and introduces a new variable transformation. All Lambert problems with the same ratio $c/s$ (chord length over semi-perimeter of the triangle formed by the two position vectors) are geometrically equivalent under what Gooding called "L-similarity": they share the same non-dimensional time-of-flight curves. This collapses what looks like a family of infinitely many problems into a single universal parameter space.

# Lambert's problem: key parameters (Izzo, 2015)
$$\lambda^2 = 1 - \frac{c}{s}, \quad s = \frac{1}{2}(r_1 + r_2 + c)$$

# Non-dimensional time of flight
$$T = \frac{1}{2}\sqrt{\frac{\mu}{a_m^3}}(t_2 - t_1) \quad \text{where } a_m = s/2$$

# $T$ is a function of $x$ (the iteration variable) and $\lambda$ alone
$$T(x, \lambda) = \frac{1}{1-x^2}\!\left[\frac{\psi + M\pi}{\sqrt{|1-x^2|}} - x + \lambda y\right]$$
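The non-dimensionalization is compact enough to sketch directly. Here is a computation of the two invariants for a sample transfer; the geometry and flight time are numbers I made up for illustration:

```python
import math

def lambert_invariants(r1, r2, c, dt, mu):
    """Izzo's (2015) non-dimensional Lambert parameters:
    lambda^2 = 1 - c/s with s the semi-perimeter of the position triangle,
    and non-dimensional time of flight T = sqrt(2*mu/s^3) * dt, which equals
    (1/2) * sqrt(mu / a_m^3) * dt with a_m = s/2."""
    s = 0.5 * (r1 + r2 + c)
    lam = math.sqrt(1.0 - c / s)  # magnitude only; sign set by transfer direction
    T = math.sqrt(2.0 * mu / s**3) * dt
    return lam, T

# Sample: 90-degree transfer between two 1 AU radii, 100-day flight time
AU = 1.495978707e8          # km
MU_SUN = 1.32712440018e11   # km^3/s^2
chord = math.hypot(AU, AU)  # chord length for a 90-degree separation
lam, T = lambert_invariants(AU, AU, chord, 100 * 86400.0, MU_SUN)
```

Every transfer with the same $(\lambda, T)$ pair lands on the same point of the universal time-of-flight curves, which is the L-similarity idea in miniature.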

Izzo's solver converges very precisely (errors below $10^{-13}$) in an average of 2.1 iterations for single-revolution transfers and about 3.3 iterations for multi-revolution cases, using a Householder iterative scheme fed by analytically derived initial guesses. For context, this is faster than the previous state-of-the-art Gooding algorithm, which required about 3 iterations per solve. Since a trajectory design problem may involve thousands or millions of Lambert solves (for example, sweeping over all possible departure and arrival dates to build a "porkchop plot"), the difference matters enormously.

Porkchop plot Earth Mars 2018 05 13
A porkchop plot for an Earth-Mars transfer, showing contours of C3 (departure energy) as a function of launch and arrival dates. The minimum-energy transfer windows appear as elliptical basins; each point on this plot is computed via one Lambert solve.

For my project, the Lambert solver is the backbone of the trajectory planning baseline I need to compare against. When I want to check whether my RL policy is doing something sensible, or when I need to construct a reference trajectory for the patched-conics tour design, Lambert solves are the tool. Izzo's solver is distributed as part of the open-source PyKEP library from ESA, which conveniently interfaces with Python. This is the same library that Zavoli and Federici used in their RL framework, so the integration path is well-tested.

Why Lambert Matters for RL

Lambert's problem gives me a way to construct "expert" reference trajectories for any desired departure-arrival pair. These can serve as a sanity check for RL policy behavior, as targets for reward shaping, and as comparison baselines when measuring how much efficiency the RL policy sacrifices in exchange for robustness. Izzo's efficient solver means I can generate these references quickly enough to use them during training.

05 Autonomous Flyby Control & Limits of Patched Conics

The most immediately useful paper this week was Negri and Prado's (2023) study on autonomous control of gravity-assist hyperbolic trajectories. Negri and Prado zoom in on what actually happens during the flyby itself. Their central finding is important enough that it reframed how I'm thinking about the scope of my own project.

Their control strategy, called the Robust Keplerian Path-Following Control (RKPFC), is built on sliding-mode control theory. Rather than tracking a specific reference trajectory in time (which would require the spacecraft to be at exactly the right place at exactly the right moment), the RKPFC drives the spacecraft to the correct orbital geometry, that is, the right shape and orientation of the hyperbola, without caring precisely when it arrives at each point. The sliding surface is defined in terms of integrals of motion:

# RKPFC sliding surface (Negri & Prado, 2023, Eq. 20)
# $\tilde{\mathbf{e}} = \mathbf{e} - \mathbf{e}_d$: eccentricity vector error
# $\tilde{h} = h - h_d$: angular momentum magnitude error
# When $\mathbf{s} = \mathbf{0}$: spacecraft converges to desired Keplerian geometry
$$\mathbf{s} = \begin{bmatrix} \tilde{\mathbf{e}} \cdot (\lambda_R \hat{r} + \hat{\theta}) \\ \tilde{h}(\hat{h}_d \cdot (\lambda_N \hat{r} + \hat{\theta})) \end{bmatrix} = \mathbf{0}$$

"Path following" rather than "trajectory tracking" is therefore the right control objective for a gravity assist. Once you're inside a planet's sphere of influence, what matters is preserving the hyperbolic geometry (the eccentricity, inclination, periapsis) that will produce the desired velocity change. Arriving at periapsis a few minutes early or late barely matters, while arriving with the wrong orbit shape destroys the assist entirely.
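To connect this back to my state-representation question, here is a sketch of the two error quantities the sliding surface is built from. The helper is my own illustration of the inputs to such a controller, not Negri and Prado's code:

```python
import numpy as np

def keplerian_errors(r_vec, v_vec, mu, e_desired, h_desired):
    """Errors in the integrals of motion that a path-following controller
    drives to zero: eccentricity-vector error e_tilde and angular-momentum-
    magnitude error h_tilde. (Inputs to a sliding surface, not the full RKPFC.)"""
    r_vec = np.asarray(r_vec, dtype=float)
    v_vec = np.asarray(v_vec, dtype=float)
    h_vec = np.cross(r_vec, v_vec)                             # angular momentum
    e_vec = np.cross(v_vec, h_vec) / mu - r_vec / np.linalg.norm(r_vec)
    e_tilde = e_vec - np.asarray(e_desired, dtype=float)
    h_tilde = np.linalg.norm(h_vec) - h_desired
    return e_tilde, h_tilde

# On the desired orbit, both errors vanish (circular 7000 km orbit as a check)
MU_EARTH = 398600.4418  # km^3/s^2
r0, v0 = [7000.0, 0, 0], [0, np.sqrt(MU_EARTH / 7000.0), 0]
e_err, h_err = keplerian_errors(r0, v0, MU_EARTH,
                                e_desired=[0, 0, 0],
                                h_desired=float(np.linalg.norm(np.cross(r0, v0))))
```

These same errors are natural candidates for the RL agent's observation vector, since they measure exactly the geometry deviation a flyby needs corrected.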

The Monte Carlo validation makes the stakes concrete. Without control authority, Negri and Prado show that initial position errors of around 50 km (a realistic figure for autonomous guidance) produce wildly divergent outgoing trajectories. In the Titan simulations, the outgoing semi-major axis has a standard deviation of over 2,400 km, and some trajectories interact catastrophically with Titan's atmosphere. With the RKPFC active, the standard deviation drops to essentially zero and all trajectories converge to the desired geometry.

Titan from the Cassini spacecraft Enceladus from Voyager
Left: Titan, Saturn's largest moon and one of the test bodies in Negri & Prado's Monte Carlo simulations, here photographed by Cassini in approximately true color. Right: Enceladus, where the controlled gravity assist maintained the orbital geometry through an extreme flyby at 25 km periapsis altitude, lower than Cassini ever attempted.

Negri and Prado also simulated a Callisto-Ganymede-Europa-Io tour designed using a patched-conics model, then executed it in a more realistic N-body simulation of the Jovian system. Even with their robust flyby controller active, the tour fails: the encounter with Europa occurs early due to chaotic N-body interactions, and the trajectory diverges completely from the design thereafter. A flyby control strategy alone cannot guarantee a tour based on patched-conics guidance in a truly chaotic multi-body environment; the guidance layer deciding the overall trajectory between flybys needs to be high-fidelity and adaptive.

The Critical Gap This Paper Exposed

Negri and Prado show that robust flyby control is a solved problem for individual encounters. The open challenge is the guidance layer that connects encounters into a coherent tour under N-body dynamics. A trained RL policy could potentially learn to account for the chaotic character of the Jovian system directly from simulation experience, treating the chaos as part of the environment rather than a perturbation to be corrected.

06 The Patched-Conics Approximation and Its Limitations

Most gravity-assist trajectory design relies on the patched-conics approximation. In the zero-SOI version, the encounter with a planet is treated as an instantaneous impulsive maneuver at the planet's center. In the more refined "finite-SOI" patched-conics model that Negri and Prado describe, the spacecraft's trajectory inside the sphere of influence is modeled as a genuine two-body hyperbola, and a Lambert solver connects the arcs between successive encounters. The decision variables for the tour become the encounter times $t_j$ and periapsis radii $r_{p,j}$ at each flyby body, and the optimization minimizes the sum of all corrective delta-v's needed to connect the arcs:

# Patched-conics tour optimization objective (Negri & Prado, 2023, Eq. 17)
# $\Delta\mathbf{V}_j^-$: impulse before $j$-th flyby to achieve the desired approach
# $\Delta\mathbf{V}_j^+$: impulse after $j$-th flyby to continue on the designed transfer arc
# Decision variables: $t_j, r_{p,j}, e_j, i_j, \Omega_j, \omega_j$ for each encounter
$$\min \sum_j \left(\|\Delta\mathbf{V}_j^-\| + \|\Delta\mathbf{V}_j^+\|\right)$$

This framework is clean and computationally manageable, which is why it has been the workhorse of interplanetary mission design for decades. But Negri and Prado's Circular Restricted N-Body Problem simulations show that in the outer planetary systems, patched conics can be dramatically wrong. The Jovian moons move fast (Io completes a full orbit in under two days), so even the tens of minutes a spacecraft spends inside a moon's sphere of influence are enough for the moon to have moved significantly from where the patched-conics model assumed it was. The accumulated error across a multi-flyby tour makes the patched-conics solution essentially a warm-start guess rather than an executable flight plan.

This is where my thinking about the project is sharpening. For the inner solar system (Earth-Venus-Mars gravity assists), patched conics is a reasonable model and RL might simply need to learn to correct small deviations. For outer planet tours, which are the missions where gravity assists matter most, the RL agent will need to cope with fundamentally chaotic N-body dynamics. This could mean training in a high-fidelity simulation from the start, even if it is computationally expensive, rather than first training on a simple patched-conics model and expecting to transfer.

Design Question #2

Should I train the RL policy in a low-fidelity patched-conics environment first (faster training, cleaner reward signal) and then fine-tune in the high-fidelity Basilisk N-body environment, or is the dynamical gap so large that this transfer is harmful rather than helpful? Curriculum learning suggests the former might work, but the Negri & Prado results suggest the gap may be qualitative, not just quantitative.

07 Building the State Space from First Principles

The spacecraft's state at any instant lives in a seven-dimensional space: three position components, three velocity components, and one for remaining mass. Basilisk propagates this state forward using high-fidelity numerical integration, accounting for N-body gravitational forces, solar radiation pressure, and atmospheric drag where relevant. The RL agent observes this state and outputs a three-dimensional thrust vector.

But raw Cartesian state is not necessarily the most informative representation for the agent. Inspired by Negri and Prado's use of the eccentricity vector $\mathbf{e}$ and angular momentum vector $\mathbf{h}$ as the quantities to be controlled during a flyby, I'm considering augmenting the Cartesian state with derived orbital elements. An agent that can "see" how far its current orbit deviates from the desired flyby hyperbola can form much more targeted commands than one reasoning purely from position and velocity numbers.

There is a further complication: inside a planet's sphere of influence, the orbital elements of the flyby hyperbola are defined relative to that planet's gravitational field, not the Sun's. The RL agent will need to switch reference frames as it moves between gravitational regimes, from heliocentric two-body during the interplanetary cruise, to planetocentric two-body during the flyby, and back again. The patched-conics approximation handles this transition (somewhat crudely) by stitching orbits together at the sphere-of-influence boundary. Building a smooth and numerically stable version of this for an RL observation function is a concrete implementation challenge ahead of me.
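A crude version of that regime switch might look like the following. The Laplace sphere-of-influence formula is standard; the helper names and the hard-threshold logic are my own sketch of one possible observation function:

```python
import numpy as np

def soi_radius(a_planet, mass_ratio):
    """Laplace sphere-of-influence radius: r_SOI = a * (m_planet / m_sun)**(2/5)."""
    return a_planet * mass_ratio ** 0.4

def observation_frame(r_sc, v_sc, r_planet, v_planet, r_soi):
    """Choose the gravitational regime for the RL observation: planetocentric
    state inside the SOI, heliocentric state outside. (A hard switch; a smooth
    blend near the boundary is one of the open implementation questions.)"""
    rel_r = np.asarray(r_sc, float) - np.asarray(r_planet, float)
    rel_v = np.asarray(v_sc, float) - np.asarray(v_planet, float)
    if np.linalg.norm(rel_r) < r_soi:
        return "planet", rel_r, rel_v
    return "sun", np.asarray(r_sc, float), np.asarray(v_sc, float)

# Jupiter's SOI comes out near 48 million km (a = 7.785e8 km, m/M_sun = 9.546e-4)
r_soi_jupiter = soi_radius(7.785e8, 9.546e-4)
```

The hard `if` at the boundary is exactly the discontinuity I will need to smooth before feeding states to a neural network policy.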

Animation of Cassini trajectory
Cassini's interplanetary trajectory to Saturn, which used two Venus flybys, one Earth flyby, and one Jupiter flyby to achieve the required heliocentric velocity. Each arc between planets is a separate conic section, stitched together at the sphere of influence of each flyby body.

08 Looking Ahead

Next week I'll be finishing the reading phase and moving into environment implementation. The first concrete milestone is getting Basilisk installed and running simple test propagations, confirming that a spacecraft in a Keplerian orbit stays on that orbit, that a hyperbolic flyby gives the expected velocity change, and that the sphere-of-influence transitions behave consistently. These are sanity checks, but failing them early is far better than failing them in the middle of RL training.
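Those sanity checks don't have to wait for Basilisk: a throwaway two-body propagator is enough to rehearse them. This is a standalone RK4 sketch with step sizes I chose arbitrarily, not the Basilisk workflow itself:

```python
import numpy as np

MU_EARTH = 398600.4418  # km^3/s^2

def two_body_deriv(state, mu=MU_EARTH):
    """d/dt of [r, v] under point-mass gravity: r' = v, v' = -mu * r / |r|^3."""
    r, v = state[:3], state[3:]
    return np.concatenate([v, -mu * r / np.linalg.norm(r) ** 3])

def rk4_step(state, dt, mu=MU_EARTH):
    """One classical Runge-Kutta 4 step of the two-body dynamics."""
    k1 = two_body_deriv(state, mu)
    k2 = two_body_deriv(state + 0.5 * dt * k1, mu)
    k3 = two_body_deriv(state + 0.5 * dt * k2, mu)
    k4 = two_body_deriv(state + dt * k3, mu)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def specific_energy(state, mu=MU_EARTH):
    return 0.5 * np.linalg.norm(state[3:]) ** 2 - mu / np.linalg.norm(state[:3])

# Sanity check: a circular 7000 km orbit should conserve energy over one period
state = np.array([7000.0, 0, 0, 0, np.sqrt(MU_EARTH / 7000.0), 0])
e0 = specific_energy(state)
period = 2 * np.pi * np.sqrt(7000.0**3 / MU_EARTH)
for _ in range(1000):
    state = rk4_step(state, period / 1000)
energy_drift = abs(specific_energy(state) - e0)
radius_drift = abs(np.linalg.norm(state[:3]) - 7000.0)
```

The Basilisk versions of these checks will add N-body forces and SOI transitions, but the pass/fail criteria (energy conservation, orbit closure, expected flyby $\Delta v$) carry over directly.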

Two things from this week's reading are sitting unresolved in my mind. The first is the trade-off between simulation fidelity and training speed. Negri and Prado's results suggest that patched-conics guidance fails in the outer solar system because the dynamics are qualitatively different from the model. High-fidelity training may therefore not be optional for multi-gravity-assist missions: it may be the only setting in which the problem is even posed correctly. This will likely force me to be aggressive about computational optimization from the start, possibly using parallelized environment rollouts or cloud computing.

The second unresolved question is about the guidance layer. Negri and Prado's flyby controller handles the local problem of keeping the spacecraft on the right hyperbola during an encounter, but their results show that the global tour-planning problem requires something beyond pure control. Reinforcement learning, trained end-to-end on the full multi-flyby problem, could in principle learn to handle both layers simultaneously. That is an ambitious goal, but it is also precisely what makes this research interesting. I'll be designing the Markov decision process formulation with this dual-layer structure in mind.

One question for anyone reading this: do you know of any open-source implementations of multi-body tour optimization that use something other than patched-conics as the baseline? I'm aware of GMAT and MONTE from JPL, but both are either proprietary or complex to integrate. Any pointers to Python-based tools for generating high-fidelity gravity-assist reference trajectories would be enormously useful.