B.OB: Robots learning from a slow but careful teacher
2024 / Master's thesis, TU Dortmund × Bertrandt / ROS 2 + NMPC + CasADi + acados + PyTorch + Nav2 + Gazebo + Raspberry Pi

TLDR
Master's thesis at TU Dortmund × Bertrandt. Inherited a Pi-based office robot with no software. Built the stack from scratch in ROS 2, then tackled the headline problem: nonlinear model predictive control (NMPC) gave the path-following quality we wanted, but its solve time was too long to keep the robot supplied with fresh commands. Trained a small network offline to imitate NMPC, swapped it in on the host, and kept obstacle handling out of the controller via a separate replanner. The substitution did its job on compute: commands arrived faster than NMPC could produce them. Smoothness in practice was a different problem: the wireless link between host and Pi added jitter, and the dynamics model didn't include the trolley wheel.
The challenge
B.OB is a small office robot for delivering letters, coffee, and paper around the building. Two requirements: it has to know where it is, and it has to follow a given path.
Path-following at the quality we wanted meant NMPC — a controller that plans several seconds ahead and re-solves an optimisation each cycle to keep the robot on the line. The catch is that those solves are expensive. By the time one finishes, the situation has moved on, and the command B.OB receives is already stale.
B.OB itself runs on a Raspberry Pi handling sensors, motor control, and the radio link. The rest of the stack — perception, planning, controller — sits on a host computer over the network. The question was whether NMPC could produce fresh commands fast enough on the host to keep the robot supplied at the rate it needs. With obstacle handling and the rest of the stack on top, it couldn't.
The approach was distillation: train a small neural network offline to imitate NMPC, then swap it in for NMPC on the host. The network produces near-identical commands much faster than NMPC could. The topic was one I'd been wanting to work on, which made the thesis a good fit.
The problems we hit
We inherited a robot with no software at all. Before any of the thesis work could happen, B.OB had to actually drive. First by remote control, then with the basic plumbing in place for sensor reads, SLAM, localisation, and motor control.
Then the harder ones:
The lidar sat too low. B.OB has a tall e-stop pole on top (reachable from standing height). When the lidar mapped the office, it saw four table legs with an open gap between them, so the map said "passable." The pole said otherwise, and the planner happily routed B.OB straight at a tabletop.
NMPC needed re-tuning for every obstacle size. Obstacles can be encoded as constraints inside the controller, and we did. But the tuning didn't generalise — a controller tuned around one obstacle size didn't behave consistently around another.
The robot needed to set its own pace. A lot of robotics control assumes a trajectory: be at point P at exactly time T. That breaks the moment B.OB has to pause for a person walking past. Office robots care about staying along the path, not staying on a clock.
The team was split between cities. We couldn't always be in the same room as the robot, and burning batteries to test small software changes was a bad use of time.
How we solved them
For the table-legs problem, I painted no-go zones over every table footprint and gave them to Nav2's keepout filter. Mapping ran on slam_toolbox; once the map was built I imported it into Gazebo via map2gazebo, so the simulated B.OB and the real B.OB drove through the same building.
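As a rough illustration of the keepout idea, the sketch below paints rectangular table footprints into a mask grid and serialises it as a plain-text PGM image. The footprint coordinates, grid size, grey levels, and helper names are made up for the example, and it assumes the usual map convention that dark pixels are read as occupied. It shows the idea only and is not the tooling used in the thesis.

```python
FREE, KEEPOUT = 254, 0  # assumed grey levels: light = free, dark = keep-out


def paint_keepout(width, height, footprints):
    """Return a row-major grid with each (x0, y0, x1, y1) cell rectangle marked keep-out."""
    grid = [[FREE] * width for _ in range(height)]
    for x0, y0, x1, y1 in footprints:
        # Clip each rectangle to the grid so footprints near the edge are safe.
        for y in range(max(0, y0), min(height, y1 + 1)):
            for x in range(max(0, x0), min(width, x1 + 1)):
                grid[y][x] = KEEPOUT
    return grid


def to_pgm(grid):
    """Serialise the grid as an ASCII (P2) PGM image."""
    header = f"P2\n{len(grid[0])} {len(grid)}\n255\n"
    rows = "\n".join(" ".join(str(v) for v in row) for row in grid)
    return header + rows + "\n"


tables = [(2, 1, 4, 2)]  # hypothetical table footprint in cell coordinates
mask = paint_keepout(8, 4, tables)
pgm_text = to_pgm(mask)
```

The point of covering the whole footprint, not just the legs, is that the planner then treats the area under the tabletop as closed even though the lidar sees it as open.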

For the pacing problem, I rewrote the controller around path-following instead of trajectory-following. The robot tracks how far along the path it is, and the controller inches that "how far" value forward at whatever pace makes sense. Slowing down for someone walking past no longer breaks anything.
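A minimal sketch of that "how far" value, assuming a single straight segment and a pace that is simply handed in each cycle (in the real controller the pace would come out of the optimisation). The function names and the pace profile are hypothetical.

```python
import math


def advance_progress(s, pace, dt, path_length):
    """Move the path parameter s forward by pace * dt, clamped to the path."""
    return min(path_length, max(0.0, s + max(0.0, pace) * dt))


def reference_point(s, start, end):
    """Point at arc length s along a straight segment from start to end."""
    length = math.dist(start, end)
    t = 0.0 if length == 0.0 else min(1.0, s / length)
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))


start, end = (0.0, 0.0), (4.0, 0.0)
dt = 0.05  # one control period at 20 Hz
s = 0.0

# Hypothetical pace profile: drive, wait for a passer-by, drive again.
for pace in [0.5] * 20 + [0.0] * 20 + [0.5] * 20:
    s = advance_progress(s, pace, dt, math.dist(start, end))

target = reference_point(s, start, end)
```

During the pause the pace is zero, so the reference point simply waits on the path. A time-indexed trajectory would have kept moving and left the robot with an error to catch up on.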
For the teacher-student part, I followed a 2023 paper by Zometa & Faulwasser (arXiv:2301.11909) on distilling MPC-style controllers into small networks for embedded hardware. The non-obvious detail: the network has to be told how far along the path the robot is. Without that, the same physical position can correspond to different correct actions, and the network learns an inconsistent mapping. With it, training is straightforward — let NMPC drive in simulation, log every state and the action it picked, train a small PyTorch network to imitate it.
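The ambiguity can be shown with a toy teacher log. The sketch below assumes a heavily simplified state (position only) and a made-up route that crosses itself, and counts how many network inputs end up labelled with more than one action. All values and names are illustrative.

```python
from collections import defaultdict


def conflicting_inputs(samples, use_progress):
    """Count network inputs that appear with more than one distinct action label."""
    actions_by_input = defaultdict(set)
    for x, y, s, action in samples:
        key = (x, y, s) if use_progress else (x, y)
        actions_by_input[key].add(action)
    return sum(1 for actions in actions_by_input.values() if len(actions) > 1)


# Hypothetical teacher log: (x, y, path progress s, commanded turn rate).
# The route passes through (1.0, 1.0) twice: first going straight, later turning.
log = [
    (0.0, 1.0, 0.0, 0.0),
    (1.0, 1.0, 1.0, 0.0),
    (2.0, 1.0, 2.0, 0.6),
    (1.0, 1.0, 5.0, 0.6),
    (1.0, 2.0, 6.0, 0.0),
]

without_s = conflicting_inputs(log, use_progress=False)
with_s = conflicting_inputs(log, use_progress=True)
```

Without the progress value, one input carries two contradictory labels, and a regression network would average them. With it, every input maps to exactly one action.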
One catch with the 2023 paper: it trained the network on straight-line and parabolic segments only, not arbitrary path shapes. To use the distilled controller on real office routes, I built a path planner that fits a generic path with a sequence of straight lines and parabolas, so the controller only ever sees one segment type at a time — exactly what it was trained on.
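One possible way to do such a decomposition, sketched under simplifying assumptions: waypoints are taken three at a time, collinear triples become a line, and everything else becomes a parametric quadratic through the three points (which is a parabola). This is not the planner from the thesis. It also ignores heading continuity at the joints, which a usable planner would have to handle.

```python
def fit_segment(p0, p1, p2, tol=1e-9):
    """Fit p(t) = a*t^2 + b*t + p0 through p0 (t=0), p1 (t=0.5), p2 (t=1).

    Returns (kind, evaluator), where kind is "line" or "parabola".
    """
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    if abs(cross) <= tol:
        # Collinear points: a straight segment from p0 to p2.
        return "line", lambda t: tuple(p0[i] + t * (p2[i] - p0[i]) for i in range(2))
    a = tuple(2 * p0[i] - 4 * p1[i] + 2 * p2[i] for i in range(2))
    b = tuple(-3 * p0[i] + 4 * p1[i] - p2[i] for i in range(2))
    return "parabola", lambda t: tuple(a[i] * t * t + b[i] * t + p0[i] for i in range(2))


def decompose(waypoints):
    """Split waypoints into consecutive three-point segments sharing endpoints."""
    return [fit_segment(*waypoints[i:i + 3]) for i in range(0, len(waypoints) - 2, 2)]


# Hypothetical corridor route: a straight run, then a corner.
route = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.5), (3.0, 2.0)]
segments = decompose(route)
kinds = [kind for kind, _ in segments]
corner_mid = segments[1][1](0.5)
```

Each entry in segments is a single primitive, so the controller can be handed one segment at a time and never sees a shape outside its training set.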
For obstacles, instead of putting them in the controller and re-tuning per obstacle size, I built a separate detour engine. It watches the path ahead, and if something blocks the route, it asks the planner for a fresh path and hands it over to the controller. The controller never even knows there was an obstacle. Clean separation: the controller drives, the detour engine handles the world.
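The separation can be sketched as a function that sits between the planner and the controller. The grid-cell path, the lookahead window, and the stand-in planner below are all hypothetical. In the real stack the replan call would go to whatever planner is in use.

```python
def first_blocked_index(path, blocked_cells, start_index, lookahead):
    """Index of the first blocked waypoint within the lookahead window, or None."""
    for i in range(start_index, min(len(path), start_index + lookahead)):
        if path[i] in blocked_cells:
            return i
    return None


def detour_step(path, robot_index, blocked_cells, replan, lookahead=5):
    """Return the path the controller should follow this cycle."""
    if first_blocked_index(path, blocked_cells, robot_index, lookahead) is None:
        return path  # nothing ahead: the controller keeps its current path
    return replan(path[robot_index], path[-1], blocked_cells)


def toy_replan(start, goal, blocked_cells):
    """Stand-in planner: sidestep one row up, run along it, step back down."""
    (x0, y0), (x1, _) = start, goal
    detour = [(x, y0 + 1) for x in range(x0, x1 + 1)]
    return [start] + detour + [goal]


path = [(x, 0) for x in range(6)]
blocked = {(3, 0)}  # something is standing on the route
new_path = detour_step(path, 0, blocked, toy_replan)
```

The controller's interface is just "here is a path". Whether that path is the original or a detour is invisible to it, which is why no obstacle-specific tuning is needed.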
For the remote-team problem, the simulator paid off twice. A teammate in another city could keep contributing without the physical robot, and I could iterate on path-following without rebooting the Pi every five minutes.
Underneath all of it, the controller had an 18-second prediction horizon and ran at 20 Hz. It was written in CasADi and solved by acados. Localisation fused wheel encoders, IMU, and lidar: robot_localization ran the EKF over the wheel-encoder and IMU data, and AMCL matched lidar scans against the map.
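Those two numbers fix the real-time budget. A quick back-of-the-envelope calculation, assuming (this is not stated above) that the horizon is discretised at the control period:

```python
horizon_s = 18.0        # prediction horizon from the text
control_rate_hz = 20.0  # control rate from the text

# Time available for one solve before the next command is due.
cycle_budget_s = 1.0 / control_rate_hz

# Assumption: one discretisation step per control period.
horizon_steps = round(horizon_s * control_rate_hz)
```

Under that assumption each solve covers 360 steps and has to finish within 50 ms. Any solve that takes longer delivers a command computed for a state the robot has already left.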

The result
In simulation, the network reproduced NMPC's driving behaviour at a fraction of the compute. On the real B.OB in the Bertrandt office, NMPC delivered better path-following accuracy when measured in isolation, but its solve time meant the control loop couldn't keep up once the full system was running. The student network plus the detour engine worked end-to-end: B.OB stayed on the path, avoided obstacles, and commands arrived at the rate the robot needed. Driving in practice still wasn't fully smooth — the wireless link between host and Pi added jitter, and the dynamics model both NMPC and the network were built around didn't include the trolley wheel.

The trade-off was the usual one between classical control and learned approximators. NMPC came with built-in guarantees: hard input and state constraints (velocity and angular-velocity bounds in B.OB's case), guaranteed feasibility within those bounds, and predictable behaviour for situations it was designed for. The NN imitating it gave speed but didn't inherit those guarantees. In principle it can output a command outside the original constraint set, and its behaviour in situations far from the training distribution is harder to reason about than the optimiser's. For an office robot driving routes inside a known map, that was an acceptable trade. For a system where safety has to be provable, it wouldn't be.
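The gap in the input bounds can be made concrete. The optimiser satisfies the check below by construction, while a network only satisfies it if it happens to. One common patch, which this write-up does not claim was used on B.OB, is to clip the network output before it reaches the motors. The limits are placeholder values.

```python
def within_bounds(v, omega, v_max, omega_max):
    """True if a (linear, angular) velocity command respects symmetric limits."""
    return abs(v) <= v_max and abs(omega) <= omega_max


def saturate(v, omega, v_max, omega_max):
    """Clip a (linear, angular) velocity command to symmetric limits."""
    def clip(value, limit):
        return max(-limit, min(limit, value))
    return clip(v, v_max), clip(omega, omega_max)


V_MAX, OMEGA_MAX = 0.5, 1.0  # placeholder limits, not B.OB's real ones

raw = (1.2, -3.0)  # hypothetical out-of-range network output
safe = saturate(*raw, V_MAX, OMEGA_MAX)
```

Clipping only restores the input bounds. It does nothing for state constraints or feasibility, so it narrows the gap to NMPC without closing it.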
The takeaway was the framing more than any single component. Teacher-student distillation suits problems where a capable controller is too expensive to run live and where giving up the optimiser's formal guarantees is an acceptable price for the speedup. The bigger lesson was about boundaries: don't make one system do everything. Let the controller drive, let the planner plan, let the detour engine detour. Easier to reason about, easier to swap parts.
The path-following framing turned out to be the most reusable piece. Any differential-drive robot with a reference path and a tight compute budget can borrow this setup.
Technologies and patterns
NMPC · CasADi · acados · PyTorch · imitation learning · distillation · custom path planner (line/parabola decomposition) · ROS 2 · Nav2 (keepout filter, global planner) · slam_toolbox · AMCL · robot_localization (EKF) · Gazebo · map2gazebo · Raspberry Pi · lidar · IMU