kSTEP: Deterministic Linux Scheduler Testing

Bug study

Scheduler bugs matter, but they are hard to see.

The study separates functionality bugs from policy bugs. Policy bugs are failures where Linux keeps running but scheduler behavior quietly drifts from intended goals such as balance, fairness, locality, and energy efficiency.

15%

Fatal failures

Crashes, oopses, hard lockups, or hangs are visible, but they are only part of the story.

73%

No warnings

Most bugs leave no useful panic, warning, or default trace of the triggering event sequence.

75%

Semantic root causes

Wrong state updates and logic faults dominate, demanding scheduler-specific understanding.

45%

Hidden for years

Many fixes land only after long exposure in mainline, with some bugs surviving over a decade.

54%

Attribute-sensitive

Task, cgroup, and system-level scheduler attributes form a large input space for testing.

28%

Beyond userspace

Kernel events and special hardware topologies are often required to trigger real bugs.

Platform

Controlled execution without inventing fake scheduler states.

kSTEP runs driver programs in kernel space and dispatches normal scheduler-invoking events into an isolated environment. The scheduler remains unmodified; tests guide it into target states through realistic code paths.

01

Precise timing

Explicit ticks advance logical time and place events inside narrow transient windows.

02

Rich control

Drivers can orchestrate tasks, cgroups, kernel activities, CPU frequency, and topology.

03

Deterministic traces

Mocked clocks, isolated CPUs, and reset state make the same input produce the same trace.

04

High fidelity

kSTEP avoids direct scheduler-state mutation, preserving behavior reachable on real systems.
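
The timing and determinism properties above can be illustrated with a small userspace toy. This is only a sketch under stated assumptions: every name in it (`toy_sim`, `toy_tick`, `toy_dispatch`, `toy_driver`) is hypothetical and is not the kSTEP API, and the "scheduler" is just a runnable-task counter. It shows the general idea that when logical time advances only on explicit ticks and state is reset before each run, the same driver yields the same trace.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout: a userspace toy, not the kSTEP API. */
#define TOY_TRACE_MAX 64

enum toy_event { TOY_WAKE, TOY_SLEEP };

struct toy_sim {
    uint64_t now_ticks;             /* logical time; only toy_tick() advances it */
    int runnable;                   /* number of runnable tasks */
    char trace[TOY_TRACE_MAX][48];  /* one text record per step */
    int trace_len;
};

/* Reset all state so that every run starts from the same point. */
static void toy_reset(struct toy_sim *s)
{
    memset(s, 0, sizeof(*s));
}

/* Append one record: logical time, what happened, resulting state. */
static void toy_record(struct toy_sim *s, const char *what)
{
    assert(s->trace_len < TOY_TRACE_MAX);
    snprintf(s->trace[s->trace_len], sizeof(s->trace[0]), "%llu %s run=%d",
             (unsigned long long)s->now_ticks, what, s->runnable);
    s->trace_len++;
}

/* Advance logical time by n explicit ticks; no wall clock is consulted. */
static void toy_tick(struct toy_sim *s, unsigned n)
{
    while (n--) {
        s->now_ticks++;
        toy_record(s, "tick");
    }
}

/* Dispatch an event at the current logical time. */
static void toy_dispatch(struct toy_sim *s, enum toy_event ev)
{
    if (ev == TOY_WAKE)
        s->runnable++;
    else if (s->runnable > 0)
        s->runnable--;
    toy_record(s, ev == TOY_WAKE ? "wake" : "sleep");
}

/* Example driver: wake two tasks one tick apart, then put one to sleep. */
static void toy_driver(struct toy_sim *s)
{
    toy_reset(s);
    toy_dispatch(s, TOY_WAKE);
    toy_tick(s, 1);
    toy_dispatch(s, TOY_WAKE);
    toy_tick(s, 2);
    toy_dispatch(s, TOY_SLEEP);
}

/* Two runs are deterministic if their traces match record for record. */
static int toy_traces_equal(const struct toy_sim *a, const struct toy_sim *b)
{
    if (a->trace_len != b->trace_len)
        return 0;
    for (int i = 0; i < a->trace_len; i++)
        if (strcmp(a->trace[i], b->trace[i]) != 0)
            return 0;
    return 1;
}
```

Calling `toy_driver()` on two separate `struct toy_sim` instances and passing both to `toy_traces_equal()` returns 1. Placing the second wake exactly one tick after the first is the toy analogue of landing an event inside a narrow transient window.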

Evaluation

Short drivers expose bugs that benchmarks struggle to reproduce.

Case-study reproducers use compact event sequences to trigger scheduler failures, then compare buggy and fixed kernels with deterministic traces.

Sync wakeup locality

A wakee selects a remote CPU despite the waker CPU becoming idle, causing an initial slowdown.

Cgroup vruntime overflow

A transient hierarchy state can inflate vruntime and starve a task almost indefinitely.

Load-balancing waste

Topology-sensitive bugs make CPUs rebalance redundantly or stay idle with runnable work.

Utilization misaccounting

Bad accounting after sleep can delay CPU frequency ramp-up for latency-sensitive tasks.
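
To make the vruntime case above more concrete, the sketch below illustrates the general arithmetic hazard, not the specific bug that `vruntime_overflow.c` reproduces. Fair schedulers commonly order 64-bit virtual runtimes by the sign of their difference, which tolerates counter wraparound only while the two values are less than 2^63 apart. The helper names are hypothetical, and the catch-up estimate assumes peers gain one nanosecond of vruntime per nanosecond of wall time, which is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from kSTEP or the kernel. */

/* Wraparound-tolerant "a runs before b" test on 64-bit virtual runtimes. */
static int toy_vruntime_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}

/* Whole years until peers catch up with an inflated vruntime, assuming
 * (for illustration) peers gain 1 ns of vruntime per ns of wall time. */
static uint64_t toy_catchup_years(uint64_t gap_ns)
{
    const uint64_t ns_per_year = 365ULL * 24 * 3600 * 1000000000ULL;
    return gap_ns / ns_per_year;
}

static void toy_vruntime_demo(void)
{
    /* An ordinary pair and a counter that wrapped past zero both order correctly. */
    assert(toy_vruntime_before(100, 200));
    assert(toy_vruntime_before(UINT64_MAX - 5, 10));

    /* Inflated by 2^62 ns: ordered after its peer for roughly 146 years. */
    uint64_t inflated = 100 + (1ULL << 62);
    assert(!toy_vruntime_before(inflated, 100));
    assert(toy_catchup_years(inflated - 100) == 146);

    /* More than 2^63 apart: the signed difference wraps and the order inverts. */
    uint64_t too_far = 100 + (1ULL << 63) + 1;
    assert(toy_vruntime_before(too_far, 100));
}
```

The middle case shows why an inflated value reads as starvation "almost indefinitely": the task is never picked ahead of its peers until they accumulate the whole gap.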

Study results

Figures from the scheduler bug study.

These figures summarize the paper's characterization results: where scheduler bugs come from, how visible they are, and how fixes are validated. Together they show why scheduler bugs need controlled, deterministic tests rather than benchmark-only validation.

Root causes and consequences

Semantic state and logic faults dominate, while many consequences remain hard to observe externally.

Bug observability

Most scheduler bugs do not leave clear warnings or logs, especially policy misalignments.

Functionality vs. policy

The study separates strict correctness failures from policy compromises such as fairness and balance.

How scheduler bugs are discovered and tested

Discovery and validation

Current practice often relies on review or workload symptoms instead of concise regression tests.

Reproduced bugs

Buggy-vs-fixed traces from kSTEP drivers.

Each reproducer plot links to the rendered trace. The card links point to the driver source and the raw buggy and fixed kSTEP outputs used to generate the figure.

Sync wakeup buggy and fixed scheduler trace

sync_wakeup.c Sync wakeup locality

Driver Buggy output Fixed output

Vruntime overflow buggy and fixed scheduler trace

vruntime_overflow.c Cgroup vruntime overflow

Driver Buggy output Fixed output

Freeze reproducer buggy and fixed scheduler trace

freeze.c Delayed-dequeue freezer

Driver Buggy output Fixed output

Extra balance buggy and fixed scheduler trace

extra_balance.c Extra load balancing

Driver Buggy output Fixed output

Utilization average buggy and fixed scheduler trace

util_avg.c RT util_avg drop

Driver Buggy output Fixed output

Long balance buggy and fixed scheduler trace

long_balance.c Long balance scan

Driver Buggy output Fixed output

Lag vruntime buggy and fixed scheduler trace

lag_vruntime.c Lag vruntime reweight

Driver Buggy output Fixed output

Even idle CPU buggy and fixed scheduler trace

even_idle_cpu.c Even idle CPU

Driver Buggy output Fixed output

Local group imbalance buggy and fixed scheduler trace

local_group_imbalance.c Local group imbalance

Driver Buggy output Fixed output

Utilization average jump buggy and fixed scheduler trace

util_avg_jump.c Util avg jump

Driver Buggy output Fixed output

RT runtime toggle buggy and fixed scheduler trace

rt_runtime_toggle.c RT runtime toggle

Driver Buggy output Fixed output

uclamp inversion buggy and fixed scheduler trace

uclamp_inversion.c uclamp inversion

Driver Buggy output Fixed output

h_nr_runnable buggy and fixed scheduler trace

h_nr_runnable.c h_nr_runnable accounting

Driver Buggy output Fixed output
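
Because the buggy and fixed outputs are deterministic, a pair like those linked above can in principle be compared record by record to find where behavior first diverges. The sketch below assumes each output has already been split into one text record per line. The record format in the usage note and the function name `toy_first_divergence` are hypothetical, not the actual kSTEP output format or tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the first record where two traces
 * differ, or -1 if they are identical. If one trace is a strict prefix of the
 * other, the divergence is the index just past the shorter trace. */
static long toy_first_divergence(const char *const *buggy, size_t n_buggy,
                                 const char *const *fixed, size_t n_fixed)
{
    size_t n = n_buggy < n_fixed ? n_buggy : n_fixed;

    assert(buggy != NULL && fixed != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(buggy[i], fixed[i]) != 0)
            return (long)i;
    return n_buggy == n_fixed ? -1 : (long)n;
}
```

For example, given made-up records `{"t=0 wake task=1", "t=1 pick cpu=1"}` for the buggy kernel and `{"t=0 wake task=1", "t=1 pick cpu=0"}` for the fixed one, the function returns 1, pointing at the first decision that differs.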

Resources

Build on deterministic scheduler execution.

kSTEP can support reproducible debugging, regression tests, policy evaluation, anomaly detection, fuzzing, and agent-driven scheduler test generation.

kSTEP code: Framework, drivers, and automation scripts

Bug study: Dataset and analysis artifacts

Paper PDF: Full OSDI '26 paper

Deterministic tests for bugs hiding inside the Linux CPU scheduler.

kSTEP: Characterization and Deterministic Testing of Linux CPU Scheduler Bugs
