Kernel Scheduler Test and Evaluation Platform

Deterministic tests for bugs hiding inside the Linux CPU scheduler.

kSTEP gives researchers and kernel developers precise control over scheduler-invoking events, together with isolated CPUs, mocked time, and repeatable traces, so that subtle scheduler behavior can be triggered, observed, and compared without benchmark noise.

232

scheduler bug-fix commits studied since 2020

73%

silent functional faults or policy misalignments

7 + 4

real-world bugs reproduced and new bugs uncovered

0 noise

stable traces for side-by-side kernel comparisons

Paper

kSTEP: Characterization and Deterministic Testing of Linux CPU Scheduler Bugs

Tingjia Cao, Shawn (Wanxiang) Zhong, Caeden Whitaker, Ke Han, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau.

Bug study

Scheduler bugs matter, but they are hard to see.

The study separates functionality bugs from policy bugs. Policy bugs are failures where Linux keeps running but scheduler behavior quietly drifts from intended goals such as balance, fairness, locality, and energy efficiency.

15%

Fatal failures

Crashes, oopses, hard lockups, or hangs are visible, but they are only part of the story.

73%

No warnings

Most bugs leave no useful panic, warning, or default trace of the triggering event sequence.

75%

Semantic root causes

Wrong state updates and logic faults dominate, demanding scheduler-specific understanding.

45%

Hidden for years

Many fixes land only after long exposure in mainline, with some bugs surviving over a decade.

54%

Attribute-sensitive

Task, cgroup, and system-level scheduler attributes form a large input space for testing.

28%

Beyond userspace

Kernel events and special hardware topologies are often required to trigger real bugs.

Platform

Controlled execution without inventing fake scheduler states.

kSTEP runs driver programs in kernel space and dispatches normal scheduler-invoking events into an isolated environment. The scheduler remains unmodified; tests guide it into target states through realistic code paths.

Figure: Architecture diagram of kSTEP

01

Precise timing

Explicit ticks advance logical time and place events inside narrow transient windows.

02

Rich control

Drivers can orchestrate tasks, cgroups, kernel activities, CPU frequency, and topology.

03

Deterministic traces

Mocked clocks, isolated CPUs, and reset state make the same input produce the same trace.

04

High fidelity

kSTEP avoids direct scheduler-state mutation, preserving behavior reachable on real systems.
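
The ideas of explicit ticks and deterministic traces can be illustrated with a small userspace toy. This is only a sketch of the concept, not kSTEP code: the `ToySim` class, its `at` and `tick` methods, the round-robin policy, and the event names are all hypothetical and chosen for illustration. Time advances only when the driver calls `tick()`, events fire at exact logical ticks, and every run starts from fresh state, so the same event sequence always yields the same trace.

```python
from dataclasses import dataclass, field


@dataclass
class ToySim:
    """Toy stand-in for a tick-driven test harness (not the kSTEP API)."""
    now: int = 0                                  # logical time in ticks
    runnable: list = field(default_factory=list)  # FIFO run queue of task names
    pending: dict = field(default_factory=dict)   # tick -> list of (event, task)
    trace: list = field(default_factory=list)     # recorded (tick, event, task)

    def at(self, tick, event, task):
        """Schedule an event to fire at an exact logical tick."""
        self.pending.setdefault(tick, []).append((event, task))

    def tick(self):
        """Advance logical time by one tick; no wall clock is consulted."""
        self.now += 1
        for event, task in self.pending.pop(self.now, []):
            if event == "wakeup":
                self.runnable.append(task)
            elif event == "sleep" and task in self.runnable:
                self.runnable.remove(task)
            self.trace.append((self.now, event, task))
        if self.runnable:
            # Round-robin: run the head task for this tick, then rotate it.
            current = self.runnable.pop(0)
            self.trace.append((self.now, "run", current))
            self.runnable.append(current)


def run_driver():
    """Hypothetical driver: a fixed event sequence replayed from reset state."""
    sim = ToySim()
    sim.at(1, "wakeup", "A")
    sim.at(2, "wakeup", "B")
    sim.at(4, "sleep", "A")  # lands in a chosen one-tick window
    for _ in range(5):
        sim.tick()
    return sim.trace
```

Calling `run_driver()` twice returns identical traces because nothing in the toy depends on wall-clock time or leftover state. The real platform achieves this inside the kernel with mocked clocks and isolated CPUs rather than a simulated run queue.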

Evaluation

Short drivers expose bugs that benchmarks struggle to reproduce.

Case-study reproducers use compact event sequences to trigger scheduler failures, then compare buggy and fixed kernels with deterministic traces.

Sync wakeup locality

A wakee selects a remote CPU despite the waker CPU becoming idle, causing an initial slowdown.

Cgroup vruntime overflow

A transient hierarchy state can inflate vruntime and starve a task almost indefinitely.

Load-balancing waste

Topology-sensitive bugs make CPUs rebalance redundantly or stay idle with runnable work.

Utilization misaccounting

Bad accounting after sleep can delay CPU frequency ramp-up for latency-sensitive tasks.
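
To give a feel for why an inflated vruntime translates into starvation, here is a minimal sketch under strong simplifying assumptions: two equal-weight tasks, a policy that always runs the task with the smaller vruntime, and one unit of vruntime accrued per tick. The function name and the inflation amounts are hypothetical, and the sketch does not model the cgroup hierarchy state that triggers the actual bug.

```python
def ticks_until_scheduled(victim_vruntime, other_vruntime, limit):
    """Count ticks before the victim is picked by a min-vruntime policy.

    Each tick, the task with the smaller vruntime runs and its vruntime
    grows by one unit (equal weights assumed). Returns None if the victim
    is still waiting after `limit` ticks.
    """
    for waited in range(limit):
        if victim_vruntime <= other_vruntime:
            return waited
        other_vruntime += 1  # the other task runs and accrues vruntime
    return None


# A task whose vruntime matches its peer runs right away.
fair_wait = ticks_until_scheduled(100, 100, limit=10_000)

# A task whose vruntime was inflated by 5000 units waits 5000 ticks.
inflated_wait = ticks_until_scheduled(100 + 5000, 100, limit=10_000)
```

In this toy the wait grows linearly with the size of the inflation, so an inflation on the order of an overflowed 64-bit value would keep the task off the CPU for an impractically long time.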

Reproduced bugs

Buggy-vs-fixed traces from kSTEP drivers.

Each reproducer plot links to the rendered trace. The card links point to the driver source and the raw buggy and fixed kSTEP outputs used to generate the figure.
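
Because traces are deterministic, a buggy and a fixed kernel can be compared record by record, and the first mismatch points at the behavior the fix changed. The sketch below assumes a hypothetical trace format of `(tick, event, task, cpu)` tuples. The records are made up for illustration, loosely following the sync wakeup case, and are not real kSTEP output.

```python
from itertools import zip_longest


def first_divergence(buggy, fixed):
    """Return (index, buggy_record, fixed_record) at the first mismatch, or None."""
    for i, (b, f) in enumerate(zip_longest(buggy, fixed)):
        if b != f:
            return i, b, f
    return None


# Illustrative records only (tick, event, task, cpu); not real kSTEP output.
buggy_trace = [
    (1, "wakeup", "waker", 0),
    (2, "wakeup", "wakee", 1),  # wakee placed on a remote CPU
    (3, "run", "wakee", 1),
]
fixed_trace = [
    (1, "wakeup", "waker", 0),
    (2, "wakeup", "wakee", 0),  # wakee placed on the waker's CPU
    (3, "run", "wakee", 0),
]

divergence = first_divergence(buggy_trace, fixed_trace)
```

With noisy traces, a comparison like this would flag unrelated timing jitter; it is only meaningful when identical inputs are guaranteed to produce identical records.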

Resources

Build on deterministic scheduler execution.

kSTEP can support reproducible debugging, regression tests, policy evaluation, anomaly detection, fuzzing, and agent-driven scheduler test generation.