Johns Hopkins researchers have developed a new mathematical method that could help businesses, financial planners, and engineers make smarter high-stakes decisions by teaching computers to think more like real people, acting at the right moment instead of adjusting constantly.
The research, led by Haoyang Cao, assistant professor in the Whiting School of Engineering’s Department of Applied Mathematics and Statistics and member of the Data Science and AI Institute, focuses on “impulse control” decision making, a framework that reflects how organizations actually operate. Rather than making tiny, continuous changes, companies typically wait until conditions reach a tipping point and then act decisively, like an Amazon warehouse suddenly placing a large restocking order or an investor moving funds in one major trade.
The team's results appeared on the preprint server arXiv.
“This type of model is much closer to reality,” Cao said. “In practice, people observe a situation over time, and only when something important happens do they make a move. It’s sudden, not gradual.”
That realism matters. Poorly timed decisions can lead to empty shelves, wasted inventory, unnecessary transaction fees, or lost profits. Yet translating those jump-style choices into mathematics has long challenged researchers. Classical methods assume smooth, continuous behavior, an assumption that breaks down when decisions arrive as abrupt shocks.
To bridge that gap, the team turned to machine learning, particularly reinforcement learning, where an algorithm learns through exploration and feedback. Their goal was to design a system that could experiment with different decision timings, learn from the outcomes, and gradually discover near-optimal strategies.
But impulse problems introduced a stubborn obstacle. Costs and rewards accumulate over time, yet the very moments when actions occur are unpredictable. “If the timing itself is random, then the usual way of accumulating cost over time no longer works,” Cao explained. “We had to rethink the mathematical representation from the ground up.”
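For readers who want to see the shape of the problem, one common textbook way to write an impulse control objective is shown here. It is a general sketch, not necessarily the formulation used in the team's paper, and every symbol in it is an assumption introduced for illustration: a state $X_t$ (say, an inventory level), a running cost $f$, a discount rate $r$, random intervention times $\tau_i$, jump sizes $\xi_i$, a fixed cost $K$ per intervention, and a per-unit cost $c$.

```latex
J\bigl(x;\{(\tau_i,\xi_i)\}_{i\ge 1}\bigr)
  = \mathbb{E}\!\left[
      \int_0^{\infty} e^{-rt}\, f(X_t)\,dt
      \;+\; \sum_{i\ge 1} e^{-r\tau_i}\,\bigl(K + c\,|\xi_i|\bigr)
    \right],
\qquad
X_{\tau_i} = X_{\tau_i^-} + \xi_i,\quad X_0 = x .
```

The integral accumulates cost continuously, while the sum adds a lump cost at each intervention time. Because the times $\tau_i$ are themselves random and chosen by the decision maker, the two pieces cannot be handled by the standard continuous-time accounting alone.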
The researchers developed an operator-based framework that tracks how a system “renews” after each jump, for example, how inventory levels reset after a major order. That perspective allowed the learning algorithm to handle discontinuities that normally destabilize numerical methods.
“Exploration actually helps smooth out those sharp kinks,” Cao said. “It increases the regularity of the problem just enough that reinforcement learning becomes feasible.”
The implications are immediate for industries that depend on well-timed interventions. Retailers could better determine when to reorder goods and by how much. Energy providers could schedule costly equipment maintenance more efficiently. Financial platforms could help users avoid excessive trading fees by recommending fewer, better-timed moves.
Co-author Zhouhao Yang, PhD student in applied mathematics and statistics, said the breakthrough required blending perspectives. “We couldn’t rely only on traditional partial differential equation (PDE) analysis,” he said. “Combining control theory with learning algorithms gave us a new path to solve a problem that used to be too rigid.”
When businesses rely on simple rules to decide when to restock, they often end up ordering just enough inventory to stabilize a decline rather than truly optimizing supply. The team’s method aims to pinpoint what “too low” and “healthy level” should actually mean in the real world, where customer demand, prices, and shipping costs are constantly changing.
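To make the idea of a “too low” trigger and a “healthy level” target concrete, here is a minimal Python sketch of a two-threshold restocking rule. It is not the team's reinforcement learning method: it uses a toy simulator and a brute-force grid search, and the function names, cost figures, and demand model (uniform daily demand between 0 and 10 units) are all assumptions made up for illustration.

```python
import random


def simulate_threshold_policy(reorder_point, order_up_to, days=365, seed=0,
                              holding_cost=1.0, stockout_cost=20.0,
                              fixed_order_cost=50.0):
    """Simulate a toy warehouse that restocks only when stock is 'too low'.

    All cost parameters and the demand model are hypothetical.
    Returns (average daily cost, number of orders placed).
    """
    rng = random.Random(seed)          # fixed seed keeps runs reproducible
    inventory = order_up_to            # start at the 'healthy level'
    total_cost = 0.0
    orders = 0
    for _ in range(days):
        demand = rng.randint(0, 10)    # assumed demand: uniform on 0..10
        unmet = max(demand - inventory, 0)
        inventory = max(inventory - demand, 0)
        # Running cost: pay to hold stock, pay more for missed sales.
        total_cost += holding_cost * inventory + stockout_cost * unmet
        # Impulse: act only when the level hits the trigger, then jump
        # straight back up to the target level.
        if inventory <= reorder_point:
            total_cost += fixed_order_cost
            inventory = order_up_to
            orders += 1
    return total_cost / days, orders


def search_thresholds(candidates_low, candidates_high):
    """Grid-search the (trigger, target) pair with the lowest simulated cost."""
    best = None
    for low in candidates_low:
        for high in candidates_high:
            if high <= low:
                continue               # target must sit above the trigger
            avg_cost, _ = simulate_threshold_policy(low, high)
            if best is None or avg_cost < best[0]:
                best = (avg_cost, low, high)
    return best


if __name__ == "__main__":
    cost, low, high = search_thresholds(range(0, 21, 5), range(20, 61, 10))
    print(f"best trigger={low}, target={high}, avg daily cost={cost:.2f}")
```

A grid search like this only works when the simulator is known and the thresholds are few. The learning approach described in the article is aimed at settings where demand, prices, and costs shift and must be inferred from data.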
Instead of depending on perfect equations, the approach learns directly from experience, using data to discover better timing and order sizes. That flexibility could make the method especially useful in settings where companies face incomplete information or rapidly shifting markets.
The researchers are now expanding the work beyond a single decision maker. Future studies will examine situations where many agents interact, such as competing firms ordering from the same supplier or investors reacting to one another’s trades.
“Once you have multiple players, it becomes a true game,” Yang said. “What I do changes your environment, and what you do changes mine. That’s much closer to the real world.”
While the mathematics remains challenging, the team believes the practical payoff justifies the effort. “Real decision-making is messy,” Cao said. “But if we want algorithms that actually help people, whether in inventory management or finance, we need models that embrace that mess instead of ignoring it.”