Didi, China’s Uber equivalent, has been testing out a new algorithm for assigning drivers to riders in select cities.

The dispatching system uses reinforcement learning (RL), a subset of machine learning that relies on penalties and rewards to get “agents” to achieve a clear objective. In this case, the agents are the drivers and the rewards are their payments for completing a ride.

The company’s current dispatching algorithm has two parts: a forecasting system that predicts how rider demand changes over time, and a matching system that assigns drivers to jobs on the basis of those predictions.

This two-part design has served the company well thus far, but it can be inefficient. Whenever the patterns of driver supply and rider demand change, the forecasting model needs to be retrained to continue making accurate predictions.
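To make the two-part structure concrete, here is a minimal sketch in Python. It is an illustration only, not Didi's code: the function names, the averaging forecaster, the greedy matcher, and the sample ride log are all assumptions made for this example.

```python
from collections import defaultdict


def fit_demand_forecast(ride_log):
    """Hypothetical forecaster: average past requests per (zone, hour)."""
    totals = defaultdict(int)
    days = set()
    for day, zone, hour, requests in ride_log:
        totals[(zone, hour)] += requests
        days.add(day)
    return {key: total / len(days) for key, total in totals.items()}


def match_drivers(idle_drivers, forecast, hour):
    """Hypothetical matcher: send each idle driver to the zone with the
    most forecast demand that earlier assignments have not yet covered."""
    remaining = {zone: demand for (zone, h), demand in forecast.items() if h == hour}
    assignments = {}
    for driver in idle_drivers:
        zone = max(remaining, key=remaining.get)
        assignments[driver] = zone
        remaining[zone] -= 1  # one unit of expected demand is now covered
    return assignments


# Made-up history: (day, zone, hour, number of ride requests)
ride_log = [
    (1, "airport", 8, 4), (1, "downtown", 8, 1),
    (2, "airport", 8, 2), (2, "downtown", 8, 2),
]

forecast = fit_demand_forecast(ride_log)  # part one: predict demand
assignments = match_drivers(["d1", "d2", "d3"], forecast, hour=8)  # part two: match
print(assignments)
```

In this sketch the matcher is only as good as the forecast it is handed. If demand shifts away from the airport, `forecast` stays stale until `fit_demand_forecast` is run again on fresh data, which is the retraining cost described above.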

Moving to an RL approach solves this problem by collapsing both parts into one: with every subsequent piece of data, the algorithm learns to dispatch drivers more efficiently. That allows it to keep evolving with changing supply and demand, without any need to retrain. A/B tests between the old and new algorithms in a handful of cities have confirmed that the new one is indeed more efficient.
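The idea of learning from every new piece of data can be sketched with a textbook temporal-difference update. This is a generic illustration under stated assumptions, not the algorithm Didi deployed: the `OnlineDispatcher` class, the zone names, the fares, and the learning-rate and discount values are all hypothetical.

```python
from collections import defaultdict


class OnlineDispatcher:
    """Hypothetical tabular RL dispatcher: drivers are the agents, fares are the rewards."""

    def __init__(self, learning_rate=0.1, discount=0.9):
        # Estimated long-run earnings for a driver who is idle in each zone.
        self.value = defaultdict(float)
        self.learning_rate = learning_rate
        self.discount = discount

    def choose_ride(self, rides):
        """Pick the ride with the best fare plus discounted value of its drop-off zone."""
        return max(
            rides,
            key=lambda ride: ride["fare"] + self.discount * self.value[ride["dropoff"]],
        )

    def update(self, pickup, fare, dropoff):
        """TD(0) update applied after a single completed ride."""
        target = fare + self.discount * self.value[dropoff]
        self.value[pickup] += self.learning_rate * (target - self.value[pickup])


dispatcher = OnlineDispatcher()

# Made-up stream of completed rides: (pickup zone, fare, drop-off zone)
completed_rides = [
    ("downtown", 12.0, "airport"),
    ("airport", 30.0, "downtown"),
    ("suburb", 5.0, "suburb"),
]
for _ in range(20):
    for pickup, fare, dropoff in completed_rides:
        dispatcher.update(pickup, fare, dropoff)  # learn from each ride as it arrives

# Two requests with the same fare; the learned values break the tie.
best = dispatcher.choose_ride([
    {"fare": 8.0, "dropoff": "airport"},
    {"fare": 8.0, "dropoff": "suburb"},
])
print(best["dropoff"])
```

There is no separate forecasting step here. Each completed ride nudges the value estimates, so if fares or demand shift, the estimates drift with them and dispatch decisions follow without a distinct retraining pass.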

Didi is now planning a gradual roll-out of the new dispatching system to cities in China, though an exact timeline hasn’t been set. Tony Qin, the AI research lead for the company’s US division, told MIT Technology Review that the company may continue to conduct A/B tests between its different algorithms for different locations and use the one that produces the most efficient results.

The RL algorithm may not always be the best one, Qin said. It largely depends on the city’s supply and demand patterns. In the meantime, the company is also developing another RL dispatching algorithm, with different agents and rewards, to add to its arsenal.

An abridged version of this originally appeared in our AI newsletter The Algorithm.