In the digital soil mapping framework, machine learning (ML) algorithms are currently the most popular methods for the spatial prediction of soil properties. The rapid development of easy-to-use software implementations for a wide range of ML algorithms has encouraged comparison studies, with the goal of ranking the algorithms by performance and identifying the best ones. However, no firm conclusion can be drawn about which ML algorithm is best in general, which suggests that combining several of them could be a better approach. Numerous methods have been proposed to do so, most of them relying on a linear weighting of the individual algorithms. Yet there are almost as many methods for linearly weighting ML algorithms as there are ML algorithms, which leaves the problem unsolved. Moreover, these weighting methods are mostly used out of the box, without proper attention to their underlying hypotheses. In this paper, we address this issue by setting the problem in a more formal framework. Starting from classical hypotheses, we show how the benefit of averaging various ML algorithms can be estimated from their joint performances. Turning then to the most commonly used linear weighting schemes, we recall that, as long as the performance metrics are based on mean square errors, the best averaging method is by construction the best linear (unbiased) predictor. Using a more general Bayesian framework, we also show that accounting for conditional biases when weighting ML algorithms is key to obtaining improved predictions, and we propose explicit formulas for that purpose. Finally, these theoretical results are illustrated and discussed using a soil data set collected over an arid and semi-arid region in Iran, where clay content, calcium carbonate equivalent, soil organic carbon and electrical conductivity were measured in topsoil samples.
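The link between linear weighting and mean square error can be illustrated with a minimal sketch. It is not taken from the paper: it assumes weights constrained to sum to one, validation errors available for every algorithm at the same locations, and synthetic errors standing in for three hypothetical ML algorithms. The function name `min_mse_weights` and all variable names are illustrative. Under these assumptions, the weights that minimise the mean square error of the combination are proportional to the inverse of the joint error matrix applied to a vector of ones.

```python
import numpy as np


def min_mse_weights(errors):
    """Sum-to-one weights minimising the mean square error of a linear combination.

    errors: array of shape (n_samples, n_models) holding validation errors
    (prediction minus observation) for each model at the same locations.
    """
    n_samples = errors.shape[0]
    # Joint second-moment matrix of the errors: diagonal entries are the
    # individual mean square errors, off-diagonal entries are cross terms.
    mse_matrix = errors.T @ errors / n_samples
    ones = np.ones(mse_matrix.shape[0])
    # Solve M x = 1, then rescale so that the weights sum to one.
    solved = np.linalg.solve(mse_matrix, ones)
    return solved / (ones @ solved)


# Synthetic, correlated errors for three hypothetical algorithms (assumption:
# these numbers are made up and do not come from the soil data set).
rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 1))          # error component common to all models
specific = rng.normal(size=(500, 3))        # model-specific error component
errors = 0.6 * shared + specific * np.array([0.8, 1.0, 1.3])

weights = min_mse_weights(errors)
individual = np.mean(errors ** 2, axis=0)            # MSE of each model alone
combined = float(np.mean((errors @ weights) ** 2))   # MSE of the weighted average

print("weights:", weights)
print("individual MSE:", individual)
print("combined MSE:", combined)
```

Because using any single algorithm alone is a special case of sum-to-one weighting, the combined mean square error computed on the same errors cannot exceed the smallest individual one. The size of that gap depends on how strongly the algorithms' errors are correlated, which is why their joint performances matter and not only their individual scores.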