THE SIMPLER THE MACHINE LEARNING METHOD, THE BETTER: OCCAM’S RAZOR

Accuracy and interpretability are equally important in Machine Learning (ML). Underestimating the latter is one of the main reasons why Artificial Intelligence projects in corporations fail. What does an interpretable, simpler Machine Learning model enable us to do? Not only to fine-tune the model more easily, but also to boost the client’s trust in it.

“Non sunt multiplicanda entia sine necessitate” (“Entities are not to be multiplied without necessity”) is a principle attributed to William Ockham in the 14th century. Obviously, this English Franciscan didn’t know anything about ML back then. And even less about how valuable his principle would become for it…

If you asked Ockham about ML today, he would probably say: “among all competing models that explain the data equally well, select the simplest one”. Here we detail three points where Occam’s Razor principle can help:

HIGH DIMENSIONAL DATA == NOT COOL

The curse of dimensionality appears when an excessive number of features drowns out the important ones and lets our metrics fool us. Wouldn’t it be simpler to remove the meaningless features?

Dimensionality reduction is a powerful preprocessing method to achieve this. Applying feature selection or feature extraction shrinks the dataset while keeping the relevant information (most of the variance). Principal Component Analysis (PCA), the most common technique of all, lets us understand and visualise the data by projecting it onto the directions of highest variance and discarding the rest.
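As a rough illustration, here is a minimal PCA sketch in Python with scikit-learn; the synthetic dataset and the 95% variance threshold are assumptions for the example, not prescriptions.

# Minimal sketch of dimensionality reduction with PCA (scikit-learn).
# The synthetic dataset and the 95% variance threshold are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                        # 3 "true" underlying factors
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 50))    # 50 observed, mostly redundant features

X_scaled = StandardScaler().fit_transform(X)              # PCA is sensitive to feature scales

pca = PCA(n_components=0.95)                              # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X_scaled)

print(f"kept {pca.n_components_} of {X.shape[1]} dimensions")
print("variance explained by kept components:", pca.explained_variance_ratio_.sum())

On a run like this, PCA typically keeps only a handful of components, because the 50 observed features are largely redundant projections of 3 underlying factors.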

SELECT SIMPLER MACHINE LEARNING MODELS LIKE OCCAM WOULD

After preprocessing, the data feeds a model. An interesting approach to model selection comes from Information Theory: the Minimum Description Length (MDL) principle.

This principle states that the best model is the one that gives the most compact description of the data, including the description of the model itself.

This must-read article shows that by putting together Bayesian Inference and MDL, the best model to encode the data turns out to be the one that minimises the sum of two terms: the length of the model (simplicity) and the length needed to describe the data’s deviations from the model (accuracy).
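In symbols, the usual two-part MDL formulation reads roughly as follows (a sketch, where L(·) denotes code length in bits and the link to probabilities comes from the standard Shannon argument that an optimal code assigns length -log2 P(x)):

M^{*} = \arg\min_{M} \big[ L(M) + L(D \mid M) \big]
      = \arg\min_{M} \big[ -\log_2 P(M) - \log_2 P(D \mid M) \big]
      = \arg\max_{M} P(M)\, P(D \mid M)

The first term measures the simplicity of the model, the second how well (or badly) it fits the data, and the last line shows that minimising the total description length is the same as picking the model with the highest Bayesian posterior.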

It turns out that the maths not only supports Occam’s philosophical statement but also formalises the accuracy vs simplicity dilemma, and that’s simply beautiful.

PROBLEMS BEHIND ACCURACY

Finally, overfitting may occur: the model keeps improving its performance on the training data while its performance on unseen test data stalls or gets worse. What does Occam have to say in this regard?

Again, the best-known methods for tackling overfitting can be seen as applications of Occam’s Razor. Regularization adds a penalty term to the loss that discourages complexity, pruning can be read as shortening the description of the model, and early stopping cuts training short once further effort only improves the fit to noise. A small sketch of the first of these follows.
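Below is a hedged Python sketch comparing an unregularised linear model with a ridge (L2-penalised) one on noisy data with many irrelevant features; the synthetic dataset and the alpha=10 penalty strength are assumptions chosen for illustration.

# Sketch: L2 regularization (ridge) as an Occam-style penalty on model complexity.
# The synthetic dataset and the alpha value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 40))                 # few samples, many features: easy to overfit
true_w = np.zeros(40)
true_w[:3] = [2.0, -1.0, 0.5]                 # only 3 features actually matter
y = X @ true_w + 0.5 * rng.normal(size=80)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

plain = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)     # alpha controls the complexity penalty

print("plain train/test R^2:", plain.score(X_tr, y_tr), plain.score(X_te, y_te))
print("ridge train/test R^2:", ridge.score(X_tr, y_tr), ridge.score(X_te, y_te))

On a run like this, the unregularised model usually reaches a near-perfect training R^2 but a poor (often negative) test R^2, while the ridge penalty trades a little training accuracy for much better generalisation.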

In short… don’t use a lot where a little will do.
