A large part of data-driven machine learning concerns model training, in which optimisation algorithms progressively update the model parameters so that the model improves against some performance criterion. Closely related to training, validation and testing play a major role in ensuring that the model generalises to relevant scenarios outside the training data. Misunderstanding these concepts makes it easy to misjudge a model's performance, which in turn diminishes the benefit of applying the model.

As an analogy, a modeller training a model is similar to a student studying for exams. The goal of training a model is to use it to make predictions or, more generally, to produce insights into the matter at hand. Similarly, a student studies a course with the aim of performing well in the final exam and perhaps becoming able to apply what was learned in real life. As in model training, a student learns the material by correcting mistakes and updating his or her understanding. Model validation, on the other hand, is comparable to the student taking mock exams to check how well the material has been learned. Model testing is like the final exam, where completely unseen questions are used to assess how well the student performs. We will use this (perhaps highly relatable) analogy throughout.
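To make the three roles in the analogy concrete, the following is a minimal sketch of splitting a dataset into a training set (study material), a validation set (mock exams) and a test set (the final exam). The function name `three_way_split`, the 60/20/20 proportions and the fixed seed are illustrative assumptions, not something prescribed by this text.

```python
import random


def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into train, validation and test lists.

    The fractions and the seed are hypothetical defaults chosen for
    illustration only.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")

    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                 # final exam: used once, at the end
    val = shuffled[n_test:n_test + n_val]    # mock exams: checked during training
    train = shuffled[n_test + n_val:]        # study material: used to fit the model
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The key property is that the three sets do not overlap, so the test set stays completely unseen until the very end.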

Contents

1. Overview

We present a short overview of common modelling practices, introducing the concepts of training, validation and testing, as well as data splitting in the context of modelling.

2. Model Training

We explain what happens during model training and what a minimal model training procedure entails, introducing the related and very important concepts of underfitting and overfitting.

3. Model Testing

We describe the need for model testing and how it helps modellers address the problems of underfitting and overfitting that arise from model training.

4. Model Validations

We clarify why model validations are usually performed during training. They may seem to duplicate the role of model testing, but they serve a different purpose and must be kept separate from the testing procedure, which takes place at the very end.

5. Common Modelling Workflow

We summarise the common modelling workflow, providing a general picture of what normal practice in the field entails. We loop back to the concepts introduced in earlier chapters.