Model Training and Evaluation: Loss, Overfitting, and Accuracy

When people train their first model, they often focus only on accuracy. To understand whether a model is reliable, you need to know what training adjusts, what a loss function measures, why overfitting happens, and why test data must stay separate.

This article explains the basics of model training and evaluation: parameters, loss functions, epochs, overfitting, validation data, test data, and common classification metrics.

Where the previous article explained how to organize a machine learning project, this one explains how to decide whether the training process is trustworthy.

1. What Does Training Adjust?

A model can be viewed as a function with parameters:

prediction = model(input_features, parameters)

Before training, parameters may be random or initialized with default values. Training adjusts those parameters so the model's output becomes closer to the true labels.

A very simple linear model looks like this:

y = w1 * x1 + w2 * x2 + b

Here, w1, w2, and b are parameters. Training tries to find better values for them.
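
This model can be sketched in a few lines of Python (the weight and input values below are made-up illustration numbers, not trained values):

```python
# A linear model with two features: y = w1*x1 + w2*x2 + b
def predict(x1, x2, w1, w2, b):
    return w1 * x1 + w2 * x2 + b

# Arbitrary parameter values, as they might look before training
w1, w2, b = 0.5, -0.2, 1.0
print(predict(2.0, 3.0, w1, w2, b))  # 0.5*2.0 - 0.2*3.0 + 1.0 ≈ 1.4
```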

2. What Is a Loss Function?

The model needs a way to measure how wrong a prediction is. That measurement is the loss function.

For regression, a simple loss can be squared error:

loss = (y_true - y_pred) ** 2

For classification, cross-entropy loss is common. You do not need to derive the formula at the beginning, but the intuition matters:

A confidently wrong prediction receives a large loss. A prediction close to the correct answer receives a smaller loss.

During training, the algorithm tries to reduce the overall loss.
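
Both ideas can be sketched directly. For cross-entropy, the single-example loss reduces to the negative log of the probability the model assigned to the correct class, which makes the "confidently wrong is expensive" intuition concrete:

```python
import math

# Squared error for regression
def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

# Cross-entropy for one classification example:
# the loss is -log(probability assigned to the correct class)
def cross_entropy(p_correct):
    return -math.log(p_correct)

# A confidently wrong prediction gets a large loss...
print(cross_entropy(0.01))  # about 4.61
# ...while a prediction close to the right answer gets a small one
print(cross_entropy(0.95))  # about 0.05
```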

3. The Intuition Behind Gradient Descent

Many models use gradient descent or a variant of it to update parameters. Think of it as walking downhill:

  1. The current parameters produce a loss value
  2. The algorithm estimates which direction reduces loss
  3. The parameters move a small step in that direction
  4. The process repeats many times

An important hyperparameter is the learning rate. If it is too small, training is slow. If it is too large, training can bounce around or fail to converge.

new_weight = old_weight - learning_rate * gradient

This is not the full mathematical story, but it explains why training loops repeat parameter updates.
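
The walk-downhill loop can be sketched with a toy one-parameter problem. This is not a real model, just the update rule applied to a loss whose minimum we know in advance (w = 3):

```python
# Minimize loss = (w - 3)**2 with gradient descent.
# The gradient of this loss with respect to w is 2*(w - 3).
w = 0.0
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient  # the update rule from the text

print(round(w, 4))  # converges toward 3.0
```

Changing the learning rate to a large value such as 1.5 makes the same loop diverge, which is the "bouncing around" failure described above.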

4. Epoch, Batch, and Iteration

Deep learning training often uses these terms:

  • Epoch: one full pass through the training set
  • Batch: a small group of samples used for one update step
  • Iteration: one parameter update

If the training set has 1000 samples and the batch size is 100, one epoch contains 10 iterations.

Traditional machine learning libraries may not expose these terms directly, but the basic idea is similar: the model uses training data to adjust parameters.
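
The 1000-sample example above works out as follows:

```python
num_samples = 1000
batch_size = 100

iterations_per_epoch = num_samples // batch_size
print(iterations_per_epoch)  # 10: one epoch = 10 parameter updates

# Over 5 epochs the model performs 50 update steps in total
num_epochs = 5
print(num_epochs * iterations_per_epoch)  # 50
```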

5. Why Overfitting Happens

Overfitting means the model performs well on training data but much worse on new data.

Common causes include:

  • The model is complex enough to memorize noise and details in the training set
  • The training data is too small to represent the real problem
  • The features contain information that should not be available, also called data leakage
  • The model trains for too long without validation monitoring

The danger is that training metrics can look excellent while real-world performance is poor.

6. Training, Validation, and Test Data

For reliable evaluation, data is often split into three parts:

  • Training set: used to fit parameters
  • Validation set: used to tune settings, select models, and watch for overfitting
  • Test set: used at the end to estimate final generalization

For small practice projects, a training/test split can be enough. But remember: the test set should not be used repeatedly for tuning, or it becomes part of the decision process.
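
A three-way split can be sketched with only the standard library. The 60/20/20 ratio here is an arbitrary example, not a rule, and the integer "samples" stand in for real examples:

```python
import random

samples = list(range(100))   # stand-in for a dataset of 100 examples
random.seed(42)              # fixed seed so the split is reproducible
random.shuffle(samples)      # shuffle before splitting

train = samples[:60]         # 60%: used to fit parameters
val = samples[60:80]         # 20%: used for tuning and overfitting checks
test = samples[80:]          # 20%: held out until the very end

print(len(train), len(val), len(test))  # 60 20 20
```

In practice a library helper (such as scikit-learn's train_test_split) does the same job, but the key property is visible here: no sample appears in more than one split.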

7. Common Classification Metrics

Classification should not be judged by accuracy alone. These metrics often appear together:

  • Accuracy: the proportion of correct predictions
  • Precision: among predicted positives, how many are truly positive
  • Recall: among all actual positive samples, how many were found
  • F1-score: a combined measure of precision and recall

For medical screening, missing a real positive case may be costly, so recall may matter more. For automatic account blocking, falsely blocking normal users may be costly, so precision may matter more.
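
All four metrics fall out of the confusion-matrix counts introduced below. The counts here are made-up numbers for illustration:

```python
# Example confusion-matrix counts (made-up numbers)
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)            # of predicted positives, how many are real
recall = tp / (tp + fn)               # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```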

8. Reading a Confusion Matrix

A confusion matrix compares predicted labels with true labels:

                  predicted negative  predicted positive
true negative              TN                  FP
true positive              FN                  TP

  • TP: a positive sample predicted correctly
  • TN: a negative sample predicted correctly
  • FP: a negative sample incorrectly predicted as positive
  • FN: a positive sample incorrectly predicted as negative

The advantage of a confusion matrix is that it shows not only how many mistakes happened, but also which direction those mistakes went.
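
Counting the four cells from a list of true and predicted labels is a one-liner each (the label lists below are made-up illustration data, with 1 meaning positive):

```python
# Count TP, TN, FP, FN from true and predicted labels (1 = positive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```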

9. Evaluation Checklist

When evaluating a model, check these questions:

  • Was the test set isolated from training?
  • Are the classes heavily imbalanced?
  • Did you look beyond accuracy?
  • Was the model compared with a simple baseline?
  • Did you inspect some wrong predictions manually?
  • Is the gap between training performance and test performance too large?

The point of training is not merely to push one metric upward. The point is to build a trustworthy evaluation process and understand when the model is likely to fail.

10. What a Trustworthy Training Record Includes

A useful training record should include at least these details:

  • How training, validation, and test data were split
  • The model, important parameters, and random seed
  • Training metrics and test metrics, not just one final score
  • Error analysis, especially for the most costly error types
  • Comparison against a simple baseline model

These notes may look small, but they make the experiment auditable when you return to it later.

11. What to Read Next

The previous article is Machine Learning Workflow. After training and evaluation are clear, continue with Neural Network Basics to connect parameters with multi-layer function composition.


  3. Add more AI security defense experiment notes