Understanding the Role of Data Normalization and Standardization in Machine Learning
Why do we scale features?
Not every dataset requires feature scaling in machine learning; it is only needed when features have very different ranges.
For example, consider a dataset containing two features, age (x1) and income (x2), where age ranges from 0–100 while income ranges from roughly 20,000–500,000. Income values are thousands of times larger than age values, so the two features sit on very different scales. When we do further analysis, like multivariate linear regression, income will influence the result more simply because of its larger magnitude. But that does not necessarily mean it is a more important predictor.
Because the features do not have similar ranges of values, gradient descent may oscillate back and forth and take a long time to find its way to the global/local minimum. To overcome this learning problem, we scale the data so that the different features take on similar ranges of values and gradient descent can converge more quickly.
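To make this concrete, here is a minimal sketch (synthetic data and illustrative learning rates, not taken from any real dataset) that fits the two-feature regression above with batch gradient descent, once on the raw values and once after scaling to [0, 1]:

```python
# Minimal sketch: gradient descent on raw vs. scaled features.
# The data is synthetic; names mirror the age/income example above.
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(0, 100, n)               # x1: roughly 0-100
income = rng.uniform(20_000, 500_000, n)   # x2: roughly 20,000-500,000
X = np.column_stack([age, income])
y = 3 * age + 1e-4 * income + rng.normal(0, 1, n)  # arbitrary target

def gd_iterations(X, y, lr, tol=1e-6, max_iter=100_000):
    """Batch gradient descent on mean squared error; returns the number of
    iterations until the gradient norm drops below tol (max_iter means it
    never got there)."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter

# Raw features: income's huge range forces a tiny learning rate to stay
# stable, so progress along the age direction is glacial (hits max_iter).
print("unscaled:", gd_iterations(X, y, lr=1e-11))

# Min-max scaled to [0, 1]: a much larger learning rate is stable and the
# same problem converges in far fewer steps.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print("scaled:  ", gd_iterations(X_scaled, y, lr=0.1))
```

With the raw features, any learning rate large enough to make real progress on the age coefficient makes the income updates unstable; scaling removes that tension, which is exactly why convergence speeds up.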
When to use Normalization?
Normalization typically means rescaling the values into the range [0, 1], i.e. x' = (x - min) / (max - min) for each feature.
Normalization is an excellent technique to use when you do not know the distribution of your data, or when you know the distribution is not Gaussian (a bell curve). It is useful when your data has varying scales and the algorithm you are using makes no assumptions about the distribution of your data, as is the case for k-nearest neighbors and artificial neural networks.
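As a minimal sketch, scikit-learn's MinMaxScaler performs this rescaling per feature (the age/income values below are made up for illustration):

```python
# Min-max normalization with scikit-learn: each column is rescaled to
# [0, 1] via x' = (x - min) / (max - min), computed per feature.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[25.0,  40_000.0],
              [40.0, 120_000.0],
              [60.0, 500_000.0]])  # columns: age, income (made-up values)

scaler = MinMaxScaler()            # default feature_range=(0, 1)
X_norm = scaler.fit_transform(X)   # learns per-column min/max, then rescales
print(X_norm)                      # each column now spans [0, 1]
```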
When to use Standardization?
Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).
Standardization assumes that your data has a Gaussian (bell curve) distribution. This does not strictly have to be true, but the technique is more effective if your attribute distribution is Gaussian. Standardization is useful when your data has varying scales, and your algorithm makes assumptions about your data having a Gaussian distribution, such as linear regression, logistic regression, and linear discriminant analysis.
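Again as a minimal sketch, scikit-learn's StandardScaler applies this z-score transform per feature, using the same made-up age/income values:

```python
# Standardization with scikit-learn: each column is rescaled via
# z = (x - mean) / std, giving mean 0 and unit variance per feature.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25.0,  40_000.0],
              [40.0, 120_000.0],
              [60.0, 500_000.0]])  # columns: age, income (made-up values)

scaler = StandardScaler()
X_std = scaler.fit_transform(X)    # learns per-column mean/std, then rescales
print(X_std.mean(axis=0))          # ~[0, 0]
print(X_std.std(axis=0))           # ~[1, 1]
```

In practice, fit the scaler on the training split only and reuse it to transform validation and test data, so statistics from unseen data do not leak into training.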
Author: Sadman Kabir Soumik
Posts in this Series
- Ace Your Data Science Interview - Top Questions With Answers
- Understanding Top 10 Classical Machine Learning Algorithms
- Machine Learning Model Compression Techniques - Reducing Size and Improving Performance
- Understanding the Role of Data Normalization and Standardization in Machine Learning
- One-Stage vs Two-Stage Instance Segmentation
- Machine Learning Practices - Research vs Production
- Writing Machine Learning Model - PyTorch vs. TF-Keras
- GPT-3 by OpenAI - The Largest and Most Advanced Language Model Ever Created
- Vanishing Gradient Problem and How to Fix it
- Ensemble Techniques in Machine Learning - A Practical Guide to Bagging, Boosting, Stacking, Blending, and Bayesian Model Averaging
- Understanding the Differences between Decision Tree, Random Forest, and Gradient Boosting
- Different Word Embedding Techniques for Text Analysis
- How A Recurrent Neural Network Works
- Different Text Cleaning Methods for NLP Tasks
- Different Types of Recommendation Systems
- How to Prevent Overfitting in Machine Learning Models
- Effective Transfer Learning - A Guide to Feature Extraction and Fine-Tuning Techniques