Effective Transfer Learning - A Guide to Feature Extraction and Fine-Tuning Techniques

Transfer learning is a machine learning technique that allows a model trained on one task to be reused and fine-tuned for another, similar task. The idea is that a model that has already learned to recognize patterns in one dataset can be applied to a different but related problem, letting it learn faster and with less data than if it were trained from scratch.

There are two main ways to perform transfer learning:

  1. Feature extraction: In this approach, you take a pre-trained model, remove its last layers (the ones responsible for making the final prediction), and add new layers on top. The pre-trained model has already learned useful features from its original data, so by reusing those features you can train a new classifier with much less data. This is useful when you have a small dataset and want to leverage the knowledge of a pre-trained model.
  2. Fine-tuning: In this approach, you take a pre-trained model, keep its early layers frozen, unfreeze the later layers (the ones closer to the output), and then continue training on a new dataset. The model learns from both the new data and the pre-trained weights, which can lead to better performance on the new task. This is useful when you have a larger dataset and want to adapt the pre-trained model more closely to your specific task.
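
Both recipes below consume image data shaped for ResNet50, referred to as X_train and y_train. A minimal sketch of producing such arrays (the paths and labels lists are hypothetical placeholders for your own dataset):

import numpy as np
from keras.applications.resnet50 import preprocess_input
from keras.preprocessing.image import img_to_array, load_img
from keras.utils import to_categorical

# paths: hypothetical list of image file paths; labels: matching integer
# class ids in the range 0-9
X_train = np.stack([img_to_array(load_img(p, target_size=(224, 224)))
                    for p in paths])
X_train = preprocess_input(X_train)               # ResNet50-specific scaling
y_train = to_categorical(labels, num_classes=10)  # one-hot encode the labels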

Feature Extraction using Keras

Here's an example of how you can use Keras to perform feature extraction using the ResNet50 model:

from keras.applications import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load the ResNet50 model with pre-trained ImageNet weights, minus its classifier head
base_model = ResNet50(weights='imagenet', include_top=False)

# Freeze the layers of the base model so their weights are not updated
for layer in base_model.layers:
    layer.trainable = False

# Pool the convolutional feature maps into one vector per image,
# then add a new fully connected layer and the output layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# Create a new model from the base_model input to the new output layer
model = Model(inputs=base_model.input, outputs=predictions)

# Compile and train; only the new layers have trainable weights
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)

In this example, the ResNet50 model is loaded with pre-trained weights and all of its layers are frozen. Because include_top=False leaves the output as a stack of convolutional feature maps, a GlobalAveragePooling2D layer condenses them into a single feature vector per image, and a new fully connected classifier is added on top. Only these new layers are trained on the X_train data and y_train labels.
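
The example above still pushes every training batch through the frozen base on every epoch. When the dataset is small and no data augmentation is needed, a common shortcut is to run the frozen base over the data once, cache the resulting feature vectors, and train a small classifier on them. A minimal sketch of this variant (the layer sizes are illustrative assumptions):

from keras.applications import ResNet50
from keras.layers import Dense
from keras.models import Sequential

# pooling='avg' makes the base emit one 2048-dimensional vector per image
base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Run the frozen base once over the prepared images and cache the features
features = base_model.predict(X_train, batch_size=32)

# Train a small classifier on the cached 2048-dimensional features
clf = Sequential([
    Dense(256, activation='relu', input_shape=(2048,)),
    Dense(10, activation='softmax'),
])
clf.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
clf.fit(features, y_train, epochs=10, batch_size=32)

Because the expensive convolutional passes happen only once, each epoch of the classifier then trains in a fraction of the time.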

Fine-tuning using Keras

Here's an example of how you can use Keras to fine-tune the ResNet50 model:

from keras.applications import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Adam

# Load the ResNet50 model with pre-trained ImageNet weights, minus its classifier head
base_model = ResNet50(weights='imagenet', include_top=False)

# Freeze the first 15 layers and unfreeze the rest
for layer in base_model.layers[:15]:
    layer.trainable = False
for layer in base_model.layers[15:]:
    layer.trainable = True

# Pool the feature maps and add the new classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# Create a new model from the base_model input to the new output layer
model = Model(inputs=base_model.input, outputs=predictions)

# Compile with a low learning rate so the pre-trained weights are only nudged,
# then train on the new data
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)

In the above code,

  1. The base_model variable is assigned the ResNet50 model with weights pre-trained on the ImageNet dataset. Setting include_top=False removes the last layers of ResNet50 (the ones responsible for making the final prediction) so that new layers can be added on top.
  2. The next block uses for loops to iterate over the layers of the base model. The first 15 layers are set to be untrainable, while the remaining layers are set to be trainable. This is done through the trainable attribute of each layer, which controls whether the layer's weights are updated during training.
  3. Next, a GlobalAveragePooling2D layer condenses the convolutional feature maps into a single vector per image, and a new fully connected layer, x, with 1024 units and a ReLU activation is added on top of it.
  4. Then another dense layer, called predictions, is added on top of x. It applies a softmax activation and has 10 units, since this is a classification problem with 10 classes.
  5. A new model is then created by instantiating the Model class with base_model's input as its input and the new predictions layer as its output.
  6. Finally, the model is compiled with a low learning rate, so that the pre-trained weights are nudged rather than overwritten, and trained on the new data. A common refinement, sketched after this list, is to first train only the new head with the whole base frozen, then unfreeze layers and fine-tune.
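
A minimal sketch of that two-phase schedule, reusing the base_model and model objects from the fine-tuning example above (the epoch counts and learning rates are illustrative assumptions, not tuned values):

from keras.optimizers import Adam

# Phase 1: train only the new head while the whole base stays frozen, so the
# randomly initialized layers settle before any pre-trained weight is touched
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=32)

# Phase 2: unfreeze the later layers and continue with a much lower learning
# rate; the model must be recompiled for the trainable changes to take effect
for layer in base_model.layers[15:]:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)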

Author: Sadman Kabir Soumik
