Real Time Face Mask Detection using VGG16 CNN¶
Introduction¶
This project detects whether a person is wearing a face mask using computer vision and machine learning techniques. It builds on the pre-trained VGG16 deep learning model and a custom dataset of images with and without masks, and is implemented with the Keras deep learning framework, OpenCV, and Python.
Dataset Collection¶
The dataset consists of 7,553 RGB images split across two folders, with_mask and without_mask, with each image named after its label. There are 3,725 images of faces with a mask and 3,828 images of faces without a mask. The dataset is taken from https://www.kaggle.com/datasets/omkargurav/face-mask-dataset.
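If the archive has been downloaded and extracted, a quick count confirms the class balance. A minimal sketch, assuming the same local directory layout used in the rest of this notebook:
import os

# Hypothetical local path; adjust to wherever the dataset was extracted
data_dir = '/Users/surajbhardwaj/Desktop/project_fmd/data/train'
for category in ['with_mask', 'without_mask']:
    count = len(os.listdir(os.path.join(data_dir, category)))
    print(category, count)  # expected: with_mask 3725, without_mask 3828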
Let's start with importing necessary libraries.
import os
import cv2
import random
import numpy as np
Dataset Preparation¶
The data consists of two classes: with_mask and without_mask. For this experiment, the dataset is created by iterating through all the images in the respective directories using the OpenCV library. Each image is resized to 224x224 pixels, as required by the VGG16 model.
categories = ['with_mask', 'without_mask']

# Creating a custom dataset: each entry is [image, label]
data = []
for category in categories:
    path = os.path.join('/Users/surajbhardwaj/Desktop/project_fmd/data/train', category)
    label = categories.index(category)  # with_mask -> 0, without_mask -> 1
    for file in os.listdir(path):
        img_path = os.path.join(path, file)
        img = cv2.imread(img_path)
        if img is None:  # skip unreadable or non-image files
            continue
        img = cv2.resize(img, (224, 224))  # VGG16 takes images of size 224x224
        data.append([img, label])
The dataset is then shuffled randomly to reduce any bias during training.
random.shuffle(data)
# Creating NumPy arrays of features and labels
X = []
y = []
for features, label in data:
    X.append(features)
    y.append(label)
X = np.array(X)
y = np.array(y)
# Number of Images, Width, Height, Number of Channels
X.shape
(7553, 224, 224, 3)
y.shape
(7553,)
X is the NumPy array that contains the images of the custom dataset.
To normalize the image pixel values to a common scale, each pixel value in the NumPy array X is divided by 255.
Since pixel values range from 0 to 255, this division scales them to the range 0 to 1, a step also known as feature scaling or normalization. Normalizing pixel values is a standard preprocessing step in deep learning, as it reduces the impact of differently scaled input features on model performance.
# Scaling on X
X = X/255
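One practical note: dividing a uint8 NumPy array by 255 promotes it to float64, which for 7,553 images of size 224x224x3 comes to roughly 9 GB in memory. A minimal alternative sketch, casting to float32 first to halve that footprint:
# Optional variant: scale as float32 to halve memory use
X = X.astype('float32') / 255.0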
The train_test_split method from the scikit-learn library is used to split the custom dataset into training and testing sets. It takes the feature data X and the label data y as input and returns four arrays: X_train, X_test, y_train, and y_test. With test_size=0.2, the method shuffles the data and holds out 20% for testing, leaving the remaining 80% for training.
Splitting the dataset this way is an important step in building machine learning models: evaluating on the held-out testing set estimates how well the model will perform on new, unseen data.
# Performing the train/test split (X is a 4D array of images, y a 1D array of labels)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape
(6042, 224, 224, 3)
X_test.shape
(1511, 224, 224, 3)
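Note that no random_state is passed above, so each run produces a different split. A reproducible, class-balanced variant (a sketch; stratify=y keeps the with_mask/without_mask ratio identical in both sets):
# Optional: reproducible, stratified split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)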
Model Architecture¶
The VGG16 model is used as the base model for this project. Its final 1000-way classification layer is replaced with a Dense layer of a single neuron with a sigmoid activation function, since the output is binary: whether the person is wearing a mask or not. The weights of the VGG16 layers are frozen, so only the new output layer is trained; this preserves the ImageNet features and reduces the risk of overfitting.
Now, let's import the pre-trained VGG16 convolutional neural network (CNN) architecture using Keras; VGG16 is a widely used CNN architecture for image recognition tasks.
Keras is an open-source deep learning library that provides a high-level API for building deep learning models. The keras.applications module provides a set of pre-trained deep learning models that can be used as a starting point for building custom models.
The VGG16 model has 16 weight layers (13 convolutional and 3 fully connected) and has been trained on the ImageNet dataset, which contains millions of images from thousands of categories. The model is composed of a stack of convolutional layers followed by a set of fully connected layers. The convolutional layers extract features from the input image, while the fully connected layers perform the classification.
By importing the VGG16 model from Keras, we use this pre-trained model as a starting point for building a custom model for the face mask detection task. The VGG16 model is typically used as a feature extractor, where the output of the final convolutional layer is used as input to a set of fully connected layers that are trained for the specific task. In this code, the final layer of the VGG16 model is replaced with a single dense layer with a sigmoid activation function, which is trained for the face mask detection task.
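As an aside, Keras can also load VGG16 without its fully connected head via include_top=False. The sketch below shows that lighter route; unlike the approach taken in this notebook, it drops the large fc1/fc2 layers entirely:
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Sequential

# Alternative sketch: convolutional base only, no ImageNet classifier head
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained features
alt_model = Sequential([base, Flatten(), Dense(1, activation='sigmoid')])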
from keras.applications.vgg16 import VGG16
Using TensorFlow backend.
vgg = VGG16()
vgg.summary()
Model: "vgg16" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) (None, 224, 224, 3) 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ fc1 (Dense) (None, 4096) 102764544 _________________________________________________________________ fc2 (Dense) (None, 4096) 16781312 _________________________________________________________________ predictions (Dense) (None, 1000) 4097000 ================================================================= Total params: 138,357,544 Trainable params: 138,357,544 Non-trainable params: 0 _________________________________________________________________
from keras import Sequential
The Sequential model¶
It is a simple way to build neural networks that have a linear stack of layers. The Sequential model is appropriate for building feedforward neural networks, where the output of one layer is passed as input to the next layer in a sequential manner. It is not appropriate for building neural networks with multiple inputs or outputs, or with shared layers.
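A minimal toy illustration of the linear-stack idea (not part of this project):
from keras import Sequential
from keras.layers import Dense

# The output of each layer feeds the next layer, in order
toy = Sequential()
toy.add(Dense(8, activation='relu', input_shape=(4,)))
toy.add(Dense(1, activation='sigmoid'))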
In this code, a Sequential model is used to create a custom model for the face mask detection task.
# Converting Functional VGG into Sequential VGG
model = Sequential()
All layers of the VGG16 model except its final classification layer are added to the Sequential model created for the face mask detection task.
# Copying every VGG16 layer except the final 1000-way classifier
for layer in vgg.layers[:-1]:
    model.add(layer)
model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ fc1 (Dense) (None, 4096) 102764544 _________________________________________________________________ fc2 (Dense) (None, 4096) 16781312 ================================================================= Total params: 134,260,544 Trainable params: 134,260,544 Non-trainable params: 0 _________________________________________________________________
# Freezing the copied layers so their weights are not updated during training
for layer in model.layers:
    layer.trainable = False
model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ fc1 (Dense) (None, 4096) 102764544 _________________________________________________________________ fc2 (Dense) (None, 4096) 16781312 ================================================================= Total params: 134,260,544 Trainable params: 0 Non-trainable params: 134,260,544 _________________________________________________________________
from keras.layers import Dense
With the frozen VGG16 layers in place, the final step is to append a single Dense layer with a sigmoid activation function; its output is the predicted probability of class 1 (without_mask). The model is then compiled and trained using the compile and fit methods.
model.add(Dense(1,activation='sigmoid'))
# 4096 weights + 1 bias = 4,097 trainable parameters
model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ fc1 (Dense) (None, 4096) 102764544 _________________________________________________________________ fc2 (Dense) (None, 4096) 16781312 _________________________________________________________________ dense_1 (Dense) (None, 1) 4097 ================================================================= Total params: 134,264,641 Trainable params: 4,097 Non-trainable params: 134,260,544 _________________________________________________________________
model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
Training¶
The model is trained with the binary cross-entropy loss function and the Adam optimizer for five epochs; the testing set created earlier with train_test_split serves as validation data, so performance is evaluated after every epoch.
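For a true label $y_i \in \{0, 1\}$ and predicted probability $\hat{y}_i$, binary cross-entropy over $N$ samples is

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right]$$

which is the quantity the Adam optimizer minimizes when updating the 4,097 trainable parameters.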
model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))
Train on 6042 samples, validate on 1511 samples
Epoch 1/5
6042/6042 [==============================] - 3510s 581ms/step - loss: 0.4381 - accuracy: 0.8128 - val_loss: 0.3335 - val_accuracy: 0.8511
Epoch 2/5
6042/6042 [==============================] - 3837s 635ms/step - loss: 0.2692 - accuracy: 0.9048 - val_loss: 0.2272 - val_accuracy: 0.9272
Epoch 3/5
6042/6042 [==============================] - 3840s 636ms/step - loss: 0.2181 - accuracy: 0.9220 - val_loss: 0.2011 - val_accuracy: 0.9285
Epoch 4/5
6042/6042 [==============================] - 3761s 622ms/step - loss: 0.1919 - accuracy: 0.9333 - val_loss: 0.1937 - val_accuracy: 0.9279
Epoch 5/5
6042/6042 [==============================] - 3670s 607ms/step - loss: 0.1812 - accuracy: 0.9351 - val_loss: 0.1618 - val_accuracy: 0.9431
<keras.callbacks.callbacks.History at 0x7fc401e61e50>
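As a quick sanity check beyond the per-epoch val_accuracy above, the trained model can also be scored on the test set directly. A minimal sketch:
# Evaluate the trained model on the held-out test set
loss, acc = model.evaluate(X_test, y_test)
print('test loss: %.4f, test accuracy: %.4f' % (loss, acc))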
Saving the Model¶
The trained model is saved in the Hierarchical Data Format (HDF5) file format with the name saved_model.h5.
Saving the model is an essential step because it lets us reuse the trained model later without retraining. It also makes it possible to deploy the model on different devices or platforms.
The saved model contains the architecture of the model, its weights, and optimizer state, making it easy to reload the model for further training or inference. We can load the saved model using the load_model() method of the Keras library. The saved model file can also be converted to other formats, such as TensorFlow Lite, for use in mobile or edge devices.
model.save('/Users/surajbhardwaj/Desktop/project_fmd/data/saved_model.h5')
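Reloading later is symmetric; a minimal sketch using the load_model() method mentioned above:
from keras.models import load_model

# Restores the architecture, weights, and optimizer state from the HDF5 file
model = load_model('/Users/surajbhardwaj/Desktop/project_fmd/data/saved_model.h5')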
Real-time Face Mask Detection¶
Now let's implement a real-time face mask detector using our trained model and OpenCV.
The function detect_face_mask() takes an image as input, scales its pixel values like the training data, reshapes it to the (1, 224, 224, 3) batch format expected by the model, predicts whether the face in the image is wearing a mask using the trained model, and returns the prediction.
The function draw_label() takes the input image, text to be drawn, position of the text, and background color as arguments. It draws a rectangle with the specified background color and text on the image at the specified position using OpenCV.
Face Detection¶
The haar object is a Haar Cascade classifier for face detection. The detect_face() function takes a grayscale image as input (the main loop converts each frame with cv2.cvtColor), detects faces using the Haar Cascade classifier, and returns their bounding-box coordinates.
The code captures video using the cv2.VideoCapture() method and iterates over each frame of the video. For each frame, the code resizes the image, detects faces using the detect_face() method, and predicts whether each face is wearing a mask or not using the detect_face_mask() method. If a face is detected, a rectangle is drawn around the face using the cv2.rectangle() method. A label is also drawn on the image using the draw_label() method to indicate whether the face is wearing a mask or not. Finally, the image is displayed in a window using the cv2.imshow() method. The code exits if the user presses the 'x' key.
def detect_face_mask(img):
    # Scale to [0, 1] as in training, add a batch dimension, and predict the class
    y_pred = model.predict_classes((img / 255.0).reshape(1, 224, 224, 3))
    return y_pred[0][0]
def draw_label(img, text, pos, bg_color):
    # Measure the rendered text so the background rectangle fits it
    text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 1)
    end_x = pos[0] + text_size[0][0] + 2
    end_y = pos[1] + text_size[0][1] - 2
    cv2.rectangle(img, pos, (end_x, end_y), bg_color, cv2.FILLED)
    cv2.putText(img, text, pos, cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
# Loading the Haar Cascade face detector bundled with opencv-python
haar = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
def detect_face(img):
    # Detecting faces in a grayscale image using the Haar Cascade classifier
    coords = haar.detectMultiScale(img)
    return coords
cap = cv2.VideoCapture(0, apiPreference=cv2.CAP_AVFOUNDATION)  # macOS camera backend

while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read
        break

    # Classify the resized frame as mask / no mask
    img = cv2.resize(frame, (224, 224))
    y_pred = detect_face_mask(img)

    # Draw a rectangle around each detected face
    coords = detect_face(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    for x, y, w, h in coords:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

    if y_pred == 0:  # class 0 = with_mask
        draw_label(frame, "Face mask", (30, 30), (0, 255, 0))
    else:
        draw_label(frame, "No mask", (30, 30), (0, 0, 255))

    cv2.imshow("window", frame)
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break

cap.release()
cv2.destroyAllWindows()
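One limitation worth noting: the loop above classifies the whole resized frame, so when several people are in view they all share a single label. A hedged sketch of a per-face variant, replacing the main loop before cap.release() and reusing detect_face_mask, detect_face, and draw_label from above:
# Sketch: classify each detected face crop instead of the whole frame
while True:
    ret, frame = cap.read()
    if not ret:
        break
    coords = detect_face(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    for x, y, w, h in coords:
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
        pred = detect_face_mask(face)
        color = (0, 255, 0) if pred == 0 else (0, 0, 255)
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        draw_label(frame, "Face mask" if pred == 0 else "No mask", (x, y), color)
    cv2.imshow("window", frame)
    if cv2.waitKey(1) & 0xFF == ord('x'):
        break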
Conclusion¶
The project demonstrates the use of computer vision and machine learning techniques to detect face masks in real time. It uses the VGG16 model for feature extraction and a Haar Cascade classifier for face detection. It could be further improved by using other pre-trained models, by classifying each detected face individually as sketched above, or by training a custom model on a larger dataset.