In this blog post, we'll discuss our two-phase face mask detector, detailing how our computer vision and deep learning pipeline is implemented.
From there, we'll review the dataset we'll be using to train our face mask detector.
Then I'll show you how to train a face mask detector on our dataset using Keras and TensorFlow. We'll use a Jupyter notebook to explore the dataset, train the face mask detector, and review the results.
Given the trained face mask detector, we'll proceed to implement two additional Python scripts used to:
Detect face masks in images
Detect face masks in real-time video streams
We'll wrap up the post by looking at the results of applying our face mask detector.
Two-phase face mask detector
In order to train the face mask detector, we need to break our project into two distinct phases, each with its own respective sub-steps:
Training: Here we’ll focus on loading our face mask detection dataset from disk, training a model (using Keras/TensorFlow) on this dataset, and then serializing the face mask detector to disk
Deployment: Once the face mask detector is trained, we can then move on to loading the mask detector, performing face detection, and then classifying each face as with_mask or without_mask
We'll review each of these phases and their associated subtopics in detail in the remainder of this blog, but in the meantime, let's take a look at the dataset we'll be using to train our face mask detector.
Our face mask detection dataset
The dataset we'll be using here today was created by Wobot Intelligence and is available on Kaggle.
The dataset consists of 6000+ images belonging to two classes:
with_mask: 4000 images
without_mask: 2000 images
You can download the dataset from Kaggle here.
Our goal is to train a deep learning model to detect whether or not a person is wearing a mask.
Once you've downloaded the dataset, extract it into your project directory; you will be presented with the following directory structure:
```
.
├── face-mask-detection-dataset
│   ├── Medical mask
│   │   └── Medical mask
│   │       └── Medical Mask
│   │           ├── annotations
│   │           └── images
│   ├── submission.csv
│   └── train.csv
├── EDA_and_train_model.ipynb
├── detect_mask_image.py
├── detect_mask_video.py
├── mask_detector.model
└── test_image.jpg
```
The face-mask-detection-dataset/ directory contains the data described in the "Our face mask detection dataset" section.
We also include one example image so that you can test the static-image face mask detector.
We'll be reviewing three files in this blog: a Jupyter notebook for data exploration and model training, plus the two Python detection scripts.
Now let's get started
Visualizing our face mask detector dataset using NumPy, Pandas, and Matplotlib
Let's begin with our first step: importing packages, loading the data, and analyzing it.
import visualization libraries
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import os
```
Import all the necessary libraries and packages, namely NumPy, pandas, and Matplotlib, for data analysis and visualization. pandas is used to load, explore, and manipulate the CSV data in the dataset, while Matplotlib is used to plot and visualize that data.
```python
images = os.path.join('/face-mask-detection-dataset/Medical mask/Medical mask/Medical Mask/images')
train = pd.read_csv(os.path.join("/face-mask-detection-dataset/train.csv"))
```
Load the image directory we'll use for training our detector model. Also, load the CSV file containing the labels needed for supervised learning.
Check the CSV file using pandas' head() function and analyze the table: whether the file contains null values, what format the data is in, how many classes there are, which classes and features are useful and which are not, and so on. Then proceed with the data for further processing.
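As a sketch of the checks described above (using a tiny synthetic DataFrame as a stand-in for the real train.csv, so the column names and rows here are only assumptions):

```python
import pandas as pd

# Tiny synthetic stand-in for train.csv (hypothetical column names and rows)
train = pd.DataFrame({
    'name': ['0001.jpg', '0002.jpg', '0003.jpg'],
    'classname': ['face_with_mask', 'face_no_mask', 'mask_colorful'],
})

print(train.head())                       # preview the table
print(train.isnull().sum())               # count missing values per column
print(train['classname'].value_counts())  # class distribution
```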
Since we only need faces with and without masks:
```python
options = ['face_with_mask', 'face_no_mask']
train = train[train['classname'].isin(options)]
train.sort_values('name', axis=0, inplace=True)
```
After analyzing the data, we found that some classes aren't needed for our model, since we're building a model that only detects whether or not a person is wearing a face mask. So we keep just the two classes we actually need: 'face_with_mask' and 'face_no_mask'.
Creating training data
Now let's start converting our raw data into the form we'll feed to our model for training.
```python
import cv2

data = []
img_size = 50

for i in range(len(train)):
    # arr holds one row of train.csv: [name, x1, y1, x2, y2, classname]
    arr = []
    for j in train.iloc[i]:
        arr.append(j)
    img_array = cv2.imread(os.path.join(images, arr[0]), cv2.IMREAD_GRAYSCALE)
    crop_image = img_array[arr[2]:arr[4], arr[1]:arr[3]]  # crop the face bounding box
    new_img_array = cv2.resize(crop_image, (img_size, img_size))
    data.append([new_img_array, arr[5]])  # cropped face image plus its class label
```
Here we gather everything into a single list: each entry holds a face image, cropped using the bounding-box coordinates from the CSV, together with its label.
Separate training data and labels
```python
x = []
y = []
for features, labels in data:
    x.append(features)
    y.append(labels)

from sklearn.preprocessing import LabelEncoder
lbl = LabelEncoder()
y = lbl.fit_transform(y)
```
Here we separate the dependent and independent variables and label-encode the class names using the LabelEncoder class.
The independent variable (x) consists of the image data, and the dependent variable (y) consists of the labels used to train our face mask detector model.
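As a quick illustration of what LabelEncoder does here (classes are assigned integer indices in alphabetical order, so face_no_mask becomes 0 and face_with_mask becomes 1):

```python
from sklearn.preprocessing import LabelEncoder

lbl = LabelEncoder()
encoded = lbl.fit_transform(['face_with_mask', 'face_no_mask', 'face_with_mask'])

print(list(lbl.classes_))  # classes sorted alphabetically
print(list(encoded))       # original strings mapped to integer indices
```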
Convert data into a NumPy array
```python
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

x = np.array(x).reshape(-1, 50, 50, 1)
x = tf.keras.utils.normalize(x, axis=1)
y = to_categorical(y)
```
Now bring the image data into a standard form by normalizing it, which helps improve model accuracy. The labels are also converted to categorical (one-hot) form, as required for training with categorical cross-entropy.
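to_categorical turns the integer labels into one-hot vectors; the same transform can be sketched in plain NumPy:

```python
import numpy as np

y = np.array([0, 1, 1, 0])  # integer-encoded labels
one_hot = np.eye(2)[y]      # equivalent of to_categorical(y) for 2 classes
print(one_hot)
```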
Training Face Mask Detection Model using TensorFlow/Keras
Now let's come to the most interesting part of machine learning: training our model and making it ready to predict on new inputs.
import TensorFlow libraries
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam
```
We're using the TensorFlow library to train this model, so import all the required packages from TensorFlow and proceed to building the model.
```python
model = Sequential([
    Conv2D(100, (3, 3), input_shape=x.shape[1:], activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(50, activation='relu'),
    Dropout(0.2),
    Dense(2, activation='softmax')
])
```
Our model stacks two Conv2D layers (each followed by MaxPooling2D), a Flatten layer, a Dense hidden layer with ReLU activation and Dropout, and a final Dense output layer using softmax activation with two units, one per class. The input shape is (50, 50, 1), matching our grayscale face crops.
```python
opt = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=1e-5)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x, y, epochs=30, batch_size=5)
```
We have to define an optimizer for our model; here we'll use the Adam optimizer. We then compile the model with this optimizer, the categorical_crossentropy loss function, and accuracy as the evaluation metric during training.
Finally, we train the model with the fit function, passing the training data and labels, with 30 epochs and a batch_size of 5 images per step.
plot training loss and accuracy
```python
N = 30
H = history
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
```
By plotting the training loss and accuracy, we can visualize how our model is training: whether the loss is decreasing, how accuracy evolves, and whether the model is overfitting or underfitting. Based on this, we can make changes to the model to improve accuracy.
Save our trained model so it can be used in platform-based applications, such as Android or web apps. We'll also use this saved model for real-time face mask detection.
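Saving can be done with Keras' model.save; a minimal sketch, in which the filename model.h5 is an assumption chosen to match the load_model('model.h5') call later in this post, and a tiny stand-in model is used in place of the trained CNN:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Tiny stand-in model; in the blog post, `model` is the trained CNN above
model = Sequential([tf.keras.Input(shape=(4,)), Dense(2, activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy')

model.save('model.h5')             # serialize the trained model to disk
restored = load_model('model.h5')  # reload it later for inference
```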
Implementing our face mask detector for images with OpenCV
Now that our face mask detector is trained, let’s learn how we can:
Load an input image from disk
Detect faces in the image
Apply our face mask detector to classify the face as either with_mask or without_mask
Open up the detect_mask_image.py file in your directory structure, and let’s get started:
Complete source code is available here.
```python
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
```
Import pre-trained models
```python
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
classifier = load_model('model.h5')
```
We need to locate faces before we can check them for masks, so load the frontal-face Haar cascade file to detect faces.
To download the Haar cascade XML file, click here.
Load the Haar cascade file using cv2.CascadeClassifier, passing the file name.
Also, load our trained face mask detection model using the load_model function from TensorFlow/Keras.
```python
def recognize(img):
    img = cv2.resize(img, (50, 50))
    x = np.array(img).reshape(-1, 50, 50, 1)
    x = tf.keras.utils.normalize(x, axis=1)
    pred = classifier.predict(x)
    pred = np.argmax(pred, axis=1)
    # with LabelEncoder's alphabetical ordering: 0 = face_no_mask, 1 = face_with_mask
    sol = None
    if pred == 1:
        sol = 'Mask'
    elif pred == 0:
        sol = 'No Mask'
    return sol
```
This function takes a cropped face image as input and returns the model's prediction of whether or not the person is wearing a mask, as the text 'Mask' or 'No Mask'.
```python
# take a sample image for testing the implementation
frame = cv2.imread('test_image.jpg')
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

for face in faces:
    x1, y1, width, height = face
    x2, y2 = x1 + width, y1 + height
    crop_image = gray[y1:y2, x1:x2]
    res = recognize(crop_image)
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.putText(frame, str(res), (x1 + 5, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

cv2.imshow('face detection- Haarcascade', frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
Implementing our face mask detector in real-time video streams with OpenCV
At this point, we know we can apply face mask detection to static images — but what about real-time video streams?
Let’s find out.
Open up the detect_mask_video.py file in your directory structure, and insert the following code:
Complete source code is given here.
```python
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
classifier = load_model('model_new.h5')
vc = cv2.VideoCapture(0)

def recognize(img):
    img = cv2.resize(img, (50, 50))
    x = np.array(img).reshape(-1, 50, 50, 1)
    x = tf.keras.utils.normalize(x, axis=1)
    pred = classifier.predict(x)
    pred = np.argmax(pred, axis=1)
    sol = None
    if pred == 1:
        sol = 'Mask'
    elif pred == 0:
        sol = 'No Mask'
    return sol

while True:
    _, frame = vc.read()
    frame = cv2.flip(frame, 1)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for face in faces:
        x1, y1, width, height = face
        x2, y2 = x1 + width, y1 + height
        crop_image = gray[y1:y2, x1:x2]
        res = recognize(crop_image)
        if res == 'Mask':
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        elif res == 'No Mask':
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
        cv2.putText(frame, str(res), (x1 + 5, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
    cv2.imshow('face detection- Haarcascade', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

vc.release()
cv2.destroyAllWindows()
```
Live video stream:
Here, you can see that our face mask detector is capable of running in real-time (and is correct in its predictions as well).
Suggestions for improvement
As you can see from the results sections above, our face mask detector is working quite well despite:
Having limited training data
The with_mask class containing some artificially generated images.
To improve our face mask detection model further, you should gather actual images (rather than artificially generated images) of people wearing masks.
While our artificial dataset worked well in this case, there’s no substitute for the real thing.
Secondly, you should also gather images of faces that may "confuse" our classifier into thinking a person is wearing a mask when in fact they are not: potential examples include shirts wrapped around faces, bandanas over the mouth, etc.
All of these are examples of something that could be confused as a face mask by our face mask detector.
Finally, you should consider training a dedicated two-class object detector rather than a simple image classifier.
Our current method of detecting whether a person is wearing a mask or not is a two-step process:
Step #1: Perform face detection
Step #2: Apply our face mask detector to each face
The problem with this approach is that a face mask, by definition, obscures part of the face. If enough of the face is obscured, the face cannot be detected, and therefore, the face mask detector will not be applied.
To circumvent that issue, you should train a two-class object detector that consists of a with_mask class and without_mask class.
Combining an object detector with a dedicated with_mask class will improve the model in two respects.
First, the object detector will be able to naturally detect people wearing masks that otherwise would have been impossible for the face detector to detect due to too much of the face being obscured.
Secondly, this approach reduces our computer vision pipeline to a single step: rather than applying face detection and then our face mask detector model, all we need to do is apply the object detector, which gives us bounding boxes for people both with_mask and without_mask in a single forward pass of the network.
Not only is such a method more computationally efficient, but it’s also more “elegant” and end-to-end.