
Hairnet and Mask Detection for Hygiene Compliance


Introduction

In many industries, particularly in food processing, healthcare, and pharmaceuticals, personal protective equipment such as hairnets and face masks is essential for maintaining hygiene and safety standards. Failure to wear this protective gear can lead to contamination, health hazards, and violations of regulatory guidelines. However, manually monitoring every individual to ensure compliance is neither efficient nor always reliable.


This is where artificial intelligence-based hairnet and mask detection systems provide a powerful solution. In this blog, we will discuss hairnet and mask detection and walk through the code to implement it.



Problem Statement

Food safety violations can devastate a restaurant's reputation and put customers at serious risk. According to health authorities, improper use of protective equipment like masks and hairnets is among the most common violations during health inspections. These seemingly simple requirements play a crucial role in preventing food contamination and ensuring customer safety.

Key challenges restaurants face:

  • Staff not consistently following hygiene equipment requirements

  • Difficulty monitoring multiple staff members simultaneously

  • Human oversight limitations during busy periods

  • Documentation challenges for compliance reporting



How AI Detection Systems Work

Modern AI systems use computer vision technology to automatically detect whether kitchen staff are wearing required safety equipment. Using advanced machine learning models like YOLO (You Only Look Once), these systems can identify:

  • Mask compliance: Detecting proper face mask usage

  • Hairnet compliance: Ensuring hair is properly covered

  • Real-time monitoring: Continuous surveillance without human intervention

  • Violation alerts: Immediate notifications when compliance issues occur
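
As a quick illustration of how little code such a system needs (a sketch assuming the ultralytics package and a hypothetical trained weights file named hairnet_mask.pt), detection on a single image takes only a few lines:

from ultralytics import YOLO

# Load a trained detector (hypothetical weights file)
model = YOLO("hairnet_mask.pt")

# Run inference; the results hold boxes, class IDs, and confidence scores
results = model("kitchen_staff.jpg")
results[0].show()  # display the annotated image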



Implementation Benefits for Restaurants

Enhanced Food Safety

Automated monitoring ensures consistent compliance with safety protocols, significantly reducing the risk of food contamination incidents.

Operational Efficiency

Restaurant managers can focus on core operations while the AI system handles continuous monitoring, freeing up human resources for more strategic tasks.

Compliance Documentation

The system automatically generates compliance reports, making health inspections smoother and providing clear documentation of safety practices.

Cost Reduction

Preventing even one food safety incident can save thousands in potential lawsuits, health department fines, and reputation damage.

Staff Training Support

Visual feedback helps reinforce proper safety protocols and identifies staff members who may need additional training.



Practical Applications

Hairnet and mask detection is already in use in many organizations:

  • Food processing plants use it to comply with food safety regulations

  • Hospitals monitor surgical and medical staff to ensure hygiene

  • Pharmaceutical companies employ it to maintain sterile environments

  • Public safety agencies use mask detection during outbreaks or health emergencies



Dataset Used

For this project, we will use a dataset containing images of people with and without hairnets and masks. The dataset includes 107 images in the training set and 10 images in the validation set. It contains four labels (class IDs 0-3 in the configuration below):

  1. NO mask – Person without a face mask

  2. NoHairnet – Person without a hairnet

  3. hairnet – Person wearing a hairnet

  4. mask – Person wearing a face mask



Download the dataset in a format compatible with YOLOv5 or later versions.
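
For reference, YOLO-format datasets pair every image with a plain-text label file in a parallel labels folder, one line per object. A sketch of the expected layout (folder names match the YAML configuration created below):

dataset/
├── train/
│   ├── images/   # training images (.jpg, .png, ...)
│   └── labels/   # one .txt per image: "class x_center y_center width height"
└── valid/
    ├── images/
    └── labels/

Each label line stores normalized coordinates, for example "2 0.512 0.304 0.210 0.185" for a hairnet (class 2) bounding box.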



Implementation

Install and Import the Libraries

First, you will need to install the Ultralytics library. You can do this using the following commands:

In the terminal:

pip3 install ultralytics

In a notebook:

!pip3 install ultralytics

Then import all necessary modules:

import os
import yaml
from ultralytics import YOLO
import numpy as np
import glob

The ultralytics library provides the YOLO implementations, making object detection much more accessible than previous versions.


Creating the Dataset Configuration

YOLO requires a YAML configuration file that tells it where to find training data and what classes to detect:

def create_dataset_yaml(dataset_path, output_yaml_path):
    """Create the dataset.yaml file for the detection"""
    
    abs_dataset_path = os.path.abspath(dataset_path)
    
    dataset_config = {
        'path': abs_dataset_path,
        'train': 'train/images',
        'val': 'valid/images',
        'names': {
            0: 'NO mask',
            1: 'NoHairnet', 
            2: 'hairnet',
            3: 'mask'
        }
    }
    
    with open(output_yaml_path, 'w') as file:
        yaml.dump(dataset_config, file)

This function creates a configuration file that maps class IDs (0-3) to human-readable names and specifies where training and validation images are located.
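
With the class names above, the generated dataset.yaml will look roughly like this (yaml.dump sorts the keys alphabetically; the path value depends on where the dataset lives on your machine):

names:
  0: NO mask
  1: NoHairnet
  2: hairnet
  3: mask
path: /absolute/path/to/dataset
train: train/images
val: valid/images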


The Training Function

The heart of the program is the training function that handles all the machine learning complexity:

def train_detection_model(
    dataset_yaml_path,
    output_dir="runs/train/detection_model",
    epochs=100,
    batch_size=16,
    image_size=640,
    model_size='s',
    patience=20,
    device=None
):

Key Parameters Explained:

  • dataset_yaml_path: Required parameter - path to the YAML config file we created earlier

  • output_dir: Where to save training results and model weights

  • epochs: How many times the model sees the entire dataset (100-200 is typical)

  • batch_size: How many images are processed together (reduce this if you get memory errors)

  • image_size: Resolution images are resized to for training (640 is the YOLO default)

  • model_size: 'n' (nano, fastest), 's' (small), 'm' (medium), 'l' (large), or 'x' (extra large, most accurate), depending on your speed, accuracy, and resource requirements

  • patience: Early stopping mechanism - stops training if validation metrics don't improve for this many epochs (20 here)

  • device: Hardware selection - None lets YOLO auto-detect GPU availability
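
For a quick smoke test before committing to a long run, the defaults can be overridden (a usage sketch with the nano model and a short schedule):

best_model_path = train_detection_model(
    dataset_yaml_path="dataset.yaml",
    epochs=10,        # short run just to verify the pipeline works
    batch_size=8,     # smaller batch for limited GPU memory
    model_size='n'    # nano model trains fastest
)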


Model Training Process

The actual training happens in just a few lines:

# Load a pre-trained model (weights are downloaded automatically if missing)
model = YOLO(f'yolov5{model_size}.pt')

# Train the model
results = model.train(
    data=dataset_yaml_path,
    epochs=epochs,
    imgsz=image_size,
    batch=batch_size,
    name=os.path.basename(output_dir),
    patience=patience,
    device=device,  # None lets Ultralytics pick the best available device
    save=True
)

Ultralytics downloads the pre-trained weights automatically the first time they are requested.

Parameters Mapping:

  • data: Path to YAML config

  • imgsz: Image size (note the abbreviated parameter name)

  • batch: Batch size

  • name: Experiment name for organizing results
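
Once training finishes, Ultralytics saves checkpoints under a runs directory. With recent versions the best weights for this experiment typically land at a path like the one below, though the parent folder varies between versions, so treat this as an assumption to verify against the path printed at the end of training:

# Hypothetical location of the best checkpoint; confirm against the training log
best_weights_path = os.path.join("runs", "detect", os.path.basename(output_dir), "weights", "best.pt")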


Main Execution Flow

The main execution follows this logical sequence:

  1. Validate dataset structure and count files

  2. Create YAML configuration file

  3. Train the model with specified parameters

  4. Save the best performing model
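
The dataset summary printed below relies on file counts gathered up front. One way to collect them with the glob module imported earlier (a sketch, assuming DATASET_PATH points at the dataset root and DATASET_YAML_PATH is where the config should be written):

DATASET_PATH = "dataset"            # assumed dataset root
DATASET_YAML_PATH = "dataset.yaml"  # assumed output location for the config

train_images = len(glob.glob(os.path.join(DATASET_PATH, "train", "images", "*")))
train_labels = len(glob.glob(os.path.join(DATASET_PATH, "train", "labels", "*.txt")))
val_images = len(glob.glob(os.path.join(DATASET_PATH, "valid", "images", "*")))
val_labels = len(glob.glob(os.path.join(DATASET_PATH, "valid", "labels", "*.txt")))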

print(f"Dataset summary:")
print(f"  - Training: {train_images} images, {train_labels} labels") 
print(f"  - Validation: {val_images} images, {val_labels} labels")

# Create dataset YAML
create_safety_dataset_yaml(dataset_path=DATASET_PATH, output_yaml_path=DATASET_YAML_PATH)

# Train the model  
best_model_path = train_detection_model(
    dataset_yaml_path=DATASET_YAML_PATH,
    epochs=200,
    batch_size=16,
    model_size='s'
)

Detection

Function Definition and Parameters

import time

import cv2
from ultralytics import YOLO

def detect_object_in_video(
    model_path,
    video_path,
    output_path=None,
    conf_threshold=0.3,
    iou_threshold=0.45,
    show_fps=True,
    show_labels=True
):
    """
    Perform detection of safety equipment in videos with bounding boxes and labels
    """

Key Parameters Explained:
  • model_path: File path to the trained .pt model from our training phase

  • video_path: Input video file path (supports common formats: MP4, AVI, MOV, etc.)

  • output_path: Where to save the processed video with annotations

  • conf_threshold (0.3): Minimum confidence score to consider a detection valid

    • Lower values (0.1-0.3): More detections, potentially more false positives

    • Higher values (0.5-0.8): Fewer detections, higher precision

  • iou_threshold (0.45): Controls Non-Maximum Suppression for overlapping detections

    • Lower values: More aggressive suppression, fewer overlapping boxes

    • Higher values: Less suppression, potential duplicate detections

  • show_fps/show_labels: Boolean flags for visual display options
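
The snippets that follow assume the model and the video stream have been opened at the top of the function, roughly like this (a sketch of the setup; error handling kept minimal):

# Load the trained detector and open the input video
model = YOLO(model_path)
cap = cv2.VideoCapture(video_path)

if not cap.isOpened():
    raise IOError(f"Could not open video: {video_path}")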


Video Properties and Configuration

# Get video properties
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

print(f"Video properties: {frame_width}x{frame_height} at {fps} FPS, {total_frames} total frames")

# Updated class names, colors, and violation detection settings
class_names = ['NO mask', 'NoHairnet', 'hairnet', 'mask']
colors = {
    'NO mask': (0, 0, 255),     # Red - indicates violation
    'NoHairnet': (0, 69, 255),  # Orange - indicates violation  
    'hairnet': (255, 255, 0),   # Cyan - compliant equipment
    'mask': (0, 255, 0)         # Green - compliant equipment
}

Code Analysis:

  • Video Properties: OpenCV provides metadata through cap.get() with property constants

    • CAP_PROP_FRAME_WIDTH/HEIGHT: Pixel dimensions

    • CAP_PROP_FPS: Frames per second of original video

    • CAP_PROP_FRAME_COUNT: Total number of frames for progress tracking

  • Class Configuration: Maps model output IDs to human-readable names (must match training YAML)

  • Color Coding: BGR format (Blue, Green, Red) used by OpenCV

    • Red/Orange: Violation states (missing equipment)

    • Green/Cyan: Compliance states (equipment present)

  • Strategic Choices: Colors provide instant visual feedback for safety personnel
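
To persist the annotated video, a writer can be created from these properties before the frame loop starts (a sketch assuming an MP4 output; the original function's exact writer settings are not shown):

# Create a writer only when an output path was provided
out = None
if output_path:
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # widely supported codec for .mp4
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))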


Core Detection Loop and Frame Processing

# Running totals for the performance summary
frame_count = 0
total_processing_time = 0.0

# Process video frames
while cap.isOpened():
    # Read frame
    success, frame = cap.read()
    if not success:
        print("End of video or error reading frame")
        break

    # Perform detection and time the inference
    start_time = time.time()
    results = model(frame, conf=conf_threshold, iou=iou_threshold)
    processing_time = time.time() - start_time

    total_processing_time += processing_time
    frame_count += 1

Code Analysis:
  • Video Loop: while cap.isOpened() continues until video ends or encounters error

  • Frame Reading: cap.read() returns tuple: (success_flag, frame_array)

  • Performance Timing:

    • time.time() captures high-precision timestamp

  • Model Inference: model(frame, ...) runs YOLO detection with our specified thresholds

    • Returns detection results with bounding boxes, confidence scores, and class predictions

  • Metrics Tracking: Accumulates statistics for final performance report


Detection Results Processing and Visualization

# Process results
detections = []

for r in results:
    boxes = r.boxes
    
    for box in boxes:
        # Get box coordinates
        x1, y1, x2, y2 = box.xyxy[0]
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        
        # Get confidence and class
        conf = float(box.conf[0])
        cls = int(box.cls[0])
        class_name = class_names[cls]
        
        detections.append({
            'class': class_name,
            'conf': conf,
            'bbox': (x1, y1, x2, y2)
        })
        
        # Draw bounding box
        color = colors[class_name]
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        
        # Create label with class name and confidence
        if show_labels:
            label = f"{class_name}: {conf:.2f}"
            
            # Calculate text size
            (text_width, text_height), _ = cv2.getTextSize(
                label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)
            
            # Draw background rectangle for text
            cv2.rectangle(frame, (x1, y1 - text_height - 10), 
                        (x1 + text_width + 10, y1), color, -1)
            
            # Add text
            cv2.putText(frame, label, (x1 + 5, y1 - 5), 
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

Code Analysis:
  • Results Structure: YOLO returns list of detection objects, each containing boxes

  • Coordinate Extraction: box.xyxy[0] gives (x1, y1, x2, y2) format bounding box coordinates

  • Data Type Conversion: Convert tensor values to standard Python integers/floats

  • Detection Storage: Creates structured list for downstream compliance analysis

  • Bounding Box Drawing: cv2.rectangle() draws colored boxes around detected objects

    • Thickness of 2 pixels provides good visibility without overwhelming

  • Label Creation: Combines class name with confidence score for informative display

  • Text Sizing: cv2.getTextSize() calculates label dimensions for proper background sizing
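
After all boxes are drawn, the loop typically finishes each frame by flagging violations, overlaying the measured speed, and writing the frame out; once the video ends, the handles are released. A sketch of that tail, reusing the variables defined above:

# Flag frames containing a violation class (inside the frame loop)
violations = [d for d in detections if d['class'] in ('NO mask', 'NoHairnet')]
if violations:
    print(f"Frame {frame_count}: {len(violations)} violation(s) detected")

# Overlay the per-frame processing speed
if show_fps and processing_time > 0:
    cv2.putText(frame, f"FPS: {1.0 / processing_time:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

# Write the annotated frame if an output path was given
if out is not None:
    out.write(frame)

# After the loop ends: release resources and report the average speed
cap.release()
if out is not None:
    out.release()
print(f"Average FPS: {frame_count / total_processing_time:.2f}")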


Main Execution and Configuration

if __name__ == "__main__":
    # Set your paths and parameters here
    MODEL_PATH = "models/best_hairnet_mask_combined.pt"
    VIDEO_PATH = "test_image_01.mp4"
    OUTPUT_PATH = "outputs/best_yolov5s_hairnet_mask_combined.mp4"
    
    # Detection parameters
    CONF_THRESHOLD = 0.5  # Confidence threshold (0-1)
    IOU_THRESHOLD = 0.45  # IOU threshold for NMS (0-1)
    
    # Run detection
    detect_object_in_video(
        model_path=MODEL_PATH,
        video_path=VIDEO_PATH,
        output_path=OUTPUT_PATH,
        conf_threshold=CONF_THRESHOLD,
        iou_threshold=IOU_THRESHOLD,
        show_fps=True,
        show_labels=True
    )

Code Analysis:
  • Parameter Tuning: Confidence and IOU thresholds can be adjusted based on use case:

    • High Security: Lower confidence (0.2) to catch more potential violations

    • Low False Alarms: Higher confidence (0.5) for fewer false positives



Result


Full code is available at the following link:



Insights

Here are the key insights about hairnet and mask detection performance:

Top Performing Model

Best Custom Model: YOLOv5s baseline shows strong performance with 0.784 precision, 0.73 recall, and 0.809 mAP50, making it the most reliable custom solution.


Model Performance Comparison

  • YOLOv8s Issues: Despite high precision (0.926), it frequently misidentifies hairnets as "no hairnet," indicating poor real-world reliability.

  • YOLOv11 Variants: Both YOLOv11s and YOLOv11m show consistent performance around 0.75-0.82 precision but don't surpass the YOLOv5s baseline.


Data Augmentation and Enhancement Results

  • Limited Improvement: Data augmentation from 107 to 856 training samples didn't improve performance over the baseline model.

  • Image Processing Techniques: Various transformations (sharpening, cartoon filters, adaptive threshold) failed to enhance detection accuracy meaningfully.

  • Multi-Dataset Training: Combining multiple datasets for comprehensive mask and hairnet detection resulted in lower precision (0.368-0.442) and recall (0.474-0.579); one likely reason is inconsistent quality across the combined datasets.


Full insights are available at the following link:



Get Help When You Need It

Hairnet and mask detection projects can become complex, especially when dealing with real-time video processing, model optimization, or integrating the system into existing restaurant management platforms. Do not hesitate to seek assistance when facing challenges with dataset preparation, model fine-tuning, or deployment in production environments.


If you are looking to implement hairnet and mask detection and need personalized guidance, CodersArts specializes in helping both students and enterprises with computer vision implementations. For students working on academic projects, our team can provide advice on YOLO model customization, video processing optimization, and research methodology. For enterprises seeking production-ready solutions, we offer consultation on system integration, scalability planning, cloud deployment strategies, and custom development tailored to your specific food safety compliance requirements and business operations.


Visit www.codersarts.com or contact us at contact@codersarts.com.









