Facial Expression Recognition Using Deep Learning
University of Science and Technology of Hanoi (USTH)
December 2024
Research Overview
01
Introduction
Problem statement and objectives for emotion classification
02
Methodology
Data processing pipeline and model architectures
03
Results
Performance metrics and comparative analysis
04
Conclusion
Key findings and future directions
Problem Statement
Objective: Classify facial expressions into 7 emotion categories using Convolutional Neural Networks (CNNs) on the Cohn-Kanade (CK+) dataset.
The Seven Emotion Classes:
  • Neutral
  • Anger
  • Disgust
  • Fear
  • Happiness
  • Sadness
  • Surprise
Data Processing Pipeline
Load CK+
Import dataset images
Crop Faces
Using landmark detection
Resize
64×64 pixels
Normalize
Scale pixel values
Split Data
70% train / 30% test
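The last two steps of the pipeline (normalization and the 70/30 split) can be sketched in plain NumPy. The array shapes, the toy data, and the fixed seed below are illustrative assumptions, not the project's actual code:

```python
import numpy as np

def normalize_and_split(images, labels, train_frac=0.7, seed=0):
    """Scale pixel values to [0, 1], then shuffle into a train/test split."""
    x = images.astype(np.float32) / 255.0   # normalize pixel values
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))           # shuffle before splitting
    n_train = int(train_frac * len(x))
    train, test = idx[:n_train], idx[n_train:]
    return x[train], labels[train], x[test], labels[test]

# toy example: 10 grayscale faces already cropped and resized to 64x64
faces = np.random.randint(0, 256, size=(10, 64, 64), dtype=np.uint8)
emotions = np.arange(10) % 7
x_tr, y_tr, x_te, y_te = normalize_and_split(faces, emotions)
```

Shuffling before the split keeps the per-subject image sequences from all landing in one partition.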
Handling Class Imbalance
The Problem
The imbalanced dataset biases the model toward predicting the majority class: neutral expressions alone account for 64.2% of the samples.
The Solution
Compute class weights inversely proportional to class frequency using the formula:
w_i = N / (C × n_i)
where w_i is the weight for class i, N the total number of samples, C the number of classes, and n_i the number of samples in class i.
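The inverse-frequency formula can be computed directly from a label array (this is the same weighting scikit-learn calls "balanced"). The toy label counts below are illustrative, not the CK+ distribution:

```python
import numpy as np

def class_weights(labels, num_classes):
    """w_i = N / (C * n_i): inverse-frequency weights for imbalanced classes."""
    counts = np.bincount(labels, minlength=num_classes)  # n_i per class
    n_total = len(labels)                                # N
    return n_total / (num_classes * counts)

# toy example: class 0 outnumbers class 1 three to one,
# so class 1 receives three times the weight (~0.667 vs 2.0)
w = class_weights(np.array([0, 0, 0, 1]), num_classes=2)
```

Passing these weights to the loss function makes each class contribute roughly equally to the gradient, countering the neutral-class dominance.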
Computed Weights (CK+ Dataset)
Baseline Model: AAM + SVM
Active Appearance Model
Statistical model combining shape and texture analysis with PCA-based dimensionality reduction
SVM Classifier
Support Vector Machine for final classification decision
0.990
AUC Score
93.8%
Accuracy
93.4%
F1 Score
93.7%
Precision
CNN Architecture: Modified LeNet-5
Layer Structure
Total Parameters: 6,429,577
Optimizer: Adam (learning rate = 0.0005)
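The 6,429,577 total is consistent with the listed layer structure (two 5×5 conv blocks of 20 and 50 filters, Dense(500), Dense(7)) on 64×64 grayscale input, provided the convolutions use 'same' padding with 2×2 max pooling. The padding choice is an inference from the parameter count, not stated in the slides:

```python
# Per-layer parameter count for the modified LeNet-5, assuming a 64x64x1
# input, 'same' convolution padding, and 2x2 max pooling after each block.
conv1 = 20 * (5 * 5 * 1 + 1)        # 20 filters over 1 channel  -> 520
conv2 = 50 * (5 * 5 * 20 + 1)       # 50 filters over 20 channels -> 25,050
flat = (64 // 2 // 2) ** 2 * 50     # 16 x 16 x 50 = 12,800 units after two pools
fc1 = flat * 500 + 500              # Dense(500)                  -> 6,400,500
out = 500 * 7 + 7                   # Dense(7) softmax head       -> 3,507
total = conv1 + conv2 + fc1 + out
print(total)  # 6429577, matching the reported total
```

Note that the Dense(500) layer holds over 99% of the parameters, which is why the flattened feature-map size (and hence the padding/pooling scheme) dominates the total.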
LeNet-5 Training Performance
Training Accuracy
97.94%
Excellent learning on training data
Test Accuracy
92.99%
Strong generalization to unseen data
Validation Trend
Follows training closely, indicating proper learning without severe overfitting
Model Performance Comparison
Results Summary
Why CNN V3 Failed
  • Too many parameters (∼1.5M)
  • CK+ dataset too small (631 training samples)
  • Severe overfitting
Why LeNet-5 Succeeded
  • Modernized architecture with ReLU activation
  • Dropout regularization (0.25, 0.5)
  • Good train/validation balance
Key Findings from Emotion Recognition
Architecture Design Matters
For small datasets like CK+ (631 training samples), proper architecture design prevents overfitting and ensures generalization.
LeNet-5 Performance
Achieved 92.99% test accuracy, comparable to AAM+SVM baseline (93.8%), demonstrating CNN effectiveness.
Overfitting Prevention
Key factors include appropriate dropout rates (0.25, 0.5), proper learning rate selection, and regularization techniques.
Exercise 5: Identity Recognition Challenge
New Objective
Classify faces by subject identity rather than emotion, a fundamentally different challenge with its own complexities.
Key Differences from Emotion Recognition
  • More classes: 123 subjects vs 7 emotions
  • Variation: Same person with different expressions
  • Similarity: Different people may look alike
  • Label extraction: subject ID parsed from the filename prefix (the leading S### identifier)
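A minimal sketch of the label-extraction step, assuming CK+'s usual `S###_###_########.png` file naming; the exact filenames are an assumption here:

```python
import re

def subject_id(filename):
    """Extract the subject label from a CK+-style filename,
    e.g. 'S005_001_00000011.png' -> '005'."""
    m = re.match(r"S(\d+)_", filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m.group(1)

print(subject_id("S005_001_00000011.png"))  # 005
```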
Dataset Statistics
920
Total Samples
644
Training
276
Testing
123
Subjects
Average training samples per subject: ~5.2 (644 ÷ 123), leaving very limited data per class
Architecture Evolution: Emotion vs Identity
Exercise 4: Emotion Recognition
Architecture (LeNet-based):
  • Conv2D(20, 5×5) + ReLU + MaxPool + Dropout(0.25)
  • Conv2D(50, 5×5) + ReLU + MaxPool + Dropout(0.25)
  • Flatten → Dense(500) + ReLU + Dropout(0.5)
  • Dense(7) + Softmax
Task Characteristics:
  • 7 classes (emotions)
  • ∼100+ samples per class
  • Lower class imbalance
  • Same person, different expressions
Exercise 5: Identity Recognition
Architecture (Enhanced LeNet):
  • Conv2D(20, 5×5) + ReLU + MaxPool + Dropout(0.25)
  • Conv2D(50, 5×5) + ReLU + MaxPool + Dropout(0.25)
  • Conv2D(100, 3×3) + ReLU + MaxPool + Dropout(0.3) ← Extra layer!
  • Flatten → Dense(500) + ReLU + Dropout(0.5)
  • Dense(123) + Softmax
Task Characteristics:
  • 123 classes (subjects)
  • ∼5.2 samples per class
  • Higher complexity
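The spatial sizes flowing through the three convolutional blocks can be traced with simple arithmetic; as with the emotion model, 'same' padding and 2×2 pooling are assumptions about the implementation rather than stated facts:

```python
# Trace the feature-map side length through three conv+pool blocks.
# With 'same' padding each conv keeps the size and each 2x2 max pool
# halves it: 64 -> 32 -> 16 -> 8.
side, channels = 64, 1
for filters in (20, 50, 100):   # the three Conv2D blocks
    side //= 2                  # conv preserves size; pooling halves it
    channels = filters
flatten_units = side * side * channels  # input width of Dense(500)
print(side, flatten_units)  # 8 6400
```

The extra 100-filter block halves the flattened size again (6,400 vs 12,800 units), shrinking the dense layer that dominates the parameter count while adding representational depth for the 123-way task.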
Identity Recognition: Outstanding Results
95%
Test Accuracy
Top-1 classification performance
92%
Precision
Macro average across all classes
93%
Recall
Macro average detection rate
94%
F1-Score
Weighted average performance
Key Achievement: The model achieved 95% accuracy on identity recognition and generalizes remarkably well despite limited samples per subject (5.2 average). This demonstrates the effectiveness of the enhanced architecture with three convolutional blocks.
Conclusions & Future Directions
Emotion Recognition Success
Modified LeNet-5 achieved 92.99% test accuracy on 7-class emotion classification, matching traditional AAM+SVM baseline performance.
Identity Recognition Excellence
Successfully adapted emotion recognition CNN for identity recognition, achieving 95% test accuracy on challenging 123-class problem with limited data.
Architecture Insights
LeNet-inspired architecture with 3 convolutional blocks proved highly effective. Class weighting and regularization crucial for handling imbalanced, limited data.
Key Takeaway
Proper architecture design, regularization techniques, and class balancing strategies enable CNNs to excel even with small datasets.

Thank You!
Questions?