ResNet50 Detects Morphed Identity Documents with 93% Accuracy Using Transfer Learning
Executive Summary
Problem: Morphing attacks pose a serious threat to passport issuance systems. By blending two faces into a single image, attackers create documents that pass automated face recognition checks for both individuals – enabling identity fraud at borders and security checkpoints. Detecting these attacks reliably requires a model that can identify subtle blending artifacts invisible to most human reviewers.
Approach: A binary classifier was trained to distinguish genuine face images from alpha-blended morphs using transfer learning on a pre-trained ResNet50 architecture. Five hundred faces from the Labeled Faces in the Wild dataset were downloaded, 250 morphed images were generated via 50/50 alpha blending, and the classifier was fine-tuned on the resulting balanced dataset. Grad-CAM visualizations were used to interpret model decision-making and identify failure modes.
Insights: The classifier achieved 93% test accuracy with 98% precision and 89% recall on morphed images, correctly identifying all 46 genuine faces while missing 6 of 54 morphs. Grad-CAM analysis confirmed the model attends to blending artifacts at face edges and texture boundaries rather than spurious background features – but also revealed systematic failures with blurred images, angled faces, and heavy facial hair. Performance is comparable to published academic benchmarks for simple alpha-blended morphs (90–95%), though it would be expected to degrade substantially against more sophisticated landmark-based or GAN-generated attacks.
Significance: Morphing attack detection is an active area of applied computer vision with direct relevance to border security and identity fraud prevention. This project demonstrates that transfer learning with a relatively small dataset (300 training images) can match academic benchmarks for simple morphs and produce a Grad-CAM-interpretable model suitable for secondary screening workflows. The threshold analysis and tiered deployment recommendation translate directly to operational security system design.
Key Findings
- ResNet50 with transfer learning achieved 93% test accuracy, 98% precision, and 89% recall on morphed images with only 300 training images and 10 epochs of fine-tuning on a single GPU.
- The model correctly identified all 46 genuine faces (100% recall) while missing 6 of 54 morphs – a false negative rate of 11%, operationally appropriate for secondary screening but not for a primary automated gate.
- AUC of 0.9935 confirms near-perfect class separation; threshold adjustments can trade false negatives for false positives depending on deployment context.
- Grad-CAM confirmed the model attends to blending artifacts at face edges and skin texture boundaries – not background features – validating the model’s interpretability for operational use.
- Systematic failure modes identified: blurred images, faces angled more than 15 degrees, and heavy facial hair all reduce detection reliability.
Research Question
Can a ResNet50-based binary classifier trained on a small dataset of genuine and alpha-blended morphed face images reliably detect morphing attacks – and what does Grad-CAM analysis reveal about the model’s decision-making and failure modes?
Research Answers
Classification Performance
The classifier achieved 93% test accuracy on the held-out test set. Precision on morphed images was 98% – when the model flags an image as morphed, it is correct 98% of the time. Recall on morphed images was 89% – the model detects 89% of actual morphs. All 46 genuine faces in the test set were correctly classified (100% genuine recall), and 48 of 54 morphs were detected, with 6 missed.
Table 1. Confusion Matrix – ResNet50 Morph Classifier
| | Predicted Genuine | Predicted Morphed |
|---|---|---|
| Actual Genuine | 46 | 0 |
| Actual Morphed | 6 | 48 |
Interpretation: The model is conservative toward genuine faces – it never false-alarms on a legitimate photo. This asymmetry is operationally appropriate for passport issuance, where falsely rejecting a genuine applicant creates inconvenience but falsely accepting a morphed document creates a security breach. The 11% miss rate on morphs would be unacceptable for a high-security primary gate but is workable for secondary screening where human review catches the remainder.
Training and Convergence
Figure 1. Training and Validation Loss and Accuracy Curves

Interpretation: Both training and validation loss decrease steadily across 10 epochs, with validation accuracy plateauing around 93% after epoch 5. The small gap between training and validation curves indicates good generalization with no evidence of overfitting. Rapid convergence reflects the power of transfer learning – ResNet50’s pre-trained ImageNet weights already encode facial feature representations that transfer directly to morph artifact detection.
Discrimination Ability
Figure 2. ROC Curve – ResNet50 Morph Classifier

Interpretation: AUC of 0.9935 indicates near-perfect class separation. At the default threshold of 0.5, the model achieves 89% true positive rate with 0% false positive rate. Lowering the threshold to 0.3 would increase morph detection to approximately 95% but introduce a 5% false alarm rate on genuine faces. Raising the threshold to 0.7 would reduce detection to approximately 80% while keeping false alarms below 1%. The appropriate operating point depends on deployment context – primary gate versus secondary screening – and should be set with domain input from security operations.
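The threshold trade-off described above can be sketched with a small helper. This is a minimal illustration using synthetic scores, not the project's evaluation code; the score distributions are assumptions chosen only to make the example run.

```python
import numpy as np

def tpr_fpr_at_threshold(scores, labels, threshold):
    """True/false positive rates when flagging scores >= threshold as morphs."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)  # 1 = morphed, 0 = genuine
    predicted_morph = scores >= threshold
    tpr = predicted_morph[labels == 1].mean()  # fraction of morphs detected
    fpr = predicted_morph[labels == 0].mean()  # fraction of genuine images flagged
    return tpr, fpr

# Synthetic scores for illustration: morph scores cluster high, genuine low.
rng = np.random.default_rng(0)
labels = np.array([1] * 54 + [0] * 46)
scores = np.concatenate([rng.beta(8, 2, 54), rng.beta(2, 8, 46)])

for t in (0.3, 0.5, 0.7):
    tpr, fpr = tpr_fpr_at_threshold(scores, labels, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Raising the threshold can only shrink the set of flagged images, so TPR and FPR both move down together; the operating point picks where on that curve to sit.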
What the Model Learned
Figure 3. Grad-CAM Attention Maps – Genuine and Morphed Faces

Interpretation: For correctly classified genuine faces, the model attends to natural high-contrast facial landmarks – eyes, nose, and mouth – with attention distributed naturally across the face. For correctly detected morphs, attention shifts to face and hair boundaries and skin texture regions where alpha blending creates unnatural smoothness or discontinuities. This pattern confirms the model has learned to detect morphing artifacts rather than relying on irrelevant image features. Three systematic failure modes were identified: blur (the morphing process adds smoothing, making blurred genuine images resemble morphs), angled faces (profile and off-angle poses reduce the frontal facial features available for analysis), and heavy facial hair (beard texture complexity can mask blending artifacts).
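Grad-CAM itself is straightforward to implement with PyTorch hooks: pool the gradient of the target class score over each channel of a convolutional layer, use those as channel weights, and ReLU the weighted sum. The sketch below is a generic version using a tiny stand-in network rather than the project's ResNet50; the toy model and layer choice are illustrative only (with ResNet50 the target layer would typically be the last convolutional block, e.g. model.layer4).

```python
import torch
import torch.nn as nn

def grad_cam(model, target_layer, image, class_idx):
    """Grad-CAM: weight the target layer's activations by the pooled gradient
    of the target class score, sum over channels, ReLU, and normalize."""
    activations, gradients = {}, {}

    def fwd_hook(module, inp, out):
        activations["value"] = out.detach()

    def bwd_hook(module, grad_in, grad_out):
        gradients["value"] = grad_out[0].detach()

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        logits = model(image)             # (1, num_classes)
        logits[0, class_idx].backward()
    finally:
        h1.remove()
        h2.remove()

    acts = activations["value"]           # (1, C, H, W)
    grads = gradients["value"]            # (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)  # pooled channel gradients
    cam = torch.relu((weights * acts).sum(dim=1))   # (1, H, W)
    return cam[0] / (cam.max() + 1e-8)              # normalize to [0, 1]

# Toy model standing in for ResNet50, purely for a runnable example.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
heatmap = grad_cam(model, model[0], torch.randn(1, 3, 32, 32), class_idx=1)
print(heatmap.shape)  # spatial map at the conv layer's output resolution
```

High values in the resulting map mark the regions driving the "morphed" logit, which is how the edge- and texture-boundary attention patterns above were read off.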
Example Images
Figure 4. Genuine vs. Morphed Face Examples

Interpretation: Top row shows genuine faces from the Labeled Faces in the Wild dataset; bottom row shows 50/50 alpha-blended morphs of randomly selected face pairs. Even to human reviewers, some morphs are difficult to identify – particularly when source faces share similar skin tone, age, and frontal alignment. This illustrates why automated detection is necessary and why image quality standards upstream of the classifier matter: well-matched source faces produce the hardest-to-detect morphs.
Comparison to Published Benchmarks
The classifier's 93% accuracy is within the 90–95% range reported in academic literature for simple alpha-blended morphs, confirming it performs at benchmark level. Performance degrades substantially with more sophisticated attack types: landmark-based morphs achieve 75–85% academic detection rates (expected 65–75% for this model), and GAN-generated morphs achieve 60–70% academic detection rates (expected 50–60% for this model). Operational deployment must be tested against the attack types adversaries actually use – not only simple alpha blending.
Next Steps
- Adversarial robustness testing: Evaluate classifier performance against landmark-based and GAN-generated morphs, which represent more sophisticated and operationally realistic attacks.
- Demographic fairness analysis: Assess detection rates across demographic groups – skin tone, age, sex – to identify disparate performance before any deployment.
- Image quality pre-filtering: Implement upstream quality checks (minimum 600x600 pixels, frontal pose within ±10 degrees, sharpness threshold, lighting uniformity) to reduce failure modes identified by Grad-CAM.
- Ensemble approach: Combine multiple classifiers or operating thresholds to improve robustness and reduce the 11% morph miss rate for higher-security contexts.
- Threshold calibration: Work with security operations to set the detection threshold based on the actual deployment context – primary gate versus secondary screening – rather than the default 0.5.
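The image quality pre-filtering step could be prototyped along these lines. The thresholds, function names, and use of a variance-of-Laplacian sharpness measure are all illustrative assumptions; pose and lighting checks are omitted for brevity and would need a face landmark model.

```python
import numpy as np
from PIL import Image

# Illustrative thresholds – operational values would need calibration on real data.
MIN_SIZE = 600          # minimum width/height in pixels
MIN_SHARPNESS = 100.0   # variance-of-Laplacian cutoff (hypothetical)

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values indicate blur."""
    g = gray.astype(np.float64)
    lap = (-4 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return lap.var()

def passes_quality_checks(path):
    """Return (ok, reason) for an image file; pose/lighting checks omitted."""
    img = Image.open(path)
    if img.width < MIN_SIZE or img.height < MIN_SIZE:
        return False, "image below 600x600"
    gray = np.asarray(img.convert("L"))
    if laplacian_variance(gray) < MIN_SHARPNESS:
        return False, "image too blurred"
    return True, "ok"
```

Rejecting blurred inputs before classification directly targets the blur failure mode identified by Grad-CAM, since morphing artifacts and blur look alike to the model.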
Study Design
Data Source: Face images downloaded from the Labeled Faces in the Wild (LFW) dataset via scikit-learn’s fetch_lfw_people. LFW is a publicly available benchmark dataset of celebrity face photographs collected from the web. Five hundred face images were downloaded; 250 were retained as genuine examples, and 250 morphed images were generated from randomly selected pairs.
Data Handling: Morphed images were generated using 50/50 alpha blending of two randomly selected LFW face images, resized to a consistent resolution. The final dataset of 500 images (250 genuine, 250 morphed) was split 60% training / 20% validation / 20% test using stratified sampling. Standard ImageNet normalization transforms were applied to all images before model input.
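The 50/50 alpha blend described above amounts to a pixel-wise weighted average of two aligned face images. A minimal sketch (the function name and default target size are illustrative, not the project's exact code):

```python
import numpy as np
from PIL import Image

def make_morph(path_a, path_b, alpha=0.5, size=(224, 224)):
    """Blend two face images pixel-wise: morph = alpha*A + (1-alpha)*B."""
    a = np.asarray(Image.open(path_a).convert("RGB").resize(size), dtype=np.float32)
    b = np.asarray(Image.open(path_b).convert("RGB").resize(size), dtype=np.float32)
    blended = alpha * a + (1.0 - alpha) * b
    return Image.fromarray(blended.astype(np.uint8))
```

Blending in float and rounding once at the end avoids the banding that per-channel integer averaging can introduce.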
Analytical Approach:
- Downloaded 500 LFW face images and generated 250 alpha-blended morphed images to create a balanced binary classification dataset.
- Built a custom PyTorch Dataset class to handle image loading, labeling, and augmentation transforms.
- Loaded pre-trained ResNet50 (ImageNet weights), froze early layers, and replaced the final classification head with a custom two-class head (2048 → 512 → 2).
- Fine-tuned the final 20 layers using Adam optimizer (lr=0.0001), cross-entropy loss, batch size 32, for 10 epochs on Google Colab (Tesla T4 GPU).
- Evaluated on held-out test set using accuracy, precision, recall, confusion matrix, and ROC/AUC.
- Applied Grad-CAM to visualize model attention on correctly and incorrectly classified examples to interpret decision-making and identify failure modes.
- Compared results to published academic benchmarks for alpha-blended, landmark-based, and GAN-generated morphs.
Project Resources
Repository: github.com/kchoover14/morph-attack-detection-resnet
Data: Face images from the Labeled Faces in the Wild dataset, downloaded via scikit-learn. Morphed images generated from LFW pairs using alpha blending. Not included in this repository.
Code:
morph_attack_project.ipynb – full pipeline: data download, morph generation, model training, evaluation, Grad-CAM visualization
Project Artifacts:
- Figures (n=4): training and validation curves, ROC curve, Grad-CAM attention maps, genuine vs. morphed examples
Environment:
requirements.txt – install pinned Python package versions with pip install -r requirements.txt
License:
- Code and scripts © Kara C. Hoover, licensed under the MIT License.
- Data, figures, and written content © Kara C. Hoover, licensed under CC BY-NC-SA 4.0.
Tools & Technologies
Languages: Python
Tools: PyTorch
Packages: numpy | pandas | matplotlib | seaborn | Pillow | opencv-python | tqdm | torch | torchvision | scikit-learn
Expertise
Domain Expertise: computer vision | deep learning | transfer learning | binary classification | model interpretability | identity document security
Transferable Expertise: Demonstrates ability to design and evaluate a deep learning pipeline from data generation through deployment recommendation, using explainability tools (Grad-CAM) to translate model behavior into operationally actionable quality standards and threshold guidance.