From Sketch to Image: A GAN-Based Forensic Reconstruction System Using Text Prompts
Abstract
Facial sketch-to-image synthesis is a crucial task in forensic investigations, digital art, and human-computer interaction. This research presents a Generative Adversarial Network (GAN)-based model that converts facial sketches into high-fidelity, photorealistic images guided by textual descriptions. The proposed approach integrates Contrastive Language-Image Pretraining (CLIP) to enhance textual feature extraction, ensuring detailed and accurate facial reconstructions. A refined encoder-to-StyleGAN pipeline is employed to generate images with structural coherence and perceptual realism. The model is trained on the Multi-Modal-CelebA-HQ dataset, comprising 19,923 paired sketches, images, and textual descriptions. Performance is evaluated using standard image similarity metrics, including the L2 Norm and the Structural Similarity Index (SSIM). Experimental results show that the proposed method achieves a high SSIM score (0.788) and a low L2 Norm (89.68), indicating strong structural similarity and fine-detail preservation. Despite these promising results, limitations remain, such as the model's dependency on textual prompts, potential bias in generated features, and computational constraints. Future work will explore multi-modal enhancements, such as incorporating audio cues or additional image inputs, as well as adopting self-attention mechanisms and transformer-based architectures for improved image synthesis. This study contributes to advancements in AI-driven face reconstruction with applications in forensics, digital art, and the entertainment industry.
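To make the evaluation protocol concrete, the following is a minimal, illustrative Python sketch (not the authors' code) of how SSIM and an L2 distance can be computed between a generated image and its ground-truth photograph, assuming both are loaded as same-sized RGB arrays; file paths, the image size, and the L2 normalization are assumptions, and the paper's exact normalization behind its reported L2 Norm of 89.68 may differ.

```python
# Illustrative evaluation sketch: SSIM and L2 distance for a generated/reference pair.
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity as ssim


def evaluate_pair(generated_path: str, reference_path: str, size=(256, 256)):
    # Load both images as RGB and resize them to a common resolution.
    gen = np.asarray(Image.open(generated_path).convert("RGB").resize(size))
    ref = np.asarray(Image.open(reference_path).convert("RGB").resize(size))

    # SSIM over the RGB channels (channel_axis requires scikit-image >= 0.19).
    ssim_score = ssim(gen, ref, channel_axis=-1, data_range=255)

    # A simple Euclidean (L2) distance on pixels scaled to [0, 1]; the exact
    # normalization used for the paper's reported L2 Norm may differ.
    l2 = np.linalg.norm(gen.astype(np.float64) / 255.0 - ref.astype(np.float64) / 255.0)

    return ssim_score, l2


# Example usage with placeholder paths:
# ssim_score, l2 = evaluate_pair("generated/0001.png", "celebahq/0001.jpg")
```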