An Empirical Evaluation of DragGAN’s Efficacy Across Distinct Subject Categories

Publication Date : May-06-2026

DOI: 10.70251/HYJR2348.4316

Author(s) : Sihoon Kim

Volume/Issue : Volume 4, Issue 3 (May 2026)

Abstract :

Generative Adversarial Networks (GANs) have reshaped the landscape of synthetic media, enabling the creation of hyper-realistic imagery through adversarial learning. Within this domain, DragGAN has emerged as a notable innovation, offering intuitive point-based manipulation of generated images by moving user-specified handle points to target spatial locations. Despite its qualitative success in published demonstrations, a rigorous quantitative evaluation of its performance across semantic categories remains absent from the literature. This study addresses that gap by assessing DragGAN’s efficacy in maintaining structural integrity and generating diverse outputs across four distinct subject categories: human faces, dog faces, cat faces, and whole dog bodies. Using a curated dataset of images generated from pre-trained StyleGAN2 checkpoints (FFHQ, AFHQ, and a Self-Distilled StyleGAN body model), the Structural Similarity Index (SSIM) was applied to measure fidelity, and a decomposed Inception Score (IS) was used to evaluate perceptual quality and diversity. All categories exhibited substantial structural degradation under point-based manipulation, with mean SSIM scores ranging from 0.21 (cat faces) to 0.33 (dog bodies). The full-body dog category achieved the highest structural preservation, while the facial categories, particularly cat and dog faces, showed the greatest degradation. Decomposed Inception Score analysis indicated consistently low classifier confidence across all categories, a pattern attributable to domain mismatch between the generated subjects and the ImageNet-trained Inception-v3 classifier. These findings establish a quantitative baseline indicating that DragGAN’s point-based manipulation introduces significant structural distortion across all tested domains, with relative performance differences suggesting that full-body manipulation may be more tractable than fine-grained facial editing.
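
The abstract does not spell out how the Inception Score was decomposed. As a minimal sketch, assuming the standard identity IS = exp(H(y) - H(y|x)), the score can be split into a diversity term (entropy of the average class prediction) and a confidence term (average entropy of each image's prediction, where higher values mean a less confident classifier). The function name and the `probs` input (one row of softmax class probabilities per image, for example from an ImageNet-trained Inception-v3) are hypothetical and are not taken from the study.

```python
import numpy as np


def decomposed_inception_score(probs, eps=1e-12):
    """Split the Inception Score into diversity and confidence terms.

    probs: array of shape (n_images, n_classes), each row a class
    probability vector for one image (assumed input, see lead-in).
    Returns (score, marginal_entropy, conditional_entropy).
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Renormalise each row so it sums to 1.
    probs = probs / probs.sum(axis=1, keepdims=True)

    # H(y): entropy of the mean prediction; larger means more diverse outputs.
    marginal = probs.mean(axis=0)
    marginal_entropy = -np.sum(marginal * np.log(marginal + eps))

    # H(y|x): mean per-image entropy; larger means lower classifier confidence.
    conditional_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

    # IS = exp(H(y) - H(y|x)), equal to exp of the mean KL(p(y|x) || p(y)).
    score = float(np.exp(marginal_entropy - conditional_entropy))
    return score, float(marginal_entropy), float(conditional_entropy)
```

Under this decomposition, a set of images that the classifier cannot recognise yields a conditional entropy close to the marginal entropy and a score near 1, which is the behaviour one would expect when the generated subjects fall outside the classifier's training domain.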

American Journal of Student Research®