
Hasan Iqbal
Computer Vision, Generative AI, AR/VR

Detroit, USA
hasan{.}iqbal{.}cs{@}wayne{.}edu

I am a Ph.D. candidate at Wayne State University, specializing in cutting-edge research on Generative AI, Diffusion Models, 3D Computer Vision, Neural Radiance Fields (NeRFs), 3D Gaussian Splatting, and Virtual Reality. My work pushes the boundaries of 3D content generation and visualization, focusing on innovative techniques that blend AR/VR with advanced 3D modeling.

In addition to my research, I have co-authored several impactful publications, including work on anomaly detection using diffusion models, communication-efficient federated learning, and text-driven 3D scene editing. These contributions have been presented at top-tier conferences, including the European Conference on Computer Vision (ECCV).

Before my Ph.D., I worked as an Image Algorithm Engineer in Shanghai, tackling complex Computer Vision challenges like counterfeit detection and OCR with deep learning. I hold a Master’s degree from Tsinghua University and a Bachelor’s from the National University of Sciences and Technology, both of which provided a strong foundation for my current work in 3D Computer Vision and Generative AI.

Curriculum Vitae

External Links

Industry

Rocket Companies
Detroit, MI
Research Intern
May 2024 - Present

Hainan PPK Co. Ltd.
Shanghai, China
Image Algorithm Engineer
2018-2021

Pacsquare Tech Pvt. Ltd.
Islamabad, Pakistan
Software Engineer
2015-2016

Techlogix Pvt. Ltd.
Islamabad, Pakistan
Software Intern
2014

Academia

PhD
Wayne State University
Detroit, USA
2021 - Present

MS
Tsinghua University
Beijing, China
2016-2018

BE
NUST
Islamabad, Pakistan
2011-2015

Selected Recent Publications

  • Umar Khalid*, Hasan Iqbal*, Nazmul Karim, Muhammad Tayyab, Jing Hua and Chen Chen (*Equal contribution). LatentEditor: Text Driven Local Editing of 3D Scenes. ECCV 2024 (Conference).
    While neural fields have made significant strides in view synthesis and scene reconstruction, editing them poses a formidable challenge due to their implicit encoding of geometry and texture information from multi-view inputs. In this paper, we introduce LatentEditor, an innovative framework designed to empower users with the ability to perform precise and locally controlled editing of neural fields using text prompts. Leveraging denoising diffusion models, we successfully embed real-world scenes into the latent space, resulting in a faster and more adaptable NeRF backbone for editing compared to traditional methods. To enhance editing precision, we introduce a delta score to calculate the 2D mask in the latent space that serves as a guide for local modifications while preserving irrelevant regions. Our novel pixel-level scoring approach harnesses the power of InstructPix2Pix (IP2P) to discern the disparity between IP2P conditional and unconditional noise predictions in the latent space. The edited latents conditioned on the 2D masks are then iteratively updated in the training set to achieve 3D local editing. Our approach achieves faster editing speeds and superior output quality compared to existing 3D editing models, bridging the gap between textual instructions and high-quality 3D scene editing in latent space. We show the superiority of our approach on four benchmark 3D datasets, LLFF, IN2N, NeRFStudio and NeRF-Art. Project website: https://latenteditor.github.io/
    @inproceedings{khalid2025latenteditor,
      title={LatentEditor: text driven local editing of 3D scenes},
      author={Khalid, Umar and Iqbal, Hasan and Karim, Nazmul and Tayyab, Muhammad and Hua, Jing and Chen, Chen},
      booktitle={European Conference on Computer Vision},
      pages={364--380},
      year={2025},
      organization={Springer}
    }
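    A minimal, illustrative sketch of the delta-score idea from the abstract above: derive a 2D edit mask from the gap between conditional and unconditional noise predictions, then blend edited and original latents so unrelated regions are preserved. The function names, tensor shapes, and threshold are placeholders, not the paper's released implementation.
    import torch

    def delta_mask(eps_cond, eps_uncond, threshold=0.5):
        """eps_* are (B, C, H, W) noise predictions in latent space (e.g., from an IP2P-style model)."""
        delta = (eps_cond - eps_uncond).abs().mean(dim=1, keepdim=True)        # per-pixel disparity
        delta = (delta - delta.amin()) / (delta.amax() - delta.amin() + 1e-8)  # normalize to [0, 1]
        return (delta > threshold).float()                                     # binary 2D mask

    def masked_update(z_orig, z_edit, mask):
        # Keep original latents where the mask is 0, take edited latents where it is 1.
        return mask * z_edit + (1.0 - mask) * z_orig

    # Toy usage with random tensors standing in for model outputs.
    eps_c, eps_u = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
    z0, z1 = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
    z_new = masked_update(z0, z1, delta_mask(eps_c, eps_u))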
  • Nazmul Karim*, Hasan Iqbal*, Umar Khalid, Chen Chen and Jing Hua (*Equal contribution). Free-Editor: Zero-shot Text-driven 3D Scene Editing. ECCV 2024 (Conference).
    Text-to-Image (T2I) diffusion models have recently gained traction for their versatility and user-friendliness in 2D content generation and editing. However, training a diffusion model specifically for 3D scene editing is challenging due to the scarcity of large-scale datasets. Currently, editing 3D scenes necessitates either retraining the model to accommodate various 3D edits or developing specific methods tailored to each unique editing type. Moreover, state-of-the-art (SOTA) techniques require multiple synchronized edited images from the same scene to enable effective scene editing. Given the current limitations of T2I models, achieving consistent editing effects across multiple images remains difficult, leading to multi-view inconsistency in editing. This inconsistency undermines the performance of 3D scene editing when these images are utilized. In this study, we introduce a novel, training-free 3D scene editing technique called Free-Editor, which enables users to edit 3D scenes without the need for model retraining during the testing phase. Our method effectively addresses the issue of multi-view style inconsistency found in state-of-the-art (SOTA) methods through the implementation of a single-view editing scheme. Specifically, we demonstrate that editing a particular 3D scene can be achieved by modifying only a single view. To facilitate this, we present an Edit Transformer that ensures intra-view consistency and inter-view style transfer using self-view and cross-view attention mechanisms, respectively. By eliminating the need for model retraining and multi-view editing, our approach significantly reduces editing time and memory resource requirements, achieving runtimes approximately 20 times faster than SOTA methods. We have performed extensive experiments on various benchmark datasets, showcasing the diverse editing capabilities of our proposed technique. Project website: https://free-editor.github.io/
    @inproceedings{karim2025free,
      title={Free-editor: zero-shot text-driven 3D scene editing},
      author={Karim, Nazmul and Iqbal, Hasan and Khalid, Umar and Chen, Chen and Hua, Jing},
      booktitle={European Conference on Computer Vision},
      pages={436--453},
      year={2025},
      organization={Springer}
    }
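    A hypothetical illustration of the self-view/cross-view attention described above: self-attention keeps a target view internally consistent, while cross-attention pulls the editing style from the single edited source view. The module name, dimensions, and layout are assumptions for illustration, not the released Free-Editor code.
    import torch
    import torch.nn as nn

    class ToyEditBlock(nn.Module):
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

        def forward(self, target_tokens, edited_src_tokens):
            # Intra-view consistency: the target view attends to itself.
            q = self.norm1(target_tokens)
            x = target_tokens + self.self_attn(q, q, q)[0]
            # Inter-view style transfer: the target view attends to the edited source view.
            x = x + self.cross_attn(self.norm2(x), edited_src_tokens, edited_src_tokens)[0]
            return x

    tokens = torch.randn(1, 1024, 256)   # feature tokens of a target view
    edited = torch.randn(1, 1024, 256)   # tokens from the single edited view
    out = ToyEditBlock()(tokens, edited)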
  • Umar Khalid*, Hasan Iqbal*, Nazmul Karim, Azib Farooq, Chen Chen and Jing Hua (*Equal contribution). 3DEgo: 3D Editing on the Go! ECCV 2024 (Conference).
    We introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the dataset with edited images to achieve a 3D scene with text fidelity. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization. We apply a diffusion model to edit video frames prior to 3D scene creation by incorporating our designed noise blender module for enhancing multi-view editing consistency, a step that does not require additional training or fine-tuning of T2I diffusion models. 3DEgo utilizes 3D Gaussian Splatting to create 3D scenes from the multi-view consistent edited frames, capitalizing on the inherent temporal continuity and explicit point cloud data. 3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources, as validated by extensive evaluations on six datasets, including our own prepared GS25 dataset. Project website: https://3dego.github.io/
    @inproceedings{khalid20253dego,
      title={3DEgo: 3D Editing on the Go!},
      author={Khalid, Umar and Iqbal, Hasan and Farooq, Azib and Hua, Jing and Chen, Chen},
      booktitle={European Conference on Computer Vision},
      pages={73--89},
      year={2025},
      organization={Springer}
    }
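    A hedged sketch of the noise-blending idea mentioned above: mixing per-frame diffusion noise with a shared anchor component so that edits stay correlated across video frames. The blending weight and rescaling are illustrative assumptions; the paper's noise blender module may be defined differently.
    import torch

    def blend_frame_noise(num_frames, shape, alpha=0.7):
        """Return (num_frames, *shape) noise with a component shared by all frames."""
        anchor = torch.randn(shape)                       # shared across every frame
        per_frame = torch.randn((num_frames, *shape))     # independent per-frame noise
        mixed = alpha * anchor + (1.0 - alpha) * per_frame
        # Rescale so each frame's noise keeps (approximately) unit variance.
        return mixed / (alpha ** 2 + (1.0 - alpha) ** 2) ** 0.5

    noise = blend_frame_noise(num_frames=8, shape=(4, 64, 64))   # e.g., latent noise for 8 frames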
  • Hasan Iqbal, Umar Khalid, Chen Chen, Jing Hua. Unsupervised Anomaly Detection in Medical Images Using Masked Diffusion Model. 14th International Conference on Machine Learning in Medical Imaging (MLMI), 2023 (Conference).
    It can be challenging to identify brain MRI anomalies using supervised deep-learning techniques due to anatomical heterogeneity and the requirement for pixel-level labeling. Unsupervised anomaly detection approaches provide an alternative solution by relying only on sample-level labels of healthy brains to generate a desired representation to identify abnormalities at the pixel level. Although generative models are crucial for generating such anatomically consistent representations of healthy brains, accurately generating the intricate anatomy of the human brain remains a challenge. In this study, we present a method called the masked-denoising diffusion probabilistic model (mDDPM), which introduces masking-based regularization to reframe the generation task of diffusion models. Specifically, we introduce Masked Image Modeling (MIM) and Masked Frequency Modeling (MFM) in our self-supervised approach that enables models to learn visual representations from unlabeled data. To the best of our knowledge, this is the first attempt to apply MFM in denoising diffusion probabilistic models (DDPMs) for medical applications. We evaluate our approach on datasets containing tumors and multiple sclerosis lesions and exhibit the superior performance of our unsupervised method as compared to the existing fully/weakly supervised baselines. Project website: https://mddpm.github.io/
    @inproceedings{iqbal2023unsupervised,
      title={Unsupervised anomaly detection in medical images using masked diffusion model},
      author={Iqbal, Hasan and Khalid, Umar and Chen, Chen and Hua, Jing},
      booktitle={International Workshop on Machine Learning in Medical Imaging},
      pages={372--381},
      year={2023},
      organization={Springer}
    }
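    An illustrative sketch of the two masking-based augmentations named in the abstract (Masked Image Modeling and Masked Frequency Modeling); the patch size, mask ratios, and how the masks enter the DDPM training loop are placeholders rather than the paper's exact recipe.
    import torch

    def mask_image_patches(x, patch=16, ratio=0.3):
        """MIM-style masking: zero out a random subset of non-overlapping patches (H, W divisible by patch)."""
        b, c, h, w = x.shape
        keep = (torch.rand(b, 1, h // patch, w // patch, device=x.device) > ratio).float()
        keep = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
        return x * keep

    def mask_frequencies(x, ratio=0.1):
        """MFM-style masking: drop random bins of the 2D frequency spectrum."""
        spectrum = torch.fft.fft2(x)
        keep = (torch.rand_like(x) > ratio).float()
        return torch.fft.ifft2(spectrum * keep).real

    x = torch.rand(2, 1, 128, 128)                        # toy stand-in for an MRI batch
    x_masked = mask_frequencies(mask_image_patches(x))    # masked input for the denoiser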