Follow-Your-Shape (EditAnyShape): Shape-Aware Image Editing via Trajectory-Guided Region Control

ICLR 2026

1HKUST, 2University of Illinois at Urbana-Champaign, 3Shanghai Jiao Tong University
* Equal contribution    ✉ Corresponding author
Teaser Image

Follow-Your-Shape enables flexible modification of arbitrary object shapes while strictly preserving non-target content. The examples demonstrate both single-object and multi-object cases involving significant shape transformations.

💭Abstract

While recent flow-based image editing models demonstrate general-purpose capabilities across diverse tasks, they often struggle to specialize in challenging scenarios — particularly those involving large-scale shape transformations. When performing such structural edits, these methods either fail to achieve the intended shape change or inadvertently alter non-target regions, resulting in degraded background quality. We propose Follow-Your-Shape, a training-free and mask-free framework that supports precise and controllable editing of object shapes while strictly preserving non-target content. Motivated by the divergence between inversion and editing trajectories, we compute a Trajectory Divergence Map (TDM) by comparing token-wise velocity differences between the inversion and denoising paths. The TDM enables precise localization of editable regions and guides a Scheduled KV Injection mechanism that ensures stable and faithful editing. To facilitate a rigorous evaluation, we introduce ReShapeBench, a new benchmark comprising 120 new images and enriched prompt pairs specifically curated for shape-aware editing. Experiments demonstrate that our method achieves superior editability and visual fidelity, particularly in tasks requiring large-scale shape replacement.
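The Trajectory Divergence Map in the abstract can be pictured as a token-wise comparison of two velocity fields. The sketch below, in numpy, shows one plausible form of that computation; the function name, array shapes, aggregation over timesteps, and min-max normalization are illustrative assumptions, not the paper's released implementation:

```python
import numpy as np

def trajectory_divergence_map(v_inv, v_edit, tau=0.5):
    """Token-wise divergence between inversion and denoising velocities.

    v_inv, v_edit: arrays of shape (T, N, D) -- T timesteps, N image
    tokens, D channels. Returns a soft map over tokens in [0, 1] and a
    binary mask; high values mark regions the edit prompt wants to change.
    """
    # Per-token L2 distance between the two velocity fields at each step.
    diff = np.linalg.norm(v_inv - v_edit, axis=-1)        # (T, N)
    # Aggregate over timesteps, then normalize to [0, 1].
    tdm = diff.mean(axis=0)                               # (N,)
    tdm = (tdm - tdm.min()) / (tdm.max() - tdm.min() + 1e-8)
    # Tokens whose divergence exceeds tau are treated as editable.
    return tdm, (tdm > tau).astype(np.float32)
```

Tokens whose velocities agree along both trajectories score near zero and are left untouched, which is what localizes the edit without any user-supplied mask.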

🧠Pipeline

Pipeline

The editing process is divided into three stages.

Stage 1: We stabilize the initial denoising trajectory by injecting key-value (KV) features from the inversion path into the denoising model during its first steps.

Stage 2: We compute a Trajectory Divergence Map (TDM) by comparing the denoising trajectories produced by the source and edit prompts, then process this map to precisely localize the regions intended for editing.

Stage 3: We perform the final edit. Guided by the TDM, blended KV features are injected into the final attention blocks of the denoising model to introduce the new semantics, while ControlNet conditions keep the edited regions consistent with the original structure.
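The TDM-guided blending of KV features in the final stage can be sketched as a per-token linear interpolation between the source (inversion) branch and the edit branch. The function name and shapes below are illustrative assumptions rather than the released code:

```python
import numpy as np

def blend_kv(k_src, v_src, k_edit, v_edit, tdm):
    """Blend source and edit key/value features with a TDM mask.

    k_*, v_*: (N, D) per-token keys/values; tdm: (N,) soft mask in [0, 1].
    Inside the editable region (tdm -> 1) the edit-branch features pass
    through; outside it (tdm -> 0) the source features are injected,
    pinning non-target tokens to the inversion trajectory.
    """
    w = tdm[:, None]                      # broadcast mask over channels
    k = w * k_edit + (1.0 - w) * k_src
    v = w * v_edit + (1.0 - w) * v_src
    return k, v
```

Because the interpolation weight is the TDM itself, background preservation and edit strength are controlled by the same map that localized the edit, with no separate mask input.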

⚖️Qualitative Comparisons

Qualitative Results

🖼️ More Shape-Aware Editing Results

Shape Editing 1

Shape Editing 2

🎨 Other Types of General Editing

Other Editing Results

BibTeX

@inproceedings{long2026followyourshape,
  title     = {Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control},
  author    = {Zeqian Long and Mingzhe Zheng and Kunyu Feng and Xinhua Zhang and Hongyu Liu and Harry Yang and Linfeng Zhang and Qifeng Chen and Yue Ma},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=uGaR7L3Z1E}
}

References

  1. Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, and Yinqiang Zheng. MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  2. Kunyu Feng, Yue Ma, Bingyuan Wang, Chenyang Qi, Haozhe Chen, Qifeng Chen, and Zeyu Wang. DiT4Edit: Diffusion Transformer for Image Editing. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025.
  3. Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, and Tomer Michaeli. FlowEdit: Inversion-Free Text-Based Editing Using Pre-trained Flow Models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
  4. Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space. arXiv preprint arXiv:2506.15742, 2025.
  5. Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. PnP Inversion: Boosting Diffusion-Based Editing with 3 Lines of Code. International Conference on Learning Representations (ICLR), 2024.
  6. Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, and Ying Shan. Taming Rectified Flow for Inversion and Editing. Proceedings of the International Conference on Machine Learning (ICML), 2025.
  7. Tianrui Zhu, Shiyi Zhang, Jiawei Shao, and Yansong Tang. KV-Edit: Training-Free Image Editing for Precise Background Preservation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.