- Later in our experiments, we found that using `m_mult_v` in `calc_attn_cont_fn_variant` yields a stronger localization signal; that is, using $|m \times v|$ instead of $|m \times v \times W_o|$.
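As a rough illustration of the difference between the two variants (a minimal sketch; the tensor names, shapes, and aggregation here are assumptions rather than the repository's exact implementation):

```python
import torch

def attn_contribution(attn_map, v, w_o=None):
    """Per-token attention contribution.

    attn_map: (heads, queries, keys) softmaxed attention weights (m)
    v:        (heads, keys, head_dim) value vectors
    w_o:      optional (heads * head_dim, model_dim) output projection (W_o)
    """
    out = torch.einsum("hqk,hkd->hqd", attn_map, v)  # m x v, computed per head
    if w_o is None:
        # |m x v| variant: norm of the pre-projection output, per query token
        return out.norm(dim=(0, 2))
    # |m x v x W_o| variant: apply the output projection first, then take the norm
    merged = out.permute(1, 0, 2).reshape(out.shape[1], -1)  # (queries, heads * head_dim)
    return (merged @ w_o).norm(dim=-1)
```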
Run localization on the 𝓛oc𝓚 dataset and generate samples with intervention:
python localization_and_intervention/localize_knowledge_and_intervene.py \
--results_path="{results_path}" \
--disable_k_dominant_blocks={k} \
--num_workers=20 \
--worker_idx={worker_idx} \
--knowledge_type="style" or "place" or "copyright" or "animal" or "celebrity" or "safety" \
--model="pixart" or "sana"and for the FLUX model:
python localization_and_intervention/localize_knowledge_and_intervene_flux.py \
--results_path="{results_path}" \
--disable_k_dominant_blocks={k} \
--num_workers=20 \
--worker_idx={worker_idx} \
--knowledge_type={knowledge_type}

These scripts also evaluate the intervention using CLIP scores. For LLaVA-based evaluation, please refer to the sections below.
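The `--num_workers` / `--worker_idx` flags let you split the run across several processes; below is a minimal launcher sketch, assuming each worker handles a disjoint subset of the prompts (paths and parameter values are illustrative):

```python
import subprocess
import sys

NUM_WORKERS = 20

# Launch one localization process per worker index (example values only)
procs = [
    subprocess.Popen([
        sys.executable,
        "localization_and_intervention/localize_knowledge_and_intervene.py",
        "--results_path=results/style_localization",
        "--disable_k_dominant_blocks=3",
        f"--num_workers={NUM_WORKERS}",
        f"--worker_idx={worker_idx}",
        "--knowledge_type=style",
        "--model=pixart",
    ])
    for worker_idx in range(NUM_WORKERS)
]

for p in procs:
    p.wait()
```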
- Define a dataset containing prompts that represent the target knowledge using the `BaseDataset` class in `dataset.py`. (You can refer to existing examples such as `PlacesDataset`, which is designed for localizing place-related knowledge.)
- Use the `load_pipe` function in `loader.py` to load your desired model and pipeline.
- Use the `localize_dominant_blocks` function from one of the following scripts to perform the localization (see the sketch after this list): `localization_and_intervention/localize_knowledge_and_intervene.py`, or `localization_and_intervention/localize_knowledge_and_intervene_flux.py` (for FLUX).
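A rough end-to-end sketch of these three steps (the argument names, signatures, and the example dataset are assumptions; check `dataset.py`, `loader.py`, and the scripts above for the actual interfaces):

```python
from dataset import BaseDataset
from loader import load_pipe
from localization_and_intervention.localize_knowledge_and_intervene import localize_dominant_blocks

class MyKnowledgeDataset(BaseDataset):
    """Hypothetical dataset with prompts describing the knowledge to localize."""
    def __init__(self):
        super().__init__()
        self.prompts = [
            "A painting in the style of ...",
            "A landscape in the style of ...",
        ]

pipe = load_pipe("pixart")            # load the desired model/pipeline
dataset = MyKnowledgeDataset()

# Identify the blocks that dominate the target knowledge (k is illustrative)
dominant_blocks = localize_dominant_blocks(pipe, dataset, k=3)
print(dominant_blocks)
```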
Run baseline generations on knowledge-agnostic prompts to assess the impact of the localization process:
python localization_and_intervention/knowledge_agnostic_gen_and_eval.py \
--results_path="{results_path}" \
--model={model_name} \
--knowledge_type={knowledge_type}

Run baseline generations on knowledge prompts without intervention to assess localization:
python localization_and_intervention/full_knowledge_gen_and_eval.py \
--results_path="{results_path}" \
--model={model_name} \
--knowledge_type={knowledge_type} \
--worker_idx={worker_idx} \
--num_workers={num_workers}

These scripts also evaluate the intervention using CLIP scores. For LLaVA-based evaluation, please refer to the section below.
python llava_eval.py \
--results_path="{results_path}" \
--eval_type=eval_all_knowledge_directories \
--model_name={model_name} \
--knowledge_type={knowledge_type}

Set `--eval_type` to `eval_a_single_no_knowledge_directory` to evaluate "No Knowledge" generations, where the `results_path` points to a directory containing images (not knowledge subdirectories).
First, download the model checkpoint:
mkdir pretrained_models
gdown 1FX0xs8p-C7Ob-h5Y4cUhTeOepHzXv_46 -O "pretrained_models/csd_checkpoint.pth"

Then, for the evaluation:
python csd/csd_calc.py \
--results_path="your_desired_path_containing_the_images" \
--artists_list_for_model="pixart" \
--eval_single_artist_directory

Remove `--eval_single_artist_directory` to evaluate each artist's directory in the `results_path`.
Please refer to the `dataset/` folder for the 𝓛oc𝓚 dataset, which includes all knowledge categories along with their corresponding prompts for localization. You can also use the data classes defined in `dataset.py` to load and work with the dataset.
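For example, the place-related prompts could be loaded through their data class; a minimal sketch, assuming `PlacesDataset` exposes its localization prompts via a `prompts` attribute (the exact interface may differ, see `dataset.py`):

```python
from dataset import PlacesDataset

# Iterate over the place-related prompts of the LocK dataset
places = PlacesDataset()
for prompt in places.prompts:  # assumed attribute; check dataset.py for the actual API
    print(prompt)
```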
If you find this useful for your research, please cite the following:
@article{zarei2025localizing,
title={Localizing Knowledge in Diffusion Transformers},
author={Zarei, Arman and Basu, Samyadeep and Rezaei, Keivan and Lin, Zihao and Nag, Sayan and Feizi, Soheil},
journal={arXiv preprint arXiv:2505.18832},
year={2025}
}