Explore Related Projects

Orthogonal Finetuning V1

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks has become an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT provably preserves the hyperspherical energy that characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT), which imposes an additional radius constraint on the hypersphere. Specifically, we consider two important text-to-image finetuning tasks: subject-driven generation, where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation, where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
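
To make the idea concrete, the following minimal PyTorch sketch shows one way an orthogonal finetuning layer can be set up. It is an illustration, not the authors' released code: OFTLinear and cayley are names chosen here, the orthogonal matrix is kept dense for readability (the paper uses a block-diagonal structure for parameter efficiency), and the COFT radius constraint is only indicated in a comment rather than implemented.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      def cayley(skew_params: torch.Tensor) -> torch.Tensor:
          # Cayley transform: R = (I - Q)^{-1} (I + Q) with Q skew-symmetric,
          # so R is orthogonal (R @ R.T = I) for any value of skew_params.
          Q = skew_params - skew_params.T
          I = torch.eye(Q.shape[0], dtype=Q.dtype, device=Q.device)
          return torch.linalg.solve(I - Q, I + Q)

      class OFTLinear(nn.Module):
          # Keep the pretrained weight W frozen and learn only an orthogonal R
          # acting on the input side of W. Every neuron (row of W) is rotated by
          # the same orthogonal map, so pairwise angles between neurons, and
          # hence the hyperspherical energy, are unchanged.
          def __init__(self, pretrained: nn.Linear):
              super().__init__()
              self.register_buffer("weight", pretrained.weight.detach().clone())
              bias = pretrained.bias.detach().clone() if pretrained.bias is not None else None
              self.register_buffer("bias", bias)
              d_in = self.weight.shape[1]
              # Zero init gives Q = 0 and R = I, so finetuning starts exactly at
              # the pretrained model.
              self.skew = nn.Parameter(torch.zeros(d_in, d_in))

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              R = cayley(self.skew)  # COFT would additionally keep R close to I
              return F.linear(x, self.weight @ R, self.bias)

In a diffusion model, such a wrapper would replace selected attention projection layers, and only the skew parameters would be trained on the downstream task.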

A Parameter-Efficient Formulation with Butterfly Factorization

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm, Orthogonal Finetuning (OFT), for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
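
As a rough illustration of the butterfly idea, the sketch below composes log2(d) sparse factors of 2x2 rotations, the simplest butterfly case; the function name, the angle parameterization, and its shapes are assumptions made here for clarity, whereas BOFT itself uses block butterfly factors with a configurable block size. The composed matrix would then play the role of the orthogonal R in OFT while needing far fewer trainable parameters.

      import torch

      def butterfly_orthogonal(angles: torch.Tensor) -> torch.Tensor:
          # Compose log2(d) sparse butterfly factors into a d x d orthogonal
          # matrix (d a power of two). At level l, index pairs that are 2^l
          # apart inside each block of size 2^(l+1) are mixed by an independent
          # 2x2 rotation, mirroring the Cooley-Tukey data-flow pattern. The
          # product is parameterized by log2(d) * (d / 2) angles rather than
          # the O(d^2) entries of a dense orthogonal matrix.
          num_levels, half = angles.shape
          d = 2 * half
          R = torch.eye(d)
          for level in range(num_levels):
              stride = 2 ** level
              factor = torch.eye(d)
              k = 0  # index into this level's d/2 rotation angles
              for block in range(0, d, 2 * stride):
                  for offset in range(stride):
                      i, j = block + offset, block + offset + stride
                      c, s = torch.cos(angles[level, k]), torch.sin(angles[level, k])
                      factor[i, i], factor[i, j] = c, -s
                      factor[j, i], factor[j, j] = s, c
                      k += 1
              R = factor @ R
          return R

      # Zero angles give the identity, so finetuning starts at the pretrained
      # weights; the composed matrix stays exactly orthogonal for any angles.
      angles = torch.zeros(3, 4)  # d = 8
      R = butterfly_orthogonal(angles)
      assert torch.allclose(R @ R.T, torch.eye(8), atol=1e-6)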

BibTeX


      @InProceedings{qiu2025oftv2,
        title={Orthogonal Finetuning Made Scalable},
        author={Qiu, Zeju and Liu, Weiyang and Weller, Adrian and Sch{\"o}lkopf, Bernhard},
        booktitle={EMNLP},
        year={2025}
      }
      @InProceedings{liu2024boft,
        title={Parameter-efficient orthogonal finetuning via butterfly factorization},
        author={Liu, Weiyang and Qiu, Zeju and Feng, Yao and Xiu, Yuliang and Xue, Yuxuan and Yu, Longhui and Feng, Haiwen and Liu, Zhen and Heo, Juyeon and Peng, Songyou and others},
        booktitle={ICLR},
        year={2024}
      }
      @InProceedings{qiu2023oft,
        title={Controlling text-to-image diffusion by orthogonal finetuning},
        author={Qiu, Zeju and Liu, Weiyang and Feng, Haiwen and Xue, Yuxuan and Feng, Yao and Liu, Zhen and Zhang, Dan and Weller, Adrian and Sch{\"o}lkopf, Bernhard},
        booktitle={NeurIPS},
        year={2023}
      }