Describe what this pull request is trying to achieve.
This adds support for stable-diffusion-2-1-unclip checkpoints that are used for generating image variations. (See also.)
It works in the same way as the current support for the SD2.0 depth model, in that you run it from the img2img tab, it extracts information from the input image (in this case, CLIP or OpenCLIP embeddings), and feeds those into the model in addition to the text prompt. Normally you would do this with denoising strength set to 1.0, since you don't actually want the normal img2img behaviour to have any influence on the generated image.
One thing I did not implement is any way to use this functionality while starting from random noise, as txt2img does - that would probably generate more varied variations. This would be good future work.
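For context, this is the kind of image embedding those checkpoints condition on; the snippet below is purely illustrative and uses the open_clip package directly (the webui itself uses the embedder bundled with the checkpoint, and the model/pretrained names here are assumptions roughly matching the unclip-h variant):

```python
import torch
import open_clip
from PIL import Image

# Load an OpenCLIP image encoder (assumed ViT-H-14 weights, as used by unclip-h).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
model.eval()

image = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]
with torch.no_grad():
    image_embedding = model.encode_image(image)  # [1, 1024] image embedding

print(image_embedding.shape)
```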
Additional notes and description of your changes
Key changes:
Autoloading the config. This is straightforward, except for one complication: v2-1-stable-unclip-l-inference.yaml requires and hardcodes the path of a supplementary file at checkpoints/karlo_models/ViT-L-14_stats.th. I opted to hotpatch the config to point at models/karlo/ViT-L-14_stats.th instead, and to check the file into the repo itself, since it's very small - thoughts on this approach? (A rough sketch of the hotpatch follows after this list.)
unclip_image_conditioning() - the key feature of this change. Adapted from Stability's script.
Changes all over sd_samplers_compvis and sd_samplers_kdiffusion to account for the fact that these models expect the conditioning to be provided differently (e.g. it needs to go in c_adm instead of c_concat). Not the cleanest code, but it works. (A sketch of this conditioning path also follows after this list.)
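For the config hotpatch, the idea is roughly the following; the exact OmegaConf key path is an assumption based on the stable-unclip inference configs, not code taken from this PR:

```python
from omegaconf import OmegaConf

# Load the upstream inference config and repoint the hardcoded stats path
# (key path assumed from v2-1-stable-unclip-l-inference.yaml).
config = OmegaConf.load("configs/v2-1-stable-unclip-l-inference.yaml")
config.model.params.noise_aug_config.params.clip_stats_path = "models/karlo/ViT-L-14_stats.th"
```

And here is a minimal sketch of what the image-conditioning helper might look like, adapted conceptually from Stability's reference script; it is an approximation rather than the PR's actual code, and attribute names such as embedder and noise_augmentor are assumptions about what the stable-unclip configs attach to the loaded model:

```python
import torch
from einops import repeat

def unclip_image_conditioning(sd_model, source_image):
    # Encode the input image with the model's bundled CLIP/OpenCLIP embedder.
    c_adm = sd_model.embedder(source_image)
    if getattr(sd_model, "noise_augmentor", None) is not None:
        noise_level = 0  # no extra noise added to the image embedding
        c_adm, noise_level_emb = sd_model.noise_augmentor(
            c_adm,
            noise_level=repeat(
                torch.tensor([noise_level]).to(c_adm.device), "1 -> b", b=c_adm.shape[0]
            ),
        )
        # The augmentor returns the (noised) embedding plus a noise-level
        # embedding; the model expects them concatenated along dim 1.
        c_adm = torch.cat((c_adm, noise_level_emb), 1)
    return c_adm

# In the samplers, this conditioning then goes into c_adm rather than being
# concatenated onto the latents via c_concat as the depth model's is:
#   depth model:  cond = {"c_concat": [image_cond], "c_crossattn": [text_cond]}
#   unclip model: cond = {"c_adm": image_cond, "c_crossattn": [text_cond]}
```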
MrCheeze changed the title from "Add support for the Variations models (unclip-h and unclip-l)" to "Add support for the unclip (Variations) models, unclip-h and unclip-l" on Mar 26, 2023.
Getting this error when xformers is not installed (using SDP attention):
File "D:\D-SD\AUTO\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 258, in forward
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=self.attention_op)
NameError: name 'xformers' is not defined
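The traceback points at an attention block in ldm's model.py that calls xformers unconditionally. Purely as an illustration (this is not the fix used in the webui), such a call could be guarded to fall back on PyTorch 2.0's scaled_dot_product_attention when xformers is not importable; the function below is a simplified, single-head stand-in for that block:

```python
import torch.nn.functional as F

try:
    import xformers.ops
    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False

def memory_efficient_attention(q, k, v, attention_op=None):
    # q, k, v: [batch, tokens, channels]
    if XFORMERS_AVAILABLE:
        return xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op=attention_op)
    # Fallback: use PyTorch's built-in SDP kernel, adding and removing a
    # singleton head dimension to match its expected layout.
    return F.scaled_dot_product_attention(
        q.unsqueeze(1), k.unsqueeze(1), v.unsqueeze(1)
    ).squeeze(1)
```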
I kept getting a 'NoneType is not callable' error when trying to use this.
I was getting the same issue until I removed the --medvram flag; now it works. Looking at the source code, there is a place where the model is supposed to call an embedder method, but no embedder is assigned to the sd_model object when the model is restricted to lower VRAM usage.
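For illustration only, a defensive guard along these lines (hypothetical helper, not code from this PR) would turn the opaque 'NoneType is not callable' into an explicit error when the embedder has not been attached to the loaded model, e.g. under --medvram:

```python
def get_unclip_embedder(sd_model):
    # Hypothetical check: verify the image embedder the unclip models need
    # is actually attached to the loaded model before calling it.
    embedder = getattr(sd_model, "embedder", None)
    if not callable(embedder):
        raise RuntimeError(
            "This checkpoint requires an image embedder for unclip conditioning, "
            "but none is attached to the loaded model (seen with --medvram)."
        )
    return embedder
```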
Environment this was tested in
Windows, NVIDIA GTX 1660 6GB
Screenshots or videos of your changes