Guy Yariv
I am a PhD student in Computer Science at the School of Computer Science and Engineering, Hebrew University of Jerusalem, under the joint supervision of Yossi Adi and Sagie Benaim.
I spent the summer of 2024 and the winter of 2025 as a Research Scientist Intern at Meta (GenAI/MSL). From 2022 to 2024, I worked as an AI Researcher at Spot by NetApp.
My research interests include machine learning and generative AI. I’m passionate about achieving full controllability in media generation.
Publications
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
DyPE lets pre-trained diffusion transformers generate ultra-high-resolution images (16M+ pixels) without retraining or additional cost. It does so by matching the positional encoding extrapolation to the diffusion process's progression from low-frequency structures to high-frequency details.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
We propose Through-The-Mask, a two-stage framework for Image-to-Video generation that uses mask-based motion trajectories to enhance object-specific motion accuracy and consistency, achieving state-of-the-art results, particularly in multi-object scenarios.
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling
We introduce RewardSDS, a text-to-3D score distillation method that enhances Score Distillation Sampling (SDS) with reward-weighted sampling. It prioritizes noise samples based on their alignment scores, achieving fine-grained alignment with user intent.
Improving Visual Commonsense in Language Models via Multiple Image Generation
We improve large language models' visual commonsense by generating multiple images from text prompts and integrating them into decision-making via late fusion, boosting performance on visual commonsense reasoning and NLP tasks.
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
We propose a method to generate realistic, audio-aligned videos by adapting a text-to-video model with a lightweight adaptor.
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
We adapt text-conditioned diffusion models for audio-to-image generation by encoding audio into a token compatible with text representations.
Contact
Feel free to reach out:
guyyariv.mail at gmail dot com