Following the introduction of VecSet, extensive follow-up work has proposed enhancements to the representation. This project incorporates these novel designs and provides a unified framework for VecSet-based representations.
- [2025-10-03] Released the inference script.
- [2025-04-09] Released the pretrained models `point_vec1024x32_dim1024_depth24_sdf_nb` and `learnable_vec1024_dim1024_depth24_sdf`.
- [2025-04-06] Released training code and a pretrained model, `learnable_vec1024x32_dim1024_depth24_sdf_nb`.
```bash
conda create -y -n vecset python=3.11
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
conda install cuda-nvcc=12.4 -c nvidia -y
conda install libcusparse-dev -y
conda install libcublas-dev -y
conda install libcusolver-dev -y
conda install libcurand-dev -y # torch_cluster
pip install flash-attn --no-build-isolation
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
pip install tensorboard
pip install einops
pip install trimesh
pip install tqdm
pip install PyMCubes
```

16 GPUs (4 GPUs with `accum_iter` 4):
```bash
cd vecset
torchrun \
    --nproc_per_node=4 \
    main_ae.py \
    --accum_iter=4 \
    --model learnable_vec1024x16_dim1024_depth24_nb \
    --output_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
    --log_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
    --num_workers 24 \
    --point_cloud_size 8192 \
    --batch_size 16 \
    --epochs 500 \
    --warmup_epochs 1 --blr 5e-5 --clip_grad 1
```

The base model design is from VecSet. I have incorporated the following features:
- Faster training with Flash Attention
- Normalized Bottleneck (NBAE) from LaGeM. No need to tune the KL weight anymore!
- SDF regression instead of occupancy classification, as suggested by TripoSG. For now, I only use Eikonal regularization (see the sketch after this list).
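As a rough illustration of the last two items, here is a minimal, hedged sketch of a normalized bottleneck and an Eikonal-regularized SDF loss. It assumes a decoder that maps latents and query coordinates to signed distances; the names (`NormalizedBottleneck`, `sdf_loss_with_eikonal`, `decoder`, `lambda_eik`) are illustrative, not the repository's actual API.

```python
# Minimal sketch, not the repository's code. Assumes decoder(latents, queries)
# returns predicted signed distances of shape (B, N).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedBottleneck(nn.Module):
    """Normalize the latent set directly, so there is no KL weight to tune."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, latents):  # latents: (B, num_vecs, dim)
        return self.norm(latents)

def sdf_loss_with_eikonal(decoder, latents, queries, sdf_gt, lambda_eik=0.1):
    queries.requires_grad_(True)          # gradients w.r.t. query coordinates
    sdf_pred = decoder(latents, queries)  # (B, N) predicted signed distances
    recon = F.l1_loss(sdf_pred, sdf_gt)   # SDF regression instead of occupancy

    # Eikonal regularization: a true SDF satisfies ||grad_x f(x)|| = 1.
    grad = torch.autograd.grad(sdf_pred.sum(), queries, create_graph=True)[0]
    eikonal = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    return recon + lambda_eik * eikonal
```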
I am planning to incorporate the following features:
- Edge sampling from Dora-VAE
- Multiresolution training from CLAY
- Compact autoencoder from COD-VAE
- Quantized bottleneck (VQ)
- (Open an issue if you have any ideas!)
The following models will be released at this link:
- (Other models are training!)
| Model | Queries | Layers | Channels | Bottleneck (Size x Ch) | Regularization | Loss |
|---|---|---|---|---|---|---|
| `point_vec1024x32_dim1024_depth24_sdf_nb` | Point | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| `learnable_vec1024x32_dim1024_depth24_sdf_nb` | Learnable | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| `learnable_vec1024_dim1024_depth24_sdf` | Learnable | 24 | 1024 | 1024x1024 | - | SDF+Eikonal |
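Once a checkpoint is downloaded, loading it might look like the following hedged sketch. It assumes `autoencoder.py` exposes factory functions named after the table entries and that checkpoints store weights under a `"model"` key; both are assumptions, so check the actual files.

```python
# Hedged sketch: the factory-function name and the "model" checkpoint key are
# assumptions based on the naming convention in the table above.
import torch

from autoencoder import learnable_vec1024x32_dim1024_depth24_sdf_nb

model = learnable_vec1024x32_dim1024_depth24_sdf_nb()
ckpt = torch.load("learnable_vec1024x32_dim1024_depth24_sdf_nb.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
```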
If you want to test the autoencoder, make sure the input surface point cloud is normalized (centered on its bounding box and scaled into the unit sphere):

```python
import numpy as np

# surface: N x 3 array of surface points
shifts = (surface.max(axis=0) + surface.min(axis=0)) / 2  # bounding-box center
surface = surface - shifts
distances = np.linalg.norm(surface, axis=1)
scale = 1 / np.max(distances)
surface *= scale  # the farthest point now lies on the unit sphere
```

Here is the inference script:
```bash
python infer.py --input input_point_cloud.ply --output output_mesh.obj
```
The available model definitions can be found in `autoencoder.py`. Note that the script assumes the input file is a point cloud, not a mesh file.
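If you start from a mesh instead, here is a hedged sketch of converting it to the normalized point cloud the script expects, using `trimesh` (installed above). The file names and the 8192 sample count (matching `--point_cloud_size` in the training command) are illustrative.

```python
# Hedged sketch: mesh -> normalized point cloud .ply. File names and the
# sample count are illustrative, not values required by the script.
import numpy as np
import trimesh

mesh = trimesh.load("input_mesh.obj")
surface, _ = trimesh.sample.sample_surface(mesh, 8192)  # N x 3 surface samples
surface = np.asarray(surface)

# Same normalization as above: center on the bounding box, scale to the unit sphere.
shifts = (surface.max(axis=0) + surface.min(axis=0)) / 2
surface = surface - shifts
surface *= 1 / np.max(np.linalg.norm(surface, axis=1))

trimesh.PointCloud(surface).export("input_point_cloud.ply")
```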
- Removed layernorm on KV, as suggested by Youkang Kong
- Added layernorm before the final output layer
- Added zero initialization on the final output layer (see the sketch after this list)
- Added random rotations as data augmentation, as in LaGeM
- Adjusted the code for the latest version of PyTorch
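A minimal sketch of the two output-layer changes above, assuming a simple linear head (the 1024-channel width matches the models above; the exact module structure is an assumption, not the repository's code):

```python
# Hedged sketch: layernorm before the final output layer, with the layer
# zero-initialized so the head initially predicts zeros.
import torch.nn as nn

head = nn.Sequential(
    nn.LayerNorm(1024),  # layernorm before the final output layer
    nn.Linear(1024, 1),  # final output layer (e.g., one SDF value per query)
)
nn.init.zeros_(head[1].weight)  # zero initialization
nn.init.zeros_(head[1].bias)
```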