Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
This is the official implementation of the paper. The code has been tested with Python 3.6.8. A pre-trained checkpoint for fingerspelling is also released below.
Follow the instructions on the BOBSL page to obtain the username and password needed to access parts of the BOBSL dataset.
Get the features:
cd features
sh download_features.sh username password
Get the annotations:
cd data/
sh download.sh username password
This is a fast download that obtains the manually verified test annotations and the automatically obtained annotations for the BOBSL episodes. In the automatic annotations, a ? in the word column indicates that the automatic pseudo-labeling method could not determine the fingerspelled word.
The above run should give a character error rate (CER) of 53.1. You can also turn on the --full_word_test flag to compute the CER against the full words, which should be 59.9.
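For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between prediction and reference, divided by the reference length. It is not taken from this repository. The function names `edit_distance` and `cer` are hypothetical, and the corpus-level aggregation and the percentage scaling are assumptions, so the repository's own evaluation may differ in details such as text normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp prefixes
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER in percent (assumed aggregation, see lead-in)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


if __name__ == "__main__":
    # One missing character over ten reference characters in total.
    print(cer(["london", "bath"], ["londn", "bath"]))
```

Because the score is normalized by reference length, it can exceed 100 when a prediction is much longer than its reference.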
Using Video-Swin as a feature extractor
We also release the pre-trained Video-Swin model that was used to extract the features mentioned above. The model was trained on person crops from the BOBSL dataset, so it will work best when your signer crops are similar to the BOBSL signer crops. You can get the pre-trained checkpoint here. Below is a small example of how to use it:
from videoswin import SwinTransformer3D, VideoPreprocessing
from utils import load

# BOBSL person-crop video input
model = SwinTransformer3D()
model = load("video-swin-s.pth")[0]

vp = VideoPreprocessing()

# Read an *RGB* video clip with size (batch_size, 3, 16, 256, 256).
# This can be done using OpenCV, for example.
clip = ...  # placeholder for the loaded clip

clip = vp(clip)  # (batch_size, 3, 16, 224, 224)
features = model(clip)  # (batch_size, 768)
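The clip-reading step is left open in the example above. The sketch below shows one way to arrange 16 frames into the stated (batch_size, 3, 16, 256, 256) layout with NumPy. It assumes the frames are already resized to 256x256 and arrive in BGR channel order, which is what OpenCV's `cv2.VideoCapture` returns. The helper name `frames_to_clip` is hypothetical. The dtype, value range, and container type that `VideoPreprocessing` expects are not stated here, so a further conversion (for example to a PyTorch tensor) may be needed.

```python
import numpy as np


def frames_to_clip(frames):
    """Stack 16 BGR frames of shape (256, 256, 3) into one RGB clip.

    Returns an array of shape (1, 3, 16, 256, 256), i.e. a batch of size 1.
    """
    clip = np.stack(frames, axis=0)    # (16, 256, 256, 3), channels in BGR order
    clip = clip[..., ::-1]             # reverse the channel axis: BGR -> RGB
    clip = clip.transpose(3, 0, 1, 2)  # (3, 16, 256, 256)
    clip = clip[None]                  # add the batch axis
    return np.ascontiguousarray(clip)  # copy into contiguous memory


if __name__ == "__main__":
    # Synthetic stand-in for frames read and resized with OpenCV.
    frames = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(16)]
    print(frames_to_clip(frames).shape)
```

Several such clips can be combined into a larger batch by concatenating them along the first axis.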
License and Citation
The code, models, and the released annotations are bound by the exact same licensing terms stated on the official BOBSL page.
Please cite the following paper if you use this repository:
@InProceedings{Prajwal22a,
author = "K R Prajwal and Hannah Bull and Liliane Momeni and Samuel Albanie and G{\"u}l Varol and Andrew Zisserman",
title = "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos",
booktitle = "British Machine Vision Conference",
year = "2022",
keywords = "sign language, fingerspelling, bsl, bobsl",
}
About
Code for "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos", BMVC 2022.