NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining
Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu
This repository contains the official implementation of the paper NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining, accepted to TMLR 2024. In this work, we propose the NuTime model for large-scale time-series pretraining. The model is based on the Transformer architecture and takes as input a set of tokens from non-overlapping windows, where each window is represented by its normalized shape, its mean, and its standard deviation. We develop a numerically multi-scaled embedding (NME) method for representing the scalar values of the mean and std, so the model can take raw time-series values at any numerical scale as input, without any data normalization or transformation.
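For intuition, the sketch below shows one way such a multi-scaled scalar embedding could look in PyTorch. It is a minimal toy illustration, not the paper's exact formulation: the scale grid, the soft-assignment weights, and the module name are all assumptions made for this sketch; see `models/encoders/WindowNormEncoder.py` for the official implementation.

```python
import torch
import torch.nn as nn


class MultiScaledScalarEmbedding(nn.Module):
    """Toy sketch of a numerically multi-scaled embedding for a scalar
    (e.g., a window's mean or std): the scalar is softly routed to a bank
    of learnable per-scale embeddings, so inputs of wildly different
    magnitudes stay well-conditioned without any data normalization."""

    def __init__(self, dim: int, num_scales: int = 9):
        super().__init__()
        # One learnable embedding per numerical scale.
        self.scale_embeds = nn.Parameter(torch.randn(num_scales, dim) * 0.02)
        # Fixed reference magnitudes, e.g. 10^-4 ... 10^4 for num_scales=9.
        half = num_scales // 2
        self.register_buffer(
            "scales",
            10.0 ** torch.arange(-half, num_scales - half, dtype=torch.float32),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_windows) raw scalar values at arbitrary scales.
        rescaled = x.unsqueeze(-1) / self.scales            # (B, W, S)
        # Soft weights peaking at the scale(s) where |rescaled| is O(1).
        log_mag = torch.log10(rescaled.abs() + 1e-12)
        weights = torch.softmax(-log_mag.abs(), dim=-1)     # (B, W, S)
        # Bound each rescaled value and mix the per-scale embeddings.
        return torch.einsum(
            "bws,sd->bwd", weights * torch.tanh(rescaled), self.scale_embeds
        )                                                   # (B, W, dim)


if __name__ == "__main__":
    emb = MultiScaledScalarEmbedding(dim=64, num_scales=9)
    means = torch.tensor([[1e-3, 0.5, 42.0, 1e4]])          # arbitrary scales
    print(emb(means).shape)                                 # torch.Size([1, 4, 64])
```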
Feel free to contact me (chenguolin@stu.pku.edu.cn) or open an issue if you have any questions or suggestions.
- 2024-11-12: Checkpoint of the self-supervised pretrained NuTime is released.
- 2024-11-12: Code for data preprocessing, training, and evaluation is released.
- 2024-07-15: It may take some time to clean the entire codebase for release, so we first provide the code for the window, mean, and std embeddings, the essential part of the proposed NuTime, here.
- 2024-07-10: NuTime is accepted to TMLR 2024.
- Release the training and evaluation code
- Release the self-supervised pretrained NuTime
Please install PyTorch according to your CUDA version first. There are no restrictions on the torch version; feel free to use your preferred one.
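For example, to install PyTorch built for CUDA 12.1 (adjust the index URL to your CUDA version):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

Then clone the repository and run the setup script: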
```bash
git clone https://github.com/microsoft/NuTime.git
cd NuTime
bash settings/setup.sh
```

Please refer to `data/preprocess.py`.
We provide the script to preprocess the data including: UCR, UEA, SleepEDF, Epilepsy, etc.
The processed and split Epilepsy dataset is provided in `datasets/Epilepsy` as an example.
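As a hypothetical example, an invocation might look like the following; the flags here are purely illustrative, and the actual command-line interface is defined in `data/preprocess.py`:

```bash
# Hypothetical flags for illustration; check data/preprocess.py for the real CLI.
python data/preprocess.py --dataset Epilepsy --output_dir datasets/Epilepsy
```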
Due to license constraints, we cannot provide a download link for the pretraining data. However, the dataset can be gathered by following the process described in Section 4.1 of the paper.
- The core part of our work is `WindowNormEncoder` in `models/encoders/WindowNormEncoder.py` and `WinT` in `models/networks.py`. You can directly view the code for implementation details. The rest of the code is only for data preprocessing, training, evaluation, and ablation studies, and can essentially be ignored.
- A checkpoint of the self-supervised (i.e., BYOL-style) pretrained NuTime (with `9` multi-scaled embeddings) is provided in `ckpt/checkpoint_bias9.pth`.
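A minimal sketch of inspecting this checkpoint is shown below; the exact state-dict layout and how it plugs into the training code are assumptions, so refer to the training scripts for the supported workflow:

```python
import torch

# Load the released BYOL-style pretrained weights (CPU is fine for inspection).
state = torch.load("ckpt/checkpoint_bias9.pth", map_location="cpu")
print(type(state), list(state)[:5] if isinstance(state, dict) else state)
```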
To pretrain or fine-tune the model, an example script is provided in `./run.sh`.
If you find our work helpful, please consider citing:
```bibtex
@article{lin2024nutime,
  title={NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining},
  author={Chenguo Lin and Xumeng Wen and Wei Cao and Congrui Huang and Jiang Bian and Stephen Lin and Zhirong Wu},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2024}
}
```

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
