The Speech Signal Processing Toolkit (SPTK) is a comprehensive suite of software tools for speech signal processing.
SPTK consists of over 100 independent commands for diverse speech signal processing tasks. A key feature is that all commands communicate through standard input and output, enabling the creation of complex processing chains using pipes.
Below is a simple example of using SPTK commands in the terminal:
$ x2x +sd < data.raw | clip -l -32768 -u 32767 | x2x +da | less
As shown above, SPTK follows the Unix philosophy. All data is handled in a raw, header-less format (typically 64-bit double-precision), allowing for seamless integration between tools. Furthermore, all parameters are configured via command-line options, making the toolkit ideal for automation and scripting.
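To make the raw, header-less format concrete, here is a minimal Python sketch of what such a stream looks like on disk or in a pipe. It is not SPTK code: the function names `short_to_double` and `clip` are hypothetical stand-ins that only mirror the idea behind `x2x +sd` and `clip`, the sample values and clipping range are made up, and native byte order is assumed.

```python
from array import array


def short_to_double(raw_bytes):
    """Reinterpret header-less 16-bit integers as 64-bit doubles (cf. `x2x +sd`)."""
    shorts = array("h")
    shorts.frombytes(raw_bytes)
    return array("d", (float(s) for s in shorts))


def clip(samples, lower, upper):
    """Limit every sample to the range [lower, upper] (cf. `clip -l ... -u ...`)."""
    return array("d", (min(max(x, lower), upper) for x in samples))


# Hypothetical input: five 16-bit samples with no header, as in a .raw file.
raw_shorts = array("h", [-32768, -1, 0, 12345, 32767]).tobytes()

doubles = short_to_double(raw_shorts)
clipped = clip(doubles, -1000.0, 1000.0)

# The payload that would travel through a pipe: 8 bytes per value, nothing else.
payload = clipped.tobytes()
print(len(payload))   # 5 values * 8 bytes
print(list(clipped))  # analogous to the text dump produced by `x2x +da`
```

Because there is no header, the consumer must already know the data type and, for vector data, the dimensionality; this is why the type and length are passed to each command as options.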
For more information, please refer to the following resources:
- Reference Manual - Detailed command specifications and usage.
- Tutorial Slides - A great starting point for beginners to understand the basics of SPTK.
- Interactive Tutorial (Google Colab) - The easiest way to try SPTK in your browser.
- Conference Paper - For technical details and background, please refer to our publication in the ISCA Archive.
Follow us on X (Twitter) for the latest updates and demonstrations.
To build SPTK, you need a C++ compiler that supports C++11 (or later) and CMake.
- C++ Compiler: GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
- CMake: 3.1+
Linux / macOS
You can download and build the latest version of SPTK from source:
# Clone the repository
$ git clone https://github.com/sp-nitech/SPTK.git
$ cd SPTK
# Build the toolkit
$ make
To use SPTK commands from any directory, add the SPTK/bin/ directory to your PATH.
For example, in your .bashrc or .zshrc:
export PATH="$PATH:/path/to/SPTK/bin"
If you wish to integrate SPTK functions into your own C++ projects, link against the static library lib/libsptk.a.
Windows
Before proceeding, ensure that cmake and MSBuild are added to your PATH environment variable.
You can build SPTK by running make.bat or by following these steps in the Command Prompt:
# Navigate to the SPTK directory
$ cd C:\path\to\SPTK
# Create a build directory
$ mkdir build
$ cd build
# Generate project files and build
$ cmake .. -DCMAKE_INSTALL_PREFIX=..
$ MSBuild /p:Configuration=Release INSTALL.vcxproj
Alternatively, you can open the generated SPTK.sln file in the build directory and compile using the Visual Studio GUI.
To use SPTK functions in your Windows projects, link against the static library lib/sptk.lib.
The following pipeline demonstrates how to decrease the volume of input.wav by half:
$ wav2raw +s input.wav | x2x +sd | sopr -m 0.5 | x2x +ds -r | raw2wav +s -s 16 > output.wav
SPTK includes various example scripts. To run them, navigate to an example directory and execute run.sh:
$ cd egs/analysis_synthesis/mgc
$ ./run.sh
To visualize data, you can use the built-in Python environment for plotting:
# Set up a virtual environment (one-time)
$ cd tools; make venv PYTHON_VERSION=3.8; cd ..
# Generate a figure
$ . ./tools/venv/bin/activate
$ impulse -l 32 | gseries impulse.png
$ deactivate
- Enhanced Precision: The default data type has been upgraded from 4-byte float to 8-byte double (64-bit).
- C++ Engine: The core signal processing logic is now implemented in C++ (formerly C) for better maintainability.
- Modern Plotting: Drawing commands have been migrated to Python, offering greater flexibility and modern visualization options.
- Thread-Safety: The library is now thread-safe, making it compatible with multi-threaded applications and parallel processing.
- Cross-Platform Support: Added official support for Windows (in addition to Linux and macOS).
Obsoleted & Integrated
- acep, agcep, amcep → amgcep
- bell
- c2sp → mgc2sp
- cat2, echo2
- da
- ds, us, us16, uscd → sox or ffmpeg
- fig
- gc2gc → mgc2mgc
- gcep, mcep, uels → mgcep
- glsadf, lmadf, mlsadf → mglsadf
- ivq, vq → imsvq, msvq
- lsp2sp → mglsp2sp
- mgc2mgclsp, mgclsp2mgc
- psgr, xgr
- wavjoin, wavsplit
Separated & Renamed
- c2ir → c2mpir & mpir2c
- dtw → dtw & dtw_merge
- mgclsp2sp → mglsp2sp
- mglsadf → mglsadf & imglsadf
- train → train & mseq
- ulaw → ulaw & iulaw
- vstat → vstat & median
- Keiichi Tokuda (Project Design) - Nagoya Institute of Technology
- Keiichiro Oura - Nagoya Institute of Technology
- Takenori Yoshimura (Lead Maintainer) - Nagoya Institute of Technology
- Takato Fujimoto - Nagoya Institute of Technology
We would like to express our gratitude to all the contributors who have supported SPTK's development over the years:
- Akira Tamamori
- Cassia Valentini
- Chiyomi Miyajima
- Fernando Gil Resende Junior
- Gou Hirabayashi
- Heiga Zen
- Junichi Yamagishi
- Kazuhito Koishida
- Keiichi Tokuda
- Keiichiro Oura
- Kenji Chiba
- Masatsune Tamura
- Naohiro Isshiki
- Noboru Miyazaki
- Satoshi Imai
- Shinji Sako
- Tadashi Kitamura
- Takao Kobayashi
- Takashi Masuko
- Takashi Nose
- Takato Fujimoto
- Takayoshi Yoshimura
- Takenori Yoshimura
- Toru Takahashi
- Toshiaki Fukada
- Toshihiko Kato
- Toshio Kanno
- Yoshihiko Nankaku
This software is released under the Apache License 2.0.
SPTK incorporates the following third-party libraries:
| Category | Library | License |
|---|---|---|
| Pitch Extraction | Snack | Tcl/Tk License |
| | SWIPE' | MIT License |
| | REAPER | Apache License 2.0 |
| WORLD Analysis-Synthesis | WORLD | 3-Clause BSD License |
| Audio Format Conversion | dr_libs | Public Domain / MIT License |
| | stb | Public Domain / MIT License |
| Command-line Parser | ya_getopt | 2-Clause BSD License |
@InProceedings{sp-nitech2023sptk,
author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
title = {{SPTK4}: An open-source software toolkit for speech signal processing},
booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
pages = {211--217},
year = {2023},
}