Introducing MPT-7B: A New Open-Source LLM
An LLM Trained on 1T Tokens of Text and Code, Part of the MosaicML Foundation Series.

Image by Author
Large language models (LLMs) are all the rage at the moment. However, if your organization does not have the right resources, it can be challenging to jump on the large language model wave: training and deploying these models is difficult, and you can suddenly feel left out. Open-source LLMs, such as the LLaMA series from Meta, have made LLM resources far more accessible.
And adding to that open-source collection is MosaicML Foundations' latest release - MPT-7B.
What is MPT-7B?
MPT stands for MosaicML Pretrained Transformer. MPT models are GPT-style decoder-only transformers that come with many improvements:
- Performance-optimized layer implementations
- Greater training stability due to architecture changes
- No context length limitations
MPT-7B is a transformer model that has been trained from scratch on 1T tokens of text and code. Yes, 1 TRILLION! It was trained on the MosaicML platform over 9.5 days with zero human intervention, costing MosaicML roughly $200k.
It is open-source and available for commercial use, and it could be a game changer in how businesses and organizations approach their predictive analytics and decision-making processes.
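Since the weights are published on the Hugging Face Hub, you can try the base model with the standard transformers API. Here is a minimal sketch, assuming the mosaicml/mpt-7b checkpoint and that you allow the model's custom code to run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPT-7B ships custom modelling code, so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# MPT-7B was trained with EleutherAI's GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("MosaicML is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```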
The main features of MPT-7B are:
- Licensed for commercial use
- Trained on a large amount of data (1T tokens)
- Can handle extremely long inputs
- Optimized for fast training and inference
- Highly efficient open-source training code.
MPT-7B is the base model and has been shown to outperform other open-source models in the 7B - 20B range, matching the quality of LLaMA-7B. To evaluate the quality of MPT-7B, MosaicML Foundation put together 11 open-source benchmarks and evaluated the models in the industry-standard manner.

Image by MosaicML Foundation
MosaicML Foundation is also releasing three additional fine-tuned models:
- MPT-7B-Instruct
- MPT-7B-Chat
- MPT-7B-StoryWriter-65k+
MPT-7B-Instruct
The MPT-7B-Instruct model is for short-form instruction following. With 26,834 downloads as of the 14th of May, MPT-7B-Instruct allows you to ask quick, short questions and get an instant response. Have a question and just want a simple answer? Use MPT-7B-Instruct.
Why is this so great? Typically, LLMs are taught to continue generating text based on the input provided. However, some users want an LLM that treats their input as an instruction to follow. Instruction finetuning is what allows an LLM to produce instruction-following outputs.
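Here is a rough sketch of how you might prompt the instruction-tuned checkpoint, assuming the mosaicml/mpt-7b-instruct weights and the Dolly-style prompt template described on its model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Instruction template assumed from the model card (Dolly-style formatting).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nExplain what a decoder-only transformer is in one sentence.\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```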
MPT-7B-Chat
Yes, we have another chatbot. MPT-7B-Chat generates dialogue. For example, if you want the chatbot to generate a speech, give it the context and it will produce text in a conversational manner. Or perhaps you want a tweet that paraphrases a paragraph from an article - it can generate that dialogue for you!
Why is this so great? MPT-7B-Chat is ready and well-equipped for a variety of conversational tasks, delivering more seamless, engaging multi-turn interactions for users.
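If you want to experiment with it, here is a short sketch, assuming the mosaicml/mpt-7b-chat checkpoint and the ChatML-style conversation format shown on its model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-chat", trust_remote_code=True
)
# The chat tokenizer adds the <|im_start|> / <|im_end|> markers used below.
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-chat")

# ChatML-style turns (format assumed from the model card).
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nParaphrase this as a tweet: MPT-7B was trained on 1T tokens.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```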
MPT-7B-StoryWriter-65k+
This is for the story writers! For those who want to write stories that have a long context, MPT-7B-StoryWriter-65k+ is a model designed for exactly that. The model was built by fine-tuning MPT-7B with a context length of 65k tokens, and it can extrapolate beyond 65k tokens. MosaicML Foundation has been able to generate 84k tokens on a single node of A100-80GB GPUs.
Why is this so great? Most open-source LLMs can only handle sequences of up to a few thousand tokens, but just by using a single node of 8xA100-80GB on the MosaicML platform, you can finetune MPT-7B to handle context lengths of up to 65k!
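Here is a sketch of how the longer context is enabled at load time, assuming the mosaicml/mpt-7b-storywriter checkpoint and its configurable max_seq_len (ALiBi is what lets the model extrapolate past the length it was finetuned on):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config first so the maximum sequence length can be raised
# before the model is instantiated.
config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True
)
config.max_seq_len = 83968  # go beyond the 65k finetuning context (assumed value)

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    config=config,
    trust_remote_code=True,
)
```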
More on How MPT-7B was Built
The MosaicML team built these models in only a few weeks, handling the data preparation, training, finetuning, and deployment in that time.
The data was sourced from a variety of sources, with a billion tokens available in each source, and the number of effective tokens was still kept at a billion per source. The team used EleutherAI's GPT-NeoX-20B tokenizer, allowing them to train on a diverse mix of data, apply consistent space delimitation, and more.
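To see the tokenizer behaviour the team relied on, you can load it directly. A small sketch, assuming the EleutherAI/gpt-neox-20b tokenizer published on the Hugging Face Hub:

```python
from transformers import AutoTokenizer

# The same GPT-NeoX-20B tokenizer MPT-7B was trained with.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Code-heavy text keeps runs of spaces together as dedicated tokens,
# which helps when training on a mix of prose and source code.
print(tokenizer.tokenize("def add(a, b):\n    return a + b"))
```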
All the MPT-7B models were trained on the MosaicML platform, using A100-40GB and A100-80GB GPUs from Oracle Cloud.
If you would like to know more about the tools and costs of MPT-7B, have a read of the MPT-7B Blog.
Wrapping it up
The MosaicML platform can be considered the best starting point for organizations, whether private, commercial, or community-based, looking to build custom LLMs. Having this open-source resource available will allow organizations to feel freer about using these tools to tackle their organizational challenges.
Customers are able to train LLMs on any computing provider or data source, whilst maintaining efficiency, privacy, and cost transparency.
What do you think you will be using MPT-7B for? Let us know in the comments below!
Nisha Arya is a Data Scientist, Freelance Technical Writer, and Community Manager at KDnuggets. She is particularly interested in providing Data Science career advice or tutorials and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.