v0.5.0
aed1419
New models
- Llama 3.3: a new state-of-the-art 70B model. Llama 3.3 70B offers similar performance to the Llama 3.1 405B model.
- Snowflake Arctic Embed 2: Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
Structured outputs
Ollama now supports structured outputs, making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs, together with Ollama's OpenAI-compatible API endpoints.
REST API
To use structured outputs in Ollama's generate or chat APIs, provide a JSON schema object in the `format` parameter:
```shell
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Tell me about Canada."}],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "capital": { "type": "string" },
      "languages": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["name", "capital", "languages"]
  }
}'
```
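The response's `message.content` field is a JSON string conforming to the schema above, so it can be parsed with any standard JSON library. A minimal sketch (the sample payload below is illustrative, not real model output):

```python
import json

# Illustrative payload standing in for the response's message.content;
# real values come from the model and conform to the schema above.
sample_content = '{"name": "Canada", "capital": "Ottawa", "languages": ["English", "French"]}'

# The schema guarantees the keys, so plain dict access is safe here.
country = json.loads(sample_content)
print(country["capital"])
print(country["languages"])
```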
Python library
Using the Ollama Python library, pass the schema as a JSON object to the `format` parameter, either as a dict or (recommended) by using Pydantic to serialize the schema with `model_json_schema()`.
```python
from ollama import chat
from pydantic import BaseModel

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

response = chat(
    messages=[
        {
            'role': 'user',
            'content': 'Tell me about Canada.',
        }
    ],
    model='llama3.1',
    format=Country.model_json_schema(),
)

country = Country.model_validate_json(response.message.content)
print(country)
```
JavaScript library
Using the Ollama JavaScript library, pass the schema as a JSON object to the `format` parameter, either as an object or (recommended) by using Zod to serialize the schema with `zodToJsonSchema()`:
```javascript
import ollama from 'ollama';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Country = z.object({
  name: z.string(),
  capital: z.string(),
  languages: z.array(z.string()),
});

const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Tell me about Canada.' }],
  format: zodToJsonSchema(Country),
});

const country = Country.parse(JSON.parse(response.message.content));
console.log(country);
```
What's Changed
- Fixed error importing model vocabulary files
- Experimental: new flag to set KV cache quantization to 4-bit (`q4_0`), 8-bit (`q8_0`), or 16-bit (`f16`). This reduces VRAM requirements for longer context windows.
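Assuming the experimental flag is exposed as the `OLLAMA_KV_CACHE_TYPE` environment variable (an assumption here; check the server documentation for the authoritative name), enabling 8-bit KV cache quantization would look like:

```shell
# Start the server with the KV cache quantized to 8 bits.
# OLLAMA_KV_CACHE_TYPE is an assumed variable name; valid values
# per the release note would be q4_0, q8_0, or f16.
OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```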
New Contributors
- @dmayboroda made their first contribution in #7906
- @Geometrein made their first contribution in #7908
- @owboson made their first contribution in #7693
Full Changelog: v0.4.7...v0.5.0