By Amit Jotwani

DigitalOcean recently launched Serverless Inference - a hosted API that lets you run large language models like Claude, GPT-4o, LLaMA 3, and Mistral, without managing infrastructure.
It’s built to be simple: no GPU provisioning, no server configuration, no scaling logic to manage. You just send a request to an endpoint.
The API works with models from Anthropic, OpenAI, Meta, Mistral, DeepSeek, and others - all available behind a single base URL.
The best part is that it’s API-compatible with OpenAI.
That means if you’re already using tools like the OpenAI Python SDK, LangChain, or LlamaIndex, or any of the supported models, they’ll just work. You can swap the backend without rewriting your application.
This same approach works with DigitalOcean Agents too — models that are connected to your own documents or knowledge base.
Why Use DigitalOcean Inference?
There are a few reasons this approach stands out—especially if you’re already building on DigitalOcean or want more flexibility in how you use large language models.
- One key for everything: You don’t have to manage separate keys for OpenAI, Anthropic, Mistral, etc. One DigitalOcean API key gives you access to every model—Claude, GPT-4o, LLaMA 3, Mistral, and more.
- Lower costs: For many workloads, routing through DigitalOcean can be significantly more cost-effective than going direct to the provider.
- Unified billing: All your usage appears in one place - your DO bill - alongside your apps, Spaces, databases, and whatever else you're running on DigitalOcean. No surprise charges across different platforms.
- Privacy for open models: If you’re using open-weight models (like LLaMA or Mistral), your prompts and completions aren’t logged or used for training.
- No SDK changes: The OpenAI SDK just works. You can also use libraries like LangChain or LlamaIndex the same way - just change the base URL, as the sketch after this list shows.
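For instance, here's a minimal sketch of pointing LangChain at the Inference endpoint. This assumes the langchain-openai package; the model ID is illustrative - use any model the endpoint lists for your account.

```python
# A minimal sketch (assumes `pip install langchain-openai python-dotenv`).
# LangChain's ChatOpenAI client accepts a custom base_url, so it can talk
# to DigitalOcean's OpenAI-compatible endpoint unchanged.
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()

llm = ChatOpenAI(
    model="llama3-8b-instruct",  # illustrative; use any supported model
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("DIGITAL_OCEAN_MODEL_ACCESS_KEY"),
)
print(llm.invoke("Say hello in one sentence.").content)
```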
How it Works
Under the hood, everything runs through the same OpenAI method:

```python
client.chat.completions.create()
```

The only thing you need to change is the `base_url` when you initialize the client:
- Serverless Inference (for general models): `https://inference.do-ai.run/v1/`
- Agents (for context-aware inferencing over your own data): `https://<your-agent-id>.agents.do-ai.run/api/v1/`
That’s it - let’s walk through how this looks in practice.
Install the OpenAI SDK
Run the following command to install the OpenAI SDK and python-dotenv on your machine:

```bash
pip install openai python-dotenv
```
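The examples below read your key from a `.env` file in the project directory. A minimal example, assuming you've already created a model access key in the DigitalOcean control panel (the variable name is simply the one this guide uses):

```
DIGITAL_OCEAN_MODEL_ACCESS_KEY=your-model-access-key-here
```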
Example 1: List Available Models
This lists all available models from DigitalOcean Inference. It's the same code you'd use with OpenAI - you're just pointing it at a DigitalOcean endpoint via `base_url`.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",  # DO's Inference endpoint
    api_key=os.getenv("DIGITAL_OCEAN_MODEL_ACCESS_KEY")
)

# List all available models
try:
    models = client.models.list()
    print("Available models:")
    for model in models.data:
        print(f"- {model.id}")
except Exception as e:
    print(f"Error listing models: {e}")
```
Example 2: Run a Simple Chat Completion
We're using the same `.chat.completions.create()` method. Only the `base_url` is different.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",  # DO's Inference endpoint
    api_key=os.getenv("DIGITAL_OCEAN_MODEL_ACCESS_KEY")
)

# Run a simple chat completion
try:
    response = client.chat.completions.create(
        model="llama3-8b-instruct",  # Swap in any supported model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a fun fact about octopuses."}
        ]
    )
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Error during completion: {e}")
```
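Because the endpoint is OpenAI-compatible, standard request parameters like `temperature` and `max_tokens` should pass through as well. A quick sketch (support for individual parameters may vary by model):

```python
# Assumes the `client` from the example above; parameter values are illustrative.
response = client.chat.completions.create(
    model="llama3-8b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    temperature=0.7,  # higher values produce more varied output
    max_tokens=100,   # cap the response length
)
print(response.choices[0].message.content)
```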
Example 3: Stream a Response
Want a response that streams back token by token? Just add `stream=True` and loop through the chunks.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",  # DO's Inference endpoint
    api_key=os.getenv("DIGITAL_OCEAN_MODEL_ACCESS_KEY")
)

# Run a simple chat completion with streaming
try:
    stream = client.chat.completions.create(
        model="llama3-8b-instruct",  # Swap in any supported model
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a fun fact about octopuses."}
        ],
        stream=True
    )
    for event in stream:
        if event.choices[0].delta.content is not None:
            print(event.choices[0].delta.content, end='', flush=True)
    print()  # Add a newline at the end
except Exception as e:
    print(f"Error during completion: {e}")
```
What about Agents?
DigitalOcean also offers Agents - LLMs paired with a custom knowledge base. You can upload docs, add URLs, include structured content, or connect a DigitalOcean Spaces bucket or even an Amazon S3 bucket as a data source. The agent will then respond with that context in mind. It's great for internal tools, documentation bots, or domain-specific assistants.
You still use `.chat.completions.create()` - the only difference is the `base_url`. But now your responses are grounded in your own data.
Note: With Inference, the base URL is fixed. With Agents, it's unique to your agent, and you append `/api/v1` to it.
```python
client = OpenAI(
    base_url="https://your-agent-id.agents.do-ai.run/api/v1/",
    api_key=os.getenv("DIGITAL_OCEAN_MODEL_ACCESS_KEY")
)
```
Example 4: Use an Agent (No Stream)
This is a standard Agent request using `.chat.completions.create()` - the same method as before. The only real change is the `base_url`, which points to your Agent's unique endpoint (plus `/api/v1`).
Here we've also added `include_retrieval_info=True` to the request body. This tells the API to return extra metadata about what the Agent pulled from your knowledge base to generate its response.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os
import json

load_dotenv()

try:
    # Create a new client pointed at the agent's endpoint
    agents_client = OpenAI(
        base_url="https://rrp247s4dgv4xoexd2sk62yq.agents.do-ai.run/api/v1/",
        api_key=os.getenv("AJOT_AGENT_KEY")
    )

    # Send a simple request to the agent endpoint
    response = agents_client.chat.completions.create(
        model="openai-gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Hello! Who is Amit?"
        }],
        extra_body={"include_retrieval_info": True}
    )

    print(f"\nAgent response: {response.choices[0].message.content}")

    # Inspect the retrieval metadata returned alongside the completion
    response_dict = response.to_dict()
    print("\nFull retrieval object:")
    print(json.dumps(response_dict["retrieval"], indent=2))
except Exception as e:
    print(f"Error with agents endpoint: {e}")
```
Example 5: Use an Agent (With Streaming and Retrieval)
The only change here is that we've enabled `stream=True` to get the response back as it generates. Everything else is the same.
```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

try:
    # Create a new client pointed at the agent's endpoint
    agents_client = OpenAI(
        base_url="https://rrp247s4dgv4xoexd2sk62yq.agents.do-ai.run/api/v1/",
        api_key=os.getenv("AJOT_AGENT_KEY")
    )

    # Stream the agent's response token by token
    stream = agents_client.chat.completions.create(
        model="openai-gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Hello! Who is Amit?"
        }],
        extra_body={"include_retrieval_info": True},
        stream=True,
    )
    for event in stream:
        if event.choices[0].delta.content is not None:
            print(event.choices[0].delta.content, end='', flush=True)
    print()  # Add a newline at the end
except Exception as e:
    print(f"Error with agents endpoint: {e}")
```
Conclusion
To recap:
- You can use the OpenAI SDK directly with DigitalOcean's Serverless Inference and Agents.
- The only thing you need to change is the `base_url` - everything else stays the same.
- Use `https://inference.do-ai.run/v1/` for general-purpose models like LLaMA 3, GPT-4o, Claude, etc.
- Use your Agent's URL (plus `/api/v1`) to connect to your own docs or knowledge base.
- Everything runs through `.chat.completions.create()` - no new methods to learn.
- You can enable streaming with `stream=True`, and get retrieval info with `include_retrieval_info=True`.
This makes it easy to test multiple models, switch backends, or ground a model in your own content—all without changing your existing code.
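As a final illustration of that flexibility, here's a minimal sketch that runs the same prompt against several models behind the one endpoint. The model IDs are illustrative - use the ones returned by Example 1 for your account:

```python
# Assumes the Inference `client` from the earlier examples.
prompt = "Summarize why octopuses are interesting in one sentence."

# Illustrative model IDs; list your account's models with client.models.list()
for model_id in ["llama3-8b-instruct", "openai-gpt-4o", "anthropic-claude-3.5-sonnet"]:
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"\n--- {model_id} ---")
    print(response.choices[0].message.content)
```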
Further Resources
- What is Serverless Inference?: A simple overview of how Serverless Inference works and where it fits.
- How to Use Serverless Inference: Documentation for Serverless Inference
- List of Supported Models: Full breakdown of all models available (OpenAI, Claude, LLaMA, Mistral, etc.).
- Create and Configure a DigitalOcean Agent: Walkthrough for building agents that respond using your own knowledge base.
- Inference API Reference: Schema, endpoints, and everything under the hood.
- OpenAI Python SDK: The official SDK this entire guide is built on.