Partition Endpoint
Python SDK
The Unstructured Python SDK client allows you to send one file at a time for processing by
the Unstructured Partition Endpoint.
To use the Python SDK, you’ll first need to set an environment variable named UNSTRUCTURED_API_KEY, representing your Unstructured API key (a quick way to confirm that the variable is set is sketched after the following steps). To get your API key, do the following:
- Sign in to your Unstructured account:
  - If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
  - If you already have an Unstructured account, sign in by using the URL of the sign-in page that Unstructured provided to you when your Unstructured account was created. After you sign in, the Unstructured user interface (UI) appears, and you can start using it right away. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
- Get your Unstructured API key:
  a. In the Unstructured UI, click API Keys on the sidebar.
  b. Click Generate API Key.
  c. Follow the on-screen instructions to finish generating the key.
  d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
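With the key in hand, set the UNSTRUCTURED_API_KEY environment variable in the shell or process that will run your code. The following is a minimal sketch for confirming, from Python, that the variable is visible before you create the client:
import os

# A minimal sketch: fail fast if UNSTRUCTURED_API_KEY is not set in the current process.
api_key = os.getenv("UNSTRUCTURED_API_KEY")
if not api_key:
    raise RuntimeError("Set the UNSTRUCTURED_API_KEY environment variable before running this code.")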
Installation
Before using the SDK to interact with Unstructured, install the library:
pip install unstructured-client
The SDK uses semantic versioning, and major version bumps can bring breaking changes. It is advised to pin your installed version. See the migration guide, later on this page, for breaking change announcements.
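For example, you could pin the dependency at install time. The version range below is illustrative only; pin to whatever version you have tested against:
pip install "unstructured-client>=0.26.0,<0.27.0"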
Basics
Let’s start with a simple example in which you send a PDF document to the Unstructured Partition Endpoint to be partitioned by Unstructured.
import os, json

import unstructured_client
from unstructured_client.models import operations, shared

client = unstructured_client.UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
)

filename = "PATH_TO_INPUT_FILE"

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=open(filename, "rb"),
            file_name=filename,
        ),
        strategy=shared.Strategy.VLM,
        vlm_model="gpt-4o",
        vlm_model_provider="openai",
        languages=['eng'],
        split_pdf_page=True,  # If True, splits the PDF file into smaller batches of pages.
        split_pdf_allow_failed=True,  # If True, the partitioning continues even if some pages fail.
        split_pdf_concurrency_level=15  # Set the number of concurrent requests to the maximum value: 15.
    ),
)

try:
    res = client.general.partition(
        request=req
    )

    element_dicts = [element for element in res.elements]

    # Print the processed data's first element only.
    print(element_dicts[0])

    # Write the processed data to a local file.
    json_elements = json.dumps(element_dicts, indent=2)

    with open("PATH_TO_OUTPUT_FILE", "w") as file:
        file.write(json_elements)
except Exception as e:
    print(e)
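Each item in element_dicts is a dictionary with keys such as type, text, element_id, and metadata. As a quick sanity check on the output, a sketch like the following (assuming the element_dicts list from the example above) counts the elements by type:
from collections import Counter

# A minimal sketch, assuming the element_dicts list from the example above.
# Count how many elements of each type (for example, Title, NarrativeText, or Table) were returned.
type_counts = Counter(element["type"] for element in element_dicts)

for element_type, count in type_counts.most_common():
    print(f"{element_type}: {count}")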
Async partitioning
The Python SDK also has a partition_async method. This call is equivalent to partition, except that it can be used in a non-blocking context. For instance, asyncio.gather can be used to concurrently process multiple files inside of a directory hierarchy, as demonstrated here:
import asyncio
import os
import json

import unstructured_client
from unstructured_client.models import shared

client = unstructured_client.UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")
)

async def call_api(filename, input_dir, output_dir):
    req = {
        "partition_parameters": {
            "files": {
                "content": open(filename, "rb"),
                "file_name": os.path.basename(filename),
            },
            "strategy": shared.Strategy.VLM,
            "vlm_model": "gpt-4o",
            "vlm_model_provider": "openai",
            "languages": ['eng'],
            "split_pdf_page": True,  # If True, splits the PDF file into smaller batches of pages.
            "split_pdf_allow_failed": True,  # If True, the partitioning continues even if some pages fail.
            "split_pdf_concurrency_level": 15  # Set the number of concurrent requests to the maximum value: 15.
        }
    }

    try:
        res = await client.general.partition_async(
            request=req
        )

        element_dicts = [element for element in res.elements]
        json_elements = json.dumps(element_dicts, indent=2)

        # Create the output directory structure.
        relative_path = os.path.relpath(os.path.dirname(filename), input_dir)
        output_subdir = os.path.join(output_dir, relative_path)
        os.makedirs(output_subdir, exist_ok=True)

        # Write the output file.
        output_filename = os.path.join(output_subdir, os.path.basename(filename) + ".json")

        with open(output_filename, "w") as file:
            file.write(json_elements)
    except Exception as e:
        print(f"Error processing {filename}: {e}")

async def process_files(input_directory, output_directory):
    tasks = []

    for root, _, files in os.walk(input_directory):
        for file in files:
            if not file.endswith('.json'):
                full_path = os.path.join(root, file)
                tasks.append(call_api(full_path, input_directory, output_directory))

    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(process_files(
        input_directory=os.getenv("LOCAL_FILE_INPUT_DIR"),
        output_directory=os.getenv("LOCAL_FILE_OUTPUT_DIR")
    ))
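Note that asyncio.gather starts one task per file all at once. For very large directory trees, you may want to cap how many files are processed concurrently. The following sketch (reusing the call_api coroutine from the example above, with an illustrative limit) does that with asyncio.Semaphore:
# A minimal sketch, assuming the call_api coroutine from the example above.
MAX_FILES_IN_FLIGHT = 10  # Illustrative limit on concurrently processed files.
semaphore = asyncio.Semaphore(MAX_FILES_IN_FLIGHT)

async def call_api_limited(filename, input_dir, output_dir):
    # Only MAX_FILES_IN_FLIGHT calls run at the same time; the rest wait here.
    async with semaphore:
        await call_api(filename, input_dir, output_dir)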
Page splitting
In order to speed up processing of large PDF files, the split_pdf_page parameter is True by default. This causes the PDF to be split into small batches of pages before sending requests to the API. The client awaits all parallel requests and combines the responses into a single response object. This is specific to PDF files; other filetypes are ignored.
The number of parallel requests is controlled by split_pdf_concurrency_level. The default is 8, and the maximum is set to 15 to avoid high resource usage and costs.
If at least one request is successful, the responses are combined into a single response object. An error is returned only if all requests failed or there was an error during splitting.
This feature may lead to unexpected results when chunking, because the server does not see the entire document context at once. If you’d like to chunk across the whole document and still get the speedup from parallel processing, you can:
- Partition the PDF with split_pdf_page set to True, without any chunking parameters.
- Store the returned elements in results.json.
- Partition this JSON file with the desired chunking parameters. (A sketch of this follow-up call appears after the code example below.)
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=file.read(),
            file_name=filename,
        ),
        strategy=shared.Strategy.VLM,
        vlm_model="gpt-4o",
        vlm_model_provider="openai",
        split_pdf_page=True,  # If True, splits the PDF file into smaller batches of pages.
        split_pdf_allow_failed=True,  # If True, the partitioning continues even if some pages fail.
        split_pdf_concurrency_level=15  # Set the number of concurrent requests to the maximum value: 15.
    )
)

res = client.general.partition(
    request=req
)
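Continuing the workaround, the sketch below stores the returned elements in results.json and then partitions that JSON file with chunking parameters. It assumes res, client, operations, and shared from the examples above, and the chunking values shown are illustrative; see the chunking strategies documentation for the available options:
import json

# A minimal sketch, continuing from res and client in the example above.
# Step 2: store the returned elements in results.json.
with open("results.json", "w") as f:
    json.dump([element for element in res.elements], f)

# Step 3: partition results.json with the desired chunking parameters (illustrative values).
chunk_req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=open("results.json", "rb"),
            file_name="results.json",
        ),
        chunking_strategy="by_title",
        max_characters=1024,
    )
)

chunk_res = client.general.partition(
    request=chunk_req
)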
Customizing the client
Retries
You can also change the defaults for retries through the retry_config parameter when initializing the client. If a request to the API fails, the client will retry the request with an exponential backoff strategy up to a maximum interval of one minute. The function keeps retrying until the total elapsed time exceeds max_elapsed_time, which defaults to one hour:
import os

from unstructured_client import UnstructuredClient
from unstructured_client.utils import BackoffStrategy, RetryConfig

client = UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
    retry_config=RetryConfig(
        strategy="backoff",
        retry_connection_errors=True,
        backoff=BackoffStrategy(
            # Time intervals are defined in milliseconds.
            initial_interval=500,
            max_interval=60000,
            exponent=1.5,
            max_elapsed_time=900000,  # 15 minutes * 60 seconds * 1000 ms = 900,000 ms.
        ),
    )
)
Disabling SSL validation
If you disable SSL validation, requests will accept any TLS certificate presented by the server and ignore hostname mismatches and expired certificates, which makes your application vulnerable to man-in-the-middle (MitM) attacks. Only set this to False for testing.
import requests

http_client = requests.Session()
http_client.verify = False

client = UnstructuredClient(
    client=http_client,
    ...
)
Handling the response
The partition response defaults to a dict format that can be converted to Unstructured elements with the elements_from_dicts utility function, as seen below. Otherwise, the API response can be sent directly to your vector store or another destination.
from unstructured.staging.base import elements_from_dicts

# ...

if res.elements is not None:
    elements = elements_from_dicts(res.elements)
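Once converted, each item in elements is an Element object from the unstructured library, so you can work with its text and metadata directly. A minimal sketch, assuming the elements list from the snippet above:
# A minimal sketch, assuming the elements list produced above.
for element in elements:
    # Each Element carries a category (such as Title or NarrativeText), metadata
    # (for example, the page number, which can be None for some filetypes), and its text.
    print(element.category, element.metadata.page_number)
    print(element.text[:80])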
Parameters & examples
The parameter names used in this document are for the Python SDK, which follows the snake_case convention. The JavaScript/TypeScript SDK follows the camelCase convention. Other than this difference in naming convention, the names used in the SDKs are the same across all methods.
- Refer to the API parameters page for the full list of available parameters.
- Refer to the Examples page for some inspiration on using the parameters.
Migration guide
There are breaking changes beginning with Python SDK version 0.26.0. If you encounter any errors when upgrading, please find the solution below.
If you see the error: AttributeError: 'PartitionParameters' object has no attribute 'partition_parameters'
Before 0.26.0, the SDK accepted a PartitionParameters object as input to the sdk.general.partition function. Beginning with 0.26.0, this object must be wrapped in a PartitionRequest object. The old behavior was deprecated in 0.23.0 and removed in 0.26.0.
# Instead of:
from unstructured_client.models import shared

req = shared.PartitionParameters(
    files=files,
)

resp = s.general.partition(
    request=req
)

# Switch to:
from unstructured_client.models import shared, operations

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
    )
)

resp = s.general.partition(
    request=req
)
If you see the error: TypeError: BaseModel.__init__() takes 1 positional argument but 2 were given
Beginning with 0.26.0, the PartitionRequest constructor no longer allows for positional arguments. You must specify partition_parameters by name.
# Instead of:
req = operations.PartitionRequest(
    shared.PartitionParameters(
        files=files,
    )
)

# Switch to:
req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
    )
)
If you see the error: TypeError: General.partition() takes 1 positional argument but 2 were given
Beginning with 0.26.0, the partition function no longer allows for positional arguments. You must specify request by name.
# Instead of:
resp = s.general.partition(req)

# Switch to:
resp = s.general.partition(
    request=req
)