For the blocking API, update the model name and model weight path in blocking_api.py, then run:
python blocking_api.py
The server will start on localhost port 5000.
To generate text, send a POST request to the /api/v1/generate endpoint. The request body should be a JSON object with the following keys:
prompt: The input prompt (required).
min_length: The minimum length of the sequence to be generated (optional, default is 0).
max_length: The maximum length of the sequence to be generated (optional, default is 50).
top_p: The nucleus sampling probability (optional, default is 0.95).
temperature: The temperature for sampling (optional, default is 0.6).
For example, you can use curl to send a request:
curl -k -s -X POST https://localhost:5000/api/v1/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Below is an instruction that describes a task. Write a response that appropriately completes the request\n### Instruction: write a for loop in typescript\n### Response:", "max_length": 1000, "temperature": 0.7}'
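The same request can be sent from Python. The following is a minimal sketch using only the standard library, not code from this repository: the helper names `build_payload` and `generate` are hypothetical, the URL is copied from the curl example, and certificate verification is disabled by default only to mirror curl's `-k` flag. The response format is not documented here, so the sketch returns the raw response body as text.

```python
import json
import ssl
import urllib.request

# Assumption: same address as in the curl example; change it if your server differs.
API_URL = "https://localhost:5000/api/v1/generate"


def build_payload(prompt, min_length=None, max_length=None, top_p=None, temperature=None):
    """Build the JSON body. Options left as None are omitted so the server defaults apply."""
    if not prompt:
        raise ValueError("prompt is required")
    optional = {
        "min_length": min_length,
        "max_length": max_length,
        "top_p": top_p,
        "temperature": temperature,
    }
    payload = {"prompt": prompt}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def generate(prompt, url=API_URL, verify_tls=False, **options):
    """POST the payload to the generate endpoint and return the raw response body."""
    data = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Mirrors curl -k: skip certificate checks (only sensible for a local test server).
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


# Example call (requires the server to be running):
# print(generate("### Instruction: write a for loop in typescript\n### Response:",
#                max_length=1000, temperature=0.7))
```

Leaving an option out of the call keeps it out of the JSON body, so the server-side defaults listed above are used for it.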
About
Hosts a GPTQ model using AutoGPTQ as an API that is compatible with the text generation UI API.