Running Models with Replicate
Replicate enables users to run models with a single, straightforward API call, using the language or tool of their choice, including Python, JavaScript, and cURL.
Python Example:
import replicate

output = replicate.run(
    "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
    input={
        "prompt": "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
)
print(output)
JavaScript Example:
import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run(
  "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
  {
    input: {
      prompt: "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
  }
);
console.log(output);
cURL Example:
curl -s -X POST \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{
    "version": "527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
    "input": {
      "prompt": "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
  }' \
  https://api.replicate.com/v1/predictions
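Note that unlike replicate.run, a raw POST to /v1/predictions returns immediately with a prediction object rather than waiting for output; the client must poll the prediction until it reaches a terminal status (succeeded, failed, or canceled). A minimal polling sketch in Python, using only the standard library and assuming REPLICATE_API_TOKEN is set in the environment:

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.replicate.com/v1"

# Terminal statuses reported by the Replicate predictions API
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def is_terminal(status: str) -> bool:
    """A prediction is finished once it reaches a terminal status."""
    return status in TERMINAL_STATUSES

def get_prediction(prediction_id: str) -> dict:
    """Fetch the current state of a prediction from the API."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{prediction_id}",
        headers={"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_prediction(prediction_id: str, poll_seconds: float = 1.0) -> dict:
    """Poll until the prediction finishes, then return the full object."""
    while True:
        prediction = get_prediction(prediction_id)
        if is_terminal(prediction["status"]):
            return prediction
        time.sleep(poll_seconds)
```

The final object's output field holds the model's result once the status is succeeded.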
Fine-Tuning Models
Replicate supports fine-tuning of models to enhance their performance for specific tasks. Users can upload training data and initiate the fine-tuning process through the Replicate API.
import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:c221b2b8ef527988fb59bf24a8b97c4561f1c671f73bd389f866bfb27c061316",
    input={
        "input_images": "https://my-domain/my-input-images.zip",
    },
    destination="mattrothenberg/sdxl-fine-tuned"
)
print(training)
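Trainings run asynchronously, so the object returned by trainings.create will not yet contain the fine-tuned model. A small sketch of waiting for completion, assuming the same replicate client and that succeeded, failed, and canceled are the terminal training states:

```python
import time

# Terminal states for a Replicate training job
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_training(training_id: str, poll_seconds: float = 30.0):
    """Poll a training until it finishes, then return the final object."""
    # Deferred import so the helper can be defined without the package installed
    import replicate

    while True:
        training = replicate.trainings.get(training_id)
        if training.status in TERMINAL_STATUSES:
            return training
        time.sleep(poll_seconds)
```

Once the training succeeds, its destination (here, mattrothenberg/sdxl-fine-tuned) can be run like any other model with replicate.run.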
Deploying Custom Models
Replicate supports deploying custom models via Cog, an open-source tool that packages models into deployable API servers. Users define the model's environment in cog.yaml and its prediction logic in predict.py.
Example cog.yaml Configuration:
build:
  gpu: true
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.10"
  python_packages:
    - "torch==1.13.1"
predict: "predict.py:Predictor"
Example predict.py Implementation:
from cog import BasePredictor, Input, Path
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.model = torch.load("./weights.pth")

    def predict(self, image: Path = Input(description="Grayscale input image")) -> Path:
        """Run a single prediction on the model"""
        # preprocess and postprocess are user-defined helpers (not shown here)
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)
Scaling and Pricing
Replicate handles scaling automatically, spinning instances up and down in response to traffic. The pricing model is pay-as-you-go, billed per second for the time your predictions actually run.
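Under per-second billing, the cost of a prediction is simply its runtime multiplied by the hardware's per-second rate. A tiny worked sketch with a made-up rate (the real rates vary by hardware; check Replicate's pricing page):

```python
# Hypothetical per-second rate, for illustration only
HYPOTHETICAL_GPU_RATE = 0.000725  # assumed $/second

def prediction_cost(runtime_seconds: float, rate_per_second: float) -> float:
    """Pay-as-you-go: total cost is runtime times the per-second rate."""
    return runtime_seconds * rate_per_second

# A prediction that runs for 12 seconds on this hypothetical hardware
cost = prediction_cost(12, HYPOTHETICAL_GPU_RATE)  # about $0.0087
```

Because billing stops when a prediction finishes, idle time between requests costs nothing.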
Real-Time Applications
Replicate supports real-time applications. Models such as riffusion/riffusion facilitate real-time music generation, and the API is designed to manage a range of use cases, including high-throughput and real-time predictions.