Running Models with Replicate
Replicate enables users to run models with a single, straightforward API call, using the language or tool of their choice, including Python, JavaScript, and cURL.
Python Example:
import replicate

output = replicate.run(
    "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
    input={
        "prompt": "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
)
print(output)
JavaScript Example:
import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run(
  "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
  {
    input: {
      prompt: "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
  }
);
console.log(output);
cURL Example:
curl -s -X POST \
  -H "Authorization: Token $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{
    "version": "527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
    "input": {
      "prompt": "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
    }
  }' \
  https://api.replicate.com/v1/predictions
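Note that unlike replicate.run, a raw POST to /v1/predictions returns immediately with a prediction object rather than waiting for output; the client must poll the prediction until it reaches a terminal status (succeeded, failed, or canceled). A minimal polling sketch in Python, using only the standard library and assuming REPLICATE_API_TOKEN is set in the environment:

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.replicate.com/v1"

# Terminal statuses reported by the Replicate predictions API
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def is_terminal(status: str) -> bool:
    """A prediction is finished once it reaches a terminal status."""
    return status in TERMINAL_STATUSES

def get_prediction(prediction_id: str) -> dict:
    """Fetch the current state of a prediction from the API."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{prediction_id}",
        headers={"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_prediction(prediction_id: str, poll_seconds: float = 1.0) -> dict:
    """Poll until the prediction finishes, then return the full object."""
    while True:
        prediction = get_prediction(prediction_id)
        if is_terminal(prediction["status"]):
            return prediction
        time.sleep(poll_seconds)
```

The final object's output field holds the model's result once the status is succeeded.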
Fine-Tuning Models
Replicate supports fine-tuning of models to enhance their performance for specific tasks. Users can upload training data and initiate the fine-tuning process through the Replicate API.
import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:c221b2b8ef527988fb59bf24a8b97c4561f1c671f73bd389f866bfb27c061316",
    input={
        "input_images": "https://my-domain/my-input-images.zip",
    },
    destination="mattrothenberg/sdxl-fine-tuned"
)
print(training)
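Trainings run asynchronously, so the object returned by trainings.create will not yet contain the fine-tuned model. A small sketch of waiting for completion, assuming the same replicate client and that succeeded, failed, and canceled are the terminal training states:

```python
import time

# Terminal states for a Replicate training job
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_training(training_id: str, poll_seconds: float = 30.0):
    """Poll a training until it finishes, then return the final object."""
    # Deferred import so the helper can be defined without the package installed
    import replicate

    while True:
        training = replicate.trainings.get(training_id)
        if training.status in TERMINAL_STATUSES:
            return training
        time.sleep(poll_seconds)
```

Once the training succeeds, its destination (here, mattrothenberg/sdxl-fine-tuned) can be run like any other model with replicate.run.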
Deploying Custom Models
Replicate supports deploying custom models via Cog, an open-source tool that packages models into deployable API servers. Users define the model's environment in cog.yaml and its prediction logic in predict.py.
Example cog.yaml Configuration:
build:
  gpu: true
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.10"
  python_packages:
    - "torch==1.13.1"
predict: "predict.py:Predictor"
Example predict.py Implementation:
from cog import BasePredictor, Input, Path
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.model = torch.load("./weights.pth")

    def predict(self, image: Path = Input(description="Grayscale input image")) -> Path:
        """Run a single prediction on the model"""
        # preprocess and postprocess are user-defined helpers (not shown here)
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)
Scaling and Pricing
Replicate handles scaling automatically, spinning instances up and down in response to traffic. The pricing model is pay-as-you-go, billed per second for the time your predictions actually run.
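Under per-second billing, the cost of a prediction is simply its runtime multiplied by the hardware's per-second rate. A tiny worked sketch with a made-up rate (the real rates vary by hardware; check Replicate's pricing page):

```python
# Hypothetical per-second rate, for illustration only
HYPOTHETICAL_GPU_RATE = 0.000725  # assumed $/second

def prediction_cost(runtime_seconds: float, rate_per_second: float) -> float:
    """Pay-as-you-go: total cost is runtime times the per-second rate."""
    return runtime_seconds * rate_per_second

# A prediction that runs for 12 seconds on this hypothetical hardware
cost = prediction_cost(12, HYPOTHETICAL_GPU_RATE)  # about $0.0087
```

Because billing stops when a prediction finishes, idle time between requests costs nothing.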
Real-Time Applications
Replicate supports real-time applications. Models such as riffusion/riffusion facilitate real-time music generation, and the API is designed to manage a range of use cases, including high-throughput and real-time predictions.