Replicate AI FAQs

Running Models with Replicate

Replicate lets you run models with a single API call. You can use the official Python or JavaScript client libraries, or call the HTTP API directly with cURL.

  • Python Example:

    import replicate
    
    output = replicate.run(
      "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
      input={
        "prompt": "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
      }
    )
    print(output)
    
  • JavaScript Example:

    import Replicate from "replicate";
    const replicate = new Replicate();
    
    const output = await replicate.run(
      "stability-ai/stable-diffusion-3:527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
      {
        input: {
          prompt: "a photo of vibrant artistic graffiti on a wall saying \"SD3 medium\""
        }
      }
    );
    console.log(output);
    
  • cURL Example:

    curl -s -X POST \
      -H "Authorization: Token $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d $'{
        "version": "527d2a6296facb8e47ba1eaf17f142c240c19a30894f437feee9b91cc29d8e4f",
        "input": {
          "prompt": "a photo of vibrant artistic graffiti on a wall saying \\"SD3 medium\\""
        }
      }' \
      https://api.replicate.com/v1/predictions
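
Predictions created through the HTTP API run asynchronously: the create call returns immediately, and the prediction must be polled until it reaches a terminal status. A minimal polling sketch, where `fetch_status` stands in for a call such as GET /v1/predictions/{id} (the status names match Replicate's documented prediction lifecycle):

```python
import time

# Terminal statuses in Replicate's prediction lifecycle.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_prediction(fetch_status, interval=1.0, timeout=300.0):
    """Poll fetch_status() until the prediction finishes or the timeout elapses.

    `fetch_status` is a zero-argument callable returning the current status
    string (a stand-in for re-fetching the prediction via the API).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("prediction did not reach a terminal status in time")
```

The client libraries handle this loop for you (`replicate.run` blocks until the prediction finishes); explicit polling is mainly useful when calling the HTTP API directly.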
    

Fine-Tuning Models

Replicate supports fine-tuning of models to enhance their performance for specific tasks. Users can upload training data and initiate the fine-tuning process through the Replicate API.

  • Training Example:

    import replicate
    
    training = replicate.trainings.create(
        version="stability-ai/sdxl:c221b2b8ef527988fb59bf24a8b97c4561f1c671f73bd389f866bfb27c061316",
        input={
            "input_images": "https://my-domain/my-input-images.zip",
        },
        destination="mattrothenberg/sdxl-fine-tuned"
    )
    print(training)
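
Like predictions, trainings run asynchronously: the object returned by `trainings.create` starts out pending and must be re-fetched until it completes. A minimal sketch, where `get_training` stands in for `replicate.trainings.get(training_id)` and the `status`/`destination` fields are assumptions based on the training object created above:

```python
def resulting_model(get_training, training_id):
    """Return the destination model ref once a training has succeeded, else None.

    `get_training` is a stand-in for re-fetching the training via the API;
    the `status` and `destination` field names are assumptions.
    """
    training = get_training(training_id)
    if training["status"] != "succeeded":
        return None
    # The fine-tuned weights are pushed to the destination model, which can
    # then be run like any other model with replicate.run(...).
    return training["destination"]
```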
    

Deploying Custom Models

Replicate deploys custom models via Cog, an open-source tool that packages a model into a production-ready API server. You define the model's environment in cog.yaml and its prediction logic in predict.py.

  • Example cog.yaml Configuration:

    build:
      gpu: true
      system_packages:
        - "libgl1-mesa-glx"
        - "libglib2.0-0"
      python_version: "3.10"
      python_packages:
        - "torch==1.13.1"
    predict: "predict.py:Predictor"
    
  • Example predict.py Implementation:

    from cog import BasePredictor, Input, Path
    import torch
    
    class Predictor(BasePredictor):
        def setup(self):
            """Load the model into memory to make running multiple predictions efficient"""
            self.model = torch.load("./weights.pth")
    
        def predict(self, image: Path = Input(description="Grayscale input image")) -> Path:
            """Run a single prediction on the model"""
            # preprocess and postprocess are user-defined helper functions (not shown here)
            processed_image = preprocess(image)
            output = self.model(processed_image)
            return postprocess(output)
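
With both files in place, the Cog CLI builds, tests, and publishes the model. A typical sequence (the username and model name are placeholders, and the model must already exist on Replicate):

```shell
# Run a local test prediction inside the container Cog builds
cog predict -i image=@input.jpg

# Authenticate with Replicate, then push the packaged model
cog login
cog push r8.im/<your-username>/<your-model-name>
```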
    

Scaling and Pricing

Replicate scales automatically with traffic, spinning capacity up and down as demand changes. Pricing is pay-as-you-go, billed per second of compute time on the hardware the model runs on.

  • Pricing Tiers:
    • CPU: $0.000100/sec
    • Nvidia T4 GPU: $0.000225/sec
    • Nvidia A40 GPU: $0.000575/sec
    • Nvidia A40 (Large) GPU: $0.000725/sec
    • Nvidia A100 (40GB) GPU: $0.001150/sec
    • Nvidia A100 (80GB) GPU: $0.001400/sec
    • 8x Nvidia A40 (Large) GPU: $0.005800/sec
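
Since billing is per second, a prediction's cost is simply its runtime multiplied by the hardware rate. A small sketch using the rates listed above (rates may change; check replicate.com/pricing for current figures):

```python
# Per-second rates in USD, copied from the pricing tiers above.
RATES_PER_SEC = {
    "cpu": 0.000100,
    "t4": 0.000225,
    "a40": 0.000575,
    "a40-large": 0.000725,
    "a100-40gb": 0.001150,
    "a100-80gb": 0.001400,
    "8x-a40-large": 0.005800,
}

def predict_cost(hardware: str, seconds: float) -> float:
    """Estimated cost in USD for `seconds` of runtime on `hardware`."""
    return RATES_PER_SEC[hardware] * seconds

# e.g. a 30-second run on an A100 (40GB) costs about $0.0345
```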

Real-Time Applications

Replicate supports real-time applications. Models such as riffusion/riffusion enable real-time music generation, and the API is designed to handle use cases ranging from high-throughput batch processing to low-latency, real-time predictions.