Turbocharging ComfyUI: 90% Faster Workflows on Runpod Serverless

Summary

In this article, we share how we built a robust, high-performance codebase on top of the open-source comfy_runner project by PiyushK52, which automates ComfyUI workflow execution. Leveraging Runpod’s serverless GPU platform with a custom RP handler, we added flexible inputs (custom node/model URLs, workflow URLs, and media payloads) and two key optimizations: parallelized downloads and installs via Python’s concurrency tools, and a persistent cache backed by Runpod volumes. The result was transformative: workflows that once took 100+ minutes now complete in just 3–10 minutes, delivering up to a 90% reduction in execution time.

Introduction

ComfyUI is an environment for building and running generative content workflows, where each workflow is a graph of nodes performing tasks like loading models or applying samplers. While powerful, manually setting up and running these workflows at scale can be time-consuming and error-prone.

To address this, comfy_runner was created to automate the downloading of nodes and models and to execute provided workflows seamlessly. It even manages a dataset of some 2,000 popular models (checkpoints, LoRAs, embeddings, and more) through ComfyUI Manager, making backend integration frictionless.

Building on comfy_runner

We began by forking the comfy_runner codebase, which leverages scripts like main.py and inf.py to automatically install ComfyUI nodes and models as needed. This foundation handled:

  • Dataset management: A curated list of frequently used models and nodes.

  • Automated installation: Seamless integration with ComfyUI Manager for node/model setup.

  • Workflow execution: Running ComfyUI graphs programmatically via Python.

This saved us countless hours of boilerplate work, allowing us to focus on higher-level optimizations rather than reinventing core functionality.
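For illustration, a workflow run on this foundation can be driven from a few lines of Python. This is only a sketch: the ComfyRunner class, the predict() method, and its argument names are assumptions about the upstream comfy_runner interface and may differ in a given fork.

    # Illustrative only: the ComfyRunner class and the predict() argument names below
    # are assumptions about comfy_runner's interface and may differ between versions.
    from inf import ComfyRunner

    runner = ComfyRunner()

    # Point the runner at a workflow JSON; missing custom nodes and models are
    # resolved and installed before the graph is queued for execution.
    output = runner.predict(
        workflow_input="./workflows/txt2img.json",   # hypothetical local workflow file
        file_path_list=[],                           # optional local media inputs
        stop_server_after_completion=True,           # shut ComfyUI down when done
    )
    print(output)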

Extending with a Runpod Serverless RP Handler

To scale efficiently, we integrated our codebase with Runpod Serverless, a platform offering pay-as-you-go compute for AI workloads without server management. Key advantages include:

  • Always-on, pre-warmed GPU instances for low-latency execution 

  • Persistent storage options, enabling volumes that survive across invocations 

  • Full control over runtimes and dependencies via custom worker handlers

We wrote a custom RP handler that exposes a Serverless endpoint accepting JSON inputs for workflows, node/model URLs, and media payloads (images, videos, etc.), leveraging Runpod’s Python SDK for /run and /status operations.
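In condensed form, the handler looks roughly like the sketch below. The runpod.serverless.start() call and the event["input"] structure come from the Runpod Python SDK; the payload field names (workflow_url, nodes, models, media) and the run_workflow helper are our own illustrative conventions.

    # Minimal sketch of the custom RP handler. runpod.serverless.start() and the
    # event["input"] structure come from the Runpod Python SDK; the payload field
    # names and the run_workflow helper are illustrative conventions.
    import runpod

    def run_workflow(workflow_url, node_urls, model_urls, media):
        """Hypothetical glue code: install assets, execute the graph, return outputs."""
        ...  # fetch nodes/models, hand the workflow to comfy_runner, collect results
        return []

    def handler(event):
        job_input = event.get("input", {})
        outputs = run_workflow(
            job_input.get("workflow_url"),   # workflow JSON to fetch and run
            job_input.get("nodes", []),      # custom-node repos to install
            job_input.get("models", []),     # model URLs to install
            job_input.get("media", []),      # base64 blobs or media URLs
        )
        return {"outputs": outputs}

    runpod.serverless.start({"handler": handler})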

Unlocking Input Flexibility

Traditional comfy_runner pipelines expect local files or fixed model lists. To broaden use cases, we added support for:

  • Custom node/model URLs: Users can supply links to any asset hosted on Civitai or GitHub.

  • Workflow URLs: Dynamically fetch and run arbitrary workflow definitions.

  • Media inputs: Pass base64-encoded images, URLs, or multipart uploads for rich media pipelines.

This flexibility empowers creators to experiment without manual pre-packaging, fostering on-the-fly workflow customization.
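As an example, a client request to the endpoint might look like the following. The endpoint ID and API key are placeholders, and the input fields mirror the conventions sketched above rather than any Runpod standard.

    # Example client call; the endpoint ID and API key are placeholders, and the
    # "input" fields follow our handler's conventions, not a Runpod standard.
    import requests

    payload = {
        "input": {
            "workflow_url": "https://example.com/workflows/upscale.json",
            "nodes":  ["https://github.com/some-org/some-custom-node"],
            "models": ["https://civitai.com/api/download/models/12345"],
            "media":  [{"name": "input.png", "url": "https://example.com/input.png"}],
        }
    }

    resp = requests.post(
        "https://api.runpod.ai/v2/<ENDPOINT_ID>/run",
        json=payload,
        headers={"Authorization": "Bearer <RUNPOD_API_KEY>"},
        timeout=30,
    )
    print(resp.json())  # returns a job ID that can then be polled via /status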

Performance Optimization: Parallel Downloads and Caching

Initial benchmarks revealed that downloading and installing models serially could stretch runtimes beyond 100 minutes for complex workflows. To tackle this, we:

  1. Parallelized downloads and installations using Python’s concurrent.futures module with ThreadPoolExecutor and ProcessPoolExecutor, reducing I/O wait times significantly (see the sketch below).

  2. Chunked file downloads via the requests library and ThreadPoolExecutor for concurrent HTTP fetches, further cutting retrieval times.

  3. Persistent caching on a Runpod volume: frequently downloaded models are stored in a shared, mounted volume so subsequent runs skip re-downloads entirely, leveraging the persistent storage feature of Runpod Serverless.

These optimizations transformed the performance profile, driving workflows from 100+ minutes down to a consistent 3–10 minutes.
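The sketch below combines these three ideas in simplified form: a cache check against a mounted volume followed by concurrent, streamed downloads using ThreadPoolExecutor and requests. The volume path and function names are illustrative assumptions.

    # Illustrative sketch of parallel, cache-aware model downloads. The volume path
    # (/runpod-volume/models) and the function names are assumptions for this example.
    import os
    from concurrent.futures import ThreadPoolExecutor

    import requests

    CACHE_DIR = "/runpod-volume/models"  # persistent Runpod volume mount (assumed path)

    def fetch_model(url: str) -> str:
        """Download a model to the shared volume unless it is already cached."""
        dest = os.path.join(CACHE_DIR, url.rsplit("/", 1)[-1])
        if os.path.exists(dest):          # warm run: skip the download entirely
            return dest

        os.makedirs(CACHE_DIR, exist_ok=True)
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(dest, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=8 * 1024 * 1024):
                    fh.write(chunk)       # stream in chunks to bound memory use
        return dest

    def fetch_all(urls: list[str]) -> list[str]:
        """Fetch every model concurrently instead of one after another."""
        with ThreadPoolExecutor(max_workers=8) as pool:
            return list(pool.map(fetch_model, urls))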

Additional Flags and Workflow Persistence

To further streamline operations, we introduced workflow-level flags like:

  • persist-models: Keeps installed models on the volume after the run completes.

These toggles grant users fine-grained control over execution, from full cold starts to lightning-fast warm runs.
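As a sketch, such a flag might be honored inside the handler roughly like this (the key lookup and cleanup helper are hypothetical):

    # Hypothetical handling of the persist-models flag inside the handler.
    import shutil

    def maybe_clean_cache(job_input: dict, cache_dir: str = "/runpod-volume/models") -> None:
        """Remove cached models after the run unless the caller asked to keep them."""
        if job_input.get("persist-models", True):   # illustrative key name
            return  # warm-start friendly: leave models on the volume for the next run
        shutil.rmtree(cache_dir, ignore_errors=True)  # forces a full cold start next time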

Impact and Results

  • Execution time dropped from 100+ minutes to 3–10 minutes, a ~90% improvement.

  • User experience elevated: creators can iterate workflows interactively without lengthy waits.

  • Cost efficiency realized: shorter runtimes translate directly to lower cloud compute bills.

This performance uplift has unlocked new possibilities for real-time experimentation and rapid prototyping with ComfyUI workflows.

Gratitude to the Open Source Community

We want to express our deep appreciation for the open source community: the countless contributors to ComfyUI, comfy_runner, and the Runpod tooling made this work possible. Rather than reinventing core capabilities, we stood on the shoulders of giants to build something greater.

Closing Thoughts

Building atop existing open source projects allowed us to deliver a specialized, high-performance solution in record time. If you’re tackling large-scale AI workflows, consider:

  1. Leveraging proven foundations like comfy_runner.

  2. Exploiting serverless GPU platforms for elastic, pay-per-use compute.

  3. Automating parallelization and caching to minimize idle time.

By combining these strategies, you too can transform protracted workflows into streamlined, cost-effective pipelines without reinventing the wheel. Or just contact us to build stuff for you.