How to Scale a Puppeteer Screenshot API on Kubernetes

Need to scale a website screenshot service? Tour some key Kubernetes scalability features and their benefits for Puppeteer-powered screenshot APIs.

James Walker
June 9, 2025

Puppeteer is a popular browser automation tool for remotely controlling Chrome and Firefox instances. It lets you build website screenshot APIs by using code to launch a headless browser, navigate to a target page, and capture an image.

Puppeteer-based services can be challenging to scale because they're usually resource-intensive. The presence of a full browser install means they also have heavy dependencies. Kubernetes helps make screenshot APIs more scalable by simplifying key tasks such as resource management, network security, and auto-scaling based on actual service usage. It lets you run your Puppeteer screenshot solution as containers you can replicate across multiple physical compute nodes.

In this article, we're going to tour some key Kubernetes scalability features and explain their benefits for Puppeteer-powered screenshot APIs. We'll deploy the container image from our How to Build a Docker Image for a Website Screenshot Service article to demonstrate a simple Kubernetes deployment in action.

Why Use Kubernetes for Puppeteer APIs?

Kubernetes is a container orchestration system. It automates the process of deploying, scaling, and managing containers within cloud infrastructure environments. It's an ideal fit for microservices architectures where all your components must be highly available, but also need to be scaled individually.

Kubernetes helps solve many of the issues you might encounter when running Puppeteer at scale. Key advantages include:

  • Easy Puppeteer operations as a microservice: The Kubernetes architecture lets you more easily run your Puppeteer API as a standalone microservice alongside your application code. This simplifies management by allowing components to be scaled, changed, and deployed individually.
  • Scale up automatically, based on user activity: Kubernetes has built-in autoscaling capabilities. It can start extra replicas of your service as they're required, letting you seamlessly handle spikes in demand. This is particularly important for screenshot services where each request can take several seconds to fulfill.
  • Simple scheduled job management: Kubernetes includes its own cron job mechanism to run containers on a schedule. This lets you easily take website screenshots periodically or process a capture queue, for instance. Projects such as Kueue expand on this functionality.
  • Precisely set resource utilization constraints: Kubernetes has a sophisticated resource management model that lets you control exactly how much CPU and memory your Puppeteer containers can consume.
  • Maintain performance for concurrent captures: The Kubernetes networking layer includes automatic load-balancing to distribute capture requests between your service's replicas. This spreads load evenly across replicas, helping maintain stable performance at scale.
  • Achieve high availability: Kubernetes makes it easy to deploy multiple replicas of your Puppeteer screenshot API, ensuring there's no service disruption if an instance or a compute node fails.

Now let's take a look at the main options available for scaling Puppeteer with Kubernetes.

Techniques for Scaling Puppeteer on Kubernetes

You can use Kubernetes to scale Puppeteer and your Chrome (or Firefox) instances in two key ways:

  1. One browser instance per Puppeteer container: Under this model, each Puppeteer container runs exactly one browser instance. Containers can still serve multiple requests concurrently by opening new browser pages.
  2. Separate pool of shared browser instances: With this model, the containers that run Puppeteer code don't host browser instances. Instead, Puppeteer's configured to connect to a browser instance running as a separate service in your cluster. This approach lets you scale your Puppeteer code independently of your browser pool.

We're focusing on the first approach in this guide. It's easy to configure and scales well for most use cases. By keeping your Puppeteer code as a tightly scoped microservice, you can minimize any overheads that could affect performance. But if your code is complex, or does more than just call Puppeteer, then you may benefit from decoupling your browser and Puppeteer containers using the second approach.

When scaling Puppeteer, it's also important to plan how you'll manage browser state. Different browser instances operate independently of each other, so cookies, sessions, and local storage created by a browser running in one container won't be available if a second request lands on another container. This typically isn't an issue if your service simply takes a URL and captures a screenshot as the public sees it. However, you'll need to enable Kubernetes sticky sessions for your service if users are allowed to log in to websites before requesting a series of captures.
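
For reference, enabling sticky sessions is a small change to a Service manifest. Here's a minimal sketch based on the Service used later in this guide; the timeout value is an arbitrary example:

apiVersion: v1
kind: Service
metadata:
  name: puppeteer
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP        # route repeat requests from a client to the same Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # example affinity window; tune to your session length
  selector:
    app: puppeteer
  ports:
    - port: 80
      targetPort: 3000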

Guide: Scaling Puppeteer With a Kubernetes Deployment and Service

Let's look at a simple demo of how to start a scalable Puppeteer deployment in a Kubernetes cluster. These steps are no different to running any typical web service in Kubernetes, but we'll discuss some Puppeteer-specific best practices in the next section. This guide offers a starting point for new Kubernetes users to understand how to deploy a container image, whether it uses Puppeteer or not.

We're using Minikube and the sample Puppeteer app and container image available in this article's GitHub repository. You should have Minikube, Docker, and Git installed before you continue. Clone the GitHub repository to get started:

$ git clone https://github.com/jamesheronwalker/urlbox-puppeteer-kubernetes-demo.git

The app within the repository uses Node.js and Express to provide a simple HTTP API. Calling the /capture?url=<url> endpoint uses Puppeteer to generate a screenshot of the requested URL. The screenshot is provided in the API's HTTP response. You can learn more about the Puppeteer code and Dockerfile in our How to Build a Docker Image for a Website Screenshot Service article.
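
To make the shape of the service concrete, here's a minimal sketch of what such an endpoint can look like. It's illustrative only and may differ from the code in the repository:

const express = require("express");
const puppeteer = require("puppeteer");

const app = express();
let browser; // one browser instance per container; requests share it via separate pages

app.get("/capture", async (req, res) => {
  const url = req.query.url;
  if (!url) {
    return res.status(400).send("Missing url query parameter");
  }
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: "networkidle2" });
    const screenshot = await page.screenshot({ type: "png" });
    res.set("Content-Type", "image/png").send(screenshot);
  } catch (err) {
    res.status(500).send(`Capture failed: ${err.message}`);
  } finally {
    await page.close();
  }
});

(async () => {
  browser = await puppeteer.launch();
  app.listen(3000, () => console.log("Screenshot API listening on port 3000"));
})();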

Use Docker Compose to build the project's container image:

$ docker compose build

This will build the image and tag it as urlbox-puppeteer-kubernetes-demo:latest on your machine.

Next, use the minikube image load command to make the image available to your Minikube cluster. Minikube can't automatically access the images on your host machine, so the image must be manually loaded unless you've already pushed it to a public container registry.

$ minikube image load urlbox-puppeteer-kubernetes-demo:latest

The image is now ready to use as urlbox-puppeteer-kubernetes-demo:latest in your Kubernetes deployments.

Prepare Your Kubernetes Manifests

To run Puppeteer in Kubernetes, you need to create two main resources:

  1. Deployment: The Deployment object manages a set of Pods to ensure a specified number of replicas is available.
  2. Service: Kubernetes Services route network traffic to Pods. They provide load balancing between the available replicas.

You can find Kubernetes YAML manifest files for the Deployment and Service objects within the k8s folder in the sample repository.

Here's what the two files look like.

1. Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: puppeteer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: puppeteer
  template:
    metadata:
      labels:
        app: puppeteer
    spec:
      containers:
        - name: puppeteer
          image: urlbox-puppeteer-kubernetes-demo:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000

This manifest specifies that a single Pod replica (replicas: 1) will be deployed initially. We'll scale this up later on. The Pods run a container called puppeteer that uses the image we loaded into the cluster in the previous step. Setting imagePullPolicy to IfNotPresent ensures Kubernetes uses this existing image instead of trying to pull it from a public registry where it doesn't exist.

The Deployment manifest specifies a container port of 3000. This is the port that our Express API service listens on within the container.

2. Service

apiVersion: v1
kind: Service
metadata:
  name: puppeteer
spec:
  type: LoadBalancer
  selector:
    app: puppeteer
  ports:
    - port: 80
      targetPort: 3000

The Service object specifies how traffic reaches the Puppeteer Pods. In this example, we're creating a LoadBalancer service that can be reached outside the cluster for testing purposes, but other types of Service are available for different use cases.

The Service is configured to route traffic to Pods with an app: puppeteer label assigned. This label matches that assigned to the Puppeteer Pods by the template.metadata.labels field in our Deployment manifest above. The Service's port 80 is then configured to route traffic to port 3000 within the Pods.

Apply (Deploy) Your Kubernetes Manifests

Use the kubectl apply command to create the Deployment and Service in your Kubernetes cluster:

$ kubectl apply -f k8s

deployment.apps/puppeteer created

service/puppeteer created

You can then use kubectl get deployment puppeteer to check that the Pods created by the Deployment are ready:

$ kubectl get deployment puppeteer

NAME READY UP-TO-DATE AVAILABLE AGE

puppeteer 1/1 1 1 1m

Test Your Service

Now you can test the Puppeteer service running in your cluster! First, use minikube tunnel to open a network route to your cluster. This emulates having an external load balancer in front of your Kubernetes services. Run the command in a new terminal window, then keep the session open until you’re done testing.

$ minikube tunnel

Next, use the kubectl get services command to find the external IP address assigned to your puppeteer service:

$ kubectl get services

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE

kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 13m

puppeteer LoadBalancer 10.100.130.191 10.100.130.191 80:32106/TCP 1m59s

We can see the service has an external IP address of 10.100.130.191. Visiting http://10.100.130.191/capture?url=https://www.google.com in your browser should display a screenshot of the Google homepage, as captured by your Puppeteer API:

Image showing the successful capture in a browser

Scale Your Deployment

Finally, you can now scale your Kubernetes Deployment to start additional Puppeteer replicas. Try using the kubectl scale command to resize to three replicas:

$ kubectl scale deployment/puppeteer --replicas 3

deployment.apps/puppeteer scaled

Kubernetes will start new Pods to run your service, improving capacity and redundancy. Use the kubectl get deployments command to check the rollout's progress:

$ kubectl get deployments

NAME READY UP-TO-DATE AVAILABLE AGE

puppeteer 3/3 3 3 4m

You can then try requesting a new screenshot capture using the same URL as before. The Kubernetes Service model will ensure your requests are transparently load balanced between the three running Pods.

Best Practices and Strategies for Scaling Puppeteer on Kubernetes

The steps above are a simple guide to getting started operating scalable services with Kubernetes. But scalability is much more than manually adjusting replica counts, particularly when it comes to resource-intensive processes like capturing website screenshots with Puppeteer.

The following best practices and advanced strategies ensure performant, secure, and maintainable operations for Puppeteer and Kubernetes at scale.

1. Use a GitOps-Powered Deployment Model

Having developers run kubectl apply and kubectl scale commands isn't scalable. It’s more efficient to use tools such as Argo CD and Flux CD to manage your deployments. These tools implement GitOps strategies to automatically update your Kubernetes deployments after you commit changes to your manifest files. Packaging your manifests as a Helm chart also makes it easier to reuse them across different environments.
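
As a rough illustration, an Argo CD Application resource that continuously syncs the manifests in a repository's k8s folder could look something like this; the repository URL and namespaces are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: puppeteer
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-repo.git   # placeholder repository
    targetRevision: main
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster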

2. Consider Using a Capture Job Queue Instead of Load Balancing

The demo solution shown above scales Puppeteer capacity by load balancing screenshot capture requests across multiple replicas of your app. This isn't always the best way to scale: each replica in our example runs its own browser instance, causing increased resource consumption and higher cluster costs.

Modelling your system as a job queue can help make performance more consistent at scale. Instead of having each request capture a screenshot synchronously, add new captures to a queue that your Puppeteer code can pull new jobs from. Kueue is a popular Kubernetes-native job queuing system that lets you precisely control when jobs start and stop.
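
As a loose sketch of what this can look like with Kueue, each capture is submitted as a suspended Kubernetes Job labelled with a queue name, and Kueue unsuspends Jobs as capacity allows. This assumes a LocalQueue named capture-queue already exists and uses a hypothetical worker entrypoint:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: capture-job-
  labels:
    kueue.x-k8s.io/queue-name: capture-queue   # assumes a LocalQueue with this name
spec:
  suspend: true   # Kueue unsuspends the Job once quota is available
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: capture
          image: urlbox-puppeteer-kubernetes-demo:latest
          command: ["node", "capture-job.js"]   # hypothetical worker script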

3. Run Chrome Independently of Your Puppeteer App

We discussed this above, but it's worth mentioning again: large-scale systems may be easier to maintain if you host Chrome containers independently of your Puppeteer code. You can then use Kubernetes to separately scale your browser pool and Puppeteer instances.

Your browser pool should sit behind a Kubernetes Service that your Puppeteer code can connect to using puppeteer.connect(). The Service must have sticky sessions enabled for this to work (set spec.sessionAffinity to ClientIP in the Service's manifest).
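
Here's a minimal sketch of the connection side, assuming a hypothetical chrome-pool Service that exposes the browser's WebSocket endpoint; the exact endpoint format depends on how your Chrome containers are run and exposed:

const puppeteer = require("puppeteer");

async function captureViaPool(url) {
  // Connect to a remote browser instead of launching one in this container
  const browser = await puppeteer.connect({
    browserWSEndpoint: "ws://chrome-pool:3000",   // placeholder endpoint
  });
  const page = await browser.newPage();
  try {
    await page.goto(url, { waitUntil: "networkidle2" });
    return await page.screenshot({ type: "png" });
  } finally {
    await page.close();
    // Disconnect rather than close, so the remote browser stays available for other requests
    await browser.disconnect();
  }
}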

4. Use Kubernetes Network Policies to Restrict Puppeteer API Access

Kubernetes Pods can exchange network traffic with any other Pod in your cluster by default. This is a potential security risk—if one Pod is compromised, then it could start sending requests to your Puppeteer service. If you're running your browser instances in separate Pods, then the risk is even higher: a compromised Pod could let attackers directly open a Puppeteer connection to the browser.

To mitigate this risk, you must correctly configure Network Policies for each Pod you deploy. A Network Policy is a Kubernetes object that specifies which Pods are allowed to communicate with a target Pod. The following basic example restricts Pods labelled app-component=chrome so that they can only be reached by other Pods labelled app-component=puppeteer.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-policy
spec:
  podSelector:
    matchLabels:
      app-component: chrome
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app-component: puppeteer
  egress:
    - to:
        - podSelector:
            matchLabels:
              app-component: puppeteer

5. Configure Horizontal Pod Autoscaling to Auto-Scale Puppeteer Based on Load

Kubernetes supports horizontal auto-scaling via its HorizontalPodAutoscaler (HPA) component. It adjusts your Deployment's replica counts based on actual CPU and memory utilization, ensuring more replicas are created as load increases. HPA will then remove replicas when the load subsides, avoiding excess costs.

The following HorizontalPodAutoscaler resource will dynamically change the replica count of the puppeteer Deployment to maintain an average CPU utilization of 50%. The minReplicas and maxReplicas fields bound the replica count, ensuring there are always at least three, but never more than nine, replicas running.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: puppeteer
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

HPA is easy to configure using different metrics sources, but other options offer even more precision for advanced use cases. For instance, KEDA enables simple event-driven autoscaling, a concept that's highly applicable to Puppeteer screenshot APIs. You can use it to autoscale your service based on the current number of capture requests or job queue entries, for example.
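
As a rough sketch, a KEDA ScaledObject driven by a Prometheus query might look like the following; the Prometheus address and the screenshot_requests_total metric are assumptions you'd swap for whatever reflects your capture load:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: puppeteer-scaler
spec:
  scaleTargetRef:
    name: puppeteer          # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090    # placeholder address
        query: sum(rate(screenshot_requests_total[2m]))         # hypothetical metric
        threshold: "5"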

6. Set Correct Resource Requests and Limits for Your Puppeteer and Browser Instances

Kubernetes resource requests and limits set Pod CPU and memory consumption constraints. It's important to set correct requests and limits so your browser instances have enough resources to capture screenshots performantly, without negatively impacting other workloads in your cluster. Constraints are configured within the spec.containers[].resources section of container manifests, such as in the following adaptation of our Deployment resource from above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: puppeteer
spec:
  # ...
  template:
    metadata:
      labels:
        app: puppeteer
    spec:
      containers:
        - name: puppeteer
          image: urlbox-puppeteer-kubernetes-demo:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 100m
              memory: 1Gi
            limits:
              cpu: 1000m
              memory: 2Gi

This example specifies that the Puppeteer Pods will only schedule onto cluster Nodes that can provide 100 millicores of CPU capacity (0.1 of a logical core) and 1Gi of memory. The Pods can actually consume up to 1000 CPU millicores and 2Gi of memory. If the CPU limit is exceeded, then the Pod will be throttled; if the memory limit is exceeded, then the Pod becomes eligible for termination.

Constraints that are too high cause resources to be wasted, but going too low leads to CPU throttling or unexpected Pod out-of-memory events. It's therefore important to analyze your service's actual resource utilization, then refine your requests and limits accordingly. You can also use the optional Kubernetes vertical pod autoscaler component to dynamically change your Pod's requests and limits based on observed conditions in your cluster.
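
If you install the VPA components in your cluster, a minimal manifest targeting the demo Deployment might look like this sketch:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: puppeteer-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: puppeteer
  updatePolicy:
    updateMode: "Auto"   # apply new requests/limits by evicting and recreating Pods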

7. Use Liveness and Readiness Probes to Prevent Sending Traffic to Non-Ready Containers

Puppeteer screenshot capture APIs can take time to become ready to use. You’ll need to wait while the browser launches and Puppeteer acquires a connection, for example. During this time, your service won't be able to successfully handle a capture request. To prevent errors, the Kubernetes Pod shouldn't receive any traffic until it's fully operational.

Kubernetes liveness and readiness probes are a mechanism for implementing this functionality. Readiness probes tell Kubernetes when a Pod is ready to begin receiving traffic from Services, while liveness probes enable failures to be detected.

To enable these probes for a Puppeteer API, you should provide an API endpoint that Kubernetes can periodically call. The endpoint should indicate whether your Puppeteer service is healthy. For instance, if the browser is running and connected, then you would return a successful response code (2xx). If the browser's not connected, then sending a 4xx or 5xx error status code informs Kubernetes either not to send traffic yet (for readiness probes) or to restart the failed container (for liveness probes).
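
In the Express app, such an endpoint can be a few lines. This sketch assumes the browser variable holds the Puppeteer Browser instance launched at startup, and the /healthy path is just an example:

app.get("/healthy", (req, res) => {
  if (browser && browser.isConnected()) {
    res.status(200).send("ok");
  } else {
    // Non-2xx tells Kubernetes the Pod isn't ready (readiness) or should restart (liveness)
    res.status(503).send("browser not connected");
  }
});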

Here's a simple example of a readiness probe configured for the Pods in a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: puppeteer
spec:
  # ...
  template:
    metadata:
      labels:
        app: puppeteer
    spec:
      containers:
        - name: puppeteer
          image: urlbox-puppeteer-kubernetes-demo:latest
          imagePullPolicy: IfNotPresent
          readinessProbe:
            httpGet:
              path: /healthy
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5

The probe's configuration instructs Kubernetes to make an HTTP GET request to the /healthy endpoint served on port 3000 in the container. The initialDelaySeconds option defers the first check until 10 seconds after container creation, while the periodSeconds option specifies a five-second interval between subsequent checks. Once the probe returns a healthy status, Kubernetes starts sending traffic to the Pod via Services.

8. Implement Robust Monitoring and Alerting

As with any service running at scale, continuous monitoring is key to the success of Puppeteer operations. Kubernetes doesn't come configured for observability by default, but it's easy to enable using popular solutions like kube-prometheus-stack. This runs a Prometheus and Grafana stack inside your cluster. It automates the process of collecting metrics from your Pods and other cluster components.

Beyond resource utilization stats, you also need visibility into what's going on within your service. For instance, metrics such as the number of screenshots being generated, success and error rates, and average capture times all help you make more informed scaling decisions. You can instrument your Puppeteer code to provide these values as Prometheus metrics, ready to scrape alongside your cluster-level monitoring.
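
As an illustration, instrumenting the capture endpoint with the prom-client library (an assumed dependency; the metric names are arbitrary examples) could look something like this:

const client = require("prom-client");

const captureCounter = new client.Counter({
  name: "screenshot_captures_total",
  help: "Total number of screenshot capture attempts",
  labelNames: ["status"],
});

const captureDuration = new client.Histogram({
  name: "screenshot_capture_duration_seconds",
  help: "Time taken to capture a screenshot",
});

// Expose metrics for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.end(await client.register.metrics());
});

// Inside the capture handler:
//   const end = captureDuration.startTimer();
//   ...perform the capture...
//   captureCounter.inc({ status: "success" });
//   end();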

Security Considerations of Scaling Puppeteer on Kubernetes

Beyond the best practices discussed above, it's crucial to keep security in mind when running Puppeteer in Kubernetes. Browsers like Chrome and Firefox have a large attack surface. Successfully exploiting a zero-day vulnerability could let attackers escape the browser's sandbox and affect your container or the surrounding cluster.

You can mitigate these risks by applying Kubernetes security protections such as Pod security contexts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: puppeteer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: puppeteer
  template:
    metadata:
      labels:
        app: puppeteer
    spec:
      securityContext:
        runAsUser: 1001
        runAsGroup: 1001
      containers:
        - name: puppeteer
          image: urlbox-puppeteer-kubernetes-demo:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
          securityContext:
            allowPrivilegeEscalation: false
Setting a security context lets you ensure your Puppeteer containers run as a non-root user with privilege escalation disabled, even if you forget to specify a non-root user in your Dockerfile. If an attacker escapes the browser, then their ability to cause more damage in the container or cluster will be limited.

Similarly, it's important to keep browser sandboxing technologies enabled. Chrome and Firefox isolate web content using sandboxed processes when running as a non-root user. These protections are disabled if you launch the browser with the --no-sandbox argument, a flag that's commonly found in Puppeteer containerization tutorials. Disabling sandboxing makes it easier to build a functioning Puppeteer Docker image by letting you run the container as the default root user, but it removes a critical layer of protection.

When operating a Puppeteer screenshot API, you should also consider the general security implications of letting users capture web content. For instance, attackers may attempt SSRF attacks to capture internal services hosted in your cluster, perhaps in combination with DNS rebinding techniques. Improper session isolation could also be exploited to capture content belonging to other users if your service lets users log in and start persistent sessions.
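
As a rough sketch of application-level mitigation, you could validate URLs before passing them to Puppeteer. This example resolves the hostname and rejects private and loopback ranges; it doesn't defend against DNS rebinding on its own, so combine it with network-level controls such as the NetworkPolicies described earlier:

const dns = require("dns").promises;
const net = require("net");

function isPrivateAddress(ip) {
  if (net.isIPv6(ip)) {
    return ip === "::1" || ip.startsWith("fc") || ip.startsWith("fd") || ip.startsWith("fe80");
  }
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

async function validateCaptureUrl(rawUrl) {
  const url = new URL(rawUrl); // throws on malformed input
  if (!["http:", "https:"].includes(url.protocol)) {
    throw new Error("Only http and https URLs are allowed");
  }
  const { address } = await dns.lookup(url.hostname);
  if (isPrivateAddress(address)) {
    throw new Error("Refusing to capture internal addresses");
  }
  return url;
}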

Summary

Puppeteer-based screenshot APIs must be scalable so you can maintain consistent capture performance even during times of high demand. In this article, we've seen how Kubernetes provides a platform for deploying and scaling your Puppeteer services as containers. It lets you use autoscaling, load balancing, and resource requests and limits to ensure stable Puppeteer operations.

Even so, there's still significant complexity involved in preparing a Puppeteer container image and then correctly deploying it to Kubernetes. It takes even more work to configure vital capture features such as full-page screenshots, infinite scroll workarounds, and captcha defeats.

For an easier option, check out Urlbox. Our high-performance website screenshot API is built to give beautiful results every time. You don't need to maintain your own Puppeteer service or master any scaling settings. Urlbox provides predictable captures with over 100 customization options and built-in ad, popup, captcha, and cookie banner blockers. We even screenshot the challenging elements that don't always work with standard Puppeteer, including canvas elements, videos, and WebGL content. You can get started taking screenshots at scale with a 7-day free trial.
