Taking Screenshots of All Pages on a Website

Learn how to capture screenshots of every webpage on a website using sitemap extractors

A common question we receive is: "Are you able to take screenshots of all webpages on a website in one request?"

While Urlbox doesn't natively support bulk website captures in a single request, you can easily achieve this with a sitemap extractor plus either CaptureDeck (no code) or a short automation script of your own. This guide walks through capturing screenshots of every page on a website, with both code and no-code solutions.

Overview

The process involves two main steps:

  1. Extract all URLs from the website using its sitemap
  2. Capture screenshots of all URLs using CaptureDeck (for no-code) or your own automation with Urlbox

Step 1: Getting the List of URLs

Most websites publish a sitemap in XML format that contains all their pages. This sitemap is typically available at /sitemap.xml on the website's domain. Here is ours.

Finding the Sitemap

Companies often place a link to their sitemap in the footer of their main webpage.

Common sitemap locations include:

  • https://example.com/sitemap.xml
  • https://example.com/sitemap_index.xml
  • https://example.com/sitemaps.xml

For example, OpenAI's sitemap is available at: https://openai.com/sitemap.xml

Extracting URLs from the Sitemap

To convert the XML sitemap into a list of all of the website's URLs, you can use an online tool or do it programmatically.

Using an Online Tool

SEOwl Sitemap Extractor is a free tool that extracts URLs from sitemaps. Paste the sitemap URL into the tool, and it will generate a complete list of all pages on the website.

Using Node.js (Programmatic)

For automation or integration into your workflow, we recommend using the sitemapper package:

npm install sitemapper

import Sitemapper from 'sitemapper';
 
const sitemap = new Sitemapper({
  url: 'https://example.com/sitemap.xml',
  timeout: 10000,
});
 
const { sites } = await sitemap.fetch();
 
console.log(sites);
// ['https://example.com/', 'https://example.com/about', 'https://example.com/contact', ...]

The sitemapper package handles nested sitemaps (sitemap indexes) automatically, so you don't need to parse multiple sitemap files yourself. You could extend this further by accepting just a website URL, trying the known sitemap locations until one responds, and then passing that URL into sitemapper.
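That extension could look something like the sketch below. The helper names and the list of probed paths are our own choices (based on the common locations listed earlier), not part of the sitemapper package:

```javascript
// Common sitemap locations to probe, in order of likelihood.
const COMMON_SITEMAP_PATHS = ['/sitemap.xml', '/sitemap_index.xml', '/sitemaps.xml'];

// Build candidate sitemap URLs from any page URL on the site.
function candidateSitemapUrls(siteUrl) {
  const base = new URL(siteUrl).origin;
  return COMMON_SITEMAP_PATHS.map((path) => base + path);
}

// Try each candidate until one responds with a 2xx status.
// Requires Node 18+ for the global fetch API.
async function findSitemapUrl(siteUrl) {
  for (const candidate of candidateSitemapUrls(siteUrl)) {
    try {
      const response = await fetch(candidate, { method: 'HEAD' });
      if (response.ok) return candidate;
    } catch {
      // Network error for this candidate; try the next one.
    }
  }
  return null; // No sitemap found at the common locations.
}
```

The returned URL can then be passed straight into the `url` option of the Sitemapper constructor shown above.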

Step 2: Capturing Screenshots

Once you have your list of URLs, you have several options for capturing screenshots:

Option 1: CaptureDeck (No-Code Solution)

CaptureDeck is a no-code tool built on top of Urlbox that's perfect for bulk screenshot captures. It's the fastest way to get screenshots of all the webpages on a website without writing your own code.

Steps:

  1. Sign up for a CaptureDeck account
  2. Create a new "Deck"
  3. Paste your list of URLs into the deck
  4. Run the capture

CaptureDeck will process all URLs and provide you with organised screenshots that you can view in the dashboard or download as a ZIP file.

CaptureDeck also lets you create your own 'presets' - saved combinations of options for taking a particular type of screenshot. There are preconfigured presets for social media, full-page, and mobile captures, which you can find in your CaptureDeck team settings.

Option 2: Urlbox API with Custom Script

For more control or integration into existing workflows, you can use the Urlbox API directly. Here's a simple example in JavaScript to process multiple URLs:

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
  // ... your full list of URLs
];
 
const URLBOX_SECRET = 'your-urlbox-secret';
 
async function captureScreenshots(urls) {
  return Promise.all(
    urls.map(async (url) => {
      const response = await fetch('https://api.urlbox.com/v1/render/sync', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${URLBOX_SECRET}`
        },
        body: JSON.stringify({
          url,
          full_page: true,
          format: 'png'
        })
      });
 
      // Surface failed renders instead of silently returning an error body
      if (!response.ok) {
        throw new Error(`Render failed for ${url}: ${response.status}`);
      }
 
      const data = await response.json();
      return { url, screenshot: data };
    })
  );
}
 
captureScreenshots(urls)
  .then(results => {
    console.log(`Captured ${results.length} screenshots`);
    // Process your results here
  })
  .catch(error => {
    console.error('Error capturing screenshots:', error);
  });

Best Practices

Rate Limiting

When processing large numbers of URLs, be mindful of:

  • Rate limits - For very large sites (1,000+ pages), space out your requests so you stay within your plan's rate limits
  • Target-site politeness - Don't overwhelm the target website with too many concurrent requests

Handling Large Sites

For websites with thousands of pages:

  1. Batch processing - Process URLs in batches to avoid overwhelming your system
  2. Storage - Consider using S3 or similar cloud storage for organising large numbers of screenshots
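As a sketch, batch processing could look like the following. The batch size and delay are illustrative values only, and `captureFn` stands in for whatever per-URL capture call you use (such as the Urlbox request shown earlier):

```javascript
// Split a list into fixed-size chunks.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Capture one batch concurrently, then pause before the next batch
// so you don't flood the API or the target site.
async function captureInBatches(urls, captureFn, batchSize = 10, delayMs = 1000) {
  const results = [];
  for (const batch of chunk(urls, batchSize)) {
    results.push(...(await Promise.all(batch.map(captureFn))));
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return results;
}
```

This keeps at most `batchSize` requests in flight at once, which is usually enough to stay under rate limits without giving up concurrency entirely.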

Troubleshooting

Sitemap not found?

  • Check /robots.txt file which often lists the sitemap location
  • Look for sitemap references in the website's footer or help pages

This method might not always work, as some websites don't include a sitemap.
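If you want to automate the robots.txt check, a minimal sketch (the helper names are our own) could be:

```javascript
// Pull any "Sitemap:" directives out of a robots.txt body.
function extractSitemapUrls(robotsTxt) {
  return robotsTxt
    .split('\n')
    .filter((line) => line.toLowerCase().startsWith('sitemap:'))
    .map((line) => line.slice('sitemap:'.length).trim());
}

// Fetch a site's robots.txt and return the sitemap URLs it lists,
// or an empty array if the file is missing. Requires Node 18+ for fetch.
async function sitemapsFromRobots(siteUrl) {
  const response = await fetch(new URL('/robots.txt', siteUrl));
  if (!response.ok) return [];
  return extractSitemapUrls(await response.text());
}
```

If this returns an empty array too, the site likely has no published sitemap and you would need to crawl it instead.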

Getting Help

If you run into any issues or need help processing a particularly large or complex website, don't hesitate to contact our support team. We're happy to help you diagnose your setup and optimise your bulk screenshot workflow.