AI Screenshot Analysis with LLM Structured JSON Outputs

Getting consistent, structured responses from an LLM can be challenging when analysing screenshots, metadata, markdown, or HTML, yet consistency is crucial for building reliable applications on top of Urlbox's screenshot API and AI. Raw text responses from LLMs can vary widely, making them hard to parse and integrate into your workflow.

That's why we've added support for LLM Structured Outputs to Urlbox. Whether you're building competitive intelligence tools, automating landing page optimization, or creating web automation workflows, structured outputs make it easier to extract insights from rendered pages.

Bonus – PDF renders are now supported for both OpenAI and Anthropic. Plus, you can now give your LLM a system prompt like this:

{
  "llm_system_prompt": "You are an expert visual analyst. When given a screenshot, identify its purpose, layout, and key visual and textual elements. Describe what the image communicates, who it targets, and how effectively it delivers its message."
}

What are Structured Outputs?

Structured Outputs allow you to define a JSON schema that your AI provider (OpenAI, Anthropic) must follow when responding to a prompt.

Instead of receiving unstructured text, you get predictable JSON that follows your schema. This makes the LLM response much easier to anticipate and to integrate into your own product.

How it Works

There are two new options for getting a structured output when rendering with Urlbox:

llm_schema is the JSON schema that you would like your structured output to follow. It is the only option you need to include by default, and you'll get a JSON object response that conforms to your schema.

{
   "llm_schema": { "Your": "JSON", "Schema": "Here" }
}

The second option is llm_output. It is optional and defaults to object behind the scenes. Set it to array when you need the response to be an array of objects, or to enum when you need a single value chosen from a fixed list of options. The accepted values are object, array, and enum. For example:

{
  "llm_output": "array"
}

Using arrays is helpful when you want your model to analyse multiple elements on a page and return them in a single, structured JSON response — for example, when identifying several call-to-action buttons or product cards.

The enum format, on the other hand, is best suited for simpler tasks that require a single categorical answer, such as classifying an image or determining the type of website.
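As a rough illustration of how the two options fit together, the sketch below assembles the LLM part of a render request in Python. The helper name llm_options is hypothetical, and the rule that enum mode takes a plain list of strings is inferred from the examples in this post rather than taken from a formal spec.

```python
from typing import Any

ALLOWED_OUTPUTS = {"object", "array", "enum"}


def llm_options(prompt: str, schema: Any, output: str = "object") -> dict[str, Any]:
    """Assemble the LLM-related render options (hypothetical helper)."""
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(f"llm_output must be one of {sorted(ALLOWED_OUTPUTS)}")
    if output == "enum":
        # Assumption: enum mode takes a plain list of allowed string values.
        if not (isinstance(schema, list) and all(isinstance(v, str) for v in schema)):
            raise ValueError("enum output expects a list of strings as llm_schema")
    elif not isinstance(schema, dict):
        # Assumption: object and array modes take a JSON Schema object.
        raise ValueError(f"{output} output expects a JSON Schema object as llm_schema")

    options: dict[str, Any] = {"use_llm": True, "llm_prompt": prompt, "llm_schema": schema}
    if output != "object":
        # "object" is the default, so it can be left out of the request.
        options["llm_output"] = output
    return options


enum_opts = llm_options("Classify this website", ["blog", "news"], output="enum")
object_opts = llm_options("Summarise this page", {"type": "object"})
```

Catching a mismatched schema shape before the render call is cheaper than discovering it in the response.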

If you’re currently using options like js to extract specific data from pages — for example, grabbing text, links, or pricing using JavaScript during the render — you might find that structured LLM outputs can simplify your process. Instead of writing and maintaining custom JS for each use case, you can define a schema and let the LLM extract exactly what you need in a consistent format. It’s a more flexible and declarative way to get structured data, especially when your goal is analysis rather than manipulation.
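To make the comparison with custom JS concrete, here is a minimal sketch of a request that asks for pricing data through a schema instead of a script. The endpoint URL, the bearer-token header, the schema fields, and the prompt are all assumptions made for illustration, so check them against the API docs before use. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: endpoint and bearer-token auth are illustrative, not confirmed here.
RENDER_ENDPOINT = "https://api.urlbox.com/v1/render/sync"

# Hypothetical schema describing the pricing data we want back.
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "description": "Pricing plans shown on the page",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Plan name"},
                    "monthly_price": {"type": "number", "description": "Price per month"},
                },
                "required": ["name", "monthly_price"],
            },
        }
    },
    "required": ["plans"],
}


def build_render_request(url: str, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the render options."""
    options = {
        "url": url,
        "use_llm": True,
        "llm_prompt": "List every pricing plan visible on this page.",
        "llm_schema": PRICING_SCHEMA,
    }
    return urllib.request.Request(
        RENDER_ENDPOINT,
        data=json.dumps(options).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_render_request("https://example.com/pricing", "YOUR_API_SECRET")
# Sending it would be: urllib.request.urlopen(request)
```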

The three examples below assume you have already saved your LLM API keys in your project settings. They show partially omitted responses to highlight just the LLM response.

1. Object Output for Landing Page Analysis

Object output is great for extracting specific data points from websites for competitive analysis and marketing intelligence. For example, you could summarise a landing page's core value proposition, identify key call-to-action elements, or analyse how a brand positions itself.

Request:

{
   "url": "https://www.hey.com",
   "full_page": true,
   "use_llm": true,
   "llm_prompt": "Analyse this landing page screenshot as a marketing professional evaluating a competitor. Focus on what makes this page effective (or ineffective) at converting visitors into customers. Extract insights about their marketing strategy, positioning, and conversion tactics useful for competitive intelligence.",
   "llm_schema": {
    "type": "object",
    "properties": {
      "hook": {
        "type": "string",
        "description": "What immediately grabs attention and communicates value"
      },
      "target_audience": {
        "type": "string",
        "description": "Who they're targeting based on visual/text cues"
      },
      "differentiation": {
        "type": "string",
        "description": "How they position against competitors"
      },
      "main_cta": {
        "type": "string",
        "description": "Primary call-to-action and its prominence"
      },
      "social_proof": {
        "type": "array",
        "items": {
          "type": "string",
          "enum": ["testimonials", "logos", "numbers", "reviews", "case_studies"]
        },
        "description": "Types of trust signals present"
      },
      "conversion_score": {
        "type": "number",
        "minimum": 1,
        "maximum": 10,
        "description": "Likely conversion effectiveness (1-10)"
      },
      "best_element": {
        "type": "string",
        "description": "Most effective aspect to copy"
      },
      "biggest_weakness": {
        "type": "string",
        "description": "Main weakness or missed opportunity"
      }
    },
    "required": ["hook", "target_audience", "differentiation", "main_cta", "social_proof", "conversion_score", "best_element", "biggest_weakness"]
  }
}

Response (we tried it on HEY from 37signals):

{
  "llmResponse": {
    "response": {
      "result": {
        "hook": "Join more than 150,000 people who get our email newsletter.",
        "target_audience": "Tech-savvy individuals and privacy-conscious users looking for a new email solution.",
        "differentiation": "Emphasizes privacy, control over email, and innovative features like The Screener and Inbox.",
        "main_cta": "\"See how HEY works\" button prominently displayed.",
        "social_proof": [
          "testimonials",
          "numbers"
        ],
        "conversion_score": 8,
        "best_element": "Strong emphasis on privacy and user control, supported by testimonials.",
        "biggest_weakness": "Lack of detailed pricing information on the landing page."
      }
    }
  }
}
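Once the response arrives, the structured result can be loaded straight into a typed object. The sketch below assumes the envelope shape shown in the response above (llmResponse, then response, then result). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LandingPageAnalysis:
    hook: str
    target_audience: str
    differentiation: str
    main_cta: str
    social_proof: list[str]
    conversion_score: float
    best_element: str
    biggest_weakness: str


def parse_analysis(payload: dict) -> LandingPageAnalysis:
    """Pull the structured result out of the response envelope."""
    result = payload["llmResponse"]["response"]["result"]
    # Missing or unexpected keys raise TypeError, which surfaces schema drift early.
    return LandingPageAnalysis(**result)


sample = {
    "llmResponse": {
        "response": {
            "result": {
                "hook": "Join more than 150,000 people who get our email newsletter.",
                "target_audience": "Tech-savvy individuals and privacy-conscious users looking for a new email solution.",
                "differentiation": "Emphasizes privacy, control over email, and innovative features like The Screener and Inbox.",
                "main_cta": "\"See how HEY works\" button prominently displayed.",
                "social_proof": ["testimonials", "numbers"],
                "conversion_score": 8,
                "best_element": "Strong emphasis on privacy and user control, supported by testimonials.",
                "biggest_weakness": "Lack of detailed pricing information on the landing page.",
            }
        }
    }
}

analysis = parse_analysis(sample)
```

Because every field in the schema is marked as required, the dataclass needs no defaults.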

2. Array Output for Multi-Element Analysis

Use an array output when you need to analyze multiple similar elements on a page that share the same structure—like product cards, call-to-action buttons, or pain points. Each element will be returned as an object following your schema, all collected in a single array response.

Request:

{
  "url": "https://www.figma.com/",
  "full_page": true,
  "use_llm": true,
  "llm_prompt": "Analyse this landing page by breaking it down into distinct sections (hero, features, benefits, social proof, etc.). For each section, evaluate its marketing effectiveness, what conversion goal it's trying to achieve, and what persuasion techniques it uses. Focus on what each section does well and how it could be improved.",
  "llm_output": "array",
  "llm_schema": {
    "type": "object",
    "properties": {
      "section_type": {
        "type": "string",
        "enum": ["hero", "feature", "benefit", "social_proof", "pricing", "faq", "cta", "footer"],
        "description": "Type of page section"
      },
      "primary_message": {
        "type": "string",
        "description": "Main message or value proposition of this section"
      },
      "target_audience": {
        "type": "string",
        "description": "Who this section is targeting (if specific)"
      },
      "conversion_goal": {
        "type": "string",
        "enum": ["awareness", "consideration", "conversion", "retention", "support"],
        "description": "What stage of the funnel this section addresses"
      },
      "persuasion_technique": {
        "type": "string",
        "enum": ["social_proof", "authority", "scarcity", "reciprocity", "consistency", "liking", "logical_appeal"],
        "description": "Primary persuasion technique used"
      },
      "visual_prominence": {
        "type": "string",
        "enum": ["high", "medium", "low"],
        "description": "How much visual attention this section commands"
      },
      "effectiveness_score": {
        "type": "number",
        "minimum": 1,
        "maximum": 10,
        "description": "How effective this section is at achieving its goal (1-10)"
      },
      "key_strength": {
        "type": "string",
        "description": "What this section does particularly well"
      },
      "improvement_opportunity": {
        "type": "string",
        "description": "Main way this section could be improved"
      }
    },
    "required": ["section_type", "primary_message", "conversion_goal", "persuasion_technique", "visual_prominence", "effectiveness_score", "key_strength", "improvement_opportunity"]
  }
}

Response (we tried this one on Figma):

{
  "llmResponse": {
    "response": {
      "result": [
        {
          "section_type": "hero",
          "primary_message": "Think bigger. Build faster. Figma helps design and development teams build great products, together.",
          "conversion_goal": "awareness",
          "persuasion_technique": "logical_appeal",
          "visual_prominence": "high",
          "effectiveness_score": 8,
          "key_strength": "Clear and bold value proposition that immediately communicates the benefit of using Figma.",
          "improvement_opportunity": "Include a more specific call-to-action to guide users on what to do next."
        },
        {
          "section_type": "cta",
          "primary_message": "Get started for free",
          "conversion_goal": "conversion",
          "persuasion_technique": "reciprocity",
          "visual_prominence": "high",
          "effectiveness_score": 9,
          "key_strength": "Strong call-to-action that lowers the barrier to entry by offering a free start.",
          "improvement_opportunity": "Add urgency or limited-time offers to increase conversions."
        },
        {
          "section_type": "footer",
          "primary_message": "Learn how to use Figma",
          "conversion_goal": "support",
          "persuasion_technique": "authority",
          "visual_prominence": "low",
          "effectiveness_score": 6,
          "key_strength": "Provides resources for learning and support, enhancing user experience.",
          "improvement_opportunity": "Make the learning resources more prominent to encourage engagement."
        }
      ]
    }
  }
}
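An array result is easy to filter and sort in application code. As a small sketch, the hypothetical helper below picks out the sections that score below a chosen threshold, using a trimmed copy of the response above as sample data. The threshold of 7 is an arbitrary value for illustration.

```python
def weakest_sections(payload: dict, threshold: float = 7) -> list[dict]:
    """Return sections scoring below the threshold, lowest score first."""
    sections = payload["llmResponse"]["response"]["result"]
    low = [s for s in sections if s["effectiveness_score"] < threshold]
    return sorted(low, key=lambda s: s["effectiveness_score"])


# Trimmed copy of the response above: only the fields this helper reads.
sample = {
    "llmResponse": {
        "response": {
            "result": [
                {
                    "section_type": "hero",
                    "effectiveness_score": 8,
                    "improvement_opportunity": "Include a more specific call-to-action to guide users on what to do next.",
                },
                {
                    "section_type": "cta",
                    "effectiveness_score": 9,
                    "improvement_opportunity": "Add urgency or limited-time offers to increase conversions.",
                },
                {
                    "section_type": "footer",
                    "effectiveness_score": 6,
                    "improvement_opportunity": "Make the learning resources more prominent to encourage engagement.",
                },
            ]
        }
    }
}

to_review = weakest_sections(sample)
for section in to_review:
    print(section["section_type"], "-", section["improvement_opportunity"])
```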

3. Enum Output for Website Classification

Perfect for simple classification tasks:

Request:

{
  "url": "https://my-favourite-shop.com",
  "use_llm": true,
  "llm_prompt": "Classify this website into one of the predefined categories",
  "llm_output": "enum",
  "llm_schema": [
    "ecommerce",
    "blog",
    "news",
    "documentation",
    "social",
    "business",
    "portfolio"
  ]
}

Response:

{
  "llmResponse": {
    "response": {
      "result": "ecommerce"
    }
  }
}
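On the consuming side, an enum result maps naturally onto an enumeration type. The sketch below mirrors the category list from the request and rejects any value outside it. The class and function names are hypothetical.

```python
from enum import Enum


class SiteCategory(str, Enum):
    ECOMMERCE = "ecommerce"
    BLOG = "blog"
    NEWS = "news"
    DOCUMENTATION = "documentation"
    SOCIAL = "social"
    BUSINESS = "business"
    PORTFOLIO = "portfolio"


def parse_category(payload: dict) -> SiteCategory:
    """Convert the enum result into a SiteCategory member."""
    raw = payload["llmResponse"]["response"]["result"]
    # Lookup by value raises ValueError for anything outside the list.
    return SiteCategory(raw)


category = parse_category({"llmResponse": {"response": {"result": "ecommerce"}}})
```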

These examples show what's possible when you combine visual analysis with structured outputs.

HTML, markdown, and metadata capture content, but they miss how it actually looks—the layout and design that determine how a page communicates. Structured outputs let you analyze the final rendered result, as if you had your own eyes on it, not just the underlying code.

You get consistent, predictable data from screenshots that's easier to work with than unstructured text responses.

JSON Schema Validation

All schemas are validated against the JSON Schema specification. Draft-04 schemas are not supported, and we recommend using draft-07. This ensures:

Type Safety: Your responses will match the expected data types
Required Fields: Specified fields will be present
Data Validation: Enums, patterns, and constraints are enforced
Consistency: Every response follows the same structure

While structured outputs significantly improve reliability, AI providers like OpenAI and Anthropic still allow some flexibility in formatting, so results may occasionally drift from your schema.
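Because drift is possible, it can help to re-check results in your own code before using them. The function below is a deliberately small sketch that covers only type, required, enum, minimum, maximum, properties, and items. It is not a full JSON Schema validator, and a dedicated library such as jsonschema is the better choice in production. The function name and the sample data are hypothetical.

```python
from typing import Any

# Maps JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        if expected == "number" and isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{path}: {value!r} not in enum")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            problems.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            problems.append(f"{path}: above maximum {schema['maximum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems.extend(check(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems.extend(check(item, schema["items"], f"{path}[{i}]"))
    return problems


schema = {
    "type": "object",
    "properties": {
        "hook": {"type": "string"},
        "conversion_score": {"type": "number", "minimum": 1, "maximum": 10},
        "social_proof": {
            "type": "array",
            "items": {"type": "string", "enum": ["testimonials", "logos", "numbers"]},
        },
    },
    "required": ["hook", "conversion_score"],
}

# A made-up result that has drifted from the schema in three places.
drifted = {"conversion_score": 12, "social_proof": ["testimonials", "awards"]}
problems = check(drifted, schema)
```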

Supported LLM Providers

We support a wide range of LLM providers. Check out our docs page for the full list of 15+ supported providers including:

OpenAI (GPT-4o, GPT-4, etc.)
Anthropic (Claude models)
Google Gemini
Azure OpenAI
Mistral AI
Groq (fast inference)
And many more...

Best Practices

1. Clear Descriptions

Always provide clear descriptions for your schema properties:

{
  "properties": {
    "summary": {
      "type": "string",
      "description": "A 2-3 sentence summary of the main content"
    }
  }
}

2. Use Appropriate Types

Match your schema types to your expected data:

{
  "price": {"type": "number"},
  "available": {"type": "boolean"},
  "tags": {
    "type": "array", 
    "items": {"type": "string"}
  }
}

3. Set Required Fields

Mark essential fields as required:

{
  "required": ["title", "url", "price"]
}
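The three practices above can also be checked mechanically before a schema is sent. The sketch below is a hypothetical lint helper that flags properties with no type or description, plus required names that are never defined. The product schema is made-up sample data that reuses the field names from the snippets above.

```python
def lint_schema(schema: dict) -> list[str]:
    """Flag untyped or undescribed properties and undefined required names."""
    warnings = []
    properties = schema.get("properties", {})
    for name, spec in properties.items():
        if "type" not in spec:
            warnings.append(f"{name}: no type")
        if not spec.get("description", "").strip():
            warnings.append(f"{name}: no description")
    for name in schema.get("required", []):
        if name not in properties:
            warnings.append(f"{name}: required but not defined in properties")
    return warnings


product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Product name as shown on the page"},
        "price": {"type": "number"},
        "available": {"type": "boolean", "description": "Whether the item is in stock"},
    },
    "required": ["title", "url", "price"],
}

warnings = lint_schema(product_schema)
```

Here the helper reports that price lacks a description and that url is required without being defined.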

Next Steps

Ready to get started with structured outputs?

Set up your LLM provider in your project settings.
Design your JSON schema using the JSON Schema documentation, or programmatically with a validator tool.
Test your schema in the Urlbox dashboard sandbox.
Integrate into your application using our API.

Get consistent, structured AI analysis of your screenshots, PDFs, and other renders today with Urlbox LLM Structured Output.

Want More?

We’d love to hear from you.

If there's an AI feature that your provider supports and you'd like to see integrated into Urlbox, please get in touch.

Alternatively, if there’s a new AI provider you want us to support, or you're having trouble integrating any of our AI options, drop us a message at [email protected]. We’re always keen to improve and prioritise based on what you need.

Happy Rendering 📸
