Proxies
Learn how to configure a proxy with Urlbox
When rendering certain sites, you may be blocked from rendering or scraping the content that they serve.
One example is a site that uses Cloudflare to protect itself from bots.
In order to get around these protections, the Urlbox API supports the use of proxies.
How proxies work
When you make a request to the Urlbox API, you can specify a proxy to use. The Urlbox API will then make the request to the target site using the proxy you specified.
This has the benefit of making the request appear to come from the IP address of the proxy, rather than from Urlbox's data center IP address.
This reduces the chance that the target site will be able to detect that the request is coming from the Urlbox API.
Proxy providers
Urlbox does not provide proxies for you to use. Instead, you must bring your own proxy by using a proxy provider. There are many proxy providers available, and you can use any provider that you like.
Below are some proxy providers:
Using a proxy with Urlbox
Taking Bright Data as an example, you can sign up for an account there and then create a proxy.
They have several solution types, and usually the best proxies are web unlockers, residential proxies, or 4G / mobile proxies.
In this example I've created a proxy using the Web Unlocker solution type, which also gives the following benefits:
- Bypass CAPTCHAs, blocks, and restrictions
- Only pay for successful requests
- Automated IP address rotation
- User emulation & fingerprints
Click on the proxy you created, go to the Access parameters tab, and then click on the Check out code and integration examples button. With the API type selected and Language set to Node.js, you can copy the proxy URL:
The proxy URL should look something like: http://brd-customer-hl_3f08b01c-zone-social_networks:[email protected]:22225
To use this with Urlbox, you can pass it in directly in the request:
Request
Now Urlbox will make the request to the target URL using the proxy you specified.
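As a rough sketch of what such a request could look like from Python, the snippet below builds (but does not send) a render request that includes the proxy URL. The endpoint https://api.urlbox.com/v1/render/sync, the Bearer authorization header, and the option name proxy are assumptions that do not come from this page, so check them against the Urlbox API reference. The secret key and proxy credentials are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; confirm it in the Urlbox API reference.
URLBOX_RENDER_ENDPOINT = "https://api.urlbox.com/v1/render/sync"


def build_render_request(api_secret: str, target_url: str, proxy_url: str) -> urllib.request.Request:
    """Build (but do not send) a render request that routes through a proxy."""
    options = {
        "url": target_url,
        "proxy": proxy_url,  # assumed option name for the proxy URL
    }
    return urllib.request.Request(
        URLBOX_RENDER_ENDPOINT,
        data=json.dumps(options).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_secret}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_render_request(
        api_secret="YOUR_URLBOX_SECRET_KEY",  # placeholder
        target_url="https://example.com",
        proxy_url="http://PROXY_USERNAME:PROXY_PASSWORD@PROXY_HOST:22225",  # placeholder
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the JSON body before any proxy bandwidth is spent.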
Solving ERR_TUNNEL_CONNECTION_FAILED error
If you are using a proxy and you get an error like ERR_TUNNEL_CONNECTION_FAILED, then it is likely that the proxy you are using is blocking requests to certain domains.
When using Bright Data residential proxies, some domains such as linkedin.com are blocked unless you go through their full verification process.
- If the url you're sending to Urlbox begins with https://, try changing this to http:// instead, and see if there is any extra message.
- For example, accessing https://linkedin.com/ with an unverified residential proxy will give the ERR_TUNNEL_CONNECTION_FAILED error. However, if you change this to http://linkedin.com/ you will get a more helpful error message: forbidden requests to this domain are blocked using proxy networks, please get access via a web unlocker zone or IDE tools, or contact your account manager to assist
- This means that you must either go through their full verification process, or use a different proxy zone to access the specific domain.
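When debugging many URLs, the scheme swap described above can be done programmatically. The helper below is a minimal sketch using only the Python standard library; the function name to_plain_http is hypothetical and not part of any Urlbox client.

```python
from urllib.parse import urlsplit, urlunsplit


def to_plain_http(url: str) -> str:
    """Return the same URL with an https scheme swapped for http."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url  # already http (or another scheme), leave it untouched
    return urlunsplit(parts._replace(scheme="http"))


print(to_plain_http("https://linkedin.com/"))  # prints http://linkedin.com/
```

The swap is meant for diagnosis only: once the more descriptive error message has been read, the original https:// URL should go back into the render request.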
Check proxy connection from command line
You can check whether the proxy works directly from your terminal. For example, here is a request to https://linkedin.com using a proxy with curl:
And here is the same request using http:// instead of https://:
Switching to a web unlocker type of proxy, the request works:
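The same kind of check can be approximated from Python when curl is not available. This is a sketch using the standard library's urllib.request.ProxyHandler; the function names and the 30 second timeout are illustrative choices, and the exact error text you see depends on the proxy provider.

```python
import urllib.error
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http and https requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def check_proxy(proxy_url: str, target_url: str, timeout: float = 30.0) -> str:
    """Fetch target_url through the proxy and describe the outcome."""
    opener = build_proxy_opener(proxy_url)
    try:
        with opener.open(target_url, timeout=timeout) as response:
            return f"OK: HTTP {response.status}"
    except urllib.error.HTTPError as error:
        # The proxy or the site answered with an error status;
        # the start of the body often explains why.
        detail = error.read(500).decode("utf-8", errors="replace")
        return f"HTTP {error.code}: {detail}"
    except (urllib.error.URLError, OSError) as error:
        # Covers refused tunnels, DNS failures, and timeouts.
        return f"Connection failed: {error}"


# Example (placeholder credentials, performs a real network request):
# print(check_proxy("http://PROXY_USERNAME:PROXY_PASSWORD@PROXY_HOST:22225",
#                   "http://linkedin.com/"))
```

Running it once with an https:// target and once with the http:// variant mirrors the two checks described above.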
Check proxy blacklist and whitelist settings
You should also check in the proxy's settings page that you are not accidentally blocking or whitelisting any IPs from accessing the proxy.
At this moment Urlbox cannot share its IP addresses as they are dynamic and subject to change. We run on a mixture of Google Kubernetes Engine (GKE) and Cloud Run and use Google Cloud's IP ranges, so you can whitelist those if you need to.
Check the proxy provider's status page
As a last resort, it is often worth checking the status page of the proxy provider you are using, as they may be experiencing issues.
For example, Bright Data's status page is here: https://brightdata.com/network-status
Geolocation
Sometimes it is also beneficial to have an IP address from a specific country. For example, if you are rendering a site that has different content for different countries, you may want to use a proxy from that country.
A lot of proxy providers allow you to target locations down to country and city level, or even zip code. You can also use proxies that originate from certain ASNs.
Using Bright Data as an example again, you can specify the city and country as part of the proxy URL.
Here's an example of a proxy that would originate from New York, USA:
brd-customer-{YOUR_CUSTOMER_ID}-zone-{YOUR_ZONE}-country-us-city-newyork
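A small helper can assemble usernames in this shape. The sketch below follows only the pattern shown in the example above; the normalization rule (lowercase, spaces and punctuation removed) is inferred from the newyork example and is an assumption, as is the function name.

```python
import re
from typing import Optional


def _slug(value: str) -> str:
    """Lowercase and drop everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def geo_proxy_username(customer_id: str, zone: str, country: str,
                       city: Optional[str] = None) -> str:
    """Build a geo-targeted proxy username following the pattern shown above."""
    username = f"brd-customer-{customer_id}-zone-{zone}-country-{_slug(country)}"
    if city:
        username += f"-city-{_slug(city)}"
    return username


print(geo_proxy_username("YOUR_CUSTOMER_ID", "YOUR_ZONE", "US", "New York"))
# prints brd-customer-YOUR_CUSTOMER_ID-zone-YOUR_ZONE-country-us-city-newyork
```

The resulting username replaces the plain one in the proxy URL; the password, host, and port stay the same.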
Proxy gotchas
When using proxies alongside Urlbox, expect slower render times, as the request has to go through the proxy before it reaches the target site.
When using residential proxies, there will be a higher chance of request failures, as some devices may suddenly go offline, or the connection to the proxy may be unstable. Some domains are still blocked by proxy providers, especially high-value scraping targets such as LinkedIn and Amazon, so you may need to go through their verification process to get access to those domains, or use a web unlocker zone.
Sites may still block proxies, so you may need to try a few different providers and proxy types before you find one that works for you.
Extraneous requests
It is also worth noting that when using a proxy, you may see some extraneous requests to domains such as accounts.google.com in your proxy request logs.
This is because when booting the headless Chrome browser, Chrome will make some requests to Google domains to check for account logins or updates. We try to remove as many of these extraneous requests as possible, but some cannot be removed.
You can use the various block_* options to block many of these requests from happening.
Blocking requests by domain
You can use the block_urls option to block specific domains from being requested.
Request
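As an illustration, the sketch below adds block_urls to a set of render options. The option name comes from the text above, but passing it as a list of domain strings is an assumption, as are the proxy option name and the helper function, so verify the accepted format in the Urlbox options reference.

```python
import json
from typing import Iterable


def with_blocked_domains(options: dict, domains: Iterable[str]) -> dict:
    """Return a copy of the render options with block_urls set."""
    updated = dict(options)
    updated["block_urls"] = list(domains)  # assumed format: list of domains
    return updated


options = with_blocked_domains(
    {
        "url": "https://example.com",
        "proxy": "http://PROXY_USERNAME:PROXY_PASSWORD@PROXY_HOST:22225",  # placeholder
    },
    ["accounts.google.com"],
)
print(json.dumps(options, indent=2))
```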
Block requests by resource type
You can also block all requests of a certain resource type, to reduce the amount of bandwidth used by the proxy. For example, you can block images and fonts using the following options:
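A minimal sketch of such options follows. The names block_images and block_fonts are assumed from the block_* naming pattern mentioned earlier and from the resource types named above, and boolean values are likewise an assumption; confirm both in the Urlbox options reference before relying on them.

```python
import json


def with_blocked_resources(options: dict, images: bool = True, fonts: bool = True) -> dict:
    """Return a copy of the render options that blocks the chosen resource types."""
    updated = dict(options)
    if images:
        updated["block_images"] = True  # assumed option name
    if fonts:
        updated["block_fonts"] = True  # assumed option name
    return updated


options = with_blocked_resources({"url": "https://example.com"})
print(json.dumps(options, indent=2))
```

Blocking images will change how the rendered page looks, so this fits best when the goal is extracting content rather than producing a faithful screenshot.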