How websites block scrapers, and how to get past it

Some sites refuse to answer anything that is not a real browser. Here is how bot detection works, what a stealth browser does about it, and why you do not want to run one yourself.

Jhon Snack
Founding Engineer · 19 May 2026 · 6 min read

Most websites will hand their data to a plain HTTP request without a fuss. A stubborn few won’t. Send them an ordinary request and you get back an empty shell, a challenge page, or a polite lie. Those are the sites that need a real browser, and they’re worth understanding even if you never plan to wrangle one yourself.

When a request comes back empty

There are two common reasons a simple fetch returns nothing useful.

  • The page is built in the browser. Lots of modern sites ship a nearly empty HTML file and then draw the real content with JavaScript. Fetch the URL and you get the skeleton, not the data, because nothing ran the scripts that fill it in.
  • The site is actively blocking bots. Plenty of sites run bot detection that inspects the request itself and quietly serves a decoy, a CAPTCHA, or a flat refusal to anything that doesn’t look like a genuine visitor.

Either way, your trusty one-line request walks away empty-handed.
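
To make the two failure modes concrete, here is a rough sketch of how you might tell them apart from the response alone. It uses only the Python standard library. The function name `diagnose`, the hint phrases, the status codes it treats as a refusal, and the 200-character threshold are all assumptions picked for illustration, not rules any particular site follows.

```python
from html.parser import HTMLParser

# Illustrative phrases only; real challenge pages vary widely.
CHALLENGE_HINTS = ("captcha", "verify you are human", "checking your browser")


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside <script>/<style> counts as visible content.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def diagnose(status, html, min_text_chars=200):
    """Guess why a plain fetch came back without useful data (heuristic)."""
    lowered = html.lower()
    if status in (403, 429) or any(hint in lowered for hint in CHALLENGE_HINTS):
        return "blocked"
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    # Scripts present but almost no visible text: the page is drawn client-side.
    if counter.script_tags and counter.text_chars < min_text_chars:
        return "client-rendered shell"
    return "ok"
```

A heuristic like this only tells you which wall you hit. It does nothing to get you past it, and a decoy page that looks like real content would sail straight through as "ok".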

How sites tell humans from bots

Bot detection is mostly pattern matching against what a real browser looks like. A genuine Chrome or Safari gives off dozens of signals: the exact set and order of its headers, the way it negotiates a secure connection (the TLS handshake), quirks of its JavaScript engine, how it renders a page, even the rhythm of how a person moves and clicks. A bare scripted request gets almost none of that right.

Detection services collect those signals and compare them to a real browser’s fingerprint. When the numbers don’t add up, you’re flagged, and instead of data you get a challenge. None of this is exotic. It’s just a lot of small tells, and faking all of them convincingly is much harder than it sounds.
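
As a toy illustration of just one of those tells, the sketch below compares the headers on an incoming request with the set and order a reference browser would send. Everything in it is hypothetical: `REFERENCE_ORDER` is a made-up profile rather than any real browser's header order, and `header_mismatches` is a name invented for this example. Real detection services weigh far more signals than this.

```python
# Made-up reference profile for illustration; not a real browser's header order.
REFERENCE_ORDER = ["Host", "User-Agent", "Accept", "Accept-Language", "Accept-Encoding"]


def header_mismatches(observed, reference=REFERENCE_ORDER):
    """List the ways a request's headers differ from a reference profile.

    observed:  list of (name, value) pairs in the order they arrived.
    reference: header names in the order the reference browser sends them.
    """
    names = [name.lower() for name, _ in observed]
    expected = [name.lower() for name in reference]
    reasons = []

    # Tell 1: headers a real browser always sends are absent.
    missing = [name for name in expected if name not in names]
    if missing:
        reasons.append("missing: " + ", ".join(missing))

    # Tell 2: the headers both sides share arrive in a different relative order.
    shared_observed = [name for name in names if name in expected]
    shared_expected = [name for name in expected if name in names]
    if shared_observed != shared_expected:
        reasons.append("unexpected order")

    return reasons
```

Patching these two tells in a script is easy. The trouble is that the same kind of comparison also runs against the connection handshake, the JavaScript environment, and behaviour on the page, and all of them have to agree with each other.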

What a stealth browser does

The reliable way past all that isn’t to fake a browser better. It’s to use a real one. A stealth browser (we call ours Cloak) is an actual automated browser, tuned so that it looks like an ordinary person’s, not an obvious robot. It loads the page, runs the JavaScript, clears the challenge, and only then reads the data that’s finally there.

Because it’s a genuine browser doing genuine browser things, it presents the same signals a real visitor would, so the page treats it like one. That’s the whole trick: stop pretending, and just be the thing you were pretending to be.

A real browser is the heavy machinery of data collection. It gets you into the sites nothing else can reach, but running one well (keeping it fast, undetectable, and scaled) is a project in its own right. Most teams don’t want a browser farm. They want the data.

Let maviapi run the browser

This is exactly the part we take off your hands. For sites that need a real browser, maviapi drives Cloak behind the scenes and puts the result behind the same simple API as everything else. You never see the browser, the challenges, or the fingerprinting. You make one ordinary HTTP call and get clean JSON, whether the data came from a plain fetch or a full browser session:

curl "https://api.maviapi.com/v1/sites/marinetraffic/vessel?imo=9379234" \
  -H "Authorization: Bearer $MAVIAPI_KEY"

# behind a bot wall? doesn't matter. you still just get JSON.
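
If you would rather make that call from code than from a shell, the same request might look something like this in Python, using only the standard library. The URL and the bearer-token header are taken from the curl example. The helper names `build_request` and `fetch_json`, the `/sites/{site}/{resource}` path pattern generalised from that one URL, and the 30-second timeout are assumptions for this sketch. The shape of the returned JSON is not shown here, so the code simply decodes whatever comes back.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.maviapi.com/v1"


def build_request(site, resource, params, api_key):
    """Build (but do not send) an authenticated GET request.

    The path pattern is inferred from the single example URL and may not
    hold for every endpoint.
    """
    query = urllib.parse.urlencode(params)
    url = f"{API_BASE}/sites/{site}/{resource}?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With your key in the `MAVIAPI_KEY` environment variable, `fetch_json(build_request("marinetraffic", "vessel", {"imo": "9379234"}, os.environ["MAVIAPI_KEY"]))` (after `import os`) would issue the same call as the curl command.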

No headless browser to install, no fingerprints to maintain, no CAPTCHA solver to babysit. If a site you need only answers a real browser, that’s our problem now. Browse the catalog to see what’s already available, or request an API for a site that’s been giving you trouble.

Written by
Jhon Snack

Works across the stack that makes an endpoint hold: the extraction layer, the edge request path, and the docs. Writes about how the sausage gets made, and why it stays fresh when the web underneath it does not.
