Doing the ad industry's work? :-)
I strongly suspect someone from the ad industry will start offering an option to serve a heavily obfuscated WASM-based rendering engine to render the website, with the obligatory promises that it "protects the integrity of your content", "stops AI crawl theft", and of course "lowers development costs by ensuring consistent rendering across all platforms".
/s, obviously
Isn't that just Flutter (or other such WASM GUIs, like Slint in Rust)? The issue is that there's no SEO, so until that's fixed I see no way it would work. And if it does have SEO, then any bot can scrape it anyway.
Well, this is going to be extremely resource-intensive (read: your site will be a DDoS magnet), and it's also an accessibility nightmare.
Curious if you have any better suggestions
Don't serve your website as PNGs wrapped in a single page application.
Also, if the main reason for blocking bots is to reduce server load, this solution is going to require running multiple browser instances on a server, which will require a lot of resources just to serve normal traffic.
Edit: I should also mention this is going to chew through bandwidth.
Literally anything other than converting websites to images. For example: CAPTCHAs, proof-of-work CAPTCHAs, bot detection using techniques such as TLS fingerprinting, blacklisting known VPS/VPN IPs, etc.
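Of those, a proof-of-work challenge is the easiest to sketch: make the client burn CPU before the server does any work. A minimal illustration in TypeScript, where the hex-leading-zeros rule and the `difficulty` value are made-up choices for the example, not a standard:

```ts
import { createHash, randomBytes } from "crypto";

const difficulty = 4; // leading zero hex digits required (illustrative)

// Server hands out a random challenge per request.
function issueChallenge(): string {
  return randomBytes(16).toString("hex");
}

// Client-side work: brute-force a nonce. Cheap for one human page view,
// expensive for a scraper fetching millions of pages.
function solve(challenge: string): number {
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(challenge + nonce).digest("hex");
    if (digest.startsWith("0".repeat(difficulty))) return nonce;
  }
}

// Server-side check: a single hash, so verification stays cheap.
function verify(challenge: string, nonce: number): boolean {
  const digest = createHash("sha256").update(challenge + nonce).digest("hex");
  return digest.startsWith("0".repeat(difficulty));
}

const challenge = issueChallenge();
console.log(verify(challenge, solve(challenge))); // true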
One option could be to render each div/element as a separate image wrapped in ARIA markup.
Wouldn't the ARIA markup contain the very text they don't want bots to have?
Yeah, you're right.
Well, I’m not going to use this for its intended purpose; however, I AM going to try to see if I can use it to pipe webpages to a simple touchscreen e-ink display and retain interactivity.
> Render Webpage as PNG for static content
Why not just take a screenshot yourself and use an image tag to render the image, without using Puppeteer?
You can add a small CSS media query to serve different image sizes to different types of devices.
Why do you even need Puppeteer for such basic things?
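For what it's worth, that one-off capture is a few lines of Puppeteer run once at build time instead of per request. A sketch, where the URL, output paths, and breakpoint widths are illustrative assumptions:

```ts
import puppeteer from "puppeteer";

// Capture one PNG per breakpoint so a CSS media query (or srcset) can pick
// the right one later.
async function snapshot(url: string): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const [name, width] of [["mobile", 480], ["desktop", 1280]] as const) {
    await page.setViewport({ width, height: 800 });
    await page.goto(url, { waitUntil: "networkidle0" });
    await page.screenshot({ path: `dist/page-${name}.png`, fullPage: true });
  }
  await browser.close();
}

snapshot("https://example.com").catch(console.error);
```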
Curious that the author works for BrowserStack, a browser-automation service
I’ve actually seriously been thinking of using WebAuthn to “authenticate” every single page load with a passkey unlocked by a biometric device only, so that I can be sure that every single page load had a meat finger on TouchID or a meat face in front of FaceID before showing the page to them.
In the future I imagine that there will be biometrically secure browsers that will be required for top security applications, that can guarantee that a single physical person is actually physically present while using it.
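The browser-side call for that scheme would look roughly like the sketch below. `challengeFromServer` and `credentialId` are assumed to come from a backend; this only shows the idea, not an endorsement of it:

```ts
// Sketch: demand a user-verified passkey assertion before rendering a page.
async function gatePageLoad(
  challengeFromServer: Uint8Array,
  credentialId: Uint8Array,
): Promise<Credential | null> {
  const assertion = await navigator.credentials.get({
    publicKey: {
      challenge: challengeFromServer,
      allowCredentials: [{ id: credentialId, type: "public-key" }],
      // "required" asks the authenticator for a biometric/PIN check, but as
      // the replies note, it doesn't prove a human was actually present.
      userVerification: "required",
    },
  });
  // The assertion would then be sent to the server for signature
  // verification before the page body is returned.
  return assertion;
}
```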
WebAuthn doesn't reveal whether someone is a human, and you can't see whether they used FaceID or anything else. And once someone proves they're human, you should just give them a session token.
But I believe there is a spoof-proof way to accept hardware keys only? Limit yourself to known good hardware manufacturers and you should be good.
That'll just exclude most of your visitors. The visitors you're not refusing are using TPMs or secure elements you can't trust anyway because of a long history of side channel attacks for "known-good" hardware manufacturers.
Couldn't someone just write a software key that lies and says it's biometric?
AFAIK end-to-end manufacturer attestation isn't part of the spec (yet...).
There's even an emulator built into Chrome:
Dev Tools -> 3 dots -> More tools -> WebAuthn -> Enable virtual authenticator environment
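That same virtual authenticator is also scriptable over the DevTools protocol, which is exactly why this fails as a bot gate. A Puppeteer sketch using the real CDP WebAuthn domain (the options shown are one plausible configuration):

```ts
import puppeteer from "puppeteer";

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const cdp = await page.createCDPSession();

  await cdp.send("WebAuthn.enable");
  await cdp.send("WebAuthn.addVirtualAuthenticator", {
    options: {
      protocol: "ctap2",
      transport: "internal",
      hasResidentKey: true,
      hasUserVerification: true,
      isUserVerified: true, // the "biometric" check passes with no human present
    },
  });
  // From here, navigator.credentials.get() on any page resolves without a
  // finger or a face ever being involved.
}

main().catch(console.error);
```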
Attestation statements can be used to restrict WebAuthn use to certain hardware keys only.
But is there a chain of trust going all the way back to the manufacturer? Or is that simply metadata which could be spoofed?
Yes. That's what attestation is for.
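Roughly: the attestation statement chains back to a manufacturer certificate, and at registration time you can gate on the authenticator model. A sketch, where `verifyRegistration` is a stand-in for whatever WebAuthn library validates the certificate chain, and the AAGUID shown is made up:

```ts
// Hypothetical helper: validates the attestation certificate chain and
// returns the authenticator's AAGUID (its model identifier).
declare function verifyRegistration(response: unknown): Promise<{ aaguid: string }>;

// Made-up allowlist of hardware-key models you choose to trust.
const TRUSTED_AAGUIDS = new Set(["ee882879-721c-4913-9775-3dfcce97072a"]);

async function register(response: unknown): Promise<void> {
  const { aaguid } = await verifyRegistration(response);
  if (!TRUSTED_AAGUIDS.has(aaguid)) {
    throw new Error("authenticator model not on the allowlist");
  }
  // ...store the credential for this user...
}
```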
WebAuthn doesn't work the way you think it does. Most computers don't have fingerprint readers or even webcams, let alone webcams capable of verifying a user. Instead, you'll end up making people type in their Windows password or scan QR codes.
Remote attestation concepts already exist to solve this problem: https://httptoolkit.com/blog/apple-private-access-tokens-att...
Cloudflare ran an experiment with it, and I think Apple and Cloudflare are working to make it into a proper RFC. It's only a matter of time before other browsers support it too, the way things are going right now.
Please don’t use WebAuthn on every page load.
Two reasons: 1) the protocol is not designed to do this, and the UI/UX is not designed to support it; there are better ways. 2) It will likely not work: there are virtual/software authenticators (available in dev tools) that could generate a valid response without a human.
FWIW, using WebAuthn to start a session, setting a cookie, and validating that cookie to grant access seems like a pretty usable pattern. It's not much more invasive than the "checking your connection" screen Cloudflare likes to throw up.
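A sketch of that pattern with Express, where `verifyAssertion` is a stand-in for a WebAuthn library's assertion check and the in-memory set is a stand-in for a real session store:

```ts
import express from "express";
import { randomBytes } from "crypto";

// Hypothetical helper: your WebAuthn library's assertion verification.
declare function verifyAssertion(body: unknown): Promise<boolean>;

const app = express();
app.use(express.json());
const sessions = new Set<string>(); // use a real session store in practice

// Step 1: one WebAuthn ceremony buys a session cookie.
app.post("/webauthn/verify", async (req, res) => {
  if (!(await verifyAssertion(req.body))) return res.sendStatus(401);
  const token = randomBytes(32).toString("hex");
  sessions.add(token);
  res.cookie("session", token, { httpOnly: true, secure: true, sameSite: "strict" });
  res.sendStatus(204);
});

// Step 2: every later request only has to present the cookie.
app.use((req, res, next) => {
  const token = req.headers.cookie?.match(/session=([0-9a-f]+)/)?.[1];
  if (token && sessions.has(token)) return next();
  res.status(401).send("please verify first");
});

app.listen(3000);
```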
That sounds awful. From there it's a minuscule step to needing to authenticate with every website, like a cookie you can't erase if you want to continue using the internet.
C'mon folks, PNG? WebAuthn on every page?
If we're going to this level of lunacy, why not print posters and distribute them physically in your neighborhood?
100% bot safe & "meat finger" guarantee.
I presume you're not interested in traffic coming from search engines, right?
Search engines have publicly listed IP ranges and user agents, so setting up an HTML whitelist for the search engines you do like shouldn't be impossible. Especially now that Google no longer presents its cache to visitors.
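For Googlebot specifically, the documented check is reverse DNS plus a forward confirmation, since the user-agent string alone is trivially spoofable. A Node sketch of the idea:

```ts
import { promises as dns } from "dns";

// Reverse-DNS the client IP, check the hostname falls under googlebot.com or
// google.com, then forward-resolve the hostname and confirm it matches.
async function isRealGooglebot(ip: string): Promise<boolean> {
  try {
    const [host] = await dns.reverse(ip);
    if (!/\.(googlebot|google)\.com$/.test(host)) return false;
    const { address } = await dns.lookup(host);
    return address === ip;
  } catch {
    return false; // no PTR record, or lookup failed: treat as not Googlebot
  }
}

isRealGooglebot("66.249.66.1").then(console.log); // true for a genuine crawler IP
```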
Please don’t make the web more inaccessible to assistive-technology users.
You're not wrong, but tell that to the bots :) Do you have any other suggestions?
> You're not wrong, but tell that to the bots :)
Tell that to the judge when you're slapped with an ADA lawsuit because you failed to provide "reasonable accommodation" to disabled users.
I have to say, I've been making websites professionally for 25 years, and I've never heard of this being a thing even once.
You must not be working for juicy clients then.
https://www.levelaccess.com/blog/title-iii-lawsuits-10-big-c...
Always companies with fewer than 15 employees, so I think the ADA doesn't apply.