Doing the ad industry's work? :-)
I strongly suspect someone from the ad industry will start offering an option to serve a heavily obfuscated WASM-based rendering engine to render the website, with the obligatory promises that it "protects the integrity of your content", "stops AI crawl theft", and of course "lowers development costs by ensuring consistent rendering across all platforms".
/s, obviously
Isn't that just Flutter (or other such WASM GUIs, like Slint in Rust)? The issue is that there's no SEO, so until that's fixed I see no way it would work. And if it does have SEO, then any bot can scrape it anyway.
Well, this is going to be extremely resource-intensive (read: your site will be a DDoS magnet), and it's also an accessibility nightmare.
Curious if you have any better suggestions
Don't serve your website as PNGs wrapped in a single page application.
Also, if the main reason for blocking bots is to reduce server load, this solution is going to require running multiple browser instances on a server, which will require a lot of resources just to serve normal traffic.
Edit: I should also mention this is going to chew through bandwidth.
Literally anything other than converting websites to images. For example: CAPTCHAs, proof-of-work CAPTCHAs, bot detection using techniques such as TLS fingerprinting, blacklisting known VPS/VPN IPs, etc.
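Of those, a proof-of-work challenge is the easiest to sketch: make the client burn CPU before the server does any work. A minimal illustration in TypeScript, where the hex-leading-zeros rule and the `difficulty` value are made-up choices for the example, not a standard:

```ts
import { createHash, randomBytes } from "crypto";

const difficulty = 4; // leading zero hex digits required (illustrative)

// Server hands out a random challenge per request.
function issueChallenge(): string {
  return randomBytes(16).toString("hex");
}

// Client-side work: brute-force a nonce. Cheap for one human page view,
// expensive for a scraper fetching millions of pages.
function solve(challenge: string): number {
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(challenge + nonce).digest("hex");
    if (digest.startsWith("0".repeat(difficulty))) return nonce;
  }
}

// Server-side check: a single hash, so verification stays cheap.
function verify(challenge: string, nonce: number): boolean {
  const digest = createHash("sha256").update(challenge + nonce).digest("hex");
  return digest.startsWith("0".repeat(difficulty));
}

const challenge = issueChallenge();
console.log(verify(challenge, solve(challenge))); // true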
One option could be to render each div/element as a separate image wrapped in ARIA markup.
Wouldn't the ARIA markup contain the very text they don't want bots to have?
Yeah, you're right.
Well, I’m not going to use this for its intended purpose; however, I AM going to try to see if I can use it to pipe webpages to a simple touchscreen e-ink display and retain interactivity.
> Render Webpage as PNG for static content
Why not just take a screenshot yourself and use an image tag to render the image, without using Puppeteer?
You can add a small CSS media query to serve different image sizes to different types of devices.
Why do you even need Puppeteer for such basic things?
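For what it's worth, that one-off capture is a few lines of Puppeteer run once at build time instead of per request. A sketch, where the URL, output paths, and breakpoint widths are illustrative assumptions:

```ts
import puppeteer from "puppeteer";

// Capture one PNG per breakpoint so a CSS media query (or srcset) can pick
// the right one later.
async function snapshot(url: string): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  for (const [name, width] of [["mobile", 480], ["desktop", 1280]] as const) {
    await page.setViewport({ width, height: 800 });
    await page.goto(url, { waitUntil: "networkidle0" });
    await page.screenshot({ path: `dist/page-${name}.png`, fullPage: true });
  }
  await browser.close();
}

snapshot("https://example.com").catch(console.error);
```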
Curious that the author works for BrowserStack, a browser-automation service
I’ve actually seriously been thinking of using WebAuthn to “authenticate” every single page load with a passkey unlocked by a biometric device only, so that I can be sure that every single page load had a meat finger on TouchID or a meat face in front of FaceID before showing the page to them.
In the future I imagine that there will be biometrically secure browsers that will be required for top security applications, that can guarantee that a single physical person is actually physically present while using it.
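The browser-side call for that scheme would look roughly like the sketch below. `challengeFromServer` and `credentialId` are assumed to come from a backend; this only shows the idea, not an endorsement of it:

```ts
// Sketch: demand a user-verified passkey assertion before rendering a page.
async function gatePageLoad(
  challengeFromServer: Uint8Array,
  credentialId: Uint8Array,
): Promise<Credential | null> {
  const assertion = await navigator.credentials.get({
    publicKey: {
      challenge: challengeFromServer,
      allowCredentials: [{ id: credentialId, type: "public-key" }],
      // "required" asks the authenticator for a biometric/PIN check, but as
      // the replies note, it doesn't prove a human was actually present.
      userVerification: "required",
    },
  });
  // The assertion would then be sent to the server for signature
  // verification before the page body is returned.
  return assertion;
}
```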
WebAuthn doesn't reveal whether someone is a human, and you can't see whether they used FaceID or anything else. And once someone proves they're human, you should just give them a session token.
But I believe there is a spoof-proof way to accept hardware keys only? Limit yourself to known good hardware manufacturers and you should be good.
That'll just exclude most of your visitors. The visitors you're not refusing are using TPMs or secure elements you can't trust anyway because of a long history of side channel attacks for "known-good" hardware manufacturers.
Couldn't someone just write a software key that lies and says it's biometric?
AFAIK end-to-end manufacturer attestation isn't part of the spec (yet...).
There's even an emulator built into Chrome:
Dev Tools -> 3 dots -> More tools -> WebAuthn -> Enable virtual authenticator environment
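That same virtual authenticator is also scriptable over the DevTools protocol, which is exactly why this fails as a bot gate. A Puppeteer sketch using the real CDP WebAuthn domain (the options shown are one plausible configuration):

```ts
import puppeteer from "puppeteer";

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const cdp = await page.createCDPSession();

  await cdp.send("WebAuthn.enable");
  await cdp.send("WebAuthn.addVirtualAuthenticator", {
    options: {
      protocol: "ctap2",
      transport: "internal",
      hasResidentKey: true,
      hasUserVerification: true,
      isUserVerified: true, // the "biometric" check passes with no human present
    },
  });
  // From here, navigator.credentials.get() on any page resolves without a
  // finger or a face ever being involved.
}

main().catch(console.error);
```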
Attestation statements can be used to restrict WebAuthn use to certain hardware keys only.
But is there a chain of trust going all the way back to the manufacturer? Or is that simply metadata which could be spoofed?
Yes. That's what attestation is for.
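Roughly: the attestation statement chains back to a manufacturer certificate, and at registration time you can gate on the authenticator model. A sketch, where `verifyRegistration` is a stand-in for whatever WebAuthn library validates the certificate chain, and the AAGUID shown is made up:

```ts
// Hypothetical helper: validates the attestation certificate chain and
// returns the authenticator's AAGUID (its model identifier).
declare function verifyRegistration(response: unknown): Promise<{ aaguid: string }>;

// Made-up allowlist of hardware-key models you choose to trust.
const TRUSTED_AAGUIDS = new Set(["ee882879-721c-4913-9775-3dfcce97072a"]);

async function register(response: unknown): Promise<void> {
  const { aaguid } = await verifyRegistration(response);
  if (!TRUSTED_AAGUIDS.has(aaguid)) {
    throw new Error("authenticator model not on the allowlist");
  }
  // ...store the credential for this user...
}
```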
WebAuthn doesn't work the way you think it does. Most computers don't have fingerprint readers or even webcams, let alone webcams capable of verifying a user. Instead, you'll end up making people type in their Windows password or scan QR codes.
Remote attestation concepts already exist to solve this problem: https://httptoolkit.com/blog/apple-private-access-tokens-att...
Cloudflare ran an experiment with it, and I think Apple and Cloudflare are working to make it into a proper RFC. It's only a matter of time before other browsers support it too, the way things are going right now.
Please don’t use WebAuthn on every page load.
Two reasons: 1) the protocol is not designed to do this, and the UI/UX is not designed to support it; there are better ways. 2) It will likely not work: there are virtual/software authenticators (available in dev tools) that could generate a valid response without a human.
FWIW, using WebAuthn to start a session, setting a cookie, and validating that cookie to grant access seems like a pretty usable pattern. It's not much more invasive than the "checking your connection" screen Cloudflare likes to throw up.
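A sketch of that pattern with Express, where `verifyAssertion` is a stand-in for a WebAuthn library's assertion check and the in-memory set is a stand-in for a real session store:

```ts
import express from "express";
import { randomBytes } from "crypto";

// Hypothetical helper: your WebAuthn library's assertion verification.
declare function verifyAssertion(body: unknown): Promise<boolean>;

const app = express();
app.use(express.json());
const sessions = new Set<string>(); // use a real session store in practice

// Step 1: one WebAuthn ceremony buys a session cookie.
app.post("/webauthn/verify", async (req, res) => {
  if (!(await verifyAssertion(req.body))) return res.sendStatus(401);
  const token = randomBytes(32).toString("hex");
  sessions.add(token);
  res.cookie("session", token, { httpOnly: true, secure: true, sameSite: "strict" });
  res.sendStatus(204);
});

// Step 2: every later request only has to present the cookie.
app.use((req, res, next) => {
  const token = req.headers.cookie?.match(/session=([0-9a-f]+)/)?.[1];
  if (token && sessions.has(token)) return next();
  res.status(401).send("please verify first");
});

app.listen(3000);
```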
That sounds awful. From there it's a minuscule step to needing to authenticate with every website, like a cookie you can't erase if you want to continue using the internet.
C'mon folks, PNG? WebAuthn on every page?
If we're going to this level of lunacy, why not print posters and distribute them physically in your neighborhood?
100% bot safe & "meat finger" guarantee.
I presume you're not interested in traffic coming from search engines, right?
Search engines have publicly listed IP ranges and user agents, so setting up an HTML whitelist for the search engines you do like shouldn't be impossible. Especially now that Google no longer presents its cache to visitors.
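For Googlebot specifically, the documented check is reverse DNS plus a forward confirmation, since the user-agent string alone is trivially spoofable. A Node sketch of the idea:

```ts
import { promises as dns } from "dns";

// Reverse-DNS the client IP, check the hostname falls under googlebot.com or
// google.com, then forward-resolve the hostname and confirm it matches.
async function isRealGooglebot(ip: string): Promise<boolean> {
  try {
    const [host] = await dns.reverse(ip);
    if (!/\.(googlebot|google)\.com$/.test(host)) return false;
    const { address } = await dns.lookup(host);
    return address === ip;
  } catch {
    return false; // no PTR record, or lookup failed: treat as not Googlebot
  }
}

isRealGooglebot("66.249.66.1").then(console.log); // true for a genuine crawler IP
```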
Please don’t make the web more inaccessible to assistive-technology users.
You're not wrong, but tell that to the bots :) Do you have any other suggestions?
> You're not wrong, but tell that to the bots :)
Tell that to the judge when you're slapped with an ADA lawsuit because you failed to provide "reasonable accommodation" to disabled users.
I have to say, I've been making websites professionally for 25 years, and I've never heard of this being a thing even once.
You must not be working for juicy clients then.
https://www.levelaccess.com/blog/title-iii-lawsuits-10-big-c...
Always companies with fewer than 15 employees, so I think the ADA doesn't apply.