
Wed, 18 May 2022, 06:50
Jayenkai

Ripe Crawler / Agent Block


There's something been hammering (HAMMERING!!!!) the server for the past few days.

It's one of those bastard spiderbots that doesn't claim to be anything, but whenever I reverse look-up its MANY different IP addresses, seems to be part of this..

Not having any kind of marker, nor ever bothering to check the robots.txt, it's not easy to tell it to piss off.
The only repeating/recognisable thing in all of its user agent requests is "Chrome/94.0.4606.81", so I've blocked that.
If your browser's still showing that, then I apologise. But you won't be able to read this anyway.
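A minimal sketch of what a block like that could look like, assuming the server is Apache with mod_rewrite enabled and the rule lives in a .htaccess file (the post doesn't say how the block was actually done, so treat this as one possible approach):

```apache
# Assumption: Apache with mod_rewrite available.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match the one token that shows up in every request from the crawler.
    # Dots are escaped so they only match literal dots.
    RewriteCond %{HTTP_USER_AGENT} "Chrome/94\.0\.4606\.81" [NC]
    # Answer 403 Forbidden for any matching request, whatever the URL.
    RewriteRule ^ - [F,L]
</IfModule>
```

The obvious downside is the one already mentioned: anyone genuinely browsing with that exact Chrome build gets the 403 as well.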

If I knew WTF the Ripe was supposed to be doing, or could find any legitimate reason for it doing what the hell it's doing, then I wouldn't have resorted to this.


* over 850,000 requests in a week.




-=-=-
''Load, Next List!''
Wed, 18 May 2022, 07:49
Pakz


(Joke)

I had noticed that python can be used to make those crawlers. Anyone can make these now on any device?
Wed, 18 May 2022, 07:50
Jayenkai
I don't normally mind crawlers, and have happily let most of them continue along on their merry way, but this one's been a persistent little bugger for a good week or so.
I've probably served more content to that bot in a week than to any one of the regulars on the forum for the past number of years.

Wed, 18 May 2022, 07:53
Pakz
Are they grabbing media and text and such? I have been reading about people gathering everything they can to train their deep learning systems. I have no idea why anyone would run a crawler, other than maybe for a search engine or those deep learning things.
Wed, 18 May 2022, 07:54
Jayenkai
No, only the text, but SO MUCH of the text.
Evil little shit.

Wed, 18 May 2022, 07:58
Pakz
I just googled a bit. They apparently also make these artificially intelligent crawlers for self learning. Seems maybe they can start to live their own lives.
Wed, 18 May 2022, 08:49
rockford
Eek! Creepy crawlers!
Wed, 18 May 2022, 09:01
Dan

Fri, 20 May 2022, 00:04
PHS
Has this been giving you any more trouble?
Fri, 20 May 2022, 04:28
Jayenkai
Yeah, the graph is showing a much more stable server, and is now displaying about 6 hours of users, rather than just an hour of a spider.

.. Today's most frequent spider is Babber.tech, which has an AgentHost that says what it is, and crawls at a perfectly fine speed so that it doesn't overwhelm the server.
This is a good spider.
*pets spider*

Fri, 20 May 2022, 19:55
PHS

Sat, 21 May 2022, 04:41
steve_ancell
Those Wolf spiders are well funny.

|update| -=-=- |update|
Just a shame they're not bigger.
Sun, 22 May 2022, 19:30
PHS
Keep 'em small I say. Have you seen Harry Potter?


Sun, 22 May 2022, 21:42
steve_ancell
I mean like around 3 inches. LOL
Mon, 23 May 2022, 03:16
cyangames
You can use a few lines in a .htaccess file if needed

for example:



That's an old one I used to deploy on my websites when I was Refresh Creations, so it will need some adjustments.
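As a rough sketch of the general idea only, assuming Apache 2.4 with mod_setenvif and mod_authz_core: the bot names below are hypothetical placeholders, not the list from the Refresh Creations days.

```apache
# Assumption: Apache 2.4, with .htaccess overrides allowed for these directives.
<IfModule mod_setenvif.c>
    # Flag any request whose User-Agent contains one of these names.
    # "ExampleBot" and "AnotherScraper" are made-up placeholders.
    SetEnvIfNoCase User-Agent "ExampleBot" bad_bot
    SetEnvIfNoCase User-Agent "AnotherScraper" bad_bot
</IfModule>

# Let everyone in except requests that were flagged above.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Each extra bot is one more SetEnvIfNoCase line, which keeps a long list easy to maintain. It only helps against bots that actually put a recognisable name in their agent string.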

-=-=-
Web / Game Dev, occasionally finishes off coding games also!
Mon, 23 May 2022, 03:21
Jayenkai
Yeah, I've a big long list too!! The problem in this case was that the bot wasn't declaring itself in the agent string, so the only thing I could cling onto was that "chrome/94" stat. A right slippery bugger, it is..

Mon, 23 May 2022, 03:35
cyangames
Ahhh, bummer.