Virtual Siege Warfare - Part 1

We have been fighting each other for a very long time. Curiously, the tactics we use to fight each other have not changed much. The tactics are the same, but the battlefield is different.

It is no mystery that ad revenue is a big deal. Some people's entire livelihoods depend on web traffic: visitors land on a site and click on ads, and the site owner collects a small percentage of what the advertiser pays.

This topic is going to be broken into multiple parts. As I read and research, I am learning a lot, and there is too much to cram into one article.

Let's look back to when we were holding spears and firing arrows. Imagine you are an attacking army attempting to take a castle. Supplies are being sent to you at the front, and your military leaders have planned for the long haul.

Your ultimate goal is to take the castle, or destroy it.

Wouldn't it be nice if we could do damage without actually launching a full-on attack? That is where siege warfare comes into play. As the attacker, I will surround the castle and prevent resources from reaching the people inside.

Given time, the people inside will run out of food, and their stash of armaments will be depleted. Morale will be low, and they will be hungry and tired. From a tactical standpoint this is perfect: my enemy is weak.

Maybe they will surrender, or perhaps now is the perfect time to launch my attack.

Now keep this in mind as we travel to modern times.

You own and operate a virtual shop that sells niche shoes. A competitor has entered the market selling a very similar niche shoe. They are cutting into your business, and your revenue is down. Both of you rely on sales, but also on people visiting your sites and clicking on ads.

As an angry shoe monger, I want to stop them from tapping into my sweet, sweet shoe revenue.

AdSense and other ad networks use website traffic and a combination of analytics to determine whether a site is eligible to participate in their ad programs. That means it is possible to destroy this source of revenue. It can happen by accident, by unintentionally breaking something in the terms and conditions, but it can also be directed at a target maliciously.

How is it done?

Put simply, if a ton of the "wrong" traffic lands on your site, you run the risk of having your ad revenue discontinued.

Whichever ad programme you are working with may decide that you are attempting to commit ad fraud (fake clicks, fake visits, etc.) to grab more money from advertisers.

This happens a lot.

At this point I feel pretty comfortable with scraping. I like being able to automate the retrieval of data from the web. While studying, I have learned some techniques for evading common scraping-prevention strategies.

For part one of the Siege discussion, I want to highlight one technique: spoofing user-agents.

What is a user-agent? [4]

"The User-Agent request header is a characteristic string that lets servers and network peers identify the application, operating system, vendor, and/or version of the requesting user agent."

When you spoof a user-agent you are telling the server that you are someone you are not.  

Why does this matter?

It matters because when I make a request to a web server, my HTTP headers tell the server who I am, and the server then decides whether to accept me.

Web servers can say "NO." Maybe you are surfing the web with an outdated browser? Maybe you are surfing from a geo-restricted location? Or maybe you are clearly making scraping requests using Python!

*Ominous music* DUN DUN DUN!

It is common for developers to block requests whose headers reveal that they were sent by Python bots, such as those built on the requests library. Here is a simple bot (it submits a request and retrieves the header information).
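A minimal sketch of the same idea, swapping in Python's standard-library urllib for requests so it runs without extra packages or network access: instead of sending a request, it inspects the User-Agent that urllib attaches to every request by default. The helper name `default_urllib_user_agent` is hypothetical. The requests library behaves the same way, identifying itself as `python-requests/<version>` unless you override it.

```python
import urllib.request


def default_urllib_user_agent() -> str:
    """Return the User-Agent string urllib sends when none is set explicitly."""
    # build_opener() returns an OpenerDirector; its addheaders list holds the
    # default ("User-agent", "Python-urllib/<major>.<minor>") pair that is
    # attached to every request made through that opener.
    opener = urllib.request.build_opener()
    headers = dict(opener.addheaders)
    return headers["User-agent"]


if __name__ == "__main__":
    # Prints something like "Python-urllib/3.12", depending on your Python version.
    print(default_urllib_user_agent())
```

Either way, the library announces itself by name, which is exactly what a site owner can filter on.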

As you can see, the user-agent clearly states that I am using "python-requests." If I wanted to keep some bots off my site, I could block this string specifically.
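On the server side, that block can be as simple as a substring check on the incoming User-Agent. This is a minimal sketch: `BLOCKED_AGENT_MARKERS` and `is_blocked_user_agent` are hypothetical names, the marker list is an illustrative assumption, and your web framework would call the function on each request (the same check can also be written as a web server or firewall rule).

```python
from typing import Optional

# Hypothetical list of substrings that identify common HTTP libraries and
# command-line tools rather than browsers. Matching is case-insensitive.
BLOCKED_AGENT_MARKERS = ("python-requests", "python-urllib", "curl", "wget")


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if a request should be rejected based on its User-Agent."""
    if not user_agent:
        # Assumption of this sketch: a missing or empty header is treated as a bot.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


if __name__ == "__main__":
    for ua in ("python-requests/2.31.0",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
               None):
        verdict = "blocked" if is_blocked_user_agent(ua) else "allowed"
        print(repr(ua), "->", verdict)
```

Because the check trusts whatever string the client chooses to send, it only stops clients that identify themselves honestly.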

But I am a crafty scraper and I want to access your site regardless.  So I spoof the user-agent.

Below we see a script that rotates user-agent strings and sends requests to a web server. (I have intentionally cropped the strings. If you want the full strings, just give it a quick Google :) )
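A minimal sketch of the rotation idea, not the script itself. It assumes a hypothetical `USER_AGENTS` list of placeholder strings (cropped, in the same spirit as above) and a hypothetical `rotating_headers` generator. It only builds the header dictionary for each request, which you would pass as the headers argument of your HTTP client; no requests are sent, so it runs offline.

```python
import itertools
from typing import Dict, Iterable, Iterator

# Placeholder User-Agent strings, deliberately cropped.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
    "Mozilla/5.0 (X11; Linux x86_64) ...",
]


def rotating_headers(agents: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a fresh header dict per request, cycling through `agents` in order."""
    for ua in itertools.cycle(agents):
        yield {"User-Agent": ua}


if __name__ == "__main__":
    headers = rotating_headers(USER_AGENTS)
    for _ in range(4):
        # The fourth request wraps around to the first string again.
        print(next(headers)["User-Agent"])
```

Cycling in order keeps the behaviour predictable; picking with `random.choice` on each request is a common alternative.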

My User-Agent has been obfuscated!

From a scraping viewpoint, this is extremely valuable. From a malicious actor's viewpoint, it also has merit: if I can pretend to be someone different every time I access my target's website, there is a chance that I can confuse the target's visit-tracking system.
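To see how rotation could confuse visit tracking, consider a deliberately naive, hypothetical tracker that treats every distinct (IP address, User-Agent) pair as a separate visitor. This is a toy for illustration, not how any particular analytics product works; real systems combine many more signals (cookies, JavaScript checks, timing), so it overstates the effect.

```python
from typing import Iterable, Tuple


def count_unique_visitors(requests_seen: Iterable[Tuple[str, str]]) -> int:
    """Count visitors the naive way: one visitor per distinct (ip, user_agent) pair."""
    return len({(ip, user_agent) for ip, user_agent in requests_seen})


if __name__ == "__main__":
    ip = "203.0.113.7"  # documentation-range address (RFC 5737)
    same_agent = [(ip, "UA-A")] * 3
    rotated = [(ip, "UA-A"), (ip, "UA-B"), (ip, "UA-C")]
    print(count_unique_visitors(same_agent))  # 1
    print(count_unique_visitors(rotated))     # 3
```

Note that the IP address never changes, so even this naive tracker has the data to notice many different agents arriving from a single address.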

What are the next steps?

Future posts will cover the following features I want to add to the script:

- build in proxies

- build in country selection that pairs common user-agents with country specific proxy

- build in sessions

- build in mouse movements

- add random timing for sessions

 

Stay tuned for future parts to this discussion!

 Andrew


References

[1] https://www.scrapehero.com/how-to-fake-and-rotate-user-agents-using-python-3/ 

[2] https://empireflippers.com/adsense-account-disabled/

[3] https://en.wikipedia.org/wiki/Web_scraping

[4] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent

 
