How do traffic bots work

Google Analytics Spam: Get Rid of Fake Traffic and Referrer Spam

Not all traffic that arrives in Google Analytics is real traffic. Fake traffic and referrer spam distort your traffic figures. In this post you can read how to recognize uninvited guests and which filters you can use to stop spam in Google Analytics.

Update: In late January and early February 2021, traffic that looked suspiciously like spam appeared in our accounts. Did you see it too? Feel free to leave us a comment!


What is Google Analytics Fake Traffic?

Fake traffic refers to hits that appear in your Analytics data but, for whatever reason, do not actually come from a person. The cause can be a technical problem or deliberate deception. Whatever the cause, you normally do not want these hits in your reports, because they distort your traffic figures and dilute their informative value.

Ghost traffic as spam

If someone generates hits or other entries in your Analytics reports to draw attention to websites or services, that is classic spam, just not by email. It is called ghost traffic because the hits often never actually occur on your site; instead, they are sent directly into your GA reports.

Example of bot traffic in Google Analytics

How does Google Analytics Spam work?

The Analytics tracking script on your site is just a tool for collecting user data. The actual data transfer happens via a request to a Google server (https://www.google-analytics.com) at a specific URL with the appropriate parameters.

Each parameter carries a specific piece of information; for example, aip=1 means that IP anonymization is activated. The catch: calls to this URL are not protected. Anyone can call it directly and put whatever they want in the parameter fields.
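To make this concrete, here is a minimal Python sketch of what such a hit URL looks like. It assumes the Universal Analytics Measurement Protocol (version 1) with its /collect endpoint and its standard parameters tid (tracking ID), cid (client ID), t (hit type), dp (page path) and dr (referrer); the tracking ID, client ID and referrer used here are placeholders. The code only builds and prints the URL; it does not send anything.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Universal Analytics Measurement Protocol (v1) endpoint for hits.
ENDPOINT = "https://www.google-analytics.com/collect"


def build_hit_url(tracking_id: str, client_id: str, page: str, referrer: str = "") -> str:
    """Build a pageview hit URL without sending it."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID of the target property
        "cid": client_id,    # anonymous client ID, freely chosen by the sender
        "t": "pageview",     # hit type
        "dp": page,          # document path
        "aip": "1",          # anonymize IP
    }
    if referrer:
        params["dr"] = referrer  # document referrer, the field referrer spam abuses
    return ENDPOINT + "?" + urlencode(params)


# Placeholder values, not a real property or site.
url = build_hit_url("UA-00000000-1", "555", "/", referrer="http://spam.example")
print(url)

# What the receiving server sees: plain key/value pairs, nothing verified.
for key, values in parse_qs(urlsplit(url).query).items():
    print(key, "=", values[0])
```

Nothing in this URL proves that the request came from a browser on your site, which is why a spammer can fill in the referrer field with any domain.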

Writing a script that works through a list of Analytics account numbers one after the other and sends a suitably prepared call to each is not a difficult task for a programmer. With the "right" parameters, the desired entries end up in the page or event reports or, above all, in the referral and campaign reports.

Why the referral report, of all places?

The Referrals report in Analytics shows on which websites users clicked a link that brought them to your site. The browser passes this information along in the referrer field. If a new entry appears among the referrers and supposedly sent many users, you naturally want to find out where this great link came from, so you visit the URL.

Bot traffic in the referral report

In this case, the domain only hosts a redirect that takes you to a website where you can buy exactly this kind of fake traffic, for example to spruce up your own user statistics for superiors, investors or customers who advertise on your site.

Fake traffic from bots, crawlers and spiders

There are many programs on the web that call up, copy or scour websites. In addition to well-known search engines like Google, thousands of more or less well-known programs examine your pages. Not every one of these bots shows up in your user reports: to appear there, a bot has to execute the site's tracking code. If it does, however, Google Analytics can initially hardly distinguish it from a real user's browser.

Some bots can be identified by their browser identification (in the Audience > Technology > Browser & OS report). The Google crawler, for example, reports "Googlebot" as its browser ID. However, some bots disguise themselves as "normal users" and pretend to visit the site with Chrome, Firefox or Edge.
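The check described above can be sketched in a few lines of Python. This is a simplified heuristic, not how Google Analytics classifies bots: the short token list in BOT_PATTERN is a hypothetical example, and a bot that sends an ordinary Chrome or Firefox user agent will not be caught.

```python
import re

# Hypothetical, deliberately short list of tokens that self-declared bots
# commonly put in their user-agent string. Real lists are far longer.
BOT_PATTERN = re.compile(r"bot|crawler|spider|crawling", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent openly identifies itself as a bot."""
    return BOT_PATTERN.search(user_agent) is not None


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
)

print(looks_like_bot(googlebot))  # True
print(looks_like_bot(chrome))     # False: a bot disguised like this passes too
```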

What can you do against fake traffic?

With a few simple steps you can build a kind of spam filter for fake traffic into your Analytics account. As with email spam, there is no such thing as 100% protection, but 80% is still something.

Filter out bots

Analytics always filters out some bots, such as Google's own Googlebot. GA can also automatically filter out some other bots. To do this, tick the "Exclude all hits from known bots and spiders" option under Bot Filtering in the view settings.

Filter out bots

The important restriction here is "known bots and spiders", but GA already knows a large number of programs and crawlers.

Filter by hostname

Before you exclude traffic with filters, you should always create a raw data view. It receives all hits unfiltered, so you always have a backup view against which you can check your filters.

The tracking code transmits the domain on which the hit was measured. On our website, for example, the hostname field contains the value www.luna-park.de on every page.

With ghost spam that is sent to your property via direct calls, this field is often not filled in correctly, because for these calls a property ID is frequently just “guessed” and a call is sent to it. There is usually no check on which website that ID is actually installed. In the screenshot you can see many hits for the domain webanalysis-news.de, on which the tracking code is not installed at all.

Host names in Google Analytics

With a hostname filter you can block such unwanted hits from the start. To do this, add a filter on the Hostname field in the view settings that only includes hits from the domains from which you expect traffic.

Create host name filter

In the example you can see a regular expression: all domains that contain luna-park are allowed, i.e. luna-park.de, luna-park.net, www.luna-park.de, etc. In addition, hits with usercontent in the hostname are also let through. These arise when a user accesses your site via the Google cache: the tracking code is executed there too, but from a different domain, as you can see in the screenshot.
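Before saving such a filter, it can help to test the expression against hostnames from your reports. The Python sketch below assumes an include pattern of luna-park|usercontent, modelled on the description above; the exact expression in the screenshot may differ. Like the filter described, it uses a partial match, so a hostname only needs to contain one of the terms.

```python
import re

# Hypothetical include pattern modelled on the example: keep any hostname
# containing "luna-park" or "usercontent" (Google cache copies).
HOSTNAME_INCLUDE = re.compile(r"luna-park|usercontent", re.IGNORECASE)


def keep_hit(hostname: str) -> bool:
    """Mimic an include filter on the Hostname field (partial regex match)."""
    return HOSTNAME_INCLUDE.search(hostname) is not None


hostnames = [
    "www.luna-park.de",
    "luna-park.net",
    "webcache.googleusercontent.com",
    "webanalysis-news.de",
    "",  # ghost hits may arrive with an empty hostname
]
for h in hostnames:
    print(repr(h), "kept" if keep_hit(h) else "dropped")
```

The partial match is convenient but permissive: a made-up hostname such as luna-park.spam.example would also pass. Anchoring the expression to your real domains is stricter, at the cost of maintaining a longer pattern.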

Hostnames containing usercontent

Filter referrers and campaigns

Another way to limit fake traffic is to block suspicious referrals and campaigns completely. This excludes the affected sessions from the start. To do this, set a filter on the Campaign Source or Referral field.

Campaign source filter

You can find an up-to-date list, or a template for these filters, at carloseo.com.
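If you maintain such a list yourself, a small script can turn it into exclude patterns for the Campaign Source or Referral filter. The Python sketch below joins domains into regex alternations and escapes the dots so that "." only matches a literal dot. It assumes plain domain names (letters, digits, hyphens, dots) and a filter pattern limit of 255 characters, which is the commonly cited limit for classic Analytics filters; treat that value as an assumption and check it in your account. The spam domains shown are hypothetical.

```python
import re

# Assumed maximum length of one filter pattern in classic Google Analytics.
MAX_PATTERN_LEN = 255


def build_exclude_patterns(domains: list[str], max_len: int = MAX_PATTERN_LEN) -> list[str]:
    """Join domains into as few "a|b|c" patterns as fit into max_len each."""
    patterns: list[str] = []
    current = ""
    for domain in domains:
        # Assumes plain domain names, so escaping the dots is sufficient.
        part = domain.replace(".", r"\.")
        if len(part) > max_len:
            raise ValueError(f"domain too long for one pattern: {domain}")
        candidate = part if not current else current + "|" + part
        if len(candidate) <= max_len:
            current = candidate
        else:
            # Current pattern is full: start a new one (i.e. a second filter).
            patterns.append(current)
            current = part
    if current:
        patterns.append(current)
    return patterns


# Hypothetical spam domains for illustration only.
spam = ["spam-referrer.example", "buy-traffic.example", "free-seo.example"]
for pattern in build_exclude_patterns(spam):
    print(pattern)
```

Each resulting pattern would go into its own exclude filter, so a long list simply becomes several filters.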

Use a property ID higher than 1

In the simplest form of this spam, a script works through a list of Analytics accounts and sends calls to them. It almost always targets the first property (the tracking ID ending in -1), because that property exists in most Analytics accounts. Other properties are often ignored, since in many accounts they do not even exist.

If you use a property with a "higher" ID for your main domain, you at least get rid of these simple scripts.
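What "higher ID" means becomes clear from the format of a classic Universal Analytics tracking ID, UA-account-property. The Python sketch below extracts the property number; the tracking IDs used are placeholders.

```python
import re

# Universal Analytics tracking IDs have the form UA-<account>-<property>.
TRACKING_ID = re.compile(r"^UA-(\d+)-(\d+)$")


def property_index(tracking_id: str) -> int:
    """Return the property number (the last part) of a UA tracking ID."""
    match = TRACKING_ID.match(tracking_id)
    if match is None:
        raise ValueError(f"not a UA tracking ID: {tracking_id}")
    return int(match.group(2))


def is_first_property(tracking_id: str) -> bool:
    """True for the first property (ending in -1), the usual spam target."""
    return property_index(tracking_id) == 1


print(property_index("UA-12345678-3"))     # 3
print(is_first_property("UA-12345678-1"))  # True
```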

Additional protection via the Google Tag Manager (advanced)

If you use Google Tag Manager to deploy tracking codes on your site, you can use it to build in additional protection. To do this, create a custom dimension in your Analytics property.

Custom dimension

Then add the dimension to your tags in Google Tag Manager and pass any string you like in it. It does not matter what the string is or how long it is.

Custom dimension in GTM

Back in the Analytics admin, set up a filter in the view that only includes hits carrying this custom dimension.

Filter custom dimension

Now only hits sent from your Google Tag Manager container get through.
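The logic of this filter can be sketched in Python. In the Measurement Protocol, custom dimension N is transmitted as the parameter cdN; this sketch assumes dimension index 1 and a hypothetical token value. A ghost hit sent directly to the collect URL does not know the token, so it is dropped.

```python
# Hypothetical token; any string works as long as spammers cannot guess it.
SECRET_TOKEN = "lp-7f3a9c"


def include_hit(params: dict[str, str], dimension_index: int = 1) -> bool:
    """Mimic an include filter on custom dimension N (sent as parameter cdN).

    Equivalent to a filter pattern of ^lp-7f3a9c$ on that dimension.
    """
    return params.get(f"cd{dimension_index}") == SECRET_TOKEN


# Placeholder hits: one tagged by the GTM container, one sent by a spam script.
gtm_hit = {"v": "1", "tid": "UA-00000000-2", "t": "pageview", "cd1": SECRET_TOKEN}
ghost_hit = {"v": "1", "tid": "UA-00000000-2", "t": "pageview", "dr": "http://spam.example"}

print(include_hit(gtm_hit))    # True
print(include_hit(ghost_hit))  # False
```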

What if the spam has already landed in Google Analytics?

With Analytics segments you can exclude unwanted traffic based on referrals, campaigns, pages/events or many other criteria. However, segments only hide this traffic temporarily in your reports, so they are not a truly permanent solution.

What about new Google Analytics 4 properties?

According to Google, bot and spider traffic is filtered out automatically in Google Analytics 4 properties:

In Google Analytics 4 properties, traffic from bots and spiders is automatically excluded. This ensures that your analytics data, to the extent possible, does not include events from known bots. At this time, you cannot disable bot traffic exclusion or see how much bot traffic was excluded.

Bot traffic is identified using a combination of Google research and the International Spiders and Bots List, maintained by the Interactive Advertising Bureau.

Source: https://support.google.com/analytics/answer/9888366?hl=en

Beyond that, the new GA4 reports do not yet offer filter options comparable to those in classic properties.

Conclusion

Unwanted traffic in your Google Analytics reports is annoying and can make working with your data time-consuming or even impossible. You should therefore take precautions to ensure good data quality. The settings and filters described here will help you set up a first protective wall against fake traffic.

You can find more tips on setting up your Analytics account correctly in the article Setting up your Google Analytics account correctly.

Markus Vollmert

Markus Vollmert has been involved in online marketing for a long time and is at home with numbers and data. As the founder and managing director of lunapark, he deals with tracking and data for websites and campaigns. Markus is also the author of Google Analytics - The comprehensive manual from Rheinwerk Verlag.
