Don’t Trust Your Google Analytics Data Just Yet (Part 2): Get Rid of Noise With These 9 Fundamental Filters

In my last post, I discussed the importance of conducting a methodological audit of the Google Analytics snippet tag. Now, I would like to go further down the rabbit hole and widen our data accuracy audit with additional proactive steps we can easily take. As mentioned in my previous article, Google Analytics is a great platform for gaining important insights and making valuable business decisions, all of which may have a direct impact on business growth. But before putting on our data-scientist suits, we should take a step back and make sure the data is as accurate and reliable as it can be. In this post, I want to tackle another aspect of the auditing process: filtering out the noise from an overwhelming flood of Google Analytics data.

Detecting noise, anomalies or sporadic changes in Google Analytics data is quite a difficult procedure for the untrained user, not to mention finding the proper way to filter out the noise or figuring out what to do with all the discovered anomalies. I’ve seen hundreds of Google Analytics accounts over the years, and I must admit that it’s been extremely rare to find a properly configured and filtered account with a high degree of data accuracy. Getting the numbers right, especially when dealing with high-traffic sites or complex account structures, requires a somewhat tedious ongoing process and, of course, some understanding of what to do and how to do it. Fortunately for us all, there are some basic filtering rules that are very easy to configure, which can help filter out unwanted data, making our analytics ecosystem more accurate and, as a result, more useful.

What are Google Analytics filters?

Google Analytics filters, as the name suggests, provide users with a way to limit and modify the data that is included in a view (also known as a profile). The Google Analytics interface includes several predefined filters to include or exclude traffic data, as well as custom filters that allow users to create various types of filters to suit their needs. Each filter can include or exclude data from a specific view (hence the term “Filter Type”). The order in which filters are applied is determined by the order in which they are listed in the Assign Filter Order screen.

Applying filters to your view will affect all incoming data from that point on. You will not be able to restore any of the historic filtered data. So, before setting up any filters, here’s one solid piece of advice: duplicate your view and make sure you keep an unfiltered one. It’s like a safety net; you will always know that you’ve got one view, without any modifications, that’s collecting completely raw data. In order to access the filters screen, you will need Edit permission on your Google Analytics account. You will find the filter setup screen in the Admin section.

Now that we know what filters are, and we’ve got our ‘Raw Data’ view and ‘Working View,’ here are a few of my favorite filters that are quite easy to configure and important for cleaning up noise and unwanted data.

Get Rid of Noise With These Fundamental Google Analytics Filters

1. Exclude internal IP filter.

Excluding visits from your own company is important for almost every kind of business. Generally speaking, those visits will skew your data with behavior that is unique to them and different from that of the usual external visitor. So, simply exclude your internal office IP address from the view. For networks with several IP addresses, you can apply several filters.

If you’re running a relatively large site, and hopefully you’re also using Google Tag Manager to serve the Google Analytics snippet, there’s some really good advice from LunaMetrics’ Jon Meck: instead of setting up filters that may cause sampling issues, you could actually stop sending the data in the first place by configuring Google Tag Manager with specific macros and rules. Read the complete article here: Say Goodbye to Exclude Filters in Google Analytics!

Another good tip is to create a duplicate view with an include filter on your internal office IP address. It’s a great way to test the account, play with it and make sure the implementation is correct.

Exclude internal IP filter
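The filter’s behavior can be sanity-checked before deploying it. Here is a minimal sketch, assuming a hypothetical office range of 203.0.113.5 through 203.0.113.8 (GA custom exclude filters on the IP Address field accept regular expressions):

```python
import re

# Hypothetical office range: 203.0.113.5 through 203.0.113.8.
OFFICE_IP_PATTERN = r"^203\.0\.113\.[5-8]$"

def is_internal(ip):
    """Return True if a hit from this IP would be excluded by the filter."""
    return re.match(OFFICE_IP_PATTERN, ip) is not None
```

Because the expression is anchored on both ends, 203.0.113.6 is excluded while 203.0.113.50 is not.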

2. Force lowercase filter on campaign attributes.

When using tagging parameters to create custom URLs for campaign tracking, one should always make sure that the naming of each parameter follows a shared and known convention. Very often I see duplicate campaign attributes in Google Analytics reports, which mess up the data. As an example, one person might tag a campaign as “linkedin” while another uses “LinkedIn.” This would lead to two separate campaign rows instead of just one.

This lowercase filter will transform the filter field (e.g., campaign attributes) to all lowercase, making sure there are no duplicate URLs in your account and that the data is more readable.

I would recommend applying the campaign lowercase filter to the following fields:

  • Campaign Source
  • Campaign Medium
  • Campaign Name
  • Campaign Term
  • Campaign Content

Force lowercase filter on campaign attributes
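As a quick illustration of the effect, here is a minimal sketch (with made-up hit data) of what the lowercase filter does: campaign fields are lowercased, so “linkedin” and “LinkedIn” collapse into one row, while fields outside the filter’s scope are left alone:

```python
# Made-up hits illustrating the duplicate-row problem described above.
CAMPAIGN_FIELDS = ("source", "medium", "campaign", "term", "content")

def lowercase_campaign_fields(hit):
    """Mimic the lowercase filter: transform campaign attributes only."""
    return {k: v.lower() if k in CAMPAIGN_FIELDS else v for k, v in hit.items()}

hits = [
    {"source": "linkedin", "medium": "Social", "page": "/About-Us/"},
    {"source": "LinkedIn", "medium": "social", "page": "/About-Us/"},
]
normalized = [lowercase_campaign_fields(h) for h in hits]
distinct_sources = {h["source"] for h in normalized}  # one row, not two
```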

3. Force lowercase filter on Request URI.

On many websites, URLs are not case sensitive and can be reached in different ways by the user. As a result, URLs accessed with lowercase and with uppercase characters produce two different page entries in the Google Analytics reports. As an example:

  1. http://www.example.com/about-us/
  2. http://www.example.com/About-Us/

Applying a lowercase filter on the request URI field would make sure the data is unified.

Force lowercase filter on Request URI

4. Include the Hostname to the Request URI.

The default behavior reports in Google Analytics show the visited pages listed by URI. The URI is the relative portion of a page’s URL following the domain name; for example, the URI portion of http://www.example.com/about-us/ is /about-us/. For sites with multiple subdomains tracked in a single web property, it is quite difficult to distinguish between page views of similarly named pages from different subdomains. For example, the homepage of your site, http://www.example.com/, and the homepage of a subdomain, http://support.example.com/, would both appear as / – without any indication of the actual hostname. There are two ways of solving that:

  1. Adding a secondary dimension to the selected report, to include the hostname next to each URI.
  2. Applying a rewrite filter to the Request URI to attach the hostname, as a permanent solution to that specific view.

Google Analytics Filter to Include the Hostname to the Request URI
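The rewrite in option 2 is typically built as an advanced filter. Here is a minimal sketch of the transformation it performs, assuming a configuration of Field A = Hostname with Extract A = (.*), Field B = Request URI with Extract B = (.*), Output To = Request URI, and Constructor = $A1$B1:

```python
def rewrite_request_uri(hostname, request_uri):
    """Prepend the hostname to the request URI, as constructor $A1$B1 does."""
    return hostname + request_uri
```

After the filter, the two homepages report as www.example.com/ and support.example.com/ instead of both appearing as /.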

5. Exclude traffic to any testing environment.

Many site owners and web developers choose to duplicate their website to a testing environment for quality assurance and development purposes. This could be done on a separate domain, a subdomain or even in a subfolder. Wherever you choose to put it, I would recommend keeping this traffic out of your data with a simple exclude filter or, even better, by making sure the Google Analytics snippet tag is not running on those pages.

6. Filter unnecessary query parameters.

Filtering out arguments that create unnecessary duplicate URIs is quite important for cleaning up noise in your reports. For example, a “sessionID” parameter would eventually create many different versions of the same URL in your reports; you want to be able to merge all of these page views into one single line. Locating and extracting the complete list of unnecessary query strings from your URLs is a bit of a challenge. Fortunately for us, there’s a great and practical solution, suggested by Stéphane Hamel from Cardinal Path and Peter O’Neill. Using an external tool like Analytics Canvas (there’s a free trial), you can import all URLs with query strings directly into Excel® and manipulate the data so you get a clear view of how many distinct arguments you have.
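As a sketch of the clean-up itself, here is what stripping a noisy parameter looks like; “sessionID” is the example from the post, and NOISE_PARAMS is a hypothetical list you would build from your own URL audit:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of noisy parameters found in a URL audit.
NOISE_PARAMS = {"sessionID"}

def strip_noise(url):
    """Drop noisy query parameters so duplicate URIs merge into one line."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in NOISE_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that the view settings also offer an “Exclude URL Query Parameters” box that achieves the same merge natively, once you know which arguments to list.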

7. Rewrite referral webmail traffic.

Go to your referral traffic report, and I’m sure you’ll see incoming visits from webmail sources, such as mail.aol.com, webmail.earthlink.net, webmail.verizon.com, mail.yahoo.com and such. Clearly, it’s not that useful to see all of these mail referrals broken down into separate referrals. You want to be able to consolidate all of them into a single source.

I highly recommend applying Daniel Bianchini’s advanced filter to ensure all webmail referrals will be consolidated into a single campaign entry.
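Daniel Bianchini’s exact filter isn’t reproduced in this post, but a consolidation of this kind boils down to a single regular expression over the referral source. The pattern below is an assumed stand-in that catches common webmail hostnames, not his actual filter:

```python
import re

# Assumed stand-in pattern covering common webmail referral hostnames.
WEBMAIL = re.compile(r"(^|\.)((web)?mail|mbox|courier)\.", re.IGNORECASE)

def consolidate_source(source):
    """Collapse any webmail referral into a single 'webmail' source."""
    return "webmail" if WEBMAIL.search(source) else source
```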

8. Separate mobile and non-mobile traffic.

Mobile traffic is on the rise, and by separating mobile from desktop (or tablet) visitors, you will be able to better understand the nuances in their behavior. Duplicate your view again and create two new views, one for mobile-only and one for desktop-only traffic. On the mobile-only view, create the following include filter:

Google Analytics Include Mobile Traffic Filter

On the desktop-and-tablet-only view, create the following exclude filter:

Google Analytics Exclude Mobile Traffic Filter
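Both views hinge on the same field. A minimal sketch of the logic, with made-up hit data: the mobile-only view includes hits whose Device Category matches “mobile”, and the other view excludes exactly those hits:

```python
def mobile_only(hits):
    """Include filter on the mobile view: Device Category matches 'mobile'."""
    return [h for h in hits if h["deviceCategory"] == "mobile"]

def desktop_and_tablet_only(hits):
    """Exclude filter on the other view: drop Device Category 'mobile'."""
    return [h for h in hits if h["deviceCategory"] != "mobile"]

# Made-up sample hits.
sample = [
    {"deviceCategory": "mobile", "page": "/"},
    {"deviceCategory": "desktop", "page": "/"},
    {"deviceCategory": "tablet", "page": "/pricing/"},
]
```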

9. Filter bot traffic.

Google Analytics is a tag-based, client-side solution: it requires the client to execute the JavaScript tag and accept cookies. Over the last few years, more and more “smart” web bots have started to show up in Google Analytics reports, messing up the overall metrics. For example, here is a real report showing a massive spike in data from service providers named “Microsoft Corp” and “Microsoft Corporation,” with almost 20,000 sessions collected during that time.

Bot Traffic Spike in Google Analytics

So, in order to filter out some of these smart bots, we can apply a custom exclude rule, filtering the following most common bots:

^(microsoft corp(oration)?|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez

Google Analytics Exclude Bots Filter
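Before deploying the pattern above, it can be sanity-checked against sample service-provider names (lowercased here, since the pattern’s alternatives are written in lowercase):

```python
import re

# The exclude pattern from the post, applied to the service-provider field.
BOT_PATTERN = re.compile(
    r"^(microsoft corp(oration)?|inktomi corporation|yahoo! inc\.|"
    r"google inc\.|stumbleupon inc\.)$|gomez"
)

def is_bot_isp(isp):
    """True if this service-provider value would be excluded by the filter."""
    return BOT_PATTERN.search(isp.lower()) is not None
```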

Update: Google Analytics has added built-in bot and spider filtering. All you need to do is select a new checkbox, labeled “Exclude traffic from known bots and spiders,” which is included at the view level of the management user interface.

Bot and Spider Filtering Checkbox

Final Thoughts

These are just some of the most basic filter ideas that, to me, are considered mandatory. Hopefully, all of these ideas will be enough to get you started. If you think I’ve missed any fundamental filters, please leave a comment and I’ll add it to the list.


  • Raghavendra S

    Excellent job, Yaniv Navot! I’ll add these filters to my GA teachings.

  • Thanks Yaniv, this checklist is super helpful and I’ll definitely be expanding my view list. A growing trend I’m seeing is referrals from spambots like buttons-for-website and semalt.com, etc. I’ve implemented both ISP domain and referral filters for these as well as checked the “Exclude Bots” setting, and nevertheless they plague my numbers. I’ve also limited my master view to only my hostnames. Any thoughts on ridding reporting of these once and for all?

    • Thanks Lea. Just create a new exclude filter for semalt.com. I would also recommend spending some time finding all significant non-human referrals and excluding all of them with a single regex filter.

      • OK, yes, I’ve already created exclude filters for semalt on ISP domain, referral and source, but nothing has worked yet. I see this is a problem across the board. If you have any thoughts other than editing the .htaccess file, let me know!

  • Great post Yaniv, answered a couple of questions that have been floating around my mind for a while!

  • Rajiv

    I have configured an Advanced filter to track sub-domain records as follows:

    Filter Type: Custom filter > Advanced

    Field A: Hostname

    Extract A: (.*)

    Field B: Request URI

    Extract B: (.*)

    Output To: Request URI

    Constructor: $A1$B1

    After that, I am able to see sub-domain records and view the full page URL in reports. But when I check a report such as Behavior >> All Pages, or select Landing Page as a primary dimension, and then click the icon next to a displayed full URL to visit that page, the browser opens a URL with a doubled domain name, so the page does not open successfully.

    For example:

    In the landing page list, the following URL appears:

    http://www.sitegeek.com/compareHosting/arvixe_vs_hostgator

    If I click on the icon next to the displayed URL, the following URL opens in the browser:

    https://sitegeek.comwww.sitegeek.com/compareHosting/arvixe_vs_hostgator

    Is the first domain, with HTTPS, coming from the ‘View’ settings?

    How can I remove the doubled domain?

    Thanks,

    Rajiv

    • Thanks for your question, Rajiv. I’m afraid there isn’t any way of fixing this.

      • Blacula79

        I can’t thank you enough for this article. It’s been a big help. Unfortunately, I’m having the same issue as Rajiv, so if you ever find a workaround for this, please let us know.

  • Is there a way to filter out page views after a landing page? I sometimes work on my site from campus, and I don’t want to filter out that IP address, but I also don’t want all of those page views to be counted in my analytics. Thanks!

    • Hi Jessica, to accomplish this you can use custom segments. Simply exclude all of the users who visited the landing pages.

  • Nattapong Phuntusil

    Thanks. What if I need to set up a view filter so that someone is able to see statistics for only one subdomain (example: a.domain.com)? How should I configure this? I would also like to hide the stats of the other subdomains in the Behavior reports.
