In my last post, I discussed the importance of conducting a methodological audit of the Google Analytics snippet tag. Now, I would like to go further down the rabbit hole and widen our data accuracy audit with additional proactive steps we can easily take. As mentioned in my previous article, Google Analytics is a great platform for gaining important insights and making valuable business decisions, all of which may have a direct impact on business growth. But before putting on our data scientist suits, we should take a step back and make sure the data is as accurate and reliable as it can be. In this post, I want to tackle another aspect of the auditing process: filtering out noise from an overwhelming flood of Google Analytics data.
Detecting noise, anomalies or sporadic changes in Google Analytics data is quite difficult for the untrained user, not to mention finding the proper way of filtering the noise or figuring out what to do with the discovered anomalies. I’ve seen hundreds of Google Analytics accounts over the years, and I must admit that it’s been extremely rare to find a properly configured and filtered account with a high certainty of data accuracy. Getting the numbers right, especially when dealing with high-traffic sites or complex account structures, requires a somewhat tedious ongoing process and, of course, some understanding of what to do and how to do it. Fortunately for us all, there are some basic filtering rules that are very easy to configure and that help filter out unwanted data, making our analytics ecosystem more accurate and, as a result, more useful.
What are Google Analytics filters?
Google Analytics filters, as the name suggests, give users a way to limit and modify the data that is included in a view (also known as a profile). The Google Analytics interface includes several predefined filters to include or exclude traffic data, as well as custom filters that allow users to create various types of filters to suit their needs. Each filter can include or exclude data from the specific view (hence the term “Filter Type”). Filters are applied in the order in which they are listed in the Assign Filter Order screen.
Applying filters to your view will affect all incoming data from that point on. You will not be able to restore any of the data that has been filtered out. So, before setting up any filters, here’s one solid piece of advice: duplicate your view and make sure you keep an unfiltered one. It’s a safety net, so you will always have one view without any modifications that’s collecting completely raw data. To access the filters screen, your Google Analytics user needs Edit permission. You will find the filter setup screen in the Admin section.
Now that we know what filters are, and we’ve got our ‘Raw Data’ view and ‘Working View,’ here are a few of my favorite filters that are quite easy to configure and important for cleaning up noise and unwanted data.
1. Exclude internal IP filter.
Excluding visits from your own company is important for almost every kind of business. Generally speaking, those visits skew your data, because internal users behave differently from the usual external visitor. So, simply exclude your internal office IP address from the view. If your network uses several IP addresses, you can apply several filters.
If you’re running a relatively large site, and hopefully you’re also using Google Tag Manager to serve the Google Analytics snippet, there’s really good advice from Luna Metrics’ Jon Meck: instead of setting up filters that may cause sampling issues, you could actually stop sending the data by modifying the Google Tag Manager with specific macros and rules. Read the complete article here: Say Goodbye to Exclude Filters in Google Analytics!
Another good tip is to create a duplicate view with an include filter on your internal office IP address. It’s a great way to test the account, play with it and make sure implementation is right.
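To make the exclude and include logic concrete, here is a minimal sketch in Python. It is not how Google Analytics implements its filters; the network ranges, the `ip` key and the function names are assumptions made up for illustration.

```python
import ipaddress

# Hypothetical office networks (documentation ranges); replace with your own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_internal(ip: str) -> bool:
    """Return True when the address falls inside any internal network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)


def split_hits(hits):
    """Split hits into (working view, internal test view).

    The working view mimics an exclude filter on internal IPs;
    the internal test view mimics an include filter on the same IPs.
    """
    working = [h for h in hits if not is_internal(h["ip"])]
    internal = [h for h in hits if is_internal(h["ip"])]
    return working, internal
```

With this sketch, a hit from 203.0.113.42 would land only in the internal test view, while a hit from any outside address would land only in the working view.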
2. Force lowercase filter on campaign attributes.
When using tagging parameters to create custom campaign URLs for website tracking, one should always make sure that the naming of each parameter follows a shared and known convention. Very often I see duplicate campaign attributes in the Google Analytics reports, which mess up the data. As an example, one person would tag a campaign as “linkedin” while another would tag it as “LinkedIn.” This would lead to two separate campaign rows instead of just one.
This lowercase filter will transform the filter field (e.g., campaign attributes) to all lowercase, making sure there are no duplicate campaign rows in your account and that the data is more readable.
I would recommend applying the campaign lowercase filter to the following fields:
- Campaign Source
- Campaign Medium
- Campaign Name
- Campaign Term
- Campaign Content
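The effect of the lowercase filter on these five fields can be sketched as follows. This is only an illustration, assuming campaigns are tagged with the standard `utm_` URL parameters; the function name `campaign_fields` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Standard campaign tagging parameters, in the same order as the list above:
# source, medium, name, term, content.
CAMPAIGN_PARAMS = (
    "utm_source",
    "utm_medium",
    "utm_campaign",
    "utm_term",
    "utm_content",
)


def campaign_fields(url: str) -> dict:
    """Extract the campaign parameters from a tagged URL, lowercased."""
    query = parse_qs(urlsplit(url).query)
    return {
        name: query[name][0].lower()
        for name in CAMPAIGN_PARAMS
        if name in query
    }
```

A URL tagged with `utm_source=LinkedIn` and another tagged with `utm_source=linkedin` both yield the source `linkedin`, so they end up in the same campaign row.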
3. Force lowercase filter on Request URI.
On many websites, URLs are not case sensitive and can be reached in different ways by the user. As a result, the same page can be accessed with both lowercase and uppercase characters, producing two different page rows in the Google Analytics reports. For example, a page reachable as both, say, /About-Us/ and /about-us/ would be reported as two separate pages.
Applying a lowercase filter on the request URI field would make sure the data is unified.
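A short sketch of what that unification does to page view counts, using made-up request URIs and a hypothetical helper name:

```python
from collections import Counter


def pageviews_by_uri(request_uris):
    """Count page views per request URI after forcing lowercase."""
    return Counter(uri.lower() for uri in request_uris)


# Hypothetical hits: three spellings of the same page plus one other page.
hits = ["/About-Us/", "/about-us/", "/ABOUT-US/", "/contact/"]
report = pageviews_by_uri(hits)
```

Here `report` holds a single `/about-us/` entry with three page views instead of three separate rows. Note that this only makes sense when the web server really does treat the differently cased URLs as the same page.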
4. Include the Hostname to the Request URI.
By default, the behavior reports in Google Analytics list the visited pages by URI. The URI is the relative portion of a page’s URL following the domain name. For example, the URI portion of the address http://www.example.com/about-us/ is /about-us/. For sites with multiple subdomains tracked in a single web property, it is quite difficult to distinguish between page views of pages with similar names on different subdomains. As an example, the homepage of your site, http://www.example.com/, and the homepage of a subdomain on your site, http://support.example.com/, would both appear as / without any indication of the actual hostname. There are two ways of solving that:
- Adding a secondary dimension to the selected report, to include the hostname next to each URI.
- Applying a rewrite filter to the Request URI to attach the hostname, as a permanent solution to that specific view.
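The second option can be pictured with the sketch below. It only shows the effect of the rewrite on the reported page value, not the filter configuration itself; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def uri_with_hostname(url: str) -> str:
    """Return the hostname followed by the request URI (path plus query)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    return host + uri
```

With the two homepages from the example above, the reported pages become `www.example.com/` and `support.example.com/` instead of two indistinguishable `/` rows.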
5. Exclude traffic to any testing environment.
Many site owners and web developers choose to duplicate their website to a testing environment for quality assurance and development purposes. This could be on a separate domain, a subdomain or even in a subfolder. Wherever you choose to put it, I would recommend keeping its traffic out of your data by adding a simple exclude filter or, even better, by making sure the Google Analytics snippet tag is not running on those pages.
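As a rough sketch of such an exclude rule, the check below covers all three placements (separate domain, subdomain, subfolder). Every hostname and path in it is a made-up example; substitute the ones your own testing environment uses.

```python
import re

# Hypothetical test hostnames: subdomains of the main site or a separate domain.
TEST_HOSTNAME = re.compile(
    r"^(staging|dev|qa)\.example\.com$|^example-staging\.com$",
    re.IGNORECASE,
)

# Hypothetical subfolder that holds the test copy of the site.
TEST_PATH_PREFIX = "/staging/"


def is_test_traffic(hostname: str, request_uri: str) -> bool:
    """Return True when a hit comes from a testing environment."""
    on_test_host = TEST_HOSTNAME.search(hostname) is not None
    in_test_folder = request_uri.lower().startswith(TEST_PATH_PREFIX)
    return on_test_host or in_test_folder
```

Hits for which `is_test_traffic` returns True are the ones an exclude filter would drop from the working view.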
6. Filter unnecessary query parameters.
Filtering out arguments that are creating unnecessary duplicate URIs is quite important for cleaning up any noise in your reports. For example: the “sessionID” parameter would eventually create many different versions of the same URL in your reports. You want to be able to merge all of these page views into one single line. Locating and extracting the complete list of unnecessary query strings from your URLs is a bit of a challenge. Fortunately for us, there’s a great and practical solution, suggested by Stéphane Hamel from Cardinal Path and Peter O’Neill. Using an external tool like Analytics Canvas (there’s a free trial of it), you can import all URLs with query strings directly to Excel®, and manipulate the data so you’ll get a clear view of how many distinct arguments you have.
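If you would rather explore the query strings with a script, the idea can be sketched in a few lines of Python. This is not the Analytics Canvas workflow described above, just a stand-in under the assumption that you have exported a list of request URIs; the function names and sample URIs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl, urlencode


def count_parameters(uris):
    """Count how often each query parameter name appears across the URIs."""
    counts = Counter()
    for uri in uris:
        query = urlsplit(uri).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


def strip_parameters(uri, excluded):
    """Return the URI without the excluded query parameters."""
    parts = urlsplit(uri)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


# Hypothetical export: the same product page with different session IDs.
uris = [
    "/product?id=7&sessionID=abc",
    "/product?id=7&sessionID=def",
    "/product?id=7",
]
parameter_counts = count_parameters(uris)
merged = {strip_parameters(uri, {"sessionID"}) for uri in uris}
```

The counts show which parameter names occur at all, and after stripping `sessionID` the three URIs collapse into the single page `/product?id=7`.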
7. Rewriting referral webmail traffic.
Go to your referral traffic report, and I’m sure you’ll see incoming visits from webmail sources, such as mail.aol.com, webmail.earthlink.net, webmail.verizon.com, mail.yahoo.com and such. Clearly, it’s not that useful to see all of these mail referrals broken down into separate referrals. You want to be able to consolidate all of them into a single source.
I highly recommend applying Daniel Bianchini’s advanced filter to ensure all webmail referrals will be consolidated into a single campaign entry.
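The consolidation idea can be sketched as below. This is not Daniel Bianchini’s filter, only an illustration of the principle; the pattern and the `webmail` label are assumptions chosen to match the example hostnames above.

```python
import re

# Matches hostnames with a "mail." or "webmail." label,
# e.g. mail.aol.com or webmail.verizon.com, but not gmail.com.
WEBMAIL_PATTERN = re.compile(r"(^|\.)(web)?mail\.", re.IGNORECASE)

# Hypothetical label for the consolidated source.
WEBMAIL_SOURCE = "webmail"


def consolidate_source(source: str) -> str:
    """Map any webmail referral hostname to one consolidated source."""
    if WEBMAIL_PATTERN.search(source):
        return WEBMAIL_SOURCE
    return source
```

All four webmail hostnames listed above map to the same source, while unrelated referrers are left untouched.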
8. Separating mobile and non-mobile traffic.
Mobile traffic is on the rise and with a separation between mobile and desktop (or tablet) visitors, you will be able to better understand the different nuances in behavior. Duplicate your view again and create two new views for mobile and desktop only traffic. On the mobile only view, create the following include filter:
On the desktop and tablet only view, create the following exclude filter:
9. Filter bot traffic.
To filter out some of the smarter bots, we can apply a custom exclude filter that matches the following most common bots:
^(microsoft corp(oration)?|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
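To see what this pattern does and does not match, here is a small Python check. It assumes the pattern is applied to the ISP organization name of each hit and that matching is case insensitive; both are assumptions, as is the function name.

```python
import re

# The pattern from above, compiled case-insensitively (an assumption).
BOT_PATTERN = re.compile(
    r"^(microsoft corp(oration)?|inktomi corporation|yahoo! inc\.|"
    r"google inc\.|stumbleupon inc\.)$|gomez",
    re.IGNORECASE,
)


def is_bot(isp_organization: str) -> bool:
    """Return True when the ISP organization matches the bot pattern."""
    return BOT_PATTERN.search(isp_organization) is not None
```

The named organizations only match as a whole value because of the `^` and `$` anchors, whereas `gomez` matches anywhere in the string. So `google inc.` is excluded, but a longer value that merely starts with `google inc.` is not.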
Update: Google Analytics has since added built-in bot and spider filtering. All you need to do is select the new checkbox labeled “Exclude traffic from known bots and spiders,” which is available at the view level of the management user interface.
These are just some of the most basic filters, which I consider mandatory. Hopefully, these ideas will be enough to get you started. If you think I’ve missed any fundamental filters, please leave a comment and I’ll add them to the list.