
How Google Analytics Collects, Processes, and Displays Data

Posted by chco, Aug 23 2010 03:14 PM

The following excerpt from Google Analytics explains how Google Analytics collects and processes your site data. Understanding how the data is collected and processed will help you understand how best to interpret the data.
Google Analytics uses a common data collection technique called page tags. A page tag is a small piece of JavaScript that you must place on every website page you want to track. We affectionately call this code the Google Analytics Tracking Code, or GATC for short. If you do not place the code on a page, Google Analytics will not track that page.

The data collection process begins when a visitor requests a page from the web server. The server responds by sending the requested page back to the visitor’s browser (step 1 in the image below). As the browser processes the data, it contacts other servers that may host parts of the requested page, like images, videos, or script files. This is the case with the GATC.

Figure: Google Analytics processing flow


When the visitor’s browser reaches the GATC, the code begins to execute. During execution, the GATC identifies attributes of the visitor and her browsing environment, such as how many times she’s been to the site, where she came from, her operating system, her web browser, etc.

After collecting the appropriate data, the GATC sets (or updates, depending on the situation) a number of first-party cookies (step 2), which are discussed later in this section. The cookies store information about the visitor. After creating the cookies on the visitor’s machine, the tracking code waits to send the visitor data back to the Google Analytics server.

While the data is collected and the cookies are set, the browser is actively downloading a file named ga.js from a Google Analytics server (also step 2). All of the code that Google Analytics needs to function is contained within ga.js.

Once the ga.js file is loaded in the browser, the data that was collected is sent to Google in the form of a pageview. A pageview indicates that a visitor has viewed a certain page on the website. There are other types of data, like events and e-commerce data, that can be sent to Google Analytics (we will discuss these later).

The pageview is transmitted to the Google Analytics server via a request for an invisible GIF file (step 4) named __utm.gif. Each piece of information the GATC has collected is sent as a query-string parameter in the __utm.gif request, as shown below:

http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn
    =cutroni.com&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit&utmul=en-us
    &utmje=1&utmfl
    =10.0%20r42&utmdt=Analytics%20Talk%20by%20Justin%20Cutroni&utmhid
    =465405990&utmr=-&utmp=%2Fblog%2F&utmac=UA-XXXX-1&utmcc=
    __utma%3D32856364.1914824586.1269919681.1269919681.1269919681.1%3B%2B
    __utmz%3D32856364.1269919681.1.1.utmcsr%3D(direct)
    %7Cutmccn%3D(direct)%7Cutmcmd%3D(none)%3B&gaq=1
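
To make the query-string structure concrete, here is a minimal Python sketch that splits a shortened copy of the request above into its parameters. The helper name parse_hit is hypothetical, and the parameter meanings in the comments follow the commonly documented names for the legacy tracking code rather than anything stated in this excerpt.

```python
from urllib.parse import urlsplit, parse_qs

# A shortened copy of the sample request, joined onto one line.
hit_url = (
    "http://www.google-analytics.com/__utm.gif?utmwv=4.6.5&utmn=1881501226"
    "&utmhn=cutroni.com&utmcs=UTF-8&utmsr=1152x720&utmsc=24-bit"
    "&utmdt=Analytics%20Talk%20by%20Justin%20Cutroni"
    "&utmr=-&utmp=%2Fblog%2F&utmac=UA-XXXX-1"
    "&utmcc=__utma%3D32856364.1914824586.1269919681.1269919681.1269919681.1%3B"
)


def parse_hit(url):
    """Return the query-string parameters of a __utm.gif request as a dict."""
    query = urlsplit(url).query
    # parse_qs percent-decodes each value and returns a list per key;
    # every key appears once here, so keep the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = parse_hit(hit_url)
print(params["utmhn"])  # hostname of the tracked site (assumed meaning)
print(params["utmp"])   # path of the page that was viewed (assumed meaning)
print(params["utmdt"])  # title of the page (assumed meaning)
```

Each key in the resulting dictionary corresponds to one piece of information the GATC collected.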


When the Google Analytics server receives this pageview, it stores the data in some type of temporary data storage. Google has not indicated exactly how the data is stored, but we know that there is some type of storage for the raw data. Think of this data storage as a large text file or a logfile (step 5).

Each line in the logfile contains numerous attributes of the pageview sent to Google. This includes:

  • When the data was collected (date and time)
  • Where the visitor came from (referring website, search engine, etc.)
  • How many times the visitor has been to the site (number of visits)
  • Where the visitor is located (geographic location)
  • Who the visitor is (IP address)
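
Some of these attributes travel inside the cookie values carried by the request. As a rough illustration, the sketch below splits the __utma value from the sample request into named parts. The field layout (domain hash, visitor ID, three visit timestamps, visit count) is the commonly described layout for this cookie and is an assumption here, as are the function and key names.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert a Unix timestamp string to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)


def parse_utma(value):
    """Split a __utma cookie value into named parts (assumed layout)."""
    domain_hash, visitor_id, first, previous, current, visits = value.split(".")
    return {
        "domain_hash": domain_hash,
        "visitor_id": visitor_id,
        "first_visit": to_utc(first),
        "previous_visit": to_utc(previous),
        "current_visit": to_utc(current),
        "visit_count": int(visits),  # how many times the visitor has been to the site
    }


# The __utma value from the sample request in the text.
utma = parse_utma("32856364.1914824586.1269919681.1269919681.1269919681.1")
print(utma["visit_count"], utma["first_visit"].isoformat())
```

In the sample, all three timestamps are identical and the count is 1, which is what a first visit would look like under this layout.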


After the pageview is stored in the logfile, the data collection process is complete. The data collection and data processing components of Google Analytics are separate. This ensures Google Analytics will always collect data, even if the data processing engine is undergoing maintenance.

The next step is data processing. At a regular interval, approximately every 3 hours, Google Analytics processes the data in the logfile. Processing time does fluctuate, and Google Analytics does not process data in real time. While data is normally processed about every 3 hours, it is not normally complete until 24 hours after collection, because the entire day's data is reprocessed after it has all been collected.

Warning: Be aware that this processing behavior can lead to inaccurate intraday metrics. It is best to avoid using Google Analytics for real-time or intraday reporting.
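
One way to apply this advice in a reporting script is to label data by age before trusting it. The sketch below is only an illustration: the function name and the status labels are hypothetical, and the 3-hour and 24-hour thresholds are the approximate figures from the text, not guarantees.

```python
from datetime import datetime, timedelta

PROCESSING_INTERVAL = timedelta(hours=3)   # approximate; processing time fluctuates
FINALIZATION_DELAY = timedelta(hours=24)   # the day's data is reprocessed by then


def report_status(collected_at, now):
    """Classify how far a hit is likely to be through processing, by age alone."""
    age = now - collected_at
    if age < PROCESSING_INTERVAL:
        return "possibly unprocessed"
    if age < FINALIZATION_DELAY:
        return "processed but provisional"
    return "complete"


collected = datetime(2010, 1, 21, 19, 5, 6)
print(report_status(collected, collected + timedelta(hours=1)))
print(report_status(collected, collected + timedelta(hours=6)))
print(report_status(collected, collected + timedelta(hours=30)))
```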

During processing, each line in the logfile is split into pieces, one piece for each attribute of the pageview. Here’s a sample logfile; this is not an actual data storage line from Google Analytics, but a representation:

65.57.245.11 www.cutroni.com - [21/Jan/2010:19:05:06 -0600] 
"GET __utm.gif?utmwv=4.6.5&utmn=1881501226&utmhn=cutroni.com&utmcs=UTF-8&utmsr
    =1152x720&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.0%20r42&utmdt
    =Analytics%20Talk%20by%20Justin%20Cutroni&utmhid=465405990&utmr
    =-&utmp=%2Fblog%2F&utmac=UA-XXXX-11&utmcc
    =__utma%3D32856364.1914824586.1269919681.1269919681.1269919681.1%3B%2B
    __utmz%3D32856364.1269919681.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)
    %7Cutmcmd%3D(none)%3B&gaq=1" "__utma
    =32856364.1914824586.1269919681.1269919681.1269919681.1; __utmb
    =100957269; __utmc=100957269; __utmz=100957269.1164157501.1.1.utmccn
    =(direct)|utmcsr=(direct)|utmcmd=(none)"


While most of this data is difficult to understand, a few things stand out. The date and time (Jan 21, 2010 at 19:05:06) and the IP address of the visitor (65.57.245.11) are easily identifiable.
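
The splitting step can be sketched with a regular expression. The example below works on a shortened line in the same layout as the representation above. Because that layout is itself only a representation, the pattern and the field names are assumptions, not Google's actual storage format.

```python
import re
from datetime import datetime

# Assumed layout: ip host - [timestamp] "request" ...
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"'
)


def split_log_line(line):
    """Split one log line into named pieces, parsing the timestamp."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed layout")
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


sample_line = (
    '65.57.245.11 www.cutroni.com - [21/Jan/2010:19:05:06 -0600] '
    '"GET __utm.gif?utmwv=4.6.5&utmhn=cutroni.com&utmp=%2Fblog%2F" '
    '"__utma=32856364.1914824586.1269919681.1269919681.1269919681.1"'
)
fields = split_log_line(sample_line)
print(fields["ip"], fields["time"].isoformat())
```

The request piece can then be broken down further, one query-string parameter at a time.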

Google Analytics turns each piece of data in the logfile record into a data element called a field. Later, the fields will be transformed into dimensions. For example, the IP address becomes the Visitor IP field. The city that the visitor is visiting from becomes the Visitor City field and the City dimension.

It’s important to understand that each pageview has many, many attributes and that each one is stored in a different field or dimension. Later, Google Analytics will use fields to manipulate the data and dimensions to build the reports.

After each line has been broken into fields and dimensions (steps 6-9), the configuration settings are applied to the data. This includes features like:

  • Site search
  • Goals and funnels
  • Filters


This is shown in step 7.

Finally, after all of the settings have been applied, the data is stored in the database (step 10).
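
As a simplified picture of this stage, the sketch below applies three configuration settings (a filter, site search, and a goal) to a few hits and appends the results to a list standing in for the database. Every name in it, the configuration keys, and the order in which the settings are applied are assumptions for illustration; the excerpt does not describe how Google implements this.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical configuration; these keys are not Google Analytics settings names.
CONFIG = {
    "exclude_ips": {"10.0.0.5"},   # filter: drop internal traffic
    "site_search_param": "q",      # site search: query parameter holding the term
    "goal_path": "/thanks/",       # goal: viewing this page counts as a conversion
}


def process(hits, config):
    """Apply the configuration to each hit and return the rows to store."""
    database = []
    for hit in hits:
        if hit["ip"] in config["exclude_ips"]:
            continue  # filtered hits never reach the database
        parts = urlsplit(hit["page"])
        terms = parse_qs(parts.query).get(config["site_search_param"], [None])
        database.append({
            "ip": hit["ip"],
            "page": parts.path,
            "search_term": terms[0],
            "goal_completed": parts.path == config["goal_path"],
        })
    return database


hits = [
    {"ip": "65.57.245.11", "page": "/search/?q=cookies"},
    {"ip": "10.0.0.5", "page": "/blog/"},
    {"ip": "65.57.245.11", "page": "/thanks/"},
]
rows = process(hits, CONFIG)
for row in rows:
    print(row)
```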

Once the data is in the database, the process is complete. When you (or any other user) request a report, the appropriate data is retrieved from the database and sent to the browser.

Warning: Once Google Analytics has processed the data and stored it in the database, it can never be changed. This means historical data can never be altered or reprocessed. Any mistakes made during setup or configuration can permanently affect the quality of the data. It is critical to avoid configuration mistakes, as there is no way to undo data issues. This also means that any configuration changes made to Google Analytics will not alter historical data. Changes will only affect future data, not past data.
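
The consequence of this warning can be shown with a toy model. In the sketch below, a hypothetical Profile class applies its filter only at processing time and never rewrites stored rows, so a filter added late does not clean up what was already stored. The class and its behavior are an illustration of the rule described above, not Google's implementation.

```python
class Profile:
    """Toy model: settings apply at processing time; stored rows never change."""

    def __init__(self, exclude_ips=()):
        self.exclude_ips = set(exclude_ips)
        self._rows = []  # processed rows are appended and never rewritten

    def process(self, hit):
        # The filter in effect right now decides whether the hit is stored.
        if hit["ip"] not in self.exclude_ips:
            self._rows.append(dict(hit))

    def report(self):
        return tuple(self._rows)


profile = Profile()
profile.process({"ip": "10.0.0.5", "page": "/blog/"})         # internal hit is stored
profile.exclude_ips.add("10.0.0.5")                           # filter added afterward
profile.process({"ip": "10.0.0.5", "page": "/about/"})        # excluded from now on
profile.process({"ip": "65.57.245.11", "page": "/contact/"})  # normal visitor
pages = [row["page"] for row in profile.report()]
print(pages)
```

The first internal hit stays in the report even after the filter is added, which is why configuration should be checked before data starts flowing.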
