Do you ever read The New York Times, The Atlantic, or Gizmodo online? Do you look up medical information on the Mayo Clinic's website, or shop on Home Depot's e-commerce site? Have you ever had to use the US Federal Trade Commission's IdentityTheft.gov portal?
If the answer to any of these is “yes”, then it is likely Google knows where you were physically sitting when you browsed these websites. All of these websites, plus several thousand others, use Google's 'free' web analytics service and have not configured IP anonymization.
Google Analytics, part of the Google Marketing Platform, allows website administrators to track and analyze website traffic. Google shares data from its Google Analytics silos with its advertising divisions. For example, data pools from first-party cookies set by Google Analytics can be shared with Google Ads infrastructure.
If a website is using Google Analytics and has configured the "Anonymize IP" option, network requests made to Google's servers will include the "aip" parameter, either in the query string or in the POST body.
Screenshot of Chrome browser Developer Tools Network panel, illustrating an HTTPS network request sent to www.google-analytics.com. This screenshot was taken on irs.gov, and includes the “aip” parameter. The “aip” parameter needs to be configured for Google Analytics to anonymize a user’s IP address.
By analyzing network traffic from the 100,000 most popular websites (according to the Tranco 1M list), this analysis observed that at least 31,639 websites use Google Analytics on their webpages. Of these 31,639, only 4,435 (14%) had enabled the "aip" parameter. The other 86% did not have this parameter present in their HTTPS requests to google-analytics.com/collect, meaning they were sending their users' full IP addresses to Google. An IP address can be mapped to an approximate physical location through services such as whatismyipaddress.com or ip2location.com.
Why does enabling the “aip” parameter matter?
If one assumes that Google is a good-faith data processor that honors the IP address anonymization request parameter, then there are several reasons why using this parameter may be important.
The first reason is a matter of legal liability and regulatory compliance. Users' IP addresses may be categorized as "personal data" or "personally identifiable information" (PII) by various legal privacy frameworks, such as the EU's GDPR, California's CCPA, and Brazil's LGPD. If a website is sharing personal data with third parties such as Google, this may trigger additional regulatory and compliance requirements. Several sources have noted that IP anonymization may be a requirement for meeting GDPR compliance. In 2016, the European Court of Justice reviewed the Breyer case, in which the Court ruled that "IP addresses may be personal data even though information may have to be sought from third parties to identify the subjects."
Google's own documentation suggests the use of various privacy controls if a website may be regulated under GDPR or CCPA. The Google Analytics Terms of Service somewhat ironically stipulate that website owners "will not and will not assist or permit any third party to pass information to Google that Google could use or recognize as personally identifiable information".
The second reason is a matter of potential punitive action. The GDPR Enforcement Tracker includes 498 examples of fines issued under GDPR by various European Union data protection authorities. Some of these were issued for "Non-compliance with general data processing principles" or "Insufficient fulfillment of data subjects rights". A handful of these fines ran to tens of millions of euros. Google itself is currently facing a class action lawsuit in California over the use of tools like Google Analytics to track users, even when they have switched to "Incognito mode" on their browsers.
The third reason is a matter of simple consumer trust. Robin Berjon, the Vice President of Data Governance at the New York Times (NYT), wrote in July that "privacy is about trust" and "the trust of our readers is essential." Berjon notes that readers "are overwhelmingly unhappy with data being shared with third parties that can use the data for entirely different purposes."
Consumers may lose trust in institutions like the Mayo Clinic if they find out that their browsing patterns, geo-location, and device fingerprints are being relayed by these organizations directly to Google without any anonymization.
So who is sending their users' IP addresses to Google?
By analyzing HTTPS requests made to the google-analytics.com/collect or google-analytics.com/__utm.gif endpoints when a user browses to different websites, one can look for the presence or absence of the "aip" parameter in the query string or POST body. Of the 31,639 websites in the Tranco top 100k that use Google Analytics, only 4,435 (14%) appear to include the "aip" parameter in at least one network request.
Screenshot of Chrome browser Developer Tools Network panel, illustrating an HTTPS network request sent to www.google-analytics.com, with the “aip” parameter enabled. This is from one of the 4,435 websites that were observed to be using IP address anonymization in Google Analytics.
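The presence check described above can be sketched in a few lines of Python. This is an illustration only, not the tooling used for this analysis: the function names and the `(url, post_body)` input shape are assumptions, and the sketch treats any occurrence of "aip" (even with an empty value) as anonymization being requested, mirroring the presence/absence test used here.

```python
from urllib.parse import urlsplit, parse_qs

# Endpoints named in the article; other Google Analytics paths are not covered.
GA_HOSTS = {"google-analytics.com", "www.google-analytics.com"}
GA_PATHS = {"/collect", "/__utm.gif"}


def is_ga_hit(url: str) -> bool:
    """Return True if the URL targets one of the Google Analytics endpoints above."""
    parts = urlsplit(url)
    return parts.hostname in GA_HOSTS and parts.path in GA_PATHS


def has_aip(url: str, post_body: str = "") -> bool:
    """Return True if 'aip' appears in the query string or the form-encoded POST body."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    body = parse_qs(post_body, keep_blank_values=True)
    return "aip" in query or "aip" in body


def site_anonymizes_ip(requests):
    """Classify one site from its captured (url, post_body) pairs (hypothetical input shape).

    Returns None if no Google Analytics hit was seen, otherwise True when
    at least one hit carries the 'aip' parameter.
    """
    hits = [(url, body) for url, body in requests if is_ga_hit(url)]
    if not hits:
        return None
    return any(has_aip(url, body) for url, body in hits)
```

A site is counted as anonymizing if any single hit carries the parameter, which is the same generous criterion ("at least one network request") used for the 4,435 figure.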
This is consistent with consumer expectations. A recent Twitter poll found that the vast majority of respondents expect fewer than 25% of websites to properly anonymize their IP address before transmitting data to Google Analytics.
Twitter poll conducted in December 2020
A large number of United States government institutions appear to be using Google Analytics without IP address anonymization. Somewhat ironically, the Federal Trade Commission's (FTC) IdentityTheft.gov is not using IP address anonymization. Other sensitive government websites, such as FBI.gov, FCC.gov, clinicaltrials.gov, studentaid.gov, and the US Patent Office's uspto.gov, also transmit the full IP addresses of their users to Google. California’s ca.gov portal was also in this list.
A previous study found that nearly all US Senators and Representatives (98.9%) are using third-party tracking scripts on their Congressional websites. 502 out of 537 (93.4%) members of Congress utilize Google Analytics on their taxpayer-funded senate.gov and house.gov websites. The vast majority of these lawmakers are also relaying their constituents' full IP addresses to Google’s servers, including self-proclaimed champions of digital privacy such as Elizabeth Warren or Ron Wyden.
Table showing which domains are using IP address anonymization in their Google Analytics accounts.
In Europe, where there are stricter regulations on consumer PII, far fewer government websites use Google Analytics to begin with. However, even here one can observe possible examples of PII leakage. For example, the Polish government's portal for requesting EU grants, funduszeeuropejskie.gov.pl, and the Polish national pension and social insurance agency ZUS (zus.pl) do not appear to use the "aip" parameter.
Major newspapers in virtually every EU country were observed to be using Google Analytics without IP address anonymization. German news portals t-online.de and stern.de, the French newspaper Le Figaro, the Polish Gazeta Wyborcza, the Dutch public broadcaster NH, the Austrian daily Kurier, the Italian IT industry site agendadigitale.eu, and the leading European biotech news site labiotech.eu were all observed making Google Analytics requests without the "aip" parameter.
Even venerable medical information sources such as the Mayo Clinic (both mayoclinic.com and mayoclinichealthsystem.org), Cleveland Clinic, healthcare.gov, clinicaltrials.gov, and Duke University Health System were operating Google Analytics without the "aip" parameter. In theory, this means that if an individual user were browsing articles about sexually transmitted diseases or depression on the Mayo Clinic's website, Google would know about this and could target that IP address with ads for STD medication or mental health treatments.
Other entities that are 'copying' Google Analytics data
It appears that a number of third parties are making direct copies of data sent to Google. When a user browses to a webpage like feedingamerica.org or marchofdimes.org, these pages send over a dozen different data points to Google Analytics' server. These include:
- cid - Google Analytics client ID, which is a “unique identifier for a browser–device pair”
- uid - Google Analytics user ID
- _gid - Used to distinguish users for 24 hours
Each of these parameters is a unique identifier that can be used to label users or devices when they browse the internet. It appears, though, that these unique identifiers are being copied and sent to other domains besides Google's. For example, on feedingamerica.org or marchofdimes.org, the Google Analytics _gid parameters are also being sent to 'px.steelhousemedia.com', which is owned by a California-based ad tech company. Browsing on allrecipes.com or cargurus.com shows Google Analytics query string parameters being copied to beacon.krxd.net, which is owned by Krux, a Data Management Platform that was acquired by Salesforce in 2017.
Screenshot from Chrome browser’s Developer Tools on feedingamerica.org, illustrating that some of the user ID query string parameters sent to Google Analytics are also being sent to px.steelhousemedia.com
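One way to spot this kind of copying in a capture of a page's outgoing requests is to collect the identifier values sent to Google Analytics and then search for the same values in requests to any other host. The sketch below is a minimal illustration under stated assumptions: it only inspects query strings, matches by value because the parameter names used by third parties are not known, and its function names are hypothetical rather than part of this analysis.

```python
from urllib.parse import urlsplit, parse_qsl

# Identifier parameters listed in the article.
GA_ID_PARAMS = ("cid", "uid", "_gid")


def _is_ga_host(host: str) -> bool:
    """True for google-analytics.com and its subdomains."""
    return host == "google-analytics.com" or host.endswith(".google-analytics.com")


def find_copied_ids(urls):
    """Return a set of (third_party_host, ga_param) pairs.

    A pair is reported when a value sent to Google Analytics under one of
    GA_ID_PARAMS also appears inside a query string value sent to another host.
    """
    # Pass 1: collect identifier values from Google Analytics requests.
    ga_values = {}
    for url in urls:
        parts = urlsplit(url)
        if _is_ga_host(parts.hostname or ""):
            for key, value in parse_qsl(parts.query):
                if key in GA_ID_PARAMS and value:
                    ga_values[value] = key

    # Pass 2: look for those values in requests to every other host.
    copies = set()
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if _is_ga_host(host):
            continue
        for _, value in parse_qsl(parts.query):
            for ga_value, ga_key in ga_values.items():
                if ga_value in value:
                    copies.add((host, ga_key))
    return copies
```

Substring matching is deliberate, since a third party may wrap the identifier in a longer value; it can produce false positives for very short identifiers, so any match is best confirmed by hand in the browser's Developer Tools.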
Conclusion & recommendations
This analysis was performed from a static US IP address, which may bias the results. For example, it is possible that many websites behave differently with regard to Google Analytics telemetry if a user is in a different geographic jurisdiction. The study also did not look at other Google Analytics query string parameters, such as “npa” (for disabling advertising personalization), “ua” (for user agent override) or “uip” (for user IP address override). Nonetheless, this study illustrates that the vast majority of web properties that have installed Google Analytics do not appear to instruct Google to perform server-side IP address anonymization before processing HTTPS requests.
One possible reason for this is the need to obtain very granular information on user geo-location. A previous study found that adding IP anonymization reduces the accuracy of user city-level geo-location data (but not country-level data).
On a further note, even when a web developer enables the "aip" parameter, HTTPS requests are still being made from a user's browser to the Google Analytics server. The anonymization feature is purely server-side functionality that Google applies before saving the data to disk. Web developers who use Google Analytics and have enabled this feature are effectively trusting a third party to correctly execute this IP address anonymization on its own hardware.
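To make the server-side step concrete: Google's documentation describes the anonymization as setting the last octet of an IPv4 address, or the last 80 bits of an IPv6 address, to zeros. The sketch below illustrates that kind of truncation using Python's standard `ipaddress` module. It is not Google's implementation, and the function name `mask_ip` is an assumption made for this example.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the last octet of an IPv4 address or the last 80 bits of an IPv6 address."""
    ip = ipaddress.ip_address(address)
    # Keep the first 24 bits of IPv4 (32 - 8) or the first 48 bits of IPv6 (128 - 80).
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.77"))                            # 203.0.113.0
print(mask_ip("2001:db8:1234:5678:9abc:def0:1234:5678"))  # 2001:db8:1234::
```

The full address still reaches the server in both cases; the truncation only affects what is retained afterwards, which is why the website operator has to trust that it actually happens.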
To avoid the risk of leaking any personal data to Google's servers, web developers can consider a number of privacy-focused alternatives. Many of these are designed to be GDPR- and CCPA-compliant by default, and may give more accurate page view data by virtue of not being on ad block lists. Some of the most highly recommended are: