Updated Feb. 2nd, 2021 - In my original post, I described data that appeared to be advertiser blocklists. However, I also noted that my conclusions were based on purely observational data, and I invited readers to submit additional information. Recently, we received additional information from Integral Ad Science (IAS) and the following response from an IAS spokesperson: 'This blog post reflects a misunderstanding of Integral Ad Science (IAS), its technology, and how advertisers work with IAS to address their brand suitability and safety needs. The data that has been reported to be advertiser keyword blocklists are, in fact, not advertiser keyword blocklists.' My original post also included references to specific brands. Based on IAS's response, I've decided to remove any ambiguity and edited the original post to remove these brands.

⚠️ Here is the AirTable of the keyword lists (link)

Viagra. Suing Google. Torture. Christianity. Buddhism. Islam. Cannabis. Blunt. Methamphetamine. European Commission. Wiretapping. Liquidation. Idiots. Michael Flynn. Hijab.

What does this random bag of words have in common?

All of these words and phrases appear in “Negative_Keywords_3.2”, a keyword list used by Integral Ad Science, an American ad tech company that specializes in digital ad verification.

Integral Ad Science (IAS) scans the text content and the URL strings of millions of web pages, and uses Natural Language Processing (NLP) techniques to extract the entities, key phrases, and general sentiment of each article. This information can be used to help decide whether a given article is ‘brand safe’ for an ad to appear on. The intended goal of technologies such as this is to prevent ads from appearing on violent or extremist webpages.

Advertisers who use IAS (or other semantic contextual solutions) can configure specific keyword or sentiment lists to control where their ads appear. If a brand producing luxury cat food wants to prevent its ads from loading on pages about dogs, it can add the keyword “dog” to its avoidance keyword list. Generally speaking, this should prevent the feline food company’s ads from showing on webpages that contain the string “dog”.
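To make the string-matching idea concrete, here is a minimal sketch in Python. It assumes plain case-insensitive substring matching, which is only a guess at how such a filter might behave; the function name `is_blocked` and the sample page texts are hypothetical and do not come from IAS.

```python
def is_blocked(page_text: str, avoidance_keywords: set[str]) -> bool:
    """Return True if any avoidance keyword occurs as a substring of the page text.

    Assumption: simple case-insensitive substring matching. A real vendor's
    matching logic is not public and may differ.
    """
    text = page_text.lower()
    return any(keyword.lower() in text for keyword in avoidance_keywords)


cat_food_avoidance = {"dog"}

print(is_blocked("Ten tips for walking your dog", cat_food_avoidance))
print(is_blocked("A short history of religious dogma", cat_food_avoidance))
print(is_blocked("Why cats sleep all day", cat_food_avoidance))
```

Under this assumption, the second page would also be flagged because “dogma” contains the string “dog”, which illustrates how substring matching can block unrelated content.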

A list of the most commonly blocked keywords in November, 2019. Screenshot of list from IAS's website.

Through careful inspection of the Javascript code present on several dozen websites, as well as by looking through tens of thousands of webpage recordings in repositories such as URLScan.io, Common Crawl, and the Internet Archive, it is possible to assemble a dataset of what appear to be IAS customer keyword lists. This analysis discusses such a dataset, which consists of 445 different keyword lists.

Screenshot of a CNBC article showing the Adalytics browser extension

The Adalytics browser extension allows users to see some of the brand safety related data on different news websites. Here you can see various admantx.com derived keyword lists on a cnbc.com article by Megan Graham.

Several dozen international and Fortune 500 companies appear in this dataset. Numerous brands or projects, such as the male enhancement medication Cialis or the NASA Space Launch System (SLS), also appear to have their own IAS keyword lists.

The dataset contains 7,020 distinct keywords or phrases from the union of these different keyword lists. These lists include what may be scary or sad subjects such as “rape” and “murder”, as well as social justice topics, such as “Black Lives Matter”, “class action”, “1989 crackdown” (possible Tiananmen Square allusion), “Jamal Khashoggi”, and “bisexual”. But there are also many idiosyncratic keywords, such as “Cook Islands”, “Oprah”, “Taylor Swift”, and “Kobe Bryant”.

Introduction

Note: If you want to jump straight into interesting findings, please go directly to the Results sections. If you want some background on contextual brand safety technology or how this analysis was performed, these Introduction and Methodology sections are intended to provide context.

A previous Adalytics blog post discussed the concept of brand safety and named some ad tech companies that provide brand safety services. That study showed that tens of thousands of articles on major news sites appear to be labeled as “unsafe” by brand safety vendors such as Oracle Grapeshot.

For example, 21% of economist.com articles, 30.3% of nytimes.com articles, 43% of wsj.com articles, and 52.8% of vice.com articles were labeled as brand “unsafe” by at least one vendor.

The analysis noted how this text-based technology can be highly sensitive to the presence of certain keywords. For example, an article in The Economist that talked about progress in immunology research was marked as “unsafe”, likely due to the usage of the term “cell death” for apoptosis.

The BRANDED newsletter covered the research. Marketer advocates Nandini Jammi and Claire Atkin noted that it is possible that the pervasive marking of news articles as “unsafe” may deprive journalistic outlets of much-needed advertising revenues.

A subsequent Adalytics blog post noted that certain journalists are disproportionately likely to have their content marked as “unsafe”.

Nicholas Kristof, a double Pulitzer Prize winner who frequently advocates for human rights, has over 300 of his NYT articles (60.5%) labeled as brand unsafe.
Rukmini Callimachi, a four-time Pulitzer Prize finalist who covers ISIS and violent extremism (91.7% unsafe)
Jan Ransom, who covers criminal courts and jails in New York City, and who covered the trial of Harvey Weinstein (92% unsafe)
Marilyn Stasio, who writes about crime fiction for the Book Review (89.7% unsafe)
Ali Winston, an investigative reporter who covers the NYPD (97% unsafe)

Even journalists covering relatively ‘benign’ topics, such as ad tech and marketing news, are not immune from having their content registered as “unsafe”. Megan Graham, a reporter who covers advertising and marketing for CNBC and previously worked at Ad Age, had 23 of her 219 cnbc.com articles (11%) marked as “moat_unsafe”. 15 were marked as unsafe due to “gv_death_injury”, 7 due to “gv_arms”, and 3 due to “gv_tobacco”.

The previous posts focused on websites that primarily use brand safety Javascript code from either Comscore or Oracle Moat and Grapeshot. This article focuses on websites that use Javascript code from Admantx, a division of Integral Ad Science.

👉 Do you like this research? Keep track of what ads you are shown and participate in ad tech research: sign up for the Adalytics browser extension.

Methodology

For further details and methodology, please reach out here or @kfranasz.

If you open the Chrome browser’s developer tools console, and navigate to the reuters.com landing page, you can see what appear to be names of various brands and companies.

Screenshot of Google Chrome browser’s Developer Tools console logs after opening reuters.com. Note the variables, such as ‘MSFT_Neg_1’, ‘SaudiAramco_Negative’, ‘IBM_neg’, ‘Intel_Negative_keywords’, or ‘JPMorgan_Neg’. These values originate from admantx.com, a domain owned by Integral Ad Science’s Admantx platform.

By checking the Chrome network tools console, one can see that this list of company names appears to originate from admantx.com, a domain that was registered to the Italian startup Admantx. Admantx was a semantic contextual intelligence technology company that was acquired by IAS in 2019.

These Admantx string variables appear to be incorporated into the Reuters Google Publisher Tag (GPT) and included in HTTPS requests sent to doubleclick.net, one of Google’s ad serving domains.

Screenshot of an HTTPS request URL sent to securepubads.g.doubleclick.net/gampad/ads, Google’s ad serving domain. This request is generated when a user navigates to reuters.com. One can see the names of some IAS Admantx keyword lists in this URL string.
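For readers who want to inspect such a request themselves, the sketch below pulls custom key-value targeting out of an ad request URL. It assumes the key-values travel URL-encoded inside a `cust_params` query parameter; the key name `adx_lists`, the ad unit path, and the example URL are hypothetical placeholders, not values taken from the actual Reuters request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def extract_custom_targeting(request_url: str) -> dict[str, list[str]]:
    """Decode the nested key-values carried in a `cust_params` query parameter.

    Assumption: key-values are encoded as a query string inside `cust_params`,
    with multiple values for one key separated by commas.
    """
    outer = parse_qs(urlsplit(request_url).query)
    targeting: dict[str, list[str]] = {}
    for raw in outer.get("cust_params", []):
        for key, values in parse_qs(raw).items():
            for value in values:
                targeting.setdefault(key, []).extend(value.split(","))
    return targeting


# Hypothetical request: the key "adx_lists" is an illustrative name only.
inner = urlencode({"adx_lists": "MSFT_Neg_1,IBM_neg", "section": "home"})
example_url = (
    "https://securepubads.g.doubleclick.net/gampad/ads?"
    + urlencode({"iu": "/0000/example", "cust_params": inner})
)

print(extract_custom_targeting(example_url))
```

The two-level decoding matters here: the inner key-values are percent-encoded a second time when they are embedded in the outer URL, so a single pass over the query string would leave them unreadable.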

The IAS Admantx technology does not appear to have binary classifications for “safe” or “unsafe”, but rather, seems to allow individual advertisers to control the placement of their ads depending on the presence or absence of individual keywords.

If you open the following thetimes.co.uk article about pro-democracy activists in Hong Kong and search for “euasync01.admantx.com/admantx/service” in the Chrome developer tools Network panel, you can see not only the names of various keyword lists, but also the individual keywords that are present in those lists. For example, one can see the keyword list name “bs_custom_VW_neg”, the “targeting” parameter “Avoidance”, and the child “lawyer”. Here, a “child” is an entry nested under a parent item in the response data, in this case a keyword belonging to the named list. This would suggest that the given IAS client wants to avoid placing their ads on webpages that contain the word “lawyer” somewhere in the text.

Screenshot of Google Chrome browser’s Developer Tools after opening an article on The Times, showing what appears to be keyword list data from IAS Admantx. In this example, a keyword list with the title “bs_custom_VW_neg” contains the child “lawyer”. “Child” is used in computer science contexts to describe an item nested under a given variable or collection. The Times article contains the keyword “lawyer”, and the “targeting” parameter “avoidance” suggests that this given IAS client wants to avoid showing their ads on articles that contain mentions of “lawyer”.
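The list name, “targeting” value, and children described above could be pulled out of a response roughly as follows. This is only a sketch: the real admantx.com response format is not documented here, so the JSON layout (a top-level `lists` array with `name`, `targeting`, and `children` fields) and the second sample entry are assumptions made for illustration.

```python
import json

# Hypothetical payload shape. Only the first entry's values are taken
# from the screenshot described in the text.
sample_response = """
{
  "lists": [
    {"name": "bs_custom_VW_neg", "targeting": "Avoidance", "children": ["lawyer"]},
    {"name": "Example_Targeting_List", "targeting": "Targeting", "children": ["retirement savings"]}
  ]
}
"""


def avoidance_keywords(raw_json: str) -> dict[str, list[str]]:
    """Map each list flagged as avoidance to the keywords (children) it contains."""
    payload = json.loads(raw_json)
    return {
        entry["name"]: list(entry.get("children", []))
        for entry in payload.get("lists", [])
        if entry.get("targeting", "").lower() == "avoidance"
    }


print(avoidance_keywords(sample_response))
```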

In this specific Times article, one can also see that IAS data is incorporated into the pbjs header bidding settings and in prebid network calls to Rubicon Project's ad auction servers.

Screenshot of Google Chrome browser’s Developer Tools, showing header bidding settings and network calls that utilize data from IAS. The query string parameters shown in the second screenshot are from a network request to rubiconproject.com. Rubicon Project is a digital advertising infrastructure company based in Los Angeles.

A number of major publishers appear to use IAS Admantx contextual semantic data on their websites. These include CNBC, Politico, Reuters, thetimes.co.uk, Rolling Stone, NBC News, MSNBC, Daily Mail, metro.co.uk, La Repubblica (Italian newspaper), corriere.it, tijd.be (Belgian business news site), and today.com. You can also find thousands of recorded network calls made to admantx.com in web archives such as URLScan.io, by searching for scans that include filename:"admantx.com". Here is an example HTTPS request that was sent to admantx.com from the Belgian business newspaper tijd.be, and recorded by URLScan.io. One can find similar recordings in other web capture archives, such as Common Crawl and the Internet Archive.
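As a rough illustration of the archive search, the snippet below builds a URLScan.io Search API URL for the query quoted above. It only constructs the URL and sends no request; the helper name `build_search_url` is hypothetical, and API keys, rate limits, and result paging are left out of this sketch.

```python
from urllib.parse import urlencode

# Base endpoint of the URLScan.io Search API.
URLSCAN_SEARCH = "https://urlscan.io/api/v1/search/"


def build_search_url(query: str, size: int = 100) -> str:
    """Return a Search API URL for the given query string and page size."""
    return URLSCAN_SEARCH + "?" + urlencode({"q": query, "size": size})


print(build_search_url('filename:"admantx.com"'))
```

The resulting URL can be fetched with any HTTP client, and each returned scan can then be inspected for requests to admantx.com.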

By cross-referencing data from tens of thousands of pages on different publishers that use IAS Admantx, one can assemble a list of keyword lists and the corresponding keyword ‘children’ that make up each list. Some keyword lists only appear on certain publisher domains - for example, the list titled “BP_Politico_Keyword Blocking_Mar2019” was only observed on some articles on politico.com. Therefore, it is not clear whether brands that may be using IAS’s technology are applying their keyword lists in the same fashion across all publisher domains. It is possible that the lists are both keyword and publisher domain specific.
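The cross-referencing step can be sketched as a simple aggregation: take the union of keywords seen for each list, and separately record which publisher domains each list was seen on. The observation tuples below are made-up examples for illustration (the keyword “example keyword” in particular is a placeholder), and the function name `aggregate` is hypothetical.

```python
from collections import defaultdict

# Each observation: (publisher domain, keyword list name, keywords seen on that page).
# These rows are illustrative placeholders, not actual recorded data.
observations = [
    ("reuters.com", "JPMorgan_Neg", ["taylor swift", "armed man"]),
    ("reuters.com", "JPMorgan_Neg", ["armed man", "nerve agent"]),
    ("politico.com", "BP_Politico_Keyword Blocking_Mar2019", ["example keyword"]),
]


def aggregate(rows):
    """Return (keywords per list, publisher domains per list) as dicts of sets."""
    keywords = defaultdict(set)
    domains = defaultdict(set)
    for domain, list_name, children in rows:
        keywords[list_name].update(children)  # union across all pages
        domains[list_name].add(domain)        # where the list was observed
    return keywords, domains


keywords, domains = aggregate(observations)
for name in keywords:
    print(name, len(keywords[name]), sorted(domains[name]))
```

Keeping the domain set alongside the keyword union makes it easy to spot lists that were only ever observed on a single publisher.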

Important note: this analysis is entirely observational, and cannot in any way draw firm conclusions about the usage of these keyword lists. Ultimately, the values seen herein are simply Javascript string variables that are seen in HTTPS data coming from admantx.com and being sent to various ad server domains like doubleclick.net and rubiconproject.com. The keyword lists may be set up by someone other than the brands named. These string variables could be used for ad blocking and avoidance, or they could be used merely for post-impression monitoring. In some cases, the keyword lists may be used for contextual targeting (i.e. an advertiser wants their cat food ads to show up on articles about cats). It appears that many keyword lists whose names contain the abbreviation “BS” (for “brand safety”) or the suffixes “_neg” or “_negative” contain keywords that a brand might intend to avoid.

Many of these lists were seen in conjunction with the “Targeting” variable “avoidance”. Lists that lack these markers are more likely to be contextual targeting lists. For example, the keyword list “JuliusBaer2020_Retirement” (reference to a Swiss wealth management company) contains the keywords “home ownership”, “internal revenue service”, and “retirement savings”. A keyword list called “JacksonHewitt” (a tax services company) only contains the keyword “tax relief”. In these cases, it seems likely that the lists are being used to place rather than block advertisers’ bids on given pages.
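The naming heuristic described above can be written down as a small classifier. This is a sketch of the heuristic only, not of anything IAS does: the function name `looks_like_avoidance` is hypothetical, and it checks just the three markers mentioned in the text (“BS”, “_neg”, “_negative”), so it will miss lists that signal avoidance in other ways.

```python
import re


def looks_like_avoidance(list_name: str) -> bool:
    """Guess from the list name alone whether a list is an avoidance list.

    Heuristic: the name has a standalone "bs" or "neg" token, or contains
    the string "negative". Tokens are split on non-alphanumeric characters,
    so "UBS" does not count as "BS".
    """
    name = list_name.lower()
    tokens = re.split(r"[^a-z0-9]+", name)
    return "bs" in tokens or "neg" in tokens or "negative" in name


for name in ["MSFT_Neg_1", "bs_custom_VW_neg", "SaudiAramco_Negative",
             "JuliusBaer2020_Retirement", "JacksonHewitt"]:
    print(name, looks_like_avoidance(name))
```

A name-based guess like this is weaker evidence than an observed “targeting” value of “avoidance”, so the two signals are best considered together.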

Results - List of brands that may be using IAS Admantx

The interactive data table below contains 445 different keyword lists from admantx.com that were seen on websites such as reuters.com or cnbc.com. This table is organized by keyword list titles, such as “JPMorgan_Neg”, “MSFT_Neg”, or “ExxonBrandSafety”. For each keyword list, you can see how many distinct keywords or phrases were present in that list. For example, “JPMorgan_Neg” contains 1,080 different keywords, while “Toyota_Neg” contains 411 different phrases. The keywords column enumerates the specific words or phrases found in a given list - e.g. “JPMorgan_Neg” contains the words “taylor swift”, “soldier shot”, and “dangerous weapon”. Note that you can expand a given list by clicking into it to see a detailed view (through the left-most column) - this is particularly useful if you are browsing on mobile.

Interactive AirTable illustrating 445 keyword lists that were observed coming from admantx.com. The table shows the title of a given keyword list (such as “JPMorgan_Neg”), the number of distinct keywords or phrases in a given list, and the actual words that were found in that list. For example, the list “JPMorgan_Neg” contains the words “taylor swift”, “soldier shot”, and “dangerous weapon”, in addition to 1,077 others. The table is interactive - you can click into an individual keyword list (through the leftmost column) to expand it and better view the words present in each keyword list.

Absent external confirmation, it is not possible to fully understand what each keyword list is used for. However, given the presence of the “targeting” parameter value “avoidance” used in conjunction with many of these lists, as well as the presence of strings such as “brandsafety” or “_negative”, one can reasonably infer that a subset of these lists is being used by IAS’s customers to prevent their ads from rendering on pages containing these specific keywords.

It appears that IAS’s clients include many of the Fortune 500 companies and encompass many different industries.

Amongst finance companies, one can see keyword lists such as “JPMorgan_Neg”, “MStanley_Neg”, “Citibank_Neg”, “Schwab_Negative”, “Fidelity_Negative” and “UBS_BrandSafety”. The list “JPMorgan_Neg” is the largest observed in this dataset, and includes 1,080 different keywords. This list was observed across a large number of different reuters.com articles. It includes terms such as “armed man”, “oral sex”, “taylor swift”, “adele”, “nerve agent”, and “lebron james”. If this is indeed an exclusion or avoidance list for ad placements on reuters.com, it is not clear who configured this list or why they included so many different terms. According to Adweek and CampaignLive, “JPMorgan handles its $200 million digital account media in-house” and launched “its internal media agency Inner Circle in 2015.”

While one can infer why JPMorgan’s Inner Circle arm might consider articles about oral sex or nerve agents to be contentious, it is unclear why they would seek to avoid articles about celebrities. One possibility is that these individuals are brand ambassadors for JPMorgan’s competitors. For example, Taylor Swift has partnerships with American Express and Capital One.

Several pharmaceutical companies or drug brands appear to have their own keyword lists, including “J&J_talc_negative”, “Johnson_Johnson_BP”, “Eli_Lilly_Cialis”, “Cialis”, “Cialis_2”, “Tagrisso_Negative”, “BoehringerJardiance_Neg”, “Abbvie-Q120-BlockList”, “AbbvieBlockListPart2”, and “Shire_Vyvanse”.

In the “J&J_talc_negative” list that was observed on cnbc.com and msnbc.com, one can see words such as “sanctions”, “cancer”, “court”, “oncology” and “lawsuit”. It is likely that these are related to a recent class action lawsuit, in which Johnson & Johnson was ordered to pay a “$2.12 billion damages award to women who blamed their ovarian cancer on asbestos in its baby powder and other talc products.”

The three Cialis-related lists may be an example of an advertiser (the pharma company Eli Lilly) using IAS Admantx technology for different purposes. The “Cialis_2” list contains many words which may have negative connotations, such as “addiction”, “substance abuse”, “rapist”, “sexual offence”, and “massacres”. This list may therefore be an avoidance or exclusion list. On the other hand, the “Eli_Lilly_Cialis” list contains terms such as “business and finance”, “personal finance”, “stocks and bonds”, “European Union” and “market”. This second list may be used by Eli Lilly to specifically target ads to businessmen who are reading finance related news articles.

Screenshot of the Adalytics browser extension on a CNBC article about a Chinese electric car company. You can see the keyword list "Eli_Lilly_Cialis" present on the article.

There are several examples where the name of a brand appears in both the list title and within the keywords of that list. For example, the list “Boeing_Neg” contains both the terms “boeing 737 max” and “boeing”. The lists “facebook-negative-june2018”, “FB2020BlockList2”, and “FB2020BlockList1” all contain the word “Facebook”, in addition to words such as “Cambridge”, “sheryl sandberg”, and “Mark Zuckerberg”. The list “Google_NegativeKeywords_Feb2019” contains the term “Google”, in addition to others such as “terror attack”, “school shooting”, and “sexual misconduct”. In these cases, it is possible that the brands are trying to avoid placing ads on articles that mention the brands by name in order to avoid some form of negative publicity.

Results - List of ‘interesting’ keywords

The interactive data table below contains 7,020 different keywords from admantx.com. Some of these keywords or phrases only appear in a single keyword list, while others, such as “terrorism”, appear in over 100 different lists.

The table is organized by keyword, and you can see which lists contain that given keyword by clicking into a given row. For example, by clicking on the row containing “terrorism”, you can see that this word appears in the lists “Politico-Boeingkeywordslistnov2019”, “UBS_Keywords3”, “MSFT_Neg”, “BofA_Neg_Topics”, and 106 other lists.

Interactive AirTable illustrating 7,020 keywords that were observed coming from admantx.com. The table is indexed by keyword, and shows how many lists contain that given keyword. For example, the keyword “isis” appears in 68 different lists, including “Politico-Boeingkeywordslistnov2019” and “Chase_Negative”. You can click into each row (on the leftmost column) to expand it and better see which lists contain a given keyword.

The top of the data table is dominated by keywords that could hold a negative connotation. “Violence” and “terrorism” were seen in over a hundred different keyword lists. Each of the top 25 most prevalent keywords is related to either violent or adult content themes. However, as you scroll down to the bottom of the data table and look at the keywords that appear in only a handful of lists, more ‘unusual’ terms make an appearance.
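A keyword-indexed table like this one can be derived from the list-indexed data by building an inverted index and ranking keywords by how many lists contain them. The sketch below uses a tiny subset of list memberships mentioned in this post; the real table has 7,020 keywords, and the function name `invert` is hypothetical.

```python
from collections import defaultdict

# Small illustrative subset of list -> keywords memberships.
keyword_lists = {
    "Politico-Boeingkeywordslistnov2019": {"terrorism", "isis"},
    "UBS_Keywords3": {"terrorism"},
    "MSFT_Neg": {"terrorism"},
    "Chase_Negative": {"isis"},
    "Cartier_Neg": {"1989 crackdown"},
}


def invert(lists: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build a keyword -> set of list names index from list -> keywords data."""
    index: dict[str, set[str]] = defaultdict(set)
    for list_name, keywords in lists.items():
        for keyword in keywords:
            index[keyword].add(list_name)
    return index


index = invert(keyword_lists)
# Most widely used keywords first, ties broken alphabetically.
ranked = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
for keyword, names in ranked:
    print(keyword, len(names))
```

Sorting in descending order of list count puts widely shared terms at the top and leaves the idiosyncratic, single-list keywords at the bottom.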

For example, the phrases “black lives matter”, “black lives”, “blacklivesmatter”, and “#blacklivesmatter” appear in keyword lists such as “Barclays_Negative2”, “Microsoft_Negative”, “Cartier_Neg”, “AbbvieBlockListPart2”, “Toyota_Neg”, “Vmware_neg”, “Sabic_Negative”, “ExxonBrandSafety”, and “Smuckers_Jiff_Neg”. It is not clear who created these keyword lists or exactly how they are being used. However, it is possible that these companies, some of which have publicly professed to support the Black Lives Matter movement, are blocking their ads from monetizing articles that use the phrase.

“Jamal khashoggi” or “khashoggi” appears in the lists “JanusJMBS_Neg”, “Citibank_Neg”, “SaudiAramco_PG_BlockList_Feb2020”, “IBM_JeffreyEpstein”, and “Sabic_Negative”. Saudi Aramco is the Saudi Arabian public petroleum and natural gas company, while SABIC is a Saudi Arabian multinational chemical manufacturing company. Janus Henderson is a British global asset management company. Both Janus Henderson and Citibank may have been involved in pre-IPO discussions with Saudi Aramco. Please note these are all circumstantial observations, and may not in fact be in any way related.

Screenshot of the Adalytics browser extension on a Politico article about a deceased Washington Post journalist. You can see the keyword list "Citi_KeywordBlockingList_August2019" present on the article.

The term “battery” appears on the “Samsung_Neg” list, which may be related to incidents of battery fires on Samsung’s Galaxy Note 7 phones. “Cambridge” and “cambridge analytics” appear in the “FB2020BlockList2” and “Mobkoi_FB_Negative” lists respectively. “1989 crackdown”, a possible allusion to the 1989 Tiananmen Square protests, appears in the list “Cartier_Neg”. Various legal and regulatory phrases also make a few appearances. “Senate judiciary” and “senate vote” both appear in the “Citibank_Neg” list, while “financial regulation” is in the “CME_Negative” and “CME_Neg” lists.

Numerous celebrities, such as Taylor Swift, Justin Bieber, Adele, Kobe Bryant, Lebron James, Beyonce, Jay-Z, and Oprah, appear in one or more keyword lists.

Conclusion

Caveats

I will emphasize again: this study was purely observational. I have not reached out to the brands mentioned in this article. The string variables seen in various Javascript parameters and network calls made to admantx.com, doubleclick.net, and prebid servers such as rubiconproject.com appear to contain the names of various brands and related keywords, but their exact function is unknown. One cannot be sure who configured these string variables, for what purpose, or exactly how they are used.

While this analysis enumerates numerous hypotheses about why specific advertisers may be using certain keywords for ad targeting avoidance lists, these hypotheses may be entirely incorrect. This exploratory analysis is meant as a starting point rather than a final conclusion. To fully understand the significance of these keyword lists, one would have to reach out directly to either IAS or potentially to the brands which are named in the various list titles.

Additionally, this analysis is based on historical data and various web archives. It is not comprehensive, and the keyword lists are likely to change over time. Publishers may also remove the IAS Admantx Javascript code from their websites at any time in the future.

Lastly, the keyword list analysis was performed by taking the union of observed data points from different publishers. That does not necessarily mean that, if a list contains the word “child”, the advertiser is blocking their ads from all articles containing the word “child” across all news sites. One example of this is the “BP_Politico_Keyword Blocking_Mar2019” list, which was only observed on politico.com. Similarly, the lengthy list “JPMorgan_Neg” was only observed on reuters.com articles.

Discussion

If one assumes that these keyword lists are in fact directing the placement of various brands’ advertisements and are used for avoiding certain text content, then this raises several follow-on questions.

Firstly, what is the motivation behind many of these keywords? By blocking ads from certain commonly occurring phrases, these advertisers may be effectively preventing their brands from showing up on tens of thousands of news articles. A previous Adalytics blog post documented how another brand safety vendor was labeling tens of thousands of New York Times, The Economist, and Wall Street Journal articles as “unsafe”. Are the keyword lists being configured by IAS, by the brands themselves, or by their media buying agencies?

Secondly, does it make sense for certain brands to publicly profess they support social initiatives such as Black Lives Matter while simultaneously hiding their ads from news articles that discuss the movement?

Thirdly, what is the benefit to publishers like Reuters or CNBC of using this third-party contextual semantic technology on their websites? Does using such technology actually increase ad revenues or attract certain premium brands which would otherwise take their advertising budgets elsewhere?

Take away points

Several Fortune 500 companies appear in the keyword lists of a major brand safety vendor
Some keyword lists are hundreds of phrases long
