Tens of thousands of news articles are labeled as unsafe for advertisers

Updated Feb. 2nd, 2021 - In my original post, I described data that appeared to be advertiser blocklists. However, I also noted that my conclusions were based on purely observational data, and I invited readers to submit additional information. Recently, we received additional information from Integral Ad Science (IAS) and the following response from an IAS spokesperson: ‘This blog post reflects a misunderstanding of Integral Ad Science (IAS), its technology, and how advertisers work with IAS to address their brand suitability and safety needs. The data that has been reported to be advertiser keyword blocklists are, in fact, not advertiser keyword blocklists.’

Most contemporary news publishers depend on ad revenue in order to finance their journalistic endeavors. Unfortunately, it seems that a number of brand safety technology vendors consider thousands of articles on leading news sites to be unsuitable environments for marketers to advertise on. 

Companies like Moat, Grapeshot, Comscore, and Integral Ad Science (IAS) offer advertisers tools that scan the text content of a given webpage and use AI or simple keyword heuristics to determine whether a given article is “brand unsafe.” The technology can block a given brand’s digital ads from loading on a given article, potentially changing the pricing dynamics of digital ad auctions and reducing revenues for the news outlets that published those articles.

By carefully examining the JavaScript code and network traffic from various publisher websites and public archive data sources, this exploratory study shows how an estimated 21% of articles on economist.com, 30.3% on nytimes.com, 43% on wsj.com, and 52.8% on vice.com are being labeled as “brand unsafe.”

The analysis also shows that, in some cases, different brand safety vendors disagree with each other on whether an article is safe or unsafe. For example, on wsj.com, Comscore’s and Moat’s brand safety tech agree only ~58.7% of the time when classifying an article as safe or unsafe.

Lastly, this study illustrates how journalists who focus on certain “serious” topics, such as Middle East affairs, obituaries, or political events, are disproportionately likely to have their work marked as “unsafe” by brand safety vendors.

Background

Why is digital advertising revenue so important for journalism?

Why should an average citizen care about digital advertising? Because advertising is the most critical revenue stream for news media and journalism. 69% of all US domestic news revenue is derived from advertising. According to Tim Hwang, the author of The Subprime Attention Crisis: Advertising and the Time Bomb at the Heart of the Internet, “advertising is a critical if tenuous force for funding journalists.”  As print advertising has declined over the last decade, news publishers have moved increasingly to online ads. However, in recent months, global events such as the pandemic have contributed to a drop in digital ad revenues. At Vice Media, 155 people lost their jobs. Quartz laid off 80. Condé Nast cut 100 due to falling ad revenues. The Guardian expects revenue to drop by $27 million, or 10%, and furloughed 100 of its staff.

The insertion of numerous middlemen into the digital advertising supply chain has also cut into publisher revenues. One study by The Guardian suggests that some 70 percent of the money spent by ad buyers is consumed by ad tech middlemen, with the news publisher retaining only a remainder.

Another potential siphon of publisher revenues is brand safety technology, an automated ad tech service which tries to prevent digital ads from appearing “adjacent to or in a context that can damage an advertiser’s brand.”

What is brand safety & keyword blocking?

According to the Interactive Advertising Bureau (IAB), brand safety is defined as keeping a brand’s reputation safe when they advertise online. In practice, this means avoiding placing ads next to inappropriate content. An advertiser may wish to avoid funding pornographic sites or violent video producers via their ad spend.

In response to ads being placed next to undesirable content, companies have cut advertising budgets and pulled ads from online advertising and social media platforms. 

To address this need to prevent ads from showing next to ‘inappropriate’ content, a number of ad tech vendors have developed brand safety technology that can scan the context of a given article or webpage, and then use natural language processing (NLP) or keyword matching heuristics to classify the article. These systems are highly automated, and can prevent ads from appearing on a given webpage, or from even being entered into an ad auction.

How does brand safety affect news publishers’ revenue?

Over-zealous or non-specific use of keyword blocking and brand safety technologies has in some cases deprived publishers of ad revenue. BuzzFeed reported that in March 2020, one unnamed brand, which typically spends $3 million on advertising, saw its ads blocked more than 35 million times by IAS.

In March, IAS automatically blocked 309,726 (roughly 36%) of the ads that a specific brand attempted to place on the New York Times’ website. In January, only 3% were blocked, and in February, 6%. This more than ten-fold increase from January to March was partially attributed to widespread blocking on Covid-19-related keywords. The Tweet below from ad tech expert Ari Paparo illustrates an example of a brand safety “swap” occurring on nytimes.com’s front page, where the tech concealed an ad that had won the auction for that placement.

Twitter screenshot illustrating how an ad slot is replaced by an image of clouds. This occurs when an advertiser wins an ad auction but a brand safety tech vendor classifies the given page as “unsafe” and tries to conceal the underlying ad from being seen.

Thirty-four percent of the ads the company attempted to place on USA Today's website were blocked in March, as were 45% of those on the Washington Post's website, and 29% on CNN's website. In total, nearly 2.2 million ads for the brand were blocked from appearing. 

In a June 2020 analysis, Vice Media Group discovered that content related to the death of George Floyd and resulting protests was monetized at a rate 57% lower than other news content. Vice attributed this to brands and agencies blocking their ads from appearing on articles related to such sensitive topics.

Many advertisers and publishers do not currently have a full understanding of how many articles are blocked by brand safety tech. This exploratory study seeks to address that outstanding question.

Methodology: extracting brand safety labels from public internet archives

For further details and methodology, please reach out here or @kfranasz.

Websites that use client-side header bidding code for ad auctions often have brand safety JavaScript tags installed. When a user opens a website like msnbc.com or wsj.com, their browser makes an HTTPS network request to domains like admantx.com (owned by Integral Ad Science), mb.moatads.com (owned by Oracle Moat), zqtk.net (owned by Comscore), or gscontxt.net (owned by Oracle Grapeshot).

The output from these network calls is stored in various JavaScript variables on the window object, which can be accessed from the Chrome Developer Tools console. The responses from these network calls are also stored in various web capture datasets, such as the Internet Archive, URLScan.io, and Common Crawl. These public resources include tens of thousands of website article crawls and captures, which offer a rich dataset for understanding brand safety tech. These datasets, as well as information from whotracks.me, builtwith.com, and wappalyzer.com, were used in this study to analyze brand safety values from Comscore, Moat, Admantx, and Grapeshot. This list is by no means exhaustive, but serves as an exploratory sample of brand safety values for various news or content publishers.

Any consumer can open the Chrome browser’s Developer Tools console, load a webpage like wsj.com, and see what appear to be the corresponding brand safety values from some of these aforementioned vendors. In some cases, these lists include items like the categories assigned to a given article (‘safe’ vs. ‘not safe’) or contextual topic classifications (‘gs_covid’, ‘gv_military’, ‘gv_crime’, etc.). In certain cases, it is also possible to see plaintext, human-readable lists of what appear to be company-specific keyword blocklists (e.g. ‘neg_cg_apple’, ‘neg_geico_us’, ‘custom_fr_gucci_english_brand_safety’, ‘neg_airbnb_uk’, ‘neg_marriot_us’), which may indicate that a given company has instructed the brand safety ad tech vendor to block said company’s digital ads from appearing on a given website. It appears that these brand safety values are incorporated into client-side header bidding HTTP network calls made to ad servers or ad auctions. For example, the ‘gs_notsafe’ value is seen in network calls to doubleclick.net, a domain owned by Google that is used for serving digital ads.

If a user opens the Chrome web browser and navigates to the reuters.com landing page, they can see a list of what appear to be company-specific brand safety keyword lists from IAS Admantx:

Screenshot of Chrome Developer Tools console logs on reuters.com

Screenshot of Google Chrome browser’s Developer Tools console logs after opening reuters.com. Note the variables, such as ‘MSFT_Neg_1’, ‘SaudiAramco_Negative’, ‘IBM_neg’, ‘Intel_Negative_keywords’, or ‘JPMorgan_Neg’. These values originate from admantx.com, a domain owned by Integral Ad Science’s Admantx platform.

Admantx was a company that developed Natural Language Processing for scanning and categorizing large numbers of websites according to their text contents. Admantx was acquired by Integral Ad Science (IAS), an ad fraud and brand safety vendor, in 2019.

Screenshot of Chrome Developer Tools console logs on www.thetimes.co.uk

Screenshot of Google Chrome browser’s Developer Tools after opening an article on The Times, showing what appears to be brand safety data from IAS Admantx.

As another example, a user can open the Chrome developer tools console, go to the landing page of wsj.com, and type one of the following commands in the console: `pb_keywords.brandsafe` or `moatPrebidApi.getMoatTargetingForPage()`. This should reveal what appears to be a brand safety classification originating from zqtk.net (owned by Comscore) or mb.moatads.com (owned by Moat), respectively.

Screenshot of Chrome Developer Tools console logs on wsj.com

Screenshot of Google Chrome browser’s Developer Tools console logs after opening wsj.com and typing the command `moatPrebidApi.getMoatTargetingForPage()`. Note the variables, such as ‘moat_safe’. These values originate from mb.moatads.com, a domain owned by Oracle’s Moat division.

Screenshot of Chrome Developer Tools console logs on wsj.com

Screenshot of Google Chrome browser’s Developer Tools console logs after opening wsj.com and typing the command `pb_keywords.brandsafe`. Note the variables, such as ‘brandsafe: safe’. These values originate from zqtk.net, a domain owned by Comscore.

For websites that included JavaScript code from Moat, a given article was considered to have been classified as ‘unsafe’ if the Moat variable contained the keyword ‘moat_unsafe’. If it contained the word ‘moat_safe’, then it was considered to have been marked as ‘safe’ for the purposes of this analysis. For websites using JavaScript from Comscore, an article was considered to be unsafe if ‘pb_keywords.brandsafe’ was equal to ‘notsafe’. For websites that used JavaScript code from Grapeshot, a given article was considered to be ‘safe’ if it contained the term ‘gv_safe’ (based on information released in Grapeshot’s documentation), and ‘unsafe’ if it included any labels such as ‘gv_crime’, ‘gv_adult’ or other unsafe categories. For websites that used IAS Admantx, there was no clear designation that explained whether an article was marked as ‘safe’ or ‘unsafe’; it appears that Admantx includes lists of brand-specific keywords or sentiment classifications. In these cases, articles were analyzed by the quantity of ‘negative’ category lists, such as ‘ErnstYoung_Neg’, ‘Barclays_Negative’, or ‘IBM_Negative’.
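The per-vendor rules above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function names, the set of Grapeshot labels treated as unsafe, the precedence when both safe and unsafe labels are present, and the regular expression for spotting Admantx ‘negative’ lists are all assumptions made for this example.

```python
import re
from typing import Iterable, Optional

# Assumption: an illustrative subset of Grapeshot categories treated as unsafe.
GRAPESHOT_UNSAFE = {"gv_crime", "gv_adult", "gv_death_injury", "gv_obscenity"}

# Assumption: heuristic for Admantx lists such as 'IBM_neg' or 'MSFT_Neg_1'.
ADMANTX_NEGATIVE = re.compile(r"_neg(ative)?(_|$)", re.IGNORECASE)


def classify_moat(labels: Iterable[str]) -> Optional[str]:
    """Return 'unsafe', 'safe', or None when no Moat label is present."""
    found = set(labels)
    if "moat_unsafe" in found:
        return "unsafe"
    if "moat_safe" in found:
        return "safe"
    return None


def classify_comscore(brandsafe: Optional[str]) -> Optional[str]:
    """Map the value of pb_keywords.brandsafe to 'unsafe' or 'safe'."""
    if brandsafe == "notsafe":
        return "unsafe"
    if brandsafe == "safe":
        return "safe"
    return None


def classify_grapeshot(labels: Iterable[str]) -> Optional[str]:
    """Unsafe if any avoidance category is present, safe if 'gv_safe' is."""
    found = set(labels)
    if found & GRAPESHOT_UNSAFE:  # assumption: unsafe labels take precedence
        return "unsafe"
    if "gv_safe" in found:
        return "safe"
    return None


def count_admantx_negative(labels: Iterable[str]) -> int:
    """Count labels that look like brand-specific 'negative' lists."""
    return sum(1 for label in labels if ADMANTX_NEGATIVE.search(label))
```

For instance, `classify_grapeshot(["gv_safe", "neg_ibm"])` returns `"safe"`, while a label list containing ‘gv_crime’ returns `"unsafe"` regardless of the other labels.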

As mentioned earlier, public and open-source resources, such as Common Crawl, whotracks.me, builtwith.com, URLScan.io, the Internet Archive’s Wayback Machine, and wappalyzer.com were used in this study to collect a larger sample set of brand safety values from Comscore, Moat, Admantx, and Grapeshot.

How many news articles are labeled as “unsafe”?

By sifting through tens of thousands of web page captures in Common Crawl, the Internet Archive, and urlscan.io, it is possible to compare which articles on given news sites appear to be labeled as “unsafe” by various brand safety vendors. Surprisingly, even respected news sites such as The New York Times or The Wall Street Journal have thousands of articles labeled as “unsafe.”

The table below illustrates what percentage of articles on a given site are classified as “unsafe” by a given vendor. For example, out of the 1,980 articles sampled on economist.com, 424 were labeled as “unsafe” by Oracle Moat and 1,556 were labeled as “safe.” Based on this sample, Moat considered 21.4% of The Economist’s news articles to be “unsafe” for digital ads to appear on.

Table illustrating the number of “safe” versus “unsafe” articles (as a percentage) published on 27 different news and content websites. This table includes information from Moat, Grapeshot, and Comscore brand safety or contextual semantic intelligence JavaScript code.
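The percentage in the economist.com example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `unsafe_share` is introduced here for illustration.

```python
def unsafe_share(unsafe: int, safe: int) -> float:
    """Percentage of sampled articles that carry an 'unsafe' label."""
    total = unsafe + safe
    if total == 0:
        raise ValueError("no articles in sample")
    return 100.0 * unsafe / total


# economist.com sample from the text: 424 unsafe and 1,556 safe articles.
print(f"{unsafe_share(424, 1556):.1f}%")  # prints 21.4%
```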

In this dataset, 6,874 nytimes.com articles (or 30.3% of the total) were labeled as unsafe by Oracle Grapeshot’s contextual intelligence solution. Many of these articles were labeled as unsafe due to “gv_death_injury” or “gv_crime.” According to Grapeshot’s official documentation, these are “avoidance categories.”

The American Association of Advertising Agencies (4A’s) Advertiser Protection Bureau (APB) introduced a Brand Safety Framework in 2018. This framework lists thirteen content categories that "pose risk to advertisers, whereby advertisers might choose to adopt a 'never appropriate' position for their ad buys." These 13 categories are mapped to corresponding Grapeshot avoidance categories based on webpage textual content. “gv_death_injury” is likely related to “Promotion or advocacy of Death or Injury; Murder or Willful bodily harm to others; Graphic depictions of willful harm to others.”

In the 4A’s framework, the category that appears to correspond to “gv_crime” is defined as “Graphic promotion, advocacy, and depiction of willful harm and actual unlawful criminal activity – murder, manslaughter & harm to others. Explicit violations/demeaning offenses of Human Rights (eg, trafficking, slavery, etc.).”

Whether most readers would agree with the 4A’s and Grapeshot’s assessment that 30.3% of New York Times articles are advocating for death, or promoting and depicting unlawful criminal activity, is a matter for future public discourse.

With regard to the effects of these brand safety values on ad auctions and publisher revenues, it appears that the values are incorporated into client-side HTTP network calls made to ad servers or ad auctions. For example, the “gs_notsafe” value is seen in network calls to doubleclick.net, a domain owned by Google that is used for serving digital ads.

Screenshot of a network request call made to doubleclick.net, with brand safety values passed as query string parameters

Screenshot from network call made to Google’s doubleclick.net ad domain when a user browses to nytimes.com. The brand safety value “gv_death_injury” from Oracle Grapeshot appears in the request call.
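A captured ad request URL can be inspected for such values with Python’s standard `urllib.parse` module. The sketch below is illustrative only: the example URL, the `cust_params` parameter name, and the `gs_cat` and `brandsafe` keys are assumptions made for this example, not values taken from the study’s captures.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

# Hypothetical ad request URL; targeting key-values are percent-encoded
# inside a single query parameter (assumed here to be "cust_params").
EXAMPLE_URL = (
    "https://securepubads.g.doubleclick.net/gampad/ads"
    "?iu=%2F1234%2Fexample"
    "&cust_params=gs_cat%3Dgv_death_injury%2Cgv_crime%26brandsafe%3Dnotsafe"
)


def extract_targeting(url: str, param: str = "cust_params") -> Dict[str, List[str]]:
    """Decode the nested key-value targeting string from an ad request URL."""
    outer = parse_qs(urlsplit(url).query)
    targeting: Dict[str, List[str]] = {}
    for blob in outer.get(param, []):
        # The decoded blob is itself a query string: key=value&key=value
        for key, values in parse_qs(blob).items():
            for value in values:
                targeting.setdefault(key, []).extend(value.split(","))
    return targeting


print(extract_targeting(EXAMPLE_URL))
```

Any label found this way can then be compared against the vendor-specific safe and unsafe strings described in the methodology.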

Additionally, Oracle’s documentation states that the technology “works in the pre-bid environment: it notifies ad systems to include or exclude pages before an advertising bid has been placed. This feature contrasts with systems that block an ad from appearing after a bid has been placed and won for a spot on a page. By using pre-bid technology, advertisers avoid being billed for placements on which they never bid.” This would corroborate the hypothesis that brand safety technology affects ad auctions and thus potentially publisher revenues.

Screenshot from Oracle Contextual Intelligence documentation.

On certain websites owned by News Corp, such as wsj.com, a close inspection of the client-side JavaScript code reveals that the ad auction header bidding scripts and Google Publisher Tags are configured to use brand safety labels from IAS and Oracle Moat, respectively.

Screenshot of Chrome Developer Tools on wsj.com, illustrating the use of brand safety labels in Google Publisher Tags and Header Bidding

Screenshot of Google Chrome browser’s Developer Tools console logs after opening wsj.com. It appears that brand safety labels from Oracle Moat and IAS are incorporated in the website’s Google Publisher Tag and header bidding targeting parameters. “adt” likely stands for “adult”, “alc” is “alcohol”, “drg” is “drugs”, “hat” is “hate”, “off” is “offensive”, “vio” is “violence”.

While the previous tables illustrate in aggregate that thousands of articles are being labeled as brand “unsafe,” they do not reveal to what degree publisher revenues are affected by brand safety tech. Certain articles are likely to draw a greater number of readers than others. If the front page or a viral news article is labeled as “unsafe” and excluded from certain advertising campaigns, it may lead to a disproportionate loss of publisher revenue.

The color map below illustrates how many articles on the New York Times landing page were labeled as “safe” (light green) or “unsafe” (light pink) by Oracle Grapeshot. A significant portion of the landing page’s prime real estate was labeled as “unsafe.”

Annotated colormap of the nytimes.com landing page on December 3, 2020. Pink indicates that a given article was labeled as unsafe by Grapeshot, whereas green indicates a given article was labeled as safe.

A publisher like the New York Times could take a look at its own internal page view metrics and ad revenue data, and cross reference with these brand safety labels to empirically determine whether or not a label of “unsafe” leads to a significant reduction in ad revenue per article.

What content is most likely to be labeled as “unsafe”?

It is hard to predict what content will be marked as safe or unsafe. A recent New York Post article about a female US soldier committing suicide after being gang-raped in the military was marked as “brandsafe,” while an Economist article discussing progress in molecular biology research was marked as “moat_unsafe” and “gv_death_injury,” likely due to the use of terms related to apoptosis (biology-speak for “programmed cell death”).

The Economist’s weekly publication is organized into 21 different sections, each of which contains between one and four articles. Each article appears to have brand safety data present from Oracle Moat. While economist.com overall has on average 21.4% of its articles marked as “moat_unsafe,” this percentage belies large variation across its specific news sections. The Middle East & Africa section has half of its articles labeled as “moat_unsafe,” while the Obituary section has 77% of its articles labeled as “moat_unsafe.” A recent article about the death of Argentine football player Diego Maradona was marked as “moat_unsafe” and “gv_death_injury.”

Screenshot of Chrome Developer Tools console showing that a recent economist.com article about the death of Diego Maradona was labeled as “moat_unsafe.”

The table below illustrates the proportion of articles that were labeled as “moat_unsafe” in each of the 21 economist.com weekly sections.

Table illustrating what proportion of economist.com articles are labeled as “moat_unsafe,” grouped by section.
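A per-section breakdown of this kind can be computed by grouping article labels by section. The sketch below uses made-up toy data and a hypothetical function name; it only illustrates the grouping step, not the study’s actual figures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unsafe_rate_by_section(
    articles: Iterable[Tuple[str, Set[str]]],
) -> Dict[str, float]:
    """Fraction of articles per section that carry the 'moat_unsafe' label."""
    counts = defaultdict(lambda: [0, 0])  # section -> [unsafe, total]
    for section, labels in articles:
        counts[section][1] += 1
        if "moat_unsafe" in labels:
            counts[section][0] += 1
    return {section: unsafe / total for section, (unsafe, total) in counts.items()}


# Toy sample (invented for illustration): (section, labels seen on the article)
SAMPLE = [
    ("Obituary", {"moat_unsafe", "gv_death_injury"}),
    ("Business", {"moat_safe"}),
    ("Business", {"moat_unsafe"}),
    ("Business", {"moat_safe"}),
    ("Business", {"moat_safe"}),
]

rates = unsafe_rate_by_section(SAMPLE)
for section, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{section}: {rate:.0%}")
```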

It is interesting to note that foreign policy topics are enriched for the “moat_unsafe” label, while articles about Business or Finance tend to have the fewest “moat_unsafe” labels. The cartoons section is always marked as “moat_safe,” despite the fact that this section often contains illustrations of military themes, death, or political caricatures.

For example, the cartoon from August 17th, 2017 includes the text “white power” and a Nazi symbol, but is marked as “moat_safe.” The Oct. 22nd, 2020 cartoon image includes the word “sex,” and the Sept. 24th, 2020 graphic includes the word “dead,” yet both are marked as “moat_safe.”

This suggests that the Moat contextual intelligence technology is entirely text-centric, and does not use Computer Vision or Optical Character Recognition techniques to classify images or text inside images. Official documentation from Oracle and Grapeshot seems to corroborate this hypothesis. This raises the possibility that a publisher could “evade” brand safety technology by putting certain controversial pieces of text into images on its web pages.

Somewhat ironically, an Economist article titled “Papers should print offensive language if it is crucial to a story” was marked as “moat_unsafe” due to the presence of a category called “gv_obscenity.” The article, an opinion piece in the Books and Arts section, argued that in certain contexts, it is necessary for journalists to be able to directly quote individuals who may use an expletive.

How consistent is brand safety technology?

The Wall Street Journal appears to use both Oracle Moat and Comscore technology on its webpages, which lends itself to a concordance analysis: how often do the two brand safety vendors agree or disagree in their classifications? How often is an article labeled as “moat_safe” by Moat also labeled as “brandsafe: safe” by Comscore?

Screenshot of Chrome Developer Tools console showing that a recent Wall Street Journal article was labeled as “notsafe” based on “pxSegmentIDs” from Comscore’s zqtk.net domain, and “safe” by Moat’s brand safety tech. In this case, the two systems appear to disagree on whether the given article is brand safe.

Based on a sample of 25,730 wsj.com articles found in various internet archive repositories, it appears that Moat and Comscore are only in agreement 58.7% of the time. There were 10,614 wsj.com articles labeled as “notsafe”/“safe” or “safe”/“notsafe” by Comscore and Moat, respectively. For every 10 articles on wsj.com, an average of four would receive conflicting brand safety values from the two vendors. This would suggest that the vendors have highly divergent approaches to classifying text as safe or unsafe.
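The concordance figure is a simple agreement rate over paired labels. The sketch below shows the calculation on toy pairs and then checks it against the counts reported above; the function name and the toy data are assumptions for illustration.

```python
from typing import Iterable, Tuple


def agreement_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of articles on which two vendors assign the same label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no article pairs supplied")
    agree = sum(1 for first, second in pairs if first == second)
    return agree / len(pairs)


# Toy (Comscore, Moat) label pairs, normalized to 'safe' / 'unsafe'.
toy_pairs = [("safe", "safe"), ("unsafe", "safe"), ("unsafe", "unsafe"), ("safe", "unsafe")]
print(agreement_rate(toy_pairs))  # 0.5

# Check against the reported wsj.com counts: 10,614 conflicts out of 25,730.
reported_agreement = 1 - 10_614 / 25_730
print(f"{reported_agreement:.1%}")  # 58.7%
```

A raw agreement rate does not correct for agreement expected by chance; a statistic such as Cohen’s kappa would be one way to account for that, though the study itself reports only the raw percentage.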

Which marketers are using brand safety tech? 

On certain websites, in addition to seeing the brand safety labels such as ‘safe’ or ‘notsafe’, it is also possible to see what appear to be names of individual clients that use brand safety technology for their digital marketing campaigns.

Screenshot of Chrome Developer Tools console showing what appear to be Oracle Grapeshot brand safety lists on a nytimes.com article. The string “neg_mastercard” or “neg_ibm” may be a reference to brand safety keyword blacklists by Mastercard and IBM, respectively.

Certain brands appear to be blocking thousands of different nytimes.com articles. As per the table below, “neg_mastercard,” “neg_bofa,” and “neg_ibm” each appear on more than 10,000 New York Times articles out of a sample of 22,722 nytimes.com articles.

Table illustrating how many nytimes.com articles have a given Grapeshot keyword label, out of a total of 22,722 nytimes.com articles.
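A table like this can be built by counting, for each label, the number of articles on which it appears. The sketch below uses Python’s `collections.Counter` on invented toy data; the function name and the choice to filter on the ‘neg_’ prefix are assumptions for this example.

```python
from collections import Counter
from typing import Iterable, List


def label_frequencies(articles: Iterable[List[str]], prefix: str = "neg_") -> Counter:
    """Count how many articles carry each label that starts with `prefix`."""
    counts: Counter = Counter()
    for labels in articles:
        # Use a set so a label repeated on one article is counted once.
        counts.update({label for label in labels if label.startswith(prefix)})
    return counts


# Toy data (invented): one list of Grapeshot labels per article.
toy_articles = [
    ["gv_safe", "neg_mastercard", "neg_ibm"],
    ["neg_mastercard", "gv_crime"],
    ["gv_safe"],
]

freqs = label_frequencies(toy_articles)
print(freqs.most_common())
```

Passing `prefix="gv_"` instead would tally the contextual categories, which allows a comparison such as the one between “neg_mastercard” and “gv_safe” counts.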

It is interesting to note that “neg_mastercard” appears on more nytimes.com articles than “gv_safe,” which would suggest that Mastercard may be blocking its digital ads from appearing even on articles that were labeled as safe by Grapeshot. Mastercard may have a custom keyword blacklist that is far more extensive than Grapeshot’s standard avoidance categories.

The word cloud below visually illustrates some of the brands that may be blocking the greatest number of nytimes.com articles via Grapeshot’s technology. Brands such as Mastercard, IBM, BP, Chanel, Microsoft, Capital One, Citi, and Rolex can all be seen in these keyword lists.

Word cloud illustrating which brands appear to be using Oracle Grapeshot and which keyword lists appear on large numbers of nytimes.com articles.

Conclusion

Caveats & Limitations

This exploratory study relies on a number of publicly available web page archives, such as urlscan.io, Common Crawl, and the Internet Archive. As such, the selection of articles from each publisher that were used for estimates may be biased or skewed in unforeseen ways. For example, many recent wsj.com articles did not appear in these resources, and as such, could not be included in the brand safety analyses. This could be an artifact of how quickly pages are indexed.

Secondly, this analysis is entirely observational and is based on the presence or absence of certain JavaScript string variables or HTML code. The analysis makes certain assumptions as to what individual strings signify. The terms “moat_unsafe,” “gv_death_injury,” “neg_mastercard,” “brandsafe” or others could theoretically be entirely unrelated to brand safety, which would negate the inferences drawn by this research.

Lastly, this study did not have access to internal pageview metrics or publisher revenues. Thus, it was not possible to assess the effect of brand safety classifications on publisher revenues. This may be possible in the future, if a publisher expresses interest in cross-referencing their internal data with brand safety labels.

Discussion

The Global Disinformation Index (GDI) and NewsGuard conduct research to quantify the levels of journalistic integrity and authenticity in different publications. On a per publisher basis, there does not seem to be any correlation between the percentage of articles labeled as “unsafe” and the GDI or NewsGuard ratings. 

Based on this analysis, it appears that brand safety tech is calibrated to be highly sensitive to specific phrases, key words, or sentiments, but not to overall meaning or intent. For example, if a disreputable publisher were to create 10,000 articles about a seemingly innocuous topic such as gardening or zoo animals, where each phrase is a thinly veiled metaphor for a political leader such as Mitch McConnell or Nancy Pelosi, it is likely that a brand safety provider’s Natural Language Processing (NLP) tech would not detect the ruse.

The tech is also optimized for various specific categories seen in the 4A’s Advertiser Protection Bureau (APB) Brand Safety Framework, like drugs, violence, and crime. Although the Interactive Advertising Bureau (IAB) added an additional category for “fake news,” this study did not identify any examples where the brand safety tech appeared to be classifying an article as unsafe due to “fake news.” It is possible that the identification of “fake news” or subtle misinformation surpasses the present-day capabilities of NLP-based brand safety technology.

Based on these deductions, it appears that brand safety tech functions in a vacuum, where it assesses each article by itself. It has no notion of who the publisher is. If a registered foreign propaganda website writes about bunnies or petunias, the technology will most likely label those articles as “safe” for digital ads to appear on.

Conversely, the tech may disproportionately “penalize” outlets and journalists that cover serious topics, such as wars, pandemics, politics, or civil unrest. The possible unintended effect of this is that reputable news outlets that have journalists covering these topics will be blacklisted by brand safety tech and thus potentially receive less ad revenue. If you're a war correspondent in the Middle East, your articles are much more likely to be blacklisted as unsafe and thus receive less ad revenue per impression than those of another journalist who is covering celebrity gossip news. The brand safety tech may effectively function as a modulator of ad revenue.

Overall, brands and marketers may want to consider: "Is it better to advertise on a seemingly innocuous article on a disreputable site, or a story about Black Lives Matter on wsj.com? Will consumers be more offended by the specific article content that my ad shows on, or rather, the publication and ideology I am funding via a controversial and disreputable domain?" If a publication like the New York Times or the Economist is being read by highly affluent individuals such as Bill Gates, do digital marketers really want to allow an automated NLP system to prevent their ads from being shown to a highly valuable audience?

Take away points

  1. Brand safety ad tech vendors appear to be labeling tens of thousands of articles on reputable websites as “unsafe.”
  2. Brand safety tech appears to be highly sensitive to the presence of certain keywords, but cannot assess publisher context, image content, or misinformation.
  3. Publishers may be losing revenue due to brand safety tech, and digital advertisers may be losing valuable impressions.

If you are a publisher or advertiser interested in analyzing the impact of brand safety technology on your operations, please reach out via the contact page or @kfranasz.
