Marketers & ad tech companies should regularly audit log files

Many ad tech companies, advertisers, and publishers receive access to granular log files that enable them to analyze each ad auction or impression they won. These log files can contain ad impression timestamps, page URLs, monetary transactions, and other pertinent details. They offer far more insight into ad serving behavior than the aggregated spreadsheets or dashboard summaries that ad tech vendors often provide.

This research blog post will discuss one example of how analyzing individual ad impressions (with purely open-source, non-proprietary data sources) can enable various stakeholders to identify a particular type of discrepancy: page URL and domain mismatches.

Gannett Media is the largest U.S. newspaper publisher as measured by total daily circulation, and the purveyor of USA Today, the Detroit Free Press, The Indianapolis Star, and several hundred other local news sites. Gannett Media was observed using custom Javascript that appeared to mis-declare which pages and domains were submitted into header bidding ad auctions. Several thousand examples were observed in which a user was reading one specific Gannett Media-owned domain (such as indystar.com). In these cases, the page URL and domain submitted in header bidding requests to Supply Side Platforms (SSPs) such as TripleLift and Pubmatic belonged to a completely different Gannett Media-owned domain, such as usatoday.com.

This discovery was originally made by Braedon Vickers, an independent researcher, who documented his findings on his blog.


Screenshot of Chrome Developer Tools’ Network tab, taken on a specific indystar.com article in November 2021. HTTPS requests sent to hbopenbid.pubmatic.com encode the “domain” attribute as a different Gannett Media-owned domain, seacoastonline.com.

This phenomenon was observed for approximately 9 months, from May 25th, 2021 to March 4th, 2022.

It is unclear why Gannett Media used Javascript to submit a page and domain to header bidding auctions other than the actual page a user was reading. However, this phenomenon presents a unique research opportunity. It allows one to observe, in real-world conditions, how various ad tech entities deal with ad auctions where the declared page URL and domain do not match the actual page. These entities include SSPs, DSPs, and Invalid Traffic detection vendors.

Why does this matter?  

According to eMarketer, “[d]omain spoofing is probably the most common type of ad fraud, and involves the fraudulent party disguising its URL or domain as another”. Therefore, it is critical that ad tech vendors implement various quality control and detection mechanisms in order to identify whether any intentional or unintentional page mis-declarations are occurring.

Domain or brand safety mis-declarations in header bidding requests would seem to be a fundamental area of concern for any provider of fraud detection or brand safety services. Yet mis-labeled impressions from Gannett Media have been consistently observed on ad exchanges such as Pubmatic and Index Exchange since approximately May 25th, 2021. Advertisers choose ad exchanges, as well as ancillary ad tech services such as those provided by Integral Ad Science, Human (née White Ops), Oracle Moat, and Pixalate. One of the primary reasons they do so is to get accurate information about ad impressions and placements. One wonders: did anyone see something or say something?

This study carefully analyzed the ad serving mechanics of various Gannett-owned domains, particularly in situations where the declared bid URLs and actual page URLs differed. This makes it possible to examine whether ad tech vendors have configured their infrastructure to detect (and block) domain spoofing.

The analysis will discuss several interesting observations, such as:

  1. Several major SSPs, including Rubicon Project (Magnite), Pubmatic, and TripleLift were observed serving ads when a mis-declared ad bid request was submitted, for 9 months (the time period observed)
  2. White Ops (Human), Pixalate, Oracle Moat, and Integral Ad Science (IAS) received telemetry about the bid URL mislabeling on Gannett Media websites, for 9 months
  3. Several major DSPs, including Google’s DV360, Xandr, Adobe, Conversant, The Trade Desk, MediaMath, and others, were observed serving ads on impressions with a mis-declared bid URL, for 9 months
  4. Several major media agencies, such as Dentsu, Publicis, IPG, and WPP, were observed transacting on impressions with a mis-declared bid URL, for 9 months
  5. Several dozen major advertisers were observed bidding on ad impressions with mislabeled bid URLs, including Johnson & Johnson, Procter & Gamble, GSK, Facebook, General Motors, Sears, Nike, Adidas, Ford, State Farm, Starbucks, Braven Health, SemRush, the Susan Komen Breast Cancer Foundation, and the New Jersey state government
  6. Many of the ad tech vendors observed transacting on ad auctions with mis-declared page URLs have been “Certified Against Fraud” by third parties, such as the Trustworthy Accountability Group (TAG)

Adalytics shared the results of this study with a C-Suite executive at one Fortune 500 advertiser. The executive commented that:

“Major corporations take many precautions to ensure their advertising spend is properly targeted and appropriate. Misdirection of that spend, for example by site fraud and domain spoofing, is not only ad fraud but can lead to companies advertising in places that go against a company’s core beliefs. This can create damaging customer reputation and credibility issues.”

Background: Gannett Media websites appear to mis-specify header bidding parameters

Many websites that serve ads use a technology known as client-side header bidding, wherein a publisher’s website can collect bids (in a user’s browser) before making a final request to the publisher’s ad server. Gannett Media websites, such as USA Today, make use of a specific header bidding Javascript library known as Prebid.js. When this Javascript library sends ad auction details and receives bid responses, it is possible to use a standard browser’s Developer Tools to inspect HTTPS network requests for details about ad serving mechanics.

For example, suppose that between May 25th, 2021 and March 4th, 2022 you opened Chrome’s Developer Tools and then navigated to usatoday.com. You would observe HTTPS requests being sent to hbopenbid.pubmatic.com, ib.adnxs.com/ut/v3/prebid, or tlx.3lift.com/header/auction?lib=prebid (owned by ad exchanges Pubmatic, Xandr, and TripleLift, respectively). Some of these requests would contain bid responses that encode which advertiser chose to bid a certain amount for the opportunity to show you an ad on usatoday.com.
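The same inspection can be scripted against a HAR export saved from the Network tab. Below is a minimal sketch. It assumes the standard HAR 1.2 layout (log.entries[].request.url). The host set contains only the three endpoints named above and is illustrative, not exhaustive.

```python
import json
from urllib.parse import urlsplit

# Header bidding hosts named in this post (illustrative, not exhaustive).
HEADER_BIDDING_HOSTS = {
    "hbopenbid.pubmatic.com",  # Pubmatic
    "ib.adnxs.com",            # Xandr
    "tlx.3lift.com",           # TripleLift
}


def load_har(path: str) -> dict:
    """Read a HAR file, which is plain JSON, from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def header_bidding_entries(har: dict) -> list:
    """Return the HAR entries whose request URL targets a known header bidding host."""
    return [
        entry
        for entry in har["log"]["entries"]
        if urlsplit(entry["request"]["url"]).hostname in HEADER_BIDDING_HOSTS
    ]
```

For example, `header_bidding_entries(load_har("usatoday.har"))` keeps only the Pubmatic, Xandr, and TripleLift requests from that capture. The filename is hypothetical.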

Many of the HTTPS requests sent to header bidding endpoints encode various salient information about the ad auction, including the page URL, domain, contextual keywords, and brand safety values. This information can be automatically used by ad tech vendors and advertisers when deciding how much (if at all) they want to pay for a given ad slot.

However, on Gannett Media’s websites, there were many instances where the parameters encoded in these HTTPS requests did not match the expected values. For example, on November 10th, 2021, you could have visited a specific usatoday.com article titled “Feds ask 51-month sentence for Jake Angeli, who raided U.S. Capitol in fur horned hat”. If you inspected the HTTPS requests sent to Pubmatic’s header bidding endpoint (hbopenbid.pubmatic.com), you would notice that the submitted page “domain” attribute listed a different Gannett-owned domain: wisfarmer.com.

Screenshot of a usatoday.com article in which the Pubmatic header bidding request declares wisfarmer.com as the domain.

Furthermore, the “page” attribute does not list the actual page, but rather, a different one from a 2016 article in wisfarmer.com about restaurants in Salisbury, Maryland. The declared “keywords” attribute does not align with the article about the US Capitol raid; rather, it includes terms such as “Salisbury, MD”, “Shopping Malls” and “Restaurants”, which align with the contents of the 2016 article from wisfarmer.com. Lastly, the “brandsafety” attribute also does not align with the actual usatoday.com article, but rather, with the wisfarmer.com article.

These discrepancies can also be observed in other HTTPS requests. One example is requests to fastlane.rubiconproject.com, the header bidding endpoint of Rubicon Project (Magnite). Another is requests to s.acexedge.com, an ad viewability API endpoint operated by Human, formerly known as White Ops.

HTTPS requests to fastlane.rubiconproject.com, with mis-specified “tg_i.domain” and “tg_i.page” attributes.

This phenomenon was reproducibly observed thousands of times across scores of Gannett Media-owned properties. Researcher Braedon Vickers wrote a blog post that goes into depth about this phenomenon. Based on data from the Internet Archive, this header bidding parameter mislabeling appears to have been happening on Gannett Media websites from at least May 25th, 2021 until March 4th, 2022.

How was this happening?

During this time period, when a browser made an initial HTTP request to usatoday.com or another Gannett-owned domain, it first fetched the HTML document containing the skeleton of the webpage. That initial HTML document contained an inline Javascript tag encoding specific instructions on how the browser should formulate and send requests to header bidding auctions.

Specifically, this inline script tag configures settings for the pbjs (Prebid) Javascript variable. It sets the ortb2 parameter (OpenRTB, or Open Real-Time Bidding) to a specific domain, page, contextual keywords, and set of brand safety attributes. The Prebid.js documentation states that this ortb2 interface is used to encode various Open Real Time Bidding data, and “allows publishers to supply attributes related to their content and users”. The documentation states that this interface is meant to include “data that might be useful in ad targeting”.

The screenshot below includes an example of (stylized) code taken from this inline script. The code was obtained on November 10th, 2021 from the usatoday.com landing page. In the screenshot below, one can see that the domain attribute is being hard-coded to be jsonline.com, and the page attribute is configured to be a specific article from 2018 about college basketball. The keywords, brandsafety, and section attributes are all consistent with the 2018 article from jsonline.com.

This pbjs ortb2 parameter encodes information that is then sent to fastlane.rubiconproject.com, hbopenbid.pubmatic.com, ib.adnxs.com/ut/v3/prebid, tlx.3lift.com/header/auction?lib=prebid, and other header bidding related API endpoints.
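To make the structure concrete, the sketch below models a hypothetical ortb2 object shaped like the stylized example. It is written as a Python dict because the configuration is plain JSON. The site.domain, site.page, and site.keywords field names follow the OpenRTB site object, and the article path is a placeholder. The brand safety and section fields are omitted because their exact nesting in Gannett's script is not reproduced here. The helper compares the declared values with the URL actually loaded in the browser.

```python
from urllib.parse import urlsplit

# Hypothetical ortb2 configuration resembling the stylized example (placeholder path).
ORTB2_EXAMPLE = {
    "site": {
        "domain": "jsonline.com",
        "page": "https://www.jsonline.com/story/sports/2018/example-article/",
        "keywords": "college basketball",
    }
}


def strip_www(host: str) -> str:
    """Drop a leading 'www.' so that www.example.com and example.com compare equal."""
    return host.removeprefix("www.")


def declared_vs_actual(ortb2: dict, actual_url: str) -> dict:
    """Report whether the declared page and domain match the page actually loaded."""
    site = ortb2.get("site", {})
    actual_host = urlsplit(actual_url).hostname or ""
    return {
        "page_matches": site.get("page") == actual_url,
        "domain_matches": strip_www(actual_host) == strip_www(site.get("domain", "")),
    }
```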

Because this inline Javascript tag is served as part of the initial HTML document, it is likely being server-side rendered or injected on Gannett Media’s Fastly content delivery network (CDN). The declared ortb2 parameter seems to change every few days for each Gannett Media article or page.

Study objective & motivation

The intent (if any) behind this specific setup on Gannett Media websites is unclear. It is possible that the root cause of this issue is an error in Gannett Media’s Fastly CDN configuration or content management system. If anyone at Gannett Media would like to offer some insight into how this phenomenon occurred, I would love to learn more.

The Media Rating Council (MRC), a US-based nonprofit that manages accreditation for media research and rating purposes, defines “Domain and App misrepresentation”, including “falsified domain / site location”, as a form of Sophisticated Invalid Traffic (SIVT). The Interactive Advertising Bureau (IAB), a trade group, describes “site fraud” and domain spoofing as one of the more common forms of ad fraud. According to the IAB, “Domain spoofing fraud [...] is often challenging to detect and prevent. With a simple line of code, fraudsters can change the URL of a site to make advertisers think lower quality sites (e.g., copyright infringement, gambling, pornography etc.) are reputable publishers.”

According to eMarketer, “Domain spoofing is probably the most common type of ad fraud, and involves the fraudulent party disguising its URL or domain as another”.

In theory, many ad tech vendors have set up automated systems to detect, flag, and block ad auctions that involve misdeclared or spoofed ad inventory. The IAB’s ads.txt protocol was designed to help prevent domain spoofing. Some ad tech vendors state that they will block parties who mis-declare pertinent information in ad bid requests.



An exchange from Twitter between two ad tech professionals.

The issue of consistency between the actual inventory and the inventory identified in the bid request is very important. Many companies therefore want external validation that they have adequate processes in place to detect inaccuracies in prebid data and inventory. The Trustworthy Accountability Group (“TAG”) is, according to its website, “the leading global initiative fighting to stop criminal activity and increase trust and transparency in digital advertising.” (emphasis in original). TAG was formed by the three largest advertising trade associations: the American Association of Advertising Agencies, the Association of National Advertisers, and the Interactive Advertising Bureau. TAG offers “certifications” that a firm has processes in place to prevent fraud, ensure brand safety, and ensure that its ads are not distributing malware.

The consistent, and constantly mutating, misidentifications of Gannett’s pages provide an excellent natural experiment for evaluating how well the ad tech ecosystem responds to nonconforming inventory. To be clear, the Gannett situation is almost certainly not fraud: there is no clear advantage to Gannett in consistently mislabeling its own inventory. Thus, without judgment, we can objectively evaluate the effectiveness of the rest of the ad tech supply chain.

Studying the ad serving behavior on Gannett Media owned websites allows one to empirically study whether or not various ad tech vendors are changing the behavior of their systems when ad inventory is spoofed or mis-declared. In theory, an ad server, SSP, or DSP could stop serving ads when they detect something is amiss. For example, some ad exchanges will return error codes if it appears that ad slots are being mis-represented.

Methodology: passively observing ad serving mechanics

This study collected HTTP network traffic data from a number of open-source, publicly available resources. These included the Internet Archive, Common Crawl, URLScan.io, and custom HAR (HTTP Archive format) files. A HAR file records all the inbound and outbound network traffic exchanged by a browser while a user navigates a given webpage, including all the ad serving details. Careful inspection of HAR files allows one to determine, for example, whether an ad server consistently returns error codes when an ad auction is mis-declared.
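One simple form of that error-code check is to tally status codes per endpoint. Below is a minimal sketch. It assumes HAR 1.2 entries in which response.status holds the HTTP status code. An endpoint that consistently returns 4xx codes for mis-declared auctions would stand out in the tally.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def status_codes_by_host(har: dict) -> dict:
    """Count HTTP status codes per request hostname in a HAR log."""
    tally = defaultdict(Counter)
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or ""
        tally[host][entry["response"]["status"]] += 1
    return dict(tally)
```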

Furthermore, HAR files captured in a browser can include initiator data. This lets one analyze which specific Javascript, iframe, or tag was responsible for making a given HTTPS request. Analyzing HTML and initiator data chains allows one to determine whether a request came from code running on the main publisher page or within an ad iframe.
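Chrome's HAR export records initiator data in a non-standard `_initiator` field. This field is not part of the HAR 1.2 specification, and other browsers may omit it. The sketch below assumes the field holds either a `url` (for parser-initiated requests) or a `stack` of call frames chained through `parent` links, as in the Chrome DevTools Protocol stack trace format. Matching initiator URLs against a hostname fragment is a simplification of a full chain analysis.

```python
def initiator_urls(entry: dict) -> list:
    """Collect the document/script URLs recorded as initiators of a HAR entry.

    Assumes Chrome's non-standard `_initiator` field; returns [] if it is absent.
    """
    init = entry.get("_initiator") or {}
    urls = []
    if init.get("url"):
        urls.append(init["url"])
    stack = init.get("stack")
    while stack:
        for frame in stack.get("callFrames", []):
            if frame.get("url"):
                urls.append(frame["url"])
        stack = stack.get("parent")  # async call chains link to a parent stack
    return urls


def initiated_by(entry: dict, host_fragment: str) -> bool:
    """True if any initiator URL contains the given hostname fragment."""
    return any(host_fragment in url for url in initiator_urls(entry))
```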

This study analyzed over 800,000 HTTPS network traffic requests from more than 1,600 page view sessions to determine how ad serving behavior changes (if at all) from various ad tech vendors, when the Prebid ortb2 parameter on a given Gannett Media article appears to be mis-labeled. The study looked at pixels and Javascript served from a variety of different domains, including those of supply side platforms (SSPs), demand side platforms (DSPs), brand safety and invalid traffic (IVT) detection vendors, and audience traffic measurement vendors.

First, the captured network and HAR files from various Gannett Media-owned sites (including usatoday.com, tennessean.com, delawareonline.com, coloradoan.com, and others) were analyzed. For each HAR file, one can determine whether the Prebid ortb2 parameter correctly encodes the actual page URL and domain, or declares different page data.

Of these 1,600+ page view sessions, exactly zero had the actual page URL correctly declared: in every observed instance, a different page URL was inserted into the ortb2 configuration. 223 sessions (13.5%) had the correct page domain declared, while 1,423 (86.4%) had a different domain declared in the ortb2 configuration.

For example, there were 217 distinct page view sessions where the user was reading a usatoday.com article about White House press secretary Jen Psaki testing positive for Covid-19 (usatoday.com/story/news/politics/2021/10/31/jen-psaki-tests-positive-covid/6227793001). In these sessions, the declared page URL observed in the ortb2 parameter was a 2019 article from The Telegram about a New Jersey high school football game (telegram.com/story/sports/high-school/football/2019/11/02/nj-football-austin-percy-defense-lifts-east-brunswick-win-over-perth-amboy/4107502002/).
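The per-session tally reduces to comparing each session's real URL with its declared values. The sketch below assumes each page view has already been summarized into a hypothetical `Session` record. The share-of-sessions figures it returns are the same kind as the percentages quoted above.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Session:
    """Hypothetical per-page-view summary extracted from one HAR capture."""
    actual_url: str
    declared_page: str
    declared_domain: str


def strip_www(host: str) -> str:
    return host.removeprefix("www.")


def summarize(sessions: list) -> dict:
    """Share of sessions whose declared page / domain matches the real page."""
    if not sessions:
        raise ValueError("no sessions to summarize")
    n = len(sessions)
    page_ok = sum(s.declared_page == s.actual_url for s in sessions)
    domain_ok = sum(
        strip_www(urlsplit(s.actual_url).hostname or "") == strip_www(s.declared_domain)
        for s in sessions
    )
    return {"page_correct": page_ok / n, "domain_correct": domain_ok / n}
```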

Next, each of the 800,000+ HTTPS requests was analyzed to determine whether it contained the actual page URL and domain, or the mis-declared page parameters identified in the Prebid ortb2 configuration. Each API and third-party endpoint can encode this information differently. HTTPS GET requests sent to fastlane.rubiconproject.com often contain tg_i.domain and tg_i.page parameters. HTTPS POST requests sent to bidder.criteo.com often contain body data with publisher:ext:domain and publisher:ext:page parameters.
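Each endpoint encodes the declared page differently, so the extraction step needs one small parser per endpoint. The sketch below handles only the encodings described in this post. It rests on two assumptions. First, it reads Criteo's publisher:ext:domain notation as the nested JSON path publisher → ext → domain. Second, it reads Pubmatic's site:page and site:domain likewise, as keys under a top-level site object. Real payloads may differ.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def declared_page(url: str, post_body: Optional[str] = None) -> Optional[dict]:
    """Return the declared {'domain', 'page'} for a supported endpoint, else None."""
    parts = urlsplit(url)
    host = parts.hostname
    if host == "fastlane.rubiconproject.com":
        # GET request: declared values live in the query string.
        q = parse_qs(parts.query)
        return {"domain": q.get("tg_i.domain", [None])[0], "page": q.get("tg_i.page", [None])[0]}
    if host in ("bidder.criteo.com", "hbopenbid.pubmatic.com") and post_body:
        # POST request: declared values live in the JSON body (assumed nesting).
        body = json.loads(post_body)
        if host == "bidder.criteo.com":
            node = body.get("publisher", {}).get("ext", {})
        else:
            node = body.get("site", {})
        return {"domain": node.get("domain"), "page": node.get("page")}
    return None
```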

Which SSPs are bidding and serving ads on auctions with mis-declared bid URLs?

10,315 HTTP GET requests were observed being sent to fastlane.rubiconproject.com with mis-specified tg_i.domain and tg_i.page values. According to Prebid documentation, this is the header bidding API endpoint for Magnite (formerly known as Rubicon Project), a supply side platform. 9,598 POST requests with publisher:ext:domain and publisher:ext:page parameters containing mis-specified page data were sent to bidder.criteo.com, an API endpoint operated by Criteo. 8,449 POST requests were sent to hbopenbid.pubmatic.com, the header bidding endpoint of SSP Pubmatic, in which the site:page and site:domain parameters contained mis-specified page data.

A large number of requests to Index Exchange’s htlb.casalemedia.com/cygnus appeared to encode the mis-specified page URL as a JSON query string parameter. 

Prebid ad requests sent to Index Exchange-owned casalemedia.com, in which the page and domain parameters encode “jouranldemocrat.com” rather than the actual page the user is on (“usatoday.com”).

Similarly, the mis-declared page data originating from the Prebid ortb2 configuration was sent to other SSPs, including TripleLift (tlx.3lift.com, 6,754 requests). A small number of HTTPS requests containing the mis-declared page URL were also submitted to apex.go.sonobi.com (which may be the Sonobi SSP’s analytics endpoint) and to Teads SSP’s a.teads.tv. However, these last three examples may be referrer parameters, which indicate the previous page a user navigated from. In those cases, the parameters could ostensibly be encoding correct information; Adalytics lacks the data to diagnose them.

Some header bidding adapters are architected such that the publisher does not supply any page or domain information. For example, 19,727 HTTPS requests were submitted to c2shb.ssp.yahoo.com, Yahoo’s header bidding API endpoint. None of these 19,727 requests appear to contain any parameters indicating page URL or page domain data. There were also 9,952 observed HTTPS GET requests submitted to OpenX’s gannett-d.openx.net, and all 9,952 of them contained the correct, actual page URL in the 'ju' parameter.

Finally, 9,955 HTTPS POST requests were observed going to ib.adnxs.com/ut/v3/prebid, AppNexus/Xandr’s header bidding API endpoint. None of these 9,955 requests contain any parameters that seem to indicate page URL or page domain data. It is possible that these three SSPs (Yahoo, OpenX, and AppNexus) have removed the ability for publishers to self-declare page URL and page domain data in their header bidding adapters, a potentially prudent security precaution. In addition, every single request to Amazon’s c.amazon-adsystem.com/e/dtb/bid (which does not utilize Prebid.js) was observed encoding the correct domain and page URL via the 'pr' and 'u' attributes, respectively.

Requests sent to c.amazon-adsystem.com/e/dtb/bid always appeared to encode the correct page URL and page domain. This request was observed on a specific usatoday.com article.

This leaves the first set of SSPs: TripleLift, Pubmatic, Rubicon Project, Index Exchange, and Criteo. It is unclear whether they were aware that a potentially large volume of ad auctions and header bidding HTTPS requests contained apparently mis-specified page URLs, domains, brand safety attributes, and contextual keywords. This was the case from at least May 25th, 2021 until March 4th, 2022. If they were aware of this phenomenon, these SSPs seem to have elected to continue serving ad impressions on various Gannett Media-owned websites throughout that nearly nine-month period.

One might pose a follow-up question: do these SSPs “act on” and incorporate the mis-declared ortb2 data sent to them into downstream ad serving processes? It is ostensibly possible that they universally ignore whatever information is encoded inside of the header bidding HTTPS requests, and choose to rely on other signals, such as the HTTPS Referer header. If that were the case, it does not matter if a publisher mis-specifies page data in an HTTPS request - the SSP will still know how to correctly represent the ad auction to exchanges and DSPs.
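One way to probe this hypothesis is to compare, for each captured request, the declared domain against the Referer header the browser attached. Below is a minimal sketch over HAR 1.2 entries, where request.headers is a list of name/value pairs. Modern browsers often trim cross-origin Referer values to the origin under the default strict-origin-when-cross-origin policy. The comparison below is therefore on hostnames only.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_host(entry: dict) -> Optional[str]:
    """Hostname from the Referer request header of a HAR entry, if present."""
    for header in entry["request"].get("headers", []):
        if header["name"].lower() == "referer":  # HTTP/2 header names are lowercase
            return urlsplit(header["value"]).hostname
    return None


def referer_disagrees(entry: dict, declared_domain: str) -> bool:
    """True when the Referer hostname and the declared domain differ (ignoring 'www.')."""
    host = referer_host(entry)
    if host is None:
        return False  # nothing to compare against
    return host.removeprefix("www.") != declared_domain.removeprefix("www.")
```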

Which SSPs return bid responses containing mis-specified page data?

When a header bidding sequence finishes, each SSP returns zero, one, or multiple bid responses. Those bid responses contain tags or HTML that is used to fetch an ad, if the bid won the ad auction.

Example of an HTTPS response from hbopenbid.pubmatic.com, with the ad markup encoded in the adm parameter. This ad was purchased via the Conversant Media DSP, as evidenced by dspid = 32.

The previous section left open an implicit question – does the mislabeled information from Gannett have any consequential effect? Upon careful inspection of bid responses we can diagnose whether the correct Gannett Media page URL was incorporated within an ad, or alternatively, if the mis-specified page data sent to the header bidding server was incorporated into the rendered bid response.

With regards to Pubmatic, parsing the adm attribute of the HTTPS responses for ad markup revealed 788 distinct instances where the mis-declared page domain specified in the ortb2 parameter ended up encoded within the ad creative itself. For example, consider an HTTPS request sent to hbopenbid.pubmatic.com from a usatoday.com page, where the site.domain parameter was mis-specified as lcsun-news.com.

In the HTTPS response returned from hbopenbid.pubmatic.com, the adm parameter contains an ad that was purchased via Conversant Media’s DSP on behalf of client Omni Agent Solutions. Analyzing the returned creative shows that lcsun-news.com, not usatoday.com, is encoded throughout the ad creative (five times). usatoday.com does not appear in the returned creative even once. Thus, it appears that Pubmatic is submitting the mislabeled inventory into header bidding auctions.

HTML markup from a bid response returned from hbopenbid.pubmatic.com on a usatoday.com article, where the ortb2 domain parameter was mis-specified as lcsun-news.com. In the ad creative markup, the aktrack.pubmatic.com iframe resource declares the sURL parameter as “www.lcsun-news.com”, not “usatoday.com”.
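The occurrence count described above can be reproduced on a saved response body. The sketch below assumes that the hbopenbid.pubmatic.com response follows the OpenRTB 2.x bid response layout (seatbid[].bid[].adm). It also assumes the DSP identifier sits in bid.ext.dspid. Counting is by plain substring, so a short domain can also match inside a longer one.

```python
import json


def domain_mentions(response_body: str, domains: list) -> list:
    """For each bid in an OpenRTB-style response, count each domain's occurrences in adm."""
    data = json.loads(response_body)
    results = []
    for seat in data.get("seatbid", []):
        for bid in seat.get("bid", []):
            adm = bid.get("adm", "")
            results.append({
                "dspid": bid.get("ext", {}).get("dspid"),  # assumed location of the DSP ID
                # Plain substring counts: "news.com" would also match "lcsun-news.com".
                "counts": {d: adm.count(d) for d in domains},
            })
    return results
```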

Several Pubmatic creatives, returned from different brands and from different DSPs, show the same pattern, where the mis-specified ortb2 domain is encoded in the ad creative returned by the hbopenbid.pubmatic.com server, but the actual page domain cannot be observed anywhere within the creative. If the tracking or win pixels embedded in the ad encoded the mis-specified URL, it is possible that the DSP and advertiser who paid for that ad receives incorrect reporting data. Pubmatic is currently enrolled in the Verified by TAG (Trustworthy Accountability Group) program, and has achieved the Certified Against Fraud seal.


Screenshot from Trustworthy Accountability Group (TAG) registry, showing that Pubmatic is “Certified Against Fraud”.

On January 31st, 2022, Pubmatic was also recognized as a “TAG Trust Champion”.

According to Pubmatic's "Quality Efforts: Cleaning Up the Programmatic Landscape", the company has several mechanisms in place to improve inventory quality, and “has no tolerance for fraud on our platform”. The company uses “Pre-Bid Blocking” with Integral Ad Science, and also works with Pixalate, White Ops, and Oracle Moat for inventory quality and viewability monitoring.

In addition, the company has an inventory quality team:

“Our IQ team reviews fraud reporting, and investigates inventory flagged for quality from both internal and external signals before relaying findings to our customer success management teams. If improvements are not seen within a defined timeframe, we move to pause or terminate the publisher account.”

It is not clear whether Pubmatic considers domain or brand safety mis-declarations in header bidding requests to be a significant issue.

In spite of the best efforts of Pubmatic and its outside vendors, several layers of ad tech appear not to have sufficiently alerted Pubmatic to the erroneous initial Gannett Prebid declaration.

We say “appears,” because it is not known whether Pubmatic has publicly or privately contacted Gannett Media or other ad tech suppliers about this issue since May 25th, 2021. Similarly, it is unclear whether IAS, Pixalate, Human, or Oracle Moat alerted Pubmatic to domain mis-declarations in the header bidding requests submitted to its ad auctions during the last ~9 months. As noted in the previous section, Pubmatic is hardly the only SSP that received mislabeled inventory.

For Index Exchange, numerous ad creatives appeared to include the mis-specified page URL and domain. For example, one user was reading usatoday.com, but ads from The Trade Desk and Epsilon loaded via Index Exchange with pixels and tags referencing www.courier-journal.com, the mis-declared page URL. As with Pubmatic, Index Exchange has taken measures intended to discover and prevent mislabeled ad inventory from being sold, purchased, and used to run ad creatives.

Ad creative returned from htlb.casalemedia.com/cygnus, where the user is on usatoday.com, but the ad creative returned includes pixels and Javascript tags that refer to the mis-declared page URL (courier-journal.com in this case).

As with Pubmatic, Index Exchange is also “Certified Against Fraud” by TAG. On January 31st, 2022, Index Exchange was also recognized as a “TAG Trust Champion”.

Screenshot from Trustworthy Accountability Group (TAG) registry, showing that Index Exchange is “Certified Against Fraud”.

Index Exchange uses ‘ads.txt monitoring’ to “help prevent domain spoofing”, and also “has a partnership with HUMAN, Inc. (Human; formally White Ops), a leading fraud detection vendor.

Index uses a tool called HUMAN, Inc. by MediaGuard [sic] to scan each advertising opportunity received by our exchange to filter out ad requests originating from bots or fraudulent traffic. Once an ad request clears this filtration step, it is assigned a lookupID. The lookupID is sent in bid requests to DSPs to indicate that MediaGuard has scanned the ad request and determined that it is not fraudulent.”

“Fraudulent” may sound like a strong term for what we are describing with Gannett Media. Though Gannett is the ultimate owner of the web pages on which ads are being sold, the bidders and buyers may not have been receiving the inventory being described. Declaring a different page in the header bidding requests submitted to htlb.casalemedia.com may indeed constitute a (perhaps slightly more benign) form of page mis-declaration. If so, it is not clear whether Index Exchange and/or Human have notified Gannett Media or other ad tech stakeholders about this issue in the last ~9 months. Other SSPs did not reject the erroneous Prebid submissions either, but they were not observed passing the mis-declared URL into the ad creatives they delivered.

The HTTPS responses sent from fastlane.rubiconproject.com encode the mis-specified domain parameter in an attribute called inventory. However, other than this one attribute, no instances of the mis-specified page URL or domain within an ad creative itself were observed in various Rubicon-mediated ads. This would suggest that either Rubicon does not encode bid page URLs within ad creatives, or that they were correctly detecting the page on which the ad was loading via other means, such as the HTTPS Referer header. Similarly, for TripleLift and Criteo, no HTTPS response or ad creative was observed encoding the mis-specified page URL or domain, though they did each serve ad responses.

To summarize, various SSPs and exchanges received HTTPS requests with mis-specified page, domain, keyword, and/or brand safety parameters. Virtually all of these SSPs were observed returning ad bids and creatives in response. Only Rubicon, Index Exchange, and Pubmatic appeared to encode the mis-specified parameters in some part of the HTTPS response. Only Pubmatic and Index Exchange were observed actually inserting the mis-specified domain attributes into their ad creatives, including into the tracking and win pixels of the DSPs that had transacted upon those ad impressions. Both Pubmatic and Index Exchange are certified against fraud by TAG, and were named “TAG Trust Champions” on January 31st, 2022. Both state that they work with a number of invalid traffic and fraud detection vendors, including IAS, White Ops, Oracle Moat, and/or Pixalate.

Which DSPs are transacting on impressions with mis-declared bid URLs?

One can analyze header bidding responses to deduce which demand side platform (DSP) was used to purchase a given ad impression by a marketer. Many ad creatives are embedded with pixels that are used by DSPs to determine if an ad auction was won and an ad successfully rendered on a given page. For example, the Trade Desk DSP uses pixels served from ny1-bid.adsrvr.org/bid/feedback/ for win notifications. One can inspect header bidding responses and rendered ads, and if one observes that an ad iframe contains an <img> tag where the src attribute has the aforementioned ny1-bid.adsrvr.org/bid/feedback/ URL, one can likely deduce that the ad was purchased via the Trade Desk.

HTTPS request sent to a pixel used by The Trade Desk DSP, after an ad has rendered on the page. In this case, the user is on usatoday.com, but The Trade Desk’s pixel is encoding the mis-declared page URL that was sent to Index Exchange (lansingstatejournal.com).

As mentioned previously, when one carefully analyzes ad creatives returned by hbopenbid.pubmatic.com or htlb.casalemedia.com, one can identify which DSPs’ tracking and win pixels appear to encode the mis-specified page domain, rather than the actual page domain.

In the aforementioned hbopenbid.pubmatic.com bid responses, the following DSPs were observed bidding on impressions where the page URL, domain, brand safety, and/or contextual keywords were mis-specified in the bid request:

  • Amobee - DSP ID 22 - bid response included script tags from presentation-atl1.turn.com
  • MediaMath - DSP ID 27 - returned ad markup included script tags from tags.mathtag.com
  • Conversant - DSP ID 32 - included image tags from secure.cdn.fastclick.net and script tags from ad-usadmm.dotomi.com
  • Google Display & Video 360 - DSP ID 80 - included image, script and iframe tags from doubleclick.net
  • IQM Platform - DSP ID 1097 - included script and image tags from win.iqm.com and pxl.iqm.com, respectively
  • Criteo - DSP ID 97 - included iframe tags from ads.us.criteo.com/delivery/r/afr.php
  • Adobe Advertising Cloud - DSP ID 218 - included image tags and script tags from statsf-tm.everesttech.net and displayf-tm.everesttech.net, respectively
  • The Trade Desk - DSP ID 377 - included image tags from de1-bid.adsrvr.org/bid/feedback/pubmatic and click tags from insight.adsrvr.org
  • Verizon Media / Yahoo - DSP ID 452 - included script tags from pr-bucket.ybp.yahoo.com/ab/secure/true/imp
  • Dealer.com Automotive Digital Marketing Solutions - DSP ID 1165 - included script tags from pixe.esm1.net and iframe tags from adse.esm1.net

In several instances, the scripts and tracking or conversion pixels loaded from DSP-owned domains appeared to contain references to the mis-declared page data. For example, there were 374 observed instances where a Pubmatic-served bid response from the Adobe DSP included Javascript tags loaded from displayf-tm.everesttech.net/feed/placement.js, a domain owned and operated by Adobe Audience Cloud. In all 374 of these instances, the page_url parameter encoded the mis-specified page URL from the ortb2 configuration, rather than the actual page URL.

adobe-dsp-ad-tag.png

Example of a Javascript tag that was returned within the ad bid response from hbopenbid.pubmatic.com. This specific bid response came from DSP ID 218, which is Adobe Advertising Cloud’s DSP; everesttech.net is a domain operated by Adobe. The tag contains a parameter called page_url, which references a USA Today article. However, the user was actually reading an article on courierpress.com, not usatoday.com.

As another example, several HTTPS requests sent to hbopenbid.pubmatic.com were observed on indystar.com, wherein the domain and page parameters were set to lancastereaglegazette.com. Several bid responses came from DSP ID 32, which is likely Conversant Media’s DSP. In these bid responses, one can observe multiple Javascript tags from dotomi.com (another Conversant-operated domain) that contain a parameter called "btcurl". The btcurl parameter was set to www.lancastereaglegazette.com (the mis-specified page from the ortb2 parameter), rather than the actual indystar.com page URL.

epsilon-conversant-ad-creative.png

Screenshot of an ad bid response served from hbopenbid.pubmatic.com via DSP ID 32 (Conversant). This ad response was observed on indystar.com. The reportingUrl, reportingUrlDualStack, and messageUrl variables included references to another Conversant-owned domain, dotomi.com. Each of these resources had a different URL, but each encoded btcurl=www.lancastereaglegazette.com in its query string. lancastereaglegazette.com was the domain encoded in the ortb2 object.

Conversant is Verified by TAG and has achieved the TAG Certified Against Fraud seal in the past.
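The page_url and btcurl observations above amount to a simple check: pull the declared page out of a DSP tag's query string and compare its host with the page the user was actually reading. The Python sketch below is illustrative, not any vendor's actual method. The `PAGE_PARAMS` list is an assumption based only on parameter names mentioned in this post (page_url for Adobe, btcurl for Conversant, bidurl/bidUrl for IAS, di for White Ops/Human), and the helper names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names that carried the declared (bid) page in this post's
# examples. Vendors can rename or add parameters at any time.
PAGE_PARAMS = ("page_url", "btcurl", "bidurl", "bidUrl", "di")


def normalize_host(value):
    """Reduce a full URL or a bare hostname to a lowercase host without 'www.'."""
    if "//" not in value:
        value = "//" + value  # let urlsplit parse bare hosts like www.example.com
    host = (urlsplit(value).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def declared_hosts(tag_url):
    """Return the hosts found in known page parameters of a tag or pixel URL."""
    query = parse_qs(urlsplit(tag_url).query)  # parse_qs also percent-decodes values
    hosts = set()
    for param in PAGE_PARAMS:
        for value in query.get(param, []):
            host = normalize_host(value)
            if host:
                hosts.add(host)
    return hosts


def find_mismatches(tag_url, actual_page_url):
    """Declared hosts in the tag that differ from the page the user was on."""
    actual = normalize_host(actual_page_url)
    return {host for host in declared_hosts(tag_url) if host != actual}
```

For example, a dotomi.com tag carrying btcurl=www.lancastereaglegazette.com, observed on an indystar.com article, would yield {"lancastereaglegazette.com"} as a mismatch, while an everesttech.net tag whose page_url matches the actual page would yield an empty set.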

The Trade Desk and Google’s DV360 were observed being used by many advertisers, and bidding on auctions wherein the page URL submitted to Pubmatic or Index Exchange was mis-labeled. In the example below, one can observe a bid response from Google’s DV360 on behalf of Adidas, wherein the ad creative encodes the mis-declared page URL (“starcourier.com”) instead of the actual page URL (“usatoday.com”). Adidas’s agency of record is MediaCom, part of WPP’s GroupM.

dv360-pubmatic.png

Example bid response from Google’s DV360 in response to a Pubmatic ad auction. The user was on usatoday.com, but the creative tag includes references to the mis-declared page domain: starcourier.com

Both Google and The Trade Desk are Certified Against Fraud, according to TAG.

Xandr Invest DSP was also observed placing ad bids on Index Exchange and Pubmatic ad auctions wherein the page domain was mis-represented. For example, a user was reading indystar.com, while an ad auction request sent to htlb.casalemedia.com mis-labeled the page as austin360.com. Xandr Invest DSP returned a bid response on behalf of St. Mary’s Food Bank. The ad creative contained a Javascript tag whose src attribute referenced the mis-declared page URL (austin360.com). The Javascript tag did not have any references to the actual page the user was on (indystar.com).

Lastly, both MediaMath and Verizon Media’s Yahoo DSP were observed placing bids on ads wherein the page URL was mis-declared.

Which IVT and brand safety vendors received data about mis-labeled bid URLs?

Many media buyers rely on third party verification and measurement vendors to check if ads are being served properly. These vendors often assess whether an ad loaded in view or whether it was served to a real human user. Some of these vendors also help catch ad fraud.

One such company, Integral Ad Science (IAS), lists several types of ad fraud in their Ad Fraud 101 guide. The guide asks “So, what exactly is ad fraud?”, and lists several possibilities. One of these explainers says: “Serving ads on a site other than the one provided in an RTB request—this is known as domain spoofing”.

IAS-domain-spoofing-screenshot.png

Source: IAS_Ad-Fraud-101-Guide_UK.pdf

As mentioned previously, this study observed several thousand instances over the course of 9 months, wherein various Gannett websites appeared to mis-declare the page URL, domain, brand safety, and keyword data submitted to ad auctions run by certain SSPs.

IAS appears to have some kind of publisher-side integration with Gannett Media, as the IAS script (https://static.adsafeprotected.com/iasPET.1.js) appears to load on every observed page view of a Gannett Media website. This script loads in the top-level page frame, and its loading is initiated directly by the website’s initial HTML content.

Many brands also use IAS’s code within their ad creatives. Several hundred ad creatives loaded on Gannett Media pages with an IAS pixel in the ad markup. In one instance, an ad creative for Ally, a digital financial services company, rendered on usatoday.com. However, at that time, the page was being mis-declared as detroitnews.com.

The ad creative was served via Index Exchange, and was purchased via The Trade Desk DSP. 

The ad creative for Ally contained a Javascript tag for “pixel.adsafeprotected.com/jload”. This pixel appears to contain the mis-declared “detroitnews.com” value in the “bidUrl” query string parameter.

IAS-detroitnews-usatoday.png

In another instance, a reader was on usatoday.com, but the ad auction incorrectly encoded the page domain as desertsun.com. When an ad served on this page, it relayed the “bidurl” parameter to IAS’s pixel, with the mis-declared desertsun.com in the telemetry.

IAS-detroitnews-usatoday-2.png

Given that IAS may have been receiving telemetry both from the publisher-embedded Javascript and from the pixels within their clients’ ads, it is not clear whether the company noticed the discrepancy in the page URLs submitted in various ad auctions. A simple comparison of the “bidurl” parameter from their pixel with the other page data they receive would have been sufficient to detect an anomaly. If IAS did notice that the bidurls were being inaccurately submitted in certain ad auctions, it is not clear whether the company notified Gannett Media or any other stakeholders.
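To make the “simple comparison” concrete, here is a hedged Python sketch of how a vendor that receives both a publisher-side page URL and an in-ad bidurl for the same impression might flag and count discrepancies. It assumes the two values have already been joined per impression; the event shape and the `host_of` and `mismatch_report` names are hypothetical and do not describe IAS’s actual data model.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Lowercase hostname with a leading 'www.' removed; bare hosts are accepted."""
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def mismatch_report(events):
    """
    events: iterable of (page_url, bid_url) pairs, where page_url is what the
    publisher-side script reported and bid_url is the value relayed by the
    in-ad pixel for the same impression.
    Returns {actual_host: (mismatch_count, total_count)}.
    """
    totals, mismatches = Counter(), Counter()
    for page_url, bid_url in events:
        actual = host_of(page_url)
        totals[actual] += 1
        if host_of(bid_url) != actual:
            mismatches[actual] += 1
    # Counter returns 0 for hosts that never mismatched.
    return {host: (mismatches[host], totals[host]) for host in totals}
```

A per-domain mismatch rate is deliberately coarse: a publisher whose declared and actual domains disagree on a large share of impressions over weeks or months would stand out, even if individual discrepancies look like noise.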

IAS is triple TAG certified, including against fraud.

IAS-TAG-certified.png

Oracle Moat, another measurement vendor, also appeared to receive a large amount of telemetry over the course of 9 months that would have suggested that the submitted bid URLs did not match the actual page URLs. In one instance, an ad loaded on usatoday.com, but the page had been mis-labeled in the ad auction as lansingstatejournal.com. In this case, Moat’s px.moatads.com pixel appeared to relay both the correct and the mis-labeled page URL. If Moat detected this discrepancy, it is not clear whether they notified Gannett Media or any advertising partners.

oracle-moat-usatoday-lansingstatejournal.png

Moat is also “Certified Against Fraud” by TAG.

In other ad auction responses, pixels from s.acexedge.com and q.adrta.com (White Ops/Human and Pixalate, respectively) were observed. These pixels also relayed information about both the correct and the mis-declared page URLs, but it is not apparent whether these vendors notified anyone of the discrepancy. Both vendors are “Certified Against Fraud” by TAG.

white-ops-screenshot-photo.png

Screenshot of a White Ops/Human pixel being invoked from an ad purchased via Adobe’s DSP. The pixel relays the mis-specified page URL (wisfarmer.com) via the di query string parameter, while the user was actually reading an article on usatoday.com.

Brands, advertisers, & media agencies

Several hundred large brands and media agencies were observed bidding on and placing ads on Gannett Media sites when the ad auction contained a mis-declared page URL.

Ads from Nike, Sears, Johnson & Johnson, Procter & Gamble, GSK (Pepto Bismol), Facebook, General Motors, Adidas, Ford, Spotify, Ally, State Farm, Starbucks, Braven Health, Semrush, the New Jersey state government, and the Susan Komen Breast Cancer Foundation were observed being served in auctions with mis-declared page data.

Furthermore, media agencies such as IPG (Matterkind), Dentsu, Publicis (Epsilon), and WPP (Xaxis) were observed bidding on various ad auctions containing mis-declared page data.

Adalytics notified one media agency executive about this phenomenon, and asked the executive for a comment regarding how they perceived this situation. The media agency executive responded:

“I think there was a failure in the media supply chain. It creates an increased sense of caution and distrust for brands and buyers who cannot be sure their impressions are showing up where they're intended, and leads me to wonder why the companies who are paid to flag anomalies like these didn't. It proves that no matter how much tech we throw at these challenges there's still a very important role for human oversight. Put another way, we just can't trust the tech to solve these issues for us. There's always concerns about these environments due to those limitations. It's precisely why large media conglomerates can command significant portions of budget - because there's a level of credibility and trust. Depending on the media vendor in question it could severely damage that trust and likely lead to increased demands for guarantees.”

Conclusion

Caveats & Limitations

Interpreting the results of this observational study requires nuance and caution.

The study cannot deduce whether the page data mis-specifications observed on Gannett Media sites were intentional or accidental. It is theoretically possible that there is some unique purpose or requirement for the configurations observed. If anyone at Gannett would like to comment on the reason for the phenomenon observed in this study, their input would be appreciated.

The study relied entirely upon publicly accessible data; it made use of no proprietary or internal data on how ads are placed or paid for. Because the study was based entirely on client-side browser observations, it cannot determine whether any post-impression reconciliations or corrections take place. For example, if an anti-fraud vendor or exchange detects that an ad was served on a page where the bid URL and the actual page URL were not consistent, it could block any money from changing hands. Thus, even if a brand, SSP, DSP, or media agency was placing ads on Gannett Media sites where the page data was mis-specified, it is possible this had no monetary consequences for the brand.

If any of the agencies, brands, or ad tech companies discussed in this study would like to share further details about this phenomenon, please get in touch.

Lastly, this study observed the phenomenon of incongruent page data submitted to certain ad auctions for approximately 9 months (May 25th, 2021 to March 4th, 2022). It is not clear whether it started earlier, or why it ended in March. It may be the case that the configuration was an error, and someone noticed and corrected it.

Discussion

While many brands and marketers may not care which specific Gannett Media owned article or domain their ad rendered on, the practice of mis-declaring page URLs, contextual keywords, and brand safety values in header bidding requests could theoretically be used by publishers with whom an advertiser definitely does not want to associate. For example, it is ostensibly possible that an extremist, adult content, or otherwise “brand unsafe” publisher could leverage this tactic (in tandem with other exploits not publicly disclosed in this article) to extract ad spend from an advertiser who thinks they are placing their ads on a reputable website.

According to the IAB, “Domain spoofing fraud [..] is often challenging to detect and prevent. It is also one of the most lucrative types of fraud to perpetrate. With a simple line of code, fraudsters can change the URL of a site to make advertisers think lower quality sites (e.g., copyright infringement, gambling, pornography etc.) are reputable publishers.”

Notwithstanding the caveats and limitations discussed previously, the observations from this study of ad serving mechanics on Gannett Media owned websites suggest that several large ad tech vendors, including SSPs, DSPs, and IVT vendors, were receiving data that could have alerted them to discrepancies between the declared and actual page URLs (and brand safety values). Based on the Internet Archive, these bid URL inconsistencies persisted from at least May 25th, 2021 until March 4th, 2022, approximately 9 months.

It is unclear if or why these vendors chose not to act upon this anomalous data. In some cases, ad tech vendors may choose to block a website or app if they observe it mis-declaring attributes in bid requests. If that is the case, it does not appear that any of the ad tech companies mentioned in this study have chosen to “block” Gannett Media owned properties from serving ads.

Twitter-Exchange-Regarding-Spoofed-Bid-Requests.png

It is possible that the ad tech vendors noticed the bid URL discrepancies, and alerted other stakeholders privately, who then made an informed decision to continue placing ads on Gannett Media sites despite these discrepancies. It is also possible that the discrepancies have no material effect on media buyers’ decisions or strategies - for example, many advertisers may be equally comfortable with their ad rendering on a war-related news article on USA Today as a high school sports article on The Indianapolis Star.

However, another sobering possibility is that many of these aforementioned ad tech vendors receive telemetry on ad serving mechanics, but are intentionally or unintentionally not acting upon that information. Programmatic advertising can result in billions of ad impressions and terabytes of data being generated in a short amount of time. Processing such large volumes of data requires expertise in data warehousing, ETL, and big data analytics. Many vendors may not have the human capital or operational bandwidth to make such data engineering investments.

Twitter-Quote-Screenshot-1.png

Screenshot of a Tweet from an ad tech professional.

Many of the ad tech vendors discussed in this blog post can use their own internal log data to identify discrepancies such as bid URL mis-declaration. But what are media agencies, buyers, and marketers supposed to do to prevent such situations? Any marketer or media agency that relies purely on aggregate dashboards or spreadsheets would be limited in its ability to identify phenomena such as the page data mis-labeling described in this study. The Media Rating Council explains that Sophisticated Invalid Traffic (SIVT), including “domain and app misrepresentation”, “consists of more difficult to detect situations that require advanced analytics, multi-point corroboration/coordination, significant human intervention, etc., to analyze and identify.”

If a sophisticated media agency or in-house media buying team has access to log files from their DSP, ad server, and/or IVT vendor, such a media buyer can try to reconcile the log files to determine if there are any mismatches. For example, one brand was observed buying ads using the MediaMath DSP and with Oracle Moat pixels in their ad creatives. The media buying team at this brand could do a log file reconciliation to audit whether ad impressions that were supposed to have been bought from publisher A appeared on publisher B’s website after the ad creative was rendered and the tracking pixel loaded.
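A log file reconciliation of this kind boils down to a join. The Python sketch below is a minimal illustration under stated assumptions: it assumes both logs share a common impression identifier (in practice this usually has to be passed into the tracking pixel via macros), and the CSV column names `impression_id`, `bid_domain`, and `rendered_url` are hypothetical rather than any DSP’s or verification vendor’s actual export format.

```python
import csv
import io
from urllib.parse import urlsplit


def host_of(url):
    """Lowercase hostname with a leading 'www.' removed; bare hosts are accepted."""
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def reconcile(dsp_log_csv, pixel_log_csv):
    """
    Join a DSP log (impression_id, bid_domain) with a verification-pixel log
    (impression_id, rendered_url). Returns (impression_id, bought, rendered)
    tuples for impressions whose rendered domain differs from the domain bought.
    """
    # Index what the DSP believes it bought, keyed by impression ID.
    bought = {}
    for row in csv.DictReader(io.StringIO(dsp_log_csv)):
        bought[row["impression_id"]] = host_of(row["bid_domain"])

    mismatches = []
    for row in csv.DictReader(io.StringIO(pixel_log_csv)):
        imp = row["impression_id"]
        if imp not in bought:
            continue  # pixel fired for an impression absent from the DSP log
        rendered = host_of(row["rendered_url"])
        if rendered != bought[imp]:
            mismatches.append((imp, bought[imp], rendered))
    return mismatches
```

With real exports, the same logic would typically run in a data warehouse as a SQL join rather than in memory; the point of the sketch is only the comparison between the domain that was bought and the domain on which the ad actually rendered.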

Twitter-Screenshot-2.png

Many media agencies or advertisers may lack the data engineering or data science skills needed to perform such analyses. This potentially exposes their clients or brands to operational or reputational risks, particularly if domain mis-labeling is occurring on less reputable websites. Using the same techniques described above, Adalytics is helping advertisers, publishers, and media agencies analyze much larger volumes of log-level data to measure how often the page URLs on which ads render differ from those specified in bid requests.

Take away points & recommendations

  1. Gannett Media owned websites appear to have been mis-declaring page URLs, contextual targeting keywords, and brand safety values in header bidding requests from May 2021 until March 2022, yet many ad tech vendors appear not to have acted on this phenomenon, even though they received telemetry indicating a mismatch
  2. DSPs, SSPs, and IVT vendors should monitor their internal data more closely to identify potential anomalies
  3. Media buyers, agencies, and brands need to reconcile and audit their log files

👉 If you've made it this far, and you're interested in auditing log files to better understand your programmatic media buys, please reach out to me here or @kfranasz. 👈
