Using network analysis to identify related websites

In early August 2021, several ad tech experts on Twitter discussed the phenomenon of “made for advertising” inventory. These websites appear to serve large amounts of ads, but are structurally and editorially distinct from typical news media or blog websites. Many of these sites use interesting sounding headlines, such as “14 Rules That You Didn’t Know Amish Women Must Abide By”, and use the names of made-up writers or famous French philosophers as the listed authors.

Screenshot of a Twitter post from August 2021

This discussion was partially fueled by a research blog post published by publisher fraud detection firm DeepSee, which found that some websites appear to change their appearance and ad-serving behavior if a user arrives on those websites by clicking on an ad on a different website, rather than arriving through direct navigation. Some of the identified publishers appear to “buy traffic” by running sponsored ads via Taboola, Outbrain, Twitter, Facebook, and other channels to attract readers to their webpages. In some cases, more than 90% of the websites' traffic originates through these sponsored content widgets (as opposed to organic search or social media links).

Example of sponsored ads for itsthevibe.com and definition.org, observed on cnn.com

Marketing Brew, a marketing industry newsletter, reached out to Adalytics with a list of publisher websites that they suspected of being “made for advertising” sites. In this post, Adalytics will discuss how careful analysis of publicly available information about these websites, combined with network graph analysis, can be used to identify large numbers of ad-serving websites that were likely created by the same entities. Based on this analysis, two companies - Perion Networks and Pesto Harel Shemesh (also known as Crunchmind) - appear to operate many of these websites. Perion, through its subsidiaries, appears to operate at least 41 domains, whilst Pesto Harel appears to operate 88 websites.

This post will also discuss which media agencies, brands, supply side platforms (SSPs), and demand side platforms (DSPs) are placing ads on these websites.

In what ways can websites be connected?

Let’s say your goal is to understand whether two websites are owned or operated by the same entity. 

If you look at “theverge.com”, “sbnation.com”, and “vox.com”, the footer section at the bottom of each website clearly indicates that the websites are controlled by Vox Media LLC. “cosmopolitan.com”, “elle.com”, and “menshealth.com” are similarly clearly labeled as being owned by Hearst Media.

However, in some cases, publishers do not clearly label who owns or operates the website. Their contact pages are devoid of any information beyond a form or an anonymous email address. One can try to use the WhoIS database to look up to whom a domain is registered, but many websites choose to hide their ownership behind an anonymizing “privacy shield” to avoid having to deal with spam messages from entities who scrape WhoIS for lead generation.

In the absence of clearly identifying information, one can turn to more technical resources to try to see who operates a given website. For websites that host programmatic display ads, you could check a website’s “ads.txt” file. These publicly accessible files list who is authorized to sell and distribute digital ad inventory on a given website. For example, “nytimes.com/ads.txt” authorizes Google, Index Exchange, OpenX, and several other companies to serve ads on “nytimes.com”. Each of these supply side platforms (SSPs) or ad exchanges has a “sellers.json” file which enumerates to whom a given seller ID is registered. For example, in “openx.com/sellers.json”, one can see that the seller ID 536872336 is registered to the “New York Times Company”.
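The lookup described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used for this post: the ads.txt excerpt and the sellers.json document are hand-written stand-ins (only the OpenX seller ID and registered name are taken from the example above), and the function names are hypothetical.

```python
import json

def parse_ads_txt(text):
    """Return (ad_system_domain, seller_id, relationship) tuples from ads.txt text."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        # Skip variable declarations such as "contact=..." and malformed rows.
        if "=" in fields[0] or len(fields) < 3:
            continue
        records.append((fields[0].lower(), fields[1], fields[2].upper()))
    return records

def index_sellers(sellers_json_text):
    """Map seller_id -> seller record for one sellers.json document."""
    data = json.loads(sellers_json_text)
    return {str(seller["seller_id"]): seller for seller in data.get("sellers", [])}

# Hypothetical excerpt of a publisher's ads.txt file.
ads_txt = """
# illustrative excerpt, not the real file
openx.com, 536872336, DIRECT
contact=ads@example.com
"""

# Hypothetical, heavily trimmed stand-in for openx.com/sellers.json.
sellers_json = json.dumps({
    "sellers": [{"seller_id": "536872336", "name": "New York Times Company"}]
})

records = parse_ads_txt(ads_txt)
sellers = index_sellers(sellers_json)
for ad_system, seller_id, relationship in records:
    owner = sellers.get(seller_id, {}).get("name", "unknown")
    print(ad_system, seller_id, relationship, owner)
```

In practice each ad system's sellers.json would be indexed separately, since seller IDs are only unique within a single exchange.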

Screenshot of OpenX's Sellers.json file showing the nytimes.com seller ID

However, in many cases, ads.txt and sellers.json IDs are not specific enough to figure out who operates a given website. Some ads.txt files list seller IDs that are used by tens of thousands of different publishers; other times, the sellers.json files list a given seller ID as belonging to a shell company or a legal entity that does not exist. In many cases, sellers.json files incorrectly list intermediary ad networks as “publishers”, which can further increase confusion about who actually services a given seller ID.

In this sort of situation, you can try to create a network graph to see how a given publisher might be related to other publishers. Triangulating information offers more insight in aggregate than is possible by looking at individual websites. A network graph is a way of representing relationships between different entities. Facebook and other social media platforms use network graphs to model interpersonal relationships between people. Manufacturers use network graphs to model raw material supply chains.

When building a network graph, it is important to identify indicators that are both specific and informative. If Facebook tried to model social relationships just by looking at what foods people like, they would struggle to understand groups of friends or communities. Similarly, in trying to find clusters of related publisher websites, one must identify indicators that are unlikely to occur due to random chance and are indicative of some kind of shared relationship.

In this context, social media and analytics account IDs can be highly informative. Many websites use Google Analytics to track who visits their pages, and each Google Analytics installation has an account ID that designates who can access and review that data. For example, there are over 5,700 websites that share the Google Analytics account ID “UA-33523145-1”. This ID is used by the US federal government’s Digital Analytics Program, which is a unified Google Analytics account for US federal government agencies. The websites “gizmodo.com”, “theonion.com”, “jalopnik.com”, and “lifehacker.com” all share the Google Analytics ID “UA-142218-33”, as these websites are all owned by G/O Media Inc.
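As a rough illustration of how such IDs can be collected, the sketch below pulls “UA-” style identifiers out of a page's HTML with a regular expression and groups them by account prefix (the part before the final property suffix). The HTML snippet is a hypothetical example, and the pattern is an assumption about the typical shape of these IDs rather than an official grammar.

```python
import re

# Assumed shape of a Universal Analytics ID: UA-<account digits>-<property digits>.
UA_PATTERN = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")

def extract_ga_ids(html):
    """Return {account_id: {full property IDs}} for every UA-style ID in the HTML."""
    found = {}
    for match in UA_PATTERN.finditer(html):
        account_id = f"UA-{match.group(1)}"
        found.setdefault(account_id, set()).add(match.group(0))
    return found

# Hypothetical page source containing the G/O Media ID mentioned above.
sample_html = """
<script>
  ga('create', 'UA-142218-33', 'auto');
</script>
"""

print(extract_ga_ids(sample_html))
```

Grouping by account prefix matters because two sites can use different property suffixes while still reporting into the same account.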

Researcher Lawrence Alexander published an article in 2015 on Bellingcat showing how analytics IDs can be used for “site fingerprinting”.  

Analysts at Palo Alto Networks (a cybersecurity firm) and Stony Brook University showed in 2018 how analyzing shared Google Analytics and other IDs enabled them to identify phishing websites which were likely created by the same hacking groups.

A single shared Google Analytics ID does not necessarily imply shared ownership between sites - you need multiple distinct signals to corroborate that a website is likely to be operated by the same entity. Other specific indicators include:

  • Facebook Pixel account IDs
  • Twitter analytics account IDs
  • Google Adsense IDs
  • Taboola IDs
  • Outbrain IDs
  • Yahoo web analytics IDs
  • Comscore pixel IDs
  • Pinterest pixel IDs
  • Snapchat pixel IDs
  • Quantcast Easy Tag ID

Combining these account IDs, which are publicly accessible on any publisher site, with information from ads.txt files, allows one to create an interconnected network graph that can then be used to identify shared ownership across many websites. This combined graph analysis approach can yield insights which are difficult to obtain by looking at websites individually.
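The clustering idea can be sketched without any graph library. In the example below, two sites are linked only if they share at least two distinct IDs, reflecting the point above that a single shared ID is not enough, and clusters are the connected components of the resulting graph. The threshold, the function name, and the two “.example” sites are assumptions for illustration; the other IDs are the ones reported later in this post for the Perion cluster.

```python
from collections import defaultdict
from itertools import combinations

def cluster_sites(site_ids, min_shared=2):
    """Group sites into clusters; linked sites share at least min_shared IDs."""
    # Invert the mapping: identifier -> sites that use it.
    sites_by_id = defaultdict(set)
    for site, ids in site_ids.items():
        for identifier in ids:
            sites_by_id[identifier].add(site)

    # Count how many identifiers each pair of sites has in common.
    shared = defaultdict(int)
    for sites in sites_by_id.values():
        for a, b in combinations(sorted(sites), 2):
            shared[(a, b)] += 1

    # Keep only edges backed by enough independent signals.
    neighbours = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components via depth-first search.
    clusters, seen = [], set()
    for site in site_ids:
        if site in seen:
            continue
        stack, component = [site], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        clusters.append(component)
    return clusters

site_ids = {
    "magellantimes.com": {"ga:UA-178993256", "taboola:1240533", "yahoo:10122972"},
    "eliteherald.com": {"ga:UA-178993256", "taboola:1240533"},
    "zenherald.com": {"ga:UA-178993256", "taboola:1240533"},
    "site-a.example": {"ga:UA-0000001", "taboola:999"},  # hypothetical
    "site-b.example": {"ga:UA-0000001"},                 # hypothetical, one shared ID only
}

clusters = cluster_sites(site_ids)
for cluster in clusters:
    print(sorted(cluster))
```

Prefixing each identifier with its source (“ga:”, “taboola:”) keeps numerically identical IDs from different vendors from being mistaken for the same signal.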

Analyzing specific publishers

As mentioned above, the newsletter Marketing Brew reached out to Adalytics for help in trying to figure out who operates a list of specific publisher websites. Listed below are insights gleaned through a careful examination of public data, including using network graph techniques.

MagellanTimes.com and EliteHerald.com - Perion Networks cluster

We will start by analyzing “magellantimes.com”.

If a user arrives on “magellantimes.com” by first clicking on a sponsored content ad from Taboola or Outbrain, the structure of the website changes drastically. This phenomenon was previously documented in a research post by DeepSee, which dubbed it “misleading content formats”. DeepSee’s researcher stated that “the arbitrage specialists design sites in such a way that advertising analysts who click around their home page wouldn’t find anything objectionable.”

Screenshot of “magellantimes.com”, when reached via direct browser navigation. Few to no ads render.

Screenshot of “magellantimes.com”, when reached by clicking on an Outbrain ad located on another website. The site shows a high density of ads.

In the past, if a user visited “magellantimes.com”, the website appeared to use the Google Analytics ID “UA-43750835-36”. At least seven other websites use the same Google Analytics account ID (“UA-43750835”), including “petsfanatic.com”, “totalpast.com”, “guerillainsurance.com”, “autoinquirer.com”, “happinesstimes.com”, “nextrefinance.com”, and “unleashedfinance.com”.

As of August 2021, “magellantimes.com” switched to using a different Google Analytics ID - “UA-178993256-6”. This alternative Google Analytics ID appears on at least 13 other websites, including “eliteherald.com”, “equitymirror.com”, “zenherald.com”, “pawszilla.com”, and “opulentexpress.com”.

One can also observe that “magellantimes.com” uses the Taboola tracking pixel ID “1240533”, which appears on at least 28 websites, such as “eliteherald.com”, “zenherald.com”, “affluenttimes.com”, and “historicalpost.com”.

The site also uses a specific Yahoo tracking pixel ID: “10122972”, which appears on many of the same websites.

Furthermore, “magellantimes.com” appears to load a custom tracker script that mentions “Pubocean”, a company which will be discussed below.

Comparing various shared analytics IDs and ads.txt seller IDs from various websites reveals a clustering pattern that connects “magellantimes.com” to “scribol.com” and several dozen other publisher websites.

Network diagram showing sites related to magellantimes.com

A manual inspection of these websites confirms that many of them appear to be visually similar in layout.

Websites with a similar visual appearance to “magellantimes.com”

“Magellantimes.com” lists in its footer section that it is a property of “Battery Media Group”, a limited company registered in the same office building in London as several of the other limited companies identified in this study. 

A quick search for “Battery Media Group” reveals some results from LinkedIn, which suggest that the company is a part of Perion Networks.

Screenshot of Google search results for

Battery Media Group’s website indicates that it also operates eliteherald.com, pawszilla.com, zenherald.com, historicalpost.com, affluenttimes.com, and atlanticmirror.com.

Screenshot of the Battery Media Group's website

The Battery Media Group website does not list certain publisher websites that appear in the above network graph but which, upon manual inspection, also indicate in their footer sections that they are owned by Battery Media Group. These include sites such as “equitymirror.com”, “opulentexpress.com”, and “historicalgenius.com”.

Screenshot of a footer from eliteherald.com

One of the connected websites, “scribol.com”, is of particular interest. A quick search shows that Scribol was acquired by, or is linked to, PubOcean. The PubOcean website states that “Pub Ocean is founded by a team of people who built Scribol.com, a top-200 Alexa site, from scratch. Today, Scribol is one of the largest and most profitable publishers in the world.” LinkedIn lists Scribol Publishing as a PubOcean company.

Screenshot from LinkedIn showing Scribol Publishing

PubOcean in turn was acquired by a publicly traded company called Perion Network Ltd ($PERI) in July 2020. Perion Network also acquired another adtech platform called ContentIQ (CIQ).

The PubOcean acquisition press release states that: “Pub Ocean will be integrated into CIQ with the express intent of building a “new media supply chain.” As a leading digital publishing orchestration system, driven by proprietary data algorithms that enable brands to scale to highly relevant audiences, CIQ stands to immediately benefit from Pub Ocean’s automated recommendations and analytics which sit between brands and channels in the “media supply chain.”

Ads.txt and Sellers.json analysis

Based on the analytics IDs, network graph analysis, and some simple searching, we have a list of websites and companies that are thought to be related, such as “magellantimes.com”, “eliteherald.com”, “Battery Media”, “Scribol Publishing”, and Perion Networks.

This information can be corroborated by analyzing the ads.txt files placed on each of these domains, as well as cross-referencing the various company names with various sellers.json files located on domains of ad exchanges.

Although there are again multiple direct seller IDs that are shared across tens of thousands of publishers’ ads.txt files, one can filter these IDs by (low) prevalence or for IDs that match some of the aforementioned corporate names. The subset of less common IDs is specific to “magellantimes.com” or the other related websites identified through clustering analysis.
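A prevalence filter of this kind could look roughly like the following. The data is entirely hypothetical (the “.example” domains and the seller IDs are placeholders), and the cut-off for what counts as “low prevalence” is an assumption that would need tuning against a real ads.txt crawl.

```python
from collections import Counter

def rare_seller_ids(ads_txt_by_domain, target_domain, max_domains=50):
    """Return the target's (ad_system, seller_id) pairs listed by at most max_domains domains."""
    prevalence = Counter()
    for pairs in ads_txt_by_domain.values():
        prevalence.update(set(pairs))  # count each pair once per domain
    return {
        pair: prevalence[pair]
        for pair in set(ads_txt_by_domain[target_domain])
        if prevalence[pair] <= max_domains
    }

# Hypothetical crawl results: domain -> (ad system, seller ID) pairs from its ads.txt.
ads_txt_by_domain = {
    "site-a.example": [("exchange-a.example", "1001"), ("exchange-b.example", "77")],
    "site-b.example": [("exchange-a.example", "1001")],
    "site-c.example": [("exchange-a.example", "1001"), ("exchange-b.example", "78")],
}

rare = rare_seller_ids(ads_txt_by_domain, "site-a.example", max_domains=1)
print(rare)
```

Here the widely shared ID “1001” is filtered out, leaving only the ID that is specific to the site of interest.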

Airtable showing seller IDs listed on the "magellantimes.com/ads.txt" file. Each seller ID has been cross-referenced against other ads.txt and sellers.json files, to determine how many other publishers share that specific ID.

For example, the TripleLift seller ID 6739 is listed as belonging to Pub Ocean, and only appears on “magellantimes.com”. TripleLift has 25 seller IDs listed in its sellers.json file mentioning “Pub Ocean”. Each of these TripleLift IDs is specific to one of the websites identified in the above clustering analysis.

Playbuzz, Taboola, Pubmatic, Nativo, Sonobi, ShareThrough, and Sovrn all have Seller IDs that list Pub Ocean, Battery Media Group, or Hexagram Advertising Exchange (apparently a former name of PubOcean).

In total, by analyzing Sellers.json files from Pubmatic, Google, Index Exchange, OpenX, AppNexus, and Taboola, we can see that there are at least 41 domains which seem to share a large number of ads.txt IDs that are registered to Pub Ocean, Perion, Hexagram Advertising, or Scribol Publishing.

Airtable showing domains whose ads.txt files listed seller IDs that are registered to Perion Networks, Pub Ocean, Hexagram Advertising, or Scribol Publishing, according to various Sellers.json files.

Robots.txt analysis

The “magellantimes.com/robots.txt” file somewhat strangely disallows Googlebot and BingBot crawlers from indexing any part of the website except for the “ads.txt” file. This prevents the “magellantimes.com” articles from appearing in Google search results. All user agents are also disallowed from indexing any images on the website.

Screenshot of the file from

As with “magellantimes.com”, the “eliteherald.com/robots.txt” file instructs Googlebot and BingBot not to index the webpage. Furthermore, the “eliteherald.com” pages contain inline meta tags with the “noindex” indicator, which Google treats as instructions not to index a given webpage. 
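Detecting such a meta tag can be automated with Python's standard-library HTML parser. The sketch below is an assumption about how one might check a page, not the method used for this post; the class and function names are hypothetical, and it only looks at “robots” and “googlebot” meta tags.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html):
    """Return True if the page carries a noindex robots meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical page head used only to exercise the function.
print(has_noindex('<head><meta name="robots" content="noindex, nofollow"></head>'))
```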

Screenshot showing the source code on

In total, there appear to be 32 domains in this cluster of websites whose robots.txt files disallow search engine crawlers from indexing any content beyond the ads.txt files.
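A check like this can be run with Python's built-in robots.txt parser. The robots.txt content below is a hypothetical file that mirrors the pattern described above (ads.txt allowed, everything else disallowed for Googlebot and Bingbot); it is not a copy of any real site's file. Note that Python's parser applies the first matching rule, whereas Google applies the most specific one, so the Allow line is listed first to make both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

def crawler_access(robots_txt, user_agent, paths):
    """Return {path: allowed?} for one crawler against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}

# Hypothetical robots.txt following the pattern described in this section.
robots_txt = """\
User-agent: Googlebot
Allow: /ads.txt
Disallow: /

User-agent: Bingbot
Allow: /ads.txt
Disallow: /
"""

googlebot_access = crawler_access(robots_txt, "Googlebot", ["/ads.txt", "/some-article"])
print(googlebot_access)
```

Running the same function over every domain in a cluster gives a quick count of how many sites hide their articles from search crawlers while leaving ads.txt reachable.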

Airtable illustrating websites whose robots.txt files disallow Googlebot or Bingbot crawlers.

Thedaddest.com and Unpasted.com - Pesto Harel cluster

Next, we will take a look at “thedaddest.com”, which was also observed running ads on other websites such as Twitter to attract readers.

Screenshot showing a Twitter ad for thedaddest.com

“Thedaddest.com” is behind a WhoIS privacy shield, so it is not possible to directly identify the site’s owner from the WhoIS record.

Similarly, the “About us”, privacy, and contact pages do not provide any concrete information about the site’s owner, listing only an anonymous contact email. However, if one visits the “robots.txt” file for the site (“thedaddest.com/robots.txt”), one can see that there is a specific page URL, the “impressum” page, that is disallowed for all crawlers.

An “impressum” is a “legally mandated statement of the ownership and authorship of a document” that is required in certain contexts in Germany, Austria, and Switzerland. If one navigates manually to “thedaddest.com/impressum”, one can observe that the site appears to be registered to “Pesto LTD”. 

Furthermore, if one compares Google Analytics, Facebook Pixel, Taboola, and Comscore pixel IDs from “thedaddest.com”, it is possible to identify a cluster of websites that appear to share many of the same IDs. For example, the Google Analytics ID “UA-111799310” and Yahoo Pixel ID “10007617” appear on a number of other websites such as “bridesblush.com” and “drivepedia.com”.

Triangulating various analytics IDs across a swath of publisher websites reveals the following network clusters.

If one monitors the OpenRTB Supply Chain objects emitted from “thedaddest.com” in HTTPS requests sent to “primis-d.openx.net”, one can observe the Seller ID “27734” from primis.tech is used on the website.
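For readers who want to inspect these objects themselves, the sketch below extracts the supply chain nodes from a captured OpenRTB request body. The payload is a hand-written, hypothetical example (only the “primis.tech” / “27734” pair comes from the observation above), and the code assumes the object sits at either “source.schain” or “source.ext.schain”, which are the two locations commonly used across OpenRTB versions.

```python
import json

def schain_nodes(bid_request_json):
    """Return (asi, sid) pairs from the supply chain object of an OpenRTB request."""
    request = json.loads(bid_request_json)
    source = request.get("source", {})
    # Assumed locations: source.schain, or source.ext.schain in older integrations.
    schain = source.get("schain") or source.get("ext", {}).get("schain") or {}
    return [(node.get("asi"), node.get("sid")) for node in schain.get("nodes", [])]

# Hypothetical captured request body, trimmed to the relevant fields.
captured_request = json.dumps({
    "id": "hypothetical-request-id",
    "source": {
        "ext": {
            "schain": {
                "ver": "1.0",
                "complete": 1,
                "nodes": [{"asi": "primis.tech", "sid": "27734", "hp": 1}],
            }
        }
    },
})

nodes = schain_nodes(captured_request)
print(nodes)
```

Each node's “asi” names the advertising system whose sellers.json file should contain the “sid”, which is what makes the next lookup possible.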

If one checks the “primis.tech/sellers.json” file, this specific seller ID (27734) is registered to “Pesto Harel Shemesh LTD”, whose domain is listed as either “crunchmind.com” or “pubplus.com”.

Cross-checking the Seller IDs from the “thedaddest.com/ads.txt” file reveals that, although some of the direct IDs are shared across tens of thousands of publisher websites, there is a subset that are assigned to “Pesto Harel Shemesh”, “Crunchmind”, or “PubPlus”, and can be seen in use by many of the websites identified in the above clustering analysis.

Airtable showing seller IDs from "thedaddest.com/ads.txt", as well as how commonly those IDs appear on other publisher websites.

“Unpasted.com” is similarly hidden by a WhoIS privacy shield that does not disclose the owner of the website. Clustering on analytics IDs shows that the site shares many IDs with websites similar to “thedaddest.com”. The “unpasted.com/robots.txt” file has a similar structure, disallowing crawlers from indexing the “/impressum/” page, which mentions “Pesto LTD”.

Searching across multiple sellers.json files for seller IDs that mention “Pesto Harel”, “Crunchmind”, or “Pubplus” reveals a set of IDs that are listed on 88 ads.txt files. Many of these domains share common analytics IDs.
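The two-step search described here (find matching seller IDs, then find the domains listing them) can be sketched as follows. The input structures and function names are assumptions, the “unrelated” seller and domain are hypothetical, and the ads.txt entry for “thedaddest.com” is illustrative rather than a verified copy of its file.

```python
def find_seller_ids(sellers_by_exchange, keywords):
    """Return {(exchange, seller_id)} whose name or domain mentions any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    matches = set()
    for exchange, sellers in sellers_by_exchange.items():
        for seller in sellers:
            haystack = f"{seller.get('name', '')} {seller.get('domain', '')}".lower()
            if any(keyword in haystack for keyword in lowered):
                matches.add((exchange, str(seller["seller_id"])))
    return matches

def domains_listing(ads_txt_by_domain, seller_ids):
    """Return the domains whose ads.txt lists at least one of the given seller IDs."""
    return sorted(
        domain for domain, pairs in ads_txt_by_domain.items() if seller_ids & set(pairs)
    )

# Trimmed sellers.json entries keyed by exchange; the second seller is hypothetical.
sellers_by_exchange = {
    "primis.tech": [
        {"seller_id": "27734", "name": "Pesto Harel Shemesh LTD", "domain": "crunchmind.com"},
        {"seller_id": "1", "name": "Unrelated Publisher", "domain": "unrelated.example"},
    ]
}

# Illustrative ads.txt pairs per publisher domain.
ads_txt_by_domain = {
    "thedaddest.com": [("primis.tech", "27734")],
    "unrelated.example": [("primis.tech", "1")],
}

pesto_ids = find_seller_ids(sellers_by_exchange, ["Pesto Harel", "Crunchmind", "Pubplus"])
print(domains_listing(ads_txt_by_domain, pesto_ids))
```

Matching on both the name and the domain field helps catch entries where the registered company name differs from the brand name.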

List of publisher domains whose ads.txt files list various Pesto Harel related seller IDs.

Adventurecrunch.com - Pure Ventures Media

Next we will examine “adventurecrunch.com”.

The landing page of the website does not offer any immediate clues as to its owner or operator. The footer of the page includes links to the Privacy Policy and Contact pages.

The Privacy Policy page does not offer much insight as to who owns the site. It states that the website belongs to “Adventure Crunch LLC”, but searching for this company name in Google or Open Corporates does not yield any meaningful results.

The Contact page is similarly vague or anonymized.

The next logical step is to try to do a WhoIS database lookup to see who the domain “adventurecrunch.com” is registered to. However, this also does not yield much information as the domain is hidden behind a privacy shield.

One can try to analyze the publisher’s ads.txt file for “direct” seller IDs for the page to determine who gets paid for ad inventory, but this also does not prove to be a very informative endeavor. Many of these direct seller IDs are shared by thousands of different publishers, or are hosted by intermediaries. Cross-referencing the data with various “sellers.json” files shows that the Google direct seller ID pub-9095842310258311 is also registered to “Adventure Crunch LLC”.

In the absence of any clear indicators from the website itself, WhoIS records, or ads.txt and sellers.json files, one can then try to look at technographic indicators about the website’s structure to try to infer who created it.

Reviewing the source code of the website shows that “adventurecrunch.com” uses a specific Google Analytics account ID: UA-56751774-1.

However, in this case, it does not appear that there are other websites using the same Google Analytics account ID as “adventurecrunch.com”.

The Taboola analytics script ID for tracking page views on “adventurecrunch.com” is 1006564. This exact same Taboola ID also appears on at least 7 other websites, such as “yeahmotor.com”, “militarymachine.com”, “rushcrunch.com”, and “thegrizzled.com”.

“Adventurecrunch.com” was also observed using the Google Adsense ID “ca-pub-6897902191714833”. This Adsense ID is also utilized by six other websites, including “yeahmotor.com”, “thegrizzled.com”, and “mentertained.com”.

By cross-referencing various ads.txt, analytics, and sellers.json IDs, it appears that the seven websites in this cluster are all operated by Pure Ventures Media, which may also be an alias for Belle Modern Inc.

Itsthevibe.com - Spine Media

Sephora, Clif Bar, Rackspace, and Fiverr ads loading on “itsthevibe.com”

As with the previous websites of interest, “itsthevibe.com” is behind a WhoIS privacy shield. However, the “contact us” page lists a link to “Spine Media”. The Spine Media LLC website describes the company as an “audience growth technology company headquartered in New York City.” The page confirms that they operate “itsthevibe.com”, as well as “standardnews.com”, “definition.org”, “yourbump.com”, and “yourdailydish.com”. Several of these websites share analytics and ads.txt Seller IDs. For example, the OpenX seller ID 537117644 is listed as belonging to Spine Media LLC and appears on 5 websites' ads.txt files.

“itsthevibe.com” loads a custom JavaScript tag called “https://static.itsthevibe.com/wp-content/themes/genesiscoreapp/Assets/js/display-updated.js?ver=2.87:formatted” that instructs the website to automatically refresh ad slots approximately every 30 seconds. This can result in a significant number of ad impressions being generated per page view session. Several other websites identified in this cluster also load a similar JavaScript tag which appears to trigger automated ad refreshes.
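To give a sense of scale, the arithmetic can be worked through with a small helper. The slot count and session length below are purely hypothetical inputs chosen for illustration; only the roughly 30-second refresh interval comes from the observation above, and the model assumes every slot refreshes on every interval.

```python
def impressions_per_session(ad_slots, session_seconds, refresh_seconds=30):
    """Initial render of every slot plus one refresh of each slot per full interval."""
    refreshes = session_seconds // refresh_seconds
    return ad_slots * (1 + refreshes)

# Hypothetical page with 5 ad slots viewed for 3 minutes: 5 * (1 + 6) = 35 impressions.
print(impressions_per_session(ad_slots=5, session_seconds=180))
```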

Obsev.com - Shandy Media

Lastly, a WhoIS lookup on “obsev.com” reveals that the domain is registered to Shandy Media, Inc. of California. The footer of the website describes it as belonging to “Duke Digital Group.”

The site shares analytics IDs with three other publishers - “whatsthat.com”, “stereotude.com”, and “madhistory.com”.

These sites have several direct seller IDs listed with major SSPs or exchanges such as Google, Pubmatic, Index Exchange, TripleLift, GumGum, and Rubicon Project. Many of these are listed in the respective sellers.json files as belonging to Duke Digital, Shandy Media, or Obsev. Index Exchange record 187273 indicates that Duke Digital Group was formerly Obsev and Shandy Media.

Airtable listing various seller IDs from "obsev.com/ads.txt", as well as how commonly those IDs appear on other websites.

Which brands were advertising on these sites?

As one can tell from the previous sections of this blog post, many of the examined websites list Seller IDs for major SSPs and ad exchanges in their ads.txt files. But are there ads being actively served through these paths from Pubmatic, TripleLift, Index Exchange, Rubicon, or OpenX?

Using data gathered through the Adalytics audience research panel, wherein individuals crowd-source ad impression and CPM bidding data, one can observe that many major brands were placing their ads on these sites through various major advertising platforms.

For example, the American restaurant chain IHOP was observed serving its ads on “petsdetective.com” 64 times. Red Robin’s ads were observed a total of 60 times across “zenherald.com”, “drivepedia.com”, “magellantimes.com”, and “petsfanatic.com”.

Retailers Safeway and Walgreens were observed serving their ads 108 and 104 times (respectively) across these clusters of websites, including “zenherald.com” and “petsdetective.com”.

Other major brands serving ads across the aforementioned clusters of websites include California Cryo sperm bank, Nike, SimilarWeb, TransUnion, Air France, Norton, Progressive, Samsung, The North Face, E-Trade, Paramount Plus, and the US Federal Emergency Management Agency.

By analyzing ad clickthrough URLs and UTM query strings, one can observe that California sperm bank and Legal Zoom were using the Google DV360 DSP to place ads on “itsthevibe.com”. Security firm Code42 was observed placing ads on “equitymirror.com”, likely via AppNexus.

The restaurant Chili’s was observed running ads on “petsdetective.com”, where the ad clickthrough URL mentions “360i_Exponential”. This is likely a reference to 360i, a digital advertising agency under holding company Dentsu. Chili’s selected 360i as its media agency of record in 2017.
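Pulling campaign parameters out of a clickthrough URL is straightforward with Python's standard library. The URL below is a made-up example: the landing domain, the parameter names, and the placement of “360i_Exponential” in “utm_source” are assumptions for illustration, since real clickthrough URLs vary by advertiser and ad server.

```python
from urllib.parse import parse_qs, urlsplit

def campaign_params(click_url):
    """Return the utm_* query parameters of a clickthrough URL as a dict."""
    query = parse_qs(urlsplit(click_url).query)
    return {
        key: values[0] for key, values in query.items() if key.lower().startswith("utm_")
    }

# Hypothetical clickthrough URL.
click_url = "https://brand.example/landing?utm_source=360i_Exponential&utm_medium=display&id=42"

params = campaign_params(click_url)
print(params)
```

Agency or DSP names embedded in such parameters are hints rather than proof, so they are best read alongside the other signals discussed in this post.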

By analyzing crowd-sourced programmatic header bidding data, Adalytics can also observe which ad networks are actively returning bids and how much advertisers are bidding or paying for ad slots on this cluster of websites.

For example, PubMatic was observed serving 132 ad bids across the cluster of websites, including from brands such as Marriott Hotels, TransUnion, Kay Jewelers, and Citibank. TransUnion was observed bidding up to $3.45 CPM on “obsev.com” via PubMatic and districtmDMX. Excellent Resorts was observed bidding $2.20 CPM on “obsev.com” via Index Exchange.

This data highlights that many major SSPs and ad exchanges, including Pubmatic, TripleLift, IndexExchange, OpenX, AppNexus, and Google AdX appear to serve ads to the aforementioned clusters of websites. Major DSPs such as Google DV360 and AppNexus, as well as media holding companies such as Dentsu, Havas, Horizon Media, IPG, Publicis Groupe, and WPP, were observed buying ads on these domains.

Assessing ad recall & brand awareness

Adalytics is running an ongoing audience research study with volunteers who use a browser plugin to record what ads they were shown. These volunteers are occasionally surveyed to assess ad recall and brand awareness. If a given user was shown an ad from a particular brand over one hundred times, a marketer might hope that the consumer is at least superficially aware of the brand's existence.

Four Adalytics volunteers had been served ads on some of the publisher websites listed in the previous sections of this article. Those volunteers were surveyed with various questions related to the ads that were shown to them specifically on those publisher sites.

One user was shown over 92 distinct ad impressions for an event at the New York Botanical Gardens (NYBG). When asked if the user recalled being shown any ads for local outdoor events, or any events at the NYBG, the user said they did not recall any such ads nor were they aware of this promoted event.

Another user was shown 116 ad impressions for a local, healthy snack brand that sells lentil, carrot, and bean chips. When asked to name all of the chip brands that this user was aware of, the user did not name this particular brand.

Whilst this is a small sample size, these observations are directionally concordant with the notion that websites with a high visual density of auto-refreshing ads lead to poor ad recall or brand awareness.

Conclusion

Caveats & Limitations

This analysis relied on ads.txt and sellers.json file data provided by well-known.dev in August 2021, and thus may be subject to change. Much of the analytics ID data, such as for Google Analytics and Facebook Pixel IDs, was provided as a static dataset from DeepSee.io. As such, many of the insights or analyses conducted in this study may change over time. Furthermore, the analytics and ads.txt Seller IDs used to perform network analyses may be shared between different websites not due to common ownership, but for other spurious reasons. For example, sometimes technology integrators or third party vendors may choose to use the same IDs for a set of websites they help manage, despite the websites having disparate ownership. Therefore, it is possible that there are false positives with regards to website ownership and clustering.

Discussion

Many of the websites identified through analytics ID network analysis appear to be related and are likely operated by two organizations: Perion Networks (including subsidiaries PubOcean and ContentIQ) and Pesto Harel Shemesh. The first cluster of sites includes at least 41 domains, and the second appears to have 88 domains. In some cases, checking a website’s footer section or cross-referencing ads.txt and sellers.json seller IDs is sufficient to triangulate who is the operator of a website. But in other cases, such as “adventurecrunch.com”, orthogonal data may be needed to help understand a website’s ownership.

Streamlining information about site ownership could be mutually beneficial for the owners of the websites described in this analysis, as well as for media buyers. If a programmatic media buyer desires to place ads on as many Perion Network or Pesto Harel Shemesh websites as possible, having clearer information about their subsidiaries and site lists could facilitate that. Conversely, if a programmatic media buyer wants to disable ad targeting for sites owned by these networks, in some cases they would need to identify and disable a large number of Seller IDs which appear to be registered to individual entities or domains. Many of the websites that were identified in this analysis use "standalone" ads.txt Seller IDs that are created specifically for those individual sites.

For example, TripleLift’s sellers.json file indicates it has 25 different seller IDs, where each ID represents one individual or specific PubOcean publisher domain. “Adventurecrunch.com” lists Google AdX seller ID pub-9095842310258311 as a direct seller ID, and is the only website to use this ID. The owner of the ID is listed as “Adventure Crunch LLC”.

Furthermore, some of the websites identified in this analysis appear to be using Seller IDs that are shared across more than 10,000 websites, so it would be difficult for a programmatic media buyer to block those seller IDs without also potentially eliminating access to a large number of other sites. For example, “eliteherald.com” and several of the other sites listed the AppNexus seller ID 10062 as a “direct” seller ID. There are at least 14,541 different publishers which list this AppNexus seller ID as a “direct”. However, the AppNexus sellers.json file indicates that ID 10062 is assigned to Hive Media Video and “avantisvideo.com”. Furthermore, the AppNexus sellers.json file lists Hive Media Video as a “publisher” rather than as an “intermediary”. It is unclear why AppNexus considers Hive Media a publisher, if Hive Media and Avantis Video are the same company, and why 14,541 different publishers, including many identified in this analysis, list seller ID 10062 as a “direct” seller ID.

As explained in DeepSee’s blog post, many of these websites have some unusual behavior patterns with regards to ad serving. To further augment those observations, it is unclear as to:

  • why many of these publishers use WhoIS record privacy shields that make it difficult to determine the site’s ownership
  • why the structure of the websites appears to change depending on which source a site visitor is arriving from, such as why “magellantimes.com” or “eliteherald.com” appear to have different layouts and ad-serving behavior when a user visits through an Outbrain ad
  • why some of the publisher websites, such as “eliteherald.com”, are disallowing GoogleBot and BingBot crawlers in their robots.txt files, or why they are disallowing all the image assets on their websites from being indexed by Google Image search. This has the effect of making it hard to find content on these websites through organic Google search

This study examined the ad serving behavior on the various identified publisher sites, and found that:

  • Major DSPs such as AppNexus and Google DV360 appear to be enabling ad serving on these websites
  • Several large SSPs and ad exchanges, such as Pubmatic, TripleLift, IndexExchange, OpenX, AppNexus, Sovrn, and Google AdX appear to serve ads on these sites
  • Major media holding companies like Dentsu are buying ads for their clients on these websites
  • The programmatic ad bids for some of these sites range up to $3.46
  • Many advertisers, such as FEMA, IHOP, Progressive, Samsung, Nike, Air France, and The North Face were serving ads and/or bidding on ad slot inventory on these sites.

Many advertisers buy programmatic ad slots with the hopes of building brand awareness. However, the high density of ads and ad refresh rates on some of the aforementioned publisher websites may make it difficult for consumers to notice or remember any of these ads. In light of this, marketers may wish to negotiate guidelines with their media agencies, DSPs, SSPs, and other ad tech suppliers to delineate what kind of ad inventory they think is consistent with their brand building goals.
