
BuzzFeed, the internet culture / news / listicle website, is on a roll. Its ability to find interesting content on the internet is the envy of the media world. Last week was a case in point: BuzzFeed found a picture of a dress on Tumblr, recognized its viral potential, and published it in a very shareable format. That post alone generated over 38 million views. But perhaps most notably, it touched off a firestorm of other media companies writing derivative articles about “The Dress.” From Wired to the New York Times, everyone jumped on the BuzzFeed-initiated bandwagon.

Since it appears that every website in the world wants to replicate the BuzzFeed content-finding machine, we thought we’d analyze where BuzzFeed actually gets its content.

We wrote a simple crawler (see code) using the Priceonomics Analysis Engine, our tool that makes it easy to crawl and analyze web data, and then analyzed which sources BuzzFeed credits for the images in its articles. In the last year, BuzzFeed published about 69,000 articles, included 830,000 images/videos with attribution strings, and linked to approximately 74,000 distinct sources.
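To give a feel for the counting step, here is a minimal sketch in Python. It is not the Priceonomics crawler: the `attribution` CSS class, the sample HTML, and the function names are all hypothetical, and a real crawl would fetch live article pages instead of reading hard-coded strings. It only shows one plausible way to pull attribution links out of a page and tally them by domain.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class AttributionLinkParser(HTMLParser):
    """Collect hrefs from <a> tags that carry a (hypothetical) 'attribution' class."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "attribution" in classes and href:
            self.links.append(href)


def source_domain(url):
    """Reduce a URL to a lowercase host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_sources(pages):
    """Tally attribution domains across an iterable of HTML strings."""
    counts = Counter()
    for html in pages:
        parser = AttributionLinkParser()
        parser.feed(html)
        counts.update(source_domain(link) for link in parser.links)
    return counts


# Made-up pages standing in for crawled articles.
sample_pages = [
    '<p><img src="a.jpg"><a class="attribution" href="http://www.tumblr.com/post/1">via</a></p>',
    '<p><a class="attribution" href="https://instagram.com/p/x">via</a>'
    '<a href="https://example.com/unrelated">other</a></p>',
    '<p><a class="attribution" href="http://tumblr.com/post/2">via</a></p>',
]
print(count_sources(sample_pages).most_common())
```

Note that this sketch treats each host name as its own source, so a blog hosted on a Tumblr subdomain would be counted separately from tumblr.com unless you add a step that groups subdomains together.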


However, just 25 sources made up 62% of BuzzFeed’s content. While the Internet is a huge place, Internet Culture is birthed in just a few areas. What are these sources?

[Bar chart: BuzzFeed's top content sources]

Source: Priceonomics data crawling
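The "25 sources made up 62%" figure is a top-N share: the attributions credited to the N most-cited sources, divided by all attributions. Below is a minimal sketch of that arithmetic in Python. The function name and the toy counts are invented for illustration and are not the Priceonomics data.

```python
from collections import Counter


def top_n_share(counts, n):
    """Return the fraction of all attributions that go to the n most-cited sources."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Toy numbers for illustration only.
toy_counts = Counter({
    "tumblr.com": 50,
    "instagram.com": 30,
    "gettyimages.com": 10,
    "blog-a.example": 6,
    "blog-b.example": 4,
})
print(f"Top 2 sources: {top_n_share(toy_counts, 2):.0%} of attributions")
```

With the real tallies, calling the same function with n set to 25 would give the kind of concentration figure quoted above.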

The number one place that BuzzFeed sources its images from is Tumblr. This is where The Dress was discovered. Instagram, Getty Images, and YouTube round out the top 4. This analysis isn’t perfect, however, as it ignores content that BuzzFeed embeds without putting an attribution link under it (Twitter embeds, for example, don’t need an attribution link since they are clearly from Twitter). A number of these sites simply provide stock photos.

There is a perception in the Reddit community that BuzzFeed is merely sourcing all of its content from Reddit. Even though Reddit barely cracks the top 10 sources that BuzzFeed cites, this analysis doesn’t necessarily disprove this notion. For whatever reason, Reddit doesn’t have its own image and video hosting. Instead, most people upload the content to Imgur or just link to the original source of the media on a place like Tumblr or YouTube. BuzzFeed could very well be using the Reddit community as a mechanism to discover content that is hosted elsewhere. Or not; this analysis doesn’t say.

So, if media sites are looking to compete with BuzzFeed at being BuzzFeed, here’s where to look. Internet culture is out there, and it lives on just a few sites. You can build a web crawler that finds it like we did, create a newsroom that looks for it, or build some sort of hybrid technology/human approach as we imagine BuzzFeed does.

This post was written by Rohin Dhar. Follow him on Twitter. Data crawled by Brendan Wood and Priceonomics Data Services. To get occasional notifications when we write blog posts, sign up for our email list.