Friday, 15 July 2005
Size (of the whole Blogosphere) Doesn't Matter
Yesterday in Stop the Blog Statistic Madness! I touched on something I've been meaning to blog about for a while now – your base. I'm not talking about the base of the classic game Zero Wing, I'm talking about the base in the market research sense. For those not privy to the secret vocabulary of market researchers, base is a not-so-fancy word for denominator when calculating percentages of the occurrence of a particular characteristic or response in a sampling of people.
To help illustrate this let's look at an example. Let's say you conduct a political survey by telephone by calling 1,000 registered voters in a particular county and ask them who they are going to vote for in an upcoming election, Mr. Berry or Mr. Kush. Let's say the results of the survey are as follows:
313 respondents say they are voting for Mr. Berry
397 respondents say they are voting for Mr. Kush
290 respondents say that they are undecided
Now, if the election were held on the day that the survey was conducted, who would you say has the advantage? The right answer is whichever campaign is paying for the survey, and the reason is the base.
If the Berry campaign commissioned the survey, they might issue a press release stating “Kush carries less than 40% of voters”. This obviously misleads the reader into the false assumption that Berry is carrying over 60% of voters. In this case the Berry campaign uses all of the survey respondents in the base.
(397 Kush voters / 1000 respondents in the base = 39.7%).
The Kush campaign on the other hand, might report “over 55% of voters support Kush”. In this case, the Kush campaign wants to underplay the magnitude of the undecided vote so they remove the undecided voters from the base.
(397 Kush voters / (1000 respondents – 290 undecideds) = 55.9%)
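For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts come straight from the example survey; the helper name `share` and the variable names are my own, made up for illustration.

```python
# Survey counts taken from the example above.
counts = {"Berry": 313, "Kush": 397, "Undecided": 290}


def share(count, base):
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100.0 * count / base, 1)


# Base 1: every respondent (the Berry campaign's choice).
all_respondents = sum(counts.values())

# Base 2: decided respondents only (the Kush campaign's choice).
decided_only = all_respondents - counts["Undecided"]

kush_of_all = share(counts["Kush"], all_respondents)
kush_of_decided = share(counts["Kush"], decided_only)

print(f"Kush, of all respondents:     {kush_of_all}%")
print(f"Kush, of decided respondents: {kush_of_decided}%")
```

Same 397 voters both times; only the denominator changes.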
For marketers, advertisers and market researchers, base definition is equal parts art and science. While the above example is a jab at the questionable practices of modern political campaigns, limiting the base to a particular subgroup of respondents is often valid or even essential to produce meaningful results. If our survey was commissioned by a non-partisan organization, the reported results might look like this:
55.9% of decided registered voters say they will vote for Kush
44.1% of decided registered voters say they will vote for Berry
29% of registered voters say they are undecided
Here we see the base clearly defined for the reader (i.e. “of decided registered voters”). Sometimes, as in this case, it is useful to change the base depending on what question you are trying to answer. Once this idea seeps into your brain, you should start confronting every percentage statistic with the magic base question: “Of what?”
Okay, quiz time. According to BlogPulse, on July 13, the word “mortgage” was included in one half of one percent of blog posts.
Did you ask “of what?” Good. You passed. So what's the answer? Here the base contains every blog post that BlogPulse collected and indexed on July 13 including all of the blog spam, all non-English blogs, all link blogs, all the garbage that somebody stuffed into an RSS feed that day and shoveled into BlogPulse. For searching this is okay – not optimal – but okay. For market research it is abysmal. In our political survey example we called only registered voters in the geography of interest. What would the results have been if we had called 1,000 people from all over the world regardless of age or voting status? Certainly less meaningful. Yet this is what we tend to do with the blogosphere.
Other than vanity or novelty, why would anyone want to measure the percent mention of a particular term of EVERYTHING collected by BlogPulse (or any other search engine)? I can't think of a reason.
What is valuable is what the casual user thinks they get from BlogPulse, namely, a base that includes only legitimate blogs in their native language. What we need from BlogPulse (or anyone else who implements trending) is the ability to define the base in addition to our search term. There are lots of base definitions that I'd like to use given the opportunity. Here are a few off the top of my head:
Language – of all English blogs
Age of Blog – of all blogs that are at least six months old and have posted within the last 30 days
Blog Host – of all Blogger blogs, of all Live Journal blogs
Geography – of all blogs in Oklahoma, of all blogs in the UK
Demographics – of all blogs written by women between the ages of 25 and 45
Blog Status – of the Technorati 100, of all TTLB Crawly Amphibians
Blog Type – of all personal blogs, of all commercial blogs
Blog Focus – of all political blogs, of all technology blogs
In reality, you'd almost always want to combine two or more of these but the point is that you should never want to measure all of the blog posts on the planet – at least not for the purpose of market research. Think back to my “decline of god” discussions. If we were to use English language non-spam blogs as a base, I don't think we would see a decline in “god” – or at least we would see one substantially less dramatic. For almost everything, it is meaningless at best and misleading at worst to use the whole blogosphere as your base. Unfortunately, this is all that BlogPulse – or anyone else I know of – currently offers (at least for free). BlogPulse is saying “All your base are belong to us.” ...er, so to speak.
[DISCLAIMER: I don't want to beat up BlogPulse here. As I've said before, their service is remarkable and generous, and they are way ahead of anyone else with their Trend feature. Internally they perform these types of base-adjusted analyses for paying customers (I gather, but not from direct experience).]
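To make the "define your base" idea concrete, here is a rough sketch in Python of what a user-defined base might look like. Everything in it is hypothetical: the `Post` record, its fields, the `term_share` function and the five sample posts are invented for illustration and have nothing to do with how BlogPulse actually stores or filters its data.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    language: str   # hypothetical field, e.g. "en" or "fr"
    is_spam: bool   # hypothetical flag from some spam filter


def term_share(posts, term, in_base=lambda post: True):
    """Percent of posts in the base whose text mentions term (case-insensitive).

    in_base decides which posts belong in the denominator.
    The default keeps every post, i.e. the "whole blogosphere" base.
    """
    base = [p for p in posts if in_base(p)]
    if not base:
        return None  # an empty base has no meaningful percentage
    hits = sum(1 for p in base if term.lower() in p.text.lower())
    return 100.0 * hits / len(base)


# Invented toy records, for illustration only; not real blog data.
posts = [
    Post("Refinancing our mortgage this month", "en", False),
    Post("Weekend hiking photos", "en", False),
    Post("Buy cheap pills online", "en", True),
    Post("Recette de tarte aux pommes", "fr", False),
    Post("Win a free phone today", "en", True),
]

whole = term_share(posts, "mortgage")
english_non_spam = term_share(
    posts, "mortgage", lambda p: p.language == "en" and not p.is_spam
)

print(f"Of everything collected:      {whole}%")
print(f"Of English, non-spam posts:   {english_non_spam}%")
```

The filter is just a function, so combining base definitions (language and spam status here) is a matter of chaining conditions. The same single mention of the term produces a very different percentage once the junk is out of the denominator.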
With all of the hoopla about Google Ad Sense for RSS and the jockeying for the right to proclaim “we index more worthless RSS fed SPAM than anyone else” I think we're missing the point. The value of the blogosphere to commercial interests is not finding a new way to be heard but discovering a new way to listen.
If you are thinking about measuring impressions of RSS feeds – you're looking through the wrong end of the telescope. Blogs are a feedback mechanism. Bloggers are bellwethers. By monitoring the blogosphere, you can measure the effectiveness of marketing, but blogs themselves are not good marketing channels in the traditional sense – if it looks like advertising no one will subscribe.
So how do we listen? We listen by finding and measuring the blogs that represent our customers or our potential customers. The other 13.98 million blogs that BlogPulse and Technorati are tracking don't matter – they just muck up the base. Sure, companies need to blog – but the focus should be on catalyzing the conversation, not dictating it. It has been said recently that the blogosphere is a focus group – when's the last time you tried to advertise in a focus group?
There are people blogging about your company right now. Find them and listen. And forget the other 13.something million.
