Thursday, 16 June 2005

Who are those pesky bloggers anyway?


Over the last several weeks I've been thinking about blog metrics. Specifically, I've been thinking about data mining for trends, opinions, preferences, values and the like. Most of the blog aggregation tools out there are focused on getting the most recent information and picking the entries in which the reader is most interested. There are a few folks scratching the surface of trends, with Technorati probably being the best known. But as cool as Technorati is, the trending is still very rudimentary – is this tag going up or down in popularity right now? What about over the last six months, over the last year, in comparison to other tags or phrases, or only within a select group of blogs? The more I think about this, the more strongly I believe that, as they say, there's gold in them there blogs.

My thinking goes something like this...

  • There's lots of data. Technorati claims to be tracking over 11,000,000 web logs. Bloglines claims to index over 80 million entries per day.

  • There's lots of diversity. Web logs are written by all different kinds of folks with different interests, from different parts of the world.

  • There's lots of influence. Lots of other folks read web logs and are influenced by them (for example, in purchasing decisions or choices at the polls). Increasingly we are hearing that bloggers are gaining more credibility than professional journalists or corporate marketing.

  • There's lots of collaboration. Blogs are conversations. They're interactive. Really important ideas, disruptive ideas, tend to get talked about a lot more than the other kind.

  • They're easy to read. The technology that's shuffling these ideas around the Internet was designed to carry data, not presentation or other fluff, in a reasonably standardized way. This makes mechanized analysis much easier.

  • They're time stamped. Blog entries are explicitly time stamped – this makes trending much easier than it is for generic web pages. Who knows how long web pages have been up or when they were last changed?

  • They're event-driven. Blog entries are events – this makes them easier to measure. Let's say you use Google to measure the number of web pages that contain a phrase of interest every six months. Let's say the result is always 15,000. What's the trend? Is the phrase still getting attention, or are there just 15,000 people too lazy to update their web sites? Measuring event-driven blogs is different. People have to actively stay interested to keep the numbers up. This shortens response time, making blogs more sensitive to trends.

  • They're easy to collect. It's fairly easy to aggregate and index blogs. In many regards it's easier to build a reliable Technorati or Bloglines than it is to build a Google or Yahoo!
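
To make the time-stamp and event-driven points concrete, here is a minimal sketch of how a "percent of posts mentioning a term, per day" trend could be computed from time-stamped entries. The function name `daily_share`, the (date, text) tuple layout and the sample posts are all assumptions for illustration; this is not how Technorati, Bloglines or BlogPulse actually work.

```python
from collections import Counter
from datetime import date


def daily_share(posts, term):
    """Return {day: percent of that day's posts that mention term}.

    posts is an iterable of (date, text) pairs (a hypothetical layout).
    A post "mentions" the term if it appears as a whitespace-separated word.
    """
    totals = Counter()
    hits = Counter()
    needle = term.lower()
    for day, text in posts:
        totals[day] += 1
        if needle in text.lower().split():
            hits[day] += 1
    return {day: 100.0 * hits[day] / totals[day] for day in sorted(totals)}


# Made-up sample entries, purely for illustration.
posts = [
    (date(2005, 6, 11), "saw two movies this weekend"),
    (date(2005, 6, 11), "new ipod arrived"),
    (date(2005, 6, 13), "back to work"),
    (date(2005, 6, 13), "reading a good book"),
]
shares = daily_share(posts, "movies")
```

Because every entry carries its own date, the trend falls out of a simple group-and-count; nothing comparable is available for an ordinary web page with no reliable date.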

So what does this all mean? Well, to market researchers or anyone trying to better understand their customer, it might mean low cost access to really, really valuable information. In some cases, data mining blogs may even be able to replace traditional survey research or focus groups. For those in highly competitive markets, it might mean dramatic reduction of response time to customer service issues or a competitor's super secret initiative.

So now that I'd convinced myself this idea has potential, I wondered what the data would look like and what the limitations would be. Can useful information really be pulled out of blog rhetoric? So I began poking around and stumbled across Intelliseek's BlogPulse. I think these kids are really on to something. Although somewhat limited, BlogPulse has a very powerful tool called the BlogPulse Trend Tool. Using this tool, I began trying to answer some of these questions...

Who are these bloggers anyway?

Gender

I started off with trying to get an idea of demographics. I thought I'd start off small and see what I could discover about gender. The important thing to remember when looking at these graphs is that we're measuring what people are talking about, not who they are necessarily. I began by running a comparison between masculine words (man, men, guy, he, him, etc.) and feminine words (woman, women, gal, she, her, etc.). This yielded the following chart:


It's not a big surprise that masculine words are more common than feminine words. The interesting thing here is the decline of both genders over the last six months. I'll speculate that the secret here is that we're looking at “Percent of All Blog Posts.” Since the search phrases are pretty broad and both seem to trend downward, this is probably an indication that new bloggers tend to write about more gender-neutral ideas than existing bloggers. This would drive down the percentage of gender-specific writing. This might indicate that the tone of blogs is becoming less like journaling and more issue- or product-focused, which might require less frequent use of gender terms. But this is just conjecture on my part.
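
The dilution effect is easy to see with a little arithmetic. The numbers below are invented purely to illustrate the conjecture; they are not BlogPulse figures. The raw count of matching posts goes up, yet its share of all posts goes down because the total grows faster.

```python
def percent_of_all_posts(matching, total):
    """Share of posts that match a search, as a percentage."""
    return 100.0 * matching / total


# Hypothetical counts: matching posts grow 10%, total posts grow 50%.
before = percent_of_all_posts(matching=30_000, total=100_000)
after = percent_of_all_posts(matching=33_000, total=150_000)
```

Here `before` is 30.0 and `after` is 22.0, so a falling line on a "percent of all posts" chart does not by itself mean fewer people are writing about the topic.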

Leisure Time

How do bloggers spend their free time? Here is how much they talk about books, television and movies:



It's nice to see that bloggers read more than they watch television. Recent studies show that Internet active people are replacing “TV time” with Internet usage, so this makes sense. Notice the dramatic cycle of entries that contain the word “movies”? This corresponds to weekends. People tend to watch movies on the weekend and then blog about them. Voila, instant consumer behavior data. While not exactly earth shattering, this begins to illustrate the validity of the concept. Also note the leveling off of the trend lines. New bloggers are writing about books, movies and television at similar proportions to existing bloggers.
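
One simple way to confirm a weekend cycle like the one for “movies” is to average the daily values by day of the week. This is only a sketch under assumed inputs: `mean_by_weekday` is a hypothetical helper, and the two weeks of data are made up (2.0% on weekdays, 3.0% on Saturday and Sunday).

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def mean_by_weekday(series):
    """series maps date -> value; returns weekday name -> mean value."""
    buckets = defaultdict(list)
    for day, value in series.items():
        buckets[WEEKDAY_NAMES[day.weekday()]].append(value)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}


# Invented data: two weeks starting Monday, 6 June 2005.
start = date(2005, 6, 6)
series = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    series[day] = 3.0 if day.weekday() >= 5 else 2.0

profile = mean_by_weekday(series)
```

If the Saturday and Sunday averages in `profile` stand clearly above the weekday averages, the spikes are a weekly rhythm rather than noise.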

Electronic Gadgets

I haven't yet discussed “blogosphere bias” but suffice it to say that bloggers are not representative of the general population. This presents a problem if you want to measure consumer trends for the elderly, for example, because they're underrepresented. On the other hand, this is really good news if you want to measure geeks. What geek toys do bloggers talk about?


While Sony and Microsoft duke it out, Apple has all the buzz. As I'll show in an upcoming post, bloggers' passion for Apple is disproportionate to that in the brick-and-mortar world. What else is interesting here? Well, everyone blogs their coolest Christmas gifts! Apple released iPod Shuffle in January, iPod Mini in February. It's interesting that these new product releases create bursts of blog entries but don't have a sustained elevating effect. It turns out that this is common. It takes something fairly disruptive to sustain buzz on blogs.

Sony released their much anticipated PSP in March. Entering a new segment of the market (handheld video games) is disruptive and has helped Sony sustain an increase in buzz.

Microsoft released details of the new Xbox 360 in May, only to be trumped by Sony's release of details of the PS3 a few days later.

Political Persuasion

Politics is another popular blog topic. Do you write a red blog or a blue blog?


Notice that we've slipped below 1% of all blog entries monitored by BlogPulse. Currently 1% is in the 4,000 to 5,000 blog post range – still a reasonable number. I don't have much to say here except that it looks a fair amount like the real world – both sides neck and neck, with the Republicans/Conservatives currently holding a slight lead.
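
As a back-of-the-envelope check, the "1% is 4,000 to 5,000 posts" figure implies the size of the universe behind each data point. The sketch below is plain arithmetic on the numbers quoted above; the helper name `implied_total` is made up, and the time period each data point covers is not stated here, so it is left unspecified.

```python
def implied_total(posts_at_one_percent):
    """If this many posts equal 1%, the whole is 100 times larger."""
    return posts_at_one_percent * 100


low = implied_total(4_000)
high = implied_total(5_000)
```

That puts the monitored universe at roughly 400,000 to 500,000 posts per data point, which is why even a sub-1% topic still represents thousands of entries.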

Church Every Sunday but God's in Decline

Religion is another hot topic on blogs. I think this next graph is really interesting. I measured the trends of mentions of “god”, “church” and “jesus” or “christ”. Here's what I got:



Like the “movies” trend above, discussion of “church” spikes on the weekends. The interesting thing is that Jesus only seems to get special attention on Christmas and Easter. This may support the idea that the community or social aspects of church are more significant than the spiritual component. Also, there are probably a lot of “Christmas/Easter” bloggers who don't normally author religious content except around major Christian holidays. This again is a trend consistent with real world behavior.

I think the downward trend of all three of these terms is most likely due to the continuing diversification of blogs (like the gender graph above). However, the decline of “god” seems to be a bit more pronounced than the other two terms, which may indicate some other influence. Even with the shrinking percentage, these religious terms are all more popular than any of the political ones mentioned above. With 3.5% of blog posts, “god” is still a popular deity.

Conclusions?

This “analysis” is far from scientific – I spent a few hours playing with the BlogPulse tool and wrote some commentary on my impressions. However, I think these graphs are compelling. I think there is real potential in this type of data mining. What BlogPulse does not currently offer is the ability to limit the universe so that you could measure trends within a limited scope of blogs – blogs in your blog roll, for example, or those in Robert Scoble's blog roll. For different industries there's likely a subset of blogs that will consistently be a bellwether for that market – as long as that market is reasonably represented by bloggers.
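
Limiting the universe could be as simple as filtering by source before counting. The sketch below assumes posts arrive as (blog, text) pairs and that a blog roll is just a set of blog identifiers; `share_within`, the `.example` hostnames and the sample posts are all hypothetical, and none of this reflects an actual BlogPulse feature.

```python
def share_within(posts, term, allowed_blogs=None):
    """Percent of posts mentioning term, optionally limited to a set of blogs.

    posts is an iterable of (blog, text) pairs. When allowed_blogs is None,
    every post counts; otherwise only posts from those blogs are considered.
    """
    needle = term.lower()
    total = 0
    hits = 0
    for blog, text in posts:
        if allowed_blogs is not None and blog not in allowed_blogs:
            continue
        total += 1
        if needle in text.lower().split():
            hits += 1
    return 100.0 * hits / total if total else 0.0


# Made-up posts and a made-up blog roll.
posts = [
    ("blog-a.example", "the psp launch was fun"),
    ("blog-a.example", "weekend plans"),
    ("blog-b.example", "psp review"),
    ("blog-c.example", "gardening tips"),
]
blogroll = {"blog-a.example", "blog-b.example"}

overall = share_within(posts, "psp")
scoped = share_within(posts, "psp", blogroll)
```

In this toy data the term shows up in half of all posts but in two thirds of posts from the blog roll, which is the kind of difference a bellwether subset would be expected to surface.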

So what do you think?

Stay tuned for a follow-up post where I'll show proof of concept for measuring brand strength and feature value using my new favorite toy, BlogPulse.

Posted by Matt Galloway at 4:02 AM in Technology & Culture