"The Data Dump: Fun with Graphs and Charts"
Marc Hedlund; David Hornik; Ian Kallen; Eric Lunt; Roger Magoulas; Adam Messinger; David L. Sifry

-----------------------------
contributors:
Gabe Hollombe - gabe@avantbard.com - http://avantbard.com
-----------------------------

Dave Sifry
- state of the blogosphere report
- blogosphere has doubled in size every 5.5 months over the last 36 months
- over 60x larger than 3 yrs ago
- over 100k new blogs each day, approx 1.5 blogs per sec
- 50% of new bloggers are still posting 3 months later
- 10% of blogs update at least weekly (3M blogs)
- lots of spam; about 9% of new blogs are spam; 60% of pings are from known spam sources; technorati blocks spam pings before they register as splogs in their indexes
- 1.2 million legit posts/day
- news cycle that's shifting. not a 24hr cycle like the old days, but one that's measured in MHz. (cute)
- blogs brought us a "frictionlessness" of publishing capabilities
- power law: MSN gets the top end, but blogs start to dominate the middle and low end
- blogging is big in japan (started taking off about 4 months ago)
  * japanese posts tend to be shorter, but they post more often
- almost 1/2 of all posts use tags/categories
  * about 24% use the rel="tag" microformat

Eric Lunt (feedburner CTO)
- going to show Feedstorm (animation)
- yes, there is a growth in feedburner feeds over the last 2 years
- demo viz created by manifestdigital

Adam Messinger (Gauntlet Systems)
- building a nextgen source control feature
- no more broken builds (builds on each checkin)
- Lotka's law (80/20 rule): another statement of the power law; theorized to apply to OSS projects, i.e. that a few developers write most of the code
- going to reply about 2 dozen opensource project checkin histories
- ActiveMQ checkins by author follow Lotka
- Lucene checkins by author don't fit as well
- Hibernate EJB3 shows exponential scaling
- conclusion: true that a few devs do most of the dev work
- more data avail at www.gauntletsystems.com

Andy Edmons (Windows Live Search)
- showing viz of popular queries by date (rose bowl near jan 1, for example)
- the attention of the world is mirrored in search activity
- hot spots around the globe (often searched for in MS Earth; "walmart" bigger than "america")

Roger Magoulas (O'Reilly)
- need to figure out how to compare things when scales vary differently (tracking trends)
- 79% of AJAX job postings are in the bay area

Ian Kallen (technorati)
- tracking blog spam
- financial incentives amplify abuse (duh)
- some posts are not easy to id as spam, but all the links tend to point to the same site
- lots of splogs use the same name server and belong to the same ARIN IP block

== Resources ==

__END__