Dancing Tags

So just two days ago Obama swept Hawaii for his 10th straight win in the Democratic race. Barack in particular has been beaten up for lacking real content in his message, though I personally think his actions have spoken loudly (he taught constitutional law, supported net neutrality, and helped push the ethics reform bill). Using tag clouds to visualize candidates' messages has been done before, with pretty interesting results.

That example gave a nice snapshot of each candidate, but I want to dig a little deeper into the data for a single candidate. By comparing tag clouds from several of Obama's speeches, we can see how his message has changed over time.

Word Frequency Analysis

First we need some speeches. I’ve chosen these:

  1. DNC Speech in 2004
  2. Winning Iowa
  3. Losing New Hampshire
  4. Winning Super Tuesday
  5. Winning Wisconsin
  6. Austin Debate

Once the transcripts are cleaned of APPLAUSE and MR. BARACK prompts, we can get to ripping out some word frequencies. There are surely easier ways to do this, but I've written a bash script to handle it:

#!/bin/bash
#
# parse.sh prints out a json/javascript style word frequency list
# usage: ./parse.sh speech.txt
#
# Join the common words in exclude.txt (one per line) into "the|of|and|..."
EXCLUDES=$(tr -s '[:space:]' '|' < exclude.txt | sed -e 's/^|//' -e 's/|$//')

echo 'frequencies: {'
tr -s '[:space:]' '\n' < "$1" |    # one word per line
sed -e 's/[^a-zA-Z0-9]//g' |       # strip punctuation
tr '[:upper:]' '[:lower:]' |       # lowercase everything
grep -v '^$' |                     # drop lines that held only punctuation
grep -Eivx "($EXCLUDES)" |         # drop the common words
sort |
uniq -c |                          # count each distinct word
grep -v '^ *[1-5] ' |              # drop words used fewer than 6 times
sed -e 's/^ *\([0-9]*\) \(.*\)$/  "\2": \1,/' \
    -e '$ s/,$//'                  # quote the keys, no comma after the last one
echo '}'

You can download it as parse_1.sh. Note that the keys are wrapped in double quotes to keep IE and Safari happy, and the last entry has no trailing comma, which older versions of IE also choke on. The script assumes a file called exclude.txt that lists common words, one per line. I'm using the 100 most common English words, which you can download as exclude.txt. The script also drops words with a frequency below 6. The output is formatted to drop right into a JavaScript file:

frequencies: {
  "country": 5,
  "hope": 11,
  "just": 6,
  "led": 8,
  "me": 9,
  "moment": 8,
  "never": 10,
  "new": 6,
  "our": 9,
  "us": 7
}
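If you'd rather skip the shell escaping, the same pipeline can be sketched in JavaScript. This is a hypothetical port, not the script used for this post: wordFrequencies and toFrequencyBlock are made-up names, and the sketch assumes the speech text and the exclude list are already loaded as a string and an array of words (for example, read in with Node's fs module).

```javascript
// Hypothetical JavaScript port of parse.sh (function names are illustrative only).
// Assumes `text` is a cleaned-up transcript and `excludes` an array of common words.
function wordFrequencies(text, excludes, minCount) {
  const excluded = new Set(excludes.map((w) => w.toLowerCase()));
  const counts = new Map();
  for (const raw of text.split(/\s+/)) {
    // Same normalization as the sed and tr steps: strip punctuation, lowercase
    const word = raw.replace(/[^a-zA-Z0-9]/g, '').toLowerCase();
    if (word === '' || excluded.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Keep only words used at least minCount times
  const result = {};
  for (const [word, count] of counts) {
    if (count >= minCount) result[word] = count;
  }
  return result;
}

// Print the result the way parse.sh does: sorted, quoted keys, no trailing comma
function toFrequencyBlock(frequencies) {
  const lines = Object.keys(frequencies).sort().map(
    (word) => '  "' + word + '": ' + frequencies[word]
  );
  return 'frequencies: {\n' + lines.join(',\n') + '\n}';
}

console.log(toFrequencyBlock(
  wordFrequencies('Hope, hope and more HOPE. Yes we can hope!', ['and', 'we'], 2)
));
// frequencies: {
//   "hope": 4
// }
```

Splitting on any whitespace and then stripping non-alphanumerics mirrors the tr and sed steps, so a word like "don't" comes out as "dont" here too.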

There we go, some word frequencies. Now to draw a basic tag cloud.

Tag Cloud Markup

Some basic cloud markup:

<style type="text/css">
ol.cloud { width: 300px; }
ol.cloud li { display:inline; padding: 2px 5px; }
ol.cloud li.hidden { padding: 2px 4px; }
</style>

<ol id="example_1_cloud" class="cloud">
<li style="font-size:14px;">A tag</li>
<li style="font-size:22px;">A big important tag</li>
</ol>

The important part of the CSS is display:inline on the list items: it lets the tags flow and wrap onto new lines like words in a paragraph. The newlines between the list item tags matter too, since they give the browser whitespace to break lines at.
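To connect the frequency list to this markup, here is a hypothetical helper that writes out a static cloud. The name cloudMarkup and the linear interpolation between a minimum and maximum pixel size are assumptions for illustration; the TagCloud class below uses a fixed list of size steps instead.

```javascript
// Hypothetical helper that writes out static <ol class="cloud"> markup.
// Font sizes are interpolated linearly between minPx and maxPx (an assumption).
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function cloudMarkup(id, frequencies, minPx, maxPx) {
  const words = Object.keys(frequencies).sort();
  const values = words.map((w) => frequencies[w]);
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const items = words.map((w) => {
    // Position of this word between the rarest (0) and the most common (1)
    const t = hi === lo ? 0.5 : (frequencies[w] - lo) / (hi - lo);
    const px = Math.round(minPx + t * (maxPx - minPx));
    return '<li style="font-size:' + px + 'px;">' + escapeHtml(w) + '</li>';
  });
  // One <li> per line: the newlines give the browser whitespace to wrap on
  return '<ol id="' + escapeHtml(id) + '" class="cloud">\n' +
         items.join('\n') + '\n</ol>';
}

console.log(cloudMarkup('example_1_cloud', { hope: 11, never: 10, us: 7 }, 14, 22));
// <ol id="example_1_cloud" class="cloud">
// <li style="font-size:22px;">hope</li>
// <li style="font-size:20px;">never</li>
// <li style="font-size:14px;">us</li>
// </ol>
```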

Drawing A Cloud In Javascript

This code takes frequency lists like the one generated earlier and renders them with the markup above. It accepts a configuration like this:

var cloud = new TagCloud( 'example_cloud', {
  clouds: {
    convention: {
      frequencies: {
        blue: 4,
        country: 9,
        even: 5,
        expect: 3
      }
    },
    iowa: {
      frequencies: {
        blue: 4,
        country: 2,
        even: 5,
        let: 1
      }
    }
  }
});
// Now draw one
cloud.draw('convention');

This lines right up with the frequency lists we generated on the command line: each cloud's frequencies block has the same shape as the output of parse.sh. It also accepts some optional arguments for customizing clouds:

var cloud = new TagCloud( 'example_cloud', {
  tag_class: "tag", // By default a class of "tag" is set on all tags, change it here
  hidden_class: "hidden", // By default a class of "hidden" is applied to hidden tags
  tag_sizes: [ '8px', '16px', '30px' ], // Set as many size increments as you like
  clouds: {
    // The clouds
  }
});

So you can set three size increments or 15, whatever amount of detail you want.
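The size steps are picked by scaling each frequency between the lowest and highest frequency in that cloud. A standalone sketch of that arithmetic follows; sizeFor is a made-up name, and returning a middle size when every word has the same frequency is my own assumption rather than something the class below is guaranteed to do.

```javascript
// Standalone sketch of the bucketing done in TagCloud's get_weight():
// scale the frequency linearly between the cloud's lowest and highest
// frequency, then pick one of tagSizes.length steps.
function sizeFor(frequency, lower, upper, tagSizes) {
  const depth = tagSizes.length;
  // Assumption: if every word has the same frequency, use a middle size
  if (upper === lower) return tagSizes[Math.floor(depth / 2)];
  let i = Math.floor(((frequency - lower) / (upper - lower)) * depth);
  // Only the highest frequency lands exactly on depth; clamp it to the last step
  if (i === depth) i = depth - 1;
  return tagSizes[i];
}

// Frequencies in the parse.sh sample output range from 5 to 11
const sizes = ['8px', '16px', '30px'];
console.log(sizeFor(11, 5, 11, sizes)); // "hope"    -> 30px
console.log(sizeFor(8, 5, 11, sizes));  // "led"     -> 16px
console.log(sizeFor(5, 5, 11, sizes));  // "country" -> 8px
```

With three sizes, each step covers a third of the range between the rarest and the most common word, so adding more steps simply slices that range more finely.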

TagCloud looks like this:

//
// TagCloud requires mootools v1.11 with these modules:
//
// Class.Extras, Array, Number, Element.Event, Element.Selectors,
// Window.DomReady, Fx.Style, Fx.Styles, Fx.Elements, Fx.Transitions,
// Hash
//
var TagCloud = new Class({

  initialize: function( cloud, options ) {
    this.options = $merge({
      clouds: {},
      tag_class: 'tag',
      hidden_class: 'hidden',
      tag_sizes: [ '8px', '12px', '18px', '20px', '22px', '24px', '26px', '28px' ]
    }, options);
    this.cloud = $(cloud);
    this.depth = this.options.tag_sizes.length;
    this.tags = []; // one entry per distinct word across all clouds
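    // For each cloud: find its lowest and highest frequency, then record a
    // size index for every word relative to that range
    $each(this.options.clouds, function(cloud_data, cloud_name){
      this.reset_bounds();
      $each(cloud_data.frequencies, function(frequency){
        this.expand_bounds(frequency);
      }.bind( this ));
      $each(cloud_data.frequencies, function(frequency, word){
        this.update_tag( cloud_name, word, frequency );
      }.bind( this ));
    }.bind( this ));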
    this.sort_tags();
  },

  update_tag: function(cloud, tag_content, frequency){
    var found = this.tags.some(function(tag, i) {
      if (tag.content == tag_content) {
        tag.cloud_weights.set(cloud, this.get_weight(frequency));
        return true;
      }
      return false;
    }.bind( this ));
    if (!found){
      var cloud_weights = new Hash();
      cloud_weights.set(cloud, this.get_weight(frequency));
      var tag = { content: tag_content, cloud_weights: cloud_weights };
      tag.toString = function(){ return this.content; };
      this.tags.push(tag);
    }
  },

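  // Map a frequency onto an index into tag_sizes (0 .. depth-1), scaled
  // linearly between this cloud's lowest and highest frequency
  get_weight: function( frequency ){
    // Every word in this cloud has the same frequency: avoid dividing by zero
    if (this.upper == this.lower) return Math.floor(this.depth / 2);
    var class_i = Math.floor(
      ((frequency - this.lower) / (this.upper - this.lower)) * this.depth
    );
    // Only the top frequency reaches depth; clamp it to the largest size
    if (class_i == this.depth) class_i = class_i - 1;
    return class_i;
  },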

  reset_bounds: function(){
    this.lower = 99999999999;
    this.upper = 0;
  },

  expand_bounds: function(v){
    if (v > this.upper) this.upper = v;
    if (v < this.lower) this.lower = v;
  },

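  // Alphabetical order: Array.sort() compares items as strings, which is
  // what the toString() set on each tag in update_tag is for
  sort_tags: function(){
    this.tags.sort();
  },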

  draw: function( cloud_name ) {
    $each(this.tags, function( tag, i ){
      if (!tag.element) {
        tag.element = new Element( 'li', {
          'rel': 'tag',
          'class': this.options.tag_class
        });
        tag.element.setHTML(tag.content).injectInside( this.cloud );
        // wait: false lets a new transition interrupt one that is still running
        tag.fx = new Fx.Styles( tag.element, { wait: false } );
        this.cloud.appendText("\n");
      }

      // Words that don't appear in this cloud have no weight and fade out below
      var weight = tag.cloud_weights.get(cloud_name);
      if ( typeof weight == 'number' && !isNaN(weight) ) {
        if (this.options.hidden_class)
          tag.element.removeClass(this.options.hidden_class);
        tag.fx.start({
          'opacity': 1,
          'font-size': this.options.tag_sizes[tag.cloud_weights.get(cloud_name)]
        });
        return;
      }

      if ( tag.element.getStyle('opacity') != 0 ) {
        tag.fx.start({
          'opacity': 0,
          'font-size': 0
        });
        if (this.options.hidden_class)
          tag.element.addClass(this.options.hidden_class);
        return;
      }
    }.bind( this ));
  }

});
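Each call to draw() walks every tag it knows about: words with a weight in the requested cloud animate to their size for that cloud, and the rest fade to zero opacity and size. Stripped of MooTools and the DOM, the net effect of switching from one cloud to another can be sketched as below. morphPlan is a hypothetical name, and the sketch only reports which words appear, disappear, or stay; it doesn't compute sizes.

```javascript
// Hypothetical, DOM-free sketch of what happens when draw() switches clouds:
// which words fade in, which fade out, and which stay (possibly resized).
function morphPlan(clouds, fromName, toName) {
  const from = clouds[fromName].frequencies;
  const to = clouds[toName].frequencies;
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  // Every word in either cloud, alphabetically, like sort_tags()
  const words = [...new Set([...Object.keys(from), ...Object.keys(to)])].sort();
  const plan = { fadeIn: [], fadeOut: [], stay: [] };
  for (const word of words) {
    if (has(from, word) && has(to, word)) plan.stay.push(word);
    else if (has(to, word)) plan.fadeIn.push(word);
    else plan.fadeOut.push(word);
  }
  return plan;
}

// The two example clouds from the TagCloud request earlier in the post
const exampleClouds = {
  convention: { frequencies: { blue: 4, country: 9, even: 5, expect: 3 } },
  iowa: { frequencies: { blue: 4, country: 2, even: 5, let: 1 } }
};
console.log(JSON.stringify(morphPlan(exampleClouds, 'convention', 'iowa')));
// {"fadeIn":["let"],"fadeOut":["expect"],"stay":["blue","country","even"]}
```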

Give it a try!

As always, you can download morphing_cloud.js or play with an HTML example built from the Democratic campaign speeches. Take a look at how the use of “hope” has changed over time, and how more details have emerged in the recent speeches. I’m looking forward to expanding this to look at some other candidates.