Posted by adkatrit on 02/11/11

Last Edited at 02/11/11 03:05am

Hierarchical Clustering

Published in: Python

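You need a distance function to pass in as `distance` (the default is `pearson`). That part is implementation specific, but this is the main algorithm: every row starts as its own cluster, and the two closest clusters are repeatedly merged into a new cluster whose vector is their average, until only one cluster (the root of the tree) remains.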
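Because the distance function is pluggable, any function that takes two equal-length numeric vectors and returns a smaller number for more similar rows should work. As one hypothetical alternative to the Pearson default, a plain Euclidean distance might look like this (the name `euclidean` is an assumption, not part of the original snippet):

```python
from math import sqrt


def euclidean(v1, v2):
    """Hypothetical alternative distance: straight-line distance between two
    equal-length numeric vectors. Smaller means more similar, which is what
    hcluster expects when it searches for the closest pair."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


print(euclidean([0, 0], [3, 4]))  # 5.0
```

You would then call `hcluster(rows, distance=euclidean)`. Note the design trade-off: a Pearson-based distance compares the shape of two rows and ignores their overall scale, whereas Euclidean distance is sensitive to absolute magnitudes, so rows with similar patterns but different scales end up far apart.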

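from math import sqrt


# Assumed helper (not in the original snippet): one node of the cluster tree.
class bicluster:
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None):
        self.vec = vec            # representative vector for this cluster
        self.left = left          # child clusters (None for an original row)
        self.right = right
        self.distance = distance  # distance between the two children when merged
        self.id = id              # >= 0: original row index; < 0: merged cluster


# Assumed default distance (not in the original snippet): 1 - Pearson r,
# so perfectly correlated rows are at distance 0.
def pearson(v1, v2):
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1sq = sum(x * x for x in v1)
    sum2sq = sum(y * y for y in v2)
    psum = sum(x * y for x, y in zip(v1, v2))
    num = psum - sum1 * sum2 / n
    # max() guards against tiny negative values caused by floating-point rounding
    den = sqrt(max(0.0, (sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n)))
    if den == 0:
        return 1.0  # a constant vector has no defined correlation; treat as unrelated
    return 1.0 - num / den


def hcluster(rows, distance=pearson):
    distances = {}
    currentclustid = -1

    # Clusters are initially just the rows
    clust = [bicluster(rows[i], id=i) for i in range(len(rows))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)

        # Loop through every pair looking for the smallest distance
        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                # distances caches results by cluster id, so each pair is computed once
                key = (clust[i].id, clust[j].id)
                if key not in distances:
                    distances[key] = distance(clust[i].vec, clust[j].vec)

                d = distances[key]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # The merged cluster's vector is the element-wise average of the two clusters
        left, right = clust[lowestpair[0]], clust[lowestpair[1]]
        mergevec = [(a + b) / 2.0 for a, b in zip(left.vec, right.vec)]

        # Create the new cluster
        newcluster = bicluster(mergevec, left=left, right=right,
                               distance=closest, id=currentclustid)

        # Cluster ids that weren't in the original set are negative
        currentclustid -= 1
        # Delete the higher index first (j > i) so the lower index stays valid
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]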
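`hcluster` returns only the root node; the rest of the hierarchy hangs off each node's `left` and `right` attributes, with negative ids marking merged clusters and non-negative ids pointing back to the original rows. A small recursive printer can make the result readable. The helper name `printclust` and the hand-built demo tree below are hypothetical illustrations, not part of the original snippet; the printer only assumes nodes have `id`, `left` and `right` attributes.

```python
from types import SimpleNamespace


def printclust(clust, labels=None, n=0):
    """Hypothetical helper: print a cluster tree as an indented outline.

    Merged clusters (negative ids) print as '-'; leaves print their label,
    or their row index if no labels are given."""
    indent = " " * n
    if clust.id < 0:
        print(indent + "-")
    elif labels is None:
        print(indent + str(clust.id))
    else:
        print(indent + labels[clust.id])
    # Recurse into the two children, one level deeper
    if clust.left is not None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right is not None:
        printclust(clust.right, labels=labels, n=n + 1)


def leaf(i):
    # A stand-in for an original row: non-negative id, no children
    return SimpleNamespace(id=i, left=None, right=None)


# Hand-built tree shaped like hcluster's output for three rows:
# rows 0 and 1 merged first (id -1), then that cluster merged with row 2 (id -2).
root = SimpleNamespace(
    id=-2,
    left=leaf(2),
    right=SimpleNamespace(id=-1, left=leaf(0), right=leaf(1)),
)
printclust(root, labels=["apple", "banana", "cherry"])
# -
#  cherry
#  -
#   apple
#   banana
```

With the real function you would call `printclust(hcluster(rows), labels=rownames)`, where `rownames` is a list of labels in the same order as `rows`.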
