TopLeaders

TopLeaders is a community mining method inspired by k-means, which extracts desired number of communities (given as input) from a network/graph. Communities are groups of highly inter-related nodes that have relatively less relations to other nodes outside their group (a.k.a clusters). TopLeades community mining algorithm assumes that each community consists of a leader and its followers. It detects different leaders and their followers based on relationships between nodes. Moreover, it determines the outlier nodes that belong to no community and the hub nodes that are bridging between different communities.

Related Publications

Reihaneh Rabbany K. and Osmar R. Zaiane, A Diffusion of Innovation-Based Closeness Measure for Network Associations, the COMMPER Workshop on Mining Communities and People Recommenders, in conjunction with the IEEE International Conference on Data Mining (ICDM), Vancouver, Canada, December 2011
Reihaneh Rabbany K., Jiyang Chen, Osmar R. Zaiane, Top leaders Community Detection Approach in Information Networks, 4th SNA-KDD Workshop on Social Network Mining and Analysis, in conjunction with the ACM SIGKDD conference on Knowledge Discovery and Data Mining (KDD), Washington, DC, July 25 2010

Download

An executable jar file of the algorithm can be downladed here, source code could be obtained here.
You could access a more recent version of this algorithm and also several other well-known community mining algorithms in Meerkat social network analysis toolbox.

Getting Started

Download the jar file
Download the sample network karate.pairs
Open a terminal in Mac or Linux
Change your directory to where you saved those files
In the terminal type

java -jar topLeaders.jar -i karate.pairs

The result (detected communities) is saved in the same directory named "karate.pairs.clusters" while you see the progress of the algorithm in the terminal as:

Graph loaded > karate.pairs: 34 nodes and 78 edges ...
Running Global with [ k:2, o: 0.0, c: 16.0, h: 0.0 ]
Initial leaders: [1, 33]
Iteration: 1, leaders: [1, 34]
Iteration: 2, leaders: [1, 34]
Partitioned to: Number of Clusters: 2 , size: 17.0 +/- 1.0 in [16.0,18.0] , number of Hubs: 0 , Outliers: 0
[ Communities(2): [[1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 18, 20, 22], [34, 32, 33, 9, 10, 15, 16, 19, 21, 23, 25, 24, 27, 26, 29, 28, 31, 30]]
Hubs: []
Outliers: []]

These are optional parameters that you could set for the algorithm ("-help")

-k: number of clusters (2..n, default : 2)
-o: how much outliers should be detected, (real number, default 0, numbers in between 0 and 1 are treated as percents for automatic scaling according to networks' charactresitics, negative numbers: ignored and program would predict the proper threshold)
-h: how much hubs should be detected (real number, default 0, numbers in between 0 and 1 are treated as percents for automatic scaling according to networks' charactresitics, negative numbers: ignored program would predict the proper threshold)
-c: how much centers could be close together (0..1, would be automatically set by the program if no value is provided)
-out: outputfilename (default $input.clusters)

Coverage

Input formats: .pairs, .net/.pajek, .gml
Input characteristics: weighted/unweighted homogeneous (only one type of node/relation) undirected network.

Visualized Examples

WKarate Club (weighted) by Zachary (input, groundtruth):
karate_results
Sawmill Strike data sets (input, groundtruth):
strike_results
NCAA Football Bowl Subdivision (input, groundtruth):
footbal_results