TopLeaders

TopLeaders is a community mining method inspired by k-means, which extracts desired number of communities (given as input) from a network/graph. Communities are groups of highly inter-related nodes that have relatively less relations to other nodes outside their group (a.k.a clusters). TopLeades community mining algorithm assumes that each community consists of a leader and its followers. It detects different leaders and their followers based on relationships between nodes. Moreover, it determines the outlier nodes that belong to no community and the hub nodes that are bridging between different communities.

Related Publications

Download

An executable jar file of the algorithm can be downladed here, source code could be obtained here.
You could access a more recent version of this algorithm and also several other well-known community mining algorithms in Meerkat social network analysis toolbox.

Getting Started

  1. Download the jar file
  2. Download the sample network karate.pairs
  3. Open a terminal in Mac or Linux
  4. Change your directory to where you saved those files
  5. In the terminal type
  6. java -jar topLeaders.jar -i karate.pairs

  7. The result (detected communities) is saved in the same directory named "karate.pairs.clusters" while you see the progress of the algorithm in the terminal as:
  8. Graph loaded > karate.pairs: 34 nodes and 78 edges ...
    Running Global with [ k:2, o: 0.0, c: 16.0, h: 0.0 ]
    Initial leaders: [1, 33]
    Iteration: 1, leaders: [1, 34]
    Iteration: 2, leaders: [1, 34]
    Partitioned to: Number of Clusters: 2 , size: 17.0 +/- 1.0 in [16.0,18.0] , number of Hubs: 0 , Outliers: 0
    [ Communities(2): [[1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 18, 20, 22], [34, 32, 33, 9, 10, 15, 16, 19, 21, 23, 25, 24, 27, 26, 29, 28, 31, 30]]
    Hubs: []
    Outliers: []]

  9. These are optional parameters that you could set for the algorithm ("-help")
  10. -k: number of clusters (2..n, default : 2)
    -o: how much outliers should be detected, (real number, default 0, numbers in between 0 and 1 are treated as percents for automatic scaling according to networks' charactresitics, negative numbers: ignored and program would predict the proper threshold)
    -h: how much hubs should be detected (real number, default 0, numbers in between 0 and 1 are treated as percents for automatic scaling according to networks' charactresitics, negative numbers: ignored program would predict the proper threshold)
    -c: how much centers could be close together (0..1, would be automatically set by the program if no value is provided)
    -out: outputfilename (default $input.clusters)

Coverage

Input formats: .pairs, .net/.pajek, .gml
Input characteristics: weighted/unweighted homogeneous (only one type of node/relation) undirected network.

Visualized Examples

WKarate Club (weighted) by Zachary (input, groundtruth):
karate_results
Sawmill Strike data sets (input, groundtruth):
strike_results
NCAA Football Bowl Subdivision (input, groundtruth):
footbal_results