[Home]ClusteringCoefficient

ElvisBrain | RecentChanges | Preferences | FanClub? | Edit | Login

The clustering coefficient is a metric that seeks to describe the amount of clustering in a graph. Clustering is when a subset of vertices have lots of edges to each other.

Apparently first sighted in the famous paper about small world networks published in Nature: http://tam.cornell.edu/SS_nature_smallworld.pdf [1]

The definition is very loose in the small world paper though.

'Scaling exponents and clustering coefficients of a growing random network' (http://arxiv.org/abs/cond-mat/0303547 [2]) gives some more precise definitions.


Clustering coefficient for undirected graphs, with no reflexive edges:

Define the clustering coeffecient for a vertex v in the following way:

Let N (the neighbour set) be the set of vertices that are connected to v, ie. that share an edge with v (not including v). The maximum number of edges among the set N is |N|*(|N|-1)/2.

Then define the clustering coefficient for a vertex v, given that |N| >= 2, as the actual number of edges among N divided by |N|*(|N|-1)/2.

The clustering coefficient for the graph is the average clustering coefficient of the vertices, in this definition computed over all vertices with associated |N| >= 2.


The clustering coefficient of a totally random network (from [2]) is ~= <k>/N, where <k> is the average degree of the vertices, and N is the number of vertices. Higher clustering coefficients will be found in networks with greater 'clumpiness'. (This comparison is actually a bit dodgy because the clustering coefficient definition I am using is computed only over nodes with degree >= 2)

A 'small world' network is one in which the clustering coefficient is high, indicating the network has mostly 'local' edges, but which has a low characteristic path length, ie. a low average shortest path between two vertices.


Results

Class call graph results

In this case, the call graphs are treated as undirected, with an edge between two classes if at least one of the classes calls a method on the other class.

Java v1.1 runtime class library

contained 2031 classes with at least one method call two another class.

 clustering coefficient: 0.400374
 total num edges:  17732
 average node degree:  8.73067
 <k>/N:  0.00429871

<k>/N is the clustering coefficient expected if the network was totally random. The actual clustering coefficient is 3 orders of magnitude higher, indicating a high degree of clustering.

Azureus Callgraph

 1386 classes
 average clustering coefficient: 0.263343
 total num edges:  10232
 average degree:  7.3824
 <k>/N:  0.0053264

Method call graph results

In this case, methods are the nodes in the graph, and there is an (undirected) edge between two nodes if one of the methods calls the other method. Methods are 'fully qualified' with the containing class name, i.e. Applet.doSomething() is a different method/node from Widget.doSomething().

 Java v1.1 runtime class library
 12838 nodes
 average clustering coefficient: 0.106641
 total num edges:  69156
 average degree:  5.38682
 <k>/N (expected clustering coeff):  0.0004196

 Azureus
 7161 nodes
 average clustering coefficient: 0.0835065
 total num edges:  40204
 average degree:  5.6143
 <k>/N (expected clustering coeff):  0.000784011

The clustering coefficient is lower for method call graphs, as opposed to class call graphs, due to methods containing less code and being operated on less often than classes. Again, the clustering coefficients are much greater than <k>/N, indicating a lot of clustering in the graphs.


ElvisBrain | RecentChanges | Preferences | FanClub? | Edit | Login
Edit text of this page | View other revisions
Last edited July 26, 2004 6:20 pm by 203-96-147-59.paradise.net.nz (diff)
Search:

Incoming links: LegoHypothesis

PDF Version