Machine Learning/Kaggle Social Network Contest/Network Description: Difference between revisions
Jump to navigation
Jump to search
No edit summary |
No edit summary |
||
| Line 6: | Line 6: | ||
* number of edges: 7,237,983 | * number of edges: 7,237,983 | ||
== Conectivity == | |||
"A digraph is strongly connected if every vertex is reachable from every other following the directions of the arcs. On the contrary, a digraph is weakly connected if its underlying undirected graph is connected. A weakly connected graph can be thought of as a digraph in which every vertex is "reachable" from every other but not necessarily following the directions of the arcs. A strong orientation is an orientation that produces a strongly connected digraph." [http://en.wikipedia.org/wiki/Glossary_of_graph_theory wikipedia] | |||
* The Graph is '''not''' weakly connected | |||
* It contains 27 subgraphs This means that it can be broken down into at least two discrete subgraphs. | |||
** c.f. [http://cneurocvs.rmki.kfki.hu/igraph/doc/R/clusters.html igraph clustering] | |||
** There is one very large cluster containing all but 154 verticies, then 4 with size 10 - 37, 8 sized 3 - 7, 13 size 2 and one lonely dude. | |||
* I also grabbed the number of strongly connected subgraphs | |||
{| border="1" | |||
|- | |||
!| Cluster Size | |||
| 1 | |||
| 2 | |||
| 3 | |||
| 4 | |||
| 5 | |||
| 9 | |||
| 10 | |||
| 32464 | |||
|- | |||
!| freq | |||
|1100647 | |||
| 162 | |||
| 18 | |||
|5 | |||
|4 | |||
| 1 | |||
| 1 | |||
| 1 | |||
|} | |||
* Diameter of the directed graph | * Diameter of the directed graph | ||
** This is the longest of the shortest directed paths between two nodes | ** This is the longest of the shortest directed paths between two nodes | ||
Revision as of 20:59, 22 November 2010
Here we can put the descriptive statistics of the network:
- Number of fully sampled nodes: 37,689
- ie the unique "outnodes" in the edge list
- Total number of nodes: 1,133,547
- number of edges: 7,237,983
Conectivity
"A digraph is strongly connected if every vertex is reachable from every other following the directions of the arcs. On the contrary, a digraph is weakly connected if its underlying undirected graph is connected. A weakly connected graph can be thought of as a digraph in which every vertex is "reachable" from every other but not necessarily following the directions of the arcs. A strong orientation is an orientation that produces a strongly connected digraph." wikipedia
- The Graph is not weakly connected
- It contains 27 subgraphs This means that it can be broken down into at least two discrete subgraphs.
- c.f. igraph clustering
- There is one very large cluster containing all but 154 verticies, then 4 with size 10 - 37, 8 sized 3 - 7, 13 size 2 and one lonely dude.
- I also grabbed the number of strongly connected subgraphs
| Cluster Size | 1 | 2 | 3 | 4 | 5 | 9 | 10 | 32464 |
|---|---|---|---|---|---|---|---|---|
| freq | 1100647 | 162 | 18 | 5 | 4 | 1 | 1 | 1 |
- Diameter of the directed graph
- This is the longest of the shortest directed paths between two nodes
- R igraph
- diameter (dg, directed = TRUE, unconnected = TRUE)
- Was taking forever so I aborted (after 34 minutes...)
- Total number of direct neighbours out: 7 275 672, in: 508 688, all: 7 473 273
- For each of our 38k I calculated the number of outbound neighbours and summed it
- R igraph:
- sum(neighborhood.size(dg, 1, nodes=myGuys, mode="out"))
- mode = "in", "out" or "all"