This document hopes to be a useful benchmark for a “toy” dataset, namely the network of 47 statistics journals studied in the paper Varin, C., Cattelan, M. and Firth, D. (2016), “Statistical modelling of citation exchange between statistics journals”. J. R. Stat. Soc. A, 179: 1–63. doi:10.1111/rssa.12124.
It isn’t always clear how the various implementations of community detection algorithms handle directed versus undirected graphs, or weighted edges versus multiple unweighted edges. Therefore, in this document we will try each of the four combinations just to see what happens. Where igraph algorithms are not implemented for weighted or directed graphs, they will throw an error, which we have reproduced under the respective tabs. Beware: just because the algorithm does not throw an error does not necessarily mean that directions or weights are actually taken into account.
Self-citations have not been explicitly removed before performing any of the procedures below (though some algorithms may implicitly or explicitly ignore them). I might try removing self-citations later to see if it makes any noticeable difference to the results.
Also to do: nice minimal plots so we don’t have to read lists/tables all day. Update: scroll down to see some visualisations.
Infomap algorithm
Directed multi-edge
- ANZS, Bcs, CSSC, CSTM, EES, Envr, JAS, JBS, JRSS-B, JSCS, JSS, StMed, StNee, StPap, SPL
- AmS, Bka, Biost, CSDA, JCGS, JRSS-A, JSPI, Mtka, SJS, StataJ, StSci
- BioJ, CJS, JASA, JNS, JRSS-C, LDA, StCmp, Stats, SMMR, StSin, Tech, Test
- AISM, AoS, Bern, CmpSt, ISR, JABES, JMA, JTSA, StMod
Directed weighted
- AmS, AISM, AoS, ANZS, Bern, BioJ, Bcs, Bka, Biost, CJS, CSSC, CSTM, CmpSt, CSDA, EES, Envr, ISR, JABES, JASA, JAS, JBS, JCGS, JMA, JNS, JRSS-A, JRSS-B, JRSS-C, JSCS, JSPI, JSS, JTSA, LDA, Mtka, SJS, StataJ, StCmp, Stats, StMed, SMMR, StMod, StNee, StPap, SPL, StSci, StSin, Tech, Test
Undirected multi-edge
- AoS, StMed
- JASA, StSci
- Bka, JMA
- CSDA, Test
- JNS, JSPI
- Bern, Bcs
- Biost, CSTM
- JAS, SPL
- JRSS-A, JRSS-B
- JRSS-C, StSin
- BioJ, JSCS
- LDA, SJS
- AISM, Mtka, StNee
- CmpSt, JCGS
- CSSC, JTSA
- CJS, SMMR
- JBS, Tech
- StCmp, StMod
- Envr, JSS
- Stats, StPap
- AmS, EES
- ANZS, JABES
- ISR, StataJ
Undirected weighted
- AmS, AISM, AoS, ANZS, Bern, BioJ, Bcs, Bka, Biost, CJS, CSSC, CSTM, CmpSt, CSDA, EES, Envr, ISR, JABES, JASA, JAS, JBS, JCGS, JMA, JNS, JRSS-A, JRSS-B, JRSS-C, JSCS, JSPI, JSS, JTSA, LDA, Mtka, SJS, StataJ, StCmp, Stats, StMed, SMMR, StMod, StNee, StPap, SPL, StSci, StSin, Tech, Test
Agglomerative hierarchical clustering
Directed, complete-linkage
- AmS
- AISM, JTSA, Stats, SPL
- AoS, Bern, CJS, JMA, JNS
- ANZS, JSPI, Mtka, Tech, Test
- BioJ, JBS
- Bcs, Biost, JRSS-A, JRSS-C, LDA, StMed, SMMR
- Bka, JASA, JRSS-B, SJS, StSin
- CSSC, CSTM, JAS, JSCS, StPap
- CmpSt, CSDA, JCGS, StCmp
- EES
- Envr, JABES
- ISR
- JSS
- StataJ
- StMod, StSci
- StNee
Undirected, complete-linkage
- AmS, ISR
- AISM, ANZS, CSTM, JSPI, JTSA, Mtka, Stats, StPap, SPL
- AoS, Bern, Bka, CJS, JASA, JCGS, JMA, JNS, JRSS-B, SJS, StCmp, StNee, StSin, Test
- BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
- CSSC, CmpSt, CSDA, JAS, JSCS, Tech
- EES, Envr, JABES
- JSS
- StataJ
Hierarchical clustering with complete linkage, applied to the undirected Pearson correlation matrix, should exactly reproduce the results of Varin et al (2016).
Edge betweenness
Directed multi-edge
- AmS
- AISM
- AoS, JMA, StSin
- ANZS
- Bern
- BioJ
- Bcs, StMed
- Bka
- Biost
- CJS
- CSSC
- CSTM
- CmpSt
- CSDA, JASA
- EES
- Envr
- ISR
- JABES
- JAS
- JBS
- JCGS
- JNS
- JRSS-A
- JRSS-B
- JRSS-C
- JSCS
- JSPI
- JSS
- JTSA
- LDA
- Mtka
- SJS
- StataJ
- StCmp
- Stats
- SMMR
- StMod
- StNee
- StPap
- SPL
- StSci
- Tech
- Test
Directed weighted
- AmS
- AISM
- AoS
- ANZS
- Bern
- BioJ
- Bcs
- Bka
- Biost
- CJS
- CSSC
- CSTM
- CmpSt
- CSDA
- EES
- Envr
- ISR
- JABES
- JASA
- JAS
- JBS
- JCGS
- JMA, JNS, JRSS-B, JSPI, SJS, StCmp, SPL, StSin, Tech, Test
- JRSS-A
- JRSS-C
- JSCS
- JSS
- JTSA
- LDA
- Mtka
- StataJ
- Stats
- StMed
- SMMR
- StMod
- StNee
- StPap
- StSci
Undirected multi-edge
- AmS
- AISM
- AoS, Bka, CSDA, JASA, JMA, JRSS-B, JSPI, StSin
- ANZS
- Bern
- BioJ
- Bcs, StMed
- Biost
- CJS
- CSSC
- CSTM
- CmpSt
- EES
- Envr
- ISR
- JABES
- JAS
- JBS
- JCGS
- JNS
- JRSS-A
- JRSS-C
- JSCS
- JSS
- JTSA
- LDA
- Mtka
- SJS
- StataJ
- StCmp
- Stats
- SMMR
- StMod
- StNee
- StPap
- SPL
- StSci
- Tech
- Test
Undirected weighted
- AmS
- AISM
- AoS
- ANZS
- Bern
- BioJ
- Bcs
- Bka
- Biost
- CJS
- CSSC
- CSTM
- CmpSt
- CSDA
- EES
- Envr
- ISR
- JABES
- JASA
- JAS
- JBS
- JCGS, StCmp, StSci
- JMA
- JNS
- JRSS-A
- JRSS-B, SMMR
- JRSS-C
- JSCS, Mtka, Stats, StPap, SPL
- JSPI
- JSS
- JTSA
- LDA, StSin
- SJS
- StataJ
- StMed
- StMod
- StNee
- Tech
- Test
Modularity optimisation: fast and greedy
Directed multi-edge
Looks like an error! Here is the message:
At fast_community.c:538 : fast greedy community detection works for undirected graphs only, Unimplemented function call
Directed weighted
Looks like an error! Here is the message:
At fast_community.c:538 : fast greedy community detection works for undirected graphs only, Unimplemented function call
Undirected multi-edge
Looks like an error! Here is the message:
At fast_community.c:553 : fast-greedy community finding works only on graphs without multiple edges, Invalid value
Undirected weighted
- AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
- CmpSt, CSDA, JSS
- AmS, ISR
- EES, Envr, JABES
- AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
- BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
- StataJ
Modularity optimisation: Louvain method (Blondel et al)
Directed multi-edge
Looks like an error! Here is the message:
At community.c:2672 : multi-level community detection works for undirected graphs only, Unimplemented function call
Directed weighted
Looks like an error! Here is the message:
At community.c:2672 : multi-level community detection works for undirected graphs only, Unimplemented function call
Undirected multi-edge
- EES, Envr, JABES
- CmpSt, CSDA, JSS
- AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
- StataJ
- BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
- AmS, ISR
- AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
Undirected weighted
- EES, Envr, JABES
- CmpSt, CSDA, JSS
- AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
- StataJ
- BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
- AmS, ISR
- AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
Spinglass method
Directed multi-edge
- ANZS, BioJ, CmpSt, Envr, JABES, JAS, JCGS, JSS, StCmp, StSci, Tech
- AISM, Bern, CJS, CSSC, CSTM, JASA, JNS, JSCS, JTSA, Mtka, SJS, Stats, StPap, SPL, StSin, Test
- AmS, AoS, Bcs, Bka, Biost, CSDA, EES, ISR, JBS, JMA, JRSS-A, JRSS-B, JRSS-C, JSPI, LDA, StataJ, StMed, SMMR, StMod, StNee
Directed weighted
- AmS, BioJ, EES, ISR, JAS, JBS, JRSS-A, JRSS-C, JSS, LDA, SJS, StataJ, StMed, SMMR, StMod, StSci
- AISM, CSSC, CSTM, CmpSt, CSDA, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StPap, SPL, StSin, Tech, Test
- AoS, ANZS, Bern, Bcs, Bka, Biost, CJS, Envr, JABES, JASA, JCGS, JRSS-B, StCmp, StNee
Undirected multi-edge
- ANZS, BioJ, CmpSt, Envr, JABES, JAS, JCGS, JSS, StCmp, StSci, Tech
- AISM, Bern, CJS, CSSC, CSTM, JASA, JNS, JSCS, JTSA, Mtka, SJS, Stats, StPap, SPL, StSin, Test
- AmS, AoS, Bcs, Bka, Biost, CSDA, EES, ISR, JBS, JMA, JRSS-A, JRSS-B, JRSS-C, JSPI, LDA, StataJ, StMed, SMMR, StMod, StNee
Undirected weighted
- AISM, ANZS, CSSC, CSTM, CmpSt, CSDA, ISR, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
- AoS, Bern, Bka, CJS, Envr, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
- AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StataJ, StMed, SMMR, StMod, StSci
Random walks (walktrap)
Based on random walks of 6 steps.
See Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, http://arxiv.org/abs/physics/0512106
Directed multi-edge
- AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
- AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
- StataJ
Directed weighted
- AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
- AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
- StataJ
Undirected multi-edge
- AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
- AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
- StataJ
Undirected weighted
- AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
- AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
- StataJ
Visualisation
Here I am just trying out the ggraph package on the statistical journals dataset. The indicated clusters are those from agglomerative hierarchical clustering by Varin et al.
Asymmetric MDS

Symmetric MDS

Force-directed algorithm

Spring-based algorithm

Plotting communities
Following advice given in this StackOverflow answer we will aggregate communities into nodes representing each one (i.e. “super-journals”) and then plot them again to see if anything is worth looking at.
The visualisations above are definitely not scalable to large networks (using R and ggraph, anyway), but the ones below can be applied to very large networks, provided the number of clusters within them is only around 50–100 or so.
For now, we are only looking at the clustering given by Varin et al (2016).
Circular

Asymmetric MDS

Symmetric MDS

Force-directed

Spring-based

---
title: "Clustering statistics journals"
author: "David A. Selby"
date: "`r format(Sys.Date(), '%e %B %Y')`"
output:
  html_document:
    code_download: yes
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = FALSE, cache = TRUE, results = 'asis',
                      dev.args = list(type = 'cairo'),
                      fig.width = 8, fig.height = 8)
```

This document hopes to be a useful benchmark for a "toy" dataset, namely the network of 47 statistics journals studied in the paper Varin, C., Cattelan, M. and Firth, D. (2016), "Statistical modelling of citation exchange between statistics journals". *J. R. Stat. Soc. A*, 179: 1–63. [doi:10.1111/rssa.12124](http://onlinelibrary.wiley.com/doi/10.1111/rssa.12124/abstract).

It isn't always clear how the various implementations of community detection algorithms handle directed versus undirected graphs, or weighted edges versus multiple unweighted edges.
Therefore, in this document we will try each of the four combinations just to see what happens.
Where `igraph` algorithms are not implemented for weighted or directed graphs, they will throw an error, which we have reproduced under the respective tabs.
Beware: just because the algorithm does not throw an error does not necessarily mean that directions or weights are actually taken into account.
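
To make the weighted versus multi-edge distinction concrete, here is a minimal base R sketch. The 3-journal count matrix is made up for illustration (it is not the journals data), and I am assuming the orientation that `igraph` expects, with citations running from rows to columns.

```r
# Toy citation counts (hypothetical, not from the journals data):
# entry [i, j] is the number of citations from journal i to journal j.
counts <- matrix(c(0, 3, 1,
                   2, 0, 0,
                   0, 4, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c('A', 'B', 'C'), c('A', 'B', 'C')))

# Weighted representation: one row per ordered pair with a non-zero count.
idx <- which(counts > 0, arr.ind = TRUE)
weighted_edges <- data.frame(from = rownames(counts)[idx[, 'row']],
                             to = colnames(counts)[idx[, 'col']],
                             weight = counts[idx])

# Multi-edge representation: repeat each pair once per citation.
multi_edges <- weighted_edges[rep(seq_len(nrow(weighted_edges)),
                                  times = weighted_edges$weight),
                              c('from', 'to')]

# Undirected version: add the two directions together.
undirected_counts <- counts + t(counts)

nrow(weighted_edges)  # 4 distinct directed pairs
nrow(multi_edges)     # 10 citations in total
```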

Self-citations have not been explicitly removed before performing any of the procedures below
(though some algorithms may implicitly or explicitly ignore them).
I might try removing self-citations later to see if it makes any noticeable difference to the results.
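
If I do get round to it, removing self-citations from a count matrix should just be a matter of zeroing the diagonal. A small sketch on a made-up matrix (the numbers are hypothetical):

```r
# Hypothetical 3-journal count matrix with self-citations on the diagonal.
counts <- matrix(c(10, 3, 1,
                    2, 8, 0,
                    0, 4, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c('A', 'B', 'C'), c('A', 'B', 'C')))

no_self <- counts
diag(no_self) <- 0  # drop self-citations, keep everything else

# Share of all citations that were self-citations
self_share <- sum(diag(counts)) / sum(counts)
```

For a graph that has already been built, `simplify(graph, remove.loops = TRUE, remove.multiple = FALSE)` does the equivalent job, as used in the visualisation code further down.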

Also to do: nice minimal plots so we don't have to read lists/tables all day.
**Update:** [scroll down](#visualisation) to see some visualisations.

```{r utilities, results = 'markup'}
listify <- function(communities,
                    mode = 'enumerate',
                    tightlist = TRUE) {
  # Input: a communities object or membership vector
  # Output: markdown list of the groups
  mode <- match.arg(mode, c('itemise', 'enumerate'))
  if (inherits(communities, 'communities'))
    groupslist <- groups(communities)
  else
    groupslist <- split(names(communities), communities)
  outfn <- function(grp)
    paste(switch(mode, itemise = '-', enumerate = '1.'),
          paste(grp, collapse = ', '))
  outlist <- vapply(groupslist, outfn, character(1))
  writeLines(outlist, sep = if (tightlist) '\n' else '\n\n')
}

make_title <- function(graph) {
  # Map the name of a graph object to the heading shown on its tab
  output <- switch(graph,
                   directed_multi = 'Directed multi-edge',
                   directed_weighted = 'Directed weighted',
                   undirected_multi = 'Undirected multi-edge',
                   undirected_weighted = 'Undirected weighted',
                   g_scale = 'Directed, scaled weighted',
                   stop('Graph not un/directed or un/weighted'))
  writeLines(paste('\n####', output))
}

try_clustering <- function(f, graph, ...) {
  set.seed(2017) # for consistency between graphs
  tryCatch({
      make_title(deparse(substitute(graph)))
      listify(f(graph, ...))
    },
    error = function(cond) {
      cat('<div class="alert alert-danger">',
          '<strong>Looks like an error! Here is the message:</strong><br />',
          cond$message,
          '</div>',
          sep = '\n')
    }
  )
}
```

```{r make_graphs}
suppressPackageStartupMessages(library(igraph))
C <- scrooge::citations
# Be careful! igraph reads an adjacency matrix as edges going from rows to columns,
# whereas C has cited journals in rows and citing journals in columns, so we need t()
directed_multi <- graph_from_adjacency_matrix(t(C))
directed_weighted <- graph_from_adjacency_matrix(t(C), weighted = TRUE)
undirected_multi <- graph_from_adjacency_matrix(C, mode = 'plus')
undirected_weighted <- graph_from_adjacency_matrix(C, mode = 'plus', weighted = TRUE)
```

### Infomap algorithm {.tabset}

```{r infomap}
try_clustering(cluster_infomap, directed_multi)
try_clustering(cluster_infomap, directed_weighted)
try_clustering(cluster_infomap, undirected_multi)
try_clustering(cluster_infomap, undirected_weighted)
```

### Agglomerative hierarchical clustering {.tabset}

```{r, hclust}
writeLines('\n#### Directed, complete-linkage')
totalC <- C + t(C) # Varin et al section 3
diag(totalC) <- diag(C)
D_dir <- as.dist(1 - cor(t(C))) # assuming we care about who they cite, not who cites them
hclust_dir <- hclust(D_dir, method = 'complete')
hclust_dir <- cutree(hclust_dir, h = 0.6)
listify(hclust_dir)

writeLines('\n#### Undirected, complete-linkage') # Should match approach of Varin et al. (2016)
D_ud <- as.dist(1 - cor(totalC))
hclust_ud <- hclust(D_ud, method = 'complete')
hclust_ud <- cutree(hclust_ud, h = 0.6)
listify(hclust_ud)
```

*Hierarchical clustering with complete linkage, applied to the undirected Pearson correlation matrix, should exactly reproduce the results of Varin et al (2016).*
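
The same recipe (symmetrise the counts, take one minus the Pearson correlation as a distance, complete linkage, cut at height 0.6) can be tried on a tiny example. The 4-journal matrix below is invented purely to show the mechanics; only the steps mirror the chunk above.

```r
# Hypothetical counts: A and B cite each other a lot, as do C and D.
counts <- matrix(c(20, 15,  1,  0,
                   12, 18,  0,  2,
                    1,  0, 25, 14,
                    0,  1, 16, 22),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Symmetrise, keeping self-citations on the diagonal counted once
total <- counts + t(counts)
diag(total) <- diag(counts)

# Distance = 1 - Pearson correlation between columns
d <- as.dist(1 - cor(total))

# Complete linkage, then cut the dendrogram at height 0.6
tree <- hclust(d, method = 'complete')
membership <- cutree(tree, h = 0.6)

split(names(membership), membership)
```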

### Edge betweenness {.tabset}

```{r betweenness, cache = TRUE}
#warning('The edge-betweenness algorithm is very slow!')
try_clustering(cluster_edge_betweenness, directed_multi)
try_clustering(cluster_edge_betweenness, directed_weighted)
try_clustering(cluster_edge_betweenness, undirected_multi)
try_clustering(cluster_edge_betweenness, undirected_weighted)
```

### Modularity optimisation: fast and greedy {.tabset}

```{r fast_greedy}
try_clustering(cluster_fast_greedy, directed_multi)
try_clustering(cluster_fast_greedy, directed_weighted)
try_clustering(cluster_fast_greedy, undirected_multi)
try_clustering(cluster_fast_greedy, undirected_weighted)
```

### Modularity optimisation: Louvain method (Blondel et al) {.tabset}

```{r louvain}
try_clustering(cluster_louvain, directed_multi)
try_clustering(cluster_louvain, directed_weighted)
try_clustering(cluster_louvain, undirected_multi)
try_clustering(cluster_louvain, undirected_weighted)
```

### Spinglass method {.tabset}

```{r spinglass}
try_clustering(cluster_spinglass, directed_multi)
try_clustering(cluster_spinglass, directed_weighted)
try_clustering(cluster_spinglass, undirected_multi)
try_clustering(cluster_spinglass, undirected_weighted)
```

### Random walks (walktrap) {.tabset}

Based on random walks of 6 steps.

See Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, http://arxiv.org/abs/physics/0512106
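
As I understand the paper, walktrap merges vertices whose random walks of a given length end up in similar places. The sketch below is not `igraph`'s implementation, only the vertex-to-vertex distance from Pons and Latapy computed in base R on a made-up graph (two triangles joined by a single edge), with 6 steps to match the setting used here.

```r
# Hypothetical undirected graph: triangles a-b-c and d-e-f joined by edge c-d.
A <- matrix(0, nrow = 6, ncol = 6,
            dimnames = list(letters[1:6], letters[1:6]))
edges <- rbind(c('a', 'b'), c('a', 'c'), c('b', 'c'),
               c('c', 'd'),
               c('d', 'e'), c('d', 'f'), c('e', 'f'))
A[edges] <- 1
A[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric

degree <- rowSums(A)   # degree d(k) of each vertex
P <- A / degree        # transition matrix: row i holds one-step probabilities

# t-step transition probabilities by repeated multiplication
steps <- 6
Pt <- diag(nrow(A))
for (s in seq_len(steps)) Pt <- Pt %*% P
dimnames(Pt) <- dimnames(A)

# Distance between vertices i and j:
# sqrt(sum_k (Pt[i, k] - Pt[j, k])^2 / d(k))
walk_distance <- function(i, j) sqrt(sum((Pt[i, ] - Pt[j, ])^2 / degree))

walk_distance('a', 'b')  # same triangle
walk_distance('a', 'f')  # opposite triangles
```

Vertices in the same triangle should come out much closer than vertices in opposite triangles.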

```{r walktrap}
try_clustering(cluster_walktrap, directed_multi, steps = 6)
try_clustering(cluster_walktrap, directed_weighted, steps = 6)
try_clustering(cluster_walktrap, undirected_multi, steps = 6)
try_clustering(cluster_walktrap, undirected_weighted, steps = 6)
```

## Visualisation {.tabset}

Here I am just trying out the `ggraph` package on the statistical journals dataset.
The indicated clusters are those from [agglomerative hierarchical clustering by Varin et al][agg].

[agg]: #agglomerative-hierarchical-clustering

```{r fonts_and_stuff}
extrafont::loadfonts(quiet = TRUE)
if(!('Arial Narrow' %in% extrafont::fonts()))
  cat('*Warning*: Arial Narrow was not loaded')
```

### Asymmetric MDS

```{r mds}
gg47 <- graph_from_adjacency_matrix(t(scrooge::citations))
ggtest <- simplify(gg47, remove.loops = TRUE, remove.multiple = FALSE)
V(ggtest)$Indegree <- degree(ggtest, mode = 'in')
V(ggtest)$Outdegree <- degree(ggtest, mode = 'out')
V(ggtest)$PageRank <- page_rank(ggtest)$vector
V(ggtest)$cluster <- hclust_ud

library(ggplot2)
library(ggraph)

mds_layout <- create_layout(ggtest, layout = 'igraph', algorithm = 'mds',
                            dist = 1 - cor(C, method = 'pearson'))
ggraph(mds_layout) +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Multidimensional Scaling of Pearson distances',
          subtitle = 'Asymmetric') +
  theme_graph()
```

### Symmetric MDS

```{r mds2}
mds_layout2 <- create_layout(ggtest, layout = 'igraph', algorithm = 'mds',
                             dist = 1 - cor(totalC, method = 'pearson'))
ggraph(mds_layout2) +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Multidimensional Scaling of Pearson distances',
          subtitle = 'Symmetrised') +
  theme_graph()
```

### Force-directed algorithm

```{r fr}
ggraph(ggtest, layout = 'fr') +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Fruchterman and Reingold') +
  theme_graph()
```

### Spring-based algorithm

```{r kk}
ggraph(ggtest, layout = 'kk') +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Kamada and Kawai') +
  theme_graph()
```

## Plotting communities {.tabset}

Following advice given in [this StackOverflow answer](https://stackoverflow.com/a/20845431) we will aggregate communities into nodes representing each one (i.e. "super-journals") and then plot them again to see if anything is worth looking at.

The visualisations above are definitely not scalable to large networks (using R and `ggraph`, anyway), but the ones below can be applied to very large networks, provided the number of clusters within them is only around 50--100 or so.

For now, we are only looking at the [clustering given by Varin et al (2016)][agg].
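
The aggregation itself amounts to summing the citation counts over every pair of clusters. The code below does this with `contract.vertices()` and `xtabs()`; as an alternative illustration, here is a base R sketch using `rowsum()` on a made-up matrix with a made-up membership vector.

```r
# Hypothetical counts and cluster membership for four journals.
counts <- matrix(c(20, 15,  1,  0,
                   12, 18,  0,  2,
                    1,  0, 25, 14,
                    0,  1, 16, 22),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(LETTERS[1:4], LETTERS[1:4]))
membership <- c(A = 1, B = 1, C = 2, D = 2)

# Sum rows within each cluster, then columns within each cluster.
by_row <- rowsum(counts, group = membership)
super <- t(rowsum(t(by_row), group = membership))

super  # 2 x 2 matrix of citations between "super-journals"
```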

```{r community_nodes}
# See tips here: https://stackoverflow.com/a/20845431
V(gg47)$members <- 1
V(gg47)$cluster <- as.factor(hclust_ud)
E(gg47)$count <- 1

# Contract communities into vertices
graph_hclust <- contract(graph = gg47,
                         mapping = hclust_ud,
                         vertex.attr.comb = list(
                           members = 'sum', # count members
                           cluster = 'first', # cluster ID
                           'ignore' # drop other attributes
                         ))
# Collapse multiple edges, summing their counts (disabled for now; note that
# remove.loops = FALSE would keep the self-citations rather than remove them)
# graph_hclust <- simplify(graph_hclust,
#                          remove.loops = FALSE,
#                          remove.multiple = TRUE,
#                          edge.attr.comb = list(
#                            count = 'sum',
#                            'ignore'
#                          ))
```

### Circular

```{r comm_plot}
# Not quite comparable to earlier MDS yet
ggraph(graph_hclust, layout = 'circle') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Circular layout') +
  theme_graph()
```

### Asymmetric MDS

```{r comm_mds}
# https://stackoverflow.com/a/27512989
# Make super-journal cross-citation matrix

C_hc <- as.data.frame.table(C)
levels(C_hc$cited) <- levels(C_hc$citing) <- split(names(hclust_ud), hclust_ud)
C_hc <- xtabs(Freq ~ cited + citing, C_hc)

mds_layout_comm <- create_layout(graph_hclust,
                                 layout = 'igraph',
                                 algorithm = 'mds',
                                 dist = 1 - cor(t(C_hc)))

ggraph(mds_layout_comm) +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Asymmetric multidimensional scaling') +
  theme_graph()
```

### Symmetric MDS

```{r comm_mds_sym}
totalC_hc <- C_hc + t(C_hc) - diag(diag(C_hc))

mds_layout_comm2 <- create_layout(graph_hclust,
                                 layout = 'igraph',
                                 algorithm = 'mds',
                                 dist = 1 - cor(totalC_hc))

ggraph(mds_layout_comm2) +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Symmetric multidimensional scaling') +
  theme_graph()
```

### Force-directed

```{r comm_fr}
ggraph(graph_hclust, layout = 'fr') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Fruchterman and Reingold layout') +
  theme_graph()
```

### Spring-based

```{r comm_kk}
ggraph(graph_hclust, layout = 'kk') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Kamada and Kawai layout') +
  theme_graph()
```
