This document is intended as a useful benchmark on a “toy” dataset: the network of 47 statistics journals studied in the paper Varin, C., Cattelan, M. and Firth, D. (2016), “Statistical modelling of citation exchange between statistics journals”. J. R. Stat. Soc. A, 179: 1–63. doi:10.1111/rssa.12124.

It isn’t always clear how the various implementations of community detection algorithms handle directed versus undirected graphs, or weighted edges versus multiple unweighted edges. Therefore, in this document we try each of the four combinations to see what happens. Where an igraph algorithm is not implemented for weighted or directed graphs, it throws an error, which is reproduced under the respective tab. Beware: the absence of an error does not necessarily mean that directions or weights are actually taken into account.

Self-citations have not been explicitly removed before performing any of the procedures below (though some algorithms may implicitly or explicitly ignore them). I might try removing self-citations later to see if it makes any noticeable difference to the results.

Also to do: nice minimal plots so we don’t have to read lists/tables all day. Update: scroll down to see some visualisations.

Infomap algorithm

Directed multi-edge

  1. ANZS, Bcs, CSSC, CSTM, EES, Envr, JAS, JBS, JRSS-B, JSCS, JSS, StMed, StNee, StPap, SPL
  2. AmS, Bka, Biost, CSDA, JCGS, JRSS-A, JSPI, Mtka, SJS, StataJ, StSci
  3. BioJ, CJS, JASA, JNS, JRSS-C, LDA, StCmp, Stats, SMMR, StSin, Tech, Test
  4. AISM, AoS, Bern, CmpSt, ISR, JABES, JMA, JTSA, StMod

Directed weighted

  1. AmS, AISM, AoS, ANZS, Bern, BioJ, Bcs, Bka, Biost, CJS, CSSC, CSTM, CmpSt, CSDA, EES, Envr, ISR, JABES, JASA, JAS, JBS, JCGS, JMA, JNS, JRSS-A, JRSS-B, JRSS-C, JSCS, JSPI, JSS, JTSA, LDA, Mtka, SJS, StataJ, StCmp, Stats, StMed, SMMR, StMod, StNee, StPap, SPL, StSci, StSin, Tech, Test

Undirected multi-edge

  1. AoS, StMed
  2. JASA, StSci
  3. Bka, JMA
  4. CSDA, Test
  5. JNS, JSPI
  6. Bern, Bcs
  7. Biost, CSTM
  8. JAS, SPL
  9. JRSS-A, JRSS-B
  10. JRSS-C, StSin
  11. BioJ, JSCS
  12. LDA, SJS
  13. AISM, Mtka, StNee
  14. CmpSt, JCGS
  15. CSSC, JTSA
  16. CJS, SMMR
  17. JBS, Tech
  18. StCmp, StMod
  19. Envr, JSS
  20. Stats, StPap
  21. AmS, EES
  22. ANZS, JABES
  23. ISR, StataJ

Undirected weighted

  1. AmS, AISM, AoS, ANZS, Bern, BioJ, Bcs, Bka, Biost, CJS, CSSC, CSTM, CmpSt, CSDA, EES, Envr, ISR, JABES, JASA, JAS, JBS, JCGS, JMA, JNS, JRSS-A, JRSS-B, JRSS-C, JSCS, JSPI, JSS, JTSA, LDA, Mtka, SJS, StataJ, StCmp, Stats, StMed, SMMR, StMod, StNee, StPap, SPL, StSci, StSin, Tech, Test

Agglomerative hierarchical clustering

Directed, complete-linkage

  1. AmS
  2. AISM, JTSA, Stats, SPL
  3. AoS, Bern, CJS, JMA, JNS
  4. ANZS, JSPI, Mtka, Tech, Test
  5. BioJ, JBS
  6. Bcs, Biost, JRSS-A, JRSS-C, LDA, StMed, SMMR
  7. Bka, JASA, JRSS-B, SJS, StSin
  8. CSSC, CSTM, JAS, JSCS, StPap
  9. CmpSt, CSDA, JCGS, StCmp
  10. EES
  11. Envr, JABES
  12. ISR
  13. JSS
  14. StataJ
  15. StMod, StSci
  16. StNee

Undirected, complete-linkage

  1. AmS, ISR
  2. AISM, ANZS, CSTM, JSPI, JTSA, Mtka, Stats, StPap, SPL
  3. AoS, Bern, Bka, CJS, JASA, JCGS, JMA, JNS, JRSS-B, SJS, StCmp, StNee, StSin, Test
  4. BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
  5. CSSC, CmpSt, CSDA, JAS, JSCS, Tech
  6. EES, Envr, JABES
  7. JSS
  8. StataJ

Hierarchical clustering with complete linkage, applied to the undirected Pearson correlation matrix, should exactly reproduce the results of Varin et al (2016).

Edge betweenness

Directed multi-edge

  1. AmS
  2. AISM
  3. AoS, JMA, StSin
  4. ANZS
  5. Bern
  6. BioJ
  7. Bcs, StMed
  8. Bka
  9. Biost
  10. CJS
  11. CSSC
  12. CSTM
  13. CmpSt
  14. CSDA, JASA
  15. EES
  16. Envr
  17. ISR
  18. JABES
  19. JAS
  20. JBS
  21. JCGS
  22. JNS
  23. JRSS-A
  24. JRSS-B
  25. JRSS-C
  26. JSCS
  27. JSPI
  28. JSS
  29. JTSA
  30. LDA
  31. Mtka
  32. SJS
  33. StataJ
  34. StCmp
  35. Stats
  36. SMMR
  37. StMod
  38. StNee
  39. StPap
  40. SPL
  41. StSci
  42. Tech
  43. Test

Directed weighted

  1. AmS
  2. AISM
  3. AoS
  4. ANZS
  5. Bern
  6. BioJ
  7. Bcs
  8. Bka
  9. Biost
  10. CJS
  11. CSSC
  12. CSTM
  13. CmpSt
  14. CSDA
  15. EES
  16. Envr
  17. ISR
  18. JABES
  19. JASA
  20. JAS
  21. JBS
  22. JCGS
  23. JMA, JNS, JRSS-B, JSPI, SJS, StCmp, SPL, StSin, Tech, Test
  24. JRSS-A
  25. JRSS-C
  26. JSCS
  27. JSS
  28. JTSA
  29. LDA
  30. Mtka
  31. StataJ
  32. Stats
  33. StMed
  34. SMMR
  35. StMod
  36. StNee
  37. StPap
  38. StSci

Undirected multi-edge

  1. AmS
  2. AISM
  3. AoS, Bka, CSDA, JASA, JMA, JRSS-B, JSPI, StSin
  4. ANZS
  5. Bern
  6. BioJ
  7. Bcs, StMed
  8. Biost
  9. CJS
  10. CSSC
  11. CSTM
  12. CmpSt
  13. EES
  14. Envr
  15. ISR
  16. JABES
  17. JAS
  18. JBS
  19. JCGS
  20. JNS
  21. JRSS-A
  22. JRSS-C
  23. JSCS
  24. JSS
  25. JTSA
  26. LDA
  27. Mtka
  28. SJS
  29. StataJ
  30. StCmp
  31. Stats
  32. SMMR
  33. StMod
  34. StNee
  35. StPap
  36. SPL
  37. StSci
  38. Tech
  39. Test

Undirected weighted

  1. AmS
  2. AISM
  3. AoS
  4. ANZS
  5. Bern
  6. BioJ
  7. Bcs
  8. Bka
  9. Biost
  10. CJS
  11. CSSC
  12. CSTM
  13. CmpSt
  14. CSDA
  15. EES
  16. Envr
  17. ISR
  18. JABES
  19. JASA
  20. JAS
  21. JBS
  22. JCGS, StCmp, StSci
  23. JMA
  24. JNS
  25. JRSS-A
  26. JRSS-B, SMMR
  27. JRSS-C
  28. JSCS, Mtka, Stats, StPap, SPL
  29. JSPI
  30. JSS
  31. JTSA
  32. LDA, StSin
  33. SJS
  34. StataJ
  35. StMed
  36. StMod
  37. StNee
  38. Tech
  39. Test

Modularity optimisation: fast and greedy

Directed multi-edge

Looks like an error! Here is the message:
At fast_community.c:538 : fast greedy community detection works for undirected graphs only, Unimplemented function call

Directed weighted

Looks like an error! Here is the message:
At fast_community.c:538 : fast greedy community detection works for undirected graphs only, Unimplemented function call

Undirected multi-edge

Looks like an error! Here is the message:
At fast_community.c:553 : fast-greedy community finding works only on graphs without multiple edges, Invalid value

Undirected weighted

  1. AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
  2. CmpSt, CSDA, JSS
  3. AmS, ISR
  4. EES, Envr, JABES
  5. AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
  6. BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
  7. StataJ

Modularity optimisation: Louvain method (Blondel et al)

Directed multi-edge

Looks like an error! Here is the message:
At community.c:2672 : multi-level community detection works for undirected graphs only, Unimplemented function call

Directed weighted

Looks like an error! Here is the message:
At community.c:2672 : multi-level community detection works for undirected graphs only, Unimplemented function call

Undirected multi-edge

  1. EES, Envr, JABES
  2. CmpSt, CSDA, JSS
  3. AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
  4. StataJ
  5. BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
  6. AmS, ISR
  7. AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test

Undirected weighted

  1. EES, Envr, JABES
  2. CmpSt, CSDA, JSS
  3. AISM, ANZS, CSSC, CSTM, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
  4. StataJ
  5. BioJ, Bcs, Biost, JBS, JRSS-A, JRSS-C, LDA, StMed, SMMR, StMod, StSci
  6. AmS, ISR
  7. AoS, Bern, Bka, CJS, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test

Spinglass method

Directed multi-edge

  1. ANZS, BioJ, CmpSt, Envr, JABES, JAS, JCGS, JSS, StCmp, StSci, Tech
  2. AISM, Bern, CJS, CSSC, CSTM, JASA, JNS, JSCS, JTSA, Mtka, SJS, Stats, StPap, SPL, StSin, Test
  3. AmS, AoS, Bcs, Bka, Biost, CSDA, EES, ISR, JBS, JMA, JRSS-A, JRSS-B, JRSS-C, JSPI, LDA, StataJ, StMed, SMMR, StMod, StNee

Directed weighted

  1. AmS, BioJ, EES, ISR, JAS, JBS, JRSS-A, JRSS-C, JSS, LDA, SJS, StataJ, StMed, SMMR, StMod, StSci
  2. AISM, CSSC, CSTM, CmpSt, CSDA, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StPap, SPL, StSin, Tech, Test
  3. AoS, ANZS, Bern, Bcs, Bka, Biost, CJS, Envr, JABES, JASA, JCGS, JRSS-B, StCmp, StNee

Undirected multi-edge

  1. ANZS, BioJ, CmpSt, Envr, JABES, JAS, JCGS, JSS, StCmp, StSci, Tech
  2. AISM, Bern, CJS, CSSC, CSTM, JASA, JNS, JSCS, JTSA, Mtka, SJS, Stats, StPap, SPL, StSin, Test
  3. AmS, AoS, Bcs, Bka, Biost, CSDA, EES, ISR, JBS, JMA, JRSS-A, JRSS-B, JRSS-C, JSPI, LDA, StataJ, StMed, SMMR, StMod, StNee

Undirected weighted

  1. AISM, ANZS, CSSC, CSTM, CmpSt, CSDA, ISR, JAS, JMA, JNS, JSCS, JSPI, JTSA, Mtka, Stats, StNee, StPap, SPL, Tech
  2. AoS, Bern, Bka, CJS, Envr, JASA, JCGS, JRSS-B, SJS, StCmp, StSin, Test
  3. AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StataJ, StMed, SMMR, StMod, StSci

Random walks (walktrap)

Based on random walks of 6 steps.

See Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, http://arxiv.org/abs/physics/0512106

Directed multi-edge

  1. AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
  2. AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
  3. StataJ

Directed weighted

  1. AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
  2. AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
  3. StataJ

Undirected multi-edge

  1. AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
  2. AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
  3. StataJ

Undirected weighted

  1. AmS, BioJ, Bcs, Biost, EES, JABES, JBS, JRSS-A, JRSS-C, JSS, LDA, StMed, SMMR, StMod, StSci
  2. AISM, AoS, ANZS, Bern, Bka, CJS, CSSC, CSTM, CmpSt, CSDA, Envr, ISR, JASA, JAS, JCGS, JMA, JNS, JRSS-B, JSCS, JSPI, JTSA, Mtka, SJS, StCmp, Stats, StNee, StPap, SPL, StSin, Tech, Test
  3. StataJ

Visualisation

Here I am just trying out the ggraph package on the statistical journals dataset. The indicated clusters are those from agglomerative hierarchical clustering by Varin et al.

Asymmetric MDS

Symmetric MDS

Force-directed algorithm

Spring-based algorithm

Plotting communities

Following advice given in this StackOverflow answer, we will aggregate each community into a single node (i.e. a “super-journal”) and then plot the result to see if anything is worth looking at.

The visualisations above are definitely not scalable to large networks (using R and ggraph, anyway), but the ones below can be applied to very large networks, provided the number of clusters within them is only around 50–100 or so.

For now, we are only looking at the clustering given by Varin et al (2016).

Circular

Asymmetric MDS

Symmetric MDS

Force-directed

Spring-based

---
title: "Clustering statistics journals"
author: "David A. Selby"
date: "`r format(Sys.Date(), '%e %B %Y')`"
output:
  html_document:
    code_download: yes
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = FALSE, cache = TRUE, results = 'asis',
                      dev.args = list(type = 'cairo'),
                      fig.width = 8, fig.height = 8)
```

This document is intended as a useful benchmark on a "toy" dataset: the network of 47 statistics journals studied in the paper Varin, C., Cattelan, M. and Firth, D. (2016), "Statistical modelling of citation exchange between statistics journals". *J. R. Stat. Soc. A*, 179: 1–63. [doi:10.1111/rssa.12124](http://onlinelibrary.wiley.com/doi/10.1111/rssa.12124/abstract).

It isn't always clear how the various implementations of community detection algorithms handle directed versus undirected graphs, or weighted edges versus multiple unweighted edges.
Therefore, in this document we try each of the four combinations to see what happens.
Where an `igraph` algorithm is not implemented for weighted or directed graphs, it throws an error, which is reproduced under the respective tab.
Beware: the absence of an error does not necessarily mean that directions or weights are actually taken into account.
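
One way to probe the warning above is to run an algorithm on a small graph with and without its `weight` attribute and compare the memberships. The following is a minimal sketch on a made-up four-vertex graph (the matrix `A` and all object names are illustrative and not part of this analysis). It assumes that `cluster_louvain()` picks up the `weight` edge attribute by default, as the `igraph` documentation describes.

```r
library(igraph)

# Made-up complete graph on four vertices: pairs a-b and c-d are tied
# by heavy edges (weight 100), every other pair by a light edge (weight 1)
A <- matrix(1, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
diag(A) <- 0
A['a', 'b'] <- A['b', 'a'] <- 100
A['c', 'd'] <- A['d', 'c'] <- 100

g_weighted <- graph_from_adjacency_matrix(A, mode = 'max', weighted = TRUE)
g_unweighted <- delete_edge_attr(g_weighted, 'weight')  # same edges, no weights

set.seed(2017)
m_weighted <- as.vector(membership(cluster_louvain(g_weighted)))
m_unweighted <- as.vector(membership(cluster_louvain(g_unweighted)))

# Compare the two partitions side by side
rbind(weighted = m_weighted, unweighted = m_unweighted)
```

If the two membership vectors differ, the weights are evidently being used. If they are identical, that is not proof either way, so it is worth trying a few different toy graphs.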

Self-citations have not been explicitly removed before performing any of the procedures below
(though some algorithms may implicitly or explicitly ignore them).
I might try removing self-citations later to see if it makes any noticeable difference to the results.
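
For reference, here is a hedged sketch of two ways self-citations could be removed, shown on an invented three-journal matrix (`C_toy` and the journal labels `X`, `Y`, `Z` are hypothetical). Either the diagonal is zeroed before the graph is built, or loops are dropped from the graph afterwards with `simplify()`.

```r
library(igraph)

# Hypothetical citation counts; the diagonal holds self-citations
C_toy <- matrix(c(5, 2, 1,
                  3, 6, 0,
                  0, 4, 7),
                nrow = 3, byrow = TRUE,
                dimnames = list(c('X', 'Y', 'Z'), c('X', 'Y', 'Z')))

# Option 1: zero the diagonal before building the graph
C_noself <- C_toy
diag(C_noself) <- 0
g_matrix <- graph_from_adjacency_matrix(t(C_noself))

# Option 2: build the graph first, then drop loops but keep multi-edges
g_full <- graph_from_adjacency_matrix(t(C_toy))
g_simplified <- simplify(g_full, remove.loops = TRUE, remove.multiple = FALSE)

any(which_loop(g_simplified))  # are any self-citation loops left?
ecount(g_matrix)               # number of off-diagonal citations
ecount(g_simplified)           # should agree with the line above
```

Option 1 also works for the weighted graphs, since it acts on the matrix rather than on the edge list.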

Also to do: nice minimal plots so we don't have to read lists/tables all day.
**Update:** [scroll down](#visualisation) to see some visualisations.

```{r utilities, results = 'markup'}
listify <- function(communities,
                    mode = 'enumerate',
                    tightlist = TRUE) {
  # Input: a communities object or membership vector
  # Output: markdown list of the groups
  mode <- match.arg(mode, c('itemise', 'enumerate'))
  if (inherits(communities, 'communities')) {
    groupslist <- groups(communities)
  } else {
    groupslist <- split(names(communities), communities)
  }
  outfn <- function(grp)
    paste(switch(mode, itemise = '-', enumerate = '1.'),
          paste(grp, collapse = ', '))
  outlist <- vapply(groupslist, outfn, character(1))
  writeLines(outlist, sep = if (tightlist) '\n' else '\n\n')
}

make_title <- function(graph) {
  # Map the name of a graph object to a human-readable tab heading
  output <- switch(graph,
                   directed_multi = 'Directed multi-edge',
                   directed_weighted = 'Directed weighted',
                   undirected_multi = 'Undirected multi-edge',
                   undirected_weighted = 'Undirected weighted',
                   g_scale = 'Directed, scaled weighted',
                   stop('Graph not un/directed or un/weighted'))
  writeLines(paste('\n####', output))
}

try_clustering <- function(f, graph, ...) {
  set.seed(2017) # for consistency between graphs
  tryCatch({
      make_title(deparse(substitute(graph)))
      listify(f(graph, ...))
    },
    error = function(cond) {
      cat('<div class="alert alert-danger">',
          '<strong>Looks like an error! Here is the message:</strong><br />',
          cond$message,
          '</div>',
          sep = '\n')
    }
  )
}
```

```{r make_graphs}
suppressPackageStartupMessages(library(igraph))
C <- scrooge::citations
# Be careful! in igraph, citations go from rows->columns. So we need to use t()
directed_multi <- graph_from_adjacency_matrix(t(C))
directed_weighted <- graph_from_adjacency_matrix(t(C), weighted = TRUE)
undirected_multi <- graph_from_adjacency_matrix(C, mode = 'plus')
undirected_weighted <- graph_from_adjacency_matrix(C, mode = 'plus', weighted = TRUE)
```
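
As a sanity check on the four representations, the same construction can be applied to a tiny invented matrix and the resulting graphs inspected. This is only a sketch: `C_toy` and the journal labels are hypothetical, and the zero diagonal sidesteps any question of how loops are counted.

```r
library(igraph)

# Hypothetical 3-journal citation matrix with no self-citations
C_toy <- matrix(c(0, 2, 1,
                  3, 0, 0,
                  0, 4, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c('X', 'Y', 'Z'), c('X', 'Y', 'Z')))

graphs <- list(
  directed_multi      = graph_from_adjacency_matrix(t(C_toy)),
  directed_weighted   = graph_from_adjacency_matrix(t(C_toy), weighted = TRUE),
  undirected_multi    = graph_from_adjacency_matrix(C_toy, mode = 'plus'),
  undirected_weighted = graph_from_adjacency_matrix(C_toy, mode = 'plus',
                                                    weighted = TRUE)
)

# One row per graph: is it directed, weighted, a multigraph, and how many edges?
checks <- data.frame(
  directed = vapply(graphs, is_directed, logical(1)),
  weighted = vapply(graphs, is_weighted, logical(1)),
  multi    = vapply(graphs, any_multiple, logical(1)),
  edges    = vapply(graphs, ecount, numeric(1))
)
print(checks)

# In the weighted graphs the total weight should equal the total citation count
total_weight <- vapply(graphs[c('directed_weighted', 'undirected_weighted')],
                       function(g) sum(E(g)$weight), numeric(1))
total_weight
```

In the multi-edge graphs every citation is its own edge, so the edge count equals `sum(C_toy)`. In the weighted graphs the same total appears as the sum of the edge weights.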

### Infomap algorithm {.tabset}

```{r infomap}
try_clustering(cluster_infomap, directed_multi)
try_clustering(cluster_infomap, directed_weighted)
try_clustering(cluster_infomap, undirected_multi)
try_clustering(cluster_infomap, undirected_weighted)
```

### Agglomerative hierarchical clustering {.tabset}

```{r, hclust}
writeLines('\n#### Directed, complete-linkage')
totalC <- C + t(C) # Varin et al section 3
diag(totalC) <- diag(C)
D_dir <- as.dist(1 - cor(t(C))) # correlates rows of C (the cited side, if rows are cited journals); use cor(C) to compare citing profiles
hclust_dir <- hclust(D_dir, method = 'complete')
hclust_dir <- cutree(hclust_dir, h = 0.6)
listify(hclust_dir)

writeLines('\n#### Undirected, complete-linkage') # Should match approach of Varin et al. (2016)
D_ud <- as.dist(1 - cor(totalC))
hclust_ud <- hclust(D_ud, method = 'complete')
hclust_ud <- cutree(hclust_ud, h = 0.6)
listify(hclust_ud)
```

*Hierarchical clustering with complete linkage, applied to the undirected Pearson correlation matrix, should exactly reproduce the results of Varin et al (2016).*
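
The recipe (take one minus the Pearson correlation as a distance, apply complete linkage, then cut the tree at a fixed height) can be sketched on a made-up symmetric matrix. The counts below are invented for illustration and the name `total_toy` is hypothetical. The cut height of 0.6 mirrors the chunk above.

```r
# Invented symmetric citation counts for four journals:
# J1 and J2 exchange many citations, as do J3 and J4
total_toy <- matrix(c(50, 40,  2,  1,
                      40, 60,  1,  3,
                       2,  1, 55, 45,
                       1,  3, 45, 65),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0('J', 1:4), paste0('J', 1:4)))

# Distance between two journals = 1 - Pearson correlation of their columns
d_toy <- as.dist(1 - cor(total_toy))

# Complete-linkage agglomerative clustering, cut at height 0.6
tree <- hclust(d_toy, method = 'complete')
clusters <- cutree(tree, h = 0.6)
clusters

# Group journal names by cluster, as listify() does for a membership vector
split(names(clusters), clusters)
```

Here the two pairs end up in separate clusters, because journals within a pair have strongly correlated columns while journals in different pairs are negatively correlated.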

### Edge betweenness {.tabset}

```{r betweenness, cache = TRUE}
#warning('The edge-betweenness algorithm is very slow!')
try_clustering(cluster_edge_betweenness, directed_multi)
try_clustering(cluster_edge_betweenness, directed_weighted)
try_clustering(cluster_edge_betweenness, undirected_multi)
try_clustering(cluster_edge_betweenness, undirected_weighted)
```

### Modularity optimisation: fast and greedy {.tabset}

```{r fast_greedy}
try_clustering(cluster_fast_greedy, directed_multi)
try_clustering(cluster_fast_greedy, directed_weighted)
try_clustering(cluster_fast_greedy, undirected_multi)
try_clustering(cluster_fast_greedy, undirected_weighted)
```
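
The multi-edge error can be worked around by collapsing parallel edges into a single weighted edge, which is roughly what the "undirected weighted" graph already is. Below is a minimal sketch on an invented graph of two triangles joined by one edge. The matrix `A` and the object names are illustrative only.

```r
library(igraph)

# Invented multigraph: two triangles with 3 parallel edges per pair,
# joined by a single edge between vertices 3 and 4
A <- matrix(0, 6, 6)
A[1, 2] <- A[1, 3] <- A[2, 3] <- 3
A[4, 5] <- A[4, 6] <- A[5, 6] <- 3
A[3, 4] <- 1
A <- A + t(A)
g_multi <- graph_from_adjacency_matrix(A, mode = 'max')

# Give every edge unit weight, then merge parallel edges by summing weights
E(g_multi)$weight <- 1
g_weighted <- simplify(g_multi, remove.multiple = TRUE, remove.loops = FALSE,
                       edge.attr.comb = list(weight = 'sum', 'ignore'))

any_multiple(g_weighted)  # parallel edges should be gone
E(g_weighted)$weight      # one summed weight per remaining edge

# Fast greedy now runs, using the summed weights
fg_membership <- as.vector(membership(cluster_fast_greedy(g_weighted)))
fg_membership
```

The design choice here is to keep the citation counts as weights rather than simply deleting duplicate edges, so that no information is lost in the conversion.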

### Modularity optimisation: Louvain method (Blondel et al) {.tabset}

```{r louvain}
try_clustering(cluster_louvain, directed_multi)
try_clustering(cluster_louvain, directed_weighted)
try_clustering(cluster_louvain, undirected_multi)
try_clustering(cluster_louvain, undirected_weighted)
```

### Spinglass method {.tabset}

```{r spinglass}
try_clustering(cluster_spinglass, directed_multi)
try_clustering(cluster_spinglass, directed_weighted)
try_clustering(cluster_spinglass, undirected_multi)
try_clustering(cluster_spinglass, undirected_weighted)
```

### Random walks (walktrap) {.tabset}

Based on random walks of 6 steps.

See Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks, http://arxiv.org/abs/physics/0512106
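
The choice of 6 steps is a tuning parameter. As a rough sketch of how its effect might be explored, the snippet below runs walktrap for several walk lengths on Zachary's karate club graph, which ships with `igraph`. That graph is a stand-in, not the journals data, and the step values are arbitrary.

```r
library(igraph)

g <- make_graph('Zachary')  # small example graph bundled with igraph

# Number of communities found for a few different walk lengths
steps_to_try <- c(2, 4, 6, 10)
n_comms <- vapply(steps_to_try,
                  function(s) length(cluster_walktrap(g, steps = s)),
                  numeric(1))

data.frame(steps = steps_to_try, communities = n_comms)
```

If the number of communities is stable across walk lengths, the result is less likely to be an artefact of the particular value chosen.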

```{r walktrap}
try_clustering(cluster_walktrap, directed_multi, steps = 6)
try_clustering(cluster_walktrap, directed_weighted, steps = 6)
try_clustering(cluster_walktrap, undirected_multi, steps = 6)
try_clustering(cluster_walktrap, undirected_weighted, steps = 6)
```

## Visualisation {.tabset}

Here I am just trying out the `ggraph` package on the statistical journals dataset.
The indicated clusters are those from [agglomerative hierarchical clustering by Varin et al][agg].

[agg]: #agglomerative-hierarchical-clustering

```{r fonts_and_stuff}
extrafont::loadfonts(quiet = TRUE)
if(!('Arial Narrow' %in% extrafont::fonts()))
  cat('*Warning*: Arial Narrow was not loaded')
```

### Asymmetric MDS

```{r mds}
gg47 <- graph_from_adjacency_matrix(t(scrooge::citations))
ggtest <- simplify(gg47, remove.loops = TRUE, remove.multiple = FALSE)
V(ggtest)$Indegree <- degree(ggtest, mode = 'in')
V(ggtest)$Outdegree <- degree(ggtest, mode = 'out')
V(ggtest)$PageRank <- page_rank(ggtest)$vector
V(ggtest)$cluster <- hclust_ud

library(ggplot2)
library(ggraph)

mds_layout <- create_layout(ggtest, layout = 'igraph', algorithm = 'mds',
                            dist = 1 - cor(C, method = 'pearson'))
ggraph(mds_layout) +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Multidimensional Scaling of Pearson distances',
          subtitle = 'Asymmetric') +
  theme_graph()
```
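
To see what the layout step is doing, independent of `ggraph`, here is a minimal sketch of multidimensional scaling on Pearson distances. It uses base R's `cmdscale()` as a stand-in for the `igraph` MDS layout, and the six-journal count matrix is invented for illustration.

```r
# Invented symmetric citation counts: three pairs of closely linked journals
counts <- matrix(c(50, 40,  2,  1,  1,  2,
                   40, 60,  1,  3,  2,  1,
                    2,  1, 55, 45,  3,  1,
                    1,  3, 45, 65,  1,  2,
                    1,  2,  3,  1, 70, 50,
                    2,  1,  1,  2, 50, 60),
                 nrow = 6, byrow = TRUE,
                 dimnames = list(paste0('J', 1:6), paste0('J', 1:6)))

# Pearson distance between journals, based on their columns
pearson_dist <- as.dist(1 - cor(counts))

# Classical MDS: find 2-D coordinates whose distances approximate pearson_dist
coords <- cmdscale(pearson_dist, k = 2)
colnames(coords) <- c('x', 'y')
coords

# Distances in the 2-D layout, for comparison with the input distances
layout_dist <- as.matrix(dist(coords))
round(layout_dist, 2)
```

Journals with similar citation profiles land close together in the plane, which is why the clusters appear as visible clumps in the plot.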

### Symmetric MDS

```{r mds2}
mds_layout2 <- create_layout(ggtest, layout = 'igraph', algorithm = 'mds',
                             dist = 1 - cor(totalC, method = 'pearson'))
ggraph(mds_layout2) +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Multidimensional Scaling of Pearson distances',
          subtitle = 'Symmetrised') +
  theme_graph()
```

### Force-directed algorithm

```{r fr}
ggraph(ggtest, layout = 'fr') +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Fruchterman and Reingold') +
  theme_graph()
```

### Spring-based algorithm

```{r kk}
ggraph(ggtest, layout = 'kk') +
  geom_edge_fan0(alpha = 0.01,
                show.legend = FALSE,
                colour = 'steelblue') +
  geom_node_point(aes(size = PageRank, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Kamada and Kawai') +
  theme_graph()
```

## Plotting communities {.tabset}

Following advice given in [this StackOverflow answer](https://stackoverflow.com/a/20845431), we will aggregate each community into a single node (i.e. a "super-journal") and then plot the result to see if anything is worth looking at.

The visualisations above are definitely not scalable to large networks (using R and `ggraph`, anyway), but the ones below can be applied to very large networks, provided the number of clusters within them is only around 50--100 or so.

For now, we are only looking at the [clustering given by Varin et al (2016)][agg].

```{r community_nodes}
# See tips here: https://stackoverflow.com/a/20845431
V(gg47)$members <- 1
V(gg47)$cluster <- as.factor(hclust_ud)
E(gg47)$count <- 1

# Contract communities into vertices
graph_hclust <- contract.vertices(graph = gg47,
                                  mapping = hclust_ud,
                                  vertex.attr.comb = list(
                                    members = 'sum', # count members
                                    cluster = 'first', # cluster ID
                                    'ignore' # drop other attributes
                                  ))
# Collapse multiple edges, summing their counts (currently disabled).
# Note: remove.loops = FALSE keeps self-citations; set it to TRUE to drop them.
# graph_hclust <- simplify(graph_hclust,
#                          remove.loops = FALSE,
#                          remove.multiple = TRUE,
#                          edge.attr.comb = list(
#                            count = 'sum',
#                            'ignore'
#                          ))
```
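
The contraction step is easier to follow on a toy graph. The sketch below uses an invented six-vertex graph and a made-up grouping (`mapping`), and calls `contract()`, the current name for `contract.vertices()` in `igraph`. It also applies the edge-collapsing step, so that each pair of communities is joined by at most one edge carrying a count.

```r
library(igraph)

# Toy graph: a 6-cycle plus one extra edge between vertices 1 and 4
g <- add_edges(make_ring(6), c(1, 4))
V(g)$members <- 1               # each vertex counts as one member
E(g)$count <- 1                 # each edge counts as one link
mapping <- c(1, 1, 2, 2, 3, 3)  # vertex i belongs to community mapping[i]

# Merge the vertices of each community into a single "super" vertex
g_super <- contract(g, mapping,
                    vertex.attr.comb = list(members = 'sum', 'ignore'))

# Collapse parallel edges, summing their counts, and drop within-community loops
g_super <- simplify(g_super, remove.loops = TRUE, remove.multiple = TRUE,
                    edge.attr.comb = list(count = 'sum', 'ignore'))

V(g_super)$members  # community sizes
E(g_super)$count    # number of links between each pair of communities
```

Whether to drop the loops is a judgement call: they represent citations within a community, which may or may not be of interest in the plot.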

### Circular

```{r comm_plot}
# Not quite comparable to earlier MDS yet
ggraph(graph_hclust, layout = 'circle') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Circular layout') +
  theme_graph()
```

### Asymmetric MDS

```{r comm_mds}
# https://stackoverflow.com/a/27512989
# Make super-journal cross-citation matrix

C_hc <- as.data.frame.table(C)
levels(C_hc$cited) <- levels(C_hc$citing) <- split(names(hclust_ud), hclust_ud)
C_hc <- xtabs(Freq ~ cited + citing, C_hc)

mds_layout_comm <- create_layout(graph_hclust,
                                 layout = 'igraph',
                                 algorithm = 'mds',
                                 dist = 1 - cor(t(C_hc)))

ggraph(mds_layout_comm) +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Asymmetric multidimensional scaling') +
  theme_graph()
```

### Symmetric MDS

```{r comm_mds_sym}
totalC_hc <- C_hc + t(C_hc) - diag(diag(C_hc))

mds_layout_comm2 <- create_layout(graph_hclust,
                                 layout = 'igraph',
                                 algorithm = 'mds',
                                 dist = 1 - cor(totalC_hc))

ggraph(mds_layout_comm2) +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Symmetric multidimensional scaling') +
  theme_graph()
```

### Force-directed

```{r comm_fr}
ggraph(graph_hclust, layout = 'fr') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Fruchterman and Reingold layout') +
  theme_graph()
```

### Spring-based

```{r comm_kk}
ggraph(graph_hclust, layout = 'kk') +
  geom_edge_fan0(alpha = .01, colour = 'steelblue') +
  geom_node_point(aes(size = members, colour = as.factor(cluster))) +
  scale_colour_brewer('Cluster', type = 'qual', palette = 'Set1') +
  coord_fixed() +
  ggtitle('Citations between communities',
          subtitle = 'Kamada and Kawai layout') +
  theme_graph()
```
