This is equivalent to performing the initial "E" step in expectation maximisation (EM) for a multinomial mixture model.
multinomial_mix(target, profiles)
Argument | Description |
---|---|
target | A vector of citations to be compared |
profiles | A matrix of |
A vector of log-probabilities that each community generated target's citation profile.
Here we are effectively computing the nearest point in the convex hull of community profiles,
implicitly using Kullback–Leibler divergence as the distance measure.
Alternative distance measures may be used; see nearest_point() and others.
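The link between the E step and Kullback–Leibler divergence can be sketched as follows. The notation is illustrative and not taken from the package: $x$ is the target vector of citation counts with total $n = \sum_j x_j$, $p_k$ is the profile of community $k$ (one column of profiles, assumed to sum to one), and the $K$ communities are assumed to have equal prior weight, as is common for an initial E step. Under those assumptions the multinomial coefficient cancels, and the log-posterior probability that community $k$ generated $x$ is

$$
\log r_k \;=\; \sum_j x_j \log p_{jk} \;-\; \log \sum_{l=1}^{K} \exp\Big( \sum_j x_j \log p_{jl} \Big).
$$

Writing $\hat{q} = x / n$ for the empirical citation profile of the target, the first term satisfies

$$
\sum_j x_j \log p_{jk} \;=\; -\,n \left[ \mathrm{KL}\big(\hat{q} \,\|\, p_k\big) + H(\hat{q}) \right],
$$

where $H(\hat{q}) = -\sum_j \hat{q}_j \log \hat{q}_j$ does not depend on $k$. The community with the largest $r_k$ is therefore the one whose profile has the smallest Kullback–Leibler divergence from $\hat{q}$. Whether multinomial_mix() uses equal priors or normalises in exactly this way is an assumption here, not something this page states.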
# To which cluster should 'Biometrika' belong?
distances <- as.dist(1 - cor(citations + t(citations)))
clusters <- cutree(hclust(distances), h = 0.8)
profiles <- community_profile(citations, clusters)
Biometrika <- citations[, 'Bka']
w <- multinomial_mix(Biometrika, profiles)
which.max(w) == clusters['Bka']
#>  Bka
#> TRUE
profiles %*% exp(w) # nearest point
#> 47 x 1 Matrix of class "dgeMatrix"
#>                [,1]
#> AmS    5.126183e-03
#> AISM   9.266562e-03
#> AoS    1.784306e-01
#> ANZS   3.746057e-03
#> Bern   2.267350e-02
#> BioJ   5.520505e-03
#> Bcs    4.554416e-02
#> Bka    8.339905e-02
#> Biost  1.360410e-02
#> CJS    1.518139e-02
#> CSSC   2.168770e-03
#> CSTM   4.140379e-03
#> CmpSt  4.929022e-03
#> CSDA   3.253155e-02
#> EES    2.957413e-03
#> Envr   8.083596e-03
#> ISR    2.563091e-03
#> JABES  2.760252e-03
#> JASA   1.575315e-01
#> JAS    1.380126e-03
#> JBS    1.971609e-04
#> JCGS   2.996845e-02
#> JMA    2.858833e-02
#> JNS    1.123817e-02
#> JRSS-A 4.140379e-03
#> JRSS-B 9.641167e-02
#> JRSS-C 7.294953e-03
#> JSCS   5.914826e-03
#> JSPI   3.391167e-02
#> JSS    4.534700e-03
#> JTSA   2.957413e-03
#> LDA    4.140379e-03
#> Mtka   3.154574e-03
#> SJS    2.227918e-02
#> StataJ 3.290657e-98
#> StCmp  1.537855e-02
#> Stats  3.548896e-03
#> StMed  2.168770e-02
#> SMMR   1.774448e-03
#> StMod  3.351735e-03
#> StNee  3.746057e-03
#> StPap  5.914826e-04
#> SPL    1.892744e-02
#> StSci  1.912461e-02
#> StSin  3.588328e-02
#> Tech   1.025237e-02
#> Test   9.463722e-03