Abstract
We introduce a method to predict or recommend high-potential future (i.e., not yet realized) collaborations. The method combines link prediction with machine learning. First, a weighted co-authorship network is constructed. For each node pair, we then calculate scores according to several measures, called predictors; each score can be interpreted as indicating the likelihood that the pair will be linked in the future. To determine the relative merit of each predictor, we train a random forest classifier on older data. The same classifier can then generate predictions for newer data, and the top predictions are treated as recommendations for future collaboration. We apply the technique to research collaborations between cities in Africa, the Middle East, and South Asia, focusing on the topics of malaria and tuberculosis. Results show that the method yields accurate recommendations. Moreover, the method can be used to determine the relative strength of each predictor.
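The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy paper lists, the node labels `A` to `F`, the helper names, and the choice of three predictors (common neighbours, Adamic/Adar, and a weighted common-neighbours score) are all assumptions made for illustration, and scikit-learn's `RandomForestClassifier` is used with arbitrary settings.

```python
from itertools import combinations
from math import log

from sklearn.ensemble import RandomForestClassifier

# Illustrative predictor set (an assumption, not the paper's full list).
PREDICTOR_NAMES = ["common_neighbours", "adamic_adar", "weighted_common"]


def build_network(papers):
    """Weighted co-authorship network: {node: {neighbour: number of joint papers}}."""
    adj = {}
    for cities in papers:
        for u, v in combinations(sorted(set(cities)), 2):
            for a, b in ((u, v), (v, u)):
                row = adj.setdefault(a, {})
                row[b] = row.get(b, 0) + 1
    return adj


def unlinked_pairs(adj):
    """All node pairs that do not (yet) collaborate in the given network."""
    return [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]


def predictor_scores(adj, u, v):
    """Score one node pair with each predictor."""
    common = set(adj[u]) & set(adj[v])
    # A common neighbour is linked to both u and v, so its degree is at least 2
    # and log(degree) is strictly positive.
    adamic_adar = sum(1.0 / log(len(adj[z])) for z in common)
    weighted_common = sum(adj[u][z] + adj[v][z] for z in common)
    return [len(common), adamic_adar, weighted_common]


# Hypothetical toy data: each tuple lists the cities on one paper.
older_papers = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
newer_papers = older_papers + [("A", "D"), ("B", "C"), ("D", "F")]

old_net = build_network(older_papers)
new_net = build_network(newer_papers)

# Training set: pairs unlinked in the older network, labelled by whether
# the link appears in the newer network.
train_pairs = unlinked_pairs(old_net)
X_train = [predictor_scores(old_net, u, v) for u, v in train_pairs]
y_train = [int(v in new_net[u]) for u, v in train_pairs]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Apply the same classifier to the newer network and rank unlinked pairs.
candidates = unlinked_pairs(new_net)
X_new = [predictor_scores(new_net, u, v) for u, v in candidates]
positive = list(clf.classes_).index(1)
probabilities = clf.predict_proba(X_new)[:, positive]
recommendations = sorted(zip(candidates, probabilities), key=lambda t: t[1], reverse=True)

# Feature importances give a rough view of each predictor's relative merit.
importances = dict(zip(PREDICTOR_NAMES, clf.feature_importances_))

for pair, p in recommendations[:3]:
    print(pair, round(float(p), 3))
print(importances)
```

In this sketch the highest-ranked unlinked pairs play the role of recommendations, and the forest's feature importances stand in for the relative strength of each predictor. On a network this small the numbers carry no meaning beyond showing the mechanics.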


Cite this article
Guns, R., Rousseau, R. Recommending research collaborations using link prediction and random forest classifiers. Scientometrics 101, 1461–1473 (2014). https://doi.org/10.1007/s11192-013-1228-9