1 Introduction

With the rapid advancements in network science, the diffusion process within online social networks (OSNs) has attracted significant attention, primarily because it effectively models real-world information dissemination (Yang and Pei 2021). Therefore, social networks have been extensively explored across various knowledge domains, including rumor propagation (Bian et al. 2019), game theory (Hafiene et al. 2020), transportation networks (Khan et al. 2017), and optimization control (Jiang et al. 2015). Online platforms like weblogs, social communities, virtual worlds, and tagging systems have transformed online communication and navigation, offering diverse forms of interaction. A specific type of online social network (OSN), known as a weblog, facilitates global user connections for sharing information, common interests, and real-life activities. Weblogs such as MetaFilter serve various purposes including information flow, idea exchange, disease dissemination, and the spread of influence among online users (Jiang et al. 2015; Tulu et al. 2018). Additionally, nodes within weblogs exhibit distinct interaction patterns that encompass a wide range of relationships (Liu et al. 2021). Consequently, the substantial volume of interaction data provides researchers and data scientists with unique opportunities to systematically explore online user behavior.

Weblogs can be considered as graphs, where users are represented as nodes, and social connections between users are depicted as links. Weblogs offer a rich array of features, including the ability to create posts, engage in discussions through comments and replies, share multimedia content such as photos, videos, and documents, publish articles, and establish follower-followee relationships. For example, Lifehacker, a popular weblog boasting over a million users, generates an annual revenue of approximately $250 million. Consequently, conducting a detailed analysis can yield valuable insights, ranging from uncovering relationship patterns among users to gaining a deeper understanding of societal issues. Furthermore, comprehending the structural properties and characteristics of weblogs and other social media platforms can unveil critical information about their diverse functions and dynamics. These insights can include the role and contribution of a node in the dynamic processes of a social network, such as its orientation and impact. One of the underlying reasons for these phenomena is that nodes possess varying abilities to disseminate information, rumors, or diseases, depending on factors like the number and organization of their connections, text sentiment, and the temporal sequence of their posts. Therefore, the quantification and ranking of user influence is important, as network stability depends on network topology, which, in turn, is closely tied to the flow of information, a factor heavily influenced by the level of user influence within the network (Masuda et al. 2017; Liao et al. 2017).

User influence within an online platform signifies the capacity to persuade fellow users to alter their opinions or behaviors (Engel and Broeck 2012; Fortunato 2010). The dissemination of user influence depends on the strength of connections between users in the network, with connection strength determined by the number of overlapping neighbors shared by any pair of network users (Ishfaq et al. 2017). Consequently, it is practical to concentrate on a selected or niche group of network nodes capable of exerting influence over the opinions, decisions, and behaviors of others (Ishfaq et al. 2017)–(Trusov et al. 2010). These individuals are commonly referred to as influential users or influentials. While influential users typically exhibit significant connection strength, this is not an absolute rule (Zareie et al. 2019a, 2019b). Recently, the domain of influential user mining has gained substantial popularity as an active area of research (Erlandsson et al. 2016)–(Khanday et al. 2018). Identifying important or influential users holds several practical applications, including viral marketing strategies that can expedite information dissemination for promoting various products and services (Zareie et al. 2020a; Ishfaq et al. 2022a). Consequently, a comprehensive analysis can assist in devising effective strategies to detect dynamic trends and gain a competitive marketing advantage (Yujie 2020). Additionally, influential user mining finds utility in applications such as curbing the spread of unwanted content, viruses, and negative behaviors (Zhang and Ghorbani 2020; Zhu et al. 2016).

In many cases, existing research identifies influential users using enhanced centrality measures or hybrid centrality models. For instance, one approach involves refining the k-shell algorithm by removing locally densely connected subgraphs, where each edge is assessed in terms of its influence dissemination within the network and its potential impact on the spreading process (Liu et al. 2015). Similarly, an effective strategy for addressing the monotonicity issue of k-shell centrality is achieved by striking a balance between a user's degree and their coreness in a scale-free network. On a different note, it has been observed that more insights into structural importance can be acquired in comparison to local or global characterizations of a node (Liang and Li 2014). However, enhanced accuracy often comes at the cost of increased computational complexity that resembles the performance characteristics of most centrality algorithms, including PageRank, betweenness centrality, and closeness centrality (Liang and Li 2014). Similarly, a study by Al-Garadi et al. (Al-Garadi et al. 2016) utilizes both local and global features of a network based on a user's network reach and region of influence to rank influential spreaders. Additionally, a combination of structural hole theory with degree and closeness centralities results in higher accuracy when identifying influential users (Qiao et al. 2021). Alternatively, a hybrid centrality approach effectively identifies influential users by incorporating multiple centrality measures (Fei and Deng 2017). This approach generates an interval, subsequently used to rank each node based on the lower and upper limits derived from various centrality measures. Finally, a network importance score is assigned to each node based on the median of the interval.

Furthermore, traditional centrality models such as degree, betweenness, and closeness centralities frequently result in diverse rankings for the same set of nodes (Zareie et al. 2020b). Similarly, the presence of local-tight connections can give rise to core-like structures that can challenge the k-shell rankings of a substantial number of nodes in real-world networks. Up to this point, much of the existing research has leaned towards either relying solely on local environmental variables or global importance indices, and studies that employ hybrid centrality models have often suffered from efficiency issues (Camacho et al. 2020; Wen and Cheong 2021). As a result, accurately capturing the true influence of a node within the network remains a challenging endeavor.

This study introduces a novel hybrid ranking model aimed at assessing node centrality using a multi-criteria approach, which proves more effective than existing centrality models based on a single-criterion approach. The hybrid methodology outlined in this research integrates six classical network centrality measures with textual and temporal attributes to construct a node ranking model that measures the influence of each node within the weblog network. A notable feature of this approach is the introduction of a temporal component, which assigns greater significance to recent social interactions. Each of the six centrality measures is assigned an appropriate weight, determined through the well-established concept of information entropy (Guiaşu 1971). These weighted centrality criteria are then unified into a single ranking for the network nodes through an effective aggregation technique known as TOPSIS (Hwang and Yoon 1981). The effectiveness of this hybrid centrality model is evaluated using various assessment methods, including SIR simulation (Zhou et al. 2006), Jaccard similarity, Spearman rank correlation, and the frequency of nodes sharing the same ranks. The evaluation encompasses comparisons with both classical and contemporary centrality measures. The key contributions of this research can be summarized as follows:

  • Proposal of a novel multi-criteria centrality model named CSTUserRank (Centrality and Sentiment based Temporal User Rank) designed to rank the most influential nodes in a weblog network

  • Introduction of an innovative temporal feature aimed at capturing node influence in dynamic weblog networks

  • Implementation of an effective weighting scheme that objectively assigns weights to each individual feature within the multi-criteria decision matrix (MCDM)

  • Exploration of a robust aggregation strategy for consolidating the weighted centrality criteria into a single, effective ranking reflective of a node's network importance

  • Comparative analysis of CSTUserRank against classical and contemporary centrality measures, utilizing various validation techniques

The rest of this study is structured as follows: Sect. 2 offers a brief review of the most pertinent research articles; Sect. 3 provides essential background information on both traditional and current centrality measures; Sect. 4 outlines the novel CSTUserRank approach, provides specifics about the employed datasets, and presents a brief description of the evaluation methods; Sect. 5 presents an analysis of the experimental results. Finally, Sect. 6 concludes the study and suggests potential avenues for future research based on the empirical analysis presented in Sect. 5.

2 Related work

This section presents a concise review of the most relevant studies on ranking important nodes in social and weblog networks. These studies employ classical centrality methods, heuristic techniques, or hybrid models, as outlined below.

In dynamic social networks, evaluating the centrality of certain nodes can produce inaccurate estimates that create obstacles (Lerman et al. 2010). Nonetheless, this problem can be addressed by acknowledging that a path connecting the source to the destination through intermediate nodes must exist at various time intervals. In addition, time-aware ranking that parameterizes factors such as the time and length of social interactions can further enhance the ranking of network nodes. However, a single measure with improved explicit encoding of time-aware changes can complicate the visual representation of the network as well as increase the probability of losing important temporal information. Similarly, recency bias and the recognition of important new nodes pose a serious problem (Ghosh et al. 2011), since newer nodes are unable to form as many connections as their older counterparts. Therefore, an effective strategy based on a contagion matrix has been proposed that considers the temporal order of connections as well as chains of connections to a network node with some form of decay factor (Ghosh et al. 2011). However, that study assumes there are no citations among articles published in the same year. In addition, PageRank and other centrality-based models are not designed for ranking scholarly articles.

On the other hand, it has been identified that the connections between nodes and the duration of those connections play a crucial role in determining network centralities (Uddin et al. 2014). The results show that temporal degree centrality is more effective than traditional degree centrality at the micro-cluster level. However, temporal centrality lacks the ability to evaluate the most important time instants, which could lead to different conceptions of the temporal centrality of a node targeted at specific applications. Subsequently, research has demonstrated that node importance changes over time in dynamic social networks; therefore, aggregating values that reflect the significance of a node throughout the entire network timespan is not pertinent (Magnien and Tarissan 2015). Moreover, extending the concept of closeness centrality over time can efficiently identify significant nodes within the network. The results show that different networks exhibit different characteristics regarding time-aware closeness centrality. However, the study assesses node importance according to the variation in network cohesion before and after the node is removed. In addition, the proposed temporal closeness centrality is unable to distinguish the network importance of nodes with similar network topologies.

On the other hand, research shows that temporal versions of PageRank achieve higher accuracy for networks represented by user activity as a sequence of time-dependent edges (Rozenshtein and Gionis 2016). In addition, the true flow of information in the network is captured through a temporal random walk. However, when the edge distribution is kept constant, temporal and static PageRank show almost identical ranking results. In addition, temporal PageRank is inefficient in terms of online updating. Subsequently, an integrated approach based on the evolution of direct neighbors and the position of a node is an effective candidate for capturing node importance by taking advantage of the longitudinal nature of a dynamic social network (Orman et al. 2017). The study is based on frequent pattern mining for identifying meaningful behavioral trends, categorical features, and the individual categorization of specific nodes according to temporal order. However, the cluster analysis only qualitatively investigates the neighborhood evolution of a node in the network. Similarly, the venue and prestige of a node in a citation network can be utilized in time-varying PageRank (Ma et al. 2018). The research includes a block-wise time-variant PageRank extension for understanding the features and an incremental algorithm for the dynamic ranking of scholarly articles. The incremental approach periodically partitions the graph using various updating techniques for affected and unaffected graph areas.

Subsequently, a heuristic approach for predicting users with high centrality scores in a dynamic network avoids expensive shortest-path calculations during each time-sensitive snapshot (Sarkar et al. 2018). In the first stage, the model forecasts the overlap coefficient between the set of nodes with high closeness and betweenness centrality in previous timestamps and the set of high-centrality nodes in future timestamps. Secondly, the innermost core is analyzed to identify only the user IDs of high-centrality nodes. However, the study relies solely on a shortest-distance-based prediction scheme and is prone to noise or failures while conforming to the property of core connectedness.

Similarly, a temporal PageRank has been introduced that ranks nodes as well as time layers in a time-variant network (Lv et al. 2019a). The model, known as f-PageRank, takes into account the eigenvector centrality of a multi-level, multiplex network. Unlike other models, f-PageRank ensures uniqueness without relying on the assumption of irreducibility of the adjacency matrix of a temporal social network. In addition, the convergence, and the rate of convergence, of f-PageRank is established by existing results. However, the parameter employed for tuning the relationships between adjoining time layers is constant, and the coupling relationship between the time layers relies on neighboring time layers as well as on multiple segments of information within the network over time. On the other hand, Lv et al. (Lv et al. 2019b) incorporate node similarity into classical eigenvector centrality for node ranking under a tensor computation framework. The model derives a fourth-order tensor for representing complex multilayer temporal networks. The relationship between different time stamps is represented through node similarity and the defined tensor; a tensor equation is then solved for the final node centrality values. However, the study suffers from the difficulty of selecting an appropriate node similarity index; therefore, the proposed method is unable to rank important nodes on some datasets.

Subsequently, research has been proposed (Lv et al. 2021) that extends the fourth-order tensor model (Lv et al. 2019b) by considering a sixth-order tensor for representing multilayer time-varying networks. The extended research considers time dependence as well as the strength of interlayer interactions for an effective analysis of network topological structure (Lv et al. 2021). However, the model is efficient only under certain conditions, for instance, on small-scale temporal networks. Conversely, studies have revealed that different centrality measures, including betweenness centrality, closeness centrality, and PageRank, can be combined into a single ranking metric that can be valuable in comprehending the early approval and adoption of various products and services in the market (Jain and Sinha 2022). In addition, the proposed framework identifies and ranks the central nodes that possess high connectivity and spreading power in the network.

Similarly, Alshahrani et al. (Alshahrani et al. 2020) posit that degree and Katz centralities can be integrated into a single ranking model by focusing on local and general node strength in the network. The study highlights the identification of highly influential users in directed and undirected graphs with comparatively lower time complexity and higher accuracy than a single centrality criterion. Subsequently, PSAIIM (Mnasri et al. 2021) is a parallel algorithm that takes into account semantic features such as user interests and social behavior to identify the most influential users in a network, ranked in top-k order. In addition, the algorithm introduces an ‘influence-BFS tree’ for the optimal selection of seed nodes. On the other hand, research predicts future influencers in a social network using past social interactions (Salve et al. 2021). The authors devise a mechanism that combines current temporal centrality measures to predict influential users from Facebook groups. According to the results, approximately 30% of members in each group are among the top 10 most influential members of that group. Interestingly, research identifies that local influential users are significant in spreading situational information related to disasters (Fan et al. 2021). Moreover, domain knowledge and regular social interactions can enhance the online spreading capabilities of influential users. Additionally, the scale-free structure of social networks can help new users join a situational information thread and establish connections with influential users. Furthermore, research identifies that classical centrality measures can be integrated using objective weighting schemes and efficient combining methods for the effective ranking of influential users in the network (Ishfaq et al. 2022b).

In short, the existing literature suffers from several limitations. For instance, centrality measures designed for static networks perform poorly in dynamic network contexts (Lerman et al. 2010). Similarly, existing node importance algorithms often ignore the dynamic nature of complex networks, which leads to recency bias and a failure to recognize the significance of new nodes (Ghosh et al. 2011). Likewise, traditional degree centrality fails to capture the temporal aspect of relationships within patient-physician networks in healthcare organizations (Uddin et al. 2014). In addition, traditional PageRank algorithms cannot capture temporal dynamics, particularly changes in node importance over time (Magnien and Tarissan 2015). Furthermore, the effective ranking of scholarly articles is challenging given their heterogeneous, evolving, and dynamic nature (Ma et al. 2018). Similarly, relatively few tools are available for analyzing dynamic networks compared to static ones (Orman et al. 2017). On the other hand, predicting influential vertices is challenging in time-varying networks, particularly those with high betweenness and closeness centralities (Sarkar et al. 2018). Subsequently, a new centrality measure is needed for temporal networks, as time-variant PageRank suffers from a manually set parameter and considers only neighboring time layers (Lv et al. 2019a). On the other hand, limited interaction among influential users hampers the dissemination of situational information in disasters (Fan et al. 2021). Similarly, capturing a set of top influential users to maximize information dissemination remains challenging (Alshahrani et al. 2020).
Finally, recent research identifies the challenge in capturing inter-layer similarity and strength of interactions between different layers in the network (Lv et al. 2021).

On the other hand, the existing literature offers several solutions aimed at addressing the current limitations in identifying influential users. For instance, Lerman et al. (Lerman et al. 2010) introduced a method that considers temporal interactions and analyzes paths between nodes, resulting in improved accuracy in assessing node importance in citation networks. Similarly, the study by Ghosh et al. (Ghosh et al. 2011) emphasizes the importance of incorporating temporal aspects by proposing time-aware dynamic centrality metrics for ranking in citation networks. Time Scale Degree Centrality (TSDC) (Uddin et al. 2014) offers important insights into patient-physician interactions and their impact on hospital length of stay. Magnien and Tarissan (Magnien and Tarissan 2015) presented a temporal extension of closeness centrality in dynamic social networks, while Ma et al. (Ma et al. 2018) extended traditional PageRank with a time-decaying factor, known as Time-Weighted PageRank, for computing the prestige of articles and venues. Additionally, Orman et al. (Orman et al. 2017) proposed a method tailored to exploit the longitudinal nature of dynamic networks by characterizing each node based on the evolution of its direct neighborhood. Sarkar et al. (Sarkar et al. 2018) introduced novel heuristics and a two-step algorithm to alleviate the need for costly computations in each temporal snapshot and provide accurate predictions based on innermost core analysis. Furthermore, Alshahrani et al. (Alshahrani et al. 2020) combined centrality measures (degree centrality and Katz centrality) to select seed nodes with high influence potential, aiming to improve both efficiency and effectiveness in influence maximization. Finally, in this study, we propose a hybrid node ranking model tailored for community weblogs by integrating classical network centralities with textual and temporal features.
The study employs an effective weighting mechanism based on information entropy and leverages an effective integration strategy for unified importance ranking. Table 1 provides a summary of the research articles discussed in the related work section.

Table 1 Summary of the studies as discussed in the related work

3 Background information

In this section, necessary definitions are provided for some of the classical centrality measures, recently developed centrality models, and other relevant methods. Consider a weblog network \(WL=(A, P)\), where \(A\) represents the network nodes and \(P\) indicates the paths or edges between pairs of network nodes. In addition, \(M=\{{m}_{xy}\}\) denotes the adjacency matrix of the weblog network. For any pair of network nodes \(x\) and \(y\), \({m}_{xy}=1\) if there is a direct connection between \(x\) and \(y\); in all other cases, \({m}_{xy}=0\).

3.1 Degree centrality (DC)

The DC value of \(x\) denotes the number of immediate neighbors of \(x\), given as

$$DC\left(x\right)=\frac{1}{n-1}\sum_{y=1}^{n}{m}_{xy}$$
(1)

where \({m}_{xy}\) denotes the \(x{y}^{th}\) entry of the adjacency matrix \(M\). Similarly, \(n\) denotes the total number of network nodes. In a directed graph, Eq. (1) computes the DC score by summing up all the connections of node \(x\) to other nodes and then dividing by n − 1, which is the maximum possible number of connections a node can have in the network. Due to the denominator term n − 1, the resultant DC score of a node falls within the range [0, 1].
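As a concrete illustration, the following minimal sketch implements Eq. (1) directly from an adjacency matrix; the 4-node matrix is a made-up toy example, not taken from the paper's datasets:

```python
def degree_centrality(M):
    """Normalized degree centrality per Eq. (1): each node's row sum of
    the adjacency matrix divided by n - 1."""
    n = len(M)
    return [sum(row) / (n - 1) for row in M]

# Toy 4-node network: node 0 connects to all others, node 3 only to node 0.
M = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
print(degree_centrality(M))  # node 0 scores 3/3 = 1.0, node 3 scores 1/3
```

Because of the n − 1 normalization, a fully connected node reaches the maximum score of 1.0, matching the [0, 1] range noted above.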

3.2 Closeness centrality (CC)

CC indicates how easily a node can reach all other nodes in a network. It is based on the mean shortest distance from the node under consideration to the remaining nodes, given as

$$CC\left(x\right)=\frac{n-1}{\sum_{y=1}^{n}{d}_{xy}}$$
(2)

where \({d}_{xy}\) represents the minimum distance between nodes \(x\) and \(y\) in the network. The value of \({d}_{xy}\) approaches infinity when no path exists between \(x\) and \(y\).
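The reachability caveat above matters in practice: a common convention, used in the sketch below, is to return 0 when some node is unreachable, since the distance sum in Eq. (2) would then be infinite. This BFS-based implementation is an illustrative sketch for unweighted graphs, not the paper's code:

```python
from collections import deque

def closeness_centrality(adj, x):
    """Closeness per Eq. (2): (n - 1) over the sum of shortest-path
    distances from x; returns 0.0 if any node is unreachable."""
    n = len(adj)
    dist = {x: 0}
    q = deque([x])
    while q:                         # BFS gives unweighted shortest paths
        u = q.popleft()
        for v in range(n):
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    if len(dist) < n:                # some d_xy is effectively infinite
        return 0.0
    return (n - 1) / sum(dist.values())

adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]                    # path graph 0 - 1 - 2
print(closeness_centrality(adj, 1))  # 2 / (1 + 1) = 1.0
print(closeness_centrality(adj, 0))  # 2 / (1 + 2) = 0.666...
```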

3.3 Betweenness centrality (BC)

BC measures the significance of a node, such as \(x\), by counting how frequently \(x\) lies on the shortest paths in the network. The count is normalized by the total number of shortest paths between each pair of network nodes in the weblog network, given by:

$$BC\left(x\right)=\sum_{m, n\ne x}\frac{{\sigma }_{m,n }(x)}{{\sigma }_{m,n }}$$
(3)

Here, \({\sigma }_{m,n }(x)\) represents the number of shortest paths between nodes \(m\) and \(n\) that pass through node \(x\), whereas \({\sigma }_{m,n}\) denotes the total number of shortest paths between \(m\) and \(n\).
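For small graphs, Eq. (3) can be evaluated with Brandes-style shortest-path counting. The sketch below counts ordered (source, target) pairs, so on an undirected graph each pair contributes twice; it is an illustrative implementation, not the authors' code:

```python
from collections import deque

def betweenness(adj):
    """Betweenness per Eq. (3) via BFS path counting: for each source s,
    count shortest paths (sigma), then back-propagate pair dependencies."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        dist, sigma = [-1] * n, [0] * n
        parents = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1
        q, order = deque([s]), []
        while q:
            u = q.popleft()
            order.append(u)
            for v in range(n):
                if adj[u][v]:
                    if dist[v] < 0:
                        dist[v] = dist[u] + 1
                        q.append(v)
                    if dist[v] == dist[u] + 1:  # v extends a shortest path
                        sigma[v] += sigma[u]
                        parents[v].append(u)
        delta = [0.0] * n
        for v in reversed(order):    # accumulate dependency of s on v
            for u in parents[v]:
                delta[u] += sigma[u] / sigma[v] * (1 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return bc

# Path graph 0 - 1 - 2: only node 1 lies between the other two.
print(betweenness([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))  # [0.0, 2.0, 0.0]
```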

3.4 Katz centrality

Katz centrality considers the relative influence of a network node by measuring its immediate neighbors as well as the other nodes linked to it through these first-order neighbors. Equation (4) computes the Katz centrality of a node \(x\), given as

$${C}_{Katz}\left(x\right)=\sum_{z=1}^{\infty }\sum_{y=1}^{n}{o}^{z}{\left({M}^{z}\right)}_{yx}$$
(4)

Here, \({\left({M}^{z}\right)}_{yx}\) denotes the \({(y,x)}^{th}\) entry of \({M}^{z}\), where \(M\) represents the adjacency matrix; therefore, \({M}^{z}\) captures the z-hop connections between \(x\) and \(y\). Similarly, \(o\) represents the attenuation factor that penalizes links with distant neighbors.
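Because the infinite series in Eq. (4) converges when the attenuation factor o is smaller than the reciprocal of the adjacency matrix's largest eigenvalue, it can be approximated by truncating at a maximum walk length. The NumPy sketch below follows that idea; the values of o and max_z are illustrative choices, not parameters from the paper:

```python
import numpy as np

def katz_centrality(M, o=0.1, max_z=50):
    """Truncated Katz series per Eq. (4): accumulate o**z * (M^z),
    summing (M^z)_{yx} over y for each node x (column sums)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    score = np.zeros(n)
    Mz = np.eye(n)
    for z in range(1, max_z + 1):
        Mz = Mz @ M                        # M^z counts z-hop walks
        score += (o ** z) * Mz.sum(axis=0)
    return score

# Star graph: node 0 is the hub, nodes 1-3 are leaves.
M = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(katz_centrality(M))  # hub scores highest; leaves tie
```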

3.5 PageRank (PR)

PR is a commonly used centrality measure that evaluates the significance of a node in a network by considering the quantity and quality of the links directed towards it. The computation of PR is given as

$$PR\left(x\right)=\left(1-d\right)+d\sum_{y\epsilon {C}_{x}}\frac{PR(y)}{O(y)}$$
(5)

Here, \({C}_{x}\) represents the set of all nodes that are directly linked to node \(x\). Equation (5) shows that the \(PR\) computation of node \(x\) requires the \(PR\) of each node \(y\). Similarly, \(O(y)\) represents the number of outgoing connections of \(y\), i.e., the links through which \(y\) references other nodes. Finally, the term \(d\) denotes the damping factor; its value, after several experiments, is taken as 0.85 (Boldi et al. 2005).
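Equation (5) is typically solved by fixed-point iteration. The sketch below iterates the recurrence over a small hand-made directed graph; the graph and iteration count are illustrative:

```python
def pagerank(adj, d=0.85, iters=100):
    """Iterative PageRank per Eq. (5), using the (1 - d) formulation
    and damping factor d = 0.85."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]   # O(y) for each node y
    pr = [1.0] * n
    for _ in range(iters):
        pr = [(1 - d) + d * sum(pr[y] / out_deg[y]
                                for y in range(n) if adj[y][x] and out_deg[y])
              for x in range(n)]
    return pr

# Directed toy graph: 0 -> 2, 1 -> 2, 2 -> 0; node 2 draws the most links.
adj = [[0, 0, 1], [0, 0, 1], [1, 0, 0]]
print(pagerank(adj))  # node 2 ranks first, node 1 (no in-links) last
```

Note that node 1 receives no incoming links, so its score settles at exactly 1 − d = 0.15, the baseline granted by the damping term.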

3.6 Combined centrality measures

Research suggests that models based on the integration of classical centrality measures show better performance in identifying influential users than a single centrality criterion such as degree centrality or PageRank (Fei et al. 2017). The study constructs a centrality ranking matrix of users where each column represents a unique centrality criterion, including degree, betweenness, and closeness centralities. The final influence score is the median between the maximum and minimum values of each centrality criterion (Fei et al. 2017).

3.7 Extended H-index (Ex_HI)

The extended h-index (Zareie and Sheikhahmadi 2019) used in the model takes into account the ability of a node's neighbors to spread information and computes the node's dissemination power in the network. This extended version is based on a semi-local model that utilizes the degrees of a node's neighbors to improve the accuracy of the classical h-index.

3.8 MinCDegKatz d-hops and MaxCDegKatz d-hops

The study of Alshahrani et al. (Alshahrani et al. 2020) introduced two centrality algorithms that integrate Katz centrality and DC with threshold-based combined (i.e., maximal and minimal) propagation probability for both undirected and directed networks. The process involves calculating the number of edges between a node and its neighbors that are at a distance greater than a certain threshold from the total number of network edges. Next, the degree centrality (DC) and Katz centrality are calculated for each node, and the nodes are ranked in descending order to select the seed set. Subsequently, the nodes with the highest Katz scores are stored in a queue as seed nodes. Later, the neighbors of the most recent nodes are identified such that each neighbor is a radius distance away from the node being investigated. Next, the non-repeating neighbors among all radius-far neighbors are picked as the selected neighbors. Then, the selected neighbors are filtered, and the nodes with the highest DC scores are picked as a seed set and stored in the queue. Subsequently, the seed set in the queue is removed from the original data, and this cycle continues for a pre-determined period.

3.9 Sentiment analysis

Sentiment analysis (SA) is a popular text analysis technique for determining text polarity, i.e., positive, negative, or neutral, from a document, paragraph, or sentence (Hutto and Gilbert 2014). SA measures the emotion and attitude of a speaker or writer through the computational analysis of subjectivity in text. This study computes sentiment using VADER, a popular rule-based toolkit available in the NLTK package in Python for the sentiment analysis of unlabeled text data, particularly social media texts. VADER is based on a dictionary that links lexical features to emotional intensities, also called sentiment scores. The final sentiment value is the sum of the emotional intensities of each word or text-based element. For instance, words like ‘enjoy’, ‘love’, or ‘happy’ convey a positive and optimistic sentiment. In addition, VADER comprehends the basic context of words, recognizing, for instance, ‘did not buy’ as carrying negative polarity.
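VADER's actual lexicon and rules ship with NLTK (`nltk.sentiment.vader.SentimentIntensityAnalyzer`); the toy scorer below only mirrors the lexicon-plus-negation mechanism described above, with invented word intensities and a crude negation rule, purely for illustration:

```python
# Invented mini-lexicon (NOT VADER's real scores) mapping words to
# emotional intensities; a preceding negation flips the sign.
LEXICON = {"enjoy": 1.5, "love": 2.0, "happy": 1.8, "buy": 0.5, "bad": -1.9}

def polarity(text):
    """Sum word intensities, flip on negation, then map the total
    to a positive / negative / neutral label."""
    words = text.lower().split()
    score = 0.0
    for i, w in enumerate(words):
        if w in LEXICON:
            val = LEXICON[w]
            if i > 0 and words[i - 1] in {"not", "never"}:
                val = -val                   # crude negation handling
            score += val
    if score > 0.05:
        return "positive"
    if score < -0.05:
        return "negative"
    return "neutral"

print(polarity("I love this"))   # positive
print(polarity("did not buy"))   # negative
```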

3.9.1 Information entropy (IE)

IE refers to the degree of disorder in data; it was originally derived from transportation models for assessing trip dispersal between sources and destinations (Borda 2011; Que et al. 2021). However, this study employs IE to calibrate objective weights for each single centrality measure in the MCDM. IE is generally utilized when the values in the matrix are known but the weights for each criterion are unclear (Que et al. 2021). In addition, IE eliminates the subjectivity caused by human involvement in assigning weights. The IE weight computation process is explained in detail in Sect. 4.4.
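A standard entropy-weighting computation, consistent with the description above (the exact normalization used in Sect. 4.4 may differ), can be sketched in NumPy as:

```python
import numpy as np

def entropy_weights(X):
    """Objective criterion weights via information entropy: columns with
    more dispersion (lower entropy) carry more information, hence weight."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    P = X / X.sum(axis=0)                     # column-wise proportions
    logs = np.log(np.where(P > 0, P, 1.0))    # log(1) = 0 guards zeros
    E = -(P * logs).sum(axis=0) / np.log(m)   # entropy per criterion in [0, 1]
    d = 1.0 - E                               # degree of divergence
    return d / d.sum()

# Criterion 0 is constant (carries no information); criterion 1 varies.
X = [[1, 10], [1, 20], [1, 30]]
print(entropy_weights(X))  # weight goes almost entirely to criterion 1
```

A criterion whose values are identical for all users has maximal entropy and therefore receives zero weight, which is exactly the "objectivity" property the text appeals to.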

3.9.2 Topsis

TOPSIS (Behzadian et al. 2012) is a commonly used method for decision makers who want to conduct cost-benefit analysis to examine various aspects of available solutions. TOPSIS is based on the concept that the chosen alternative should be farthest from the negative-ideal solution and closest to the positive-ideal solution (Łatuszyńska 2014). Consequently, poor performance in one centrality criterion is balanced by relatively good performance in another. Therefore, utilizing IE in combination with TOPSIS effectively assigns the final user rank based on the weighted integration of the adopted and newly proposed centrality measures.
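A minimal TOPSIS sketch over a weighted decision matrix is shown below; it treats every criterion as benefit-type (which matches centrality scores), and the matrix values are toy numbers, not results from the paper:

```python
import numpy as np

def topsis(X, weights):
    """TOPSIS closeness scores: vector-normalize, weight, then rank each
    alternative by its relative distance to the ideal best/worst rows."""
    X = np.asarray(X, dtype=float)
    R = X / np.sqrt((X ** 2).sum(axis=0))    # vector normalization
    V = R * weights                          # weighted normalized matrix
    best, worst = V.max(axis=0), V.min(axis=0)
    d_best = np.sqrt(((V - best) ** 2).sum(axis=1))
    d_worst = np.sqrt(((V - worst) ** 2).sum(axis=1))
    return d_worst / (d_best + d_worst)      # closeness in [0, 1]

# Three users scored on two (equally weighted) criteria.
X = [[3, 3], [1, 1], [2, 2]]
print(topsis(X, np.array([0.5, 0.5])))  # user 0 is closest to the ideal
```

In the full model, the weights argument would come from the entropy step, so a single closeness score per user yields the final ranking.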

4 Research methodology

The proposed methodology of CSTUserRank, illustrated in Fig. 1, comprises three primary components. The first component conducts three kinds of analysis on real-world weblog networks: network, temporal, and sentiment. Network analysis utilizes six classical centrality measures, temporal analysis involves a novel time-based user ranking, and sentiment analysis categorizes user posts as positive, negative, or neutral. This section also presents a detailed discussion of the proposed temporal analysis. The initial step of the research methodology then generates an MCDM (Multiple Criteria Decision Making) matrix with dimensions n*8, where n represents the number of users in the dataset. For each user, six network centrality measures, text sentiment, and temporal rank contribute the eight columns (or features) of the MCDM. Subsequently, the second component of CSTUserRank applies Information Entropy (IE) to calibrate weights and produce weighted centralities. Finally, TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) is utilized as an aggregation approach to compute a single ranking score for each user. Furthermore, the ranking process of CSTUserRank is evaluated against both classical and recent centrality algorithms using diverse evaluation measures.

Fig. 1
figure 1

Framework of the proposed CSTUserRank

4.1 Temporal analysis

The objective of the novel temporal analysis is to define time intervals that capture the maximum changes occurring in a dynamic weblog network; each time interval is assigned a weight such that the most recent intervals receive higher weights. Earlier studies fail to assign higher priority to more recent time intervals in dynamic social networks (Wang et al. 2020; Mahmoudi et al. 2020b). Generally, influential users attract the attention of many other users; in some cases, however, they still receive many comments or replies without being particularly active. Temporal analysis of user activities therefore requires careful slicing of time depending on the data distribution of the weblog. For instance, the time interval could be three months, one month, or one week depending on the properties of the social network. The interval should be small to cover the maximum changes in a high-activity user network, and vice versa. The analysis of the MetaFilter dataset shows that the employed user posts span over a decade, from June 6, 2006 to April 7, 2017. Therefore, after careful deliberation, the time interval is set to one month, which captures the maximum user activity in the form of online interactions, including posts and comments in the network. This choice also reflects the observation that a significant event or announcement that sparks ongoing debate or speculation, such as a major political election, a high-profile product launch, or a global crisis, mostly remains relevant for 3–4 weeks (Azmi et al. 2018). The overall steps for computing the time intervals are presented in Fig. 2. First, the time stamps in the entire dataset are converted into integer values; second, the resultant integer values are sorted and their mean and standard deviation (\({S}_{D}\)) are obtained. The \({S}_{D}\) values are computed using Eq. (6) given as

$$Standard\,Deviation\left( {S_{D} } \right) = \sqrt {\frac{{\sum _{{i = 1}}^{n} (v_{i} - \bar{V})^{2} }}{n}}$$
(6)

where \(\overline{V }\) represents the mean that is obtained using the following equation:

$$\overline{V } \left(mean\right)=\frac{\sum {v}_{i}}{n}$$
(7)

Here, \({v}_{i}\) indicates each single time stamp and \(n\) represents the total number of time stamps. Subsequently, Eq. (8) divides the range between the first and the last connections into time slices, given as:

$$Time\,slices:\;ts_{1} = \left[ {f_{c} ,\;f_{c} + S_{D} } \right],\;\;ts_{2} = \left[ {f_{c} + S_{D} ,\;f_{c} + 2S_{D} } \right],\; \ldots ,\;ts_{k} = \left[ {f_{c} + \left( {k - 1} \right)S_{D} ,\;f_{c} + kS_{D} } \right]$$
(8)

Here, \({f}_{c}\) denotes the time of the initial connection in the network, and \(k\) represents the number of slices. The value of \(k\) depends on \(D\), the range between the first and the last connection in the network. The next step involves calculating the connections made in each time slice and assigning weights to users in each slice, giving higher importance to the more recent slices. The study employs a simple exponential smoothing (SES) method for assigning more weight to recent time intervals, a commonly used technique in time-series data mining. SES predicts the current value based on previous values and is computed as

$$WT=\left(1-\beta \right){\omega }_{tn}+\beta \left(1-\beta \right){\omega }_{tn-1}+{\beta }^{2}\left(1-\beta \right){\omega }_{tn-2}+\dots$$
(9)

where \(\beta\) is a smoothing constant, taken here as \(0.51\). Table 2 shows a simple example to illustrate SES.

Table 2 New connections of a user established in each time slice
$${WT}_{user-A}=0.51*4+0.51\left(1-0.51\right)*2+{0.51}^{2}\left(1-0.51\right)*3+{0.51}^{3}\left(1-0.51\right)*6=3.19$$

Finally, the proposed temporal rank sorts the users in descending order according to their assigned weights, in linear time. The upper part of Fig. 2 represents the process for computing time slices, which executes \(k\) times; since \(k\) is a constant, the cost of computing the time slices is constant. The second portion of Fig. 2 represents the computation of user weights: the number of connections established in each of the \(k\) time slices is computed for each of the \(n\) users. As \(k\) is a constant representing the number of time intervals, the time complexity of computing user weights is linear in \(n\). Hence, the overall computation of the temporal ranks of users takes linear time (Alshahrani et al. 2020).
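The slicing and weighting steps above can be sketched as follows. The function names and the input format (a per-user list of connection counts, most recent slice first, so that the first element corresponds to \(\omega_{tn}\) in Eq. (9)) are illustrative assumptions, not the authors' implementation.

```python
import statistics

def time_slices(timestamps):
    """Split the span [f_c, last] into slices of width S_D, as in Eq. (8)."""
    f_c, last = min(timestamps), max(timestamps)
    s_d = statistics.pstdev(timestamps)  # population standard deviation, Eq. (6)
    slices, start = [], f_c
    while start < last:
        slices.append((start, start + s_d))
        start += s_d
    return slices

def ses_weight(connections, beta=0.51):
    """Eq. (9): connections[0] is the most recent slice's count (omega_tn)."""
    return sum((beta ** i) * (1 - beta) * c for i, c in enumerate(connections))

def temporal_rank(user_connections, beta=0.51):
    """Sort users by their SES weight, most influential first (descending)."""
    weights = {u: ses_weight(c, beta) for u, c in user_connections.items()}
    return sorted(weights, key=weights.get, reverse=True)
```

Ranking then reduces to one `ses_weight` call per user followed by a sort, consistent with the linear-time weight computation described above.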

Fig. 2
figure 2

Novel temporal ranking approach

4.2 MCDM formulation

The formulation of the MCDM \({M}_{mcdm}[x, y]\) is based on user centralities such that \({N}_{c}=\{{c}_{1},{c}_{2}, {c}_{3}, \dots , {c}_{m}\}\), where \({c}_{y}\) represents a centrality criterion such as DC, CC, or BC. In step 2 of Fig. 1, the first sub-section (MCDM) represents this integration and normalization of centrality measures. The \({M}_{mcdm}[x, y]\) based on \(m\) different centrality criteria is given as

$$M_{mcdm} \left[ {x, y} \right] = \begin{bmatrix} m_{1} \left( {c_{1} } \right) & m_{1} \left( {c_{2} } \right) & m_{1} \left( {c_{3} } \right) & \cdots & m_{1} \left( {c_{m} } \right) \\ m_{2} \left( {c_{1} } \right) & m_{2} \left( {c_{2} } \right) & m_{2} \left( {c_{3} } \right) & \cdots & m_{2} \left( {c_{m} } \right) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ m_{n} \left( {c_{1} } \right) & m_{n} \left( {c_{2} } \right) & m_{n} \left( {c_{3} } \right) & \cdots & m_{n} \left( {c_{m} } \right) \end{bmatrix}$$

where \({m}_{x}\left({c}_{y}\right)\) represents the value of the \({y}^{th}\) centrality criterion for user \(x\).

4.3 MCDM normalization

The obtained MCDM \({M}_{mcdm}\left[x, y\right]\) is further normalized using sum-of-square normalization method given as

$$m^{\prime }_{xy} = \frac{{m_{x} (c_{y} )}}{{\sqrt {\sum\limits_{{x = 1}}^{{x = n}} {\left\{ {m_{x} (c_{y} )} \right\}^{2} } } }}\quad \forall y \in [1,m]$$
(10)

The normalized MCDM is of the order of \(n*m\) and is represented as \({{M}^{\prime }}_{mcdm}\left[x, y\right]\) where \({{m}^{\prime }}_{xy}\) denotes a single member of \({{M}^{\prime }}_{mcdm}\left[x, y\right]\).
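A minimal sketch of the sum-of-squares normalization in Eq. (10), assuming the MCDM is held as a list of row lists (one row per user, one column per criterion); the function name is an illustrative choice.

```python
import math

def normalize_mcdm(matrix):
    """Column-wise sum-of-squares normalization of an n*m MCDM (Eq. 10)."""
    n, m = len(matrix), len(matrix[0])
    # One Euclidean norm per criterion column.
    norms = [math.sqrt(sum(matrix[x][y] ** 2 for x in range(n))) for y in range(m)]
    # Divide each entry by its column norm; guard against an all-zero column.
    return [[matrix[x][y] / norms[y] if norms[y] else 0.0 for y in range(m)]
            for x in range(n)]
```

After this step, every column of \({{M}^{\prime }}_{mcdm}\left[x, y\right]\) has unit Euclidean norm, which puts criteria measured on different scales on a common footing.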

4.4 Information entropy and weight computation for each single centrality measure

The concept of information entropy (IE) is applied to calibrate a suitable weight for each single centrality measure in the MCDM. The second sub-section in step 2 of Fig. 1 presents the entropy-based weight computation and the integration of the weighted centralities. First, the IE is computed for each single member in \({{M}^{\prime }}_{mcdm}[x,y]\) using the following equation:

$$iee_{{xy}} = \frac{{m^{\prime } _{{xy}} }}{{\mathop \sum \nolimits_{{x = 1}}^{{x = n}} \left( {m^{\prime } _{{xy}} } \right)}}\,\forall ~y \in \left[ {1,~m} \right]$$
(11)

where \({iee}_{xy}\) represents an element in \(IEE[x,y]\). Next, IE is computed for each single centrality measure using Eq. (12) given as

$$\begin{gathered} iec_{y} = - \frac{1}{\ln n}\mathop \sum \limits_{x = 1}^{x = n} \{ iee_{xy} *\ln (iee_{xy} )\} \hfill \\ \forall y \in \left[ {1, m} \right] \hfill \\ \end{gathered}$$
(12)

where \({iec}_{y}\) represents an element in the vector \(IEC\left[x,y\right]\). Subsequently, Eq. (13) computes a suitable weight score for each centrality measure, given as

$$\begin{array}{*{20}c} {w_{y} = \frac{{1 - iec_{y} }}{{\sum\limits_{{z = 1}}^{{z = m}} {\left( {1 - iec_{z} } \right)} }}} & {\forall y \in [1,m]} \\ \end{array}$$
(13)

Finally, \(WT\left[x,y\right]\) is computed by multiplying normalized elements in \({{M}^{\prime }}_{mcdm }\left[x, y\right]\) with their respective weights in \(W\left[x,y\right]\) using Eq. (14) given as

$$\begin{gathered} wt_{{xy}} = m^{\prime } _{{xy}} *~w_{y} \hfill \\ \forall ~x \in \left[ {1,~n} \right]~\,and~\,y \in \left[ {1,~m} \right] \hfill \\ \end{gathered}$$
(14)

where \(WT\left[x,y\right]\) represents weighted centrality matrix and \({wt}_{xy}\) indicates an element in \(WT\left[x,y\right]\).
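Eqs. (11)–(14) can be sketched as below, assuming the normalized MCDM is a list of row lists; note that a column whose values are identical carries no information and therefore receives (near-)zero weight. The function names are illustrative assumptions.

```python
import math

def entropy_weights(norm_matrix):
    """Objective criterion weights via information entropy (Eqs. 11-13)."""
    n, m = len(norm_matrix), len(norm_matrix[0])
    entropies = []
    for y in range(m):
        col_sum = sum(norm_matrix[x][y] for x in range(n))
        p = [norm_matrix[x][y] / col_sum for x in range(n)]            # Eq. (11)
        iec = -sum(v * math.log(v) for v in p if v > 0) / math.log(n)  # Eq. (12)
        entropies.append(iec)
    total = sum(1 - e for e in entropies)
    return [(1 - e) / total for e in entropies]                        # Eq. (13)

def weighted_matrix(norm_matrix, weights):
    """Eq. (14): scale each normalized column by its entropy weight."""
    return [[v * w for v, w in zip(row, weights)] for row in norm_matrix]
```

The design choice here is the standard one for entropy weighting: criteria with more dispersed values (lower entropy) discriminate better between users and so receive larger weights.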

4.5 Calculation of ideal positive and negative solutions

Generally, a lower value is considered the positive ideal in cases such as price; in social or weblog networks, however, a higher value indicates the positive ideal solution. The computation of the ideal positive and negative solutions and of the separation measures forms part of the weighted integration using TOPSIS. Therefore, minimum and maximum thresholds are computed for each single centrality measure as

$$\begin{array}{*{20}c} {a_{y}^{ + } = MAX\left( {V_{y} } \right)} & {\forall y \in \left[ {1,m} \right]} \\ \end{array}$$
(15)

where \({V}_{y}\) represents the column vector of \(WT\left[x,y\right]\) for the \({y}^{th}\) centrality criterion and \({a}_{y}^{+}\) indicates a single member of \({A}_{max}[x, y]\). Similarly, \({a}_{y}^{-}\) is computed using Eq. (16) given as

$${ a_{y}^{ - } = MIN\left( {V_{y} } \right)} \quad {\forall y \in \left[ {1,m} \right]}$$
(16)

4.6 Computation of measures for separation

The separation denotes the distance of each alternative from the ideal solutions, captured as an n-dimensional distance vector. The distance of each alternative from the ideal positive solution \({a}_{y}^{+}\) is computed using Eq. (17), which is as follows:

$$\begin{array}{*{20}c} {s_{x}^{ + } = \sqrt {\sum\limits_{{y = 1}}^{{y = m}} {\left( {wt_{{xy}} - a_{y}^{ + } } \right)^{2} } } } & {\forall x \in \left[ {1,n} \right]} \\ \end{array}$$
(17)

where \({s}_{x}^{+}\) represents a single member of \({S}_{max}\left[x,y\right]\) that is a vector of order \(n\). Likewise, the alternative distance from \({s}_{x}^{-}\) is computed as

$$\begin{array}{*{20}c} {s_{x}^{ - } = \sqrt {\sum\limits_{{y = 1}}^{{y = m}} {\left( {wt_{{xy}} - a_{y}^{ - } } \right)^{2} } } } & {\forall x \in \left[ {1,n} \right]} \\ \end{array}$$
(18)

where \({s}_{x}^{-}\) represents a member of the negative ideal separation matrix \({S}_{min}\left[x,y\right]\), which is a vector of order \(n\) like \({S}_{max}\left[x,y\right]\).

4.7 Calculation of relative proximity to the ideal solution

In the final step of computing CSTUserRank, the study computes relative proximity or closeness to obtained ideal solution using Eq. (19) given as

$${csturank}_{x}=\frac{{S}_{x}^{-}}{{S}_{x}^{+}+{S}_{x}^{-}}$$
(19)

where \({csturank}_{x}\) represents the proposed ranking score of user \(x\). In addition, \({csturank}_{x}\) is a member of the matrix \(CSTUserRank[x, y]\), which consists of the ranking scores of all users in the network. Sorting \(CSTUserRank[x, y]\) in descending order identifies the top-k influential users. The computation of relative proximity to the ideal solution is represented as the last (4th) sub-section of step 2 in Fig. 1.
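Eqs. (15)–(19) together amount to the standard TOPSIS closeness computation. A minimal sketch over a weighted matrix held as a list of row lists follows; the guard for the degenerate case where an alternative is equidistant from both ideals (all rows identical) is an added assumption.

```python
import math

def topsis_scores(wt):
    """TOPSIS relative-closeness scores (Eqs. 15-19) for a weighted n*m matrix."""
    n, m = len(wt), len(wt[0])
    a_pos = [max(wt[x][y] for x in range(n)) for y in range(m)]  # Eq. (15)
    a_neg = [min(wt[x][y] for x in range(n)) for y in range(m)]  # Eq. (16)
    scores = []
    for row in wt:
        s_pos = math.sqrt(sum((v - a) ** 2 for v, a in zip(row, a_pos)))  # Eq. (17)
        s_neg = math.sqrt(sum((v - a) ** 2 for v, a in zip(row, a_neg)))  # Eq. (18)
        # Eq. (19): closeness in [0, 1]; 1 means the row equals the positive ideal.
        scores.append(s_neg / (s_pos + s_neg) if s_pos + s_neg else 0.0)
    return scores
```

Sorting users by these scores in descending order then yields the top-k ranking described above.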

4.8 Proposed algorithm: CSTUserRank

Algorithm 1 presents the working of the proposed model. In the first step, the centrality matrix \({M}_{mcdm}[x,y]\) of order \(n*m\) is provided as input, where \(n\) denotes the number of rows (i.e., users) and \(m\) the number of columns (i.e., centrality criteria). The normalization of \({M}_{mcdm}[x,y]\) is then performed as represented in Eq. (10) (line 1). Next, the information entropy \({IEE[x,y]}_{n*m}\) for the normalized \({{M}^{\prime }}_{mcdm}[x, y]\) is computed using Eq. (11) (line 2). Similarly, the information entropy for each centrality measure is represented as a vector \({IEC\left[x,y\right]}_{1*m}\) using Eq. (12) (line 3). Subsequently, suitable weights for each centrality measure are computed using Eq. (13) (line 4). Later, the weighted MCDM \(WT{\left[x,y\right]}_{n*m}\) is obtained using Eq. (14) (line 5). Next, the positive ideal \({A}_{max}\left[x,y\right]\) and negative ideal \({A}_{min}\left[x,y\right]\) solutions are computed using Eqs. (15) and (16) (lines 6–7). The terms \({A}_{max}\) and \({A}_{min}\) represent two distinct matrices, each of which can be represented as a vector of size \(1*m\). Regarding complexity, line 1 of Algorithm 1 is a nested loop that executes \(n*m\) times, where \(n\) indicates the total number of users and \(m\) the number of centrality measures; therefore, the complexity of lines 1 to 3 is \({O}_{(n*m)}\), as the nested loop executes \((n*m)\) times irrespective of the number of operations per iteration (Yin et al. 2020).

However, the code in line 4 executes under a single loop; therefore, the weight computation for each centrality criterion takes \({O}_{(m)}\) asymptotic time. Subsequently, the procedure in line 5 for computing the weighted MCDM runs inside a nested loop, so line 5 takes \({O}_{(n*m)}\) time. In contrast, the procedure for computing the positive and negative ideal solutions in lines 6–7 takes linear time, as a single loop completes in \({O}_{(m)}\) time. Lines 8–9, which compute the separation measures, run inside a nested loop, hence their complexity is \({O}_{(n*m)}\).

On the other hand, the code in line 10 presents the computation of relative closeness to the ideal solution, which eventually yields \(CSTUserRank\) for each user in the network and takes linear time, i.e., \({O}_{(n)}\). Finally, the code in line 11 comprises two sub-steps: first, a matrix of user ids is appended to \(CSTUserRank[x, y]\); second, \(CSTUserRank[x, y]\) is sorted in descending order to identify the top-k influential users. The time complexity of the appending process is linear, whereas sorting takes \({O}_{(n{log}_{2}n)}\) time (Kobbaey et al. 2022). Therefore, the complexity of line 11 is \({O}_{(n{log}_{2}n)}\). In this study, the value of \(m\) is 8 because of the eight different centrality- and temporal-based rankings of a user. Thus, the overall time complexity of \(CSTUserRank\) is \({O}_{(n{log}_{2}n)}\) (Alshahrani et al. 2020), which is exactly equal to that of the baseline measures. However, the influence spread of the proposed CSTUserRank outperforms the baseline measures, as empirically demonstrated in Sect. 5.2 (Figs. 7, 8, 9, 10).

4.9 Dataset

MetaFilterFootnote 2 is a long-running community-curated blog network that was established on July 14, 1999. Its prominent features include active moderation by paid staff and a small membership fee that deters trolling and other fraudulent activities. In this study, four different infodumps, Music, Meta, Mefi, and Askme, are utilized, spanning MetaFilter and several of its subsites. The infodumps contain almost seventeen years of interaction history between users, topics of interest, and the community's evolution over time. Such infodumps are of particular interest to researchers of social networks because these

Algorithm 1
figure a

Centrality-based temporal user rank (CSTUserRank)

infodumps are regularly updated.Footnote 3 On the other hand, a social network, characterized by connections via posts, comments, and replies, is typically considered directed. In this context, interactions between users are often unidirectional. For instance, when User A posts content and User B comments on it, the interaction is directed from User A to User B. Likewise, if User B replies to User A's comment, it establishes another directed connection from User B to User A. Thus, the nature of interactions in such a social network exhibits clear directionality, defining it as a directed network (Bayrakdar et al. 2020). Therefore, the graph networks generated for Music, Meta, Mefi, and Askme are directed networks.

Subsequently, Table 3 shows the detailed description of the four variants of MetaFilter datasets including Askme, Mefi, Meta, and Music, each offering a diverse array of user interactions. Askme features the highest user count at 47,122, with a substantial 31.6 million edges connecting these users. Over the span of 14 years (2003–2017), Askme accumulated a massive amount of content, with over 300,000 posts and nearly 4.3 million comments across 233 unique topics. Thereafter, Mefi, spanning from 1999 to 2008, boasts 23,742 users and a substantial 47.5 million edges, indicating extensive user interconnectedness. In contrast, the Meta dataset, spanning from 2000 to 2017, features a relatively smaller user base of 14,949 individuals, however it maintains an active community, with 47,520 posts and 1.26 million comments. Similarly, the Music dataset, spanning from 2006 to 2017, features the smallest user base consisting of 5,022 individuals, however it maintains an active community, with 20,672 posts and 121,218 comments. In addition, Table 3 indicates that Askme and Mefi datasets have a wider range of topics covered through user interactions as compared to Meta and Music datasets which suggests a more diverse set of discussions within Askme and Mefi communities.

Table 3 Detailed description of metafilter dataset

4.10 The measures of evaluation

The study involves multiple evaluation metrics to present a thorough comparison of the proposed CSTUserRank against existing centrality models. A brief description of each evaluation metric follows.

4.10.1 Jaccard similarity (JS)

JS measures the percentage of similarity between two sets of data. However, the JS index is sensitive to small-scale data sets that often contain missing observations. Assuming \({R}_{x}\) and \({R}_{y}\) represent two rankings of the data, the JS index between \({R}_{x}\) and \({R}_{y}\) is computed as

$$J\left( {R_{x} , R_{y} } \right) = \frac{{\left| {R_{x} \cap R_{y} } \right|}}{{\left| {R_{x} \cup R_{y} } \right|}}$$
(20)

where the numerator and denominator of Eq. (20) represent the cardinalities of the intersection and union of \({R}_{x}\) and \({R}_{y}\), respectively. A higher percentage shows that most of the elements in \({R}_{x}\) and \({R}_{y}\) are the same.
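Eq. (20) can be computed directly with Python sets; this is an illustrative helper, not part of the original implementation.

```python
def jaccard(r_x, r_y):
    """|R_x ∩ R_y| / |R_x ∪ R_y| between two top-k rank sets (Eq. 20)."""
    a, b = set(r_x), set(r_y)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

For example, two top-3 lists sharing two users out of four distinct users overall yield a JS of 0.5.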

4.10.2 Simulation based on SIR

In this study, we analyze the performance of CSTUserRank against existing centrality models using SIR simulation (Zhou et al. 2006). SIR is a popular and effective simulation technique that comprises three disjoint compartments: the susceptible state, the infected state, and the recovered state. In the susceptible state, a node is liable to become infected; the infected state comprises only the currently infected nodes. Finally, the recovered state consists of recovered nodes that are no longer infected or susceptible (Yin et al. 2020). In SIR, susceptible nodes become infected with a certain probability, say \(p\), upon contact with infected nodes. This probability \(p\) determines the extent of influenced or infected nodes in a social network. The transitions between states in SIR are given as

$$\frac{dS}{{dt}} = - \frac{I\left( t \right)}{n}*\beta *\tau *S\left( t \right), \frac{dI}{{dt}} = \left( {\frac{I\left( t \right)}{n}*\beta *\tau *S\left( t \right)} \right) - \left\{ {\gamma *I\left( t \right)} \right\}, \frac{dR}{{dt}} = \gamma *I\left( t \right)$$
(21)

where \(I\left(t\right)\) and \(S\left(t\right)\) represent the infected state \(I\) and susceptible state \(S\) at time \(t\). Similarly, the terms \(dS\), \(dI\), and \(dR\) indicate the rates of change at a specific time \(t\) in \(S, I,\) and \(R\), respectively. The term \(t\) reflects the iteration index, whereas \(\frac{I\left(t\right)}{n}\) represents the current fraction of infected nodes. For instance, in a network of 12 nodes of which 8 are susceptible and 4 are infected, \(\frac{I\left(t\right)}{n}=4/12\) indicates a roughly 33% chance that a node randomly interacting with another node contacts an infected one at any given time. Similarly, \(n\) depicts the population size and \(\beta\) indicates the act rate, i.e., the interaction rate of a node during a particular time. Likewise, the term \(\tau\) denotes the transmission propensity, the propensity of a node to become infected when interacting with an infected node. In addition, \(S(t)\) shows how many nodes are in the susceptible state at a given time \(t\). The negative sign in Eq. (21) represents the decline of susceptible nodes over time, and \(\gamma\) denotes the recovery rate, multiplied by the current number of infected nodes. Finally, since \(n\) is a constant, the sum of the changes, i.e., \(\left(\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}\right)\), is always zero.

In this study, the term \(\beta\), which controls influence dissemination in the network, is calibrated to 0.1, indicating a 10% chance that a susceptible node becomes infected upon contact with an infected node. The population size \(n\) depends on the size of the dataset in terms of network nodes, as mentioned in Table 3. Similarly, the recovery period has been set to 5 iterations, so the recovery rate \(\gamma =\frac{1}{5}=0.2\) means that 20% of the infected nodes recover at each step and become immune, i.e., cannot be infected again. During the simulation process, 60 iterations have been performed for each centrality algorithm. If each iteration corresponds to a day, the diffusion rate of the top-5 nodes is plotted for each of the four datasets.
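A discrete-time reading of Eq. (21) can be sketched as below. The per-step update, the folding of \(\beta\) and \(\tau\) into a single infection term, and the defaults (\(\gamma = 0.2\) as a per-step recovery fraction, 60 steps) are simplifying assumptions for illustration, not the authors' simulation code.

```python
def sir_step(s, i, r, n, beta, tau, gamma):
    """One discrete-time update of the compartments per Eq. (21)."""
    new_infections = (i / n) * beta * tau * s  # fraction infected * act rate * transmission * S(t)
    recoveries = gamma * i                     # gamma * I(t)
    return s - new_infections, i + new_infections - recoveries, r + recoveries

def simulate(n, i0, beta=0.1, tau=1.0, gamma=0.2, steps=60):
    """Run the SIR compartments for a fixed number of iterations."""
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, n, beta, tau, gamma)
        history.append((s, i, r))
    return history
```

Because each update only moves mass between compartments, \(S+I+R\) stays equal to \(n\) at every step, mirroring the zero-sum property of Eq. (21).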

4.10.3 Frequency of the node with identical influence ranks

While identifying significant or influential nodes within social networks, a centrality measure may assign identical ranks to multiple nodes; such nodes then become difficult to distinguish. A ranking method is therefore considered more effective than others if the frequency of nodes receiving identical ranks is smaller. Consequently, the frequency of nodes with identical ranks can be employed to assess the performance of a centrality measure.
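This frequency can be computed with a short helper; the mapping from node ids to rank scores is an assumed input format for illustration.

```python
from collections import Counter

def identical_rank_frequency(scores):
    """Count the nodes that share a rank score with at least one other node.

    A lower value means the measure discriminates between nodes better.
    """
    counts = Counter(scores.values())
    return sum(c for c in counts.values() if c > 1)
```

For instance, if two of three nodes tie on the same score, the frequency is 2; a perfectly discriminating measure yields 0.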

5 Results and discussion

In this study, we have analyzed the results using three standard evaluation measures: the Jaccard similarity index, SIR-based simulation analysis, and the frequency of nodes having the same rank. The effectiveness of CSTUserRank is analyzed against some of the latest centrality algorithms, such as the combined centrality measure (Fei et al. 2017), Ex_HI (Zareie and Sheikhahmadi 2019), and MaxCDegKatzd-hops and MinCDegKatzd-hops (Alshahrani et al. 2020). The implementations were performed on an HP Pavilion 15-CC123cl (8th Gen Core i5 quad-core) laptop with 16 GB of RAM running Windows 10 Pro. The Python code was executed in the Jupyter Notebook editor (Fig. 2).

5.1 Jaccard similarity (JS)

In the JS-based analysis, CSTUserRank is compared with existing centrality models using the Music dataset. Figure 3 illustrates a consistent upward trend as the number of top influential users grows. For example, for the top-10 users, CSTUserRank demonstrates nearly 50% similarity with DC, around 10% with CC and BC, close to 60% with PR, and approximately 50% with Katz, CMB-C, MinCDegKatz d-hops, and Ex_HI. However, CSTUserRank exhibits a slightly higher similarity index, almost 60%, with MaxCDegKatz d-hops. This trend persists for the top-50 most influential users, with PR, Katz, and MaxCDegKatzd-hops displaying similar behavior, showing slightly over 70% similarity with CSTUserRank. Likewise, CMB-C, Ex_HI, and MinCDegKatzd-hops also exhibit similar levels of similarity, reaching almost 70% with CSTUserRank. Moreover, extending the analysis to the top-100 users, the rising trend continues, with PR, Katz, and MaxCDegKatzd-hops displaying similar JSI values to CSTUserRank, as do CMB-C, Ex_HI, and MinCDegKatzd-hops.

Fig. 3
figure 3

JI Analysis of CSTUserRank against the existing centrality measures using Music Dataset

Additionally, in Fig. 4, CSTUserRank exhibits higher Jaccard Similarity Index (JSI) with relatively recent centrality models such as CMB-C, Ex_HI, MinCDegKatzd-hops, and MaxCDegKatzd-hops, in contrast to classical network centrality models like PR and Katz. However, CC and BC display the lowest JSI with CSTUserRank across the top-10 to top-100 rankings compared to other centrality models.

Fig. 4
figure 4

JI Analysis of CSTUserRank against the existing centrality measures using MEFI Dataset

These differences can be attributed to the Music dataset's smaller size relative to the other datasets described in Table 3. Therefore, having a higher number of nodes that are close to all other nodes in the network is not enough to achieve a higher JSI with CSTUserRank. Moreover, the Music dataset exhibits a lower count of bridge nodes in the network. In contrast, CSTUserRank encompasses diverse features, including network centralities and textual and temporal aspects, thereby resulting in significant ranking differences. Similarly, in Fig. 5, CSTUserRank achieves the highest JSI with CMB-C, MaxCDegKatzd-hops, MinCDegKatzd-hops, and Ex_HI, respectively. Except for MaxCDegKatzd-hops, the other three centrality models demonstrate an almost 80% JSI, while MaxCDegKatzd-hops attains nearly 90% JSI. The underlying reason for this 80%-and-higher JSI with CSTUserRank is that these centrality measures, in contrast to classical centrality models, adopt a hybrid multi-criteria approach similar to CSTUserRank. The distinction lies in the incorporation of an appropriate weighting strategy and aggregation method, enhancing effectiveness in identifying influential nodes.

Table 4 Performance analysis of cstuserrank across diverse network structures
Fig. 5
figure 5

JI Analysis of CSTUserRank against the existing centrality measures using META Dataset

Similarly, in Fig. 5, CSTUserRank demonstrates a higher JSI with recent centrality models like CMB-C, while displaying relatively lower JSI with classical network centrality measures such as PR. Notably, the JSI with CC and BC remains lower, which is consistent with the observations from Figs. 3 and 4. Interestingly, the JSI of CSTUserRank with DC surpasses that observed on the Meta or Music datasets. This significant hike can be attributed to Mefi being a relatively large dataset, resulting in a higher frequency of nodes with large degree centrality compared to the Meta and Music datasets. However, while the total numbers of bridge nodes and of nodes with shorter distances to all other nodes in the Mefi network are higher than in the Music dataset, the difference is not significant. Finally, Fig. 6 illustrates a high JSI with PR, CMB-C, Ex_HI, MaxCDegKatzd-hops, and MinCDegKatzd-hops. CSTUserRank also exhibits a high JSI with DC, CC, and BC, particularly evident in the Askme dataset. The higher JSI level, exceeding 80%, can be attributed to Askme being the largest dataset, with over 47 thousand nodes and 31.6 million edges forming a strongly connected network. Consequently, Askme presents a higher probability of bridge nodes, nodes with short paths to all other nodes, and nodes with high degree centrality in the weblog network.

Fig. 6
figure 6

JI Analysis of CSTUserRank against the existing centrality measures using ASKME Dataset

5.2 SIR-based simulation analysis

A node's significance within a social network depends on its ability to exert a substantial influence on other nodes. To evaluate this, the top-5 nodes are selected according to each existing centrality measure, and their influence propagation is simulated on all four datasets. The results in Figs. 7, 8, 9 and 10 clearly demonstrate CSTUserRank's superior performance compared to both recent and classical centrality models on the Music, Meta, Mefi, and Askme datasets. In Fig. 7, the performance of CSTUserRank is significantly higher than that of DC, CC, and BC; higher than that of PR and Katz; and comparatively higher than that of CMB-C, Ex_HI, MaxCDegKatz d-hops, and MinCDegKatz d-hops. One underlying reason is that classical centrality models such as DC consider only a single aspect, either local or global. Recent centrality models such as Ex_HI, in contrast, enhance classical centrality measures by integrating other features. For instance, CMB-C considers three different centrality criteria and utilizes aggregation strategies to produce a final influence rank for a network node.

Fig. 7
figure 7

SIR Analysis of CSTUserRank against the existing measures using Music Dataset

However, CMB-C and other recent centrality measures lack the ability to incorporate the textual and dynamic aspects of a social network. CSTUserRank, by contrast, considers multiple aspects of a network node, covering network structure as well as textual and temporal features; each feature is assigned an objective weight, and the features are then combined into a final node influence rank. Similarly, the simulation analysis in Fig. 8 shows the comparison of CSTUserRank with the existing centrality measures using the Meta dataset. Figure 8 illustrates that CSTUserRank performs better than classical algorithms including CC, DC, BC, Katz, and PR. CSTUserRank also outperforms recent centrality models such as CMB-C, Ex_HI, MinCDegKatzd-hops, and MaxCDegKatzd-hops.

Fig. 8
figure 8

SIR Analysis of CSTUserRank against the existing measures using Meta Dataset

However, the performance differences between CSTUserRank and the relatively recent centrality measures are not significant. The underlying reason for this behavior is that recent centrality algorithms consider the structural properties of the network and enhancements in seed set selection; for instance, Max- and MinCDegKatz d-hops integrate the properties of DC and Katz to pick appropriate seed sets that maximize the influence propagation of a network node.

Subsequent analyses in Figs. 9 and 10 demonstrate CSTUserRank's consistently strong performance regarding influence dissemination on Mefi and Askme datasets, outperforming classical algorithms and, to a lesser extent, recent centrality measures.

Fig. 9: SIR Analysis of CSTUserRank against the existing measures using Mefi Dataset

Fig. 10: SIR Analysis of CSTUserRank against the existing measures using Askme Dataset

The proposed model shows significantly better performance than classical algorithms including CC, DC, BC, Katz, and PR. Although CSTUserRank also outperforms recent centrality measures, the margin is smaller than against the classical algorithms. Overall, CSTUserRank significantly outperforms the classical centrality measures and performs better than the recent ones. Since CSTUserRank integrates seven different centrality models, including DC and Katz, which are also utilized in Ex_HI, MinCDegKatz d-hops, and MaxCDegKatz d-hops, the influence spread of these measures in the simulations is quite close to that of CSTUserRank. In addition, MaxCDegKatz d-hops has the highest Jaccard similarity with CSTUserRank.
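The Jaccard similarity between two rankings, as used above to relate MaxCDegKatz d-hops to CSTUserRank, is computed over their top-k node sets. A minimal sketch, assuming each ranking is an ordered list of node ids from most to least influential:

```python
def jaccard_topk(rank_a, rank_b, k=10):
    """Jaccard similarity of the top-k node sets of two rankings:
    |intersection| / |union| of the two sets."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)

# Hypothetical rankings: two of four top nodes overlap.
print(jaccard_topk([1, 2, 3, 4], [3, 4, 5, 6], k=4))  # → 0.3333333333333333
```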

Furthermore, the results show that incorporating textual and temporal features in addition to network centralities enhances the spreading capabilities of the identified nodes and is therefore effective in identifying the top-k influential users. Moreover, CSTUserRank has a logarithmic time complexity, similar to that of most existing centrality measures, especially the more recent ones.

5.3 Spearman’s rank correlation analysis

The Spearman's rank correlation analysis between the various network centrality algorithms on the Music dataset in Fig. 11 provides interesting insights into their relationships. Comparing CSTUserRank with DC, a weak positive correlation is observed across the different top rankings, from top-10 to top-100, with correlation scores from 0.2 to 0.6. This is because CSTUserRank combines DC with sentiment analysis and temporal analysis in addition to other centrality measures. Sentiment analysis captures the text polarity of user posts, and temporal analysis assigns higher importance to recent user activity, whereas the DC value of a node is based only on its number of network connections. The entropy weighting, in turn, assigns appropriate weights to the number of connections, recent activity, and the number of important connections when combining the various centrality measures into a unified ranking.

Fig. 11: Spearman’s Rank Correlation Analysis of CSTUserRank against the existing measures on MUSIC Dataset

Similarly, CSTUserRank exhibits weak to moderate correlations with CC and BC, with values ranging from − 0.2 to 0.3. Although CSTUserRank captures certain aspects of node importance similar to degree and betweenness centrality, its inclusion of additional centrality criteria results in varied rankings. Furthermore, in CSTUserRank, the entropy-based integration allocates a weight to each centrality criterion according to its relative importance, so higher weights are assigned to criteria that contribute significantly to the decision-making process or have a greater impact on the decision's outcome. Subsequently, CSTUserRank demonstrates a strong positive correlation with PR, indicating consistent agreement between these measures across the different top rankings; at top-100, the correlation with PR reaches approximately 0.8. Conversely, CSTUserRank shows a moderate to strong negative correlation with Katz centrality, with values ranging from − 0.4 to − 0.76. A strong negative correlation indicates that nodes ranked higher by CSTUserRank tend to be ranked lower by Katz centrality. This arises from the differing principles underlying these measures: PageRank, closeness centrality, and betweenness centrality prioritize nodes based on their connections to other important nodes or their proximity to all other nodes in the network, whereas Katz centrality considers not only a node and its direct neighbors but also their extended network, offering a broader perspective on node importance beyond immediate neighbors. Notably, CSTUserRank demonstrates moderate to strong positive correlations with extended h-index (Ex_HI) and combined centrality measures (CMB-C), indicating similarity in rankings (0.3–0.72). Furthermore, CSTUserRank shows a moderate to strong positive correlation with both MinCDegKatz and MaxCDegKatz, ranging from 0.35 to 0.77.
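Spearman's ρ used throughout this section can be computed directly from two score vectors. A self-contained sketch (Pearson correlation of tie-averaged ranks, the same quantity `scipy.stats.spearmanr` returns):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation of two equal-length score lists,
    using average ranks for tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1                      # extend over the tie group
            avg = (i + j) / 2 + 1           # average rank (1-based)
            for t in range(i, j + 1):
                r[order[t]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Applying this to the score vectors of CSTUserRank and each baseline over the top-10 … top-100 nodes yields the correlation curves plotted in Figs. 11–14.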

Subsequently, the Spearman's rank correlation analysis on the Mefi dataset in Fig. 12 shows a weak correlation between CSTUserRank and DC, ranging from − 0.1 to 0.35 across the different top rankings. Similarly, CSTUserRank exhibits a weak to moderate correlation with CC and BC (values ranging from − 0.3 to 0.28). This is because Mefi is a relatively large dataset with many influential users who have large numbers of followers and receive significant engagement on their posts. However, Mefi has limited interaction between the different communities within the network. As a result, despite having many influential users with a high PageRank, Mefi has few bridge nodes or intermediaries connecting different clusters.

Fig. 12: Spearman’s Rank Correlation Analysis of CSTUserRank against the existing measures on MEFI Dataset

Additionally, in such a decentralized network, users may have high closeness centrality scores within their respective communities but limited influence beyond them. However, CSTUserRank demonstrates a strong positive correlation with PR (0.42–0.84). Conversely, CSTUserRank shows a moderate to strong negative correlation with Katz centrality (− 0.4 to − 0.77). Notably, CSTUserRank demonstrates weak to moderate positive correlations with Ex_HI and CMB-C, ranging from − 0.1 to 0.63.

Furthermore, CSTUserRank shows moderate to strong positive correlations with both MinCDegKatz and MaxCDegKatz (0.33–0.75). From Figs. 11 to 14, it can be observed that as the number of top users increases from top-10 to top-100, the correlations rise from weak to moderate or from moderate to strong. Next, the Spearman's rank correlation analysis on the Meta dataset in Fig. 13 yields a weak to moderate positive correlation between CSTUserRank and DC, ranging from 0.05 to 0.29. Similarly, CSTUserRank exhibits a moderate positive correlation with CC and BC (0.1–0.4). These differences result from the inclusion of additional centrality criteria in CSTUserRank, which leads to varied rankings. Moreover, CSTUserRank demonstrates a moderate to strong positive correlation with PR (0.31–0.63). Likewise, CSTUserRank shows a moderate to strong positive correlation with Katz centrality (0.27–0.6). Additionally, CSTUserRank exhibits a moderate to strong positive correlation with Ex_HI and CMB-C (0.34–0.63). Furthermore, CSTUserRank demonstrates a moderate to strong positive correlation with both MinCDegKatz and MaxCDegKatz, ranging from 0.35 to 0.8.

Fig. 13: Spearman’s Rank Correlation Analysis of CSTUserRank against the existing measures on META Dataset

Finally, the Spearman's rank correlation analysis on the Askme dataset in Fig. 14 demonstrates a weak to moderate positive correlation between CSTUserRank and DC, ranging from 0.05 to 0.47. Similarly, CSTUserRank exhibits a moderate to strong positive correlation with CC and BC (0.1–0.64). Moreover, CSTUserRank demonstrates a moderate to strong positive correlation with PR (0.17–0.78). Likewise, CSTUserRank shows a moderate to strong positive correlation with Katz centrality (0.14–0.65). Additionally, CSTUserRank exhibits a moderate to strong positive correlation with Ex_HI and CMB-C (0.13–0.72). Furthermore, CSTUserRank demonstrates a moderate to strong positive correlation with both MinCDegKatz and MaxCDegKatz (0.15–0.84). These findings provide insights into the effectiveness of CSTUserRank in ranking nodes within the Askme dataset compared to the other centrality algorithms. Interestingly, CSTUserRank shows a moderate to strong correlation with most of the centrality measures across the top-10 to top-100 influential users. This is because Askme is the largest and best-connected dataset, with many bridge nodes and nodes that are close to a significant number of other nodes in the network. In addition, in the Askme dataset, nodes are interconnected such that some nodes are linked to important or influential nodes, and these influential nodes are in turn connected to other important nodes within the network.

Fig. 14: Spearman’s Rank Correlation Analysis of CSTUserRank against the existing measures on ASKME Dataset

5.4 Frequency of nodes with the same ranks

Figure 15 presents the percentage of users assigned identical ranks by each centrality measure, computed by dividing the number of nodes with the same rank by the total number of nodes in the network. Figure 15 demonstrates that, across all four datasets, CSTUserRank assigns identical ranks to a significantly lower number of users than the existing centrality measures. This percentage falls below 10%, underscoring the superiority of the proposed model over existing methods. Notably, on all four datasets, DC exhibits the worst performance in terms of the percentage of users with the same ranks. For instance, on the Music dataset, CSTUserRank assigns identical ranks to only 2.45% of network nodes, significantly better than the 74.28% of DC. Recent centrality algorithms like Ex_HI assign identical ranks to nearly 30% of network nodes, while the top-performing hybrid algorithm, MaxCDegKatz d-hops, assigns identical ranks to over 15% of network nodes. Similarly, on large datasets like Mefi, CSTUserRank assigns an identical ranking score to just 7.57% of the network population, a considerable improvement over DC (89.78%), CC (78.69%), CMB-C (39.91%), and MaxCDegKatz d-hops (26.13%). These findings underscore that CSTUserRank demonstrates a remarkable improvement over existing single and hybrid centrality measures.
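The tie percentage reported in Fig. 15 can be computed as below; a minimal sketch, under the assumption that each node's influence score determines its rank, so nodes sharing a score share a rank:

```python
from collections import Counter

def pct_tied(scores):
    """Percentage of nodes whose influence score (and hence rank) is
    shared with at least one other node."""
    counts = Counter(scores)
    tied = sum(c for c in counts.values() if c > 1)
    return 100.0 * tied / len(scores)

# Hypothetical scores: two of four nodes share a score.
print(pct_tied([0.9, 0.9, 0.5, 0.3]))  # → 50.0
```

Lower values indicate finer discrimination between users, which is the property this subsection measures.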

Fig. 15: Frequency of nodes with same ranking score by centrality algorithms (in percentage)

In short, the comparative analysis of CSTUserRank across multiple datasets, including Askme, Mefi, Meta, and Music, provides valuable insights into its performance across diverse social network structures. Through evaluation metrics such as Jaccard similarity, SIR-based simulation analysis, Spearman's rank correlation, and the frequency of assigning identical ranks, distinctive patterns emerge. CSTUserRank exhibits consistently high Jaccard similarity with recent centrality models, indicating alignment with their underlying ranking philosophy. However, exceptions in the Askme dataset, the dataset with the highest density, suggest the influence of network structure on model performance. Similarly, SIR-based simulations confirm the superiority of CSTUserRank in identifying influential users, although the advantage is less pronounced in Mefi, a decentralized network with limited interaction between communities. Subsequently, the Spearman's rank correlation analysis clarifies that CSTUserRank correlates strongly and positively with PageRank and the recent models across all datasets. Exceptions include negative correlations with Katz centrality, reflecting fundamental differences in how these models measure node importance. Finally, CSTUserRank consistently assigns unique ranks to a significantly higher proportion of users across all datasets, which demonstrates its strength in differentiating user influence by incorporating textual and temporal features alongside network centralities.

Overall, the comparative analysis across diverse social network structures demonstrates CSTUserRank's consistent superiority in identifying influential users. The results underscore the importance of incorporating a rich set of features, including network structure, textual content, and temporal dynamics, for a comprehensive analysis of user influence. By considering not just connections but also user activity and user-generated content, CSTUserRank provides a more nuanced understanding of influence within social networks.

6 Conclusion

In this study, we introduce CSTUserRank, a centrality approach that leverages a multi-criteria decision matrix incorporating network centralities, textual analysis, and a novel temporal feature. We adopt an effective method for assigning appropriate weights to each centrality measure based on entropy, which are then integrated to determine the final ranking of nodes within the network using TOPSIS aggregation. Our evaluation across four real-world weblog networks demonstrates CSTUserRank's superior performance over both classical and recent hybrid centrality algorithms, exhibiting enhanced influence spreading capabilities, strong similarity and rank correlation with classical and hybrid models, and a reduced frequency of nodes sharing identical influence ranks. In addition, the inclusion of textual and temporal features in the proposed model enhances the process of ranking influential nodes. The novel temporal feature, which assigns higher weightage to recent time intervals in social interactions, sets CSTUserRank apart from other multi-criteria hybrid models that predominantly consider network or graph-based aspects of network nodes. The findings of this study hold significance for ongoing research in complex social network analysis, the prevention of virus spread in biological networks, and the identification of important nodes for grid stations, aiding in the prevention of power failures.
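The entropy-weighted TOPSIS aggregation at the core of CSTUserRank can be sketched as follows. This is the textbook TOPSIS procedure with every criterion treated as a benefit criterion, offered as a generic illustration rather than the paper's exact variant:

```python
def topsis_scores(matrix, weights):
    """TOPSIS: vector-normalise the decision matrix (rows = nodes,
    columns = criteria), apply the criterion weights, locate the ideal
    best/worst value per criterion, and score each node by its relative
    closeness to the ideal solution (higher = more influential)."""
    m, n = len(matrix), len(matrix[0])
    norms = [sum(matrix[i][j] ** 2 for i in range(m)) ** 0.5 for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    best = [max(v[i][j] for i in range(m)) for j in range(n)]
    worst = [min(v[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_best = sum((v[i][j] - best[j]) ** 2 for j in range(n)) ** 0.5
        d_worst = sum((v[i][j] - worst[j]) ** 2 for j in range(n)) ** 0.5
        scores.append(d_worst / (d_best + d_worst))
    return scores
```

Feeding entropy-derived weights into `topsis_scores` and sorting nodes by the returned closeness values yields a unified influence ranking in the spirit of the proposed pipeline.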

Future avenues of exploration may include investigating alternative feature weighting schemes to further enhance the accuracy of CSTUserRank. Additionally, extending the evaluation to other datasets and applying the proposed approach to weighted networks would contribute valuable insights.