
mathematics

Article
Link Prediction for Temporal Heterogeneous Networks Based
on the Information Lifecycle
Jiaping Cao, Jichao Li * and Jiang Jiang

College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
* Correspondence: lijichao09@nudt.edu.cn

Abstract: Link prediction for temporal heterogeneous networks is an important task in the field
of network science, and it has a wide range of real-world applications. Traditional link prediction
methods are mainly based on static homogeneous networks, which do not distinguish between
different types of nodes in the real world and do not account for network structure evolution over
time. To address these issues, in this paper, we study the link prediction problem in temporal
heterogeneous networks and propose a link prediction method for temporal heterogeneous networks
(LP-THN) based on the information lifecycle, which is an end-to-end encoder–decoder structure. The
information lifecycle accounts for the active, decay and stable states of edges. Specifically, we first
introduce the meta-path augmented residual information matrix to preserve the structure evolution
mechanism and semantics in heterogeneous information networks (HINs), using it as input to the
encoder to obtain low-dimensional embedding representations of the nodes. Then, the link prediction
problem is considered a binary classification problem, and the decoder is utilized for link prediction.
Our prediction process accounts for both network structure and semantic changes using meta-path
augmented residual information matrix perturbations. Our experiments demonstrate that LP-THN
outperforms other baselines in both prediction effectiveness and prediction efficiency.

Keywords: temporal heterogeneous networks; link prediction; information lifecycle; meta-path

MSC: 68T07

Citation: Cao, J.; Li, J.; Jiang, J. Link Prediction for Temporal Heterogeneous Networks Based on the Information Lifecycle. Mathematics 2023, 11, 3541. https://doi.org/10.3390/math11163541

Academic Editor: Michele Bellingeri

Received: 29 July 2023; Revised: 13 August 2023; Accepted: 15 August 2023; Published: 16 August 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

https://www.mdpi.com/journal/mathematics

1. Introduction

Network science is an interdisciplinary field in which network structure and network
behaviour are studied. Studying the topology, information propagation and dynamic
evolution of networks in depth not only reveals the intrinsic laws and characteristics
of networks but also helps us to understand and explain the complex interactions and
phenomena of real-world networks. Currently, digital intelligence technology is rapidly
developing, and network science is being studied in further depth. From social media
platforms to the internet and from neural networks in organisms to traffic networks, the
interactions contained in networks are becoming increasingly complex and variable, and
capturing all the interactions existing in complex networks is difficult with the existing
technology. Incomplete information is therefore a major challenge in current network
science research.
Link prediction provides more comprehensive knowledge of the network structure
by predicting potential or future edges in the network. For example, predicting potential
relationships in drug–target networks can accelerate the discovery and development of
new drugs [1]. Predicting relationships in protein interaction networks can help researchers
deeply understand disease development mechanisms, facilitating effective diagnosis and
treatment [2]. In addition, the structure of networks in the real world is constantly changing
over time, as reflected in the increase or decrease in the number of nodes and edges in the


networks. Friend recommendations in social networks [3], co-authorship recommendations
in academic collaboration networks [4] and item recommendations in e-commerce
networks [5] are all examples of link prediction in the real world.
However, most of the existing link prediction studies have focused on static homoge-
neous networks. Homogeneous networks treat all nodes as the same type and all edges
as the same type, but in the real world, different types of nodes play different roles in the
network. For example, the academic cooperation network includes many types of entities,
such as authors, papers, conferences, etc. If these entities are modelled as the same type of
nodes, and the relationships between different types of entities are modelled as the same
type of relationship, the network will lose its practical research value; furthermore, if only
the co-authorship relationship between authors is modelled, the research results will ignore
the impact of collaborative papers and participation in conferences on the collaboration
between authors, and the results will not be generalizable. Static networks do not account
for the appearance and disappearance of nodes and edges in the network over time, so they
cannot portray the evolution of network structure over time. For example, over time, new
user accounts will be registered, old user accounts will be cancelled in social platforms, and
new links will be created between users. If the time factor is ignored, it will not be possible
to differentiate the evolution of the increase or decrease in users or the increase or decrease
in relationships in the platforms mentioned above over time.
To address these issues, this paper proposes a link prediction method for temporal
heterogeneous networks (LP-THN) based on an encoder–decoder structure. LP-THN
adopts an encoder–decoder structure that can learn the topological features of the network
while dealing with the non-linear features and sparsity of the network. This method uses
an encoder to learn the rich semantic information contained in temporal heterogeneous
networks and network structure changes over time, generates low-dimensional embedding
representations of the nodes based on the global structure, and uses a decoder to perform
link prediction. The main contributions of this paper are as follows:
• We propose a link prediction framework based on an encoder–decoder for tempo-
ral heterogeneous networks, which can achieve more accurate results in practical
applications than other frameworks.
• We propose an augmented residual information matrix considering meta-paths that
incorporates the law of decreasing information lifecycle into meta-paths. This matrix
is used as an input to the encoder to enhance the effective extraction of semantic and
structural information.
• Through numerous experiments, we verify that our method is superior to many
existing methods in terms of effectiveness.
The remainder of the article is organized as follows. Section 2 summarizes the existing
studies and presents their weaknesses. Section 3 introduces the notation used in the
paper and provides relevant definitions. Section 4 presents the LP-THN model and details
this model. Section 5 presents the experiments and analyses of the experimental results.
Section 6 concludes the paper.

2. Related Work
Practical solutions to link prediction problems have become a popular research field
for many scholars in network science. Linyuan Lü summarizes the existing heuristic link
prediction methods and classifies them as local information-based heuristics, such as AA [6]
and RA [7]; path-based heuristics, such as Katz [8] and LHN-II [9]; and random-walk-based
heuristics, such as Cos+ [10] and SimRank [11]. The above heuristics are mostly from
the perspective of network structure and have high interpretability, but they have low
prediction accuracy and are not applicable to large-scale networks. With the rapid develop-
ment of artificial intelligence technology, link prediction methods represented by network
embedding techniques have received wide attention. DeepWalk [12] is a classical shallow
neural network embedding method that obtains node sequences by random walks and
inputs those node sequences into Word2vec to learn the low-dimensional embedding
representations of the nodes. Node2Vec [13] is based on DeepWalk and defines a biased random
walk that combines both breadth-first and depth-first approaches to obtain node sequences.
M-NMF [14] preserves the topology and the community structure of the network and com-
bines the two through a nonnegative matrix decomposition to obtain node low-dimensional
embedding representations. Matrix decomposition and shallow neural network embedding
methods have high computational complexity when dealing with large-scale networks.
LINE [15], in contrast, maximizes the conditional probability and co-occurrence probability
of node neighbours to learn the low-dimensional embedding representations of nodes,
allowing it to efficiently handle large-scale networks. The above methods mainly focus
on the local structural information in the network, ignoring the rich node attributes as
well as the global structural information of the network. In recent years, many scholars
have proposed methods based on deep neural networks that can combine the rich attribute
information of nodes to learn their low-dimensional embedding representations, e.g., graph
neural networks (GNNs) [16], GraphSAGE [17], graph convolutional neural networks
(GCNs) [18] and graph attention networks (GATs) [19]. However, the above methods are
performed on homogeneous networks that do not account for the rich semantic information
contained in different nodes and edge types in heterogeneous networks.
To address the above problems, the link prediction problem in heterogeneous networks
has received extensive attention [20]. Many scholars extract important semantic information
in the network based on meta-paths and embed this information into neural networks
to obtain low-dimensional embedding representations of nodes containing rich semantic
information. Node representation methods on heterogeneous networks can be classified
into three categories: shallow neural networks, such as metapath2vec [21] and HIN2Vec [22],
matrix decomposition methods, such as HNE [23], and deep neural networks, such as
HetGNN [24] and HERec [25]. However, these methods are based on link prediction for
static networks and do not account for the evolution of the network structure over time.
Numerous link prediction methods for temporal networks have been proposed. TLPSS [26]
accounts for the existence of higher-order structural information in the network through
a simplicial complex but does not consider the heterogeneity of nodes and edges. Lasso
regression and random forest [27] find that the connection of a node pair depends on the
connection of many node pairs in the past over a period, explaining the structure of the
model. DynamicTriad [28] learns low-dimensional embedding representations of nodes
through triadic closure. In this method, the dynamics refer to adding edges or changing the
weight of an edge that represents closeness between two nodes, and the number of nodes
does not change over time. The dynamics in CTDN [29] refer to continuous time variation.
This method considers the temporal attributes of the edges to obtain a sequence of nodes
by random walk. DyLink2Vec [30] directly obtains the embedding of the edges to perform
link prediction. TempNodeEmbed [31] splices the same nodes in different time slices by
computing orthogonal bases between node matrices in different time slices. However,
the above link prediction methods for temporal networks are based on homogeneous
networks and do not account for the rich semantic information contained in heterogeneous
networks. DHNE [32] performs a random walk by constructing a historical–current graph
and combining it with meta-paths, but constructing the historical–current graph requires
overly high computational complexity. DyHNE [33] captures network structures and
semantics by retaining first-order and second-order neighbours based on meta-paths, but
this neighbourhood structure cannot capture global structural information.

3. Notations and Definitions


There are different types of nodes or edges that appear or disappear over time in a
temporal heterogeneous network. In this paper, temporal heterogeneous networks are
defined as follows:

Definition 1. Temporal Heterogeneous Network. A temporal heterogeneous network can be represented by a sequence of time slices $\{G^1, G^2, \dots, G^T\}$. $G^t = (V^t, E^t, \phi, \varphi)$ denotes a snapshot of the heterogeneous network under the time slice $t$, where $V^t$ denotes the set of nodes of the heterogeneous network under the time slice $t$, and $E^t$ denotes the set of edges of the heterogeneous network under the time slice $t$. In addition, $\phi$ and $\varphi$ are two mapping functions, $\phi: V^t \to A$, $\varphi: E^t \to R$, where $A$ and $R$ denote the sets of types of nodes and edges, respectively. For a heterogeneous network, $|A| + |R| > 2$.

Meta-paths are generated by sequences of nodes or edges of different types, which


can contain rich semantic and local structural information. Meta-paths on temporal hetero-
geneous networks are generally defined as follows:

Definition 2. Meta-path. Denoted by $M$, a meta-path is considered to be a sequence of nodes and edges, $a_1 \xrightarrow{r_1} a_2 \xrightarrow{r_2} a_3 \xrightarrow{r_3} \cdots \xrightarrow{r_{l-1}} a_l$, where $r_i \in R$, $a_i \in A$, and $A$ and $R$ denote the sets of types of nodes and edges, respectively. Clearly, meta-paths can describe the complex relationships between node types $a_1$ and $a_l$, and a node sequence satisfying the type pattern of meta-path $M$ is an instance of meta-path $M$. An academic collaboration network consisting of 8 nodes is shown in Figure 1A. This network contains three nodes of author type, three nodes of paper type and two nodes of conference type. The sequence $a_3 \to p_2 \to c_2 \to p_1 \to a_1$ is an instance of the meta-path APCPA, which contains semantic information that can be expressed as two authors writing articles published in the same conference.
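As an illustration of meta-path instances, the following sketch enumerates type-matching node sequences on a toy graph shaped like Figure 1A. The node names, edge set and the helper `metapath_instances` are assumptions read off the figure, not the paper's implementation.

```python
# Toy heterogeneous graph shaped like Figure 1A: node -> type plus an edge
# set. Node names and edges are assumptions read off the figure, and
# `metapath_instances` is an illustrative helper, not the paper's code.
node_type = {"a1": "A", "a2": "A", "a3": "A",
             "p1": "P", "p2": "P", "p3": "P",
             "c1": "C", "c2": "C"}
edge_list = [("a1", "p1"), ("a2", "p1"), ("a3", "p2"),
             ("p1", "c2"), ("p2", "c2"), ("p3", "c1")]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def metapath_instances(meta, graph, types):
    """Enumerate simple node sequences whose type sequence matches `meta`."""
    paths = [[n] for n, t in sorted(types.items()) if t == meta[0]]
    for symbol in meta[1:]:
        paths = [p + [n] for p in paths
                 for n in sorted(graph.get(p[-1], ()))
                 if types[n] == symbol and n not in p]
    return paths

apa = metapath_instances("APA", adj, node_type)       # co-authored a paper
apcpa = metapath_instances("APCPA", adj, node_type)   # same conference
```

With this toy edge set, APA instances capture co-authorship (e.g. $a_1 p_1 a_2$), while APCPA instances link authors through a shared conference.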

Figure 1. Example of (A) an academic co-authorship network and (B) co-authorship instances. Panel A shows authors $a_1$–$a_3$, papers $p_1$–$p_3$ and conferences $c_1$, $c_2$; panel B contrasts a long-running co-authorship (Author A and Author B, 2000–2005) with a first-time co-authorship (Author A and Author C, 2005).

Meta-path-based first-order similarity denotes the instance of meta-path $M$ between a connected node pair $(v_i, v_j)$, which can measure the local structural similarity between nodes in a heterogeneous network. Second-order similarity based on meta-paths denotes the instance of meta-path $M$ between a neighbour node in $N(v_i)^M$ of node $v_i$ and a neighbour node in $N(v_j)^M$ of node $v_j$, where the neighbour set $N(v_i)^M$ of node $v_i$ contains the nodes connected to node $v_i$ based on meta-path $M$.
The adjacency matrix describes only the first-order adjacencies present in the network,
and the meta-paths alone cannot characterize the network structure over time, so the
residual information on the edges in the network is fused with the meta-paths. The meta-
path augmented residual information matrix is proposed to portray the temporal structural
information in the network:

Definition 3. Meta-path $M$ augmented residual information matrix. The meta-path $M$ augmented residual information matrix is $W^M = [w_{ij}^M]$, where $w_{ij}^M = nASF^M$ denotes the residual information contained in the specified meta-path instance, calculated as shown in Equation (1):

$$nASF^M = nASF_{(n_i, n_j)} \cdot nASF_{(n_j, n_k)} \cdot \dots \cdot nASF_{(n_p, n_q)} \quad (1)$$

where $M = n_i n_j n_k \dots n_p n_q$, and $nASF_{(n_i, n_j)}$ denotes the amount of residual information contained in the edge of the node pair $(n_i, n_j)$.
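Equation (1) is a plain product over the per-edge residual information values of one instance. A minimal sketch, with a hypothetical helper and made-up nASF values:

```python
import math
from functools import reduce

# Hypothetical per-edge residual information values nASF_(n_i,n_j) along one
# meta-path instance M = n_i n_j n_k ...; Equation (1) multiplies them.
def path_residual_information(edge_nasf_values):
    """Product of the nASF values of the consecutive edges of one instance."""
    return reduce(lambda acc, v: acc * v, edge_nasf_values, 1.0)

# e.g. an APCPA instance traverses four edges; these values are made up.
w_ij = path_residual_information([0.9, 0.8, 0.8, 0.5])
```

Because the values lie in $(0, 1]$, longer instances or older edges naturally contribute less residual information to the matrix entry.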

In temporal network link prediction, the network structure continuously evolves over
time, and information about the historical structural changes in the network is used to
predict the edges in the future network. Temporal network link prediction is typically
defined as follows:

Definition 4. Temporal network link prediction. Temporal network link prediction means predicting the set of edges in the network at time $T+1$, i.e., $E^{T+1} = \{e^t(u, v) \mid u, v \in V^t, t = T+1\}$, based on the network over the given time slices $[1, T]$. Given times $t, s \in T$ with $t < s$, if the nodes and edges in $G^t$ do not disappear during the period $[t, s]$, then the nodes and edges in $G^t$ are all represented in $G^s$, and the information on $G^t$ is said to be the history of $G^s$, as shown in Figure 2. Since temporal network link prediction focuses more on the characteristics of network structure evolution over time and ignores the heterogeneity of node and edge types in the network, this paper only gives this more general definition. For temporal heterogeneous network link prediction, it is sufficient to ensure that the number of node or edge types in this definition is greater than 1.

Figure 2. Schematic diagram of temporal link prediction over the timeline $1, 2, \dots, t, \dots, s, \dots, T, T+1$.

In this paper, link prediction is defined as an end-to-end encoder–decoder process, where a stochastic gradient is generated in each round of training, the prediction error is computed through a loss function, and the parameters of the model are updated in each round. The encoder primarily obtains low-dimensional embedding representations of the nodes, which are defined as follows:

Definition 5. Embedding Representation. An embedding representation is the mapping of each node $v \in V$ in a graph into the same low-dimensional vector space, i.e., $f(v) = z_v$, through the function $f(\cdot)$, where $z_v \in \mathbb{R}^d$ and $d$ denotes the dimension of the vector $z$. The embedding representation is produced by the encoder of LP-THN.

4. The LP-THN Model


In this section, a series of time slices is aggregated to form a network containing the
age and number of occurrences of each edge. Then, the semantic and structural time-
varying information in this network is extracted based on meta-paths and an adjusted
sigmoid function [26], which are used as inputs to the encoder to obtain low-dimensional
embedding representations of nodes containing semantic and temporal information. Finally,
the learned node embedding representations are used as input for the decoder.

4.1. Basic Idea


The problem investigated in this paper is whether the edges in the snapshot of time slice
$t_{n+1}$ can be predicted based on the information in a series of snapshots of the network
over the time slices $[t_1, t_n]$. An academic collaboration network is taken as an example in Figure 3.
First, the network snapshots at times $[t_1, t_n]$ are aggregated into a single-layer network
$G(V, E, \phi, \varphi)$, where $V$ denotes the set of nodes in $G$, and $E$ denotes the set of edges in
the aggregated network. The aggregation process is described as follows. If a node $v_i$ or an edge $e_{ij}$ appears in the network snapshot of any historical time slice $t_z$ before the time slice $t_{n+1}$, the node or the edge will always exist in the network snapshots of the subsequent time slices, i.e., $v_i \in V^t$, $e_{ij} \in E^t$ for $t = t_z, t_{z+1}, \dots, t_n$. If a node $v_i$ or an edge $e_{ij}$ that originally appeared from the time slice $t_{z-m}$ disappears at the moment $t_z$, the disappeared node $v_i$ or edge $e_{ij}$ will not be present in the network snapshots of this time slice and the subsequent time slices, i.e., $v_i \in V^{t_1}$, $v_i \notin V^{t_2}$, $e_{ij} \in E^{t_1}$, $e_{ij} \notin E^{t_2}$, where $t_1 = t_{z-m}, \dots, t_{z-1}$ and $t_2 = t_z, t_{z+1}, \dots, t_n$. The weight of edge $e_{ij}$ in the aggregated network is the residual information $nASF_{e_{ij}}$, which portrays the lifecycle of the information; the residual information decreases from the moment edge $e_{ij}$ is generated to the current moment. This decrease is not linear: the residual information first decreases gently, then decreases dramatically, and finally tends towards a gentle plateau. The formula is shown in Equation (2):

$$nASF_{e_{ij}}(a_{ij}, t_{ij}) = t_{ij} \cdot \frac{\frac{1}{1 + \exp\{a_{ij}/p - a\}} + q}{q + 1} \quad (2)$$
where $a_{ij}$ denotes the time span from the time slice $t_z$ in which the edge was generated to $t_n$, i.e., $a_{ij} = n - z + 1$. In particular, for an edge $e_{ij}$ generated at time $t_s$ that reappears at time $t_z$ ($s < z$), the age is still $a_{ij} = n - z + 1$, measured from the most recent occurrence, and $t_{ij}$ is the number of times the edge $e_{ij}$ occurs. This parameter is set to distinguish between the two cases shown in Figure 1B. When author A and author B cooperate on five consecutive time slices $[t_1, t_5]$ and author A and author C first cooperate at time slice $t_5$, the value of the parameter $a_{ij}$ between author A and author C as well as between author A and author B is 1. Therefore, if only the parameter $a_{ij}$ is considered, the two cases shown in Figure 1B cannot be differentiated, while the parameter $t_{ij}$ can differentiate between these two cases.
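A minimal sketch of Equation (2); the hyperparameter values `p`, `a` and `q` below are illustrative assumptions, not values reported in the paper:

```python
import math

def nasf_edge(age, times, p=2.0, a=3.0, q=0.1):
    """Adjusted-sigmoid residual information of one edge, Equation (2):
    nASF_eij(a_ij, t_ij) = t_ij * ((1 / (1 + exp(a_ij / p - a)) + q) / (q + 1)).
    `age` is a_ij = n - z + 1 and `times` is t_ij; p, a and q are
    hyperparameters whose default values here are illustrative only."""
    return times * ((1.0 / (1.0 + math.exp(age / p - a)) + q) / (q + 1.0))
```

With these toy settings the weight decays gently for young edges, drops sharply around $a_{ij} \approx p \cdot a$, and then plateaus near $q/(q+1)$; repeated occurrences (larger $t_{ij}$) raise the weight, mirroring the two cases of Figure 1B.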
After obtaining the aggregated network G, the meta-path instances are extracted
according to the determined meta-paths. Taking the academic collaboration network in
Figure 1A as an example, the instances of meta-path APA and meta-path APCPA are
extracted. The meta-path APA indicates that the two authors coauthored a paper, such as
a1 p1 a2 in Figure 1A, and the meta-path APCPA indicates that the paper authored by the
two authors was published in the same conference, such as a1 p1 c2 p2 a3 in Figure 1A. The
meta-path APA and meta-path APCPA augmented residual information matrices $W^{APA}$ and $W^{APCPA}$ are calculated by Equations (1) and (2). Inputting $W^{APA}$ and $W^{APCPA}$ into the modified self-attention model, which can be used to calculate the correlation between any two nodes in the graph, yields the relationship-enhanced meta-path $M$ augmented residual information matrix $\widehat{W}^M$. $\widehat{W}^{APA}$ and $\widehat{W}^{APCPA}$ are then input into the modified GCN model. Guided by the modified self-attention model, the modified GCN can change the degree of aggregation between different nodes to obtain the meta-path augmented residual information matrix $H^M$ after aggregating the local information based on the meta-path.
Next, the information in $H^{APA}$ and $H^{APCPA}$ is aggregated through the attention mechanism. The adaptive weights $w_{M_1}$ and $w_{M_2}$ are set, their initial values are randomly generated, and the augmented residual information matrix $H$ is calculated according to Equation (3) to obtain each node embedding representation. This constitutes the encoder of LP-THN. The decoder inputs the augmented residual information matrix $H$ into the modified GAT model to obtain an augmented residual information matrix fusing semantic and temporal information; finally, a multilayer perceptron is used for link prediction. The GAT model can account for the variability of the influence of different neighbourhood nodes on the core node.

$$H = w_{M_1} H^{M_1} + w_{M_2} H^{M_2} + \cdots + w_{M_n} H^{M_n} \quad (3)$$
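Equation (3) can be sketched as a weighted sum of the per-meta-path matrices. The matrices and fixed weights below are toy values; in LP-THN the weights are learnable parameters with randomly generated initial values.

```python
import numpy as np

# Toy per-meta-path matrices H^{M_k} and fixed fusion weights w_{M_k};
# in the model these weights are adaptive (learned), not constants.
H_apa = np.array([[1.0, 0.2], [0.2, 1.0]])
H_apcpa = np.array([[0.5, 0.4], [0.4, 0.5]])
w_apa, w_apcpa = 0.7, 0.3

# Equation (3): element-wise weighted sum across meta-paths.
H = w_apa * H_apa + w_apcpa * H_apcpa
```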

Figure 3. Diagram of the proposed LP-THN model. This model contains network aggregation,
meta-path selection, meta-path augmented information matrix, encoder and decoder steps. In the
first step, the temporal heterogeneous network is aggregated into a single-layer aggregated network,
which sets the age and times as the weights of edges. In the second step, meta-paths are selected
according to the semantics within the temporal dataset. In the third step, the meta-path augmented
information matrix W M is calculated, preserving structure evolution and semantics. In the fourth
step, an encoder is used to obtain the low-dimensional embedding representation of nodes. Finally, a
decoder is applied to perform link prediction.

4.2. Encoder
4.2.1. Modified Self-Attention Mechanism
Self-attention mechanisms are widely used in the field of natural language processing.
We exploit their ability to capture the internal correlations of data and apply them to graphs
to convert the influence of other nodes in the graph on the core node into the degree of
attention the core node pays to other nodes [34]. The self-attention mechanism based on
meta-paths and temporal structural features is implemented in three main steps. In the
first step, the query, key and value of each node are calculated based on the meta-path M,
as shown in Equations (4)–(6); in the second step, the magnitude of the relevance of each
node to the current node is calculated and normalized, and the result is used as the weight
of each node to the core node, as shown in Equation (7); and in the third step, the value
of each node is weighted to the core node by summation to obtain the representation of
each node considering the global node to core node importance information, as shown in
Equation (7).

$$K^M = W_K^M \cdot W^M \quad (4)$$

$$Q^M = W_Q^M \cdot W^M \quad (5)$$

$$V^M = W_V^M \cdot W^M \quad (6)$$

where the meta-path $M$ augmented information matrix $W^M \in \mathbb{R}^{n \times n}$ and $n = |V_{R_i}|$, where $V_{R_i}$ denotes the set of nodes in the aggregated network $G$ with node type $R_i$. $W_K^M \in \mathbb{R}^{d \times n}$, $W_Q^M \in \mathbb{R}^{d \times n}$ and $W_V^M \in \mathbb{R}^{d \times n}$ are generated by random initialization, and $d$ is the node embedding representation dimension. To improve the computational speed, matrix operations are used. $K^M \in \mathbb{R}^{d \times n}$, $Q^M \in \mathbb{R}^{d \times n}$ and $V^M \in \mathbb{R}^{d \times n}$ denote the key, query and value matrices of graph $G$ based on meta-path $M$.
$$\widehat{W}^M = \mathrm{softmax}\left(\frac{Q^M \cdot (K^M)^T}{\sqrt{d}}\right) V^M \quad (7)$$

where $Q^M \cdot (K^M)^T = [\mathrm{score}_{ij}]$, $\mathrm{score}_{ij}$ denotes node $n_i$'s attention to node $n_j$, $d$ is the dimension of the node's embedding representation, and the division by $\sqrt{d}$ stabilizes the gradient. To calculate the degree of influence of each node on the core node, the $\mathrm{softmax}(\cdot)$ activation function is used for normalization, and the normalized result is used as the weight of each node on the core node's value.
The meta-path M augmented information matrix in this paper is based on the meta-
path set to reconstruct the network structure, which accounts for the rich semantic informa-
tion in the heterogeneous network. Simultaneously, the elements in the matrix account for
the rich temporal information contained in each edge. Therefore, the matrix is used as an
input to the self-attention mechanism that considers the amount of information. In contrast
to the original self-attention mechanism, the modified self-attention model not only enables
the learned results to consider the magnitude of the relevance of different nodes to that
node but also accounts for the temporal attributes of the dynamic heterogeneous graph.
Since the relationships between nodes are generated based on defined meta-paths, this
self-attention mechanism can enhance the main structural features in the heterogeneous
graph and weaken the unimportant structural features, ensuring that the learned node
embedding representations contain specific semantic information.
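A sketch of the modified self-attention computation of Equations (4)–(7), assuming the attention scores are arranged so that row $i$ holds node $i$'s attention to every other node; the random projections stand in for the trainable parameter matrices $W_K^M$, $W_Q^M$ and $W_V^M$:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def meta_path_self_attention(W_M, d):
    """Project the meta-path augmented residual information matrix W^M into
    key, query and value (Equations (4)-(6)) and apply scaled dot-product
    attention (Equation (7)). The random projections stand in for the
    trainable parameters; this is a sketch, not the trained model."""
    n = W_M.shape[0]
    W_K = rng.standard_normal((d, n))
    W_Q = rng.standard_normal((d, n))
    W_V = rng.standard_normal((d, n))
    K, Q, V = W_K @ W_M, W_Q @ W_M, W_V @ W_M          # each d x n
    scores = softmax((Q.T @ K) / np.sqrt(d), axis=1)   # n x n, rows sum to 1
    return scores @ V.T                                # one d-dim row per node

W_hat = meta_path_self_attention(np.eye(4), d=8)
```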

4.2.2. Modified GCN


In contrast to a generic GNN, a GCN is essentially a message-passing process that can
aggregate information about a node's neighbours based on their importance to the core
on meta-path and temporal attributes is implemented in two steps. In the first step, node
neighbourhood information is aggregated, and in the second step, the node feature is
updated as shown in Equation (8).
 
X ( l ) = σ D −1 A
d M X ( l −1) W ( l −1) (8)

where X (0) = WdM, W


d M denotes the relationship between the enhanced meta-path M
M = A M + I M , where A b ∈ Rn×n ,
  M  
augmented information matrix; A
d d M = b
aij , A = aij , A
A M ∈ Rn×n , n = V , and A
Ri
d M is based on the meta-path M aggregated graph G of the

adjacency matrix. If node ni and node n j based on meta-path M have links, then aij = 1;
otherwise, aij = 0. Because the node itself does not have a meta-path based on the formation
of the meta-path M, aii = 0, and the node’s own information cannot be aggregated when the
node feature is updated. To avoid this situation, the node’s neighbours can be considered
using unit matrices I ∈ Rn×n so that b aii = 1. Therefore, the node feature update can
consider its own features. D = [dii ], dii is the degree of node ni , and the feature vector of
neighbour nodes of node ni is the weighted average because low degrees of nodes on the
neighbours are assumed to have a greater impact; and σ () represents an arbitrary activation
function. In this paper, ReLU is selected as the activation function and l represents the
number of layers of modified GCN. If a layer of the modified GCN is set up, it will lead
to the node features of the neural network, causing each node feature update to consider
only the features of its neighbouring nodes. If too many of its layers are set, the sense of
the field will be too large for each node feature update. The lack of joints will affect the
node, resulting in a decline in the effect of the last layer of l N generated by X (l N ) = H M ,
where H M is the meta-path augmented information matrix after aggregation of the local
information based on the meta-path, and W (l −1) is used for the random initialization of the
generated weights matrix.
The structural information contained in the relationship-enhanced meta-path $M$ augmented information matrix $\widehat{W}^M$ accounts for the neighbourhood information generated based on the meta-path $M$, the relevance of each node to the core node, and the decreasing amount of information carried by edges over time. Because of this, it is used as an input to the graph convolutional neural network based on meta-paths and temporal attributes,

which guides the nodes to rely on the meta-path-based neighbourhood information when they aggregate neighbourhood information and accounts for the influence of each node on the core node. Meanwhile, the GCN based on meta-paths and temporal attributes accounts for the change in the information carried on each edge over time when updating the node features, which allows the node features to include temporal attributes.
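A sketch of one modified GCN layer (Equation (8)); the self-loop and degree normalisation follow the description above, while the inputs are toy values:

```python
import numpy as np

def gcn_layer(A_meta, X, W_param):
    """One layer of Equation (8): X^(l) = ReLU(D^-1 (A^M + I) X^(l-1) W^(l-1)).
    Adding the identity gives each node a self-loop (a_hat_ii = 1) so its own
    features survive the update; D^-1 averages over the neighbourhood.
    Here the degree includes the self-loop, an assumption that also avoids
    division by zero for isolated nodes."""
    n = A_meta.shape[0]
    A_hat = A_meta + np.eye(n)                 # self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix
    return np.maximum(0.0, D_inv @ A_hat @ X @ W_param)  # ReLU activation

# Two nodes linked by a meta-path-based edge; identity weights for clarity.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X0 = np.eye(2)
H = gcn_layer(A, X0, np.eye(2))
```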

4.2.3. Modified GAT


GATs can achieve better results in temporal network tasks than GCNs since the training
phase is performed only on subgraphs, while the testing phase must handle unknown
vertices [19]. GATs based on meta-paths and temporal attributes are implemented in two
main steps. The first step is learning to compute the attention coefficients to obtain the
correlation between vertices, as shown in Equation (9); the second step is performing the
node feature update as shown in Equation (11).
 
$$\tilde{s}_{ij} = a\left(\left[W h_i \,\|\, W h_j\right]\right), \quad j \in N(v_i) \quad (9)$$
where $h_i$ represents the $i$th row of the augmented information matrix $H$ after enhancing the local information, which is the feature of node $n_i$; $W \in \mathbb{R}^{d \times n}$ represents the shared parameter that enhances the node features; $N(v_i)$ represents the set of neighbour nodes of node $v_i$; $[\cdot \| \cdot]$ concatenates the enhanced node features; and $a(\cdot)$ maps the concatenated high-dimensional features to a real number. $\mathrm{softmax}(\cdot)$ is used to normalize the similarity coefficients of the neighbouring nodes of the central node to obtain the attention coefficient of each neighbouring node during aggregation. $\sigma(\cdot)$ is the activation function; in this paper, the LeakyReLU activation function is used, as shown in Equation (10).
  
$$\alpha_{ij} = \frac{\exp\left(\sigma\left(\tilde{s}_{ij}\right)\right)}{\sum_{k \in N(v_i)} \exp\left(\sigma\left(\tilde{s}_{ik}\right)\right)} \quad (10)$$


The weighted summation of the features of each neighbouring node is performed using the attention coefficients $\alpha_{ij}$, as shown in Equation (11):

$$\hat{h}_i = \sigma\left(\sum_{j \in N(v_i)} \alpha_{ij} h_j\right) \quad (11)$$

Since the neighbourhood information in the locally enhanced augmented information
matrix H is generated from multiple specific meta-paths, using it as the input to the graph
attention network based on meta-paths and temporal attributes allows the updated node
features to contain specific semantic information. In addition, the augmented local
information in the augmented information matrix H decays over time, thus reflecting the
dynamically changing characteristics of different meta-paths.
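To make Equations (9)–(11) concrete, the following minimal NumPy sketch computes the attention coefficients and the feature update for one layer. The dense loop over a neighbour dictionary, and the use of the transformed features Wh_j in the aggregation (the standard GAT convention, whereas Equation (11) writes h_j), are our illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def gat_layer(H, W, a, neighbours):
    """One meta-path-based attention layer, following Eqs. (9)-(11).

    H          : (n, f) augmented node feature matrix (rows h_i)
    W          : (d, f) shared linear transform
    a          : (2d,)  attention vector mapping [Wh_i || Wh_j] to a scalar
    neighbours : dict {i: list of neighbour indices j in N(v_i)}
    """
    Wh = H @ W.T                                  # (n, d) transformed features
    out = np.zeros_like(Wh)
    for i, nbrs in neighbours.items():
        # Eq. (9): unnormalized score e_ij = a([Wh_i || Wh_j])
        e = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs])
        # Eq. (10): softmax over sigma(e_ij) gives the attention coefficients
        s = np.exp(leaky_relu(e))
        alpha = s / s.sum()
        # Eq. (11): weighted aggregation of neighbour features, then sigma
        out[i] = leaky_relu(sum(w * Wh[j] for w, j in zip(alpha, nbrs)))
    return out
```

A real implementation would vectorize the neighbour loop and use trainable W and a; this sketch only fixes the computation order of the three equations.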

4.3. Decoder
In this paper, the link prediction problem is treated as a binary classification problem.
The information in the first n snapshots is used to predict whether there is an edge between
two nodes in the (n + 1)-th snapshot. Each edge is given one of two possible labels: if there is
a link between two nodes (n_i, n_j), it is considered a positive sample and its label is
1; otherwise, the label is 0. The edges existing in the (n + 1)-th snapshot are taken as positive
samples, and random sampling is used to obtain the same number of negative samples,
i.e., edges that do not exist in the (n + 1)-th snapshot. A multilayer perceptron MLP(·) is
used to predict the existence of an edge between two nodes in the future network, as
shown in Equation (12).

ŷ = MLP(Ĥ)    (12)

The information loss during model iteration is calculated using the binary cross-
entropy loss function, as shown in Equation (13).

loss = −(1/n) ∑ [y ln ŷ + (1 − y) ln(1 − ŷ)]    (13)

where y is the true value and ŷ is the predicted value.
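As a minimal illustration of the decoder, the sketch below scores a candidate edge with a small MLP over the concatenated end-node embeddings (the pairing scheme is our assumption; the paper only specifies MLP(Ĥ)) and evaluates the binary cross-entropy of Equation (13).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_score(h_i, h_j, W1, b1, w2, b2):
    """Two-layer MLP scoring a candidate edge (n_i, n_j).
    Pairing by concatenation is an illustrative choice, not fixed by the paper."""
    z = np.maximum(0.0, W1 @ np.concatenate([h_i, h_j]) + b1)  # hidden ReLU layer
    return sigmoid(w2 @ z + b2)                                # P(edge exists)

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy of Eq. (13), averaged over the samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

In training, positive edges from the (n + 1)-th snapshot and an equal number of randomly sampled non-edges would be fed through mlp_score, and bce_loss would drive the gradient updates.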

5. Experiments
In this section, we present comprehensive experiments to illustrate the superiority
of the LP-THN model over other models in terms of prediction results. First, LP-THN is
compared with other low-dimensional embedding representation methods. Second, the optimal
value of each parameter of LP-THN is determined through parameter sensitivity analysis.

5.1. Datasets and Settings


5.1.1. Datasets
We evaluated the performance of LP-THN on three temporal network datasets: one
for academic collaboration networks, one for music recommendation networks, and one
for film recommendation networks. The specific statistics of these three datasets are shown
in Table 1.
• AMiner [35] is a scientific research network that contains three types of nodes. We used
articles from this network published in five research areas over 9 time slices from 1990 to
1997. We considered two meta-paths: APA, which denotes author collaboration, and
APCPA, which denotes author participation in the same conference.
• Last.FM [36] is an online music platform. The network contains three types of nodes, and
the dataset we used contains partial information generated on the platform during
the years 1956, 1947, 1979 and 2005–2010, divided into 5 time slices. We considered
two meta-paths: UAU, which indicates that two users listened to the same artist, and
UATAU, which indicates that two users listened to artists with the same musical style.
• MovieLens [37] is a noncommercial film recommendation platform. The network
contains three types of nodes, and the dataset we used contains a portion of the
information generated on the site between 1996 and 2018, divided into 8 time
slices. We considered two meta-paths: UMU, which indicates that two users have
rated the same movie, and UMGMU, which indicates that two users have rated
films of the same genre.

Table 1. Statistics of Datasets.

Datasets    Node Types        #Node    Meta-Path      Time Steps
AMiner      Author (A)         7043    APA, APCPA     9
            Paper (P)          5371
            Conference (C)       17
Last.FM     User (U)           1234    UAU, UATAU     5
            Artist (A)         9438
            Tag (T)            5472
MovieLens   User (U)            819    UMU, UMGMU     8
            Movie (M)        12,677
            Genre (G)            39

5.1.2. Baselines and Evaluation Metrics


To validate the performance of LP-THN, it is compared with existing low-dimensional
network embedding representation methods. These methods include five node embedding
representations based on shallow neural networks (i.e., DeepWalk, Node2Vec, LINE,
Struc2Vec and metapath2vec), one based on matrix factorization (i.e., M-NMF) and one
based on a deep neural network (i.e., Deeplink). Except for metapath2vec, which is applied
to static heterogeneous networks, the other six methods are applied to static homogeneous
networks. XGBoost is used as the classifier for link prediction with the baselines.
• DeepWalk [12]: node sequences are obtained by a random walk, and the node se-
quences are then input into Word2vec to obtain low-dimensional embedded represen-
tations of the nodes.
• Node2Vec [13]: node sequences are obtained by biased random walk, and the node
sequences are used as input to Word2vec to obtain low-dimensional embedded repre-
sentations of the nodes.
• LINE [15]: node embeddings are learned by maximizing the similarity between a node
and its first-order and second-order neighbours.
• M-NMF [14]: based on nonnegative matrix factorization, it captures the community
structure in the graph as well as the similarity between nodes.
• Deeplink [38]: a deep learning method for node embedding representation that uses a
deep convolutional neural network to learn node embedding representation.
• Struc2Vec [39]: the context sequence of a node is constructed by traversing the depth-
first search path of each node, and the sequence is then fed into Word2vec to obtain
the node’s low-dimensional embedding representation.
• Metapath2vec [21]: node sequences are obtained by a random meta-path-based walk,
and the sequences are fed into Word2vec to obtain a representation of node embed-
dings in heterogeneous networks.
In this paper, three common evaluation metrics, AUC, F1 and ACC, are chosen to
evaluate the performance of these methods. The AUC can be interpreted as the probability
that the similarity value of a randomly selected existing edge is larger than that of a
randomly selected non-existing edge. F1 is the weighted harmonic mean of the model's
precision and recall. The ACC is the ratio of the number of correctly predicted samples to
the total number of predicted samples. The larger the values of the AUC, F1 and ACC are,
the better the model performs.
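These three metrics can be computed directly; the sketch below is a self-contained NumPy version (in practice, library implementations such as scikit-learn's would normally be used).

```python
import numpy as np

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half), matching the interpretation in the text."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def f1_acc(y_true, y_pred):
    """F1 (harmonic mean of precision and recall) and accuracy."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    acc = np.mean(y_pred == y_true)
    return f1, acc
```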

5.1.3. Experimental Settings


To ensure the comparison results are fair, the node dimension of the output of all
methods is set to 32, and aij = 5, p = 2, and q = 1. The initial values of the adaptive
weights w M1 and w M2 are set to 0.45 and 0.55, respectively. For the comparison methods
that require random walks, the number of walks per node is set to 80, the walk length is
set to 10, and the window size is set to 5. The types of nodes and edges in the network are
ignored when performing the homogeneous network embedding method.

5.2. Analysis of the Experimental Results


5.2.1. Comparison Experiments
In this paper, the information on the time slice [t1 , tn−1 ] of the dataset is used to predict
the links in the time slice tn of the dataset. The AMiner dataset predicts the links in the
time slice t9 using the information on the time slice [t1 , t8 ], the Last.FM dataset predicts
the links in time slice t5 using the information on time slice [t1 , t4 ], and the MovieLens
dataset predicts the links in time slice t8 using the information on time slice [t1 , t7 ]. The
node pairs with links present on time slice tn are taken as positive samples, and the same
number of node pairs without links on time slice tn are randomly selected as negative
samples. LP-THN-1st and LP-THN-2nd denote variants of LP-THN that consider only
the first-order similarity meta-paths and only the second-order similarity meta-paths,
respectively. The results in Table 2 demonstrate several findings: (a) Our proposed
LP-THN model achieves better prediction results on all three datasets than the other models.
This is because our model considers the dynamics of the network structure over time and
retains the semantic information contained in the network. (b) The Metapath2vec method
outperforms the other embedding representations, which are based on homogeneous networks,
because Metapath2vec accounts for the semantic information contained in the network. However, it
does not perform as well as the LP-THN model because it does not account for the temporal
information contained in the network. (c) The results of LP-THN-1st are better than the
results of LP-THN-2nd: the node relationships extracted from first-order similarity
meta-paths are closer than those extracted from second-order similarity meta-paths
and are therefore more informative for link prediction.

5.2.2. Parameter Sensitivity Analysis


First, the effect of different values of the parameters p and q in Equation (2)
is investigated. The literature [26] demonstrates that a larger parameter q in n^{ASF}_M indicates
a greater amount of information remaining on the meta-path M, and a larger parameter p
indicates a longer active period of the information on the meta-path M. Figure 4 shows the effect
of different values of p on the performance of LP-THN on the AMiner, Last.FM and
MovieLens datasets. The optimal values of p on the three datasets are 8, 1 and 4,
respectively. Interestingly, we also find that as p changes, the performance
of LP-THN on the MovieLens dataset changes significantly, while its performance on
Last.FM remains almost unchanged. The performance of LP-THN on the AMiner dataset
fluctuates greatly when p = 6 and varies less for other values of p. Precision–recall
curves are used to show the effect of the parameters on the model performance; the closer the
precision–recall curve is to the upper right, the better the model performs.
Figure 5 shows the effect of varying the parameter q on the LP-THN model on the
AMiner, Last.FM and MovieLens datasets. When the range of q is restricted
to [1, 10], its optimal values on the three datasets are 5, 5 and 3, respectively. As with
parameter p, the effect of varying q is more obvious on the MovieLens dataset than on the
other datasets. Parameter q has little effect on performance on the Last.FM dataset, apart
from a larger fluctuation at q = 1. We found that this phenomenon is related to the time
span of the temporal dataset: the effects of p and q on the model depend on the amount of
information remaining on the meta-paths, and the Last.FM dataset aggregates information
from only 4 time slices, limiting the effect of these two parameters.
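The exact residual-information function n^{ASF}_M comes from [26] and is not restated here. As a purely hypothetical stand-in with the qualitative behaviour just described (a larger p lengthens the active period, a larger q leaves more residual information), one could write:

```python
import numpy as np

def residual_information(t, p, q):
    """Hypothetical adjusted-sigmoid decay, NOT the exact form of [26]:
    stays near 1 while t < p (active period), then decays; a larger q
    slows the decay, leaving more residual information at any given t."""
    return q / (q + np.exp(t - p))
```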
Figure 6 shows the effect of different aggregation methods considering meta-paths
with first-order similarity and second-order similarity on the performance of LP-THN in
the three temporal datasets. The different aggregation methods impact LP-THN less in
the AMiner and Last.FM datasets but more in the MovieLens dataset. Different types of
meta-paths contain different semantics, and the best results are obtained by aggregating
different meta-paths with adaptive weights.
Figure 7 shows the effect that varying the output dimension of the encoder has on
the performance of LP-THN on three temporal datasets. We set the dimensions to 10, 32,
64, 128, 256 and 512. In the AMiner and MovieLens datasets, the performance of LP-THN
largely fluctuates at a dimension of 128, and when the dimension is greater than 256, the
performance continuously decreases. However, changing dimension has little effect on the
performance of LP-THN on the Last.FM dataset. Our model can capture rich structural,
semantic and temporal information using low-dimensional representations.

Table 2. Performance Evaluation of LP-THN.

Datasets   Metric  DeepWalk  Node2Vec  LINE    M-NMF   Deeplink  Struc2Vec  Metapath2vec  LP-THN-1st  LP-THN-2nd  LP-THN
AMiner     AUC     0.9380    0.9625    0.9482  0.9415  0.9398    0.9332     0.9623        0.9912      0.9903      0.9934
           F1      0.9403    0.9632    0.9493  0.9418  0.9420    0.9358     0.9631        0.9627      0.9355      0.9686
           ACC     0.9380    0.9625    0.9482  0.9415  0.9398    0.9332     0.9624        0.9728      0.9386      0.9690
Last.FM    AUC     0.8539    0.9675    0.9690  0.9223  0.9559    0.9594     0.9667        0.9844      0.9778      0.9848
           F1      0.8819    0.9707    0.9717  0.9318  0.9608    0.9638     0.9701        0.9768      0.9721      0.9768
           ACC     0.8604    0.9687    0.9699  0.9249  0.9576    0.9610     0.9680        0.9752      0.9726      0.9761
MovieLens  AUC     0.8832    0.8576    0.9744  0.8967  0.9068    0.9345     0.9872        0.9913      0.9935      0.9945
           F1      0.8861    0.8642    0.9737  0.9000  0.9014    0.9333     0.9867        0.9927      0.9921      0.9981
           ACC     0.8816    0.8553    0.9737  0.8947  0.9079    0.9342     0.9868        0.9933      0.9872      0.9860

Bold in the original table denotes the best AUC/F1/ACC score within a network.

Figure 4. Effect of the parameter p on performance in different temporal heterogeneous network datasets. Precision–recall curves are used to show the effect of the variation in p on the performance of LP-THN.

Figure 5. Effect of the parameter q on performance in different temporal heterogeneous networks. Precision–recall curves are used to show the effect of varying q on the performance of LP-THN.

Figure 6. Effect of meta-path aggregation method on LP-THN in different temporal datasets.

Figure 7. Effect of dimension d on LP-THN in different temporal datasets.



5.2.3. Ablation Experiments

Traditional temporal heterogeneous network link prediction methods ignore the fact
that a newly added edge is active for a period of time and then settles down. In this
paper, the information lifecycle is introduced to describe this behaviour. To verify the
influence of the information lifecycle on the prediction results, ablation experiments
were conducted, as shown in Figure 8. LP-THN with the meta-path augmented residual
information matrix performs better than LP-THN with the meta-path augmented matrix.
The meta-path augmented residual information matrix captures both semantic and temporal
information through the information lifecycle, while the meta-path augmented matrix
captures only semantic information. The meta-path augmented matrix is a binary matrix:
if there is a meta-path instance between two nodes, the corresponding entry is 1;
otherwise, it is 0.
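The binary meta-path augmented matrix used in this ablation can be built by multiplying the bipartite adjacency matrices along the meta-path and binarizing the result. A minimal sketch (the function name and the APA example are illustrative):

```python
import numpy as np

def metapath_augmented_matrix(*adjs):
    """Binary meta-path matrix: entry (i, j) is 1 iff at least one
    meta-path instance connects node i to node j.

    adjs are the bipartite adjacency matrices along the path, e.g. for
    APA pass (A_ap, A_ap.T); for APCPA pass (A_ap, A_pc, A_pc.T, A_ap.T).
    """
    m = adjs[0]
    for a in adjs[1:]:
        m = m @ a               # counts of meta-path instances
    m = (m > 0).astype(int)     # keep existence only (0/1)
    np.fill_diagonal(m, 0)      # ignore trivial self-paths
    return m
```

For example, with an author-paper adjacency A_ap, metapath_augmented_matrix(A_ap, A_ap.T) marks two authors as connected exactly when they co-authored at least one paper.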

Figure 8. Effect of information lifecycle on LP-THN in different temporal datasets.

5.2.4. Complexity Analysis

Since DeepWalk, Node2Vec, LINE, M-NMF, Deeplink and Struc2Vec all compute node
embedding representations on homogeneous networks, their time complexity is much
higher than that of LP-THN. LP-THN takes the meta-path augmented residual information
matrix as the model input, so it needs to update only the embedding representations of the
nodes at the two ends of the meta-paths instead of all nodes in the network. In this
subsection, the time complexity of LP-THN is analysed. The complexity of the model is
dominated by the encoder, which, as analysed above, consists of three parts: the modified
self-attention module, the modified GCN module and the modified GAT module. The time
complexity of the modified self-attention module is O(|V_M|² d), where |V_M| denotes the
number of nodes belonging to the node types at the two ends of the meta-path, and d
denotes the dimension of the node embedding representation. The time complexity of the
modified GCN module is O(|E_M|), where |E_M| denotes the number of instances of the
selected meta-paths. The time complexity of the modified GAT module is O(|V_M| d²) +
O(|E_M| d). The overall time complexity of LP-THN is therefore O(|V_M|² d) + O(|E_M|) +
O(|V_M| d²) + O(|E_M| d).
 

6. Conclusions
In this paper, we study the link prediction problem on temporal heterogeneous net-
works and propose an encoder–decoder framework, LP-THN. The semantic information
contained in meta-paths in temporal networks is captured by a proposed meta-path aug-
mented residual information matrix that portrays the dynamic process of network structure
changes over time. Considering the lifecycle of the information carried by meta-paths
enables LP-THN to account for both the rich semantic information in heterogeneous
networks and the evolution of the network structure over time. Experiments demonstrate
that the LP-THN model proposed in this paper outperforms existing baseline frameworks.
In future work, we plan to extend our model to large-scale networks, aggregate node
attributes into the embedding process, and process data in temporal networks in real time.

Author Contributions: Conceptualization, J.C.; methodology, J.C.; software, J.C.; validation, J.C.;
formal analysis, J.C.; investigation, J.C.; resources, J.C.; data curation, J.C.; writing—original draft
preparation, J.C.; writing—review and editing, J.C.; visualization, J.C.; supervision, J.L.; project
administration, J.J.; funding acquisition, J.L. and J.J. All authors have read and agreed to the published
version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (NNSFC)
under Grant 72001209, 72231011 and 72071206, and the Science Foundation for Outstanding Youth
Scholars of Hunan Province under Grant 2022JJ20047.
Institutional Review Board Statement: This article does not involve ethical research and does not
require ethical approval.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The common dataset used in this study can be obtained from links https:
//www.aminer.cn/, http://www.lastfm.com and https://movielens.org/ (accessed on 14 June 2023).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Stanfield, Z.; Coşkun, M.; Koyutürk, M. Drug response prediction as a link prediction problem. Sci. Rep. 2017, 7, 40321. [CrossRef]
[PubMed]
2. Nasiri, E.; Berahmand, K.; Rostami, M.; Dabiri, M. A novel link prediction algorithm for protein-protein interaction networks by
attributed graph embedding. Comput. Biol. Med. 2021, 137, 104772. [CrossRef] [PubMed]
3. Liben-Nowell, D.; Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International
Conference on Information and Knowledge Management, New Orleans, LA, USA, 9–11 November 2003; pp. 556–559.
4. Cho, H.; Yu, Y. Link prediction for interdisciplinary collaboration via co-authorship network. Soc. Netw. Anal. Min. 2018, 8, 1–12.
[CrossRef]
5. Liu, G. An ecommerce recommendation algorithm based on link prediction. Alex. Eng. J. 2022, 61, 905–910. [CrossRef]
6. Adamic, L.A.; Adar, E. Friends and neighbors on the web. Soc. Net. 2003, 25, 211–230. [CrossRef]
7. Zhou, T.; Lü, L.; Zhang, Y.C. Predicting missing links via local information. Eur. Phys. J. B 2009, 71, 623–630. [CrossRef]
8. Katz, L. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–49. [CrossRef]
9. Leicht, E.A.; Holme, P.; Newman, M.E. Vertex similarity in networks. Phys. Rev. E 2006, 73, 026120. [CrossRef]
10. Fouss, F.; Pirotte, A.; Renders, J.M.; Saerens, M. Random-walk computation of similarities between nodes of a graph with
application to collaborative recommendation. IEEE Trans. Knowl. Data Eng. 2007, 19, 355–369. [CrossRef]
11. Jeh, G.; Widom, J. Simrank: A measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 538–543.
12. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
13. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
14. Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; Yang, S. Community preserving network embedding. In Proceedings of the AAAI
Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
15. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. Line: Large-scale information network embedding. In Proceedings of the
24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077.
16. Scarselli, F.; Gori, M.; Tsoi, A.C. The graph neural network model. IEEE Trans. Neural Net. 2008, 20, 61–80. [CrossRef]
17. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. NIPS 2017, 30, 1024–1034.
18. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2017, arXiv:1609.02907.
19. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
20. Shi, C.; Wang, R.; Wang, X. Survey on Heterogeneous Information Networks Analysis and Applications. J. Softw. 2022, 33, 598–621.
21. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of
the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August
2017; pp. 135–144.
22. Fu, T.Y.; Lee, W.C.; Lei, Z. Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning. In
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017;
pp. 1797–1806.
23. Chang, S.; Han, W.; Tang, J.; Qi, G.J.; Aggarwal, C.C.; Huang, T.S. Heterogeneous network embedding via deep architectures. In
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia,
10–13 August 2015; pp. 119–128.

24. Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous graph neural network. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 793–803.
25. Shi, C.; Hu, B.; Zhao, W.X.; Philip, S.Y. Heterogeneous information network embedding for recommendation. IEEE Trans. Knowl.
Data Eng. 2018, 31, 357–370. [CrossRef]
26. Zhang, R.; Wang, Q.; Yang, Q.; Wei, W. Temporal link prediction via adjusted sigmoid function and 2-simplex structure. Sci. Rep.
2022, 12, 16585. [CrossRef]
27. Zou, L.; Zhan, X.X.; Sun, J.; Hanjalic, A.; Wang, H. Temporal network prediction and interpretation. IEEE Trans. Netw. Sci. Eng.
2021, 9, 1215–1224. [CrossRef]
28. Zhou, L.; Yang, Y.; Ren, X.; Wu, F.; Zhuang, Y. Dynamic network embedding by modeling triadic closure process. In Proceedings
of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
29. Nguyen, G.H.; Lee, J.B.; Rossi, R.A.; Ahmed, N.K.; Koh, E.; Kim, S. Continuous-time dynamic network embeddings. In
Proceedings of the Companion Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 969–976.
30. Rahman, M.; Saha, T.K.; Hasan, M.A.; Xu, K.S.; Reddy, C.K. Dylink2vec: Effective feature representation for link prediction in
dynamic networks. arXiv 2018, arXiv:1804.05755.
31. Abbas, K.; Abbasi, A.; Dong, S.; Niu, L.; Chen, L.; Chen, B. A Novel Temporal Network-Embedding Algorithm for Link Prediction
in Dynamic Networks. Entropy 2023, 25, 257. [CrossRef]
32. Yin, Y.; Ji, L.X.; Zhang, J.P.; Pei, Y.L. DHNE: Network representation learning method for dynamic heterogeneous networks. IEEE
Access 2019, 7, 134782–134792. [CrossRef]
33. Wang, X.; Lu, Y.; Shi, C.; Wang, R.; Cui, P.; Mou, S. Dynamic heterogeneous information network embedding with meta-path
based proximity. IEEE Trans. Knowl. Data Eng. 2020, 34, 1117–1132. [CrossRef]
34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need.
NIPS 2017, 30, 5998–6008.
35. Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; Su, Z. Arnetminer: Extraction and mining of academic social networks. In Proceedings
of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27
August 2008; pp. 990–998.
36. Cantador, I.; Brusilovsky, P.; Kuflik, T. Second workshop on information heterogeneity and fusion in recommender systems
(HetRec2011). In Proceedings of the Fifth ACM Conference on Recommender Systems, Chicago, IL, USA, 23–27 October 2011;
pp. 387–388.
37. Harper, F.M.; Konstan, J.A. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19. [CrossRef]
38. Zhou, F.; Liu, L.; Zhang, K.; Trajcevski, G.; Wu, J.; Zhong, T. Deeplink: A deep learning approach for user identity linkage. In
Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 15–19 April 2018;
pp. 1313–1321.
39. Ribeiro, L.F.; Saverese, P.H.; Figueiredo, D.R. struc2vec: Learning node representations from structural identity. In Proceedings of
the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August
2017; pp. 385–394.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
