Unveiling Hidden Implicit Similarities For Cross-Domain Recommendation
Abstract—E-commerce businesses are increasingly dependent on recommendation systems to introduce personalized services and products to targeted customers. Providing effective recommendations requires sufficient knowledge about user preferences and product (item) characteristics. Given the current abundance of available data across domains, achieving a thorough understanding of the relationship between users and items can bring in more collaborative filtering power and lead to higher recommendation accuracy. However, how to effectively utilize different types of knowledge obtained across domains is still a challenging problem. In this paper, we propose to discover both explicit and implicit similarities from latent factors across domains based on matrix tri-factorization. In our research, the common factors in the shared dimension (users or items) of two coupled matrices are discovered, while at the same time the domain-specific factors of the shared dimension are also preserved. We show that such preservation of both common and domain-specific factors is significantly beneficial to cross-domain recommendation. Moreover, on the non-shared dimension, we propose to use the middle matrix of the tri-factorization to match the unique factors, and to align the matched unique factors in order to transfer cross-domain implicit similarities and thus further improve the recommendation. This research is the first to propose the transfer of knowledge across the non-shared (non-coupled) dimensions. Validated on real-world datasets, our approach outperforms existing algorithms by more than two times in terms of recommendation accuracy. These empirical results illustrate the potential of utilizing both explicit and implicit similarities for making cross-domain recommendations.
Fig. 1: (Best viewed in color) Implicit correlations between income and crime rate. (a) Percentage of high-income families per LGA. (b) Number of "break and enter dwelling" incidents per LGA. Local government areas (LGAs) in the state of New South Wales (NSW) with more high-income families have fewer "break and enter dwelling" incidents. Data in (a) are from the Australian Bureau of Statistics and in (b) from the NSW Bureau of Crime Statistics and Research.
coupled learning between datasets [19], [20], [21]. Besides these explicit similarities, we hypothesize that cross-domain datasets also have other implicit correlations in their remaining dimensions. Our intuition comes from consumer segmentation, a popular concept in marketing. Consumers can be grouped by their behaviors: for example, "tech leaders" are a group of consumers with a strong desire to own new smartphones with the latest technologies, while another user group replaces a phone only when the current one is out of order. Although users on Amazon and Walmart are different, users sharing similar interests may share similar behaviors on the same set of products. We believe this type of correlation also exists implicitly in many other real-world scenarios: for example, local suburbs with more high-income families may have lower crime rates, as shown in Fig. 1, and users with an interest in action books may also like related movie genres (e.g., action films). Thus, properly using both explicit and implicit similarities has great potential in many applications.

Different approaches have been proposed to perform a joint analysis of multiple datasets [22]. However, all existing algorithms use explicit similarities as a bridge to collaborate among datasets. The popular Collective Matrix Factorization (CMF) [19] jointly analyzes datasets by assuming them to have an identical low-rank factor in their coupled dimension; in this case, the shared identical factor captures the explicit similarities across domains. Li et al. [23] suggest that correlated datasets share explicit hidden rating patterns, and the similarities between rating patterns are then transferred from one dataset to another. Gao et al. [24] extend Li's idea to include unique patterns in each dataset. Pan et al. [4] regularize the factors of the target user-by-item matrix with so-called principal coordinates obtained from user-profile and item-information matrices. Although these explicit similarities have shown their effectiveness in improving recommendation, there are still rich implicit features that were not used but have great potential to further improve the recommendation.

Motivated by this gap in the literature, we propose the Hidden Implicit Similarities Factorization model (HISF) as the first algorithm to utilize both explicit and implicit similarities across datasets. Our idea for explicit similarities, which differs from CMF's assumption of an identical coupled factor for both datasets, is that cross-domain datasets also have knowledge unique to their domains. Thus, HISF assumes the coupled factors are not equal: they contain common parts, which are shared between datasets across domains, and domain-specific parts, which are unique to each domain.

Our other contribution concerns implicit similarities. The fact that the non-coupled dimensions in the aforementioned example contain non-overlapping users prevents direct knowledge sharing in their non-coupled factors. However, their latent behaviors are correlated and should be shared. These latent behaviors can be captured in low-rank factors by matrix tri-factorization. As factorization is equivalent to spectral clustering [25], different users with similar preferences are grouped in the non-coupled user factors. Building on this concept, we hypothesize that latent clusters in these non-coupled factors may have a close relationship. Therefore, we align correlated clusters in the non-coupled factors to be as close as possible. This idea matches the fundamental concept of CF in the sense that similar user groups who rate similarly will continue to do so.

In short, our key contributions are:

1) Sharing common latent variables while preserving unique ones in the coupled factors (Sect. 4.1): We propose coupled factors that include both common and domain-specific parts (e.g., the common V^(0) and the domain-specific V^(1), V^(2) in Fig. 2). This model better captures the true explicit characteristics of cross-domain datasets.

2) Aligning implicit similarities in non-coupled factors (Sect. 4.2): We introduce a method to leverage implicit similarities. HISF is the first factorization algorithm that utilizes both explicit and implicit similarities across domains to improve recommendation accuracy.

3) Introducing an algorithm that optimizes all factors (Sect. 4.3): All factors are optimized by an algorithm following the alternating least squares approach (Algorithm 1). We test it on real-world datasets, and the empirical results suggest that our proposed HISF is the best choice for the joint analysis of datasets across domains (Sect. 6).

4) Utilizing knowledge from a potentially unlimited number of sources (Sect. 5): We propose a generalized model utilizing both types of similarities from multiple datasets. Our experiments demonstrate its advantage in leveraging correlations from more than two datasets, suggesting the
Fig. 2: (Best viewed in color) Our Hidden Implicit Similarities Factorization model for two rating matrices X^(1) and X^(2), which are coupled in their item dimension. Our proposed joint analysis decomposes X^(1) into U^(1), S^(1), the common V^(0) and its specific V^(1); at the same time, X^(2) is factorized into U^(2), S^(2), the common V^(0) and its specific V^(2). The common parts of the coupled item factor are explicitly shared, while clusters of users with similar interests in the non-coupled factors U^(1) and U^(2) are implicitly shared: our model matches such clusters by aligning correlated clusters (captured in their columns) as closely as possible.
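To make the structure in Fig. 2 concrete, the following minimal numpy sketch (an illustration only; the shapes and the helper name `reconstruct` are our own choices, not HISF's implementation) builds each rating matrix from a non-coupled user factor, a weighting factor, and a coupled item factor whose columns stack the shared block V^(0) with a domain-specific block:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m = 80, 60, 100   # users in domains 1 and 2; m shared items
r, c = 5, 3               # decomposition rank; c common columns

# Coupled (item) factor: common part plus a part specific to each domain
V0 = rng.normal(size=(m, c))        # common parts (explicitly shared)
V1 = rng.normal(size=(m, r - c))    # specific to domain 1
V2 = rng.normal(size=(m, r - c))    # specific to domain 2

U1, S1 = rng.normal(size=(n1, r)), rng.normal(size=(r, r))
U2, S2 = rng.normal(size=(n2, r)), rng.normal(size=(r, r))

def reconstruct(U, S, V_common, V_specific):
    """Tri-factorization X ~ U x S x [V_common | V_specific]^T."""
    V = np.hstack([V_common, V_specific])   # m x r coupled factor
    return U @ S @ V.T

X1_hat = reconstruct(U1, S1, V0, V1)   # n1 x m, approximates X(1)
X2_hat = reconstruct(U2, S2, V0, V2)   # n2 x m, approximates X(2)
```

The implicit part of the model, matching and aligning the columns of U^(1) and U^(2), is introduced in Sect. 4.2.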
Symbol      Description
X^(i)       Rating matrix from the i-th dataset
U^(i)       The first-dimension factor of X^(i)
V^(0)       Common parts of the coupled factors
V^(i)       Domain-specific parts of the coupled factor of X^(i)
S^(i)       Weighting factor of X^(i)
A^T         Transpose of A
A^+         Moore-Penrose pseudoinverse of A
I           The identity matrix
‖A‖         Frobenius norm
n, m, p     Dimension lengths
c           Number of common clusters in the coupled factors
r           Rank of the decomposition
Ω_X         Number of observations in X
∂/∂x        Partial derivative with respect to x
L           Loss function
λ           Regularization parameter
×           Multiplication

X ≈ U × S × V^T

where r is the rank of the factorization, U^T × U = I and V^T × V = I. In this case, U is the user factor, V is the item factor, and S is the weight between U and V. These factors can be found by solving the following optimization:

min L = ‖X − U × S × V^T‖²    (1)

2.4 Joint analysis of different datasets

Joint analysis of two matrices coupled in one dimension can be done by minimizing a coupled loss function [19]:

L = ‖X^(1) − U^(1) × S^(1) × V^T‖² + ‖X^(2) − U^(2) × S^(2) × V^T‖²

where V is the common coupled factor for both datasets.
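For reference, this coupled loss can be transcribed directly. The sketch below is our own illustration and assumes fully observed matrices; in practice the loss is evaluated over observed entries only:

```python
import numpy as np

def coupled_loss(X1, X2, U1, S1, U2, S2, V):
    """CMF-style coupled loss:
    ||X1 - U1 S1 V^T||^2 + ||X2 - U2 S2 V^T||^2,
    where the coupled factor V is shared by both datasets."""
    e1 = X1 - U1 @ S1 @ V.T
    e2 = X2 - U2 @ S2 @ V.T
    return np.sum(e1 ** 2) + np.sum(e2 ** 2)
```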
Fig. 3: (Best viewed in color) Matrix factorization as a clustering method. Suppose X^(1) contains ratings of users for items and there are two user groups represented by their rating similarities (brown circles and green triangles). When X^(1) is factorized with matrix tri-factorization, these two user groups are captured in columns of the user factor. Two possible cases can happen: a) users with brown circles are in the first column and those with green triangles are in the second column of U^(1); or b) users with brown circles are in the second column and those with green triangles are in the first column of U′^(1).

Fig. 4: (Best viewed in color) Matrix tri-factorization of X^(2) into U^(2), S^(2), V^(2) (or U′^(2), S′^(2), V′^(2)), analogous to Fig. 3.

Fig. 5: (Best viewed in color) User clusters' alignment between X^(1) in Fig. 3 and X^(2) in Fig. 4. There are four possible cases: in a) and d), the user cluster in the first column of U^(1) matches the user cluster in the first column of U^(2), and the user cluster in the second column of U^(1) matches the user cluster in the second column of U^(2); in b) and c), the user cluster in the first column of U^(1) matches the user cluster in the second column of U^(2), and the user cluster in the second column of U^(1) matches the user cluster in the first column of U^(2). These cases show the challenge of determining how the clusters in U^(1) and U^(2) are aligned at each iteration.
Cluster centroids of the two user groups in the factor U, over the coordinates x and y:

m_T1 = ( (u_1,x + u_2,x + u_3,x)/3 , (u_1,y + u_2,y + u_3,y)/3 )
m_T2 = ( (u_4,x + u_5,x)/2 , (u_4,y + u_5,y)/2 )
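The centroids above are simply column-wise means over the rows of U belonging to each cluster. One way to compute them and to match clusters between the two user factors is sketched below, under our own assumptions: the paper's exact matching rule (step 5 of Algorithm 1) is not reproduced here, and we use the Hungarian algorithm as one concrete choice of one-to-one matching:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def centroids(U, labels, k):
    """Mean row of U for each of the k clusters, e.g. for a 3-user
    cluster: m_T1 = ((u1x + u2x + u3x)/3, (u1y + u2y + u3y)/3)."""
    return np.vstack([U[labels == t].mean(axis=0) for t in range(k)])

def match_clusters(M1, M2):
    """One-to-one match between two centroid sets that minimizes the
    total Euclidean distance (Hungarian algorithm)."""
    cost = np.linalg.norm(M1[:, None, :] - M2[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows, cols))   # cluster t of U(1) <-> match[t] of U(2)
```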
Algorithm 1: HISF: Utilizing both explicit and implicit similarities from two matrices

Input: X^(1), X^(2), E
Output: U^(1), S^(1), V^(0), V^(1), U^(2), S^(2), V^(2)
1   Randomly initialize all factors
2   Initialize L by a small number
3   repeat
4       PreL = L
5       Find matches between clusters in U^(1) and U^(2)
6       Solve U^(1) by (5)
7       Solve U^(2) by (6)
8       Solve common V^(0) by (7)
9       Solve domain-specific V^(1) by (8)
10      Solve domain-specific V^(2) by (9)
11      Solve S^(1) by (10)
12      Solve S^(2) by (11)
13      Compute L following (2)
14  until (PreL − L)/PreL < E

The partial derivative of L with respect to u_i^(1)T is set to zero:

∂L/∂u_i^(1)T = −2 Σ_j^m ( X_i,j^(1) − u_i^(1)T v_j^(01) ) v_j^(01)T + 2 ( u_i^(1)T − b^T ) + 2λ u_i^(1)T
             = −2 x_i,*^(1)T V^(01) + 2 u_i^(1)T V^(01)T V^(01) + 2 u_i^(1)T − 2 b^T + 2λ u_i^(1)T

where b^T = −m_l^(1)T + m_*^(2)T + u_i^(1)T, l is the cluster that user i belongs to, and x_i,*^(1)T is a row vector of all observed X_i,j^(1), ∀j ∈ [1, m].

By setting ∂L/∂u_i^(1)T = 0, we obtain the updating rule for u_i^(1)T:

u_i^(1)T = ( x_i,*^(1)T V^(01) + b^T ) ( V^(01)T V^(01) + (λ + 1) I )^+    (5)

In the same way, the optimal u_k^(2)T can be derived from:

u_k^(2)T = ( x_k,*^(2)T V^(02) + b^T ) ( V^(02)T V^(02) + (λ + 1) I )^+    (6)
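A numpy transcription of the row update (5) and of the outer loop of Algorithm 1 might look as follows. This is a sketch only: `b` stands for the alignment vector b^T built from the matched centroids, `V01` for the concatenated factor [V^(0) | V^(1)], and missing ratings are marked with NaN; these conventions are our own:

```python
import numpy as np

def update_user_row(x_row, V01, b, lam):
    """Eq. (5): u_i = (x_i V01 + b)(V01^T V01 + (lam+1) I)^+,
    restricted to this user's observed items."""
    obs = ~np.isnan(x_row)                   # observed entries of row i
    Vo = V01[obs]                            # matching rows of V01
    A = Vo.T @ Vo + (lam + 1.0) * np.eye(V01.shape[1])
    return (x_row[obs] @ Vo + b) @ np.linalg.pinv(A)

def hisf_outer_loop(update_all, compute_loss, eps=1e-5, max_iter=500):
    """Steps 3-14 of Algorithm 1: alternate all factor updates and stop
    when the relative change of the loss falls below eps."""
    prev = compute_loss()
    for _ in range(max_iter):
        update_all()          # steps 5-12: match clusters, solve (5)-(11)
        loss = compute_loss()
        if (prev - loss) / prev < eps:
            break
        prev = loss
    return loss
```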
Our updating rule for v_j^(0) is:

v_j^(0) = ( U^(1)T U^(1) + U^(2)T U^(2) + λI )^+ ( U^(1)T y_*,j^(1) + U^(2)T y_*,j^(2) )    (7)

... x_i,*^(1)T V^(01), and performs Cholesky decompositions for the pseudoinverse. Preparing V^(01) requires O((Ω_X^(1)/n) r) operations, while computing V^(01)T V^(01) and x_i,*^(1)T V^(01) takes O((Ω_X^(1)/n) r²) ...
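Eq. (7) has the same ridge-regression shape as (5). A direct transcription (our own sketch, taking the target columns y_*,j^(1) and y_*,j^(2) as given inputs) is:

```python
import numpy as np

def update_common_item_col(U1, U2, y1_col, y2_col, lam):
    """Eq. (7): v_j = (U1^T U1 + U2^T U2 + lam I)^+ (U1^T y1_j + U2^T y2_j)."""
    A = U1.T @ U1 + U2.T @ U2 + lam * np.eye(U1.shape[1])
    return np.linalg.pinv(A) @ (U1.T @ y1_col + U2.T @ y2_col)
```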
TABLE 2: Dimension and number of known entries for training, validation and testing of census data on the New South Wales (NSW) (X^(1)) and Victoria (VIC) (X^(2)) states, as well as crime statistics of NSW (X^(3)).

Characteristics   X^(1)          X^(2)         X^(3)
Dimension         154 × 7,889    81 × 7,889    154 × 62
Training          91,069         47,900        661
Validation        4,793          2,521         34
Testing           23,965         12,605        173

TABLE 3: Dimension and number of known entries for training, validation and testing of the Amazon dataset on books (X^(4)), movies (X^(5)) and electronics (X^(6)).

Characteristics   X^(4)            X^(5)            X^(6)
Dimension         5,000 × 5,000    5,000 × 5,000    5,000 × 5,000
Training          158,907          94,665           41,126
Validation        8,363            4,982            2,164
Testing           18,585           11,071           4,809

6 EXPERIMENTS AND ANALYSIS

We evaluate our proposed HISF in comparison with existing algorithms, including CMF [19], CST [4], CBT [23] and CLFM [24]. This evaluation thoroughly studies two test cases: one with two matrices and another with three matrices. Our goal is to evaluate how well these algorithms suggest unknown information based on the observed cross-domain ratings. For this purpose, we compare them on the commonly used root mean squared error (RMSE) metric:

RMSE = √( Σ_{i,j}^{n,m} ( U_i × S × V_j^T − X_i,j )² / Ω_X )

where Ω_X is the number of observations of X.
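For reference, this metric is computed over the observed entries only. The sketch below is our own illustration, with the boolean `mask` marking the Ω_X observed positions:

```python
import numpy as np

def rmse(X, U, S, V, mask):
    """RMSE over observed entries:
    sqrt( sum_{(i,j) in Omega_X} (U_i S V_j^T - X_ij)^2 / |Omega_X| )."""
    err = (U @ S @ V.T - X)[mask]
    return np.sqrt(np.mean(err ** 2))
```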
6.1 Data for the experiments

We use three publicly available datasets for our experiments. Their characteristics are summarized in Table 2 and Table 3.

6.1.1 Dataset #1

The Australian Bureau of Statistics (ABS) publishes comprehensive census data for the New South Wales (NSW) and Victoria (VIC) states. The dataset comprises populations and family profiles of 154 areas, so-called "local government areas" (LGAs), in NSW and of 81 LGAs in VIC. 7,889 aspects of population and family profile are extracted from the census categories below:

• Rent (weekly) by Landlord Type
• Rent (weekly) by Family Composition for Couple Families
• Rent (weekly) by Family Composition for One Parent Families
• Total Family Income (weekly) by Number of Children for Couple Families
• Total Family Income (weekly) by Number of Children for One Parent Families
• Total Household Income (weekly) by Rent (weekly)
• Family Composition by Mortgage Repayment
• Family Composition by Income Comparison for Parents/Partners
• Family Composition and Social Marital Status by Number of Dependent Children
• Selected Labour Force, Education and Migration Characteristics
• Family Composition and Labour Force Status of Parent(s)/Partners by Total Family Income (weekly)
• Non-School Qualification: Level of Education by Age by Sex
• Non-School Qualification: Field of Study by Age by Sex
• Labour Force Status by Age by Sex
• Industry of Employment by Sex
• Occupation by Sex

We form a matrix X^(1) (LGA by population and family profile) for NSW and another matrix X^(2) for VIC. Values of the matrices are normalized by row. We then randomly select 10% of the data and use 80% of it for training and 20% for testing.

6.1.2 Dataset #2

The Bureau of Crime Statistics and Research (BOCSAR) provides a statistical record of criminal incidents within the 154 LGAs of New South Wales. There are 62 specific offence categories. We form a matrix X^(3) (LGA by offence categories) whose entries represent how many cases were reported for an offence category in an LGA. Values of the matrix are normalized by row. We randomly select 10% of the data for the experiments; among them, 80% of X^(3) is used for training and the rest for testing.

6.1.3 Dataset #3

Three matrices of ratings for books, movies and electronics are extracted from the Amazon website [29]. The book data contains ratings from 305,475 users on 888,057 books; the movie data contains ratings from the same 305,475 users on 128,097 movies and TV programs; the electronics data is from the same users on 196,894 items. All ratings are from 1 to 5. For this experiment, the data is constructed as follows:

- We first adopt the same sub-sampling approach as in [4] by randomly extracting 10⁴ × 10⁴ dense rating matrices from these three matrices. Then, we take three sub-matrices of 5,000 × 5,000 each, as summarized in Table 3. All sub-matrices share the same users, but no common items.
- All ratings are normalized by 5 so that their values range from 0.2 to 1.

6.2 Experimental settings

We performed factorization with different ranks. Each algorithm was run five times, and the mean and standard deviation of the results are reported in the next section. Furthermore, we assume small changes across consecutive iterations indicate an algorithm's convergence; thus, we stopped the algorithms when the change was less than 10⁻⁵.
TABLE 4: Mean and standard deviation of tested RMSE on ABS NSW and ABS VIC data with different algorithms. For CST, when X^(1) is the target, X^(2) is used as auxiliary data, and vice versa. HISF1 denotes the algorithm with only explicit similarities, whereas HISF2 uses both implicit and explicit similarities. The best results for each rank are in bold. The Hotelling's T-squared tests row presents the p-value of the Hotelling's T-squared test between each algorithm and our proposed HISF.
6.3 Empirical results

Three scenarios are tested as follows.

6.3.1 Case #1. Latent demographic profile similarities and latent LGA group similarities can help to collaboratively suggest unknown information in these states

We use X^(1) and X^(2) as described in Sect. 6.1, which are from different LGAs of two states. Nevertheless, both of them are ratings for the same demographic categories. They share some common explicit demography similarities as well as implicit latent LGA ones. We would like to assess how well both the explicit similarities in the demography dimension and the implicit ones in the LGA dimension collaboratively suggest unknown information.

Table 4 shows the mean and standard deviation of the RMSE of all algorithms on the tested ABS data for the New South Wales (X^(1)) and Victoria (X^(2)) states. CBT and CLFM, which assume the two states have similar demography patterns in a latent sense, clearly perform the worst. The results demonstrate that these explicit similarities (in the form of latent demography patterns) do not fully capture the correlation nature of the two datasets in this case; thus, they do not help CBT and CLFM to improve their performance significantly. CMF applies another approach to take advantage of the explicit correlations between NSW state's population and family profile and those of VIC state. Specifically, CMF's assumption of the same population and family profile factor between NSW and VIC helps improve its performance over that of CBT and CLFM. CST allows a more flexible utilization of the explicit correlations between NSW's population and family profile and those of VIC than CMF does. As a result, CST achieves slightly higher accuracy than CMF in recommending NSW's and VIC's missing information.

Nevertheless, the prediction accuracy can be improved even further, as illustrated by our proposed idea of explicit and implicit similarities discovery. Utilizing them helps our proposed HISF achieve about two times higher accuracy compared with CMF, and up to 47% improvement for NSW and up to 25% for VIC compared to CST. These impressive results are achieved when the numbers of common columns are 3, 5, 5, 7, 7 and 7 for decomposition ranks of 5, 7, 9, 11, 13 and 15, respectively. This means that the common parts together with the domain-specific parts better capture the true correlation nature of the datasets. These explicit similarities, together with the implicit similar-group alignments, allow better knowledge leveraging between datasets, thus improving recommendation accuracy.

To confirm the statistical significance of our method, we perform Hotelling's T-squared tests [30], the multivariate version of the t-test in univariate statistics. Our objective is to validate whether our proposed algorithm differs significantly from the baselines. We use the multivariate Hotelling's T-squared test here because each population involves observations of two variables: NSW (ABS NSW X^(1)) and VIC (ABS VIC X^(2)). For testing the null hypothesis that each pair of algorithms (CMF vs. HISF, CBT vs. HISF, CLFM vs. HISF and CST vs. HISF) has identical mean RMSE vectors, let

H0: the population mean RMSEs are identical for all of the variables (µ_NSW1 = µ_NSW2 and µ_VIC1 = µ_VIC2)
H1: at least one pair of these means is different (µ_NSW1 ≠ µ_NSW2 or µ_VIC1 ≠ µ_VIC2)

Because all p-values are smaller than α (0.05), we reject the null hypothesis. Therefore, the observed difference between the baselines and our proposed algorithm is statistically significant.
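The two-sample Hotelling's T-squared test used here can be sketched as follows. This is our own outline of the classical test, assuming equal covariances across the two groups of runs and using the standard F approximation; the rows of A and B would be the per-run (RMSE_NSW, RMSE_VIC) pairs of a baseline and of HISF:

```python
import numpy as np
from scipy import stats

def hotelling_t2(A, B):
    """Two-sample Hotelling's T^2 on multivariate RMSE results.
    A, B: (runs x variables) arrays, e.g. 5 runs x (NSW, VIC)."""
    nA, p = A.shape
    nB, _ = B.shape
    diff = A.mean(axis=0) - B.mean(axis=0)
    # Pooled sample covariance of the two groups
    Sp = ((nA - 1) * np.cov(A, rowvar=False)
          + (nB - 1) * np.cov(B, rowvar=False)) / (nA + nB - 2)
    t2 = (nA * nB) / (nA + nB) * diff @ np.linalg.solve(Sp, diff)
    f = t2 * (nA + nB - p - 1) / (p * (nA + nB - 2))
    pval = stats.f.sf(f, p, nA + nB - p - 1)   # p-value from F distribution
    return t2, pval
```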
6.3.2 Case #2. Latent LGA similarities and latent similarities between demography and crime can help to collaboratively suggest unknown crime and state information

The advantages of both explicit and implicit similarities are further confirmed in Table 5. In this case, they are applied to another pair of cross domains: ABS NSW demography (X^(1)) and NSW crime (X^(3)). These datasets have explicit similarities in their LGA latent factors. At the same time, implicit similarities in demography profiles and criminal behaviors are also utilized. Our proposed HISF, leveraging both similarities, outperforms the existing algorithms. It is worth noting that the performance of CST in this case is worse than that of CMF (about two times worse for NSW and a
TABLE 5: Mean and standard deviation of tested RMSE on ABS NSW demography and BOCSAR NSW crime data with different algorithms. For CST, when X^(1) is the target, X^(3) is used as auxiliary data, and vice versa. HISF1 denotes the algorithm with only explicit similarities, whereas HISF2 uses both implicit and explicit similarities. The best results for each rank are in bold. The Hotelling's T-squared tests row presents the p-value of the Hotelling's T-squared test between each algorithm and our proposed HISF.
TABLE 6: Mean and standard deviation of tested RMSE on book, movie and electronics data with different algorithms. CST is not applied here as it does not support two or more principal coordinates on one factor. The best results for each rank are in bold. The Hotelling's T-squared tests row presents the p-value of the Hotelling's T-squared test between each algorithm and our proposed HISF.
7.3 Cluster-Level Latent Factor Model (CLFM)

The assumption that two datasets from different domains have the same rating patterns would be unrealistic in practice. This intuition motivated Gao et al. [24] to propose a generalized codebook-sharing model by introducing CLFM. Specifically, CLFM shares only the common parts of the rating patterns between datasets. By doing so, the CLFM model learns only the explicitly shared latent parts between datasets. When the number of commonly shared parts equals the decomposition rank, CLFM reduces to CBT. In all other cases, the CLFM model learns only the shared latent space while preserving the unique characteristics of each dataset. This model can thus be used to jointly analyze multiple datasets to overcome the sparsity of each of them.

7.4 Coordinate System Transfer (CST)

All of the above methods are based on an equally shared correlation between datasets. Based on the observation that the rating matrix for recommendation is often sparse while auxiliary data about the users and items can sometimes be found, Pan et al. [4] suggested that common coordinates, e.g., users' tastes, would exist between the main matrix and its side data. As a result, these coordinates could be transferred from the auxiliary data to the main rating matrix. The authors proposed the Coordinate System Transfer (CST) model to construct principal coordinates in a low-dimensional space for the users and items. It then utilizes these explicit similarities on principal coordinates as regularization terms on the target data.

REFERENCES

[1] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, 2009.
[2] P. Lops, M. de Gemmis, and G. Semeraro, "Content-based recommender systems: State of the art and trends," in Recommender Systems Handbook, 2011.
[3] Y. Koren and R. Bell, "Advances in collaborative filtering," in Recommender Systems Handbook, 2011.
[4] W. Pan, E. W. Xiang, N. N. Liu, and Q. Yang, "Transfer learning in collaborative filtering for sparsity reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, 2010.
[5] W. Chen, W. Hsu, and M. L. Lee, "Making recommendations from multiple domains," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
[6] C.-Y. Li and S.-D. Lin, "Matching users and items across domains to improve the recommendation quality," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
[7] M. Jiang, P. Cui, X. Chen, F. Wang, W. Zhu, and S. Yang, "Social recommendation with cross-domain transferable knowledge," IEEE Transactions on Knowledge and Data Engineering, 2015.
[8] Y. Wei, Y. Zheng, and Q. Yang, "Transfer knowledge between cities," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[9] D. Yang, J. He, H. Qin, Y. Xiao, and W. Wang, "A graph-based recommendation across heterogeneous domains," in Proceedings of the ACM International Conference on Information and Knowledge Management, 2015.
[10] Y.-F. Liu, C.-Y. Hsu, and S.-H. Wu, "Non-linear cross-domain collaborative filtering via hyper-structure transfer," in Proceedings of the International Conference on Machine Learning, 2015.
[11] T. Iwata and T. Koh, "Cross-domain recommendation without shared users or items by sharing latent vector distributions," in Proceedings of the International Conference on Artificial Intelligence and Statistics, 2015.
[12] H. Jing, A. C. Liang, S.-D. Lin, and Y. Tsao, "A transfer probabilistic collective factorization model to handle sparse data in collaborative filtering," in Proceedings of the IEEE International Conference on Data Mining, 2014.
[13] L. Zhao, S. J. Pan, E. W. Xiang, E. Zhong, Z. Lu, and Q. Yang, "Active transfer learning for cross-system recommendation," in Proceedings of the AAAI Conference on Artificial Intelligence, 2013.
[14] L. Hu, J. Cao, G. Xu, L. Cao, Z. Gu, and C. Zhu, "Personalized recommendation via cross-domain triadic factorization," in Proceedings of the International Conference on World Wide Web, 2013.
[15] B. Wang, M. Ester, Y. Liao, J. Bu, Y. Zhu, Z. Guan, and D. Cai, "The million domain challenge: Broadcast email prioritization by cross-domain recommendation," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
[16] C.-C. Hsu, M.-Y. Yeh, and S.-D. Lin, "A general framework for implicit and explicit social recommendation," IEEE Transactions on Knowledge and Data Engineering, 2018.
[17] F. Wu, Z. Yuan, and Y. Huang, "Collaboratively training sentiment classifiers for multiple domains," IEEE Transactions on Knowledge and Data Engineering, 2017.
[18] J.-D. Zhang, C.-Y. Chow, and J. Xu, "Enabling kernel-based attribute-aware matrix factorization for rating prediction," IEEE Transactions on Knowledge and Data Engineering, 2017.
[19] A. P. Singh and G. J. Gordon, "Relational learning via collective matrix factorization," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
[20] E. Acar, T. G. Kolda, and D. M. Dunlavy, "All-at-once optimization for coupled matrix and tensor factorizations," arXiv preprint arXiv:1105.3422, 2011.
[21] K. Shin, L. Sael, and U. Kang, "Fully scalable methods for distributed tensor factorization," IEEE Transactions on Knowledge and Data Engineering, 2017.
[22] W. Pan, "A survey of transfer learning for collaborative recommendation with auxiliary data," Neurocomputing, 2016.
[23] B. Li, Q. Yang, and X. Xue, "Can movies and books collaborate?: Cross-domain collaborative filtering for sparsity reduction," in Proceedings of the International Joint Conference on Artificial Intelligence, 2009.
[24] S. Gao, H. Luo, D. Chen, S. Li, P. Gallinari, and J. Guo, "Cross-domain recommendation via cluster-level latent factor model," in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, 2013.
[25] C. Ding, T. Li, W. Peng, and H. Park, "Orthogonal nonnegative matrix t-factorizations for clustering," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[26] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, 2009.
[27] C. Bauckhage, "k-means clustering is matrix factorization," arXiv preprint arXiv:1512.07548, 2015.
[28] Y. Hu, Y. Koren, and C. Volinsky, "Collaborative filtering for implicit feedback datasets," in Proceedings of the IEEE International Conference on Data Mining, 2008.
[29] R. He and J. McAuley, "Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering," in Proceedings of the International Conference on World Wide Web, 2016.
[30] H. Hotelling, "The generalization of Student's ratio," The Annals of Mathematical Statistics, 1931.
[31] P. Bhargava, T. Phan, J. Zhou, and J. Lee, "Who, what, when, and where: Multi-dimensional collaborative recommendations using tensor factorization on sparse user-generated data," in Proceedings of the International Conference on World Wide Web, 2015.
[32] W. Pan, N. N. Liu, E. W. Xiang, and Q. Yang, "Transfer learning to predict missing ratings via heterogeneous user feedbacks," in Proceedings of the International Joint Conference on Artificial Intelligence, 2011.
[33] Y. Shi, M. Larson, and A. Hanjalic, "Mining contextual movie similarity with matrix factorization for context-aware recommendation," ACM Transactions on Intelligent Systems and Technology, 2013.
[34] J. Yoo and S. Choi, "Weighted nonnegative matrix co-tri-factorization for collaborative prediction," in Proceedings of the Asian Conference on Machine Learning: Advances in Machine Learning, 2009.
[35] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems 13, 2001.
[36] B. Li, Q. Yang, and X. Xue, "Transfer learning for collaborative filtering via a rating-matrix generative model," in Proceedings of the International Conference on Machine Learning, 2009.
[37] O. Moreno, B. Shapira, L. Rokach, and G. Shani, "Talmud: Transfer learning for multiple domains," in Proceedings of the ACM International Conference on Information and Knowledge Management, 2012.

Quan Do is a PhD student at the Advanced Analytics Institute, University of Technology Sydney, Australia. He conducts research on data mining and publishes papers on tensor factorization, recommendation systems and cross-domain learning.

Wei Liu is a Senior Lecturer at the Advanced Analytics Institute, School of Software, Faculty of Engineering and Information Technology, University of Technology Sydney. Before joining UTS, he was a Research Fellow at the University of Melbourne and then a Machine Learning Researcher at NICTA. He obtained his PhD from the University of Sydney. He works in the areas of machine learning and data mining and has published papers on tensor factorization, adversarial learning, graph mining, causal inference, and anomaly detection.

Jin Fan received the B.Sc. degree from Xi'an Jiaotong University, China, and the M.Sc. and Ph.D. degrees from Loughborough University, U.K. She is currently an Associate Professor with the Department of Computer Science and Technology, Hangzhou Dianzi University, China. Her research interests lie in the general areas of mobile sensing and related data analytics.

Dacheng Tao (F'15) is Professor of Computer Science and ARC Laureate Fellow in the School of Information Technologies and the Faculty of Engineering and Information Technologies, and the Inaugural Director of the UBTECH Sydney Artificial Intelligence Centre, at the University of Sydney. He mainly applies statistics and mathematics to Artificial Intelligence and Data Science. His research interests spread across computer vision, data science, image processing, machine learning, and video surveillance. His research results have been expounded in one monograph and over 500 publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, ICDM, and ACM SIGKDD, with several best paper awards, such as the best theory/algorithm paper runner-up award at IEEE ICDM'07, the best student paper award at IEEE ICDM'13, the distinguished student paper award at the 2017 IJCAI, the 2014 ICDM 10-year highest-impact paper award, and the 2017 IEEE Signal Processing Society Best Paper Award. He received the 2015 Australian Scopus-Eureka Prize, the 2015 ACS Gold Disruptor Award, and the 2015 UTS Vice Chancellor's Medal for Exceptional Research. He is a Fellow of the IEEE, AAAS, OSA, IAPR, and SPIE.