Incremental diversification for very large sets

Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11), 2011

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention, and it has been shown to decrease query abandonment substantially. However, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, which makes them impractical for large result sets or continuous data. To solve this issue, we present a novel diversification approach that treats the input as a stream and processes each element incrementally, maintaining a near-optimal diverse set at any point in the stream. Our approach has linear computational complexity and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
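The streaming idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a simple swap-based heuristic under stated assumptions: diversity is measured as the minimum pairwise distance within the selected set (one common objective), relevance is ignored, and the names `stream_diversify`, `min_pairwise_distance`, `k`, and `dist` are hypothetical. Each element is seen once and only `k` elements are kept, so the work per element and the memory use do not depend on the length of the input.

```python
from itertools import combinations
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def min_pairwise_distance(items: List[T], dist: Callable[[T, T], float]) -> float:
    """Diversity score (assumed objective): the smallest distance between any two items."""
    return min((dist(a, b) for a, b in combinations(items, 2)), default=float("inf"))


def stream_diversify(stream: Iterable[T], k: int, dist: Callable[[T, T], float]) -> List[T]:
    """Keep a diverse set of at most k items while reading the stream once.

    Illustrative heuristic only: each new item replaces the selected item
    whose replacement raises the minimum pairwise distance the most, if any.
    """
    selected: List[T] = []
    for item in stream:
        if len(selected) < k:
            # Fill the set with the first k elements of the stream.
            selected.append(item)
            continue
        best_score = min_pairwise_distance(selected, dist)
        best_index = None
        for i in range(k):
            # Try swapping the new item in for the i-th selected item.
            candidate = selected[:i] + [item] + selected[i + 1:]
            score = min_pairwise_distance(candidate, dist)
            if score > best_score:
                best_score, best_index = score, i
        if best_index is not None:
            selected[best_index] = item
    return selected


if __name__ == "__main__":
    # One-dimensional toy data with absolute difference as the distance.
    print(stream_diversify([0, 1, 2, 10, 5, 20], 3, lambda a, b: abs(a - b)))
```

Because the input is only iterated, the same function accepts a generator over a continuous source, and the current contents of the selected set are usable at any point in the stream. A practical version would also weigh relevance and use a cheaper update than re-scoring every candidate set.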

Wolf Siberski hasn't uploaded this paper.

