Lecture notes in business information processing, 2018
The volume and veracity of Resource Description Framework (RDF) data on the web are two main issues in managing information. Because RDF data are so diverse, several researchers have enriched the basic RDF data model with trust information to rate the trustworthiness of the collected data.
In the last decade, skyline queries have attracted the interest of many researchers in the database field because of their ability to retrieve interesting objects from a large set of objects. Skyline analysis is a powerful tool in a wide spectrum of real applications, including multi-criteria optimal decision making, preference answering and many applications where uncertain, imprecise and noisy data inherently exist. As large amounts of distributed data are communicated and shared over the Internet, an important problem is to retrieve the global skyline from all the distributed local sites. In addition, although skyline queries can control selection, few works handle skyline queries under database updates. In this paper, based on the notion of marginal points, we introduce new methods to efficiently compute the global skyline from distributed local sites and over frequently updated databases. The efficiency and effectiveness of our proposal are verified by extensive experimental results.
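As a rough illustration of the global skyline problem described above, the following Python sketch computes local skylines by pairwise dominance checks and merges them. It is a naive baseline, not the marginal-points method of the paper; the function names, the "smaller is better" convention and the sample points are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse on every dimension and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic scan)."""
    pts = list(points)
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


def global_skyline(local_sites: Iterable[Iterable[Point]]) -> List[Point]:
    """A globally non-dominated point is also non-dominated at its own site,
    so filtering the union of the local skylines once more is sufficient."""
    merged = [p for site in local_sites for p in skyline(site)]
    return skyline(merged)


# Hypothetical data: two sites holding (price, distance) pairs.
site_a = [(1.0, 9.0), (3.0, 3.0), (4.0, 4.0)]
site_b = [(2.0, 8.0), (9.0, 1.0), (3.5, 3.5)]
result = sorted(global_skyline([site_a, site_b]))
```

Here `(4.0, 4.0)` is pruned at its own site, while `(3.5, 3.5)` survives locally and is only eliminated after the merge, which is why the second filtering pass is needed.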
When querying Knowledge Bases (KBs), users are faced with large sets of data, often without knowing their underlying structures. As a result, users may make mistakes when formulating their queries and receive an unhelpful response. In this paper, we address the plethoric answers problem, the situation where a query produces significantly more results than the user was expecting. We deal with this problem by identifying the parts of the failing query, called Minimal Failure Inducing Subqueries (MFIS), that cause plethoric answers. As long as the query contains an MFIS, it will fail to reach a sufficiently small number of answers. Thanks to these MFIS, interactive and automatic approaches can be set up to help users reformulate their query. The dual notion of MFIS, maXimal Succeeding Subqueries (XSS), is also useful: these are the queries that keep as many parts of the original query as possible while returning non-plethoric answers. Our goal is to compute MFIS and XSS efficiently, so that they may be used to solve the plethoric answers problem. We propose two algorithms that leverage query and data properties to compute MFIS and XSS. We show experimentally that our algorithms clearly outperform a baseline method on generated queries as well as real user-submitted queries.
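To make the two notions concrete, here is a brute-force Python sketch that enumerates every non-empty subquery. It assumes a simplified reading of the abstract: a subquery fails when it returns more than `k` answers, an MFIS is a failing subquery with no failing proper subquery, and an XSS is a succeeding subquery with no succeeding proper superquery. The paper's exact definitions and algorithms may differ, and the pattern names and answer counts below are invented stand-ins for a real query engine.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

Subquery = FrozenSet[str]


def mfis_and_xss(
    query: Iterable[str],
    count_answers: Callable[[Subquery], int],
    k: int,
) -> Tuple[List[Subquery], List[Subquery]]:
    """Exhaustive baseline over the subquery lattice (exponential in query size)."""
    parts = sorted(query)
    subqueries = [
        frozenset(c)
        for r in range(1, len(parts) + 1)
        for c in combinations(parts, r)
    ]
    failing = {s for s in subqueries if count_answers(s) > k}
    succeeding = [s for s in subqueries if s not in failing]
    # Minimal failing: no proper subset is itself failing.
    mfis = [s for s in failing if not any(t < s for t in failing)]
    # Maximal succeeding: no proper superset is itself succeeding.
    xss = [s for s in succeeding if not any(s < t for t in succeeding)]
    return mfis, xss


# Hypothetical answer counts for a three-pattern query (toy data only).
TOY_COUNTS = {
    frozenset({"t1"}): 50,
    frozenset({"t2"}): 2000,
    frozenset({"t3"}): 80,
    frozenset({"t1", "t2"}): 3000,
    frozenset({"t1", "t3"}): 40,
    frozenset({"t2", "t3"}): 1500,
    frozenset({"t1", "t2", "t3"}): 2500,
}

mfis, xss = mfis_and_xss({"t1", "t2", "t3"}, TOY_COUNTS.__getitem__, k=100)
```

With these toy counts the single MFIS is `{t2}` and the single XSS is `{t1, t3}`: every subquery containing `t2` is plethoric, and dropping `t2` is the smallest change that brings the result below the threshold.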
Uncertain, imprecise and noisy data arise in a number of domains, including sensor networks and data integration. Skyline analysis is a powerful tool in a wide spectrum of real applications involving multi-criteria optimal decision making. The skyline operator aims at returning the most interesting objects in a database. Previous research showed that the skyline over uncertain data is too large to be exploited. In this paper, we propose an advanced skyline analysis over uncertain databases where uncertainty is modelled by evidence theory. We particularly tackle the following two important issues: (1) modelling the skyline query over an evidential database, and (2) ranking the evidential skyline result and retrieving the k skyline objects that are expected to have the highest score while taking into account the confidence level of the objects. We also study the impact of this confidence level on the top-k result. The efficiency and effectiveness of our proposal are verified by extensive experimental results.
Due to the exploding amount of information stored and shared over the Internet, and the introduction of new technologies to capture and transmit data, managing imperfect data is an important issue in many applications. An important tool for reasoning with imperfect data is evidence theory, which is a generalization of Bayesian inference. We call databases whose imperfect data are handled using evidence theory evidential databases. In this paper, we design the evidential database meta-model using an object-oriented modelling language (UML) and we implement it using an object-relational database. Although the implementation is not native, it showed acceptable scalability.
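For readers unfamiliar with evidence theory, the Python sketch below shows the standard belief and plausibility functions computed from a mass function, which is the kind of value an evidential attribute stores. It says nothing about the paper's meta-model or its object-relational implementation; the attribute and its masses are invented for the example.

```python
from typing import Dict, FrozenSet

# A mass function assigns weights summing to 1 to subsets (focal sets) of the domain.
Mass = Dict[FrozenSet[str], float]


def belief(mass: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of the non-empty focal sets fully contained in the hypothesis."""
    return sum(v for focal, v in mass.items() if focal and focal <= hypothesis)


def plausibility(mass: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass of the focal sets that intersect the hypothesis."""
    return sum(v for focal, v in mass.items() if focal & hypothesis)


# Hypothetical evidential value of a "diagnosis" attribute.
diagnosis: Mass = {
    frozenset({"flu"}): 0.6,
    frozenset({"flu", "cold"}): 0.3,
    frozenset({"flu", "cold", "asthma"}): 0.1,
}

bel_flu = belief(diagnosis, frozenset({"flu"}))
pl_flu = plausibility(diagnosis, frozenset({"flu"}))
bel_cold = belief(diagnosis, frozenset({"cold"}))
pl_cold = plausibility(diagnosis, frozenset({"cold"}))
```

Belief is a lower bound and plausibility an upper bound on the support for a hypothesis. When every focal set is a singleton the two coincide and the mass function behaves like an ordinary probability distribution, which is the sense in which evidence theory generalizes the Bayesian setting.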
International Journal of Intelligent Systems, Dec 7, 2010
This paper introduces a new type of database query involving preferences. The idea is to consider competitive conditional preference clauses structured as a tree, of the type "preferably $P_1$ or ... or $P_n$; if $P_1$ then preferably $P_{1,1}$ or ...; if $P_2$ then preferably $P_{2,1}$ or ...," where the $P_i$'s are not exclusive (thus the notion of competition). The paper defines two possible interpretations of such queries and outlines two evaluation techniques which follow from them.
Uncertain relations between temporal points are represented by means of possibility distributions over the three basic relations "smaller than", "equal to", and "greater than". Operations for computing inverse relations, for composing relations, for combining relations coming from different sources and pertaining to the same temporal points, or for representing negative information, are defined. An illustrative example of representing and reasoning with uncertain temporal relations is given. This paper shows how possibilistic temporal uncertainty can be handled in the setting of point algebra. Moreover, the paper emphasizes the advantages of the possibilistic approach over a probabilistic approach previously proposed. This work does for the temporal point algebra what the authors previously did for the temporal interval algebra.
Recent years have witnessed the development of large Knowledge Bases (KBs). Due to the lack of information about the content and schema semantics of KBs, users are often unable to correctly formulate KB queries that return the intended result. In this paper, we consider the problem of failing RDF queries, i.e. queries that return an empty set of answers. Query relaxation is one cooperative technique proposed to solve this problem. In the context of RDF data, several works have proposed query relaxation operators and ranking models for relaxed queries. However, none of them tried to find the causes of an RDF query failure, given by Minimal Failing Subqueries (MFSs), or the successful queries that have a maximal number of triple patterns, named Maximal Succeeding Subqueries (XSSs). Inspired by previous work in the context of relational databases and recommender systems, we propose two complementary approaches to fill this gap. The Lattice-Based Approach (LBA) leverages the theoretical properties of MFSs and XSSs to efficiently explore the subquery lattice of the failing query. The Matrix-Based Approach (MBA) computes a matrix that records alternative answers to the failing query with the triple patterns they satisfy. The skyline of this matrix directly gives the XSSs of the failing query. This matrix can also be used as an index to improve the performance of LBA. The practical interest of these two approaches is shown via a set of experiments conducted on the LUBM benchmark and a comparative study with baseline and related work algorithms.
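The remark that the skyline of the matrix gives the XSSs can be illustrated with a small Python sketch. It is a simplified illustration rather than the MBA algorithm itself: it assumes a star-shaped query over a single variable, so that each candidate answer maps directly to the set of triple patterns it satisfies. The answer and pattern names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def xss_from_matrix(matrix: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Each row lists the triple patterns one candidate answer satisfies.
    The maximal rows under set inclusion (the skyline of the binary matrix)
    are the largest subqueries that still have at least one answer."""
    rows = {frozenset(patterns) for patterns in matrix.values() if patterns}
    maximal = [r for r in rows if not any(r < other for other in rows)]
    # Sort by pattern names so the output order is deterministic.
    return sorted(maximal, key=sorted)


# Hypothetical matrix for a failing query with patterns t1, t2 and t3:
# no candidate satisfies all three, which is why the full query is empty.
matrix = {
    "a1": {"t1", "t2"},
    "a2": {"t1"},
    "a3": {"t3"},
    "a4": {"t2", "t3"},
}

xss = xss_from_matrix(matrix)
```

In this toy matrix the rows for `a2` and `a3` are dominated by larger rows, so the two XSSs are `{t1, t2}` and `{t2, t3}`.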
HAL (Le Centre pour la Communication Scientifique Directe), Oct 24, 2011
Current approaches for service discovery are based on semantic knowledge, such as ontologies and service behavior (described as a process model). However, these approaches still suffer from a high selectivity rate, resulting in a large number of services offering similar functionalities and behavior. One way to improve the selectivity rate is to take into account user preferences defined on quality attributes. In this paper, we propose a novel approach for service retrieval that takes into account the service process model and relies both on preference satisfiability and structural similarity. User query and target process models are represented as annotated graphs, where user preferences on QoS attributes are modelled by means of fuzzy sets. A flexible evaluation strategy based on fuzzy linguistic quantifiers is introduced. Finally, different ranking methods are discussed.
When querying Knowledge Bases (KBs), users are faced with large sets of data, often without knowing their underlying structures. As a result, users may make mistakes when formulating their queries and receive an unhelpful response. In this paper, we address the plethoric answers problem, the situation where the user query produces significantly more results than the user was expecting. The common approach to solving this problem, i.e. the top-K approach, reduces the query's result size by applying various criteria to select only some answers. This selection is performed without considering the causes of the plethoric answers, and can therefore miss an underlying issue within the query. We deal with this problem by proposing an approach that identifies the parts of the failing query, called Minimal Failure Inducing Subqueries (MFIS), that cause plethoric answers. As long as the query contains an MFIS, it will fail to reach a sufficiently small number of answers. Thus, thanks to these MFIS, interactive and automatic approaches can be set up to help users reformulate their query. The dual notion of MFIS, called Maximal Succeeding Subqueries (XSS), is also useful. They provide queries with a maximal number of parts of the original query that return non-plethoric answers. Our goal is to compute MFIS and XSS efficiently, so that they may be used to solve the plethoric answers problem. We
In many application domains, time modeling plays a fundamental role. Allen temporal relations are one of the best-known and most widely used formalisms for modeling and handling temporal data. However, classical Allen relations deal only with crisp time information, whereas time is often subjective and fuzzy. This paper discusses a disjunctive view of temporal relations between fuzzy time intervals. This approach is mostly based on an extension of Allen relations that allows two fuzzy time intervals to be compared. By leveraging some particular fuzzy comparison indices, this extension is implemented in the Fuzz-TIME tool developed in our previous works.
Queries posed by a user over a database do not always return the desired responses. They may sometimes return an empty set of answers, especially when the data are pervaded with uncertainty and imprecision. To address this problem, we propose an approach for relaxing a failing query in the context of evidential databases. The uncertainty in such databases is expressed within belief function theory. The key idea of our approach is to use a machine learning method, more precisely the belief K-modes clustering technique, to relax failing queries by modifying their constraints in order to provide successful alternatives which may be of interest to the user.
Modern enterprises are increasingly moving towards a service-oriented architecture for data sharing by putting their data sources behind services, thereby providing an interoperable way to interact with their data. This class of services is known as DaaS (Data-as-a-Service) services. DaaS composition is a powerful solution for answering the user's complex queries by combining primitive DaaS services. User preferences are a key aspect that must be considered in the service composition process. A more general and suitable approach to modelling preferences is based on fuzzy set theory [3]. Fuzzy sets are very well suited to the interpretation of linguistic terms and constitute a convenient way for users to express their preferences. For example, when expressing preferences about the price of a car, users often employ fuzzy terms like "rather cheap", "affordable", etc. However, as DaaS services proliferate, a large number of candidate compositions that use different (most likely competing) services may be used to answer the same query. Hence, it is important to set up an effective service composition framework that identifies and retrieves the most relevant services and returns the top-k compositions according to the user preferences.
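A fuzzy term such as "rather cheap" can be sketched in Python as a membership function, with candidate compositions ranked by their satisfaction degree. The piecewise-linear shape, the thresholds and the composition names and prices below are all assumptions chosen for illustration; the abstract does not specify how the framework scores or composes services.

```python
import heapq
from typing import Dict, List, Tuple


def decreasing_membership(x: float, full: float, zero: float) -> float:
    """Satisfaction degree of a 'the lower the better' fuzzy term:
    1 up to `full`, 0 from `zero` onwards, linear in between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def top_k(
    prices: Dict[str, float], k: int, full: float, zero: float
) -> List[Tuple[float, str]]:
    """Return the k compositions with the highest satisfaction degrees."""
    scored = [
        (decreasing_membership(price, full, zero), name)
        for name, price in prices.items()
    ]
    return heapq.nlargest(k, scored)


# Hypothetical candidate compositions and the car price each one returns.
prices = {"c1": 9000, "c2": 12000, "c3": 15000, "c4": 20000}

# "Rather cheap": fully satisfied up to 10000, not at all from 16000.
best = top_k(prices, k=2, full=10000, zero=16000)
best_names = [name for _, name in best]
```

Unlike a crisp price limit, the graded score keeps `c2` in the ranking with a partial degree of satisfaction instead of rejecting it outright.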
HAL (Le Centre pour la Communication Scientifique Directe), Nov 30, 2019
Nowadays, selecting among web services that offer the same functionality with different quality of service (QoS) has become an important issue. To deal with the large number of candidate web services, the K-representative skyline has emerged as a skyline variant that finds a short list of the most relevant web services, summarizing the full skyline result. However, it generally returns a conflicting result. The AHP (Analytic Hierarchy Process) method and its variants, such as Fuzzy AHP, are widely used for ranking incomparable alternatives. However, AHP requires users to supply a huge number of inputs to fill in multiple comparison matrices, which makes it difficult to use in practice, notably in the field of web service selection. In this work, we propose an improved Fuzzy AHP, called IFAHP, which makes it possible to: i) elicit the QoS importance level using linguistic terms based on natural language, requiring less effort from users, ii) group the QoS attributes according to their importance level, and iii) reduce the number of inputs and automatically generate all pairwise matrices with respect to each attribute, using a discretization algorithm. The experimental evaluation conducted on a real-world dataset illustrates the feasibility and effectiveness of our approach.
HAL (Le Centre pour la Communication Scientifique Directe), Oct 27, 2020
Users of a knowledge base are faced with a large volume of data whose underlying structure they may not know. As a result, they may make mistakes when formulating their queries and obtain unsatisfactory answers. Here we focus on the particular case of the plethoric answers problem, where a query produces many more results than the user expected. The best-known approach to this problem, the so-called top-K method, ranks the results so as to return only the best answers. However, if the query rests on mistaken preconceptions, this strategy does not address the source of the problem and is therefore not a satisfactory solution. We thus propose a new cooperative method that lets users understand the origin of the plethoric answers to their query. To this end we provide two pieces of information: (i) the minimal parts of the query that lead to plethoric answers, and (ii) the maximal parts of the query whose answers are not plethoric. To identify these two pieces of information, we propose two algorithms and show their efficiency compared with a naive method using synthetic and real data.
Papers by Allel Hadjali