A data warehouse stores materialized views derived from one or more sources for the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and/or the cost of maintaining the selected views, given a limited amount of a resource such as materialization time, storage space, or total view maintenance time.
We carry out a longitudinal study of the evolution of the small-time scaling behavior of Internet traffic on the MAWI dataset, spanning 8 years. The MAWI dataset contains a number of anomalies that interfere with the correct identification of scaling behavior; to mitigate these effects, we use a sketch-based procedure for robust estimation of the scaling exponent. We first show the importance of a robust estimation procedure when studying the small-time scaling behavior of Internet traffic. We further study the evolution of the following properties concerning the origins of small-time scaling behavior: (1) scaling at the IP level is independent of flow arrivals, and (2) dense flows are the primary correlation-causing factor at small time scales. Traditionally, these properties have been shown to hold using a methodology based on semi-experiments. We next show that, due to network anomalies, semi-experiments can result in misleading inferences. Hence we propose and motivate the use of "robust semi-experiments", i.e., semi-experiments coupled with a robust estimation procedure for inferring scaling behavior. Using robust semi-experiments, we find the above properties to be invariant across the entire MAWI dataset. We also show that dense flows form a larger fraction of aggregate traffic in recent traces, and hence recent traces show larger short-range correlations than earlier traces.
Abstract. A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be ...
Sensor and Ad Hoc Communications and Networks, 2004
One useful approach to exploiting redundancy in a sensor network is to keep active only a small subset of sensors that is sufficient to cover the region to be monitored. The set of active sensors should also form a connected communication graph, so that they can autonomously respond to application queries and/or tasks. Such a set of active sensors is known as a connected sensor cover, and the problem of selecting a minimum connected sensor cover has been well studied when the transmission radius and sensing radius of each sensor are fixed. In this article, we address the problem of selecting a minimum energy-cost connected sensor cover when each sensor node can vary its sensing and transmission radius; a larger sensing or transmission radius entails a higher energy cost. For this problem, we design various centralized and distributed algorithms, and compare their performance through extensive experiments. One of the designed centralized algorithms (called CGA) is shown to perform within an O(log n) factor of the optimal solution, where n is the size of the network. We have also designed a localized algorithm based on Voronoi diagrams, which is empirically shown to perform very close to CGA and which, due to its communication efficiency, significantly prolongs the sensor network lifetime.
Wireless sensors rely on battery power, and in many applications it is difficult or prohibitive to replace them. Hence, in order to prolong the system's lifetime, some sensors can be kept inactive while others perform all the tasks. In this paper, we study the k-coverage problem of activating the minimum number of sensors to ensure that every point in the area is covered by at least k sensors. This ensures higher fault tolerance and robustness, and improves many operations, including position detection and intrusion detection. The k-coverage problem is trivially NP-complete, and hence we can only provide approximation algorithms. In this paper, we present an algorithm based on an extension of the classical ε-net technique. This method gives an O(log M)-approximation, where M is the number of sensors in an optimal solution. We do not make any particular assumption on the shape of the areas covered by each sensor, other than that they must be closed, connected, and without holes.
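To make the k-coverage objective above concrete, here is a minimal Python sketch. It is not the ε-net based algorithm from the abstract: it swaps in a plain greedy multicover heuristic, and it assumes the monitored area has been discretized into a finite set of sample points. The function name `greedy_k_cover` and the sensor labels are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_k_cover(coverage: Dict[str, Set[Hashable]],
                   points: Set[Hashable], k: int) -> List[str]:
    """Activate sensors until every sample point is covered by >= k of them.

    Assumption: `coverage` maps each sensor to the sample points it covers;
    the continuous area of the paper is replaced by this finite point set.
    """
    need = {p: k for p in points}      # residual coverage demand per point
    unused = set(coverage)             # sensors not yet activated
    chosen: List[str] = []

    def gain(sensor: str) -> int:
        # Number of still-deficient points this sensor would help with.
        return sum(1 for p in coverage[sensor] if need.get(p, 0) > 0)

    while any(n > 0 for n in need.values()):
        # Sorting makes tie-breaking deterministic (first name wins).
        best = max(sorted(unused), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("points cannot be k-covered by the given sensors")
        unused.remove(best)
        chosen.append(best)
        for p in coverage[best]:
            if need.get(p, 0) > 0:
                need[p] -= 1
    return chosen


if __name__ == "__main__":
    sensors = {"a": {1, 2, 3}, "b": {1, 2}, "c": {2, 3}, "d": {3}, "e": {1}}
    print(greedy_k_cover(sensors, {1, 2, 3}, k=2))
```

The greedy rule is chosen only because it is short and easy to check; the abstract's O(log M) guarantee applies to the authors' method, not to this sketch.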
Abstract. Data mining is an attempt to automatically extract useful information from passive data using various artificial intelligence techniques [2]. Conventional database systems offer little support for data mining applications. At the same time, statistical and machine learning ...
Data gathering is a very important functionality in sensor networks. Most current data gathering research has emphasized issues such as energy efficiency and network lifetime maximization, and the technique of data aggregation is usually used to reduce the number of radio transmissions. However, there are many emerging sensor network applications with different requirements and constraints. These applications are instead time critical, i.e., quickly delivering the sensed information of each individual sensor node back to a central base station is most important. In this paper, we consider the collision-free, delay-efficient data gathering problem in sensor networks, assuming that no data aggregation happens at intermediate nodes. We formally formulate this problem and propose optimal and near-optimal algorithms for different topologies. In particular, for general topologies, we present two approximation algorithms with performance ratios of 2 and 1+1/(k+1), respectively.
2009 IEEE 25th International Conference on Data Engineering, 2009
Developing powerful paradigms for programming sensor networks is critical to realizing the full potential of sensor networks as collaborative data processing engines. In this article, we motivate and develop a deductive framework for programming sensor networks, extending the prior vision of viewing a sensor network as a distributed database. The deductive programming approach is declarative, very expressive, and amenable to automatic optimizations. Such a framework allows users to program sensor network applications at a high level without worrying about tedious low-level details. Our system translates a given deductive program into efficient distributed code that runs on individual nodes. To facilitate this translation, we develop techniques for distributed and asynchronous evaluation of deductive programs in sensor networks. Our techniques generalize to recursive programs without negations, arbitrary nonrecursive programs with negations, and in general to arbitrary "locally non-recursive" programs with function symbols. We present performance results on TOSSIM, a network simulator, and on a small network testbed.
IEEE Transactions on Knowledge and Data Engineering, 2005
Abstract. A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be ...
A data warehouse collects and integrates data from multiple, autonomous, heterogeneous sources. The warehouse effectively maintains one or more materialized views over the source data. In this paper we describe the architecture of the Whips prototype system, which collects, transforms, and integrates data for the warehouse. We show how the required functionality can be divided among cooperating distributed CORBA objects, providing both scalability and the flexibility needed for supporting different application needs and heterogeneous sources. The Whips prototype is a functioning system implemented at Stanford University, and we provide preliminary performance results.
2006 IEEE International Conference on Communications, 2006
Abstract. Directional antennas in wireless mesh networks can improve spatial reuse. However, using them effectively needs specialized protocol support at the MAC layer, which is not always practical. In this work, we present a topology control approach to effectively ...
2006 IEEE International Conference on Communications, 2006
In recent years, with the advent of wireless technology and file sharing applications, the traditional client-server model has begun to lose its prominence. Instead, information sharing by spontaneously connected nodes has emerged as a new framework. In such networks, all network nodes ...
A sensor network is a wireless ad hoc network of resource-constrained sensor nodes. In this article, we address the problem of communication-efficient implementation of the SQL "join" operator in sensor networks. We design an optimal join-implementation algorithm that provably incurs minimum communication cost under certain reasonable assumptions. In addition, we design a much faster suboptimal heuristic that empirically delivers a near-optimal solution. We evaluate the performance of our designed algorithms through extensive simulations.
In this paper, we address an optimization problem that arises in the context of cache placement in sensor networks. In particular, we consider the cache placement problem where the goal is to determine a set of nodes in the network to cache/store the given data item, such that ...
Views stored in a data warehouse need to be kept current. As recomputing the views is very expensive, incremental maintenance algorithms are required. Over recent years, several incremental maintenance algorithms have been proposed. None of the proposed algorithms handle the general case of relational expressions involving aggregate and outerjoin operators efficiently.
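As a rough illustration of what incremental maintenance means here, the following Python sketch keeps a grouped COUNT/SUM view current by applying only the inserted and deleted base rows instead of recomputing the view. It is a simplified assumption-laden example, not the algorithm the abstract refers to: it handles a single group-by aggregate, not outerjoins or general relational expressions, and the class name `SumCountView` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (group key, measure value); hypothetical schema


class SumCountView:
    """Materialized view: SELECT key, COUNT(*), SUM(value) ... GROUP BY key."""

    def __init__(self) -> None:
        # key -> [count, total]; the count tells us when a group disappears.
        self._state: Dict[str, List[float]] = {}

    def apply(self, inserted: Iterable[Row] = (),
              deleted: Iterable[Row] = ()) -> None:
        """Fold a batch of base-table changes into the view."""
        for key, value in inserted:
            entry = self._state.setdefault(key, [0, 0])
            entry[0] += 1
            entry[1] += value
        for key, value in deleted:
            entry = self._state.get(key)
            if entry is None:
                raise KeyError(f"delete for unknown group {key!r}")
            entry[0] -= 1
            entry[1] -= value
            if entry[0] == 0:
                # Last row of the group was removed: drop the group.
                del self._state[key]

    def rows(self) -> Dict[str, Tuple[float, float]]:
        return {k: (c, t) for k, (c, t) in self._state.items()}


if __name__ == "__main__":
    view = SumCountView()
    view.apply(inserted=[("east", 10), ("east", 5), ("west", 7)])
    view.apply(deleted=[("west", 7)])
    print(view.rows())
```

Keeping the count alongside the sum is what makes deletions cheap in this sketch; aggregates such as MIN or MAX would need extra state or a partial recomputation.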
International Conference on Network Protocols, 2007
Radio frequency identification (RFID) is a technology where a reader device can "sense" the presence of a nearby object by reading a tag device attached to the object. To improve coverage, multiple RFID readers can be deployed in the given region. In this paper, we consider the problem of slotted scheduled access of RFID tags in a multiple reader environment.
Papers by Himanshu Gupta