Windows Azure™ Marketplace Datamarket: Published
Published: October 2010
Applies to
Windows Azure Marketplace DataMarket
Summary
Windows Azure Marketplace features DataMarket, a new cloud-based service that provides a global
marketplace for information including data, web services, and analytics. With DataMarket, content providers
can make their datasets available to a wide audience around the world, subscribers can locate a dataset that
addresses their needs through rich discovery, and developers can write code to consume the datasets on any
platform.
Copyright
This is a preliminary document and may be changed substantially prior to final commercial release of the
software described herein.
The information contained in this document represents the current view of Microsoft Corporation on the
issues discussed as of the date of publication. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft
cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting
the rights under copyright, no part of this document may be reproduced, stored in, or introduced into a
retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying,
recording, or otherwise), or for any purpose, without the express written permission of Microsoft
Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these patents,
trademarks, copyrights, or other intellectual property.
Microsoft, Windows Azure Marketplace, Access, Active Directory, Excel, IntelliSense, Microsoft Dynamics,
SharePoint, SQL Azure, SQL Server, Visual Studio, Windows, Windows Live, and Windows Server are
trademarks of the Microsoft group of companies.
Contents
Introduction
Key Features of DataMarket
Richer Analytics
Integration with Information Worker Applications
Developers
Data Mash-Ups
Independent Software Vendors
Summary
Explore DataMarket Today!
Introduction
The internet is a source of vast quantities of data, both public and commercial content. Many
organizations publish datasets in a wide variety of disparate formats, to which customers can
subscribe. However, it can be difficult for customers to locate and subscribe to these datasets.
Furthermore, it can be challenging to use these datasets in ways that add value.
Consider a business that has identified a need for a specific type of data, such as customers and
their buying habits, products from suppliers, geographical information, population statistics,
scientific research, political statistics, or entertainment information. An internet search will locate
several competing data suppliers. But how does the customer make a fair and direct comparison
of the dataset features to select the one most suitable?
And this is just the beginning. After the company has located and chosen a suitable dataset, how
do they integrate it into their business? The fact is, data is often available in a wide variety of
formats. For example, many publishers use XML, but define their own schema, and may use
SOAP, REST, or JSON to exchange information. As a result, the business must devote
development time to integrate the dataset into its desktop applications, web sites, cloud
applications, and any other data-consuming software. This issue is multiplied across every single
dataset that the company acquires from various sources.
Users gain hands-on experience with a dataset only after it has been integrated into the
company's systems. Poor-quality data, if present, becomes obvious only at this point, by which
time the purchase and development costs have already been spent. And although many dataset
suppliers promise a certain level of availability through their Service Level Agreements (SLAs),
some suppliers are over-ambitious and may not meet their obligations.
Auditing and billing can also be a problem, because each data publisher is likely to bill using
different criteria, which may not suit the subscriber's use. For example, a monthly subscription
may be expensive if a dataset is used exclusively by a small department. The company may
subscribe to it because the data is essential, even though it pays the same price as a customer
who generates ten times the number of queries. Furthermore, a publisher may not provide
statistics regarding data use. If a company wants to know its dataset usage, it may have to
develop its own code to keep track.
And finally, using that dataset in conjunction with other data sources can be problematic. Can it
be mashed up and associated with other data? Can its semantics be easily augmented so that
data can be flexibly associated and correlated? All of these problems are multiplied when an
organization subscribes to many different datasets from different suppliers.
Windows Azure Marketplace DataMarket can help resolve these issues because it allows
developers and information workers to easily discover, purchase, and manage premium data
subscriptions on any platform. Essentially, DataMarket is an information marketplace that brings
data, imagery, and real-time web services from leading commercial data providers and
authoritative public data sources together into a single location. It offers a unified provisioning
and billing framework. In addition, the Marketplace provides OData APIs for accessing data,
so developers and information workers can consume this premium content using virtually any
platform, application, or business workflow.
This paper describes the most important features of DataMarket and how they address common
business needs. It also outlines the common business scenarios that DataMarket addresses and
describes the system's architecture.
One of the real benefits that DataMarket provides is consistency, from the way datasets are
described, to the method in which subscriptions are managed. It handles all usage tracking and
billing so that providers can easily reach new consumers, and subscribers can view all their usage
in a single location. As a result, billing is more flexible, whether subscribers choose
pay-as-you-go transactions, monthly subscriptions, or even enterprise volume licensing. And
when it comes to integrating data into business applications, developers can use the same
techniques and similar code with subsequent subscriptions, because of the consistent
presentation of data and the ability to automatically generate new proxy classes.
Figure 1: The DataMarket Catalog
Furthermore, because DataMarket is built on Windows Azure™ and runs in industry-leading data
centers, you won’t need to make heavy investments in hardware. The service provides almost
unlimited scalability and can guarantee high availability. And when you need to increase the size
of your dataset, DataMarket scales smoothly with your requirements.
web services. You no longer have to provide e-commerce functionality such as shopping baskets,
check-out tools, and invoicing because DataMarket does that for you, with high security and
availability. In addition, subscribers trust the data found in DataMarket, because they know they
will get high-quality data and excellent service. DataMarket can even broker data from any
source, whether it’s found in Windows Azure Storage and SQL Azure™ databases or third-party
clouds and private data centers.
The information available in DataMarket will only continue to grow and diversify. Furthermore,
some information sets are published on a commercial basis and others are free, such as public
domain data from federal and state governments and free trials to commercial content.
For subscribers, the unified billing infrastructure means that tracking data usage and predicting
bills is simple, even when they use many subscriptions with multiple content partners. Microsoft
handles it all.
At the same time, content partners won’t need custom billing and invoicing systems. Instead,
they get a versatile and powerful system that supports multiple tenants straight out of the box.
Microsoft handles fulfillment, and DataMarket tracks all customer access and provides detailed
reports.
With DataMarket, you can also create several different subscription models for a single dataset.
For example, you could create a free subscription with partial access and a premium subscription
with full access to all data. You can also control how queries are performed on a web service or
structured dataset, and even control what is returned, all from a visual interface with no coding
required. In addition, you can use this visual interface to set pricing, terms of use, marketplace
descriptions, samples, and more.
Furthermore, we encourage content providers to tag the data and supply semantic hints to
application developers and information workers. These tags and hints let clients logically
combine and join disparate datasets, which extends the power of the datasets.
Richer Analytics
DataMarket offers the ability to enrich existing analytics, helping content providers extend the
power of their datasets. In fact, you can become a content partner even if you have no data to
publish, simply by creating reports and analyses of the detailed data from other providers. Or you
can simply build and consume reports for your own purposes. After reports are created, they can
be bought and sold in the same way as datasets, allowing individuals with expertise in particular
domains to deliver rich experiences to consumers and information workers.
Furthermore, you can create mash-ups: reports that analyze data from multiple datasets,
including datasets from other content providers. The ecosystem ensures that content providers
are paid for their assets and that ISVs and report authors generate revenue from
supplying domain knowledge. For example, you could create a report that analyzes your
organization's sales data in the light of weather records, such as how a cold winter affected your
clothing sales and how you can capitalize on such events in the future.
Integration with Information Worker Applications
DataMarket integrates with desktop applications smoothly and is an easy way to improve
productivity. For example, a dataset could add information to the Microsoft Office Word
Research task pane. In Microsoft Office Excel®, data from DataMarket could enrich pivot tables
and provide extra insights into business data. Reports in Microsoft Office Access® or Microsoft
SQL Server® can mash up data from the local database with DataMarket information. You can
even use the DataMarket Add-in for Excel to discover, purchase, and use DataMarket datasets
without ever leaving the familiar Excel environment—and then integrate your data with
PowerPivot for Excel for rich, self-service business intelligence and Bing™ maps to use spatial
datasets for quick, visual instant answers.
Service Explorer is incredibly useful for developers who build cross-platform applications because
it creates URLs that they can copy and paste into their application code to call the Web service.
Developers can use OData URIs to connect to the datasets and consume them in their
applications. In addition, they can use the “Add Service Reference” capability in Visual Studio to
generate proxy classes. The secure, REST-based OData APIs provide an abstraction over where the
data resides—whether it’s a remote web service, a blob store, a rich SQL database, or content in
the Azure platform.
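As a rough illustration of what consuming a dataset through an OData URI can look like, the following Python sketch composes a query URI from the standard OData system query options ($filter, $top, $skip). The service root, entity set name, and property name are hypothetical placeholders, not real DataMarket addresses; in practice you would start from a URL generated by Service Explorer.

```python
from urllib.parse import quote, urlencode

# Hypothetical service root and entity set (placeholders, not real
# DataMarket addresses). Real values come from the dataset's own pages.
SERVICE_ROOT = "https://example.invalid/ExampleProvider/ExampleDataset"
ENTITY_SET = "CityStatistics"


def build_odata_uri(service_root, entity_set, filter_expr=None, top=None, skip=None):
    """Compose an OData query URI from standard system query options."""
    options = {}
    if filter_expr is not None:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = str(top)
    if skip is not None:
        options["$skip"] = str(skip)

    uri = f"{service_root.rstrip('/')}/{quote(entity_set)}"
    if options:
        # Leave "$" and "'" unescaped; percent-encode spaces and other characters.
        uri += "?" + urlencode(options, safe="$'", quote_via=quote)
    return uri


# "State" is an assumed property name used only for illustration.
uri = build_odata_uri(SERVICE_ROOT, ENTITY_SET, filter_expr="State eq 'WA'", top=10)
print(uri)
```

Because the query options are the same for every OData service, a helper like this could be reused unchanged across subscriptions; only the service root and entity set would differ.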
Service Explorer also works well for information workers. For instance, they can download a
PowerPivot file that enables rich data analysis within Excel.
Typical Scenarios
DataMarket improves the discovery and acquisition of content in a vast variety of business and
non-commercial scenarios. A few examples are discussed below.
Developers
DataMarket helps developers make the most of the rich data in its catalog at every stage of
development. In the beginning, you can take out a trial subscription to some datasets to identify
the most appropriate content for enabling the application and ensuring that it meets the
customer's needs. Then, you can visually explore the content in the browser-based Service
Explorer tool, submitting queries and previewing results.
When you are sure you have the right dataset, DataMarket assists you as you build your
application. The Service Explorer tool can return results in Atom 1.0 or raw formats for use as
sample data and generates URLs to the queries you run. You can copy and paste these URLs into
your code to call the service. Most importantly, you can also download automatically generated
C# proxy classes. When you import these into your application, you have strongly-typed access
with full IntelliSense support to ease development.
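The generated proxy classes are C#, but the same idea (turning an Atom 1.0 result into strongly typed objects) can be sketched on other platforms. The Python example below is a minimal sketch, not the generated proxy code: it parses a small hand-written Atom feed in the OData property layout into typed records. The entity name, property names, and values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespaces used by Atom 1.0 and by OData (v1/v2) Atom payloads.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Hand-written sample feed with invented entity, property names, and values.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <title type="text">CityStatistics</title>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:City>Alpha</d:City>
        <d:Population m:type="Edm.Int32">1000</d:Population>
      </m:properties>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:City>Beta</d:City>
        <d:Population m:type="Edm.Int32">2500</d:Population>
      </m:properties>
    </content>
  </entry>
</feed>"""


@dataclass
class CityStatistic:
    """Typed view of one feed entry (hypothetical schema)."""
    city: str
    population: int


def parse_feed(xml_text):
    """Convert each Atom entry's OData properties into a typed record."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("atom:entry", NS):
        props = entry.find("atom:content/m:properties", NS)
        records.append(
            CityStatistic(
                city=props.findtext("d:City", namespaces=NS),
                population=int(props.findtext("d:Population", namespaces=NS)),
            )
        )
    return records


records = parse_feed(SAMPLE_FEED)
for record in records:
    print(record.city, record.population)
```

A hand-written mapping like this is what the generated proxy classes save you from maintaining; it is shown here only to make the shape of the Atom results concrete.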
DataMarket Application Programming Interfaces (APIs) help developers work with datasets in the
same way on many different platforms. Because the APIs are consistent, you can quickly develop
code to support desktop, Web, mobile, and other clients. And because DataMarket is built on
the REST architecture and its services fully support the Open Data Protocol (OData)
specification, high-quality data is simple to discover.
Data Mash-Ups
A mash-up is any application or visualization that combines data from more than one source to
provide a new experience. On the internet, for example, data from a Web service could be
combined with a mapping tool, such as Microsoft Bing™ maps, to provide a geographical view
that is not possible with the Web service alone. Such a mash-up often makes hidden trends plain,
such as geographical clusters of events that are impossible to spot from zip codes.
With DataMarket, data presentation is consistent, which means that creating mash-ups is fast and
requires only a small amount of development time. As you explore the catalog, simple but
insightful mash-ups become obvious, and as more datasets are added, the possibilities
will multiply. And by featuring tags to aid semantic analysis, DataMarket makes associations and
mash-ups easier than ever.
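To make the idea of a mash-up concrete, here is a minimal Python sketch that joins two small in-memory datasets on a shared key, in the spirit of relating sales to weather records. All rows, field names, and values are invented for illustration; a real mash-up would read both sides from the subscribed datasets.

```python
# Invented sample rows standing in for two independently published datasets.
sales_rows = [
    {"zip": "98101", "month": "2010-01", "coat_sales": 120},
    {"zip": "98101", "month": "2010-02", "coat_sales": 80},
    {"zip": "10001", "month": "2010-01", "coat_sales": 200},
]
weather_rows = [
    {"zip": "98101", "month": "2010-01", "avg_temp_c": 2.0},
    {"zip": "98101", "month": "2010-02", "avg_temp_c": 6.5},
]


def mash_up(sales, weather):
    """Join sales and weather rows that share the same (zip, month) key."""
    weather_by_key = {(w["zip"], w["month"]): w for w in weather}
    combined = []
    for s in sales:
        w = weather_by_key.get((s["zip"], s["month"]))
        if w is not None:
            # Keep only rows present in both datasets (an inner join).
            combined.append({**s, "avg_temp_c": w["avg_temp_c"]})
    return combined


combined = mash_up(sales_rows, weather_rows)
for row in combined:
    print(row["zip"], row["month"], row["coat_sales"], row["avg_temp_c"])
```

The join only works because both datasets expose the same key fields in the same form, which is the kind of consistency that shared tags and a uniform data presentation are meant to provide.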
tenants to datasets and ensure that one customer's usage does not contend with other
customers.
DataMarket addresses all these issues because it is a unified data marketplace with consistent
APIs, billing infrastructure, and multi-tenant support, and it runs in market-leading data centers
that guarantee robustness. It represents a great way to capitalize on the quality of the dataset in
which you have invested.
dashboards, and mobile phone apps, all of which are supported. As a result, the developer can
deliver much greater insight—without large overhead.
Because DataMarket provides a unified and scalable billing infrastructure, vendors do not need
to solve these challenges for themselves. Instead, subscribers can easily compare your data with
competitors' so they can determine suitability and quality. Then, they can find their usage
statistics instantly and predict costs in advance. Ultimately, they have more confidence in the
payment system.
Architectural Overview
The following sections illustrate the design of the DataMarket service.
Figure 5: DataMarket components are shown in blue.
In Figure 5, data consumers of various kinds are found on the left. Notice that DataMarket can be
accessed from many platforms and that Office applications and client-server systems, such as
SQL Server and Microsoft Dynamics® Servers, can use its data. DataMarket also supports any
operating system or hardware platform, because data access uses REST and Atom
1.0 standards and is secured with Secure Sockets Layer (SSL).
The Front End Windows Azure (FEWA) load balancer is a key component because it ensures full
use of the data center and rapid query response. It also insulates users from changes in the server
infrastructure and makes sure that DataMarket scales seamlessly.
At the bottom of Figure 5, you can see the components of DataMarket APIs and marketplace.
Notice that users can authenticate with their Windows Live® ID or with Access Control Services
(ACS). ACS includes identity federation and delegation facilities that you could use, for example,
to integrate your Active Directory® with DataMarket. In this way, users can access DataMarket
through their usual Microsoft Windows® user account. Notice also that DataMarket tracks all
access and generates invoices, recording both in logging and accounts databases.
Publication Architecture
On the right of Figure 6 are the data stores. Because DataMarket runs in the cloud, Windows
Azure and SQL Azure make ideal data stores. However, it is important to note that cloud services
and data centers from third parties can be used just as easily and are supported via proxy layers
that conform to the DataMarket SLA and interfaces.
DataMarket includes Data Access Layers (DALs) that encapsulate all the logic required to query
the data store and remote web services. Please note that load balancers are built into DataMarket
on the publication side as well as the data access side, which provides smooth scaling to large
data centers and heavy traffic.
Windows Azure or SQL Azure data stores are natural choices because of their high availability
and resilience. However, DataMarket can also include datasets that use third-party cloud services
or data centers for storage. To ensure the quality of service in these situations, we investigate
Service Level Agreements, load balancing, availability, bandwidth, failover, and other fault-
tolerance features.
For public domain content, the provider must be the authoritative source of the curated content.
For commercial content providers, the organization must have the right to sell the content in
Microsoft's supported markets and be in the top five (by annual sales) of their industry or vertical.
Content providers that are requested by the ISV and Information Worker community will also be
encouraged to apply to onboard their content to DataMarket.
Conclusion
Summary
DataMarket helps simplify all the steps associated with discovering, exploring, and acquiring
information. It helps content providers reduce the challenges of marketing and selling their
high-quality services and datasets, just as it helps consumers ensure they get quality data that is secure
and easy to use.
Content providers get:
Easy publication of data, whether it is blob data, structured data, or dynamic Web services.
Developer tooling on the Microsoft platform to ease Visual Studio and .NET development.
An easy way to get your content to Microsoft’s global developer and information worker
community.
A scalable Microsoft cloud computing platform that handles delivery, billing, and reporting.
Developers get:
Trial subscriptions that let you investigate content and develop applications without paying
data royalties.
Simple transaction and subscription models that support pay-as-you-grow access to
multi-million-dollar datasets.
Consistent REST-based OData APIs across all datasets that facilitate development on any
platform.
The Service Explorer tool, which you can use to visually build and explore APIs and preview
results.
Automatic C# proxy classes that eliminate the need to write long XML and Web service
code.