
A URL SHORTENING MICROSERVICE
AUGUST 2017

AUTHOR:
Varun Joshi
SUPERVISORS:
Pedro Ferreira, Bruno Silva De Souza
IT-CDA-IC
CERN openlab Report // 2017

EXECUTIVE SUMMARY

This project provides a URL shortening REST API that supports custom URLs, callbacks to analytics
services, authentication and easy deployment.

It consists of an API and a redirection service. The API has two endpoints: tokens and URLs. The
tokens endpoint is used for authentication management and the URLs endpoint is used for adding
and changing short URLs.

The service has a Docker/OpenShift template, is documented using Swagger and has extensive unit
tests.

A URL SHORTENING SERVICE
CERN openlab Report // 2017

ABSTRACT

This report describes the design and implementation of a URL shortening service developed using
Python and Flask-apispec. The service provides an API for adding and removing URLs. It also
supports custom metadata and callbacks to analytics services.


TABLE OF CONTENTS

PROJECT SPECIFICATION

PROJECT STRUCTURE

API

TOKENS

URLs

OTHER FEATURES

SWAGGER DOCUMENTATION

OPENSHIFT TEMPLATE

UNIT TESTS

CONCLUSIONS

FUTURE WORK

ACKNOWLEDGEMENTS


1. PROJECT SPECIFICATION

The Indico team at CERN needs the ability to shorten URLs for easier sharing. The Web Frameworks
section has the same need for maintaining the go.web.cern.ch service. This project therefore aims to
create a URL shortening REST API microservice that can be used by any application. The broad
requirements for the project were:

1. The ability to shorten URLs to a predefined length;
2. The ability to create custom short URLs;
3. Support for custom metadata formats;
4. Support for webhooks/callback URLs;
5. Support for tokens for authentication/authorization;
6. Easy deployment to support multiple domains.

The project deliverables are therefore:

- The API, including a token management service;
- The redirection service, which redirects the user to the original URL;
- Swagger documentation [1];
- A Docker/OpenShift template capable of running the application on a PaaS container-based
infrastructure.

To implement the deliverables, the following technologies were used:

• Python;
• the Flask microframework;
• the Flask-apispec library;
• Docker and OpenShift, for deployment.

2. PROJECT STRUCTURE

Figure 1.1 The Project Structure

[1] http://swagger.io


The microservice will be used by client applications such as Indico and go.cern.ch, which call the API to
create and maintain shortcuts. Each service can run its own instance of the URL shortening microservice
and therefore use its own domain.

Clients call the API to create shortcuts or to view shortcuts that have already been created.

End users who follow a shortcut are handled by the redirection service, which sends them to the long URL
that the shortcut points to. The redirection service can also optionally call back an analytics service
such as Piwik to keep track of usage statistics.
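The redirection flow can be sketched as a small, framework-free Python function. This is a minimal sketch, not the project's implementation. The in-memory `SHORTCUTS` dict, the `redirect()` function, its `(status, headers, callback)` return value, the POST method and the `{"user_agent": ...}` callback body are all hypothetical. The callback request is only built here, never sent.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShortURL:
    url: str                     # long URL the user is redirected to
    callback_url: Optional[str]  # analytics endpoint of the owning token, if any


# Hypothetical in-memory store: shortcut -> ShortURL
SHORTCUTS = {
    "XkZ8S": ShortURL("http://www.google.com", "http://analytics.example.org/hit"),
}


def build_callback(callback_url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a POST whose JSON body carries the visitor's user agent."""
    body = json.dumps({"user_agent": user_agent}).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def redirect(shortcut: str, user_agent: str):
    """Resolve a shortcut and return (status, headers, callback request or None)."""
    entry = SHORTCUTS.get(shortcut)
    if entry is None:
        return 404, {}, None
    callback = None
    if entry.callback_url:
        callback = build_callback(entry.callback_url, user_agent)
    # 302 sends the browser to the long URL.
    return 302, {"Location": entry.url}, callback
```

In a real deployment the callback would probably be sent asynchronously, so that a slow analytics service does not delay the redirect. That is a design suggestion, not something the report specifies.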

3. API

The API lets client applications such as Indico or go.cern.ch create new short URLs, and it stores
them. It has two major endpoints:

a. Tokens

Tokens are used for authentication and authorization. Tokens have several properties:

a. API Key: a UUID that is used as the authentication token for all API calls.
b. Name: this is a unique identifier used to determine the creator of the token.
c. Last Access: shows the date and time when the token was used last; useful for identifying
dormant accounts.
d. Callback URL: points to the analytics service that the client application wants to use. This URL is
called with the user agent of the request as payload every time the user clicks a short URL.
e. Uses: The number of times a given token has been used. Can be used to retire tokens.

Tokens can be of two types: admin and non-admin. Admin tokens can create and modify other tokens and
the URLs owned by those tokens. Non-admin tokens can only look at and modify their own URLs.

The JSON payload format for this endpoint is generally of the form:

{
  "api_key": "077cc752-98b1-4c39-8397-95d826fd776c",
  "callback_url": null,
  "is_admin": true,
  "is_blocked": false,
  "last_access": "2017-08-18T10:17:13.071477+00:00",
  "name": "god",
  "token_uses": 2
}
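The token properties and the admin/non-admin rule above can be modelled in a few lines of Python. This is a hypothetical sketch. The `Token` dataclass, `authenticate()` and `can_manage_url()` are illustrative names, and how the API key travels with each request is left open. `to_dict()` produces a payload with the same keys as the JSON shown above.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Token:
    name: str                                   # unique identifier of the token's creator
    is_admin: bool = False
    is_blocked: bool = False
    callback_url: Optional[str] = None          # analytics endpoint, if any
    api_key: str = field(default_factory=lambda: str(uuid.uuid4()))
    last_access: Optional[datetime] = None
    token_uses: int = 0

    def to_dict(self) -> dict:
        """Serialize with the same keys as the /tokens/ JSON payload."""
        return {
            "api_key": self.api_key,
            "callback_url": self.callback_url,
            "is_admin": self.is_admin,
            "is_blocked": self.is_blocked,
            "last_access": self.last_access.isoformat() if self.last_access else None,
            "name": self.name,
            "token_uses": self.token_uses,
        }


def authenticate(tokens: Dict[str, Token], api_key: str) -> Optional[Token]:
    """Return the token for api_key and record the access; None if unknown or blocked."""
    token = tokens.get(api_key)
    if token is None or token.is_blocked:
        return None
    token.last_access = datetime.now(timezone.utc)  # helps spot dormant accounts
    token.token_uses += 1
    return token


def can_manage_url(token: Token, url_owner_key: str) -> bool:
    """Admin tokens may manage any URL; other tokens only the URLs they own."""
    return token.is_admin or token.api_key == url_owner_key
```

Rejecting blocked tokens before counting the access is a choice made in this sketch. The report only shows that an `is_blocked` flag exists.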

The methods supported by this endpoint are:

GET /tokens/ : This method is used to view all the tokens. It can only be called by an admin token and
supports the following query parameters to filter the results:

1. is_admin
2. is_blocked
3. name
4. callback_url

GET /tokens/<api_key> : This method is used to view the details of a particular token. Can only be
called by an admin token.

POST /tokens/ : Used to create new tokens. Supports the same query parameters as GET /tokens/.

PATCH /tokens/<api_key> : Used to modify a token. Supports the same query parameters as GET
/tokens/.
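The filtering done by GET /tokens/ can be illustrated with a standard-library sketch over token payloads like the JSON above. The function names are hypothetical. The accepted boolean spellings and the decision to ignore unrecognised parameters are assumptions, not documented behaviour of the service.

```python
from typing import List
from urllib.parse import parse_qs

# Filters accepted by GET /tokens/; boolean values arrive as strings in the query.
BOOL_FILTERS = {"is_admin", "is_blocked"}
TEXT_FILTERS = {"name", "callback_url"}


def parse_bool(value: str) -> bool:
    """Interpret common query-string spellings of a boolean (assumed convention)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def filter_tokens(tokens: List[dict], query_string: str) -> List[dict]:
    """Return the token payloads that match every recognised filter in the query."""
    result = tokens
    for key, values in parse_qs(query_string).items():
        value = values[-1]  # if a key is repeated, the last value wins
        if key in BOOL_FILTERS:
            wanted = parse_bool(value)
            result = [t for t in result if t.get(key) == wanted]
        elif key in TEXT_FILTERS:
            result = [t for t in result if t.get(key) == value]
        # Unrecognised keys are ignored in this sketch.
    return result
```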

b. URLs

URLs store the short URL code and other related information. Their properties are:

a. Token: This is a pointer to the token that created the URL.
b. Metadata: This is a generic key-value store. It can hold information about the URL that is specific
to the client application, and it places no constraints on the structure of the data it contains.
c. Shortcut: The actual short URL. It can be a randomly generated string or a custom URL, and may
only contain letters, digits and '-'.
d. URL: The original URL that the user is redirected to.
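The shortcut rules above translate into a validator and a random generator. This is a sketch under assumptions. The five-character default length is only inferred from the example shortcut `XkZ8S`. "Letters" is taken to mean ASCII letters. Random shortcuts are assumed to draw from letters and digits only, while custom shortcuts may also use '-'.

```python
import re
import secrets
import string

# Custom shortcuts: ASCII letters, digits and '-' only.
SHORTCUT_RE = re.compile(r"[A-Za-z0-9-]+")
# Random shortcuts: letters and digits (assumed); the default length is hypothetical.
ALPHABET = string.ascii_letters + string.digits
DEFAULT_LENGTH = 5


def is_valid_shortcut(shortcut: str) -> bool:
    """True if the whole string consists of allowed characters."""
    return SHORTCUT_RE.fullmatch(shortcut) is not None


def generate_shortcut(existing: set, length: int = DEFAULT_LENGTH, max_tries: int = 100) -> str:
    """Draw random shortcuts until one is not already taken."""
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            return candidate
    raise RuntimeError("could not find a free shortcut; consider a longer length")
```

With 62 symbols and length 5 there are 62^5 = 916,132,832 possible shortcuts. Collisions therefore stay rare while the table is small, and the retry loop rarely needs more than one draw.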

This endpoint supports the following methods:

GET /urls/ : This method is used to view all the URLs for a given token. Supports the following query
parameters to filter the results:

1. metadata.x: filters on the metadata key x. Since metadata is a general key-value store, x can be
any key.
2. url
3. all: Boolean. When an admin token calls this method with all set to true, it returns all the URLs
matching the other filters, regardless of who owns them.
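The interaction of the metadata.x, url and all filters can be sketched over a list of URL records. This is an illustration under assumptions. Metadata is treated as an already-decoded dict, although the JSON example shows it as a string. Query values are compared as strings, and the function and parameter names are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs


def parse_flag(value: str) -> bool:
    """Treat 'true', '1' and 'yes' (any case) as true; an assumed convention."""
    return value.lower() in ("true", "1", "yes")


def filter_urls(urls: List[dict], query_string: str, api_key: str, is_admin: bool) -> List[dict]:
    """Apply the GET /urls/ filters (metadata.<key>, url, all) for the calling token."""
    params = {key: values[-1] for key, values in parse_qs(query_string).items()}
    show_all = is_admin and parse_flag(params.get("all", "false"))
    # Collect metadata filters, e.g. metadata.event=42 -> {"event": "42"}.
    prefix = "metadata."
    meta_filters = {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}

    result = []
    for entry in urls:
        # Unless an admin passes all=true, a token only sees the URLs it owns.
        if not show_all and entry["token"] != api_key:
            continue
        if "url" in params and entry["url"] != params["url"]:
            continue
        metadata = entry["metadata"]
        # Query values are strings, so compare against the stringified metadata value.
        if any(k not in metadata or str(metadata[k]) != v for k, v in meta_filters.items()):
            continue
        result.append(entry)
    return result
```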

The general format of the JSON payload is:

{
  "metadata": "{}",
  "shortcut": "XkZ8S",
  "token": "077cc752-98b1-4c39-8397-95d826fd776c",
  "url": "http://www.google.com"
}

GET /urls/<shortcut> : This method is used to view the details of a particular shortcut.

POST /urls/ : Used to create a new random shortcut. The query parameters are:

1. allow_reuse: This parameter covers two cases. For POST, if it is set to true and the long URL given
in the request already exists in the database, the shortcut currently linked to that URL is returned
instead of a new one. The second case applies to PUT /urls/<shortcut>.

PUT /urls/<shortcut> : Used to create a custom shortcut. The query parameters are:

1. allow_reuse: This is the second case. If it is set to true and the requested shortcut already exists,
the shortcut is repointed to the long URL given in the new request.
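The two meanings of allow_reuse can be summarised in code. This sketch uses a plain dict from shortcut to long URL as a stand-in for the database. The function names and the `ShortcutConflict` exception are hypothetical. Rejecting an existing custom shortcut when allow_reuse is false is an assumption, because the report does not say how that case is reported; an HTTP 409 Conflict would be a natural mapping.

```python
import secrets
import string
from typing import Dict

ALPHABET = string.ascii_letters + string.digits


class ShortcutConflict(Exception):
    """Raised when a custom shortcut is taken and reuse was not allowed."""


def create_random(store: Dict[str, str], url: str, allow_reuse: bool = False,
                  length: int = 5, max_tries: int = 100) -> str:
    """POST /urls/: reuse an existing shortcut for url if allowed, else create a random one."""
    if allow_reuse:
        for shortcut, target in store.items():
            if target == url:
                return shortcut
    for _ in range(max_tries):
        shortcut = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if shortcut not in store:
            store[shortcut] = url
            return shortcut
    raise RuntimeError("no free random shortcut found")


def put_custom(store: Dict[str, str], shortcut: str, url: str, allow_reuse: bool = False) -> str:
    """PUT /urls/<shortcut>: create a custom shortcut, or repoint it if reuse is allowed."""
    if shortcut in store and not allow_reuse:
        raise ShortcutConflict(shortcut)
    store[shortcut] = url
    return shortcut
```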

4. Other Features

a. Swagger Documentation

We used Swagger to generate API documentation automatically. Flask-apispec has built-in support for
this: it scans the Flask app object and determines the endpoints and their properties.

b. OpenShift Template

To enable deployment to OpenShift, we used the Kompose tool. Kompose converts a docker-compose
YAML file into the OpenShift templates needed for deployment.

c. Unit Tests

To verify that the application behaves as its clients expect, we used the pytest testing framework.

5. Conclusions

The deliverables listed in the introduction of this report have been successfully implemented. The
code is available at https://github.com/ThiefMaster/ursh, and a test instance has been set up at
http://test-ursh.web.cern.ch. The Web Frameworks and Indico teams at CERN can use it right away to
shorten URLs and keep track of them.

a. Future Work

The following features are desirable and can be implemented to increase the utility of the project:

1. Health Check: To increase reliability and reduce storage costs, a scheduled task can be used to
routinely check if URLs in the database return 2xx HTTP codes.
2. Pagination: The API is currently not paginated. It might be a useful feature to have for clients with
many URLs.
3. Logging URL Lifecycle: We currently rely on application logs to determine events such as the
creation, update or deletion of a URL. It might be useful to store these events, along with their
history, in a separate table.
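The URL lifecycle item could be realised with an append-only event table next to the URL table. The schema below uses SQLite from the Python standard library purely for illustration. The table name `url_events`, its columns and the helper functions are hypothetical and not part of the project.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS url_events (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    shortcut  TEXT NOT NULL,
    event     TEXT NOT NULL CHECK (event IN ('created', 'updated', 'deleted')),
    url       TEXT,           -- long URL after the event (NULL for deletions)
    api_key   TEXT NOT NULL,  -- token that performed the change
    timestamp TEXT NOT NULL   -- ISO 8601, UTC
)
"""


def record_event(conn: sqlite3.Connection, shortcut: str, event: str,
                 url: Optional[str], api_key: str) -> None:
    """Append one lifecycle event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO url_events (shortcut, event, url, api_key, timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (shortcut, event, url, api_key, datetime.now(timezone.utc).isoformat()),
    )


def history(conn: sqlite3.Connection, shortcut: str) -> List[Tuple[str, Optional[str]]]:
    """Return (event, url) pairs for a shortcut in insertion order."""
    rows = conn.execute(
        "SELECT event, url FROM url_events WHERE shortcut = ? ORDER BY id",
        (shortcut,),
    )
    return rows.fetchall()
```

Keeping the table append-only means the full history of a shortcut survives even after the URL row itself is deleted.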


6. Acknowledgements

I would like to thank my mentors, Pedro and Bruno, for making sure that I had help at every stage.
Adrian Moennich was also instrumental in the creation of this project: he reviewed almost all of the
code written and made sure it was up to standard.
