Report of URL Shortening
MICROSERVICE
AUGUST 2017
AUTHOR:
Varun Joshi
SUPERVISORS:
Pedro Ferreira, Bruno Silva De Souza
IT-CDA-IC
CERN openlab Report // 2017
EXECUTIVE SUMMARY
This project delivers a URL shortening REST API that supports custom URLs, callbacks to analytics
services, authentication and easy deployment.
It consists of an API and a redirection service. The API has two endpoints: tokens and URLs. The
tokens endpoint is used for authentication management and the URLs endpoint is used for adding
and changing short URLs.
The service has a Docker/OpenShift template, is documented using Swagger and has extensive unit
tests.
A URL SHORTENING SERVICE
ABSTRACT
This report describes the design and implementation of a URL shortening service developed using
Python and Flask-apispec. The service exposes an API for adding and removing URLs. It also
supports custom metadata and callbacks for analytics.
TABLE OF CONTENTS
PROJECT SPECIFICATION
PROJECT STRUCTURE
API
  TOKENS
  URLs
OTHER FEATURES
  SWAGGER DOCUMENTATION
  OPENSHIFT TEMPLATE
  UNIT TESTS
CONCLUSIONS
  FUTURE WORK
ACKNOWLEDGEMENTS
1. PROJECT SPECIFICATION
The Indico team at CERN needs the ability to shorten URLs for easier sharing. The Web Frameworks
section has the same need for maintaining the go.web.cern.ch service. Hence, this project aims to
create a URL shortening REST API microservice that can be used by any application. The broad
requirements for the project were:
• Python
• Flask microframework
• Flask-apispec library
• Docker and OpenShift for deployment
2. PROJECT STRUCTURE
The microservice will be utilized by clients like Indico and go.cern.ch. They will use the API to create and
maintain shortcuts. Every service can have its own instance of the URL shortening microservice and
hence use its own domain.
The clients call the API to create shortcuts or to view the shortcuts that have already been created.
Shortcut users reach the long URL behind a shortcut through the redirection service. The
redirection service can also optionally call back an analytics service like Piwik to keep track of usage
statistics.
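The redirection flow described above can be sketched as follows. This is a minimal illustration, not the project's code: the in-memory `SHORTCUTS` table, the `resolve` function, the injected `notify` callable, the analytics URL and the choice of HTTP 302 are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the database: shortcut -> (long URL, callback URL).
SHORTCUTS: Dict[str, Tuple[str, Optional[str]]] = {
    "XkZ8S": ("http://www.google.com", "http://analytics.example/hit"),
}


def resolve(
    shortcut: str,
    user_agent: str,
    notify: Callable[[str, dict], None],
) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target) for one visit to a shortcut."""
    entry = SHORTCUTS.get(shortcut)
    if entry is None:
        return 404, None
    long_url, callback_url = entry
    if callback_url is not None:
        # The callback is optional; the report states that it receives
        # the user agent of the request as payload.
        notify(callback_url, {"user_agent": user_agent})
    # 302 is an assumed status code; the report does not name one.
    return 302, long_url
```

Passing `notify` in as a parameter keeps the sketch free of network calls; a real deployment would send an HTTP request to the analytics service at that point.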
3. API
The API is responsible for letting client applications like Indico or go.cern.ch create new short URLs and
for storing them. It has two major endpoints:
a. Tokens
Tokens are used for authentication and authorization. Tokens have several properties:
a. API Key: a UUID that is used as the authentication token for all API calls.
b. Name: this is a unique identifier used to determine the creator of the token.
c. Last Access: shows the date and time when the token was used last; useful for identifying
dormant accounts.
d. Callback URL: points to the analytics service that the client application wants to use. This URL is
called with the user agent of the request as payload every time the user clicks a short URL.
e. Uses: The number of times a given token has been used. Can be used to retire tokens.
Tokens can be of two types: admin and non-admin. Admin tokens can create and modify other tokens and
the URLs owned by those tokens. Non-admin tokens can only look at and modify their own URLs.
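The admin and non-admin rules above amount to two small permission checks. The sketch below states them in code; the `Token` class and both function names are hypothetical and only mirror the rules as the report describes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical, reduced view of a token."""
    api_key: str
    name: str
    is_admin: bool = False


def can_manage_tokens(actor: Token) -> bool:
    # Only admin tokens can create and modify other tokens.
    return actor.is_admin


def can_modify_url(actor: Token, owner_api_key: str) -> bool:
    # Admins may touch any URL; everyone else only their own.
    return actor.is_admin or actor.api_key == owner_api_key
```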
The JSON payload format for this endpoint is generally of the form:
{
  "api_key": "077cc752-98b1-4c39-8397-95d826fd776c",
  "callback_url": null,
  "is_admin": true,
  "is_blocked": false,
  "last_access": "2017-08-18T10:17:13.071477+00:00",
  "name": "god",
  "token_uses": 2
}
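A client receiving this payload might decode it as sketched below. The `TokenInfo` class and `parse_token` function are illustrative names, not part of the service; the field names are taken from the payload shown above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TokenInfo:
    api_key: str
    name: str
    is_admin: bool
    is_blocked: bool
    token_uses: int
    last_access: Optional[datetime]
    callback_url: Optional[str]


def parse_token(raw: str) -> TokenInfo:
    """Decode one token payload into a typed object."""
    data = json.loads(raw)
    last_access = data.get("last_access")
    return TokenInfo(
        api_key=data["api_key"],
        name=data["name"],
        is_admin=data["is_admin"],
        is_blocked=data["is_blocked"],
        token_uses=data["token_uses"],
        # The timestamp is ISO 8601 with a UTC offset.
        last_access=datetime.fromisoformat(last_access) if last_access else None,
        callback_url=data.get("callback_url"),
    )


EXAMPLE = """{
  "api_key": "077cc752-98b1-4c39-8397-95d826fd776c",
  "callback_url": null,
  "is_admin": true,
  "is_blocked": false,
  "last_access": "2017-08-18T10:17:13.071477+00:00",
  "name": "god",
  "token_uses": 2
}"""
```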
GET /tokens/ : This method is used to view all the tokens. Can only be called by an admin token.
Supports the following query parameters to filter the results:
1. is_admin
2. is_blocked
3. name
4. callback_url
GET /tokens/<api_key> : This method is used to view the details of a particular token. Can only be
called by an admin token.
POST /tokens/ : Used to create new tokens. Supports the same query parameters as GET /tokens/.
PATCH /tokens/<api_key> : Used to modify a token. Supports the same query parameters as GET
/tokens/.
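The filtering behind GET /tokens/ can be pictured as an exact-match filter over the four supported fields. The sketch below works on plain dictionaries in memory; `filter_tokens` and `FILTERABLE` are hypothetical names, and the real service presumably filters in the database instead.

```python
from typing import Any, Dict, Iterable, List

# The four query parameters that GET /tokens/ accepts as filters.
FILTERABLE = ("is_admin", "is_blocked", "name", "callback_url")


def filter_tokens(tokens: Iterable[Dict[str, Any]], **filters: Any) -> List[Dict[str, Any]]:
    """Keep the tokens whose fields equal every given filter value."""
    unknown = set(filters) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    return [
        token
        for token in tokens
        if all(token.get(key) == value for key, value in filters.items())
    ]
```

Calling `filter_tokens(tokens, is_admin=True, is_blocked=False)` would keep only active admin tokens; calling it with no filters returns every token.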
b. URLs
URLs store the short URL code and other related information. Their properties are:
{
  "metadata": "{}",
  "shortcut": "XkZ8S",
  "token": "077cc752-98b1-4c39-8397-95d826fd776c",
  "url": "http://www.google.com"
}
GET /urls/ : This method is used to view all the URLs for a given token. Supports query parameters to
filter the results.
GET /urls/<shortcut> : This method is used to view the details of a particular shortcut.
POST /urls/ : Used to create a new random shortcut. The query params are:
1. allow_reuse: If it is set to true and the long URL in the request already exists in the database, the
shortcut currently linked to that URL is returned instead of a new one. The parameter has a
second meaning for PUT /urls/<shortcut>.
PUT /urls/<shortcut> : Used to create a custom shortcut. The query params are:
1. allow_reuse: If it is set to true and the requested shortcut already exists, its current long URL is
replaced with the one given in the new request.
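The two meanings of allow_reuse can be compared side by side in a small in-memory model. This is a sketch under stated assumptions: `UrlStore` and `Conflict` are hypothetical names, `secrets.token_hex` stands in for the real shortcut generator, and the behaviour when allow_reuse is false (a fresh shortcut on POST, an error on PUT) is assumed because the report does not spell it out.

```python
import secrets
from typing import Dict


class Conflict(Exception):
    """Assumed outcome when a custom shortcut exists and reuse is not allowed."""


class UrlStore:
    def __init__(self) -> None:
        self.by_shortcut: Dict[str, str] = {}

    def _new_shortcut(self) -> str:
        # Stand-in generator: retry until an unused code is found.
        while True:
            candidate = secrets.token_hex(3)
            if candidate not in self.by_shortcut:
                return candidate

    def post(self, url: str, allow_reuse: bool = False) -> str:
        """POST /urls/ : random shortcut, optionally reusing an existing one."""
        if allow_reuse:
            for shortcut, existing in self.by_shortcut.items():
                if existing == url:
                    return shortcut
        shortcut = self._new_shortcut()
        self.by_shortcut[shortcut] = url
        return shortcut

    def put(self, shortcut: str, url: str, allow_reuse: bool = False) -> str:
        """PUT /urls/<shortcut> : custom shortcut, optionally repointing it."""
        if shortcut in self.by_shortcut and not allow_reuse:
            raise Conflict(shortcut)
        self.by_shortcut[shortcut] = url
        return shortcut
```

In this model POST reuses by long URL while PUT reuses by shortcut, which is the distinction the two parameter descriptions draw.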
4. OTHER FEATURES
a. Swagger Documentation
We used Swagger (http://swagger.io) to automatically generate API documentation. Flask-apispec has
built-in support for this: it scans the Flask app object and determines the endpoints and their properties.
b. OpenShift Template
To enable deployment to OpenShift, we used the Kompose tool, which converts a docker-compose
YAML file into the OpenShift templates needed for deployment.
c. Unit Tests
To verify that the application behaves as its clients expect, we used the pytest testing
framework.
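For readers unfamiliar with pytest, a test is a plain function whose name starts with `test_` and whose body uses bare `assert` statements. The example below is illustrative only: `generate_shortcut` is a hypothetical helper written for this sketch, not a function taken from the project's test suite.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_shortcut(length: int = 5) -> str:
    """Hypothetical helper: a random alphanumeric code such as 'XkZ8S'."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def test_shortcut_has_requested_length():
    assert len(generate_shortcut()) == 5
    assert len(generate_shortcut(8)) == 8


def test_shortcut_uses_only_alphanumeric_characters():
    assert set(generate_shortcut(64)) <= set(ALPHABET)
```

Saved in a file whose name starts with `test_`, these functions are collected and run by the `pytest` command without further registration.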
5. CONCLUSIONS
The deliverables mentioned in the introduction of the report have been implemented successfully. The
code resides at https://github.com/ThiefMaster/ursh and a test instance has been created at
http://test-ursh.web.cern.ch. It can readily be used by the Web Frameworks and Indico teams at CERN
to shorten URLs and keep track of short URLs.
a. Future Work
The following features are desirable and can be implemented to increase the utility of the project:
1. Health Check: To increase reliability and reduce storage costs, a scheduled task can be used to
routinely check if URLs in the database return 2xx HTTP codes.
2. Pagination: The API is currently not paginated. It might be a useful feature to have for clients with
many URLs.
3. Logging URL Lifecycle: We currently rely on application logs to determine events like creation,
updating or deletion of a URL. It might be useful to store these events with the history in a
separate table.
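As an illustration of the pagination item above, a listing endpoint could slice its results as sketched below. The `paginate` function, its parameter names and the shape of the returned dictionary are assumptions for the example; nothing like it exists in the current API.

```python
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], page: int = 1, per_page: int = 20) -> Dict[str, Any]:
    """Return one page of items plus the metadata a client needs to page on."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    total = len(items)
    start = (page - 1) * per_page
    return {
        "items": list(items[start:start + per_page]),
        "page": page,
        "per_page": per_page,
        "total": total,
        # Ceiling division: the number of pages needed for all items.
        "pages": (total + per_page - 1) // per_page,
    }
```

A page past the end returns an empty `items` list rather than an error, which lets clients stop when they receive no results.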
6. ACKNOWLEDGEMENTS
I would like to thank my mentors, Pedro and Bruno, for making sure that I had help at every stage. Adrian
Moennich was also instrumental in the creation of this project. He reviewed almost all of the code,
making sure it was up to standard.