QCon London - Gilt Microservices


BUILDING A MODERN MICROSERVICES ARCHITECTURE @
QCON LONDON - 2015
YONI (JONATHAN) GOLDBERG

- GiltDirect, Sale Personalization, Loyalty, SEO, Post-purchase, Login/Registration
- MIT CS BS/MEng | Google | IBM | IDF
- Israel | Brooklyn | Coffee | JS/Node | Arduino | Running | Kite Surfing | Poker

THREE TAKEAWAYS FROM GILT'S STORY
Which problems will be solved?
Which challenges will you face?
Is it the right choice for you?

WHAT IS GILT?
Flash Sales Business
Founded in 2007
Top 50 Internet-Retailer
~150 Engineers

ANOTHER WAY TO LOOK AT GILT

THE CLASSIC STARTUP STORY

THE EARLY DAYS
2007 - Ruby on Rails, the hottest new thing
The goal was to get to market fast

WE WERE ABLE TO HANDLE OUR TRAFFIC PRETTY WELL

UNTIL LOUBOUTIN CAME TO GILT

TECHNOLOGY PAIN POINTS 2009
Traffic spikes required launching 1,000s of Ruby processes
Postgres was overloaded
Routing traffic between Ruby processes sucked

DEV PAIN POINTS
1000 Models/Controllers, 200K LOC, 100s of jobs
Lots of contributors + no ownership
Difficult deployments with long integration cycles
Hard to identify root causes

WE NEEDED TO SOLVE
THE PROBLEM FAST

THREE THINGS HAPPENED
Started the transition to the JVM
M(a/i)cro-Service Era Started
Dedicated data stores

WHY JVM?
Widely adopted
Stable
Better support for concurrency
Better GC vs MRI

FIRST 10 SERVICES

WE SOLVED 90% OF OUR SCALING PROBLEMS
BUT NOT THE DEVELOPER PAIN POINTS

SOLVED PAIN POINTS
Traffic spikes required launching 1,000s of Ruby processes
Postgres was overloaded
Routing traffic between Ruby processes sucked

STILL OPEN PAIN POINTS
New services became semi-monolithic
1000 Models/Controllers, 200K LOC, 100s of jobs
Lots of contributors + no ownership
Difficult deployments with long integration cycles

WHY WE DOUBLED DOWN ON MICRO-SERVICES
Empower teams and ownership
Smaller scope
Simpler and easier deployments and rollbacks

WE BEGAN THE TRANSITION TO SCALA AND PLAY
LOSA - LOTS OF SMALL (WEB) APPS
[SAME AS MICRO-SERVICES BUT FOR WEB-APPS]

AS OF LAST WEEK WE HAVE AROUND 300 SERVICES IN PROD

WHY THE INCREASE?

APP BOOTSTRAP
rake bootstrap:admin-web           # Bootstrap a admin-web service
rake bootstrap:client-server-core  # Bootstrap a client-server-core service
rake bootstrap:play                # Bootstrap a play service
rake bootstrap:play-ui-build       # Bootstrap a play-ui-build service
rake bootstrap:sbt-library         # Bootstrap a sbt-library service
rake bootstrap:schema              # Bootstrap a schema service

DEMO

HOW TO DEFINE A
MICROSERVICE?
FUNCTIONALITY SCOPE
NUMBER OF DEVS INVOLVED

"A SERVICE-ORIENTED ARCHITECTURE
COMPOSED OF LOOSELY COUPLED
ELEMENTS THAT HAVE BOUNDED
CONTEXTS"
- ADRIAN COCKCROFT

CURRENT CHALLENGES
Deployments and Testing (Functional/Integration)
Dev/Integration Environments
Service Discoverability
Who owns this service!?
Monitoring

MICRO SERVICE
DEPLOYMENT V3

CHALLENGES THAT WERE SOLVED IN V2:
Frustrating to deploy semi-manually (Capistrano)
Scary to deploy other teams' services
Hard to execute functional tests between services

SBT
Motivation: Scala adoption
Complex Scala syntax
Cool features: ~test, shell, console
Hard to debug

GILT-SBT-BUILD
Simple config for all the services
Pulls many plugins:
[nexus, testing, RPMs, run scripts, Monitoring, SemVer, ...]
Custom commands (e.g. 'sbt release')

ION-CANNON + SBT
Run tests on dedicated Env
Dark canary releases
Easy rollbacks
Integrated health checks
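
The bullets above name the ideas without showing the mechanics. As a minimal sketch only (this is not Ion-Cannon's implementation, and the Instance struct, canary_deploy method and health-check lambdas are all hypothetical), a canary rollout with a health check and an automatic rollback might look like this:

```ruby
# Hypothetical sketch of a canary release with a health check and rollback.
# This is NOT Ion-Cannon's code; Instance and the health_check lambda are
# illustrative stand-ins for real hosts and real health endpoints.
Instance = Struct.new(:name, :version)

# Deploys new_version to one canary instance first. If the health check
# passes, rolls out to the rest; otherwise restores the canary's old version.
def canary_deploy(instances, new_version, health_check)
  canary, *rest = instances
  previous = canary.version
  canary.version = new_version

  unless health_check.call(canary)
    canary.version = previous # rollback is cheap: only one instance changed
    return :rolled_back
  end

  rest.each { |instance| instance.version = new_version }
  :deployed
end

fleet = [Instance.new('a', '1.0.0'), Instance.new('b', '1.0.0')]
healthy = ->(_instance) { true }
failing = ->(_instance) { false }

result_bad = canary_deploy(fleet, '1.1.0', failing)
versions_after_bad = fleet.map(&:version)

result_good = canary_deploy(fleet, '1.1.0', healthy)
versions_after_good = fleet.map(&:version)
```

Because only the canary is touched before the health check runs, a failed release never reaches the rest of the fleet.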

DATA CENTER
LIMITATIONS
&&
MORE TECHNOLOGY
OWNERSHIP

IMMUTABLE
INFRASTRUCTURE

V3
Per department AWS account and budget
Longer deploys [New instances / CNAME swaps]
Each team has AWS expertise

API LOVE

WELL-DEFINED REST APIS SOLVE DISCOVERABILITY, DOCUMENTATION AND INNER ADOPTION

APIS @ GILT
www.apidoc.me and Swagger.io
Describe the API in simple JSON
Auto generates versioned docs, routes and clients
Per team - API design committee
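
To make "describe the API in JSON, then generate docs and routes" concrete, here is a small sketch. The JSON field names, the paths and the two generator methods are assumptions made for illustration; they are not the actual apidoc or Swagger schema or tooling.

```ruby
require 'json'

# Hypothetical API description. The field names below are illustrative and
# are NOT the real apidoc or Swagger format.
API_JSON = <<~JSON
  {
    "name": "loyalty",
    "version": "1.2.0",
    "operations": [
      { "method": "GET",  "path": "/members/:id", "description": "Fetch a member" },
      { "method": "POST", "path": "/members",     "description": "Create a member" }
    ]
  }
JSON

# Renders one versioned documentation line per operation.
def generate_docs(api)
  api['operations'].map do |op|
    "#{api['name']} v#{api['version']}: #{op['method']} #{op['path']} - #{op['description']}"
  end
end

# Renders a routes table: [method, path] pairs a router could consume.
def generate_routes(api)
  api['operations'].map { |op| [op['method'], op['path']] }
end

api = JSON.parse(API_JSON)
docs = generate_docs(api)
routes = generate_routes(api)
puts docs
```

The point of the design is that docs, routes and clients all come from one description, so they cannot drift apart.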

DEPENDENCY FUN [DEMO]

"MID-TIER MICRO-SERVICE"
BIGGEST PERFORMANCE CHALLENGE
NETWORK IO

ON DEV/INTEGRATION ENVIRONMENTS
The hardware is not strong enough
No one wants to compile 20 services
Service Dependencies

EACH TEAM HAS A STAGING ENV
# Ports of the staging services this team depends on
SERVICE_PORTS = [
  4001, # listing-service
  8235, # svc-user-set
  9420, # svc-free-fall
  7895, # svc-Loyalty
  8155, # web-loyalty
  9410, # web inventory status
  7898, # admin-loyalty
  7899, # notification
  7102, # rouge
  9530, # svc-component
  6802, # svc-waitlist-submit
  4066, # svc-action-sale
  # ....
]
# One ['-L', 'port:localhost:port'] pair per service
PORT_FORWARD_ARGS = SERVICE_PORTS.map { |port|
  ['-L', "#{port}:localhost:#{port}"]
}
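
The slide stops before showing how PORT_FORWARD_ARGS is consumed. Assuming the pairs are handed to ssh, whose -L flag sets up local port forwarding, a minimal sketch of building the full command could look like this. The host name is a placeholder and only three of the ports are repeated here.

```ruby
require 'shellwords'

# A few ports taken from the slide; the full list is elided there.
service_ports = [4001, 8235, 9420]

# One "-L local:host:remote" pair per service, flattened into a single argv.
forward_args = service_ports.flat_map { |port| ['-L', "#{port}:localhost:#{port}"] }

# 'staging.example.com' is a placeholder host, not a real Gilt machine.
# -N asks ssh not to run a remote command, only to forward ports.
command = ['ssh', '-N', *forward_args, 'staging.example.com']
puts Shellwords.join(command)
```

With the tunnel open, a service running locally can call localhost on each forwarded port as if its staging dependencies were running on the same machine.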

STAGING DIFFICULTIES:
Hard to keep all the services up to date
Maxed our staging env capacities
Shared data across legacy apps

WITH OUR MIGRATION TO AWS


[TEAM RESPONSIBILITY]
A pool of integration instances and dbs
Develop with prod instances [Not a fan]

ON OWNERSHIP
"code stays much longer than people" - SB

CODE OWNERSHIP

CURRENT APPROACH
Code Review! Code Review! Code Review!
Team owns services, not individual developers
Ownership transfer

DATA OWNERSHIP

WE TRANSITIONED TO
MICRO-DBS
MICROSERVICES NEED THEIR OWN
MONGODB | POSTGRES | RDS | VOLDEMORT

MANAGE MICRO-RELATIONAL
DBS
SCHEMA EVOLUTION
MANAGER
https://github.com/gilt/schema-evolution-manager

PRINCIPLES OF
SCHEMA EVOLUTION
MANAGER
Independent from the service code
Manages the schema evolutions in a Git repo
Schema changes are deployed as tar files
No rollbacks
Schema changes are required to be incremental
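
The "incremental" and "no rollbacks" principles can be sketched as follows. This is not schema-evolution-manager's code: the script names and the in-memory applied list are hypothetical stand-ins for SQL files in a Git repo and whatever bookkeeping the real tool keeps.

```ruby
# Hypothetical sketch of incremental, forward-only schema evolution.
# Script names sort chronologically because they start with a timestamp.
def pending_scripts(all_scripts, applied)
  # Apply in sorted order, and never re-apply a script already recorded.
  all_scripts.sort - applied
end

def apply_all(all_scripts, applied)
  pending_scripts(all_scripts, applied).each do |script|
    # A real tool would execute the SQL here; this sketch only records it.
    applied << script
  end
  applied
end

scripts = ['20150301-120000.sql', '20150215-093000.sql', '20150310-080000.sql']
applied = ['20150215-093000.sql']

apply_all(scripts, applied)

# Running again is a no-op. To undo a change, add a new script that
# reverses it, since nothing is ever rolled back.
rerun = pending_scripts(scripts, applied)
```

Under this model every environment reaches the same schema by replaying the same ordered list of scripts.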

ON MONITORING

THE TOOLS WE USE

GRAPHITE / OPENTSDB

BUT IT WASN'T ENOUGH

CAVE
HTTP://CAVELLC.GITHUB.IO
CONTINUOUS | AUDIT | VAULT | ENTERPRISE

ALERT EXAMPLES [SCALA]
orders [shipTo: US].sum.5m < 1000
response-time [svc: svc-team, env: prod].p999 > 1000
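
One plausible reading of the first alert is "the sum of orders shipped to the US over the last five minutes fell below 1000". The sketch below evaluates that reading. It is not CAVE's implementation, and the Point struct, the windowed_sum method and the sample data are assumptions made for illustration.

```ruby
# Hypothetical evaluation of: orders [shipTo: US].sum.5m < 1000
# Each data point carries a Unix timestamp (seconds), a value and its tags.
Point = Struct.new(:time, :value, :tags)

# Sums the values of points that carry all the given tags and fall within
# the last window_seconds before now.
def windowed_sum(points, tags, now, window_seconds)
  points
    .select { |p| p.time > now - window_seconds && p.time <= now }
    .select { |p| tags.all? { |key, value| p.tags[key] == value } }
    .sum(&:value)
end

now = 1_000_000
points = [
  Point.new(now - 400, 900, { 'shipTo' => 'US' }), # outside the 5m window
  Point.new(now - 200, 300, { 'shipTo' => 'US' }),
  Point.new(now - 100, 500, { 'shipTo' => 'US' }),
  Point.new(now - 50,  700, { 'shipTo' => 'CA' })  # different tag value
]

orders_sum = windowed_sum(points, { 'shipTo' => 'US' }, now, 5 * 60)
alert_fires = orders_sum < 1000
```

Alerting on a business metric like order volume catches failures that per-service metrics can miss, such as a checkout flow that is up but not converting.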

SUMMARY

WHAT'S NEXT?
BUILD YOUR NEXT FEATURE
IN A NEW SERVICE

QUESTION TIME
@yoni_goldberg
jgoldberg@gilt.com
www.yonigoldberg.com
