Microservice - 500K CCU

1
About me
Tran Xuan Viet (Viet Tran)
Solution Architect at 200lab

• Former Solution Architect at Sendo.

• Former CTO at Skylab

• Former Software Engineer at Foody.

viettranx@gmail.com

2
What is CCU?

• CCU = Concurrent User

• Total number of users connected at the same time

• CCU is not Requests Per Second (RPS)
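
To make the difference between CCU and request rate concrete, here is a rough back-of-the-envelope sketch. The function name `estimateRPS` and the request intervals (one request every 10 or 60 seconds per user) are illustrative assumptions, not figures from the talk.

```go
package main

import "fmt"

// estimateRPS returns the average request rate produced by ccu connected
// users when each user sends one request every secondsBetweenRequests seconds.
// Both inputs are illustrative assumptions, not measurements.
func estimateRPS(ccu int, secondsBetweenRequests float64) float64 {
	return float64(ccu) / secondsBetweenRequests
}

func main() {
	// Same CCU, very different load depending on user behaviour.
	fmt.Printf("%.0f\n", estimateRPS(500000, 10)) // users actively browsing
	fmt.Printf("%.0f\n", estimateRPS(500000, 60)) // mostly idle connections
}
```

Under these assumptions the same 500K CCU yields roughly 50,000 or 8,333 requests per second, which is why the two metrics should be tracked separately.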

3
Agenda

• A simple high load system

• Monitoring & Tracing

• Monolith app to microservice (Sendo)

• Microservice with gRPC and Protobuf (Sendo)

• Common problems and solutions

4
A simple high load system

5
How do we build a system that can serve 500K
CCU?

6
First of all, we need to know what makes
our system slow

7
Monitoring & Logging

8
MySQL Monitoring

9
MySQL Monitoring (cont.)

10
MySQL Slow Query

11
And many more monitoring metrics for databases
& services

12
Tracing (Jaeger)

13
Distributed Tracing
with OpenCensus & Jaeger

14
Current tracing
/some_endpoint

30ms 400ms
GATEWAY SERVICE A SERVICE B

What happened in this service?

We're tracing each service as a whole, but not the details inside each one!

15
Tracing on each service manually

SERVICE B

func doSomething() {
    // Time the whole function.
    funcTimer := timer.Start("doSomething")
    defer funcTimer.Stop()

    // Time each step separately.
    logicATimer := timer.Start("logicA")
    logicA()
    logicATimer.Stop()

    logicBTimer := timer.Start("logicB")
    logicB()
    logicBTimer.Stop()
}

16
Tracing on each service manually (cont.)

SERVICE B

func doSomething() {
    funcTimer := timer.Start("doSomething")
    defer funcTimer.Stop()

    logicATimer := timer.Start("logicA")
    logicA()
    logicATimer.Stop()

    logicBTimer := timer.Start("logicB")
    logicB()
    logicBTimer.Stop()
}

On Console:
doSomething 400ms
logicA 100ms
logicB 300ms

17
It works, but it has some
disadvantages

18
Some disadvantages

[Figure: span tree along a time axis, with doSomething as the parent span and LogicA and LogicB as child spans]

We lack something like the span tree above

19
Some disadvantages (cont.)

SERVICE A SERVICE B SERVICE B1

SERVICE C SERVICE C1

SERVICE C2

We lack a component that acts as a collector for all span data
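
The role of a collector can be sketched in-process: several services report spans concurrently and a single component gathers them in one place. This is only an illustration using goroutines and a channel; the `SpanRecord` type and the `collect` function are hypothetical, and a real collector would receive spans over the network.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// SpanRecord is a hypothetical span report sent by a service.
type SpanRecord struct {
	Service  string
	Name     string
	Duration time.Duration
}

// collect starts one goroutine per service, each reporting a single span,
// and gathers every report into one slice sorted by service name.
func collect(services []string) []SpanRecord {
	ch := make(chan SpanRecord)
	var wg sync.WaitGroup
	for _, svc := range services {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			ch <- SpanRecord{Service: name, Name: "handle", Duration: time.Millisecond}
		}(svc)
	}
	// Close the channel once every service has reported.
	go func() {
		wg.Wait()
		close(ch)
	}()

	var all []SpanRecord
	for r := range ch {
		all = append(all, r)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Service < all[j].Service })
	return all
}

func main() {
	for _, r := range collect([]string{"SERVICE A", "SERVICE B", "SERVICE C"}) {
		fmt.Printf("%s %s %v\n", r.Service, r.Name, r.Duration)
	}
}
```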

20
Some disadvantages (cont.)

SERVICE B

func doSomething() {
    funcTimer := timer.Start("doSomething")
    defer funcTimer.Stop()

    logicATimer := timer.Start("logicA")
    logicA()
    logicATimer.Stop()

    logicBTimer := timer.Start("logicB")
    logicB()
    logicBTimer.Stop()
}

CPU overhead on production

21
That's why we need them

OpenCensus Jaeger

22
OpenCensus: a single distribution of libraries that automatically collect
traces and metrics from your app, display them locally,
and send them to any backend.
• Distributed Trace Collection
• Low Overhead
• Backend Support.

Language Support:

Backend Support:

23
Jaeger is a distributed tracing system released as open
source by Uber Technologies.
• Distributed context propagation
• Distributed transaction monitoring
• Root cause analysis
• Service dependency analysis
• Performance / latency optimization
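
Distributed context propagation means a trace identifier travels with the request from service to service, so spans from different services can be linked. Here is a minimal sketch using only the standard library. The header name `X-Trace-Id` and the helper functions are hypothetical; real tracers define their own header formats.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// traceHeader is a hypothetical header name chosen for this sketch.
const traceHeader = "X-Trace-Id"

// callWithTrace sends a GET request carrying traceID and returns the body.
func callWithTrace(url, traceID string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set(traceHeader, traceID)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newEchoService plays SERVICE B: it reports the trace ID it received.
func newEchoService() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "B saw trace %s", r.Header.Get(traceHeader))
	}))
}

// newForwardingService plays SERVICE A: it forwards the incoming trace ID
// to the downstream service and relays the response.
func newForwardingService(downstreamURL string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := callWithTrace(downstreamURL, r.Header.Get(traceHeader))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		fmt.Fprint(w, body)
	}))
}

func main() {
	serviceB := newEchoService()
	defer serviceB.Close()
	serviceA := newForwardingService(serviceB.URL)
	defer serviceA.Close()

	out, err := callWithTrace(serviceA.URL, "trace-123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```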

Storage Support:

24
OpenCensus & Jaeger Deployment

[Diagram: each Service Node (Golang + span data) sends spans over UDP to an Agent. Agents forward them over T-Channel to the Collector on the Collector Node, which writes them to Storage. The UI queries Storage. Span data is also pushed to a Kafka broker.]

25
JAEGER UI

26
Microservice

27
Microservice

Microservices remove dependencies

But they add complexity to the system

28
A use case from Sendo

29
Brief history of Sendo system

In 2012: Sendo system was based on Magento

30
Brief history of Sendo system (cont.)

Image Source: Beesion Technologies

31
Brief history of Sendo system (cont.)

In 2016: Sendo started to use a microservices architecture


Image Source: DZone

32
Brief history of Sendo system (cont.)

We use multiple languages to build the services


Image Source: Weaveworks

33
Problems

• High latency and low throughput

• Duplicated data structures

• Failure recovery is hard in a distributed system

34
We decided to migrate almost all of
our source code to

35
Microservices with
gRPC and Protobuf

36
Developer Workflow

37
A demo Protobuf file

38
A demo Protobuf file (cont.)

39
Generated files

40
Sendo Microservices
• Service Mesh with Envoy

• Service Discovery with Consul

• Load balancing with Nginx and Envoy

• Very high throughput with Protobuf

41
Problems solved so far

• High latency and low throughput

Use Go and gRPC for inter-service communication.

• Duplicated data structures

Use Protobuf to generate data structures.

• Failure recovery is hard in a distributed system

The hardest part. How?

42
Control Plane with Istio

Image Source: istio.io

43
Dynamic Routing

Image Source: istio.io

44
Istio Mixer

Image Source: istio.io

45
Istio

46
Kubernetes Summary

47
Kubernetes Summary (cont.)

48
Logging with Fluentd & Elasticsearch

49
Visualize logging data: Grafana

50
Problems solved so far

• High latency and low throughput

Use Go and gRPC for inter-service communication.

• Duplicated data structures

Use Protobuf to generate data structures.

• Failure recovery is hard in a distributed system

Data Plane and Control Plane: Istio, Kubernetes

Logging system: Fluentd, Elasticsearch, Grafana

Monitoring system: Netdata, Graphite, Grafana, Prometheus

Distributed Tracing: Jaeger [, OpenCensus]

51
Thank you

52
