The InfoQ eMag Service Mesh Guide
Service Mesh
Ultimate Guide
PRODUCTION EDITOR Ana Ciobotaru / COPY EDITORS Lawrence Nyveen & Susan Conant / DESIGN Dragos Balasoiu
GENERAL FEEDBACK feedback@infoq.com / ADVERTISING sales@infoq.com / EDITORIAL editors@infoq.com
The InfoQ eMag / Issue #86 / July 2020
Key Takeaways
• A service mesh manages all service-to-service communication within a distributed (potentially microservice-based) software system. It accomplishes this typically via the use of “sidecar” proxies that are deployed alongside each service through which all traffic is transparently routed.
• Proxies used within a service mesh are typically “application layer” aware (operating at Layer 7 in the OSI networking stack). This means that traffic routing decisions and the labeling of metrics can draw upon data in HTTP headers or other application layer protocol metadata.
• A service mesh provides dynamic service discovery and traffic management, including traffic shadowing (duplicating) for testing, and traffic splitting for canary releasing, incremental rollout, and A/B type experimentation.
• A service mesh also supports the implementation and enforcement of cross-cutting requirements, such as security (providing service identity and TLS) and reliability (rate limiting, circuit-breaking).
• As a service mesh is on the critical path for every request being handled within the system, it can also provide additional “observability,” such as distributed tracing of a request, frequency of HTTP error codes, and global and service-to-service latency.
• There are clear benefits provided by the use of a service mesh, but the tradeoffs of added complexity and the requirement of additional runtime resources should be analyzed.
• Service mesh technology is rapidly becoming part of the (cloud native) application platform “plumbing.” The interesting innovation within this space is happening in relation to the higher-level abstractions and the human-focused control planes.
• Popular service meshes include: Linkerd, Istio, Consul, Kuma, and Maesh. Supporting technologies within this space include: Layer 7-aware proxies, such as Envoy, HAProxy, NGINX, and MOSN; and service mesh orchestration, visualization, and understandability tooling, such as SuperGloo, Kiali, and Dive.
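To make the takeaways about Layer 7 awareness and traffic splitting more concrete, here is a minimal Python sketch of the kind of decision a sidecar proxy might make for each request. It is an illustration under stated assumptions, not how any particular mesh implements routing: the `Backend` type, the `choose_backend` function, the `x-canary` header, and the backend names are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str    # hypothetical label for a service version, e.g. "stable"
    weight: int  # relative share of traffic this version should receive


def choose_backend(headers, backends, rng=random):
    """Pick a backend for a single request.

    Layer 7 rule (assumption): a request whose "x-canary" header is "true"
    is always sent to the backend named "canary", if one exists.
    All other requests are split in proportion to the backend weights.
    """
    if headers.get("x-canary", "").lower() == "true":
        for backend in backends:
            if backend.name == "canary":
                return backend

    total = sum(b.weight for b in backends)
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")

    # rng.random() is in [0, 1), so pick is always strictly below total.
    pick = rng.random() * total
    running = 0.0
    for backend in backends:
        running += backend.weight
        if pick < running:
            return backend
    return backends[-1]  # defensive fallback for floating point edge cases


if __name__ == "__main__":
    versions = [Backend("stable", 90), Backend("canary", 10)]
    print(choose_backend({"x-canary": "true"}, versions).name)
    print(choose_backend({}, versions).name)
```

In a real mesh this policy is normally declared as configuration and pushed to the proxies by the control plane, so shifting a canary from 10% to 50% of traffic does not require changing application code.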
The Service Mesh Pattern
Service Mesh Architecture: Looking Under the Hood
A control plane “supervises the work,” and takes all the individual instances of the data plane ...

Figure: Istio architecture, demonstrating how the control plane and proxy data plane interact (courtesy of the Istio documentation)
Use Cases
There are a variety of use cases that a service mesh can enable or support.

Dynamic Service Discovery and Routing
A service mesh provides dynamic service discovery and traffic management, including traffic shadowing (duplicating) for testing, and traffic splitting for canary releasing and A/B type experimentation.

Proxies used within a service mesh are typically “application layer” aware (operating at Layer 7 in the OSI networking stack). This means that traffic routing decisions and the labeling of metrics can draw upon data in HTTP headers or other application layer protocol metadata.

Service-to-Service Communication Reliability
A service mesh supports the implementation and enforcement of cross-cutting reliability requirements, such as request retries, timeouts, rate limiting, and circuit-breaking. A service mesh is often used to compensate for (or encapsulate) dealing with the eight fallacies of distributed computing. It should be noted that a service mesh can only offer wire-level reliability support (such as retrying an HTTP request), and ultimately the service should be responsible for any related business impact, such as avoiding multiple (non-idempotent) HTTP POST requests.

Observability of Traffic
As a service mesh is on the critical path for every request being handled within the system, it can also provide additional “observability,” such as distributed tracing of a request, frequency of HTTP error codes, and global and service-to-service latency. Although a much overused phrase in the enterprise space, service meshes are often proposed as a method to capture all of the data necessary to implement a “single pane of glass” view of traffic flows within the entire system.
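The distinction between wire-level retries and business-level safety can be sketched in a few lines of Python. This is a simplified illustration rather than the behavior of any specific proxy: `call_with_retries`, `TransientError`, and the backoff parameters are hypothetical names introduced here. The list of idempotent methods follows the HTTP specification.

```python
import time

# Methods the HTTP specification defines as idempotent. POST is deliberately
# absent, because repeating it may repeat a business action (e.g. a payment).
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a connection reset)."""


def call_with_retries(send, method, max_attempts=3, backoff_seconds=0.0,
                      sleep=time.sleep):
    """Call send() and retry transient failures, but only for idempotent methods.

    send is a zero-argument callable that performs the request.
    Non-idempotent methods get exactly one attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff between attempts: 1x, 2x, 4x, ...
            sleep(backoff_seconds * 2 ** (attempt - 1))
```

A proxy that only sees the wire protocol cannot know whether a repeated POST is harmless, which is why the service itself (for example, via idempotency keys) has to own that decision.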
Communication Security
... sometimes tempted to anoint ...
... easiest approach to manage, but ...
Service Mesh Comparisons:
• Linkerd tutorial
History of the Service Mesh
InfoQ has been tracking the topic that we now call service mesh since late 2013, when Airbnb released SmartStack, which offered an out-of-process service discovery mechanism (using HAProxy) for the emerging “microservices” style architecture. Many of the previously labeled “unicorn” organizations were working on similar technologies before this date. From the early 2000s Google was developing its Stubby RPC framework that evolved into gRPC, and the Google Frontend (GFE) and Global Software Load Balancer (GSLB), traits of which can be seen in Istio. In the earlier 2010s, Twitter began work on the Scala-powered Finagle from which the Linkerd service mesh emerged.

In late 2014, Netflix released an entire suite of JVM-based utilities including Prana, a “sidecar” process that allowed application services written in any language to communicate via HTTP to standalone instances of the libraries. In 2016, the NGINX team began talking about “The Fabric Model,” which was very similar to a service mesh, but required the use of their commercial NGINX Plus product for implementation.

Other highlights from the history of the service mesh include the releases of Istio in May 2017, Linkerd 2.0 in July 2018, Consul Connect and SuperGloo in November 2018, Service Mesh Interface (SMI) in May 2019, and Maesh and Kuma in September 2019.

Even service meshes that emerged outside of the unicorns, such as HashiCorp’s Consul, took inspiration from the aforementioned technology, often aiming to implement the CoreOS coined concept of “GIFEE”: Google infrastructure for everyone else.

For a deep-dive into the history of how the modern service mesh concept evolved, Phil Calçado has written a comprehensive article, “Pattern: Service Mesh.”
Exploring the (Possible) Future of Service Meshes
As service mesh technology is still within the early adoption phase, there is a lot of scope for future work. Broadly speaking, there are four areas of particular interest: adding support for use cases beyond RPC, standardizing the interface and operations, pushing the service mesh further into the platform fabric, and building effective human control planes for service mesh technology.

Kasun Indrasiri has explored “The Potential for Using a Service Mesh for Event-Driven Messaging,” in which he discussed two main emerging architectural patterns for implementing messaging support within a service mesh: the protocol proxy sidecar, and the HTTP bridge sidecar. This is an active area of development within the service mesh community, with the work towards supporting Apache Kafka within Envoy attracting a fair amount of attention.

Christian Posta has previously written about attempts to standardize the usage of service meshes in “Towards a Unified, Standard API for Consolidating Service Meshes.” This article also discusses the Service Mesh Interface (SMI) that was recently announced by Microsoft and partners at KubeCon EU. The SMI defines a set of common and portable APIs that aims to provide developers with interoperability across different service mesh technologies including Istio, Linkerd, and Consul Connect.

The topic of integrating service meshes with the platform fabric can be further divided into two sub-topics.

First, there is work being conducted to reduce the networking overhead introduced by a service mesh data plane. This includes the data plane development kit (DPDK), which is a userspace application that “bypasses the heavy layers of the Linux kernel networking stack and talks directly to the network hardware,” and work by the Cilium team that utilizes the extended Berkeley Packet Filter (eBPF) functionality in the Linux kernel for “very efficient networking, policy enforcement, and load balancing functionality.” Another team is mapping the concept of a service mesh to L2/L3 payloads with Network Service Mesh, as an attempt to “re-imagine network function virtualization (NFV) in a cloud-native way.”

Second, there are multiple initiatives to integrate service meshes more tightly with public cloud platforms, as seen in the introduction of AWS App Mesh, GCP Traffic Director, and Azure Service Fabric Mesh.

The Buoyant team is leading the charge with developing effective human-centric control planes for service mesh technology. They have recently released Dive, a SaaS-based “team control plane” for platform teams operating Kubernetes. Dive adds higher-level, human-focused functionality on top of the Linkerd service mesh, and provides a service catalog, an audit log of application releases, a global service topology, and more.
FAQ
What is a service mesh?
A service mesh is a technology that manages all service-to-service, “east-west,” traffic within a distributed (potentially microservice-based) software system. It provides both business-focused functional operations, such as routing, and nonfunctional support, for example, enforcing security policies, quality of service, and rate limiting. It is typically (although not exclusively) implemented using sidecar proxies through which all services communicate.

How does a service mesh differ from an API gateway?
A service mesh manages all service-to-service, “east-west,” traffic within a distributed (potentially microservice-based) software system. It provides both business-focused functional operations, such as routing, and nonfunctional support, for example, enforcing security policies, quality of service, and rate limiting.

An API gateway manages all ingress, “north-south,” traffic into a cluster, and provides additional support for cross-functional communication requirements. It acts as the single entry point into a system and enables multiple APIs or services to act cohesively and provide a uniform experience to the user.

If I am deploying microservices, do I need a service mesh?
Not necessarily. A service mesh adds operational complexity to the technology stack, and therefore is typically only deployed if the organization is having trouble scaling service-to-service communication, or has a specific use case to resolve.

Do I need a service mesh to implement service discovery with microservices?
No. A service mesh provides one way of implementing service discovery. Other solutions include language-specific libraries (such as Ribbon and Eureka, or Finagle).

Does a service mesh add overhead/latency to my service-to-service communication?
Yes, a service mesh adds at least two extra network hops when a service is communicating with another service (the first is from the proxy handling the source’s outbound connection, and the second is from the proxy handling the destination’s inbound connection). However, each additional network hop typically occurs over the localhost or loopback network interface, and adds only a small amount of latency (on the order of milliseconds). Experimenting with and understanding whether this is an issue for the target use case should be part of the analysis and evaluation of a service mesh.

Shouldn’t a service mesh be part of Kubernetes or the “cloud native platform” that applications are being deployed onto?
Potentially. There is an argument for maintaining separation of concerns within cloud native platform components (e.g., Kubernetes is responsible for providing container orchestration and a service mesh is responsible for service-to-service communication). However, work is underway to push service mesh-like functionality into modern Platform-as-a-Service (PaaS) offerings.

How do I implement, deploy, or roll out a service mesh?
The best approach would be to analyse the various service mesh products (see above), and follow the implementation guidelines specific to the chosen mesh. In general, it is best to work with all stakeholders and incrementally deploy any new technology into production.

Can I build my own service mesh?
Yes, but the more pertinent question is should you? Is building a service mesh a core competency of your organization? Could you be providing value to your customers in a more effective way? Are you also committed to maintaining your own mesh, patching it for security issues, and constantly updating it to take advantage of new technologies? With the range of open source and commercial service mesh ...

Can the words “Istio” and ...
Additional Resources
• The Service Mesh: What Every Software Engineer Needs to Know about the World’s Most Over-Hyped
Technology
• Service Meshes
Glossary
API gateway: Manages all ingress (north-south) traffic into a cluster, and provides additional support for cross-functional communication requirements. It acts as the single entry point into a system and enables multiple APIs or services to act cohesively and provide a uniform experience to the user.
Control plane: Takes all the individual instances of the data plane (proxies) and turns them into a distributed system that can be visualized and controlled by an operator.
Data plane: A proxy that conditionally translates, forwards, and observes every network packet that flows
to and from a service network endpoint.
East-West traffic: Network traffic within a data center, network, or Kubernetes cluster. Traditional network diagrams were drawn with the service-to-service (intra-data center) traffic flowing from left to right (west to east).
Envoy Proxy: An open-source edge and service proxy, designed for cloud-native applications. Envoy is
often used as the data plane within a service mesh implementation.
Ingress traffic: Network traffic that originates from outside the data center, network, or Kubernetes cluster.
Istio: C++ (data plane) and Go (control plane)-based service mesh that was originally created by Google
and IBM in partnership with the Envoy team from Lyft.
Kubernetes: A CNCF-hosted container orchestration and scheduling framework that originated from
Google.
Linkerd: A Rust (data plane) and Go (control plane) powered service mesh that was derived from an early
Maesh: A Go-based service mesh from Containous, the maintainers of the Traefik API gateway.
MOSN: A Go-based proxy from the Ant Financial team that implements the (Envoy) xDS APIs.
North-South traffic: Network traffic entering (or ingressing) into a data center, network, or Kubernetes
cluster. Traditional network diagrams were drawn with the ingress traffic entering the data center at the
top of the page and flowing down (north to south) into the network.
Service mesh: Manages all service-to-service (east-west) traffic within a distributed (potentially microservice-based) software system. It provides both functional operations, such as routing, and nonfunctional support, for example, enforcing security policies, quality of service, and rate limiting.
Service Mesh Interface (SMI): A work-in-progress standard interface for service meshes deployed onto
Kubernetes.
Service mesh policy: A specification of how a collection of services/endpoints are allowed to communicate with each other and other network endpoints.
Sidecar: A deployment pattern, in which an additional process, service, or container is deployed alongside
an existing service (think motorcycle sidecar).
Single pane of glass: A UI or management console that presents data from multiple sources in a unified
display.
Traffic shaping: Modifying the flow of traffic across a network, for example, rate limiting or load shedding.
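
As an illustration of the rate limiting mentioned under traffic shaping, the following Python sketch implements a token bucket, one common way to cap request rates while still allowing short bursts. It is a simplified, single-threaded example and is not taken from any of the meshes or proxies listed in this glossary. The `TokenBucket` class and its parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self.clock = clock             # injectable so the limiter can be tested
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_second=5, capacity=2)
    print([bucket.allow() for _ in range(4)])
```

A proxy applying this policy would typically answer a rejected request with an HTTP 429 status, which is the load shedding behavior the glossary entry refers to.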