Introduction to Service Mesh Interface
Service Mesh Interface (SMI) is a Kubernetes-native specification under active development. It’s defining a standard set of Kubernetes Custom Resource Definitions (CRDs) and APIs for service meshes. The SMI development process is open, relying on community participation, and the specification itself is publicly available under an Apache 2.0 license.
As stated on the SMI website, “the goal of the SMI API is to provide a common, portable set of service mesh APIs which a Kubernetes user can use in a provider agnostic manner. In this way people can define applications that use service mesh technology without tightly binding to any specific implementation.” SMI doesn’t include a service mesh implementation; it’s solely a specification that service mesh implementations can use.
Without a standard like SMI, every service mesh technology would have completely different APIs. That would make things more complicated and time-consuming for developers and for platform and DevOps teams. Instead of using SMI APIs, developers would have to write their own code to provide equivalent functionality. Integration with other technologies and tools would take more effort, as would changing which service mesh technology an app uses.
Here, we’ll take a closer look at four SMI APIs that help to manage, secure, and observe service-to-service traffic within a cluster:
- Traffic Specs
- Traffic Split
- Traffic Access Control
- Traffic Metrics
All of these APIs involve service mesh concepts we examined in our piece about resilience, specifically load balancing. Here we build on that knowledge and focus on examples of what these APIs support and how they can be used.
Note that adoption of the SMI APIs by service mesh technologies is ongoing. Also, SMI itself is rapidly evolving, with updates being made all the time. For the foreseeable future, it is likely that at any given time, common service mesh technologies will offer different degrees of support for each of the SMI APIs.
Traffic Specs API
The Traffic Specs API allows you to define routes that an app can use. This is hard to explain without an example, so let’s look at one to show what that means.
This example illustrates how a resource named m-routes can be defined using HTTPRouteGroup from the Traffic Specs API. It’s based on an example in the SMI specification. It will match on any HTTP GET request the app sees with the string “/metrics” in its path.
By itself, HTTPRouteGroup doesn’t do anything. It matches, but it doesn’t act on the match in any way. It’s meant to be used by other SMI APIs that do act. So, for example, the Traffic Split API could reference this group in one of its own resources, like declaring that traffic matching the m-routes definition should be split evenly between two versions of a particular microservice.
kind: HTTPRouteGroup
metadata:
  name: m-routes
spec:
  matches:
  - name: metrics
    pathRegex: "/metrics"
    methods:
    - GET
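A route group can carry more than one named match, and other SMI resources can then select individual matches by name. As a minimal sketch that uses only the fields from the example above, a group with a second match might look like the following. The resource name m-routes-extended, the health match, and the /ping path are hypothetical. As in the article's other examples, apiVersion is omitted; a real manifest needs the apiVersion of the SMI version your service mesh supports.

```yaml
kind: HTTPRouteGroup
metadata:
  name: m-routes-extended   # hypothetical name
spec:
  matches:
  # Same match as in the m-routes example.
  - name: metrics
    pathRegex: "/metrics"
    methods:
    - GET
  # Hypothetical second match, added for illustration only.
  - name: health
    pathRegex: "/ping"
    methods:
    - GET
    - HEAD
```

A resource that references this group could then list metrics, health, or both under its own matches to choose which routes it acts on.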
Traffic Split API
The Traffic Split API allows you to implement traffic splitting and traffic shifting methods like A/B testing, blue-green deployment, and canary deployment.
Here’s an example of defining a TrafficSplit for a canary deployment. Under backends, there are two services: e8-widget-svc-current and e8-widget-svc-patch. These are the services (actually microservices) that new requests will be routed to. There’s a weight assigned to each service. These look like percentages (75 and 25, which add up to 100), but they’re relative weights, not percentages. You could specify 750 and 250, or 3 and 1, or any other pair of numbers in that 3-to-1 ratio, to achieve the same results.
kind: TrafficSplit
metadata:
  name: e8-feature-test
  namespace: e8app
spec:
  service: e8-widget-svc
  backends:
  - service: e8-widget-svc-current
    weight: 75
  - service: e8-widget-svc-patch
    weight: 25
Over time, to continue the canary deployment, you would update the weights: for example, you might next set these to 50 and 50. If that goes well, then the next step might be 25 and 75, with 0 and 100 being the last step.
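As a sketch of that next step, the same TrafficSplit could be reapplied with only the weights changed. Everything else is carried over from the example above, and apiVersion is again omitted.

```yaml
kind: TrafficSplit
metadata:
  name: e8-feature-test
  namespace: e8app
spec:
  service: e8-widget-svc
  backends:
  - service: e8-widget-svc-current
    weight: 50   # previously 75
  - service: e8-widget-svc-patch
    weight: 50   # previously 25
```

Later steps would repeat the same edit with 25 and 75, and finally 0 and 100.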
To use TrafficSplit for A/B testing, you would assign the same weight to each service.
For a blue-green deployment with TrafficSplit, you would initially have the weight for the blue environment service set to 100 and the weight for the green environment service set to 0. When you’re ready to switch from the blue environment to the green environment, you would swap the 100 and 0 weights so that all new requests are routed to the green environment service.
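A minimal sketch of the initial blue-green state might look like this. The resource name and the two backend service names are hypothetical and chosen only to mirror the naming in the canary example; apiVersion is omitted as before.

```yaml
kind: TrafficSplit
metadata:
  name: e8-widget-bluegreen        # hypothetical name
  namespace: e8app
spec:
  service: e8-widget-svc
  backends:
  - service: e8-widget-svc-blue    # hypothetical: current (blue) environment
    weight: 100
  - service: e8-widget-svc-green   # hypothetical: new (green) environment
    weight: 0
```

To cut over, you would swap the two weights and reapply the resource. Swapping them back returns traffic to the blue environment.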
If you want to do a traffic split with three or more services, you would simply list each of the services under backends and assign a weight to each of them. Here’s an example where the baseline version of a service gets half of the requests and two other versions testing new features each get one-fourth of the requests:
kind: TrafficSplit
metadata:
  name: e8-feature-test
  namespace: e8app
spec:
  service: e8-render-svc
  backends:
  - service: e8-render-svc-baseline
    weight: 50
  - service: e8-render-svc-feature1
    weight: 25
  - service: e8-render-svc-feature2
    weight: 25
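The Traffic Specs section mentioned that a TrafficSplit can be limited to traffic that matches an HTTPRouteGroup. A hedged sketch of what that could look like follows. It assumes a matches field on the TrafficSplit spec, which belongs to later alpha versions of the Traffic Split specification and is not part of the examples above, so check which version your service mesh implements before relying on it. It also assumes the m-routes group exists in the e8app namespace. The resource name e8-metrics-split is hypothetical.

```yaml
kind: TrafficSplit
metadata:
  name: e8-metrics-split   # hypothetical name
  namespace: e8app
spec:
  service: e8-widget-svc
  # Assumption: `matches` is available only in later versions of the spec.
  # Only traffic matching the m-routes group would be subject to this split.
  matches:
  - kind: HTTPRouteGroup
    name: m-routes
  backends:
  - service: e8-widget-svc-current
    weight: 50
  - service: e8-widget-svc-patch
    weight: 50
```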
Traffic Access Control API
The Traffic Access Control API allows you to set access control policies for pod-to-pod (service proxy to service proxy) communications based on service proxy identity. When you use this API, by default all traffic is denied. You have to explicitly grant permission for any types of traffic you want to allow.
Here’s an example of defining a TrafficTarget, based on an example from the SMI specification. Under spec, three things are defined:
- sources, which specifies the pods that may be the sources of the traffic.
- destination, which specifies the pods that may be the destinations of the traffic.
- rules, which defines the characteristics the traffic must have in order to be allowed to reach its destination.
In this example, traffic is being allowed from pods with a prometheus service account to pods with a service-a service account when the traffic is being sent to port 8080. The traffic is only allowed when it matches the rules, which in this case is that the traffic matches the m-routes HTTPRouteGroup example we looked at in the Traffic Specs API section. m-routes will match on any HTTP GET request with the string “/metrics” in its path.
To recap: in this TrafficTarget example,
- HTTP GET requests with the string “/metrics” in their path
- that are sent from pods with a prometheus service account
- and that are going to port 8080 on pods with a service-a service account
- will be allowed.
kind: TrafficTarget
metadata:
  name: path-specific
  namespace: default
spec:
  destination:
    kind: ServiceAccount
    name: service-a
    namespace: default
    port: 8080
  rules:
  - kind: HTTPRouteGroup
    name: m-routes
    matches:
    - metrics
  sources:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
Some of the elements in this example are optional. For example, if you don’t specify a port for the destination, the TrafficTarget will apply no matter what the destination port is. Additional elements can also be specified, like defining a port for the sources so this TrafficTarget would only apply to traffic with that source port. You can also have more than one entry under sources, destination, and rules.
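As a sketch of two of those variations, the TrafficTarget below drops the destination port and lists a second source. It reuses only fields from the example above. The resource name and the monitoring-agent service account are hypothetical, and apiVersion is omitted as in the other examples.

```yaml
kind: TrafficTarget
metadata:
  name: path-specific-any-port   # hypothetical name
  namespace: default
spec:
  destination:
    kind: ServiceAccount
    name: service-a
    namespace: default
    # No port given, so the policy applies to any destination port.
  rules:
  - kind: HTTPRouteGroup
    name: m-routes
    matches:
    - metrics
  sources:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
  - kind: ServiceAccount
    name: monitoring-agent       # hypothetical second source
    namespace: default
```

Because traffic is denied by default, pods running under any other service account would still be unable to reach service-a through this policy.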
Traffic Metrics API
The Traffic Metrics API allows you to collect metrics on HTTP traffic and make those metrics available to other tools. Each metric involves a Kubernetes resource, either a lower-level one like a pod or a service, or a higher-level one like a namespace. Each metric is also limited to a particular edge, which is another term for the traffic’s source or destination. Note that an edge can be set as blank, which would match all traffic.
Here’s an example of defining TrafficMetrics, based on an example from the SMI specification. It defines several things:
- resource, which specifies the source of the traffic to collect the metrics for.
- edge, which specifies the destination of the traffic to collect the metrics for.
- timestamp, which specifies when the definition was created.
- window, which specifies the time period to be used for calculating the metrics. In this example, 30 seconds is specified, so metrics will be calculated on the past 30 seconds of activity.
- metrics, which lists the metrics to be collected. In this example, the metrics will include data on response latency and on successful and failed requests.
kind: TrafficMetrics
# See ObjectReference v1 core for full spec
resource:
  name: foo-775b9cbd88-ntxsl
  namespace: foobar
  kind: Pod
edge:
  direction: to
  side: client
  resource:
    name: baz-577db7d977-lsk2q
    namespace: foobar
    kind: Pod
timestamp: 2019-04-08T22:25:55Z
window: 30s
metrics:
- name: p99_response_latency
  unit: seconds
  value: 10m
- name: p90_response_latency
  unit: seconds
  value: 10m
- name: p50_response_latency
  unit: seconds
  value: 10m
- name: success_count
  value: 100
- name: failure_count
  value: 100
The Traffic Metrics API is in its early stages at this time, so its support of individual metrics is quite limited.