April 14, 2021

Distributed Tracing: What is the "cost" of instrumentation?

Distributed Tracing

OK, what the hell is "distributed tracing"?

Microservices architecture is on the rise and is extensively used to power applications and services that we use on a daily basis. Netflix, Amazon, and eBay, to name a few, are built on microservices.

With a microservices architecture, a user request will typically span multiple services across different servers before the response is stitched together and sent back to the user. The trade-off is harder monitoring, harder debugging, and reduced global visibility.

Distributed tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built on a microservices architecture. It is a diagnostic technique that observes requests as they propagate through a distributed system and reveals how a set of services coordinates to handle each individual user request. Distributed tracing requires that software developers add instrumentation to the application code.

OpenTracing provides an API specification for adding instrumentation to application code in a vendor-neutral manner.
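As a rough illustration of what such instrumentation looks like in Java (the class, method, and operation names here are placeholders of my own, not code from this post's benchmark), a unit of work can be wrapped in an OpenTracing span like this:

import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

public class CheckoutService {

    // Any OpenTracing-compatible tracer (Jaeger, a no-op tracer, etc.) can be
    // registered with GlobalTracer; the instrumentation below does not change.
    private final Tracer tracer = GlobalTracer.get();

    public void handleCheckout(String orderId) {
        // Start a span representing this unit of work.
        Span span = tracer.buildSpan("handle-checkout").start();
        try {
            span.setTag("order.id", orderId);
            // ... actual business logic goes here ...
        } finally {
            // Always finish the span so it can be reported.
            span.finish();
        }
    }
}

Because the code only talks to the OpenTracing API, the concrete tracer can be swapped without touching the instrumented code.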

Cost Of Instrumentation

Services usually talk to each other through some form of IPC, and there are many frameworks that enable such communication. Many of these frameworks provide built-in support for instrumentation, making it simple to enable distributed tracing, and their usage and performance have been studied extensively. The OpenTracing blog is probably a good place to start.

So now let's come to the important question: what is the cost of instrumentation? What is the performance impact of adding instrumentation to application code?

To measure the cost, we will use the Jaeger tracer and JMH (the Java Microbenchmark Harness) to capture the metrics.

JMH's Blackhole provides an API to consume CPU cycles varying linearly with the specified token value. We will use this to mock a long-running job that will be instrumented.

// Blackhole.consumeCPU(tokens) burns CPU time roughly proportional to the token value.
// We return the result so the JVM cannot dead-code-eliminate the call.
public long processLongJob(long token) {
    Blackhole.consumeCPU(token); // org.openjdk.jmh.infra.Blackhole
    return token;
}

 

We will measure the metrics for three variants: no instrumentation at all, instrumentation with a NoOpTracer, and instrumentation with a JaegerTracer created with default initialization values.
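A minimal sketch of how such a comparison can be set up with JMH is shown below. The class and method names are mine, and the tracer construction assumes the opentracing-noop and jaeger-client libraries, so treat this as an illustration of the approach rather than the exact benchmark behind the numbers that follow.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

import io.jaegertracing.Configuration;
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.noop.NoopTracerFactory;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class TracingOverheadBenchmark {

    @Param({"1000", "5000", "10000", "50000", "100000", "500000"})
    public long token;

    // A tracer that implements the OpenTracing API but records nothing.
    private final Tracer noopTracer = NoopTracerFactory.create();

    // A Jaeger tracer built with default initialization values.
    private final Tracer jaegerTracer =
            Configuration.fromEnv("benchmark-service").getTracer();

    @Benchmark
    public long noInstrumentation() {
        Blackhole.consumeCPU(token);
        return token;
    }

    @Benchmark
    public long noopTracerInstrumented() {
        Span span = noopTracer.buildSpan("long-job").start();
        try {
            Blackhole.consumeCPU(token);
            return token;
        } finally {
            span.finish();
        }
    }

    @Benchmark
    public long jaegerTracerInstrumented() {
        Span span = jaegerTracer.buildSpan("long-job").start();
        try {
            Blackhole.consumeCPU(token);
            return token;
        } finally {
            span.finish();
        }
    }
}

Every variant runs the same Blackhole.consumeCPU(token) workload; the only difference is whether a span is started and finished around it, so the measured delta is the per-span instrumentation cost.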

 

Benchmark          | Param: token | Average Time (ns/op) | Error (99.9%) | Cost (impact %)
NoInstrumentation  | 1000         | 1664.84177           | 49.940273     | 0
NoInstrumentation  | 5000         | 8324.19976           | 107.233129    | 0
NoInstrumentation  | 10000        | 16606.7095           | 60.983405     | 0
NoInstrumentation  | 50000        | 83249.7561           | 2113.50206    | 0
NoInstrumentation  | 100000       | 166090.738           | 1693.40512    | 0
NoInstrumentation  | 500000       | 829908.745           | 6188.7579     | 0
NoOpTracer         | 1000         | 1648.86056           | 4.521151      | -0.960
NoOpTracer         | 5000         | 8280.19528           | 86.68164      | -0.529
NoOpTracer         | 10000        | 16569.8697           | 134.783752    | -0.222
NoOpTracer         | 50000        | 83254.7547           | 645.070768    | 0.006
NoOpTracer         | 100000       | 166211.527           | 979.291791    | 0.073
NoOpTracer         | 500000       | 833341.912           | 14473.6203    | 0.414
JaegerTracer       | 1000         | 2373.87521           | 2.188993      | 42.589
JaegerTracer       | 5000         | 10059.0192           | 18.01787      | 20.841
JaegerTracer       | 10000        | 18803.8006           | 63.222248     | 13.230
JaegerTracer       | 50000        | 87511.7573           | 651.314285    | 5.120
JaegerTracer       | 100000       | 172132.755           | 959.72657     | 3.638
JaegerTracer       | 500000       | 846543.298           | 8904.63822    | 2.004
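To make the Cost column concrete: it is the relative change in average time versus the NoInstrumentation baseline at the same token value. For example, for the JaegerTracer at token = 1000, (2373.88 - 1664.84) / 1664.84 * 100 ≈ 42.59%, which matches the 42.589 shown above.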

As seen above, the NoInstrumentation and NoOpTracer scores are comparable, with virtually no impact on performance, while the JaegerTracer adds roughly 40% overhead (about 1.4x) for the smallest job, an overhead that shrinks as the number of CPU cycles consumed grows.

Let's look at the average time per unit of work. Since the CPU cycles consumed vary linearly with the token value, we can divide the average time by the token to get a normalized score; for example, the JaegerTracer run at token = 1000 scores 2373.88 / 1000 ≈ 2.37 ns per token. The table below summarizes the results above into scores for the different instrumentation setups.


Score = Average Time / Token (ns per token)

Token  | NoInstrumentation Score | NoOpTracer Score | JaegerTracer Score
1000   | 1.664841774             | 1.648860562      | 2.373875211
5000   | 1.664839952             | 1.656039056      | 2.011803834
10000  | 1.660670949             | 1.656986968      | 1.880380062
50000  | 1.664995121             | 1.665095094      | 1.750235147
100000 | 1.660907376             | 1.662115265      | 1.721327553
500000 | 1.65981749              | 1.666683825      | 1.693086597

Plotting the table with the token on the X axis and the scores on the Y axis, it becomes clear that the more CPU-intensive the instrumented code is, the smaller the relative impact of instrumentation on performance.

Going by the above measurements, for any function whose total execution time is greater than about 1 ms, the cost of instrumentation is negligible.

That's all, folks. Till next time.