As websites grow in scale and complexity, the number of services involved in responding to a given web or API request is likely to grow as well. This makes it harder to answer seemingly simple questions such as “why is this request slow?” One approach to gaining better insight is to trace all the service calls made while handling a request and build a tree representing the overall execution profile.
Etsy recently built such a system, called CrossStitch. In the process we discovered that distributed tracing is not rocket science; in fact, existing open source projects are well suited to do much of the heavy lifting. You don’t need to adopt something like Zipkin and pull all the dependencies that implies into your existing infrastructure. One realization, obvious in hindsight, is that building a distributed tracing system means facing, and solving, some distributed systems problems of its own. Additionally, once you have such a system up and running, you are well placed to roll your own log indexing system along the lines of Splunk.