2020 DistributedTracinginPracticeIns

(Parker et al., 2020) ⇒ A. Parker, D. Spoonhower, J. Mace, B. Sigelman, and R. Isaacs. (2020). “Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices.” O'Reilly Media. ISBN:9781492056607

Subject Headings: Software System Tracing, Distributed Application Tracing.

Notes

Cited By

http://scholar.google.com/scholar?q=%222020%22+Distributed+Tracing+in+Practice%3A+Instrumenting%2C+Analyzing%2C+and+Debugging+Microservices

Quotes

Book Overview

https://www.oreilly.com/library/view/distributed-tracing-in/9781492056621/

Since most applications today are distributed in some fashion, monitoring their health and performance requires a new approach. Enter distributed tracing, a method of profiling and monitoring distributed applications — particularly those that use microservice architectures. There’s just one problem: distributed tracing can be hard. But it doesn’t have to be.

With this guide, you’ll learn what distributed tracing is and how to use it to understand the performance and operation of your software. Key players at LightStep and other organizations walk you through instrumenting your code for tracing, collecting the data that your instrumentation produces, and turning it into useful operational insights. If you want to implement distributed tracing, this book tells you what you need to know.

You’ll learn:

The pieces of a distributed tracing deployment: instrumentation, data collection, and analysis
Best practices for instrumentation: methods for generating trace data from your services
How to deal with (or avoid) overhead using sampling and other techniques
How to use distributed tracing to improve baseline performance and to mitigate regressions quickly
Where distributed tracing is headed in the future

0. Introduction: What Is Distributed Tracing?
   Distributed Architectures and You
   Deep Systems
   The Difficulties of Understanding Distributed Architectures
   How Does Distributed Tracing Help?
   Distributed Tracing and You
   Conventions Used in This Book
   Using Code Examples
   O’Reilly Online Learning
   How to Contact Us
   Acknowledgments

1. The Problem with Distributed Tracing
   The Pieces of a Distributed Tracing Deployment
   Distributed Tracing, Microservices, Serverless, Oh My!
   The Benefits of Tracing
   Setting the Table

2. An Ontology of Instrumentation
   White Box Versus Black Box
   Application Versus System
   Agents Versus Libraries
   Propagating Context
       Interprocess Propagation
       Intraprocess Propagation
   The Shape of Distributed Tracing
       Tracing-Friendly Microservices and Serverless
       Tracing in a Monolith
       Tracing in Web and Mobile Clients

3. Open Source Instrumentation: Interfaces, Libraries, and Frameworks
   The Importance of Abstract Instrumentation
   OpenTelemetry
   OpenTracing and OpenCensus
       OpenTracing
       OpenCensus
   Other Notable Formats and Projects
       X-Ray
       Zipkin
   Interoperability and Migration Strategies
   Why Use Open Source Instrumentation?
       Interoperability
       Portability
       Ecosystem and Implicit Visibility

4. Best Practices for Instrumentation
   Tracing by Example
       Installing the Sample Application
       Adding Basic Distributed Tracing
       Custom Instrumentation
   Where to Start—Nodes and Edges
       Framework Instrumentation
       Service Mesh Instrumentation
       Creating Your Service Graph
   What’s in a Span?
       Effective Naming
       Effective Tagging
       Effective Logging
       Understanding Performance Considerations
   Trace-Driven Development
       Developing with Traces
       Testing with Traces
   Creating an Instrumentation Plan
       Making the Case for Instrumentation
       Instrumentation Quality Checklist
       Knowing When to Stop Instrumenting
       Smart and Sustainable Instrumentation Growth

5. Deploying Tracing
   Organizational Adoption
       Start Close to Your Users
       Start Centrally: Load Balancers and Gateways
       Leverage Infrastructure: RPC Frameworks and Service Meshes
       Make Adoption Repeatable
   Tracer Architecture
       In-Process Libraries
       Sidecars and Agents
       Collectors
       Centralized Storage and Analysis
       Incremental Deployment
   Data Provenance, Security, and Federation
       Frontend Service Telemetry
       Server-Side Telemetry for Managed Services

6. Overhead, Costs, and Sampling
   Application Overhead
       Latency
       Throughput
   Infrastructure Costs
       Network
       Storage
   Sampling
       Minimum Requirements
       Strategies
       Selecting Traces
   Off-the-Shelf ETL Solutions

7. A New Observability Scorecard
   The Three Pillars Defined
       Metrics
       Logging
       Distributed Tracing
   Fatal Flaws of the Three Pillars
       Design Goals
       Assessing the Three Pillars
       Three Pipes (Not Pillars)
   Observability Goals and Activities
       Two Goals in Observability
       Two Fundamental Activities in Observability
       A New Scorecard
       The Path Ahead

8. Improving Baseline Performance
   Measuring Performance
       Percentiles
       Histograms
   Defining the Critical Path
   Approaches to Improving Performance
       Individual Traces
       Biased Sampling and Trace Comparison
       Trace Search
       Multimodal Analysis
       Aggregate Analysis
       Correlation Analysis

9. Restoring Baseline Performance
   Defining the Problem
   Human Factors
       (Avoiding) Finger-Pointing
       “Suppressing” the Messenger
       Incident Hand-off
       Good Postmortems
   Approaches to Restoring Performance
       Integration with Alerting Workflows
       Individual Traces
       Biased Sampling
       Real-Time Response
       Knowing What’s Normal
       Aggregate and Correlation Root Cause Analysis

10. Are We There Yet? The Past and Present
   Distributed Tracing: A History of Pragmatism
       Request-Based Systems
       Response Time Matters
       Request-Oriented Information
   Notable Work
       Pinpoint
       Magpie
       X-Trace
       Dapper
   Where to Next?

11. Beyond Individual Requests
   The Value of Traces in Aggregate
       Example 1: Is Network Congestion Affecting My Application?
       Example 2: What Services Are Required to Serve an API Endpoint?
   Organizing the Data
       A Strawperson Solution
   What About the Trade-offs?
   Sampling for Aggregate Analysis
   The Processing Pipeline
   Incorporating Heterogeneous Data
   Custom Functions
       Joining with Other Data Sources
   Recap and Case Study
       The Value of Traces in Aggregate
       Organizing the Data
       Sampling for Aggregate Analysis
       The Processing Pipeline
       Incorporating Heterogeneous Data

12. Beyond Spans
   Why Spans Have Prevailed
       Visibility
       Pragmatism
       Portability
       Compatibility
       Flexibility
   Why Spans Aren’t Enough
       Graphs, Not Trees
       Inter-Request Dependencies
       Decoupled Dependencies
       Distributed Dataflow
       Machine Learning
       Low-Level Performance Metrics
   New Abstractions
   Seeing Causality

13. Beyond Distributed Tracing
   Limitations of Distributed Tracing
       Challenge 1: Anticipating Problems
       Challenge 2: Completeness Versus Costs
       Challenge 3: Open-Ended Use Cases
   Other Tools Like Distributed Tracing
   Census
       A Motivating Example
       A Distributed Tracing Solution?
       Tag Propagation and Local Metric Aggregation
       Comparison to Distributed Tracing
   Pivot Tracing
       Dynamic Instrumentation
       Recurring Problems
       How Does It Work?
       Dynamic Context
       Comparison to Distributed Tracing
   Pythia
       Performance Regressions
       Design
       Overheads
       Comparison to Distributed Tracing

14. The Future of Context Propagation
   Cross-Cutting Tools
   Use Cases
       Distributed Tracing
       Cross-Component Metrics
       Cross-Component Resource Management
       Managing Data Quality Trade-offs
       Failure Testing of Microservices
       Enforcing Cross-System Consistency
       Request Duplication
       Record Lineage in Stream Processing Systems
       Auditing Security Policies
       Testing in Production
   Common Themes
   Should You Care?
   The Tracing Plane
       Is Baggage Enough?
       Beyond Key-Value Pairs
       Compiling BDL
       BaggageContext
       Merging
       Overheads

A. The State of Distributed Tracing Circa 2020
   Open Source Tracers and Trace Analysis
   Commercial Tracers and Trace Analyzers
   Language-Specific Tracing Features
       Java and C#
       Go, Rust, and C++
       Python, JavaScript, and Other Dynamic Languages

B. Context Propagation in OpenTelemetry
   Why a Separate Context Model?
   The OpenTelemetry Context Model
       W3C CorrelationContext and the Correlations API
       Distributed and Local Context
   Examples and Potential Applications

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2020 DistributedTracinginPracticeIns	A. Parker, D. Spoonhower, J. Mace, B. Sigelman, R. Isaacs			Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices						2020