2020 DistributedTracinginPracticeIns
- (Parker et al., 2020) ⇒ A. Parker, D. Spoonhower, J. Mace, B. Sigelman, and R. Isaacs. (2020). “Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices.” O'Reilly Media. ISBN:9781492056607
Subject Headings: Software System Tracing, Distributed Application Tracing.
Notes
Cited By
2021
Quotes
Book Overview
https://www.oreilly.com/library/view/distributed-tracing-in/9781492056621/
Since most applications today are distributed in some fashion, monitoring their health and performance requires a new approach. Enter distributed tracing, a method of profiling and monitoring distributed applications — particularly those that use microservice architectures. There’s just one problem: distributed tracing can be hard. But it doesn’t have to be.
With this guide, you’ll learn what distributed tracing is and how to use it to understand the performance and operation of your software. Key players at LightStep and other organizations walk you through instrumenting your code for tracing, collecting the data that your instrumentation produces, and turning it into useful operational insights. If you want to implement distributed tracing, this book tells you what you need to know.
You’ll learn:
- The pieces of a distributed tracing deployment: instrumentation, data collection, and analysis
- Best practices for instrumentation: methods for generating trace data from your services
- How to deal with (or avoid) overhead using sampling and other techniques
- How to use distributed tracing to improve baseline performance and to mitigate regressions quickly
- Where distributed tracing is headed in the future
Table of Contents
0. Introduction: What Is Distributed Tracing?
Distributed Architectures and You
Deep Systems
The Difficulties of Understanding Distributed Architectures
How Does Distributed Tracing Help?
Distributed Tracing and You
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. The Problem with Distributed Tracing
The Pieces of a Distributed Tracing Deployment
Distributed Tracing, Microservices, Serverless, Oh My!
The Benefits of Tracing
Setting the Table
2. An Ontology of Instrumentation
White Box Versus Black Box
Application Versus System
Agents Versus Libraries
Propagating Context
Interprocess Propagation
Intraprocess Propagation
The Shape of Distributed Tracing
Tracing-Friendly Microservices and Serverless
Tracing in a Monolith
Tracing in Web and Mobile Clients
3. Open Source Instrumentation: Interfaces, Libraries, and Frameworks
The Importance of Abstract Instrumentation
OpenTelemetry
OpenTracing and OpenCensus
OpenTracing
OpenCensus
Other Notable Formats and Projects
X-Ray
Zipkin
Interoperability and Migration Strategies
Why Use Open Source Instrumentation?
Interoperability
Portability
Ecosystem and Implicit Visibility
4. Best Practices for Instrumentation
Tracing by Example
Installing the Sample Application
Adding Basic Distributed Tracing
Custom Instrumentation
Where to Start—Nodes and Edges
Framework Instrumentation
Service Mesh Instrumentation
Creating Your Service Graph
What’s in a Span?
Effective Naming
Effective Tagging
Effective Logging
Understanding Performance Considerations
Trace-Driven Development
Developing with Traces
Testing with Traces
Creating an Instrumentation Plan
Making the Case for Instrumentation
Instrumentation Quality Checklist
Knowing When to Stop Instrumenting
Smart and Sustainable Instrumentation Growth
5. Deploying Tracing
Organizational Adoption
Start Close to Your Users
Start Centrally: Load Balancers and Gateways
Leverage Infrastructure: RPC Frameworks and Service Meshes
Make Adoption Repeatable
Tracer Architecture
In-Process Libraries
Sidecars and Agents
Collectors
Centralized Storage and Analysis
Incremental Deployment
Data Provenance, Security, and Federation
Frontend Service Telemetry
Server-Side Telemetry for Managed Services
6. Overhead, Costs, and Sampling
Application Overhead
Latency
Throughput
Infrastructure Costs
Network
Storage
Sampling
Minimum Requirements
Strategies
Selecting Traces
Off-the-Shelf ETL Solutions
7. A New Observability Scorecard
The Three Pillars Defined
Metrics
Logging
Distributed Tracing
Fatal Flaws of the Three Pillars
Design Goals
Assessing the Three Pillars
Three Pipes (Not Pillars)
Observability Goals and Activities
Two Goals in Observability
Two Fundamental Activities in Observability
A New Scorecard
The Path Ahead
8. Improving Baseline Performance
Measuring Performance
Percentiles
Histograms
Defining the Critical Path
Approaches to Improving Performance
Individual Traces
Biased Sampling and Trace Comparison
Trace Search
Multimodal Analysis
Aggregate Analysis
Correlation Analysis
9. Restoring Baseline Performance
Defining the Problem
Human Factors
(Avoiding) Finger-Pointing
“Suppressing” the Messenger
Incident Hand-off
Good Postmortems
Approaches to Restoring Performance
Integration with Alerting Workflows
Individual Traces
Biased Sampling
Real-Time Response
Knowing What’s Normal
Aggregate and Correlation Root Cause Analysis
10. Are We There Yet? The Past and Present
Distributed Tracing: A History of Pragmatism
Request-Based Systems
Response Time Matters
Request-Oriented Information
Notable Work
Pinpoint
Magpie
X-Trace
Dapper
Where to Next?
11. Beyond Individual Requests
The Value of Traces in Aggregate
Example 1: Is Network Congestion Affecting My Application?
Example 2: What Services Are Required to Serve an API Endpoint?
Organizing the Data
A Strawperson Solution
What About the Trade-offs?
Sampling for Aggregate Analysis
The Processing Pipeline
Incorporating Heterogeneous Data
Custom Functions
Joining with Other Data Sources
Recap and Case Study
The Value of Traces in Aggregate
Organizing the Data
Sampling for Aggregate Analysis
The Processing Pipeline
Incorporating Heterogeneous Data
12. Beyond Spans
Why Spans Have Prevailed
Visibility
Pragmatism
Portability
Compatibility
Flexibility
Why Spans Aren’t Enough
Graphs, Not Trees
Inter-Request Dependencies
Decoupled Dependencies
Distributed Dataflow
Machine Learning
Low-Level Performance Metrics
New Abstractions
Seeing Causality
13. Beyond Distributed Tracing
Limitations of Distributed Tracing
Challenge 1: Anticipating Problems
Challenge 2: Completeness Versus Costs
Challenge 3: Open-Ended Use Cases
Other Tools Like Distributed Tracing
Census
A Motivating Example
A Distributed Tracing Solution?
Tag Propagation and Local Metric Aggregation
Comparison to Distributed Tracing
Pivot Tracing
Dynamic Instrumentation
Recurring Problems
How Does It Work?
Dynamic Context
Comparison to Distributed Tracing
Pythia
Performance Regressions
Design
Overheads
Comparison to Distributed Tracing
14. The Future of Context Propagation
Cross-Cutting Tools
Use Cases
Distributed Tracing
Cross-Component Metrics
Cross-Component Resource Management
Managing Data Quality Trade-offs
Failure Testing of Microservices
Enforcing Cross-System Consistency
Request Duplication
Record Lineage in Stream Processing Systems
Auditing Security Policies
Testing in Production
Common Themes
Should You Care?
The Tracing Plane
Is Baggage Enough?
Beyond Key-Value Pairs
Compiling BDL
BaggageContext
Merging
Overheads
A. The State of Distributed Tracing Circa 2020
Open Source Tracers and Trace Analysis
Commercial Tracers and Trace Analyzers
Language-Specific Tracing Features
Java and C#
Go, Rust, and C++
Python, JavaScript, and Other Dynamic Languages
B. Context Propagation in OpenTelemetry
Why a Separate Context Model?
The OpenTelemetry Context Model
W3C CorrelationContext and the Correlations API
Distributed and Local Context
Examples and Potential Applications
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020 DistributedTracinginPracticeIns | A. Parker, D. Spoonhower, J. Mace, B. Sigelman, R. Isaacs | | | Distributed Tracing in Practice: Instrumenting, Analyzing, and Debugging Microservices | | | | | | 2020 |