Monitoring
Stackdriver
- It is the tool provided by GCP for monitoring, logging and debugging
- It gives insides to application health, performance and availability
- Stackdriver Logging lets define metrics based on logs. These metrics can be displayed on dashboards
- Stackdirver Error Reporting: tracks errors in applications and it can notify us when new errors are detected
- Stackdriver Trace: report on application latency and sampling
Stackdriver Debugger: it connects application production data to the source code for inspecting application state
- Dynamically discovers cloud resources and application services
- We can have deep visibility into our applications in minutes
- Provides us access to powerful data an analytics tools
- Stackdriver offers services for:
- Monitoring
- Logging
- Error Reporting
- Tracing
- Debugging
- Supports several third party integrations
Stackdriver Monitoring
- Is at the base os SRE
- It dynamically configures monitoring after resources are deployed
- Allows us to monitor platform, systems an application metrics
- Workspace:
- Is a root entity that holds monitoring and configuration information in Stackdriver Monitoring
- We can have as many workspaces as we want, GCP projects can’t be monitored by more than one workspace
- The first monitored project is the Hosting Project and needs to be specified at the workspace creation
- Stackdriver allows us to create custom dashboards and charts based on the monitored data
- Alerting policies: we can create alerting policies based on monitored data. We can create notifications based on these alerting policies
- Update checks: test the availability of the public services
- Monitoring agent: used to access system resources and application services. It can be installed in compute resources and application services
Stackdriver Logging
- Allows us to store, search, analyze and alert on log data on events from GCP and AWS
- Logging includes storage for logs, an user interface Logs Viewer and an API to manage logs programmatically
- Logs are retained for 30 days, we can export logs into Cloud Storage, BigQuery and Cloud Pub/Sub
- Logging Agent: can be installed on VM instances for gathering logs
Stackdriver Error Reporting
- Counts, analyzes and aggregates errors for running cloud services
- Provides a centralized error management interface
Stackdriver Trace
- It is a distributed tracing system that collects latency data from application systems and generates in-depth latency reports
- Can collect data from App Engine, Google HTTP(S) load balances and applications implementing Cloud Trace SDKs
Stackdriver Debugger
- Let’s us inspect the state of a running application in real-time without stopping it or slowing it down significantly
- It can capture debug snapshots, call stack and local variables of a running application