Integrations
Integrations overview
Event Store Cloud offers integrations that connect internal sources, such as cluster health, issue detection, notification events, EventStore DB logs and EventStore DB metrics, to sinks, which include external services such as Slack and Amazon CloudWatch.
Integration sources
"Sources" are driven by events or other mechanisms inside Event Store Cloud.
Currently, supported sources include:
Issues - issues represent a potentially problematic condition detected inside a cluster or other Event Store Cloud resource. Issues consist of multiple "open" events and a single "closed" event and have different levels of severity.
Logs - these are the logs generated by EventStore DB itself.
Metrics - these are the metrics generated both by EventStore DB itself and by the Event Store Cloud issue detection processes running on each cluster node.
Notifications - notifications are noteworthy events detected within Event Store Cloud resources or the backend. For example, a notification may be emitted when a cluster fails to provision.
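Because an issue is made up of several "open" events followed by a single "closed" event, a consumer of the Issues source can rebuild the set of currently open issues by replaying those events in order. The minimal sketch below assumes a hypothetical event shape (`IssueEvent` with `issue_id`, `kind`, `severity` and `message` fields); the real field names and severity values used by Event Store Cloud are not documented here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


# Hypothetical event shape; the real Event Store Cloud field names may differ.
@dataclass(frozen=True)
class IssueEvent:
    issue_id: str
    kind: str       # "open" or "closed"
    severity: str   # illustrative values only, e.g. "warning", "critical"
    message: str


def open_issues(events: Iterable[IssueEvent]) -> Dict[str, IssueEvent]:
    """Replay issue events in order and return the issues that are still open.

    An issue may receive several "open" events (the latest one is kept) and is
    dropped once its single "closed" event arrives.
    """
    current: Dict[str, IssueEvent] = {}
    for event in events:
        if event.kind == "open":
            current[event.issue_id] = event
        elif event.kind == "closed":
            current.pop(event.issue_id, None)
        else:
            raise ValueError(f"unknown issue event kind: {event.kind!r}")
    return current


events = [
    IssueEvent("disk-1", "open", "warning", "Disk usage at 82%"),
    IssueEvent("disk-1", "open", "warning", "Disk usage at 85%"),
    IssueEvent("mem-2", "open", "critical", "Memory usage at 93%"),
    IssueEvent("mem-2", "closed", "critical", "Memory usage back under 90%"),
]
still_open = open_issues(events)
```

Keeping only the latest "open" event per issue means the stored message always reflects the most recent state the detector reported.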
Issues
Issues represent possibly problematic states detected within Event Store Cloud. Several example issues are described below.
Note
These examples are a subset of the issues created by the system. The exact conditions that trigger an issue are subject to change, but the messages associated with the related events will clearly state the cause of the issue and the steps to resolve it.
Core load count
For each node of a cluster, the core load average is measured and divided by the number of physical cores on the node. If the result exceeds 2.0, an issue is opened. The issue is closed when the result consistently dips below 2.0.
If this happens, consider increasing the size of the instance type for the cluster.
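The check above can be sketched as a small state machine: normalize the load average by the physical core count, open as soon as the ratio exceeds 2.0, and close only after the ratio has stayed below 2.0 for a while. How many samples count as "consistently" is not documented, so `close_after=3` below is an assumption, as are the class and function names.

```python
class CoreLoadCheck:
    """Hypothetical model of the core load issue detector.

    Opens as soon as load per core exceeds `threshold`; closes only after
    `close_after` consecutive samples below it (an assumed reading of
    "consistently dips below").
    """

    def __init__(self, threshold: float = 2.0, close_after: int = 3):
        self.threshold = threshold
        self.close_after = close_after
        self.is_open = False
        self._below = 0  # consecutive below-threshold samples while open

    def observe(self, load_average: float, physical_cores: int) -> bool:
        ratio = load_per_core(load_average, physical_cores)
        if ratio > self.threshold:
            self.is_open = True
            self._below = 0
        elif self.is_open and ratio < self.threshold:
            self._below += 1
            if self._below >= self.close_after:
                self.is_open = False
                self._below = 0
        else:
            # Not a below-threshold sample while open: reset the streak.
            self._below = 0
        return self.is_open


def load_per_core(load_average: float, physical_cores: int) -> float:
    """Core load average divided by the number of physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return load_average / physical_cores
```

For example, on a 4-core node a load average of 9.0 gives 2.25 per core and opens the issue, while three following samples of 6.0 (1.5 per core) close it again.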
Disk usage
For each node of a cluster, the disk usage is measured several times a minute. If it starts to consistently exceed 80%, an issue is opened. The issue is closed when the usage drops below 80%.
If this happens, consider removing data, running a scavenge, or increasing the disk size for the cluster.
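Disk usage has the opposite asymmetry to the core load check: the issue opens only after usage has been above 80% for a sustained period, but closes on the first sample below 80%. A minimal sketch, assuming "consistently" means every sample in a sliding window of hypothetical size `window`, could look like this:

```python
from collections import deque


class DiskUsageCheck:
    """Hypothetical model of the disk usage issue detector.

    Opens once the last `window` samples all exceed `threshold_pct`;
    closes as soon as a single sample drops below it. The window size is
    an assumption, not a documented value.
    """

    def __init__(self, threshold_pct: float = 80.0, window: int = 5):
        self.threshold_pct = threshold_pct
        self.recent = deque(maxlen=window)  # most recent usage percentages
        self.is_open = False

    def observe(self, used_bytes: int, total_bytes: int) -> bool:
        usage_pct = 100.0 * used_bytes / total_bytes
        self.recent.append(usage_pct)
        if not self.is_open:
            window_full = len(self.recent) == self.recent.maxlen
            if window_full and all(u > self.threshold_pct for u in self.recent):
                self.is_open = True
        elif usage_pct < self.threshold_pct:
            self.is_open = False
        return self.is_open
```

Requiring a full window before opening avoids raising an issue for a brief spike, for example while a large write is being flushed.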
Memory usage
For each node of a cluster, the memory usage is measured several times a minute. If it exceeds 90%, an issue is opened. The issue is closed when memory usage consistently drops below 90%.
If this happens, consider increasing the size of the instance type for the cluster.
Cluster consensus
Every node in a cluster has its gossip status queried twice each minute. An issue is opened if the query fails or if, on a multi-node cluster, the nodes do not all report an identical gossip state.
The issue closes when the gossip status again returns expected values.
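The consensus rule can be expressed as a single check over the gossip views collected from every node. In the sketch below, each view is a hypothetical mapping from member name to the state that node reports for it (for example `"Leader"` or `"Follower"`), and `None` stands for a failed query; the real gossip payload is richer than this.

```python
from typing import Dict, Mapping, Optional

# Hypothetical simplification of a node's gossip view:
# member name -> state that node reports for it. None means the query failed.
GossipView = Optional[Dict[str, str]]


def check_consensus(views: Mapping[str, GossipView]) -> Optional[str]:
    """Return a reason to open an issue, or None if the cluster agrees."""
    failed = sorted(node for node, view in views.items() if view is None)
    if failed:
        return "gossip query failed for: " + ", ".join(failed)
    # Normalize each view into a hashable form so identical views collapse.
    distinct = {tuple(sorted(view.items())) for view in views.values()}
    if len(views) > 1 and len(distinct) > 1:
        return "nodes report different gossip states"
    return None
```

Returning a reason string rather than a boolean mirrors the documented behavior, where the messages on issue events explain why the issue was opened.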
Notifications
Notifications represent noteworthy events that occur within Event Store Cloud. Several example notifications are described below.
Note
The following represent a subset of events which can lead to notifications.
Cluster provisioning failure
If, for some reason, the instances backing a cluster fail to provision, the API marks the resource as defunct and sends a notification with the message "Cluster instances failed to provision".
Volume expansion failure
If a volume fails to expand while an instance's size is being increased, a notification event is created with the message "Cluster volumes failed to provision".
Logs
Note
Logs are currently in beta. The data coming from this source may change over time.
Logs are sent in the form of JSON objects. Each object's "message" field contains the original structured event from EventStore DB itself.
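A consumer of the Logs source mainly needs to pull the original EventStore DB event out of the "message" field. Whether that field arrives as a nested JSON object or as a JSON-encoded string is not stated here, so the sketch below assumes either is possible and falls back to the raw text otherwise. The function name and the example field names are hypothetical.

```python
import json
from typing import Any


def extract_eventstore_event(line: str) -> Any:
    """Parse one forwarded log object and return the original EventStore DB event.

    Only the "message" field is documented. It is assumed it may hold either a
    nested object or a JSON-encoded string, so both cases are handled.
    """
    record = json.loads(line)
    message = record["message"]
    if isinstance(message, str):
        try:
            return json.loads(message)
        except json.JSONDecodeError:
            return message  # not JSON: return the plain text unchanged
    return message


# Example with made-up field names inside "message":
event = extract_eventstore_event('{"message": {"level": "info", "text": "started"}}')
```

Since the Logs source is in beta, tolerating both shapes keeps a consumer working if the encoding of "message" changes.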
Metrics
Note
Metrics are currently in beta. The data coming from this source may change over time.
Metrics are name/value pairs generated both by EventStore DB itself and by the detection processes running on each host.
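To show how simple name/value pairs map onto an external metrics service, the sketch below turns them into entries shaped like the `MetricData` items of Amazon CloudWatch's `PutMetricData` API (`MetricName`, `Value`, `Dimensions`). The metric names, the `Node` dimension and the function name are hypothetical; this is not a description of how the Event Store Cloud CloudWatch sink actually encodes metrics.

```python
from typing import Dict, Iterable, List, Tuple


def to_cloudwatch_metric_data(
    pairs: Iterable[Tuple[str, float]], node: str
) -> List[Dict[str, object]]:
    """Convert (name, value) pairs into PutMetricData-style MetricData entries."""
    return [
        {
            "MetricName": name,
            "Value": float(value),
            # Hypothetical dimension so values from different nodes stay separate.
            "Dimensions": [{"Name": "Node", "Value": node}],
        }
        for name, value in pairs
    ]


# Illustrative metric names, not real Event Store Cloud metric names.
data = to_cloudwatch_metric_data([("disk_used_pct", 42), ("load_per_core", 0.5)], "node-1")
```

The resulting list could be passed as the `MetricData` argument of a CloudWatch client call; sending it is left out so the sketch runs without AWS credentials.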
Integration sinks
"Sinks" are services outside Event Store Cloud to which events from sources can be forwarded.
AwsCloudWatchLogs - BETA Amazon CloudWatch allows logs to be uploaded, viewed, searched, and consumed by other AWS services.
AwsCloudWatchMetrics - BETA Amazon CloudWatch allows you to track metrics, display them on custom dashboards, and create alarms from them.
OpsGenie - OpsGenie is an alerting and incident response tool. It is possible to set up integrations that create OpsGenie alerts when cluster health issues are detected.
Slack - Slack is a communication platform. It is possible to set up integrations which send Slack messages when issues and notifications are created or updated.
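Conceptually, each integration pairs one source with one sink, and every event produced by that source is forwarded to all sinks integrated with it. The sketch below models that routing with hypothetical names (`IntegrationRouter`, `add_integration`, `publish`) and uses in-memory lists as stand-ins for Slack or OpsGenie; it is an illustration of the idea, not the Event Store Cloud API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A sink is modeled as any callable that accepts one event.
Sink = Callable[[dict], None]


@dataclass
class IntegrationRouter:
    """Hypothetical source-to-sink router for illustration only."""

    routes: Dict[str, List[Sink]] = field(default_factory=dict)

    def add_integration(self, source: str, sink: Sink) -> None:
        """Register a sink for a source such as "issues" or "notifications"."""
        self.routes.setdefault(source, []).append(sink)

    def publish(self, source: str, event: dict) -> int:
        """Forward an event to every sink for `source`; return how many received it."""
        sinks = self.routes.get(source, [])
        for sink in sinks:
            sink(event)
        return len(sinks)


slack_messages: List[dict] = []
router = IntegrationRouter()
router.add_integration("issues", slack_messages.append)
router.add_integration("notifications", slack_messages.append)
```

Keeping the routing table keyed by source means one source can fan out to several sinks, for example issues going to both Slack and OpsGenie.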