DevOps.yoga


A DevOps Wiki

This is content about Operational Intelligence tools.

List of Operational Intelligence Tools

Note: This tools list is currently sourced from, and thus linked to, XebiaLabs. Many thanks to them for their valuable DevOps Toolchest.

Name Description
Kibana Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data.
Elasticsearch Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine.
AppDynamics AppDynamics, Inc. is an American privately held application performance management (APM) and IT Operations Analytics (ITOA) company based in San Francisco, CA.
Prometheus Prometheus is an open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
New Relic New Relic is an American software analytics company based in San Francisco, California. Lew Cirne founded New Relic in 2008 and currently acts as the company's CEO. New Relic's technology, delivered in a software as a service (SaaS) model, monitors in real time Web and mobile applications that run in cloud, on-premises, or hybrid environments.
Grafana Grafana is an open source, feature-rich metrics dashboard and graph editor for Graphite, InfluxDB, and OpenTSDB.
Graphite Graphite is a free open source software (FOSS) tool for monitoring and graphing the performance of computer systems. It was created in 2006 and released as open source software in 2008. Graphite collects, stores, and displays time series data in real time. It has three main components.
Datadog Datadog is a SaaS-based monitoring and analytics platform for IT infrastructure, operations and development teams. It brings together data from servers, databases, applications, tools and services to present a unified view of the applications that run at scale in the cloud.
Zabbix Zabbix is an enterprise open source monitoring solution for networks and applications, created by Alexei Vladishev. It is designed to monitor and track the status of various network services, servers, and other network hardware.
Tableau Tableau Software is an American company based in Seattle. They make several tools for business intelligence and data visualization. Tableau focuses on producing fast data analytics and beautiful graphs and charts. In addition to enterprise-level software, Tableau offers free and low-cost tools for personal use.
Nagios Nagios, an open-source computer-software application, monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved.
StackState StackState provides business IT services managers and their Dev/Ops teams with unique insight into the whole IT stack to lower failure repair costs and to reduce and shorten downtimes.
Dynatrace The Dynatrace platform enables developers, testers, operations, and business colleagues to optimize digital touch points with their customers or application users. Dynatrace captures user transactions (both well and poorly performing ones) to provide you with actionable results based on facts, not on mathematical correlations and snapshots. This single system with a common language is development smart but production friendly, which is quite unique in this space.
Sensu Sensu is a free and open source monitoring tool that handles cloud environments. Sensu allows you to monitor servers, services, application health, and business KPIs. Collect and analyze custom metrics and get notified about failures before your users do.
Ganglia Ganglia is a scalable distributed system monitoring tool for high-performance computing systems such as clusters and grids. It allows the user to remotely view live or historical statistics (such as CPU load averages or network utilization) for all machines that are being monitored.
Icinga Icinga is a scalable and extensible monitoring system which checks the availability of your resources, notifies users of outages and provides extensive BI data.
Riemann Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second.
Sentry Sentry provides real-time crash reporting that gives your team insight into errors affecting your customers in production.
CAST Application Engineering Dashboard CAST AED provides continuous risk monitoring by detecting structural defects in code before they go into production. CAST AED provides delivery teams with fast feedback and guidance to find, remove and prevent defects early and fast. CAST software intelligence provides organizations with continuous visibility into system performance, safety, and reliability.
Rollbar Rollbar's error monitoring fits right into your continuous delivery and deployment workflows to provide confidence in every code release. Find and fix errors instantly. With real-time aggregation, smart grouping and alerts, detailed stack traces, error trend reports and regression notifications, Rollbar provides the context and insights to help you keep production error free. Rollbar also works alongside your existing monitoring and logging tools to give you greater coverage and insights into broken code, across your stack.
Bosun Bosun is an open-source, Go-based, MIT-licensed monitoring and alerting system created by Stack Exchange and designed to work with Scollector, OpenTSDB, Logstash, Graphite, and Grafana. It has an expressive domain-specific language for evaluating alerts and creating detailed notifications. It also lets you test your alerts against historical data for a faster development experience. The Scollector monitoring agent has a number of built-in collectors for various systems and runs on Windows, Linux, Mac, and ARM-based systems.
Cacti Cacti is an open-source, web-based network monitoring and graphing tool designed as a front-end application for the open-source, industry-standard data logging tool RRDtool.
SPM SPM is a SaaS-based monitoring and analytics platform for IT infrastructure, operations and development teams. It provides performance metrics charting, alerting, anomaly detection, distributed transaction tracing, network discovery, unlimited dashboards, and multi-user role-based access. It also captures and graphs events such as deployments, restarts, and alerts and, along with application and server logs, makes them searchable and "correlatable" with performance metrics, thus providing a unified view of all key operations data.
jKool jKool provides unified application and fast data analytics for analyzing machine data such as logs, metrics, performance, transactions, and other time series data.
Check_MK Check_MK is an extension to the Nagios monitoring system that allows creating rule-based configuration using Python and offloading work from the Nagios core to make it scale better, allowing more systems to be monitored from a single Nagios server. It comes with a set of system checks, a mod_python and JavaScript based web user interface, and a module that allows fast access to the Nagios core. It can be used as a front end and extension of a Nagios, Icinga, or Shinken monitoring system, for monitoring the performance and health of networking devices, servers, and infrastructure systems.
Spotinst Spotinst provides cost-oriented clusters that span data centers and instance types and are auto-scaled and auto-optimized. Spotinst can be deployed on Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Its machine learning software makes DevOps and financial decisions in real time and removes the overhead of managing cloud computing purchasing options such as On-Demand and Reserved instances and bidding strategies for Spot Instances. Spotinst chooses the most cost-effective compute resources based on your application workload, allowing you to focus on your business growth rather than chasing ghosts.
Librato Librato is a hosted monitoring platform designed for custom metrics. It consists of a scalable and redundant storage tier that is optimized for time series data, visualization tools, data manipulation capabilities, and a powerful alerting framework. With integrations that collect data from servers, AWS, Docker, Redis, and many other systems, Librato is a complete solution for monitoring and analyzing the metrics that impact your business at all levels of the stack.
Zenoss Zenoss works with the world's largest organizations to ensure their IT services and applications are always on. As the leader in software-defined IT operations, Zenoss develops software that builds comprehensive real-time models of hybrid IT environments, providing unparalleled holistic health and performance insights. This uniquely enables Zenoss customers to predict and eliminate outages, dramatically reducing downtime and IT spend.
ITRS ITRS Group provides enterprise-class monitoring & IT Operations analytics, along with real-time insight into the end-to-end health, performance, and capacity of business transaction workflow.
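
To illustrate the "dimensional data model" mentioned in the Prometheus entry above, the following sketch treats each time series as identified by a metric name plus a set of key/value labels. It is a toy in-memory model, not Prometheus's implementation or client API; the class `SeriesStore`, its methods, and the metric name `http_requests_total` are hypothetical names chosen for this example.

```python
from collections import defaultdict


class SeriesStore:
    """Toy in-memory store: one time series per (metric name, label set)."""

    def __init__(self):
        # Maps (name, frozenset of label pairs) -> list of (timestamp, value).
        self._series = defaultdict(list)

    def record(self, name, labels, timestamp, value):
        """Append one sample to the series identified by name and labels."""
        key = (name, frozenset(labels.items()))
        self._series[key].append((timestamp, value))

    def select(self, name, **matchers):
        """Return the series for `name` whose labels include every matcher."""
        wanted = set(matchers.items())
        result = {}
        for (series_name, label_set), samples in self._series.items():
            if series_name == name and wanted <= label_set:
                result[tuple(sorted(label_set))] = list(samples)
        return result


store = SeriesStore()
store.record("http_requests_total", {"method": "GET", "status": "200"}, 1, 10.0)
store.record("http_requests_total", {"method": "GET", "status": "500"}, 1, 2.0)
store.record("http_requests_total", {"method": "POST", "status": "200"}, 1, 4.0)

# Selecting on one dimension returns every series that shares that label.
get_series = store.select("http_requests_total", method="GET")
print(len(get_series))
```

Because labels are part of the series identity, the same metric can be sliced by any label without defining a separate metric name for each combination.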

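The Nagios entry above describes alerting once when something goes wrong and again when the problem is resolved. A minimal sketch of that idea, assuming a check that simply passes or fails, is shown below. The function `notifications` and the sample history are hypothetical, and real Nagios notification logic has more options that are ignored here.

```python
def notifications(check_results):
    """Yield (index, message) each time a check changes state.

    `check_results` is an iterable of booleans: True means the check passed.
    The check is assumed to start in the OK state.
    """
    previous_ok = True
    for index, ok in enumerate(check_results):
        if previous_ok and not ok:
            # First failure after a healthy period: raise a problem alert.
            yield index, "PROBLEM"
        elif not previous_ok and ok:
            # First success after a failure: send the recovery alert.
            yield index, "RECOVERY"
        previous_ok = ok


history = [True, True, False, False, True, True]
events = list(notifications(history))
print(events)  # [(2, 'PROBLEM'), (4, 'RECOVERY')]
```

Only state changes produce a message, so a check that stays broken does not generate a new alert on every run in this sketch.
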