Zabbix is an open-source monitoring software tool for IT components that can help you improve operational excellence on AWS – a pillar of the AWS Well-Architected Framework. Zabbix helps you control your infrastructure by collecting any metric type from any metric source. Zabbix automatically provides its users with flexible, intelligent threshold definition options. With Zabbix, […]
In 2012, SoundCloud engineers realized they had a problem. Their existing monitoring solutions were insufficient for their needs. Development teams need real-time, actionable monitoring data to improve Kubernetes deployment uptime, enhance system performance, and boost resource optimization. In other words, they needed a solution to handle the complexity and distributed nature of their cloud applications, […]
Why container monitoring is critical for modern cloud environments
Modern cloud application environments are complex, running across hundreds or even thousands of compute instances. Because of this complexity, modern applications require container monitoring to continuously collect metrics, track potential failures, and gather granular insights into container behavior. So, it’s not a question of whether or […]
In this blog, I’ll provide a step-by-step tutorial on automating a runbook to reduce MTTR by using Amazon EventBridge (EventBridge) and Datadog. Datadog is used as a monitoring tool, and EventBridge is used to remediate issues and automatically resolve any alerts. EventBridge is a serverless event bus. It makes building an event-driven workflow for applications […]
SAN FRANCISCO, July 28, 2021 - nClouds (www.nclouds.com), a provider of Amazon Web Services (AWS) and DevOps consulting and implementation services and an AWS Premier Consulting Partner, announced today the expansion of its 24/7 on-call support services to include site reliability engineering (SRE) services. A top managed service provider (MSP), the company also announced it […]
Recent studies indicate that the cost of IT downtime is between $9,000 and $12,000 per minute, depending on industry vertical, organization size, and business model. That cost includes business disruption, revenue loss, and end-user productivity. To protect SLAs and mitigate downtime, the first approach is to accelerate the incident resolution process and find the root […]
At nClouds, many of our 24/7 Support Services customers have some pretty aggressive Service Level Agreement (SLA) deadlines. So, we continuously search for strategies to help them separate the “signal” from the “noise.” In this blog post, I’ll provide tips on the strategies we use to help our customers reduce alert fatigue and avoid recurring incidents. […]
Here at nClouds, we manage the infrastructure needs of many of our customers so that they can focus on building awesome products and delivering value to their customers. Since we are managing the infrastructure of multiple customers, the number of alerts can skyrocket pretty quickly if not managed properly. So we always look for ways to reduce unintended noise to avoid alert fatigue. Alert fatigue […]
Top takeaways: AWS Managed Microsoft AD and Microsoft Active Directory
2022-12-05 15:25:16