Transform IT Operations with ServiceNow AIOps
Many organisations, especially in the public sector, spend time and money reacting to IT-related events. IT operations teams are fire-fighting against issues and maintenance, unable to get on a solid footing to make meaningful transformation. In some cases, the fire-fighting is compounded by siloed teams, legacy technologies, and a lack of clear visibility.
As we rely on digital services more for things like our health and finances, the old break-fix model of resolving issues after impact is no longer fit for purpose. All too often teams are trawling through noise generated by both people and machines, increasing the amount of time taken to resolve service issues. Downstream, these challenges can lead to staff burnout or retention problems, poor user experience, lost productivity, and a lack of innovation. Operations teams need to move from reactive mode to proactive and self-healing operations.
In this blog post we’ll look at some of the ways ServiceNow IT Operations Management (ITOM) can reduce the volume of incidents from people and machines using Artificial Intelligence (AI)-enhanced automation. The first section will discuss converging IT Service Management (ITSM) and IT Operations Management onto a common platform and data model. We’ll then look at how to identify service issues and automate remediation with AIOps (Artificial Intelligence for IT Operations).
ITOM Overview
IT operations refers to the tools and processes an organisation uses to manage its IT infrastructure. It encompasses a variety of functions aimed at ensuring the stability, availability, and performance of IT systems.
For completeness, IT Service Management (ITSM) refers to the practices and processes an organisation uses to deliver and manage IT services in accordance with the needs of the business. IT departments use ITSM to manage system outages (incidents), investigations (problems), fixes or updates (changes), and user requests. Without these processes, services provided by IT operations would suffer, degrading customer experience and slowing down or preventing users from working.
Since both service management and operations management are essential functions of IT, having them exist in silos isn’t optimal. ITSM and ITOM work better together. By taking an integrated approach, IT can open up a comprehensive view of services and operations to aid decision making, communication, resource allocation, automation, security, and incident, problem, and change management. Longer term, IT can scale more effectively as requirements and technologies change.
The ITOM journey to AIOps can be broken down into 3 phases: Visibility, Health, and Log Analytics.
1. Visibility
ITOM Visibility provides complete visibility into the IT estate, with additional capabilities to help organisations automate, prioritise, and align resources with their business services. You can see the full feature list in the ITOM Visibility documentation, but some key examples include:
- Agentless and Agent-Based Discovery – ensures full discovery of all on-premises, cloud, and hybrid infrastructure. The Agent Client Collector provides additional monitoring and log collection from a single, unified platform.
- Service Graph Connectors – out-of-the-box connectors that enrich the CMDB (Configuration Management Database) with third-party data sources. A multi-source CMDB benefits from additional context and enhanced accuracy, whilst automatically updating new or dynamic assets.
- Service Mapping – visualises business services with associated relationships and dependencies into service maps. Allows for full business and application context to be applied through all levels of operations, from assessing the impact of change requests to quickly identifying the infrastructure component(s) causing a service outage.
- Firewall Audit and Reporting – extends visibility into firewall configurations, integrating seamlessly with incident and change processes, and offering self-service reporting. Significantly reduces manual troubleshooting and resolution times for firewall-related outages.
- TLS Certificate Inventory and Management – extends visibility into certificates used across systems and their expiry dates, reducing the number of certificate-related outages. Integrates with workflows for automated replacement and change processes.
2. Health
Both Visibility and Health are included in ITOM Operator Professional. They provide immense value without AIOps, but their effectiveness is enhanced when upgrading to ITOM AIOps Enterprise.
The main features included with ITOM Health are event management and metric intelligence. You can find out more in the ITOM Health documentation. As standalone functionality, without AIOps, ITOM Health provides:
- Event Management – consolidate large streams of events into a single place for monitoring and alerting. Uses out-of-the-box connectors to integrate with monitoring tools and REST API, SNMP, email, or JavaScript based custom connectors for other event sources.
- Metric Intelligence – visualise and analyse performance of IT infrastructure resources by metric. Pinpoint performance changes causing service degradation.
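To make the custom connector idea more concrete, here is a minimal sketch of how a monitoring script might shape an event before sending it over REST. The field names, the `records` envelope, and the severity numbering are assumptions for illustration and should be checked against the Event Management documentation for your instance. The helper names `build_event` and `build_payload` are hypothetical, and the target endpoint and authentication are deliberately left out.

```python
import json

# Assumed severity scale: 0 = clear, 1 = critical ... 5 = info.
SEVERITIES = {"clear": 0, "critical": 1, "major": 2, "minor": 3, "warning": 4, "info": 5}


def build_event(source, node, metric_name, severity, description, resource="", extra=None):
    """Return one event record as a plain dict (field names are assumptions)."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "source": source,
        "node": node,
        "resource": resource,
        "metric_name": metric_name,
        "severity": str(SEVERITIES[severity]),
        "description": description,
        # Free-form context is serialised to a JSON string.
        "additional_info": json.dumps(extra or {}, sort_keys=True),
    }


def build_payload(events):
    """Wrap event records in the envelope a custom connector might POST."""
    return json.dumps({"records": list(events)})


payload = build_payload([
    build_event("custom-monitor", "web-01", "cpu_percent", "major",
                "CPU above 90% for 5 minutes", resource="cpu", extra={"value": 93}),
])
print(payload)
```

Keeping the payload construction separate from the HTTP call makes it easy to validate events locally before any of them reach the platform.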
In both cases, without AIOps, users may miss out on automated anomaly detection and predictive analytics, and the standalone functionality relies more on manual processes for event analysis and resolution. Even so, having this level of information streamlined into a single platform still empowers IT operations teams to effectively analyse, troubleshoot, and resolve issues.
3a. Health Log Analytics
Health Log Analytics collects and processes logs in real-time, allowing for immediate detection of anomalies in text patterns. This could be new patterns discovered for the first time, a deviation in behaviour above or below an expected pattern, or an anti-pattern where the expected behaviour didn’t occur.
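The three kinds of anomaly described above can be illustrated with a small rule-based sketch. This is not how Health Log Analytics works internally, which relies on machine learning. It only shows the difference between a new pattern, a deviation, and an anti-pattern. The function name, the `tolerance` parameter, and the sample data are all hypothetical.

```python
def classify_patterns(baseline, observed, tolerance=0.5):
    """Compare per-window log pattern counts against a baseline.

    baseline: dict of pattern -> expected count per window
    observed: dict of pattern -> count seen in the current window
    tolerance: allowed relative deviation before a pattern is flagged
    """
    findings = {}
    for pattern in sorted(set(baseline) | set(observed)):
        expected = baseline.get(pattern, 0)
        count = observed.get(pattern, 0)
        if expected == 0 and count > 0:
            findings[pattern] = "new pattern"
        elif expected > 0 and count == 0:
            findings[pattern] = "anti-pattern"
        elif expected > 0 and abs(count - expected) > tolerance * expected:
            findings[pattern] = "above baseline" if count > expected else "below baseline"
    return findings


baseline = {"user login ok": 100, "nightly backup completed": 1, "cache miss": 40}
observed = {"user login ok": 104, "cache miss": 95, "disk quota exceeded": 3}
findings = classify_patterns(baseline, observed)
for pattern, verdict in findings.items():
    print(f"{pattern}: {verdict}")
```

In this sample, the missing backup message is the anti-pattern: nothing went visibly wrong in the logs, but something that should have happened did not.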
With a baseline of normal behaviour, Health Log Analytics interprets and extracts meaning from text and dynamic thresholds to predict issues, generate alerts, and provide resolution suggestions. Alerts can be sent to platforms like Slack and Microsoft Teams in real-time, as well as fed into the ServiceNow Event Management application for operators to manage and track effectively. Health Log Analytics is included in ITOM AIOps Enterprise.
3b. AIOps
AIOps isn’t a single product, but rather a broader concept that automates IT operations by applying machine learning, observation, and analytics to aggregated and extensive data sets. Led by Health Log Analytics, AIOps is able to identify problems, events, and trends, and automatically respond and remediate issues, often before users are impacted. You can read more in the What is AIOps? page.
The event management and metric intelligence capabilities of ITOM Health are exponentially enhanced by adding Health Log Analytics to detect application behaviour in a way that traditional monitoring techniques do not address. By using unsupervised learning, Health Log Analytics is able to uncover complex patterns unforeseen by the IT operations team.
Event floods and alerts from the vast amounts of information in different monitoring tools are correlated, analysed, and replaced with a few actionable insights and alerts. Pattern and performance anomalies are automatically highlighted to detect potential service degradation.
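As a rough illustration of what correlation means here, the sketch below groups alerts that affect the same service within a short time window, so that an operator sees one group instead of many individual alerts. It is a deliberately simple rule-based stand-in, not the correlation logic ServiceNow uses. The `correlate` function, the alert fields, and the five-minute window are assumptions.

```python
def correlate(alerts, window_seconds=300):
    """Group alerts that share a service and arrive within window_seconds
    of the first alert in that service's most recent group."""
    groups = []
    latest = {}  # service -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a["time"]):
        idx = latest.get(alert["service"])
        if idx is not None and alert["time"] - groups[idx]["first_seen"] <= window_seconds:
            groups[idx]["alerts"].append(alert)
        else:
            groups.append({
                "service": alert["service"],
                "first_seen": alert["time"],
                "alerts": [alert],
            })
            latest[alert["service"]] = len(groups) - 1
    return groups


alerts = [
    {"time": 0, "service": "payments", "message": "db latency high"},
    {"time": 40, "service": "payments", "message": "api 5xx rate"},
    {"time": 90, "service": "email", "message": "queue backlog"},
    {"time": 120, "service": "payments", "message": "checkout timeout"},
    {"time": 900, "service": "payments", "message": "db latency high"},
]
groups = correlate(alerts)
for group in groups:
    print(group["service"], group["first_seen"], len(group["alerts"]))
```

Five raw alerts collapse into three groups in this sample, with the three related payments alerts handled together.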
At the time of writing, you can expect ServiceNow AIOps to deliver:
- System monitoring, noise reduction, anomaly detection, and alert correlation to predict outages
- Automated incident resolution with playbooks
- Problem identification and root cause analysis through AI-analytics
- Convergence of data pipelines for real-time data streams between teams and systems
- Consolidation and integration of AI and non-AI tools to reduce fragmentation
Perhaps the most exciting capabilities of AIOps are still to come. Currently, decision-making guidance and configuration are predefined by humans, i.e. the IT department. In the future, agentic AI will bring in a higher degree of independence and intelligence. For now though, organisations can automate and optimise specific areas of IT operations with human oversight, which arguably gives incredible value without a complete change in operating model.
To enhance the predictive AIOps intelligence brought in by Health Log Analytics, ITOM AIOps Enterprise also includes the following features:
- Express List – triages alerts in real-time, allowing operations teams to stay on top of alert workflows, prioritise accordingly, and see link view maps and probable root cause. Operators can quickly respond and take action without leaving the express list view.
- AIOps Dashboards – provides understanding and benchmarking for AIOps environments. Customisable dashboards help admins, operators, or executives track service health and key performance indicators.
Finally, you can use Now Assist for ITOM to help translate cryptic or technical machine-generated alerts into human-readable explanations. Level 1 engineers can benefit from automated alert summarisation to quickly understand issues and next steps.
Alert analysis and summarisation help even experienced engineers by reducing the amount of time spent reading through logs and researching potential symptoms. Now Assist is ServiceNow’s Generative AI assistant and is licensed separately from the products outlined above.
Summary
In summary, AI is already delivering benefits for operations management teams through incident prevention, and for service management teams through incident deflection. When combined, these benefits afford IT teams the breathing room to make meaningful improvements or expansions to the services they deliver. More importantly, user experience is enhanced with fewer or shorter outages.
We discussed some of the key benefits of predictive AIOps, and how the journey unfolds in 3 phases: Visibility, Health, and Log Analytics. Looking ahead, future developments in agentic AI promise even greater independence and innovation in IT Operations Management. If you found this topic interesting, you might also like these AIOps-focused ebooks: