Operators began deploying 5G networks globally in 2018, and deployment is now in full swing. Compared with 2G/3G/4G, 5G offers a significant leap in key performance indicators such as network speed, latency and connection scale, which allows it to support new service scenarios and applications. To support the three typical 5G service scenarios (eMBB, mMTC and URLLC) and guarantee network performance, a range of new technologies, including Massive MIMO and uplink/downlink decoupling, have been adopted. These technologies improve network performance significantly, but they also raise the demands on network agility and increase network complexity.
Recently, networking has become the focus of a huge transformation enabled by new models arising from virtualization and cloud computing. This has led to a number of novel architectures supported by emerging technologies such as Software-Defined Networking (SDN), Network Function Virtualization (NFV) and, more recently, edge and fog computing. These enhanced design opportunities, together with increased complexity in both networks and networked applications, have fueled the need for improved network automation in agile infrastructures.
Humans and manual processes can no longer keep pace with network innovation, evolution, complexity, and change. With localized and highly engineered operational tools, changes, upgrades and service deployments in these networks typically take days to weeks to take effect.
As we step into the 5G era and new services and applications continue to emerge, new network technologies and features are being adopted, and the traditional network management model is no longer sufficient to support growing operational requirements or to guarantee user experience. The ever-increasing complexity also makes it challenging to improve operational efficiency and control operating costs effectively. The industry has recognised that the 5G era requires a highly intelligent network built on AI technologies.
This new networking environment calls for more automation, as exemplified by recent initiatives to set up network automation platforms. Automation can be combined with Artificial Intelligence techniques to execute efficient, rapid and trustworthy management operations.
Higher levels of automation are the only way to handle this complexity while ensuring that network resources are utilised more efficiently than ever, reducing operating expense (OPEX) and supporting rapid, agile response. In addition to automation, operators also need simplified processes to reduce cost and increase agility in an ever more complex network.
The technologies behind network intelligence fall into three principal branches: big data, automation and artificial intelligence. Big data gathers large data sets to which analytics are applied to gain insights and improve decision making. Automation is where machines follow pre-programmed rules to run processes, and is generally used for repetitive tasks. The final and most advanced area is artificial intelligence, where machines perform cognitive functions similar to those attributed to humans. AI algorithms take decisions by applying advanced analytical techniques and may be combined with automated feedback loops to solve problems. Artificial intelligence can be further divided by the type of learning used: machine learning and deep learning.
Major operators are in the game, as AI creates further opportunities to pursue digital transformation in two core business areas (network operations and customer experience) and to provide new services to enterprise customers. Driving greater network automation and digitising customer interactions are the dominant use cases in early AI deployments. Some operators are also leveraging AI to launch new products and services (digital assistants and smart speakers) and platforms (AI-as-a-service). Generating revenue in these areas will depend on their ability to strike the right partnerships and expand their presence in the ecosystem.
Cisco has debuted a series of software enhancements designed to put AI and machine learning deeper into the network. Key features include new network automation and analytics tools that are meant to help enterprise IT teams glean more insights and visibility from network data. Cisco is also touting new machine-reasoning algorithms for improved troubleshooting, giving IT admins and network engineers the ability to detect and correct issues and vulnerabilities more quickly. Moogsoft uses ML to correlate network events. Splunk has a similar system.
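Event correlation of the kind attributed to Moogsoft and Splunk groups related alerts into a single incident. Their ML-based methods are not reproduced here; the sketch below is a deliberately simpler, rule-based baseline that groups events purely by time proximity. The `Event` type, the sample alerts and the 30-second `max_gap` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary epoch
    device: str
    message: str


def correlate_by_time(events, max_gap=30.0):
    """Group events into candidate incidents (a rule-based baseline, not ML).

    An event joins the current group if it arrives within max_gap seconds
    of the previous event; otherwise it starts a new group.
    """
    groups = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if groups and ev.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


# Hypothetical alerts from four devices.
events = [
    Event(100.0, "core-1", "BGP neighbor down"),
    Event(112.0, "edge-3", "interface Gi0/1 down"),
    Event(125.0, "edge-4", "OSPF adjacency lost"),
    Event(900.0, "edge-7", "fan failure"),
]
for group in correlate_by_time(events):
    print([e.device for e in group])
# ['core-1', 'edge-3', 'edge-4']
# ['edge-7']
```

A learned system would go further, for example by weighting topology or historical co-occurrence; time proximity alone will merge unrelated events that merely happen to coincide.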
ML may be a technology that helps us proactively identify and correct problems — and maybe even help us get to the point where our network management systems can predict what a network is going to do.
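Predicting what a network is going to do can start as simply as extrapolating a trend. The sketch below fits an ordinary least-squares line to daily peak link utilisation and estimates how many days remain before it crosses a capacity threshold. The 80% threshold and the sample data are hypothetical, and a linear trend is an assumption that real traffic, with its daily and weekly seasonality, often violates.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def days_until_threshold(daily_peak_pct, threshold=80.0):
    """Days after the last sample until the fitted trend reaches threshold,
    or None if the trend is flat or falling."""
    days = list(range(len(daily_peak_pct)))
    a, b = fit_line(days, daily_peak_pct)
    if b <= 0:
        return None  # no crossing predicted
    crossing_day = (threshold - a) / b
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily peak utilisation (%) growing by one point per day.
util = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0]
print(days_until_threshold(util))  # 24.0
```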
AI/ML Applications in Networks
Machine learning (in the classical sense, as distinct from deep learning) uses statistical techniques to perform specific tasks and often requires a smaller amount of data. As a result, it can run on low-end systems, though it usually needs labelled data and manual feature extraction, with the problem broken down into smaller tasks. Machine learning applications are therefore faster to train, although testing may take longer to ensure the validity of the results. They are, however, more readily explainable because the process is understood.
Deep learning, on the other hand, uses artificial neural networks, which require a substantially larger amount of data to train. This in turn typically requires high-performance GPUs, but it allows deep learning to process unlabelled data and solve problems end to end. Because of its reliance on large data sets, deep learning is often slower to train but faster to test. Its biggest drawback is the so-called “black box”: while the inputs and outputs may be understood, the steps taken between them may not be.
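The classical workflow described above (labelled data, hand-crafted features, a simple and explainable model) can be sketched as follows. This is a minimal illustration, not a production detector: the counter names, the two features and the four labelled samples are hypothetical, and nearest-centroid classification is chosen only because its decision is easy to explain (each class is represented by the mean of its training features).

```python
import math


def extract_features(counters):
    """Hand-crafted features from raw interface counters (a dict)."""
    packets = max(counters["packets"], 1)  # avoid division by zero
    return (
        counters["crc_errors"] / packets,  # CRC error ratio
        counters["drops"] / packets,       # drop ratio
    )


def train_centroids(samples):
    """samples: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labelled training data supplied by an engineer.
labelled = [
    ({"packets": 10_000, "crc_errors": 0, "drops": 2}, "healthy"),
    ({"packets": 12_000, "crc_errors": 1, "drops": 5}, "healthy"),
    ({"packets": 9_000, "crc_errors": 450, "drops": 300}, "faulty"),
    ({"packets": 11_000, "crc_errors": 600, "drops": 250}, "faulty"),
]
centroids = train_centroids([(extract_features(c), label) for c, label in labelled])

new = {"packets": 10_000, "crc_errors": 520, "drops": 280}
print(predict(centroids, extract_features(new)))  # faulty
```

The explanation for any prediction is simply "these error and drop ratios are closer to the faulty average than to the healthy one", which is the kind of transparency the paragraph above contrasts with deep learning.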
Artificial Intelligence (AI) and Machine Learning (ML) approaches, well known from IT disciplines, are beginning to emerge in the networking domain. These approaches can be grouped into three areas: AI/ML techniques for network management, network design for AI/ML applications, and system aspects.
AI/ML techniques for network management, operations and automation cover the design and application of AI/ML techniques to improve how networks are run today. Operations-focused AI is used for fault detection, predictive maintenance, and network planning and optimisation, all of which enable operators to make more efficient use of their physical assets.
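As one simple example of statistical fault detection, the sketch below flags latency samples that sit far outside the recent norm using a rolling z-score. The window length, the 3-sigma threshold and the sample series are assumptions for illustration; this is not how any particular vendor's product works.

```python
from statistics import mean, stdev


def detect_anomalies(series, window=10, z_threshold=3.0):
    """Return indices whose value lies more than z_threshold standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat history, where the z-score is undefined.
        if sigma > 0 and abs(series[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical round-trip latency samples (ms) with one spike.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 20, 85, 21, 20]
print(detect_anomalies(latency_ms))  # [11]
```

Note that after the spike, the inflated standard deviation briefly makes the detector less sensitive; more robust variants use the median and median absolute deviation instead.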
Another category is service-focused AI, such as customer-experience AI, which is used for more personalised commercial purposes: pricing promotions, customer care, predictive care and churn reduction, smart retail, and the deployment of virtual assistants (such as Vodafone's TOBi chatbot, the Orange/Deutsche Telekom Djingo smart speaker and the Telefónica Aura virtual assistant) to personalise services and more.
Network design and optimization for AI/ML applications addresses a complementary topic, namely supporting AI/ML-based systems through novel networking techniques, including new architectures and performance models.
A third topic area is system implementation and open-source software development. This evolution has drawn particular attention to interdisciplinary approaches that bring together the communication networks and AI/ML research communities. On the one hand, researchers in communication networks are tapping into machine learning and AI techniques to optimize network architecture, control and management, leading to more automation in network operations. On the other hand, researchers in the AI community are working with networking researchers to optimize network architecture and design for AI/ML systems.
State of Automation, Artificial Intelligence, and Machine Learning in Network Management
The report, “The State of Automation, Artificial Intelligence, and Machine Learning in Network Management,” compiles an analysis based on the survey responses of 388 executive and technical-level attendees at Cisco Live U.S. 2019, one of the biggest networking industry events of the year.
“Humans and manual processes can no longer keep pace with network innovation, evolution, complexity, and change,” said Jim Frey, vice president of strategic alliances at Kentik. “That’s why we’re hearing more about self-driving networks, self-healing networks, intent-based networking, and other concepts. These approaches collectively belong to a growing focus area called AIOps, which aims to apply automation, AI and ML to support modern network operations. Automation and ML are two of the three major foundational elements of AIOps (the third is data integration and enrichment).”
Moving to the cloud, and especially to multi-cloud, is one of the driving factors behind the need for network automation. While 76% of our respondents indicated they were using cloud services, nearly a quarter (24%) report that their organization has not yet moved to the cloud. Of those with cloud services, nearly half (47%) are working in a multi-cloud environment (roughly 36% of all respondents), so the complexity ramp is a swift one.
The majority (53%) of respondents are using automation for network configuration – the only area to receive a majority response. Policy management was the second-most automated process, cited by 40% of respondents. Processes such as compliance, incident response, and cloud bursting received lower response rates.
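Configuration is where most respondents apply automation. A minimal sketch of the idea, assuming IOS-style interface syntax and hypothetical port data, renders one stanza per port from a single template using Python's standard `string.Template`.

```python
from string import Template

# Hypothetical IOS-style interface template; adapt the syntax to your platform.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
    " no shutdown\n"
)


def render_configs(ports):
    """Render one configuration stanza per port from the shared template."""
    return "".join(INTERFACE_TEMPLATE.substitute(port) for port in ports)


# Hypothetical per-port inputs, e.g. exported from an inventory system.
ports = [
    {"name": "GigabitEthernet1/0/1", "description": "printer-3F", "vlan": 20},
    {"name": "GigabitEthernet1/0/2", "description": "ap-3F-east", "vlan": 30},
]
print(render_configs(ports))
```

Because `Template.substitute` raises `KeyError` when a placeholder has no value, an incomplete port record fails loudly instead of producing a partial configuration.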
Challenges
Sufficient Volume of Data
The most important step for any of these systems is to train them with enough data of the type we want them to analyze and recognize. ML’s learning process depends on getting input about what it is observing. To refine the weights it uses internally, an ML neural network requires very large volumes of data. The volume of data from one network may not be sufficient for the learning algorithm to generate the results we want: if you only use data from your own network, the ML system may not have enough data to identify problems properly until many months after the initial installation.
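A small calculation shows why a single network's data can take so long to become useful. The hypothetical baseline below keeps one slot per (weekday, hour) pair and, as an assumed policy, refuses to report itself ready until every slot has at least `min_samples` observations. With one sample per hour, each slot gains only one observation per week.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLOTS_PER_WEEK = 7 * 24  # one baseline slot per (weekday, hour) pair


class SeasonalBaseline:
    """Hypothetical per-(weekday, hour) baseline for one metric.

    It reports itself ready only when every slot has at least min_samples
    observations, since a slot with one or two samples cannot separate
    normal variation from a fault.
    """

    def __init__(self, min_samples=8):
        self.min_samples = min_samples
        self.samples = defaultdict(list)

    def observe(self, ts: datetime, value: float) -> None:
        self.samples[(ts.weekday(), ts.hour)].append(value)

    def is_ready(self) -> bool:
        return len(self.samples) == SLOTS_PER_WEEK and all(
            len(values) >= self.min_samples for values in self.samples.values()
        )


# Feed one (dummy) sample per hour until every slot has enough history.
baseline = SeasonalBaseline(min_samples=8)
start = datetime(2024, 1, 1)
hours = 0
while not baseline.is_ready():
    baseline.observe(start + timedelta(hours=hours), 1.0)
    hours += 1
print(hours / SLOTS_PER_WEEK)  # 8.0 weeks of hourly data
```

With `min_samples=26`, the same loop needs 26 weeks (about six months) of data, and a model that also learns monthly or seasonal patterns would need longer still.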
Training on data from other networks is also challenging. While networks tend to be designed around similar constructs (at least if you’re following industry best practices), they operate in vastly different ways. The data from a well-run network will be significantly different from the data from a poorly run network, so the resulting ML system may not function properly in your own network.
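One way to quantify how different two networks' data are is the Population Stability Index (PSI), which compares the binned distribution of a feature in a source sample with that in a target sample. The sketch below assumes CPU utilisation as the feature; the bin edges and all three samples are hypothetical. A commonly quoted rule of thumb treats PSI below 0.1 as little shift and above 0.25 as a large shift, but those cut-offs are conventions, not guarantees.

```python
import bisect
import math


def bin_fractions(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]).

    Values outside [edges[0], edges[-1]) are ignored; a small floor keeps
    empty bins from producing log(0) in the PSI formula.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [max(c / total, 1e-6) for c in counts]


def psi(source, target, edges):
    """Population Stability Index: sum over bins of (q - p) * ln(q / p)."""
    p = bin_fractions(source, edges)
    q = bin_fractions(target, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Upper edge set above 100 so that 100% utilisation falls in the last bin.
edges = [0, 20, 40, 60, 80, 101]
well_run = [15, 22, 25, 30, 35, 38, 42, 45, 28, 33]
same_network_next_week = [18, 24, 27, 31, 36, 39, 41, 44, 29, 34]
poorly_run = [55, 65, 72, 85, 90, 95, 78, 88, 61, 99]

print(psi(well_run, same_network_next_week, edges))  # 0.0 (identical bins)
print(psi(well_run, poorly_run, edges) > 0.25)       # True
```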
Feedback
The next challenge is to provide feedback that identifies problems. For example, the ML system should be able to detect a duplex mismatch on an Ethernet link from several symptoms, such as late collisions on the half-duplex end and CRC errors on the full-duplex end. A simple set of rules should suffice to identify the problem’s signature and provide feedback.
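The rule for a duplex mismatch can be written down directly. The sketch assumes counters from both ends of one link and uses the classic signature: the half-duplex end reports late collisions while the full-duplex end reports CRC errors or runts. The `InterfaceStats` fields and sample values are hypothetical, and treating any non-zero counter as significant is a simplification; a real check would compare error rates against a threshold. The rule's verdict is the kind of label that could be fed back to an ML system.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStats:
    device: str
    name: str
    duplex: str  # "full" or "half"
    late_collisions: int
    crc_errors: int
    runts: int


def duplex_mismatch(a: InterfaceStats, b: InterfaceStats) -> bool:
    """Rule for the two ends of one link: duplex settings differ, the
    half-duplex end sees late collisions, and the full-duplex end sees
    CRC errors or runts."""
    if a.duplex == b.duplex:
        return False
    half, full = (a, b) if a.duplex == "half" else (b, a)
    return half.late_collisions > 0 and (full.crc_errors > 0 or full.runts > 0)


# Hypothetical counters from the two ends of one link.
sw = InterfaceStats("access-sw1", "Gi0/12", "half", late_collisions=37, crc_errors=0, runts=0)
srv = InterfaceStats("server-7", "eth0", "full", late_collisions=0, crc_errors=512, runts=64)
print(duplex_mismatch(sw, srv))  # True
```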
The ML system needs accurate data and feedback as it learns to identify good and bad patterns. A good example is the ML system’s ability to detect configuration issues in a four-port Ethernet channel. The ML system will require a lot of feedback from a network engineer before it has learned enough to identify an improper configuration.
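For the four-port channel example, the knowledge an engineer would feed back can also be expressed as an explicit consistency check. This sketch assumes a simplified, hypothetical set of attributes that every member port must share and compares each member with the value used by the majority of members. Real platforms check more attributes, and on a tie this heuristic simply picks the value seen first.

```python
from collections import Counter

# Attributes assumed to be required to match across all channel members
# (a simplified, hypothetical subset of what real platforms check).
MUST_MATCH = ("speed", "duplex", "mode", "vlans", "channel_group")


def channel_issues(members):
    """Return (port, attribute, value, expected) for every member whose
    attribute differs from the value most members use."""
    issues = []
    for attr in MUST_MATCH:
        expected, _ = Counter(m[attr] for m in members).most_common(1)[0]
        for m in members:
            if m[attr] != expected:
                issues.append((m["port"], attr, m[attr], expected))
    return issues


# Hypothetical four-port channel with one member missing VLAN 20.
members = [
    {"port": "Gi1/0/1", "speed": 1000, "duplex": "full", "mode": "trunk", "vlans": (10, 20), "channel_group": 5},
    {"port": "Gi1/0/2", "speed": 1000, "duplex": "full", "mode": "trunk", "vlans": (10, 20), "channel_group": 5},
    {"port": "Gi1/0/3", "speed": 1000, "duplex": "full", "mode": "trunk", "vlans": (10,), "channel_group": 5},
    {"port": "Gi1/0/4", "speed": 1000, "duplex": "full", "mode": "trunk", "vlans": (10, 20), "channel_group": 5},
]
print(channel_issues(members))  # [('Gi1/0/3', 'vlans', (10,), (10, 20))]
```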
A network management system should be able to provide feedback to an ML system as the operations team identifies and corrects problems, and part of the change process could incorporate ML feedback. Of course, you would also need a way to reverse the effect of incorrect feedback, which can leave a problem uncorrected until a second or third examination.
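Reversible feedback mainly requires keeping history rather than overwriting labels. The hypothetical store below records every operator label per event, lets the latest one be reverted, and derives the current training labels from what remains. The event ID, label names and authors are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """Keeps every operator label per event so a wrong label can be
    reverted instead of silently overwriting the previous one."""

    history: dict = field(default_factory=dict)  # event_id -> [(label, author), ...]

    def record(self, event_id, label, author):
        self.history.setdefault(event_id, []).append((label, author))

    def revert(self, event_id):
        """Drop the most recent label for an event (e.g. it proved wrong)."""
        labels = self.history.get(event_id)
        if labels:
            labels.pop()
            if not labels:
                del self.history[event_id]

    def training_labels(self):
        """Current label per event: the latest surviving entry."""
        return {eid: labels[-1][0] for eid, labels in self.history.items()}


store = FeedbackStore()
store.record("chg-1042", "misconfigured_lag", "alice")
store.record("chg-1042", "healthy", "bob")  # incorrect feedback
store.revert("chg-1042")                    # bob's label is rolled back
print(store.training_labels())  # {'chg-1042': 'misconfigured_lag'}
```

Retraining from `training_labels()` after a revert then removes the effect of the incorrect label, rather than requiring the model to unlearn it.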