How metadata provides important information on the system health of a cloud

Artificial Intelligence in Cloud Monitoring – How Bird Watching Helps Predict the Weather


What does bird watching have to do with cloud monitoring?

It would appear that songbirds are able to predict storms some time before they hit. In 2013 and 2014,1 American environmental researchers spent time recording the flight paths of golden-winged warblers. When a large storm front moved across eastern Tennessee in April 2014, the researchers observed that the birds breeding there fled the area. Amazingly, this happened one to two days before the storm front arrived. And this was despite the fact that there were no local atmospheric changes at the time that could have indicated a storm was approaching.

Although birdwatching is no longer used in modern meteorology, it has a deserved place in folk wisdom. However, what does birdwatching have to do with cloud monitoring?

 

What is cloud monitoring?

Cloud monitoring refers to monitoring availability and data flow in cloud computing. It is essential for avoiding unwanted system crashes or processing errors. System administrators usually monitor a cloud system by recording and evaluating certain system metrics. This approach has parallels with the modern meteorological measurement methods used in weather forecasting.

In both cases, the techniques used are very reliable, but they are time-consuming and require a high level of interpretation skill. In weather forecasting, birdwatching represents a less complicated alternative. But what is the counterpart to birdwatching in cloud monitoring? To understand this, it is useful to look at the context in which cloud monitoring is employed.

 

How can you complement cloud monitoring?

SEEBURGER provides integration capabilities to support integration scenarios for companies on its cloud-based integration platform, the SEEBURGER BIS platform. One of these capabilities, EDI, enables companies to electronically transmit invoices or orders to business partners as part of an automated business process. From a technical perspective, this is done via a gateway that centralizes the connections between the business partners, which simplifies the maintenance, testing and monitoring of data flows. The gateway is deployed flexibly from the cloud. The messages transmitted via the gateway are recorded by the SEEBURGER message tracking system, and metadata such as the time, duration and size of each message transmission is logged.
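Such a tracking record can be pictured as a small data structure. The following sketch is purely illustrative; the field names are assumptions for this article, not SEEBURGER's actual message tracking schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of a message-tracking record. The fields mirror the
# kind of metadata described in the text: time, duration and size of a
# transmission, plus its outcome.
@dataclass
class MessageRecord:
    message_id: str
    sent_at: datetime    # time the transmission started
    duration_ms: int     # how long the transmission took
    size_bytes: int      # payload size
    partner: str         # receiving business partner
    success: bool        # whether the transmission succeeded

record = MessageRecord(
    message_id="MSG-0001",
    sent_at=datetime(2024, 4, 1, 9, 30),
    duration_ms=420,
    size_bytes=18_432,
    partner="PARTNER-A",
    success=True,
)
print(record.size_bytes)  # metadata about the message, not its content
```

Note that only metadata about the transmission is kept here; the message content itself plays no role in the monitoring approach described below.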

Figure 1: This is how SEEBURGER message tracking works

 

Evaluating this metadata is like birdwatching to forecast the weather. As with birdwatching, the metadata reflects the causes or symptoms of an error rather than the actual measured error variables. Instead of evaluating information such as flight altitude, bird song or flight direction to predict the weather, the metadata from a message transmission is evaluated to predict the system health of the cloud. Metadata evaluation can thus be used profitably in real cloud monitoring processes, not as the sole basis for decision-making, but as a complementary technique.

As cloud computing is ubiquitous these days, reliable cloud monitoring really is essential.

 

What is cloud computing?

Cloud computing is an IT model in which dynamically scalable and virtualized resources are accessed as services via the internet. This means that IT resources such as CPU and storage capacities are available anytime, anywhere.2 SEEBURGER, for example, uses cloud technology to provide agile and scalable data exchange capabilities to its customers. Unlimited computing resources on demand not only give users a high degree of flexibility, but also a processing environment that would be difficult to achieve with a single machine. Companies benefit from these dynamic and agile cloud environments, especially as technical infrastructure becomes more complex and tasks require more processing power.3

As customers do not have physical access to hardware resources, they must be able to rely on the availability of cloud environments. Cloud providers must therefore offer high-quality service and availability guarantees. Their task is to provide, orchestrate and manage services while ensuring security and compliance with data protection requirements.4 Managing cloud services includes detecting errors early and minimizing downtime.5

 

How can we solve the challenges in cloud monitoring?

Until now, experienced system administrators have been responsible for interpreting system health in cloud monitoring. Their work has been made more difficult by the large amount of recorded data and the complexity of the parameters. By using computing resources and artificial intelligence (AI) to evaluate, pre-sort and assess the data, system administrators can be relieved of this workload. They are then responsible for checking the aggregated analysis.

Conventional monitoring cannot predict future system states and is therefore limited to failure reactions. However, if rules are identified that enable the early prediction of undesirable system states, preventive action can be taken.

As process monitoring currently focuses on real-time monitoring, particularly resource monitoring, troubleshooting measures tend to be implemented at the technology and hardware level. Options for error limitation and load balancing at the organizational and configuration level have been little investigated and even less often implemented. In contrast to hardware-centric system metrics, the metadata of a message transmission provides insights into the content-related processes and data flows of a cloud. Examining this data therefore lets us draw well-founded conclusions about the processes we actually want to monitor.

Artificial intelligence and sufficient computing resources help us derive prediction rules from metadata records. Artificial intelligence is therefore a valuable tool for solving the challenges of cloud monitoring outlined above.

What is artificial intelligence (AI)?

In an increasingly digitalized world, immense amounts of data are being generated. These bring both new challenges and unlimited opportunities. The diversity of information, values and metrics available to us opens up detailed insights into data-centric applications and processes, how they work and how they relate to and interact with each other.6 By carefully analyzing and interpreting real observations, we are able to gain extensive knowledge to inform our decision-making processes.7

Processing ever-increasing volumes of often unstructured data is a demanding task that pushes human capabilities to their limits. Powerful computers, however, can handle huge data processing tasks. In recent years, this computing power has increasingly been employed to develop methods that mimic human cognitive abilities by statistically evaluating observations.8 These artificial intelligence methods have been received with great enthusiasm and enjoy considerable popularity.9

The intelligence that computers use to draw informed conclusions is either based on programmed processes or has been generated by machine learning. While programming an intelligent system requires a human expert to specify the algorithms, machine learning models are able to derive these from available data. They use algorithms to recognize observable relationships and map them in a mathematical equation. By applying a learned function, also known as a model, to observable inputs, the machine can then predict unknown target values.

 

How does machine learning (ML) work?

The learning process for a machine learning model is known as training. The aim of training is to learn a relationship function that maps known input values to an unknown target value. One approach to machine learning is known as supervised learning. This uses historical data that documents both input and target values. By analyzing the relationships between the input and target values in the training data, the machine learns to associate one with the other.
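The idea of learning a relationship function from known input and target pairs can be sketched in a few lines of Python. The one-parameter model and the toy data below are assumptions for illustration only:

```python
# A minimal supervised-learning sketch: from known (input, target) pairs,
# "train" a one-parameter model y = a * x by least squares, then use it
# to predict the target for an unseen input.

def train(pairs):
    # closed-form least squares for y = a * x (no intercept)
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

def predict(a, x):
    return a * x

training_data = [(1, 2.1), (2, 3.9), (3, 6.0)]  # inputs with known targets
a = train(training_data)
print(round(predict(a, 4), 1))  # prediction for the unseen input 4
```

Real machine learning models learn far richer functions than a single slope, but the principle is the same: the relationship is derived from historical data rather than programmed by hand.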

Figure 2: With the help of AI, these parcel characteristics can give us an idea how delivery will go

How a machine learning algorithm could be trained can be illustrated with an example from the logistics industry. A parcel delivery involves characteristics such as

  • the destination
  • the weight of the parcel
  • time sent
  • the driver

A machine learning model can use historical shipment records to map the relationships between certain shipment characteristics and the delivery success of a parcel. For example, such a model could determine that if a parcel weighs more than 100 kg, the delivery always fails. By applying learned patterns to deliveries with an as yet unknown outcome, AI can make predictions on the ultimate success of the delivery. For future deliveries, it is then known that deliveries of particularly heavy parcels will probably not be successful. The shipment properties used to evaluate the delivery success are the model’s input values.
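A minimal sketch of how such a pattern could be learned from historical records, assuming fictitious shipment data and reducing the model to a single learned weight threshold:

```python
# Learn the "parcels over a certain weight always fail" pattern from
# historical shipment records. A real model would consider all parcel
# characteristics; here the model is a single weight threshold.

history = [
    # (weight_kg, delivered successfully?)
    (5, True), (40, True), (80, True), (95, True),
    (105, False), (120, False), (150, False),
]

def learn_weight_threshold(records):
    # smallest weight at which a delivery failed in the training data
    failures = [w for w, ok in records if not ok]
    return min(failures)

def predict_success(threshold, weight_kg):
    return weight_kg < threshold

threshold = learn_weight_threshold(history)
print(predict_success(threshold, 130))  # heavy parcel: predicted to fail
```

Applied to a new 130 kg parcel, the learned rule predicts an unsuccessful delivery, exactly the kind of statement described above.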

Figure 3: A machine learning model can use input values to make predictions on delivery success

In supervised learning, there are essentially two main tasks: classification and regression. Classification refers to the prediction of nominal values. The example of predicting the delivery success of a parcel shipment is a classification, as the range of result values is limited to two options: successful or unsuccessful. Regression is used to predict metric values. For example, regression can be used to predict the duration of a parcel shipment. In both cases, a mathematical equation is used to model the relationships between the input data and the target values. The main objective is to use machine learning algorithms to enable the most precise categorization or predictions possible.
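The regression case can be sketched in the same way. Assuming fictitious, perfectly linear delivery data, ordinary least squares recovers the relationship duration = a * distance + b and predicts a metric value rather than a category:

```python
# Regression counterpart to the classification example: predict the
# metric value "delivery duration" from distance. Fictitious data,
# fitted with ordinary least squares (duration = a * distance + b).

distances = [100, 200, 400, 800]   # km
durations = [1.5, 2.0, 3.0, 5.0]   # days

n = len(distances)
mean_x = sum(distances) / n
mean_y = sum(durations) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, durations)) \
    / sum((x - mean_x) ** 2 for x in distances)
b = mean_y - a * mean_x

print(round(a * 600 + b, 2))  # predicted duration in days for a 600 km delivery
```

Where the classifier answered "successful or not", the regression model answers "how long": a continuous number instead of one of two classes.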

Incidentally, the illustration using the parcel example can be very easily transferred to predicting transmission success of cloud messages. Basically, both are packets – physical or virtual message packets. And for the virtual packets there is metadata that specifies the properties of the delivery or transmission.

 

How is machine learning put into practice?

Machine learning solutions need to be planned and implemented individually, depending on the problem or objective at hand and the data available. Standardized procedures for tackling machine learning problems have been developed in the scientific literature and are collectively referred to as MLOps (Machine Learning Operations).10

The procedure for machine learning may simply be broken down into data preparation, model training and evaluation.

High-quality data is crucial for successful training, which is why how you generate and prepare the database is so important. Any errors or outliers could distort the results.

When training a model, a learning algorithm is applied to the database to generate a learning model that can then be used for prediction on unknown data. A suitable learning algorithm is selected after assessing the underlying problem and the computing resources and data available.

The model is then tested to verify its quality. If necessary, one can go back and make improvements to the previous steps in order to achieve more accurate results.
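The three steps above can be sketched end to end on fictitious data. The "model" is again deliberately trivial (a single learned weight threshold); the point is the flow from data preparation through training to evaluation:

```python
import random

random.seed(0)

# 1. Data preparation: generate fictitious shipment records and discard
#    implausible outliers before training.
weights = [random.uniform(1, 200) for _ in range(200)]
data = [(w, w <= 100) for w in weights]             # label: success iff <= 100 kg
data = [(w, ok) for w, ok in data if 0 < w < 1000]  # drop impossible weights

# 2. Model training on one part of the data: learn the smallest weight
#    at which a delivery failed.
split = int(len(data) * 0.8)
train_set, test_set = data[:split], data[split:]
threshold = min((w for w, ok in train_set if not ok), default=float("inf"))

# 3. Evaluation on the held-out rest: how often does the learned rule
#    agree with the true outcome? If accuracy is too low, go back and
#    improve the data or the model.
correct = sum((w < threshold) == ok for w, ok in test_set)
accuracy = correct / len(test_set)
print(f"accuracy: {accuracy:.2f}")
```

Holding back part of the data for evaluation is the standard way to check whether the model generalizes instead of merely memorizing its training records.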

 

How can we use AI predictions in future work?

As described in the process above, we used historical metadata records from the Cloud Gateway to train machine learning models. These can now be used to predict the outcome of a message transmission. We trained a total of three independent learning models, each of which predicts a different output value.

This means that we can now predict

  • transmission success,
  • transmission duration and
  • the availability of a service.

Figure 4: AI predictions and the resulting recommendations for the transmission success of a message in the cloud

 

Since predictions depend on the metadata of the message being transmitted, it is only possible to make a prediction during transmission, i.e. when the metadata is known (see “Input” in the above figure).

The predictions can be translated for the user into statements such as “This message will not be transmitted successfully” (see “Prediction” in the above figure). The user could be a customer sending messages through the cloud by themselves or a cloud administrator monitoring these messages.

If the prediction is unsatisfactory, a sensible intervention can lead to a better result. A message may be too large, for example, and would be better sent in smaller, separate packets. Perhaps simply sending a message later in the day improves the chance of a successful transmission. Recommendations such as these can be derived from the machine learning models and implemented by users (see “Recommendation” in the figure).
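How such a recommendation could be derived from a prediction can be sketched as a simple rule set. The size limit and the rules below are assumptions for illustration, not values extracted from the trained models:

```python
# Hypothetical mapping from a prediction to a user-facing recommendation.
# The threshold is an assumed value for this sketch only.

MAX_RECOMMENDED_BYTES = 10_000_000  # assumed size limit

def recommend(predicted_success: bool, size_bytes: int, hour_sent: int) -> str:
    if predicted_success:
        return "No action needed."
    if size_bytes > MAX_RECOMMENDED_BYTES:
        return "Split the message into smaller packets."
    if hour_sent < 12:
        return "Retry later in the day."
    return "Escalate to the cloud administrator."

# A large message predicted to fail triggers the splitting recommendation.
print(recommend(False, 25_000_000, 9))
```

In practice, such rules would themselves be distilled from the patterns the models have learned, rather than hand-written.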

Processes that act smoothly on such predictions are only possible once the predictions are incorporated into automated workflows, which is not yet the case. However, the prediction results from simulated tests are already promising in many ways.

At present, transmission success can be predicted with an accuracy rate of 74%, while the failure of a service can already be predicted with an impressive accuracy of 99.8%.

It would appear that we are actually pretty close to reliably predicting the weather through birdwatching.


1 Source: Streby, Henry M., et al., “Tornadic storm avoidance behavior in breeding songbirds”, Current Biology 25.1, 2015, pp. 98–102

2 Source: Huawei Technologies Co., Cloud Computing Technology, 1st ed., Springer Singapore, 2023

3 Source: S. Marston, Z. Li, S. Bandyopadhyay, J. Zhang and A. Ghalsasi, “Cloud computing—The business perspective”, Decision Support Systems, vol. 51, no. 1, pp. 176–189, 2011

4 Source: F. Liu, J. Tong, J. Mao et al., NIST Cloud Computing Reference Architecture, 2011

5 Source: B. P. Rimal, E. Choi and I. Lumb, “A taxonomy and survey of cloud computing systems”, in 2009 Fifth International Joint Conference on INC, IMS and IDC, IEEE, 2009, pp. 44–51

6 Source: H. Aust, “KI: Hype oder Technologie der Zukunft?”, in Das Zeitalter der Daten: Was Sie über Grundlagen, Algorithmen und Anwendungen wissen sollten, pp. 21–32, 2021

7 Source: A. Gadatsch and H. Landrock, “Big Data für Entscheider”, Entwicklung und Umsetzung datengetriebener Geschäftsmodelle, Wiesbaden (essentials), 2017

8 Source: D. Hecker, I. Döbel, U. Petersen, A. Rauschert, V. Schmitz and A. Voß, Zukunftsmarkt Künstliche Intelligenz. Potenziale und Anwendungen, 2017

9 Source: W. Bauer, W. Ganz, M. Hämmerle et al., “Künstliche Intelligenz in der Unternehmenspraxis”, Studie zu Auswirkungen auf Dienstleistung und Produktion, Fraunhofer-Institut für Arbeitswirtschaft und Organisation IAO, Stuttgart, 2019

10 Source: F. Zhengxin, Y. Yi, Z. Jingyu et al., “MLOps Spanning Whole Machine Learning Life Cycle: A Survey”, 2023

Written by:

Stefanie Neumann completed her dual bachelor's degree in computer science at SEEBURGER in 2023. Since then, she has been working as a software developer in the company's research department. Her work allows her to indulge her creative urge to develop and implement innovative ideas. She is driven by her curiosity to explore the interrelationships and functionalities of technologies. She is currently working on projects to design and implement AI applications. In her free time, she is active in sports and the arts, and is involved in coaching and mentoring young people.