The Argus system is the network data source of choice for many prominent machine learning (ML) and AI-based Network Based Anomaly Detection (NBAD) projects. Unsupervised learning using network flow data has been an active research topic for many years, and organizations like Oak Ridge National Lab (ORNL) have had great results using Argus data in their operational system, SITU.
The rich data models, flexible data formats, high performance data generation and processing, metadata enhancement capabilities, and streaming and block processing strategies all come together to provide an environment where ML and AI models can be developed, tested, optimized and deployed.
Having a large volume of historical network transaction data is important for ML model development and training, and having many attributes associated with that data is a key ingredient for deep learning. Argus is known for generating large amounts of well-structured network data, with the kinds of attributes that let ML "peek" into network state and condition in real time. This is data that is actually useful and reliable.
To use ML in operational networks, you have to provide the ML with a live stream of network observational data. Argus provides the most mature streaming network situational awareness capability available, with guarantees on data timeliness, order and state, which makes it a natural choice for ML-based NBAD.
Argus is at the core of a number of prominent unsupervised ML projects at US national laboratories, universities and private companies. These projects provide a glimpse of how Argus data can be used in large scale operations to provide detection and protection for important assets.
Doing effective machine learning for network based anomaly detection involves developing and supporting a set of environments for data scientists that cover the whole ML life cycle. We're working on Argus data processing in Python, R, Matlab and Mathematica.
Getting data into the platform is just one step in the process, and many use CSV and JSON, both of which are supported in Argus. But getting streaming data into the platform can be complex and difficult for some applications, and getting that data in for a 100G network can be very challenging.
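As a rough illustration of the CSV and JSON path into a Python environment, the sketch below parses a few flow records from each format into a small numeric feature matrix. The sample records and the column names (StartTime, SrcAddr, DstAddr, Proto, TotPkts, TotBytes) are assumptions made for this example, not a specification of Argus output; the functions load_csv, load_json_lines and feature_matrix are hypothetical helpers.

```python
import csv
import io
import json

# Hypothetical sample records. The column names are assumed for
# illustration and may not match the fields you export from Argus.
SAMPLE_CSV = """StartTime,SrcAddr,DstAddr,Proto,TotPkts,TotBytes
1700000000.10,10.0.0.1,10.0.0.2,tcp,12,3400
1700000001.25,10.0.0.3,10.0.0.2,udp,2,180
"""

SAMPLE_JSON_LINES = """\
{"StartTime": 1700000000.10, "SrcAddr": "10.0.0.1", "DstAddr": "10.0.0.2", "Proto": "tcp", "TotPkts": 12, "TotBytes": 3400}
{"StartTime": 1700000001.25, "SrcAddr": "10.0.0.3", "DstAddr": "10.0.0.2", "Proto": "udp", "TotPkts": 2, "TotBytes": 180}
"""

# Fields that should be numeric before they reach a model.
NUMERIC_FIELDS = ("StartTime", "TotPkts", "TotBytes")


def load_csv(text):
    """Parse CSV flow records into dicts, converting numeric fields to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in NUMERIC_FIELDS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def load_json_lines(text):
    """Parse one JSON object per line into dicts, converting numeric fields to float."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        for name in NUMERIC_FIELDS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def feature_matrix(rows):
    """Build a list of [packets, bytes] feature vectors from parsed records."""
    return [[r["TotPkts"], r["TotBytes"]] for r in rows]


if __name__ == "__main__":
    print(feature_matrix(load_csv(SAMPLE_CSV)))
    print(feature_matrix(load_json_lines(SAMPLE_JSON_LINES)))
```

Both loaders produce the same record shape, so the downstream feature code does not need to know which format the data arrived in.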
If there are other basic environments that you need, please give us a holler.
There seem to be two basic strategies for effective ML network analysis, and these are based on whether the ML is processing data streams or whether it's processing block / file based data.
Block processing, where the ML reads Argus data from files or a database table, is envisioned to support ML model development and testing.
Stream processing, where the ML is processing real-time streams of data, is envisioned for operational deployments of ML models for network data analysis.
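The difference between the two strategies can be sketched with a deliberately simple scorer. In the block case every value is available up front, so statistics are computed over the whole set; in the stream case each record is scored against what has been seen so far, using Welford's online mean and variance update. The z-score here is only a stand-in for a real ML model, and the names block_scores and StreamScorer, as well as the sample byte counts, are hypothetical.

```python
import math
from statistics import mean, pstdev


def block_scores(values):
    """Block mode: all values are available, so score against global statistics."""
    mu = mean(values)
    sigma = pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


class StreamScorer:
    """Stream mode: score each value on arrival with Welford's online statistics."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def score(self, value):
        # Score against the statistics of the values seen before this one.
        if self.n >= 2:
            sigma = math.sqrt(self.m2 / self.n)
            z = 0.0 if sigma == 0 else (value - self.mean) / sigma
        else:
            z = 0.0  # not enough history to judge yet

        # Welford update: fold the new value into the running statistics.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return z


if __name__ == "__main__":
    # Hypothetical per-flow byte counts; the last one is an obvious outlier.
    byte_counts = [100, 110, 90, 105, 95, 5000]

    print("block: ", [round(z, 2) for z in block_scores(byte_counts)])

    scorer = StreamScorer()
    print("stream:", [round(scorer.score(v), 2) for v in byte_counts])
```

Note that the block scorer lets the outlier inflate its own baseline, while the stream scorer judges it against earlier traffic only. That asymmetry is one reason a model developed on files may need re-tuning before it is deployed on a live stream.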