Project Overview 

Project Name: Argus Dashboard
Purpose: Evaluate whether the web-based Argus application is suitable as a dashboard for monitoring our Network & Alerts system.

Current State Assessment 

System A (As-Is) 

Overview: Existing Internal Dashboard (Java Application) (Geant) (Closed-Source)

Strengths: 

Weaknesses: 

System B (As-Is) 

Overview: Argus (Sikt) (Open-Source)

Strengths: 

Weaknesses: 

Desired Future State 

Overview:
We recognise and appreciate the mission of the Sikt team. A common tool for OC alarm visualisation, adhering to ITIL best practices and standards, could be shared among NRENs.

Argus has positioned itself as a promising candidate for an OC alarms dashboard by adopting an open-source approach and actively promoting its usage and availability at networking conferences.

We don’t want 'a fork' of Argus, and would strongly prefer a unified system that can accommodate extended use cases. Our rough impression is that the UI “skin” would be relatively straightforward to develop on its own, but that the fundamental Argus backend use case differs from ours. The main discussion point will therefore be whether it is feasible to have a common backend and/or a pluggable architecture that can accommodate both applications.

Argus meets some of our requirements but currently misses others that are needed for it to be considered fit for purpose. To be a complete replacement for the existing tools, it must also achieve at least the following:

We cover these in more detail in the Gap Analysis below. Based on this document, we will raise an RFI to give the development team an opportunity to provide feedback on the feasibility of filling the gaps identified so far. The list is not considered exhaustive.

Gap Analysis 

Functional Gaps 

Feature/Functionality: Alarm States

⁃ Current State (System A): Alarms have multiple states, which can also be shown as coloured or flashing. The first- and second-line support teams are trained to recognise these at a glance.
⁃ Current State (System B): Alarms appear to have only one fixed state.

Feature/Functionality: Multiple stages of Acknowledgement

⁃ Current State (System A): The dashboard differentiates between acknowledgements by the first- and second-line support teams that an alert has been recognised.
⁃ Current State (System B): Alarms have only one level of acknowledgement (‘Acked’ is a tag or status).
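To make this gap concrete, the following is a minimal Python sketch of what tiered acknowledgement could look like as a data model. It is not taken from either system: the `Alarm` class, the `SupportTier` names and the `acknowledge` method are all hypothetical, and serve only to contrast a single 'Acked' flag with one acknowledgement per support tier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class SupportTier(Enum):
    # Hypothetical tier names; the real teams may be labelled differently.
    FIRST_LINE = 1
    SECOND_LINE = 2


@dataclass
class Alarm:
    description: str
    # Maps each tier to the time at which it acknowledged the alarm.
    acks: dict = field(default_factory=dict)

    def acknowledge(self, tier: SupportTier) -> None:
        # Keep only the first acknowledgement recorded for a tier.
        self.acks.setdefault(tier, datetime.now(timezone.utc))

    def is_acked_by(self, tier: SupportTier) -> bool:
        return tier in self.acks

    def fully_acknowledged(self) -> bool:
        # True only once every tier has acknowledged the alarm.
        return all(tier in self.acks for tier in SupportTier)


alarm = Alarm("link down")  # illustrative alarm, not real data
alarm.acknowledge(SupportTier.FIRST_LINE)
```

In this sketch an alarm acknowledged by the first line only is still distinguishable from one that both tiers have seen, which a single tag or status cannot express.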

Feature/Functionality: Correlation and Coalescing

⁃ Current State (System A): Supports coalescing and correlating issues for remediation. 
⁃ Current State (System B): Lacks the ability to coalesce and correlate alerts effectively. 
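As an illustration of the simpler half of this requirement, the sketch below coalesces alerts that share a key into a single group. It does not describe how System A works: the `Alert` fields and the choice of `(device, problem)` as the grouping key are assumptions made purely for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    # Hypothetical fields, chosen for illustration only.
    device: str
    problem: str
    message: str


def coalesce(alerts):
    """Group alerts that share a device and problem into one entry."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.device, alert.problem)].append(alert)
    return dict(groups)


# Made-up sample alerts.
alerts = [
    Alert("router-1", "link-down", "ge-0/0/1 down"),
    Alert("router-1", "link-down", "ge-0/0/2 down"),
    Alert("switch-7", "high-cpu", "cpu at 97%"),
]
grouped = coalesce(alerts)
```

Here three alerts collapse into two groups. Correlation across different devices or root causes would need topology or dependency information that this sketch does not model.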

Feature/Functionality: Status of live systems 

⁃ Current State (System A): Shows the status of services (traffic lights).
⁃ Current State (System B): Lacks the ability to show the status of live services.

Feature/Functionality: Priority 

⁃ Current State (System A): Alerts can be prioritised by a number.
⁃ Current State (System B): Lacks the ability to prioritise. It only has severity, which is a different concept.
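The distinction between severity and priority can be sketched as follows. This is an illustration under stated assumptions, not the model of either system: the `Alert` class, the `work_order` function and the convention that lower numbers mean "more severe" or "more urgent" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    description: str
    severity: int                   # assumed: set by the source, 1 = most severe
    priority: Optional[int] = None  # assumed: set by operators, 1 = most urgent


def work_order(alerts):
    # Prioritised alerts come first (lowest number first);
    # unprioritised alerts follow, ordered by severity.
    return sorted(
        alerts,
        key=lambda a: (a.priority is None, a.priority or 0, a.severity),
    )


# Made-up sample alerts.
queue = work_order([
    Alert("critical fault on lab device", severity=1, priority=3),
    Alert("unprioritised fault", severity=2),
    Alert("minor fault on core link", severity=4, priority=1),
])
```

In this example the minor fault on the core link is handled first despite its low severity, because operators gave it the highest priority. That ordering cannot be expressed with severity alone.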

Technical Gaps 

Integration Points: Modern Technology Stack 

⁃ Current State (System A): Uses a legacy Java code base 
⁃ Current State (System B): Built on a modern web stack 

Integration Points: API Flexibility 

⁃ Current State (System A): Provides a well-established API for integration with other tools.
⁃ Current State (System B): Has limited API flexibility. 

Data Gaps 

Data Flow: History of Alarms 

⁃ Current State (System A): Retains alarm history.
⁃ Current State (System B): Does not retain alarm history.

Data Flow: Real-Time Monitoring 

⁃ Current State (System A): Supports real-time monitoring and updates.
⁃ Current State (System B): Lacks real-time monitoring capabilities. 

Recommendations 

Action Plan 

Risks and Mitigations 

Conclusion 

The Geant NOC/SOC first- and second-line support teams require a new, modern OC alarm visualisation platform: one that satisfies the needs of the consumers of the service and can also be maintained by the development team.

There is currently a backlog of feature requests coming from the NOC. We must recognise the importance of balancing user needs against what the development team can maintain.

Any conclusions must be reached by consensus between the internal development team and the work package leaders. It should be agreed that cooperation and sharing is a proactive and viable course of action, one that allows us to focus on other development requirements such as testing, automating deployments, architecture and integration with other services.