...
- Project Name: Argus / Dashboard
- Purpose: We at Geant are currently redeveloping our alert aggregator, the NOC alarms dashboard, and are seeking information because you have a similar open-source product.
- Required Delivery Schedule: Prototype Release Candidate in place for side-by-side acceptance testing in OC in Q3 2024. Final release and decommissioning of the legacy dashboard in Q4 2024.
Existing Product Overview
- Product name: Dashboard
...
- Years in service : 10+
...
- Purpose:
...
- Correlation of network events and visualisation of Alarms
...
- representing root causes.
- Detailed description: We currently have a Java-based application that
...
- displays alarms generated by the backend alarm correlation service. It has some custom features required by the NOC/SOC and the first-line support team.
...
- Key features and specifications : Alarm States,
...
- Coalescing of alarms, Blacklisting, Filtering, Connections to other APIs
Features required in the system
We have evaluated our existing system against the Argus system and completed an internal gap analysis. From this we have some questions for the Argus development team on whether specific feature requests are feasible. These are the minimum, non-negotiable requirements of the consumers of the system. These features will probably require more discussion, alongside a demo of the existing product, to be properly understood.
Feature/Functionality | Gap Identified | Feedback/Comments |
Alarm Lifecycle | Our alarm states are more complex. We have at least 5 states, of which 4 are represented in the GUI, for example by means of flashing or different fonts. | |
Multiple stages of Ack | We have first- and second-line support, and their acknowledgements are represented in the GUI, both with checkboxes and with modified colors in the row. | |
Correlation | The initial info for a particular alarm will change over time (during the Alarm Lifecycle), or it may be removed quickly. For example, multiple alarms that are immediately reported could be “squashed” into a single alarm after a few seconds. | |
Coalescing | Multiple instances of an identical alarm need to be “merged” in the GUI, with an indication that this has happened and how often. | |
Status of integrations | Is it possible to integrate a callback system for monitoring the status of related services? Either this or plugin support? | Priority |
Backend health status UI | The GUI must contain a panel, or some other indication, of the results of various real-time health checks on backend systems. | |
History + Search | We need to keep all alarms that have ever happened, and their internal components (for example, individual BGP peerings or link states), to provide reporting for other services on availability and utilisation of service. | |
Alarm logical post-processing rules | The OC can specify logical rules that change the characteristics of alarms. For example, if an alarm description contains particular keywords, or is related to a particular location, then the GUI severity can be changed automatically, comments added, or the alarm perhaps hidden. It should be possible to apply this logic as new alarms enter some particular state, or to apply it to the existing database of old alarms. | |
Filtering | Complex filtering. Filter groups AND / NOT | |
Alarm internal details | Our OC requires that the internal components of an alarm are easily browsable from the GUI. For example: if an optical cut causes multiple L2 and L3 circuit failures, OC must be able to “open” these details (e.g. hostnames, ports, event start/end times, multiple flaps of the same, etc.) by clicking on the alarm in the top-level alarm row. | |
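To make the Coalescing requirement concrete for discussion, the following is a minimal sketch of the behaviour we have in mind, not a description of either the existing Dashboard or Argus. The `CoalescedAlarm` class, the `coalesce` function, and the choice of (source, description) as the test for "identical" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CoalescedAlarm:
    """One GUI row standing in for repeated instances of the same alarm."""
    key: tuple            # hypothetical identity: (source, description)
    first_seen: datetime
    last_seen: datetime
    count: int = 1        # how often the identical alarm has occurred


def coalesce(events):
    """Merge identical alarm events into one row each, keeping a count.

    `events` is an iterable of (source, description, timestamp) tuples.
    Treating (source, description) as "identical" is an assumption.
    """
    rows = {}
    for source, description, ts in events:
        key = (source, description)
        row = rows.get(key)
        if row is None:
            rows[key] = CoalescedAlarm(key, first_seen=ts, last_seen=ts)
        else:
            row.count += 1
            row.first_seen = min(row.first_seen, ts)
            row.last_seen = max(row.last_seen, ts)
    return list(rows.values())
```

A row with `count` greater than 1 is what the GUI would mark as merged, with `count` giving the "how often" indication.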
Please provide feedback on the above technical requirements and on whether it would be reasonable to include these features in your development plan in the near future.
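As an aid to discussing the post-processing rules and the AND / NOT filter groups listed above, here is a minimal sketch of how such rules could be expressed as composable conditions. All names (`Alarm`, `Rule`, `keyword`, `at_location`, `all_of`, `negate`, `apply_rules`) and the severity values are hypothetical; this does not describe the existing Dashboard or the Argus API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Alarm:
    # Hypothetical, simplified alarm record for illustration only.
    description: str
    location: str
    severity: str = "major"
    hidden: bool = False
    comments: List[str] = field(default_factory=list)


Predicate = Callable[[Alarm], bool]


def keyword(word: str) -> Predicate:
    """True when the alarm description contains `word` (case-insensitive)."""
    return lambda alarm: word.lower() in alarm.description.lower()


def at_location(location: str) -> Predicate:
    """True when the alarm is related to the given location."""
    return lambda alarm: alarm.location == location


def all_of(*predicates: Predicate) -> Predicate:
    """AND group: every condition must hold."""
    return lambda alarm: all(p(alarm) for p in predicates)


def negate(predicate: Predicate) -> Predicate:
    """NOT: inverts a condition."""
    return lambda alarm: not predicate(alarm)


@dataclass
class Rule:
    condition: Predicate
    action: Callable[[Alarm], None]   # mutates the alarm in place


def apply_rules(alarms: Iterable[Alarm], rules: Iterable[Rule]) -> None:
    """Apply each matching rule to each alarm, in the order given."""
    rules = list(rules)
    for alarm in alarms:
        for rule in rules:
            if rule.condition(alarm):
                rule.action(alarm)
```

The same `apply_rules` call could be run on alarms as they enter a given state or over a stored set of old alarms, which is the two-mode behaviour the requirement asks for.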
Technical requirements
- Compatibility with existing systems: collector, classifier, correlator, inventory provider, OTRS (TODO: list current API calls, e.g. for ticket generation)
- Follow ITIL standards where possible, for example in naming conventions
- Developed within the technical stack we have discussed
Support and Maintenance
If security issues become apparent within the system, what response time can be expected for patching them?
What is the predicted upgrade path for future versions?
If new features are requested after the key features are complete, will it be possible to add them? (revise backlog before any SOW)
Delivery and Implementation
We would require this to be developed through 2024 with continuous delivery, so that we can A/B test against the existing product, build integrations, and set up server infrastructure and automated deployments.
The key features would need to be delivered within the current year.
...