General Information
Service URL - http://opsdb.dante.net/
- Usual configuration - Points to the primary instance.
- Primary Instance - http://prod-opsdb01.geant.net/
- Secondary Instance - http://prod-opsdb02.geant.net/
OpsDB runs on two VMs in each environment (Prod, UAT and Test), named after the environment:
- xxxxx-opsdb01.geant.net and xxxxx-opsdb02.geant.net, where xxxxx = prod, uat, or test
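The naming pattern above can be written out as a small helper. This is only an illustrative sketch: the function name `opsdb_hosts` is hypothetical, and the pattern is taken directly from the line above.

```python
ENVIRONMENTS = ("prod", "uat", "test")


def opsdb_hosts(environment):
    """Return the 01 and 02 hostnames for one environment (prod, uat or test)."""
    if environment not in ENVIRONMENTS:
        raise ValueError("unknown environment: %r" % (environment,))
    # Pattern from this page: <env>-opsdb01.geant.net and <env>-opsdb02.geant.net
    return ["%s-opsdb%02d.geant.net" % (environment, n) for n in (1, 2)]
```

For example, `opsdb_hosts("uat")` gives the two UAT hostnames.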
OpsDB is written in PHP 5.3.3, HTML, and JavaScript, and runs on Linux (CentOS).
CentOS - CentOS 6 receives updates until November 30, 2020.
PHP 5.3.3 - No longer officially supported upstream, but still covered by CentOS backporting of PHP security releases; its end of life is the same as for CentOS 6.
HTML / JavaScript - Currently supported, with no planned end-of-support dates; in fact, older versions are more widely supported than the latest ones.
First Steps
If for any reason the system becomes unavailable:
- Check whether the primary instance is available by going to: http://prod-opsdb01.geant.net/
- If it is, this suggests a DNS issue with opsdb.dante.net, which the OC should be able to deal with.
- If the primary instance of OpsDB is not available, check whether the secondary instance is available by going to: http://prod-opsdb02.geant.net/
- If yes,
...
- switch the DNS entry for OPSDB from the ‘Primary(01)’ instance
...
- to the ‘Secondary(02)’ instance. This will allow general users to continue working in OpsDB whilst we continue investigating why it initially went down.
If both instances have become unavailable, contact IT / SWD with the utmost urgency, as further investigation, steps, and decisions will be needed across departments (i.e. IT / SWD / OC) on the best way to resolve the issue.
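The decision steps above can be sketched as a short script. This is a minimal sketch, not an official tool: the function names are hypothetical, and "available" is assumed here to mean "answers HTTP with a status below 500", which may be stricter or looser than what you would judge in a browser.

```python
import http.client
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

PRIMARY_URL = "http://prod-opsdb01.geant.net/"
SECONDARY_URL = "http://prod-opsdb02.geant.net/"


def is_up(url, timeout=10):
    """Assumption: an instance counts as available if it answers below HTTP 500."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code < 500
    except (URLError, OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a broken reply.
        return False


def first_steps(check=is_up):
    """Return the recommended action, following the steps in this section."""
    if check(PRIMARY_URL):
        return "Primary is up: suspect a DNS issue with opsdb.dante.net; contact the OC."
    if check(SECONDARY_URL):
        return "Primary is down, secondary is up: switch the DNS entry to the Secondary(02) instance."
    return "Both instances are down: contact IT / SWD urgently."
```

Calling `print(first_steps())` from a machine that can reach the VMs prints the suggested next step; the `check` parameter exists only so the logic can be exercised without network access.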
Change the Domain Name System (DNS) entry for OpsDB (i.e. Move from one instance to another)
Currently we direct all the OpsDB public domain URI calls to the ‘01’ instance (the ‘Primary’ instance) of the appropriate OpsDB VM.
If required (i.e. the 01 instance of a VM is down) we can change the DNS to point our public domain URI to the 02 VM (the ‘Secondary’ instance) whilst the 01 VM is being fixed; this should ensure the service continues to be available to the public.
This requires changes to be made by the systems administrator, as explained in the document below, and may require assistance from IT.
Service Reliability - design document (draft)
OpsDB
- Change the CNAME opsdb.dante.net in Infoblox, to point to prod-opsdb02.geant.net
Once this has been done, the system should be available to users again whilst a more detailed investigation takes place into why the Primary instance became unavailable.
Please do not forget to inform the users that OpsDB is back up once this has been done.
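After the CNAME change, it can be useful to confirm which instance the public name now resolves to. The sketch below compares resolved IP addresses; it assumes the 01 and 02 VMs have different addresses, and the function name `points_to` is hypothetical. Note that a local resolver may keep serving the old record until its TTL expires.

```python
import socket


def points_to(alias, target, resolver=socket.gethostbyname_ex):
    """True if `alias` and `target` resolve to at least one common IPv4 address."""
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist).
        alias_ips = set(resolver(alias)[2])
        target_ips = set(resolver(target)[2])
    except OSError:
        # A failed lookup is treated as "not pointing there".
        return False
    return bool(alias_ips & target_ips)
```

For example, `points_to("opsdb.dante.net", "prod-opsdb02.geant.net")` should return `True` once the change has propagated to the machine running the check.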
Further Investigation
The following points may help troubleshoot any issues that arise with this application.
Check the VM is running
If the server cannot be pinged and it is out of hours, log into vCenter (please use your win/adm-xxxx account) and check whether the VMs are running.
e.g. log into Frankfurt and select the top level (fra-prd-vc01.win.dante.org.uk). Select the VMs tab and use the search bar at the top to search for the VM.
If the status of the VM is stopped, restart it using the green button.
If there are networking issues the OC will be able to troubleshoot this.
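As a complement to ping, a TCP probe can help tell a stopped or unreachable VM apart from a running VM whose web server is down. This is a rough sketch under assumptions: port 80 is assumed from the http:// URLs on this page, the function name `probe` is hypothetical, and firewalls can make a running host look unreachable.

```python
import socket


def probe(host, port=80, timeout=5, connect=socket.create_connection):
    """Classify a TCP connection attempt as 'open', 'refused' or 'unreachable'."""
    try:
        connect((host, port), timeout=timeout).close()
        return "open"          # something is listening on the port
    except ConnectionRefusedError:
        return "refused"       # host answered, but nothing is listening
    except OSError:
        return "unreachable"   # timeout, DNS failure, or no route to the host
```

A result of "refused" suggests the machine itself is running and the service checks are the next step; "unreachable" points towards the VM state or the network.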
If the machine is running, follow the steps below:
Check Apache.
- Has Apache failed? Is it running?
...
This should start or restart MySQL on the VM – please perform this on both VMs separately.
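Before or after restarting, the state of both services can be read in one go. The sketch below only reports status and does not restart anything. It assumes the stock CentOS 6 service names `httpd` and `mysqld` and that `service NAME status` exits with 0 when the service is running; both are assumptions, not taken from this page, and the function names are hypothetical.

```python
import subprocess

# Assumed service names for Apache and MySQL on CentOS 6.
SERVICES = ("httpd", "mysqld")


def service_running(name, run=subprocess.call):
    """True if `service NAME status` exits with code 0 (assumed to mean running)."""
    return run(["service", name, "status"]) == 0


def report(run=subprocess.call):
    """Return a dict mapping each assumed service name to True/False."""
    return {name: service_running(name, run) for name in SERVICES}
```

Run `report()` on each VM separately, in line with the instruction above to treat both VMs individually.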
Recovery of MySQL Data
Currently MySQL data backups are stored in the /opt/vackups/mysql folder within each VM.
Each day the daily DB dump, from each server, is also copied to an appropriate place on the Data Warehouse machine.
To restore any of these instances of data, locate the appropriate DB dump and go through the MySQL restore procedure (documented elsewhere in the MySQL documentation).
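Locating the most recent dump can be sketched as follows. The backup folder is the path given above; the file naming scheme is not documented here, so this sketch simply picks the most recently modified file, which is an assumption. The function name `latest_dump` is hypothetical, and the restore itself is left to the documented procedure.

```python
from pathlib import Path

# Backup folder as stated on this page.
BACKUP_DIR = Path("/opt/vackups/mysql")


def latest_dump(backup_dir=BACKUP_DIR):
    """Return the most recently modified file in backup_dir, or None if empty."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)
```

Check the file's date and size by hand before restoring from it, since the newest file is not necessarily the dump you want.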
Security Updates with underlying software and operating systems
OpsDB is, in terms of software, an ‘old lady’ now, awaiting retirement.
It is currently written in PHP 5.3.3, HTML, and JavaScript, and runs on Linux (CentOS).
CentOS - CentOS 6 receives updates until November 30, 2020.
PHP 5.3.3 - No longer officially supported upstream, but still covered by CentOS backporting of PHP security releases; its end of life is the same as for CentOS 6.
HTML / JavaScript - Currently supported, with no planned end-of-support dates; in fact, older versions are more widely supported than the latest ones.
Check disk usage
Is the VM disk full?
Is the allocated OpsDB disk space full?
Check Disk Usage
Follow the steps here: clean up big files
Disk usage should already be monitored and reported on before the disk becomes full, so this scenario should never occur.
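If a manual check is needed, the two questions above can be answered with a short script. This is a sketch with hypothetical function names; the paths to inspect are not specified on this page, so pass whichever mount point or folder is relevant. The percentage may differ slightly from what `df` reports because of reserved blocks.

```python
import os
import shutil


def percent_used(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:count]
```

For example, `percent_used("/")` followed by `largest_files("/var/log")` gives a quick view of how full the disk is and which files are taking the space; `/var/log` is only an example location.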
Final Step
Please raise a ticket with Software Development Support and include details of the steps taken out of business hours, so that a detailed analysis of the failure can be carried out.