General Information

Service URL - http://opsdb.dante.net/ 

OpsDB runs on two servers in each environment (Prod, UAT and Test), appropriately named xxxxxxVMs.


OpsDB is written in PHP 5.3.3, HTML and JavaScript, and runs in a Linux system environment (CentOS).

CentOS: CentOS 6 receives updates until November 30, 2020.

PHP 5.3.3 is no longer officially supported, but it is still covered by CentOS backporting of PHP security releases, so its end of life is the same as that of the CentOS 6 system.

HTML and JavaScript are currently supported and have no planned end-of-support dates; in fact, older versions are more widely supported than the latest ones.


First Steps

If for any reason the system becomes unavailable:

  • Check whether the primary instance is available by going to: http://prod-opsdb01.geant.net/
    • If it is, this suggests a DNS issue with opsdb.dante.net. OC should be able to deal with it.
  • If the primary instance of OpsDB is not available, check whether the secondary instance of OpsDB is available by going to: http://prod-opsdb02.geant.net/
    • If yes, switch the DNS entry for OpsDB from the ‘Primary (01)’ instance to the ‘Secondary (02)’ instance. This will allow general users to continue working on OpsDB whilst we continue investigating why it initially went down.
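The checks above can be sketched as a small Python helper. This is only an illustrative sketch, not an existing OpsDB tool: the function names `is_up` and `next_step` and the returned labels are hypothetical, and it assumes that any HTTP answer below status 500 counts as "available".

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URLs taken from this page.
PRIMARY_URL = "http://prod-opsdb01.geant.net/"
SECONDARY_URL = "http://prod-opsdb02.geant.net/"


def is_up(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urlopen(url, timeout=timeout):
            return True  # urlopen only returns normally for non-error statuses
    except HTTPError as exc:
        # The web server answered, but with an error status (4xx or 5xx).
        return exc.code < 500
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def next_step(primary_up, secondary_up):
    """Map the two availability checks to the action described above."""
    if primary_up:
        return "contact-oc-dns"  # primary is fine, so suspect DNS
    if secondary_up:
        return "switch-dns-to-secondary"
    return "investigate-further"  # neither instance answers
```

For example, `next_step(is_up(PRIMARY_URL), is_up(SECONDARY_URL))` returns one of the three labels; the decision itself still rests with the person on call.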

Change the Domain Name System (DNS) entry for OpsDB (i.e. move from one instance to another)

Currently we direct all the OpsDB public domain URI calls to the ‘01’ instance (the ‘Primary’ instance) of the appropriate OpsDB VM.

If required (i.e. the 01 instance of a VM is down) we can change the DNS to point our public domain URI to the 02 VM (the ‘Secondary’ instance) whilst the 01 VM is being fixed – this should ensure the service continues to be available to the public.
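To see which instance the public name currently points at, before or after a DNS change, the resolved addresses can be compared. The following is a minimal sketch: the helper names `resolve` and `points_at` are hypothetical, and it assumes each instance hostname resolves to a single IPv4 address. Cached DNS answers may keep returning the old address for a while after a change.

```python
import socket


def resolve(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def points_at(public_name, instance_name, resolver=socket.gethostbyname):
    """Return True if both names resolve to the same address."""
    public_ip = resolve(public_name, resolver)
    instance_ip = resolve(instance_name, resolver)
    return public_ip is not None and public_ip == instance_ip
```

For example, `points_at("opsdb.dante.net", "prod-opsdb02.geant.net")` should return True once the switch to the Secondary instance has taken effect for the machine running the check.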

...

OpsDB 

Once this has been done the system should then be available to the users once again whilst more detailed investigation takes place into why the Primary instance has become unavailable.

Please do not forget to inform the users that OpsDB is back up once this has been done.


Further Investigation

The following points may help troubleshoot any issues that arise with this application.


Check the VM is running

If out of hours, log into VCentre (please use the win/adm-xxxx account) and check whether the VMs are running. If the server can't be pinged:

E.g. log into Frankfurt and select the top level (fra-prd-vc01.win.dante.org.uk). Select the VMs tab and use the search bar at the top to search for the VM.

If the status of the VM is stopped, restart it using the green button.

If there are networking issues the OC will be able to troubleshoot this.
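A quick TCP probe can hint at whether a problem lies with the VM or network, or with the web server on it. This is a rough sketch under assumptions: the function name `probe_port` is hypothetical, port 80 is assumed from the http:// URLs on this page, and a TCP connection attempt is used here in place of a ping.

```python
import socket


def probe_port(host, port=80, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    "open"        - something is listening (the web server is probably up)
    "refused"     - the host answered but nothing listens on the port
    "unreachable" - timeout, DNS failure or another network error
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:  # includes timeouts and name resolution errors
        return "unreachable"
```

A result of "refused" suggests the VM is running but Apache is not, while "unreachable" points towards the VM or the network. Firewalls can blur this distinction, so treat the result as a hint only.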

If the machine is running, follow the steps below:

Check Apache

  • Has Apache failed? Is it running?

...

      This should start or restart MySQL on the VM – please perform this on both VMs separately.

Recovery of MySQL Data

       Currently MySQL data backups are stored in the /opt/vackups/mysql folder within each VM.

       Each day the daily DB dump, from each server, is also copied to an appropriate place on the Data Warehouse machine.

       To restore any of these instances of data, locate the appropriate DB dump and go through the MySQL restore procedure (documented separately in the MySQL documentation).
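Locating the most recent dump in a backup folder could look like the sketch below. The function name `newest_dump` is hypothetical, and the `*.sql*` file pattern is an assumption, since the naming of the dump files is not documented on this page.

```python
from pathlib import Path


def newest_dump(backup_dir, pattern="*.sql*"):
    """Return the most recently modified dump file in backup_dir, or None."""
    candidates = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the latest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Pass the backup folder named above as `backup_dir`. The function only finds the file; the restore itself still follows the MySQL restore procedure.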

 

Security Updates with underlying software and operating systems

        OpsDB is, in terms of software, an ‘old lady’ now, awaiting retirement.

        It is currently written in PHP 5.3.3, HTML and JavaScript, and runs in a Linux system environment (CentOS).

        CentOS: CentOS 6 receives updates until November 30, 2020.

        PHP 5.3.3 is no longer officially supported, but it is still covered by CentOS backporting of PHP security releases, so its end of life is the same as that of the CentOS 6 system.

        HTML and JavaScript are currently supported and have no planned end-of-support dates; in fact, older versions are more widely supported than the latest ones.

Check disk usage

        Is the VM disk full?

        Is the allocated OpsDB disk space full?

        If so, follow the steps here: clean up big files

        Disk usage should already be monitored and reported on before the disk becomes full, so this scenario should never occur.
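As a quick manual check when logged into the VM, disk usage can be read with Python's standard library. This is a minimal sketch: the function names and the 90% threshold are illustrative assumptions, not the values used by the existing monitoring.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """Return True if usage at path is at or above threshold percent."""
    return percent_used(path) >= threshold
```

For example, `is_nearly_full("/")` checks the root filesystem. The figure can differ slightly from the Use% column of `df`, which accounts for blocks reserved for root.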


Final Step

 Please raise a ticket with Software Development Support and include the details of the steps taken out of business hours, so that a detailed analysis of the failure can be carried out.