Procedures to be followed for the Preventive Maintenance at NIB nodes

The NIB nodes and Data Networks (DNW) will be notified of a preventive maintenance (P.M.) programme that is to be conducted shortly by CG-Informatics. The programme will be announced a week in advance so that the nodes in-charge can be informed and can arrange to be available at the site during the preventive maintenance. Services will be stopped during the preventive maintenance, since the servers and networking equipment have to be shut down for the work. The nodes in-charge are requested to arrange vacuum cleaners at the site, as these are required for removing dust from the equipment. Since the maintenance requires a shutdown, it will be done in off-peak hours in co-ordination with the node in-charge and Data Networks.

DNW is requested to co-ordinate with the node in-charge to set up SYSLOG (if not already done) before the CG engineer's visit. The log level on the RAS and routers should be set to the highest level so that all errors are recorded. Similarly, DNW should configure the SYSLOG configuration file on the servers so that proper logs are available at the time of the engineer's visit to help identify faults.

The engineer carrying out the preventive maintenance will be equipped with the necessary spares to bring the system up in case of a critical equipment failure.

The networking equipment at all locations will be shut down between 4:00 AM and 6:00 AM, and Internet services will be affected during this shutdown.

Each server at all Type A locations will be shut down for four hours for P.M. and stress tests (4 hours per server).
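
The timings stated above lend themselves to a quick planning check. The following is a minimal sketch in Python, using only the 4:00 AM to 6:00 AM network window and the four-hours-per-server figure given in this document. The function names are hypothetical and are not part of any CG-Informatics or DNW tool.

```python
from datetime import time

# Values taken from the procedure text above.
NETWORK_WINDOW_START = time(4, 0)   # 4:00 AM
NETWORK_WINDOW_END = time(6, 0)     # 6:00 AM
HOURS_PER_SERVER = 4                # P.M. plus stress tests, per server


def in_network_window(t: time) -> bool:
    """Return True if t falls inside the network shutdown window (end exclusive)."""
    return NETWORK_WINDOW_START <= t < NETWORK_WINDOW_END


def total_server_downtime_hours(server_count: int) -> int:
    """Total server-hours of downtime for the given number of servers."""
    if server_count < 0:
        raise ValueError("server_count must not be negative")
    return server_count * HOURS_PER_SERVER
```

For example, a node with three servers would accumulate 12 server-hours of downtime; whether the servers are taken down one after another or in parallel is a scheduling decision for the node in-charge and DNW.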

Activities

1) Removal of dust from the servers, networking equipment and personal computers. Dust will be blown out of the networking equipment with a vacuum cleaner through the ventilation openings of the equipment, whereas servers and personal computers will be cleaned after removing their covers. This is as per the equipment manufacturer's recommendation on maintenance. All console/auxiliary ports of routers, RAS, servers etc. will be checked for functionality, and it will be verified that the console ports of the servers are connected to the respective PCs.

Broken or missing covers of servers, routers, PCs etc., loose connector strips at the back of PCs, broken connectors and similar defects will be checked and recorded in the observation column of the P.M. report so that everything that needs to be replaced can be replaced at the earliest.

2)  The AC input to the servers and personal computers, the DC input to the networking equipment and the grounding of the equipment will be tested for compliance with the standards (as per the values mentioned in the P.M. report against each item of equipment).
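
The compliance check in step 2 amounts to comparing each measured value against the nominal value and permitted deviation listed in the P.M. report. A minimal Python sketch of that comparison follows. The P.M. report values are not reproduced in this document, so the caller supplies them; the function names and the percentage-tolerance form are assumptions made for illustration.

```python
def within_tolerance(measured: float, nominal: float, tolerance_pct: float) -> bool:
    """Return True if measured lies within +/- tolerance_pct percent of nominal."""
    if nominal == 0:
        raise ValueError("nominal must be non-zero")
    deviation_pct = abs(measured - nominal) / abs(nominal) * 100.0
    return deviation_pct <= tolerance_pct


def check_readings(readings, limits):
    """Return the labels of readings that fail the check.

    readings maps a label to the measured value.
    limits maps the same label to (nominal, tolerance_pct), as would be
    copied from the P.M. report. A reading with no limit is also reported.
    """
    failures = []
    for label, measured in readings.items():
        if label not in limits:
            failures.append(label)
            continue
        nominal, tolerance_pct = limits[label]
        if not within_tolerance(measured, nominal, tolerance_pct):
            failures.append(label)
    return failures
```

Any label returned by check_readings would go into the observation column of the P.M. report.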

3)  Proper airflow to the equipment and the temperature at the node (the ideal temperature is mentioned in the P.M. report) will be tested and recorded. The fan unit in the racks will be checked for functionality and will be replaced if it is not functional.

4)  The logs of routers and access servers will be checked to ensure that the E1 and E3 links are error free. Any notification found in the logs will be reported to the node in-charge and DNW, and corrective actions will be recommended. In addition to the recommendation on corrective actions, any implementation required as part of maintenance support will be provided. The CPU utilization and memory utilization of routers and access servers will be tested and recorded. The show environment command will be used to record the environment details reported by the router and RAS. The RAS and routers will be checked for any misconfigurations. Flapping of links will be checked through the log/SYSLOG, and its effect on the performance of the router will be observed.
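
The check for link flapping in step 4 can be partly automated once the router logs are collected on the SYSLOG server. The Python sketch below counts "changed state to down" events per interface in Cisco-style %LINK-3-UPDOWN and %LINEPROTO-5-UPDOWN messages. The function names and the threshold of three events are assumptions chosen for illustration; note that a single physical flap may log both a LINK and a LINEPROTO message, so the counts are an indication rather than an exact flap count.

```python
import re
from collections import Counter

# Matches Cisco-style state-change messages such as:
#   %LINK-3-UPDOWN: Interface Serial0/0, changed state to down
#   %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up
UPDOWN_RE = re.compile(
    r"%(?:LINK|LINEPROTO)-\d-UPDOWN: .*?Interface (\S+), changed state to (up|down)"
)


def count_down_events(log_lines):
    """Count 'changed state to down' messages per interface."""
    downs = Counter()
    for line in log_lines:
        match = UPDOWN_RE.search(line)
        if match and match.group(2) == "down":
            downs[match.group(1)] += 1
    return downs


def flapping_interfaces(log_lines, threshold=3):
    """Return, sorted by name, interfaces with at least `threshold` down messages."""
    return sorted(
        name
        for name, count in count_down_events(log_lines).items()
        if count >= threshold
    )
```

Interfaces returned by flapping_interfaces would then be examined by hand and reported to the node in-charge and DNW as described above.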

5)  The health of the RSP, VIP cards, port adapters, modem cards etc. will be checked. The version of the IOS loaded in the master flash and slave flash will be checked to ensure that it is in line with the recommendation given by Data Networks. On the 7500 router, the RSP in slot 2 will be removed and the functionality of the RSP in slot 3 will be checked; the RSPs will be swapped after this verification. Upgrades, if necessary, will be done in consultation with Data Networks.

Note: This process will be done only once in 2 years; it is not to be followed for every P.M.

6)  The functionality of the TFTP server installed in the node will be tested, and it will be checked that the server holds valid IOS images and that the configuration of all networking equipment is stored on it.
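
Part of step 6 is a completeness check: every networking device should have its configuration on the TFTP server, and at least one IOS image should be present. A minimal Python sketch of that check follows. It assumes configurations are stored as "<hostname>-confg" and that IOS images end in ".bin"; both naming conventions are assumptions and should be adjusted to the node's actual practice. The sketch checks presence only and does not verify that an image is valid.

```python
def missing_tftp_files(hostnames, tftp_filenames, image_suffix=".bin"):
    """Report what a TFTP directory listing lacks.

    hostnames: device names whose configurations should be stored.
    tftp_filenames: file names found in the TFTP root directory.
    Returns (missing_configs, has_image).
    """
    present = set(tftp_filenames)
    # Assumption: each configuration is stored as "<hostname>-confg".
    missing_configs = sorted(
        host for host in hostnames if f"{host}-confg" not in present
    )
    # Assumption: IOS image files end in ".bin".
    has_image = any(name.endswith(image_suffix) for name in present)
    return missing_configs, has_image
```

Devices listed in missing_configs would have their configuration copied to the TFTP server during the visit.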

7)  Check the speed and duplex settings on the switch. The ports connected to the router, RAS and servers will be set to full duplex.

8)  Check the access lists configured on the networking equipment to ensure the security policy followed by BSNL is in place to restrict intruders.

9)  Check the SYSLOG on the server and verify that the operating system is working properly. Verify that the necessary OS patches are loaded on the system. The status of the RAID box and the health of all the HDDs will be tested. Check all the tape drives. Ensure that the heartbeat circuit is functional and that clustering is properly configured. If failover is configured, its functionality will be tested. Ensure that all unwanted daemons and applications are stopped. All unwanted files, and unwanted lines in the files, contained in the crontab directory will be removed. The sar command will be executed to check the status of CPU, cache, memory, swap, HDD and file system. IRIX tuning, if required, will be done in consultation with Data Networks.
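
The crontab review in step 9 is easier when the active entries are separated from comments and blank lines. The Python sketch below lists the active lines of a crontab file and flags those that match none of an approved list. The function names are hypothetical, and the approved list must come from the node in-charge and DNW; the sketch only reports candidates and removes nothing.

```python
def active_cron_entries(crontab_text):
    """Return the non-blank, non-comment lines of a crontab file."""
    entries = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def unapproved_entries(crontab_text, approved_commands):
    """Return active entries that contain none of the approved command strings."""
    return [
        entry
        for entry in active_cron_entries(crontab_text)
        if not any(cmd in entry for cmd in approved_commands)
    ]
```

Entries returned by unapproved_entries are only candidates for removal; each should be confirmed as unwanted before the crontab file is edited.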

10)  Run the Field Interface Stress test to ensure the sanity of both the hardware and the software installed on the system.

Test procedure: fst is an IRIX-based tool designed to provide a mechanism to stress systems for preventive maintenance and to ensure the "sanity" of the software and hardware installed on the system. fst 'packs' two kinds of stress tests, Pandora and Application Stress. [The Server Verification Program (SVP) will be run to check the sanity of the SGI hardware/OS.]

a) Application Stress invokes various IRIX commands, demos, applications, etc. simultaneously in increasing stress for CPU/memory/graphics/etc.

b) Pandora stresses the system at system level for memory, CPU, I/O, network and graphics processes. All the tests are included in the test case files.

The detailed results of the test, some SYSLOG information, core files, etc. are saved under /usr/diag/stress/RESULTS.
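
After a stress run, the results directory can be summarized for the P.M. report. The Python sketch below walks a directory such as /usr/diag/stress/RESULTS, lists the files in it and picks out likely core files. The layout of that directory is not described in this document, so the sketch assumes only that core dumps are named "core" or "core.<suffix>"; the function name is hypothetical.

```python
from pathlib import Path


def summarize_results(results_dir):
    """List files under results_dir and pick out likely core files.

    Returns (all_files, core_files) as sorted lists of paths relative
    to results_dir.
    """
    root = Path(results_dir)
    all_files = sorted(
        str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()
    )
    # Assumption: core dumps are named "core" or "core.<suffix>".
    core_files = [
        f
        for f in all_files
        if Path(f).name == "core" or Path(f).name.startswith("core.")
    ]
    return all_files, core_files
```

A non-empty core_files list indicates that something crashed under stress and should be noted in the observation column of the P.M. report.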

11)  Check that the IRIX patches (security and bug fixes) are up to date [Where is the list of IRIX patches against which availability would be checked?]. Check that all the required applications are loaded on the servers.

12)  Check the functionality of the applications loaded on the servers and their error logs. Ensure that each application loaded is the latest version supplied to BSNL. If an application is not fully operational, its installation status will be recorded in the log book maintained at the node, and the necessary guidance and assistance for making the application functional will be given to the node in-charge. The performance of the applications will be studied, and if there is any need for application performance tuning or a hardware upgrade, it will be done in consultation with DNW.

13)  Check for viruses on the personal computers installed as console and helpdesk machines. Check that anti-virus software is loaded and configured on the PCs. Remove any application installed on a PC that is not necessary for the application the PC is intended to run. Check that helpdesk PCs are not used for browsing, email and chat.

14)  A site-wise format of the P.M. report will be prepared, and the signature of the node in-charge, with his remarks on the satisfactory maintenance performed by CG, will be obtained in addition to that of the CG engineer. The CG engineer shall meet the DGM in-charge along with the node in-charge and apprise him of the maintenance work before the start of the job and after its completion. If there are maintenance issues that cannot be resolved at the site, they will be reported under actions to be done and escalated to the concerned agencies along with a time frame for their resolution.

15)  The node in-charge will give feedback on the performance of the equipment, applications etc., along with suggestions for improvement in applications and equipment features.

16)  The log book maintained at the node will be checked to see the status of previous problems in hardware, software etc.