Nuera GX-8K

Nuera Communications Inc

Nuera Communications Inc. provides Voice over IP (VoIP) media gateway devices to telecommunications companies.
Its products include NEBS-compliant telecommunication devices for telephone exchanges, which act as gateways between the traditional TDM telephone network and the newer IP network.

Nuera - Failsafe Boot

Enabled Nuera to claim Failsafe and High Availability features for its Media Gateway product. Designed and implemented the Failsafe Boot feature to allow easy maintenance of the system software on each card. Implemented a scheme in which at least one working copy of the system software was always kept on the card. This reduced the number of products returned to the factory and earned the product a reputation for being reliable and failsafe.
Mindmap of Failsafe Boot for Nuera

Challenge

  • When new features were added to Nuera's system, a new code image was sent to customers. These customers ran live systems serving live subscribers, so Nuera had to ensure that those systems were never taken out of service.
  • In the system design, the customer downloaded the code image sent by Nuera and then rebooted one of the redundant pair of controller cards into the new image while the standby card took over the system functions. The new code was then automatically copied onto the other card, so that both cards ran the new software.
  • When a new code image is downloaded to a live system, the control module is very busy. There is therefore a distinct possibility that the download fails or that a corrupt image is written.
  • As a result, every new code image Nuera sent put its customers at risk of bringing their systems down. Any card left with a corrupt image had to be sent back to the factory for rectification.

Action Taken

  • A scheme was designed in which two code images are maintained on the card at all times. At least one of them is an old, tried-and-tested image, while the other may be a new image.
  • When a code download starts, the system immediately marks the image being overwritten as "Code Bad". If the download is aborted, the next download can therefore overwrite this image without further checks. When the download completes, the system marks the image's status as "New untried code".
  • When the system reboots, the boot process checks the customer's preferred image and its status. If the preferred image is new, the boot process immediately writes a "Boot Attempt n" status for this image into flash and then proceeds to boot it. Here n is the number of boot attempts made on this image, including the current one. If the system boots successfully, it writes a "Code OK" status into flash. If the card reboots for any reason before reaching that point, the boot process updates the status to "Boot Attempt 2" on restart, and so on. A status of "Boot Attempt 3" means that three attempts to boot this image have failed. In that case the boot process marks the image as "Code Bad" and boots the other (backup) image instead. The customer is also shown a warning message that this has happened.
  • This task was difficult because, as in most systems, the boot code that decides which image to boot is itself very small. All of the above logic had to fit into this limited code space. The code was also difficult to debug, because the system is still coming up at that point and no debug facilities are available. In-circuit emulators, which are normally used to debug such code, could not be used either, because they bypass the boot code and boot the image from the PC's memory.
  • This failsafe boot system was designed, coded and tested. It proved to be a real lifesaver at customer sites on several occasions.
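The download and boot-time status handling described above can be sketched in C as follows. This is a minimal sketch, not Nuera's boot code. Several details are hypothetical:

- the status encoding;
- the two-slot `flash` array, simulated in RAM here;
- every function name (`download_begin`, `download_complete`, `select_boot_image`, `boot_succeeded`, `flash_write_status`).

The state names and the three-attempt limit come from the description. Two behaviours are assumptions: falling back when the preferred image is already "Code Bad", and booting the backup only if it is "Code OK".

```c
#include <assert.h>
#include <stdio.h>

/* Image states named in the description. The enum values and the separate
 * attempt counter are a hypothetical encoding, not Nuera's on-flash format. */
typedef enum {
    CODE_BAD,          /* "Code Bad": may be overwritten, never booted */
    CODE_NEW_UNTRIED,  /* "New untried code": download finished */
    CODE_BOOT_ATTEMPT, /* "Boot Attempt n": n is kept in attempts */
    CODE_OK            /* "Code OK": has booted successfully */
} image_status_t;

typedef struct {
    image_status_t status;
    int attempts; /* n in "Boot Attempt n", counting the current attempt */
} image_record_t;

#define NUM_IMAGES 2
#define MAX_BOOT_ATTEMPTS 3

/* Hypothetical stand-in for the per-image status area in flash. */
image_record_t flash[NUM_IMAGES];

void flash_write_status(int slot, image_status_t status, int attempts) {
    assert(slot >= 0 && slot < NUM_IMAGES);
    flash[slot].status = status;
    flash[slot].attempts = attempts;
}

/* Mark the target "Code Bad" before any code is written, so an aborted
 * download leaves a slot the next download may simply overwrite. */
void download_begin(int slot) { flash_write_status(slot, CODE_BAD, 0); }

void download_complete(int slot) { flash_write_status(slot, CODE_NEW_UNTRIED, 0); }

/* Called by the running image once the system has fully come up. */
void boot_succeeded(int slot) { flash_write_status(slot, CODE_OK, 0); }

/* Boot-time decision. The new status is written to flash *before* the image
 * is started, so a crash or reset mid-boot is counted on the next pass.
 * Returns the slot to boot, or -1 if no usable image remains. */
int select_boot_image(int preferred) {
    assert(preferred == 0 || preferred == 1);
    const image_record_t rec = flash[preferred];
    int backup = 1 - preferred;

    switch (rec.status) {
    case CODE_OK:
        return preferred;
    case CODE_NEW_UNTRIED:
        flash_write_status(preferred, CODE_BOOT_ATTEMPT, 1);
        return preferred;
    case CODE_BOOT_ATTEMPT:
        if (rec.attempts < MAX_BOOT_ATTEMPTS) {
            flash_write_status(preferred, CODE_BOOT_ATTEMPT, rec.attempts + 1);
            return preferred;
        }
        /* "Boot Attempt 3" found: three boots failed, so retire the image. */
        flash_write_status(preferred, CODE_BAD, 0);
        break;
    case CODE_BAD:
        break; /* assumption: a bad preferred image also falls back */
    }

    /* Warn the customer that the backup image is being used. */
    fprintf(stderr, "WARNING: image %d unusable, booting backup image %d\n",
            preferred, backup);
    /* Assumption: only a proven ("Code OK") backup image is booted. */
    return flash[backup].status == CODE_OK ? backup : -1;
}
```

Suppose slot 1 receives a new image through `download_begin(1)` and `download_complete(1)`, but the image never reaches `boot_succeeded(1)`. The first three calls to `select_boot_image(1)` return slot 1 with the attempt counter at 1, 2 and 3. The fourth call marks slot 1 "Code Bad", prints the warning and returns slot 0, provided slot 0 holds a "Code OK" image. The key design choice is writing the status before jumping to the image. The boot code then needs no record of why the previous boot failed, only that it never reached "Code OK".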

Result

One of Nuera's major unique selling points (USPs) was the reliability and ruggedness of its system. The failsafe boot scheme made this USP demonstrable to customers. When customers compared our system with those of other suppliers, they found that Nuera's system was indeed more robust and reliable, even as new features were added without affecting service.