Service Issues - 27/09/2021 - [Complete]
Written by Ian Maison
Updated over a week ago

SUMMARY

Latest Update -

/ RESOLVED /

17:55 BST -

As scheduled, we started the update of the Signagelive cloud to the latest version at 09:00 AM BST.

This update was scheduled to take up to one hour, between 09:00 AM BST and 10:00 AM BST. However, it took longer than expected because the UI initially did not load correctly.

As part of our upgrade process, we perform some high-level testing before we re-enable public access to our services. After updating the cloud to the latest release, the UI was not working as expected, so public access could not be re-enabled.

During this type of incident we would typically review the log files and errors being generated. Like most cloud apps, we rely upon some third-party services to do this.

Unfortunately, at this time one of those services (SolarWinds Loggly) was also experiencing an outage. In their case it was an indexing delay, meaning that new errors sent to their system were not being processed and made available for us to review as part of our investigation.

This meant we had to diagnose the update issues manually, which unfortunately took longer than we had anticipated. However, most users' access to our UI was restored by 10:30 AM BST.

Once UI access was restored, we were able to identify that Players were not connecting to the platform to activate, perform content checks or receive content updates.

Again, as we did not have access to our logging information, we had to diagnose this issue manually. Using one of our other monitoring tools (New Relic), we could see that we had a performance issue with the API, but we had no direct visibility of the root cause. Unfortunately, during the diagnosis of this issue New Relic started to experience a UI issue, as reported on their public status page, which slowed us down further. Through further manual testing of the API, we identified an issue with a network appliance at Rackspace that was not routing traffic to the API servers that handle player traffic. We rectified this by switching to a different load balancing algorithm, at which point activation, content checking and content updates started working.

We now have full logging and diagnostics back and are looking into one remaining issue:

  • Players reporting health checks but not content checks

Once we have restored all services to normal, we will run through each issue we encountered today and determine how we can avoid them in the future.

Once again we sincerely apologise for any inconvenience caused.

PAST UPDATES

15:45 BST - We continue to see improvement in the situation across the various issues that have been escalated today. We are continuing to monitor and work on the situation, and will post a thorough overview once the issues are completely resolved.

Here is a list of the items we are aware of, or have resolved, thus far:

  • SMIL - Intermittent issues with the SMIL API not returning correctly. [RESOLVED]

  • Message Manager - If you have Message Manager selected, it will automatically log you out.

  • Licences - Licence generation is not working.

  • Thumbnails - Asset thumbnails not showing for media.

Further updates to follow. Thank you for your patience.

13:02 BST - We are seeing improvement in the situation and players are starting to connect again. We have implemented a modification to alleviate some of the issues whilst we continue to track this issue down.

Further updates to follow. Thank you for your patience.

12:47 PM BST - We are continuing to investigate issues surrounding UI and Player Connectivity performance.

Further updates to follow. Thank you for your patience.

11:50 AM BST - Following today's planned maintenance and the release of Granular User Permissions, we are aware of performance issues with the User Interface (slow experience and issues logging in) and of Player connectivity issues on certain networks.

Our team are continuing to investigate as a priority and we will post further updates as and when we have them. Thank you for your patience.

RESOLUTION / CAUSE

ONGOING - Please see the 17:55 BST update, which reflects the current status.

FOR MORE INFORMATION

For current system status information about Signagelive, please check our system status page. During an issue, you can also receive status updates by following Beamer Posts within the user interface. The summary of our investigation will be posted here when the issues have ended. If you have any concerns or questions, wish to report further issues or are looking for updates, please don't hesitate to contact our Support Team.
