Latest Update:
Services are operational and have been stable since 8th August at 09:00 BST / 16:00 SGT
ISSUE ANALYSIS
At 23:09 BST on Monday 7th August, our read database servers alerted us that they were experiencing some replication lag; this was picked up by our Development team at 02:45 BST (8 Aug). Access to the user interface would not have been impacted at this time.
We identified that an unusually high load on the servers that host our Network API had impacted our read database servers.
At this stage we took the decision to stop our background processing services to allow our read database servers to recover. This would have meant that Publish Requests, Refinements and Player Configuration writes, among others, were not running for a period of time.
By 04:00 BST (8 Aug) the reduced load resulting from the above action had allowed our read database servers to recover, at which point the background services were systematically brought back online.
As of 06:00 BST (8 Aug) we were happy that the load had been handled and the platform was performing stably again, with all processes up and running. We identified another short period of time where load increased between 08:48 and 08:54 BST but this remained under control going forward.
As of 09:00 BST (8 Aug) Signagelive has been stable. Further analysis will take place over the coming days, and preventative measures will be put in place to ensure that the impact on our core services is mitigated should such events recur.
STATUS UPDATES
8 Aug @ 13:00 BST / 20:00 SGT - Services are operational and have been since 06:00 AM BST. Our Development Team are continuing with a thorough investigation and analysis; more detailed updates will follow in due course.
8 Aug @ 09:15 AM BST / 16:15 SGT - Issue discussed on Signagelive 2nd Line Support call; updates provided and handed to UK Support Team for future communication.
8 Aug @ 13:05 SGT / 15:05 AEST - The publication queues are all caught up now so users that experienced issues should start to see things return to normal. There are still some other backend processes that are outstanding surrounding player notifications, for example, players reporting they have their latest content. So in some cases player content status updates may be delayed.
8 Aug @ 12:30 SGT / 14:30 AEST - Dev continues to work on bringing background processing back online, but it's a process that needs to be done carefully. Most of the services are running again but we are still looking to get the main publication services caught up. For now, core services remain offline.
8 Aug @ 10:50 SGT / 12:50 AEST - We have stopped ALL of our background processing to allow the communication lag we are observing across our servers to catch up. This means any updates made and publications (new or refinements) will not be processing right now. More to follow ASAP.
8 Aug @ 10:15 SGT / 12:15 AEST - We are aware of an ongoing issue with our servers that is impacting the general performance of Signagelive. This issue has already been escalated as a P1 and is being looked into as a priority. Further updates will be posted to this article once available.