TrackAbout System Status - Incident History (author: TrackAbout System; updated 2024-03-28)
Production Environment Patch Deployment for the AWS0345 release (2024-03-25)<p><small>Mar <var data-var='date'>25</var>, <var data-var='time'>13:42</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Mar <var data-var='date'>25</var>, <var data-var='time'>13:30</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Mar <var data-var='date'>25</var>, <var data-var='time'>13:23</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, March 25th, 2024 starting at 1:30 PM US Eastern Time (Monday, March 25th, 2024 17:30 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.345.27</p>
Production Environment Deployment for the AWS0345 release (2024-02-29)<p><small>Feb <var data-var='date'>29</var>, <var data-var='time'>12:49</var> EST</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Feb <var data-var='date'>29</var>, <var data-var='time'>12:15</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Feb <var data-var='date'>29</var>, <var data-var='time'>12:06</var> EST</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Thursday, February 29th, 2024 starting at 12:15 PM US Eastern Time (Thursday, February 29th, 2024 17:15 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.345.16<br />New TAMobile 6 Version: 6.0.345.1<br />New TAMobile 7 iOS Version: 7.345.1.2602<br />New TAMobile 7 Android Version: 7.345.1.2602</p>
Performance issues (2024-02-01)<p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>03:05</var> EST</small><br><strong>Resolved</strong> - The root cause of the performance issue was traced to ... well, let's just say it's not a good idea to attach 24,747,737 Delivered Not-Scanned assets to a single record and then attempt to view the Record Detail page, where each NS asset is displayed individually.<br /><br />We're going to look into enforcing a cap on the Not-Scanned quantity that can be attached to a single record, and we'll see if we can add short-circuit protection to the Record Detail/Summary web page in case a large quantity of Not-Scanned assets does get attached to a record.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>02:17</var> EST</small><br><strong>Monitoring</strong> - The system has stabilized. We are monitoring.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>02:08</var> EST</small><br><strong>Investigating</strong> - We are currently investigating a performance issue with the site. There appears to be a customer workload that is causing extreme CPU consumption and server RAM growth.
We're scaling the environment to accommodate the load while we trace the root cause.</p>
Production Environment Deployment for the AWS0344 release (2024-01-30)<p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:20</var> EST</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>12:00</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>30</var>, <var data-var='time'>10:42</var> EST</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Tuesday, January 30th, 2024 starting at 12:00 PM US Eastern Time (Tuesday, January 30th, 2024 17:00 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.344.21<br />New TAMobile 7 iOS Version: 7.344.0.2533<br />New TAMobile 7 Android Version: 7.344.0.2533</p>
Production Environment Deployment for the AWS0343 release (2023-12-18)<p><small>Dec <var data-var='date'>18</var>, <var data-var='time'>13:54</var> EST</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Dec <var data-var='date'>18</var>, <var data-var='time'>13:15</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Dec <var data-var='date'>18</var>, <var data-var='time'>13:11</var> EST</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, December 18th, 2023 starting at 1:15 PM US Eastern Time (Monday, December 18th, 2023 18:15 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.343.39</p>
Production Environment Deployment for the AWS0343 release (2023-12-06)<p><small>Dec <var data-var='date'> 6</var>, <var data-var='time'>12:46</var> EST</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Dec <var data-var='date'> 6</var>, <var data-var='time'>12:30</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Dec <var data-var='date'> 6</var>, <var data-var='time'>12:14</var> EST</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Wednesday, December 6th, 2023 starting at 12:30 PM US Eastern Time (Wednesday, December 6th, 2023 17:30 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.343.34<br />New TAMobile 7 iOS Version: 7.343.1.2461<br />New TAMobile 7 Android Version: 7.343.1.2461</p>
Production Environment Deployment for the AWS0342 release (2023-11-16)<p><small>Nov <var data-var='date'>16</var>, <var data-var='time'>13:30</var> EST</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Nov <var data-var='date'>16</var>, <var data-var='time'>13:01</var> EST</small><br><strong>In progress</strong> - Scheduled maintenance is currently in
progress. We will provide updates as necessary.</p><p><small>Nov <var data-var='date'>16</var>, <var data-var='time'>12:49</var> EST</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Thursday, November 16th, 2023 starting at 1:00 PM US Eastern Time (Thursday, November 16th, 2023 18:00 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.342.21<br />New TAMobile 7 iOS Version: 7.342.2.2392<br />New TAMobile 7 Android Version: 7.342.2.2392</p>
Continued performance issues (2023-11-15)<p><small>Nov <var data-var='date'>15</var>, <var data-var='time'>09:50</var> EST</small><br><strong>Resolved</strong> - We had a clean, well-performing 24 hours. We're going to close this incident while we continue to work with Azure Engineering on the root cause analysis. We'll share our findings on our status page in a post-mortem covering this and the previous incidents.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>18:08</var> EST</small><br><strong>Update</strong> - Today, we've been working on performance tuning and root cause analysis.<br /><br />In recent weeks we have been experiencing anomalous behavior in the Azure environment, and we've engaged with Azure Engineering to get to the bottom of it. We had a good call today with a skilled engineer, but he needs to engage other members of his team to answer some of our questions.<br /><br />We have isolated the busiest database in our environment, which was partly to blame for the outage. For now, that database runs on a dedicated, standalone server.<br /><br />We have also increased our Azure SQL elastic pool service level objective by two full tiers to increase our performance headroom.<br /><br />We are reducing the MAXDOP, or maximum degree of parallelism, across the database fleet.
We hope this will reduce the number of threads/workers used per database. Running out of threads/workers was one symptom of our issues last night.<br /><br />We've found some queries whose performance has gone awry, which can contribute to a cascade failure. We're working on those at present.<br /><br />We are continuing to focus on performance engineering.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>07:18</var> EST</small><br><strong>Update</strong> - Performance looks much better at the moment. We don't have any more changes to make at this time. We're going to monitor the situation.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>06:40</var> EST</small><br><strong>Monitoring</strong> - This is a continuation of the issue from earlier. We closed the incident too soon and could not append to it.<br /><br />We've been getting reports from the field that users are unable to sync handhelds. This appears to be due to continued high database load.<br /><br />Our Azure SQL Database elastic pool's transition to a higher performance tier took 2.5 hours, 1.5 hours longer than the longest transition we had seen before.<br /><br />That transition has just completed, and we're monitoring performance.</p>
Service outage (2023-11-14)<p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>05:10</var> EST</small><br><strong>Resolved</strong> - The service has remained stable since the last update, and more capacity will be coming online shortly.<br /><br />We are meeting with Microsoft Azure engineers tomorrow to discuss the change in performance behavior we have been seeing lately with Azure SQL Database elastic pools. Over the last few weeks we have seen some very anomalous behavior, and we need to get to the bottom of it. We've had several back-and-forth exchanges over email, and we're not getting the answers we need.
We've escalated to a call tomorrow, and we'll continue to escalate until we get a satisfactory answer.<br /><br />We regret having to take a full outage tonight, and we're sure we have some frustrated customers at the moment. We do apologize for the recent performance incidents, and we're working hard to do better.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>04:38</var> EST</small><br><strong>Monitoring</strong> - The service appears stable, although there is some elevated load from a couple of big customers. We are bringing additional capacity online now, which should help.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>04:09</var> EST</small><br><strong>Update</strong> - We are now bringing services back online.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>03:50</var> EST</small><br><strong>Update</strong> - We are going to have to take a full outage. The backlog of work is too large for the infrastructure to catch up. We'll report back as soon as possible.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>03:02</var> EST</small><br><strong>Identified</strong> - We've made a change to the database tier and are waiting for it to complete, which should relieve the congestion.</p><p><small>Nov <var data-var='date'>14</var>, <var data-var='time'>02:33</var> EST</small><br><strong>Investigating</strong> - We are currently investigating a service outage.</p>
Brief service interruption (2023-10-30)<p><small>Oct <var data-var='date'>30</var>, <var data-var='time'>16:44</var> EDT</small><br><strong>Resolved</strong> - The TrackAbout service experienced a brief service interruption between 3:56 and 4:20 PM Eastern US Time.
We've traced the failure to an external dependency, Auth0/Okta, which TrackAbout relies on for user authentication services.<br /><br />We believe the incident impacted only users attempting to log in or authenticate during this time period (including API calls that authenticate with every request rather than reusing a token).<br /><br />We're reaching out to Auth0 to see what happened.<br /><br />The issue resolved itself.</p>
Production Environment Patch Deployment for the AWS0341 release (2023-10-30)<p><small>Oct <var data-var='date'>30</var>, <var data-var='time'>12:59</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Oct <var data-var='date'>30</var>, <var data-var='time'>12:30</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Oct <var data-var='date'>30</var>, <var data-var='time'>11:55</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, October 30th, 2023 starting at 12:30 PM US Eastern Time (Monday, October 30th, 2023 16:30 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.341.50</p>
Production Environment Deployment for the AWS0341 release (2023-10-09)<p><small>Oct <var data-var='date'> 9</var>, <var data-var='time'>13:38</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Oct <var data-var='date'> 9</var>, <var data-var='time'>13:00</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Oct <var data-var='date'> 9</var>, <var data-var='time'>12:26</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, October 9th, 2023 starting at 1:00 PM US Eastern Time (Monday, October 9th, 2023 17:00 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.341.49<br />New TAMobile 6 Version: 6.0.341.2<br />New TAMobile 7 iOS Version: 7.341.2.2333<br />New TAMobile 7 Android Version: 7.341.2.2333</p>
Slowness detected (2023-09-26)<p><small>Sep <var data-var='date'>26</var>, <var data-var='time'>11:10</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Sep <var data-var='date'>26</var>, <var data-var='time'>10:45</var> EDT</small><br><strong>Monitoring</strong> - The change we made appears to have had the desired effect. Performance has returned to normal. We are continuing to monitor the situation.<br /><br />We have also opened a high-severity ticket with Azure support to investigate why this event occurred. We cannot find a cause from our vantage point.</p><p><small>Sep <var data-var='date'>26</var>, <var data-var='time'>10:20</var> EDT</small><br><strong>Update</strong> - We've detected a potentially unhealthy state in one of our Azure SQL Elastic Pools and are making a change to try to resolve it.
This operation can take 10-20 minutes to reach full effect.</p><p><small>Sep <var data-var='date'>26</var>, <var data-var='time'>10:13</var> EDT</small><br><strong>Investigating</strong> - We are currently investigating slowness of the database tier affecting all clients.</p>
Browser Login Issues on September 14, 2023 with error NS_ERROR_NET_INADEQUATE_SECURITY (2023-09-14, updated 2023-09-19)<p><small>Sep <var data-var='date'>14</var>, <var data-var='time'>16:00</var> EDT</small><br><strong>Resolved</strong> - On Thursday, September 14th, 2023 at approximately 4:12 PM, we received word from a single customer having login problems with the TrackAbout web site (but not the mobile apps). Their browser was reporting an error of type "NS_ERROR_NET_INADEQUATE_SECURITY" when attempting to log in.<br /><br />After investigating, we could find no cause for such an error. No TrackAbout infrastructure changes had been made that day. We were left to conclude that this must be a local issue with the customer's infrastructure, most likely an HTTP proxy issue.<br /><br />Several hours later, we received the same complaint from a second customer. With two unrelated customers having the same issue, we turned our attention to our new identity provider, Auth0 (Okta), and opened a support ticket.<br /><br />After some hours, Auth0 responded that they *had* made an infrastructure change without notification and that they had reverted the change. Their change was the cause of our customers' authentication issues.<br /><br />This morning we received the following post-mortem report from Auth0:<br /><br />----<br />"I want to start by apologizing for the impact this had on you and your clients, I can certainly understand why this would put you in a challenging position and I'm sorry that occurred.
Our Engineering team provided a summary of the event that I've copied below:<br /><br />On September 15, 2023, our Engineering team began to investigate a series of TLS cipher negotiation failures impacting our customers across multiple Private Cloud environments as well as our Public Cloud US region. Specifically, a subset of browsers began to experience errors when attempting to use our services through a custom domain.<br /><br />The root cause of this issue was traced back to certain clients negotiating a cipher with our edge provider, then being rejected due to the cipher being on a banned list.<br /><br />No further change is required by our customers and we do not expect this issue to recur following the rollback performed by our Engineering team. We sincerely apologize for any impact this had on you and your users.<br /><br />I can add that the change was made to mitigate a separate certificate related issue that our Engineering team had identified as being a potential problem in the future. This is all of the information that has been provided about this event so far and I hope it's helpful for you and your team. Again I'm sorry for the position this put you in with your clients and please let me know if you have any follow up questions after reviewing."<br />----<br /><br />As a preventative measure, we are adding change monitoring of the TLS/SSL certificates protecting our Auth0 authentication URLs. We need to know as quickly as possible if any change has been made to the configuration that Auth0 is managing. 
We are writing our own monitoring test as well as employing a popular TLS/SSL site quality checker from Qualys SSL Labs.<br /><br />Additionally, Auth0 is a relatively new third-party dependency for us, and a critical one at that, as it is the gateway through which all users authenticate.<br /><br />Going forward, if we receive word of users having login problems, we will immediately open a ticket with Auth0.<br /><br />Finally, I have let Auth0 know that making unannounced changes involving endpoint security has the potential to put their customers in an uncomfortable position, and I have strongly recommended that they publish notifications of changes.<br /><br />Larry Silverman<br />Chief Technology Officer<br />TrackAbout, Inc.</p>
Production Environment Deployment for the AWS0340 release (2023-08-28)<p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>14:29</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>14:00</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Aug <var data-var='date'>28</var>, <var data-var='time'>13:42</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, August 28th, 2023 starting at 2:00 PM US Eastern Time (Monday, August 28th, 2023 18:00 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.340.37<br />New TAMobile 7 iOS Version: 7.340.1.2197<br />New TAMobile 7 Android Version: 7.340.1.2209</p>
Production Environment Patch Deployment for the AWS0339 release (2023-08-14)<p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>12:43</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>12:30</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Aug <var data-var='date'>14</var>, <var data-var='time'>12:21</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, August 14th, 2023 starting at 12:30 PM US Eastern Time (Monday, August 14th, 2023 16:30 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.339.88</p>
Production Environment Patch Deployment for the AWS0339 release (2023-07-24)<p><small>Jul <var data-var='date'>24</var>, <var data-var='time'>14:18</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jul <var data-var='date'>24</var>, <var data-var='time'>14:00</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jul <var data-var='date'>24</var>, <var data-var='time'>13:13</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, July 24th, 2023 starting at 2:00 PM US Eastern Time (Monday, July 24th, 2023 18:00 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.339.87</p>
Some users cannot log in (2023-07-11)<p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>16:17</var> EDT</small><br><strong>Resolved</strong> - The change has now been rolled back.</p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>16:06</var> EDT</small><br><strong>Identified</strong> - About 30 minutes ago, we went live with a significant change to TrackAbout login.<br /><br />We're seeing that a small handful of users are having login issues, and we can see why.
We're going to roll back the change, investigate our logs, make some adjustments and try again in the near future.</p>
Production Environment Patch Deployment for the AWS0339 release (2023-07-11)<p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>13:41</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>13:15</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jul <var data-var='date'>11</var>, <var data-var='time'>13:02</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Tuesday, July 11th, 2023 starting at 1:15 PM US Eastern Time (Tuesday, July 11th, 2023 17:15 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.339.84</p>
Production Environment Deployment for the AWS0339 release (2023-07-10)<p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>13:15</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>11:15</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Jul <var data-var='date'>10</var>, <var data-var='time'>10:44</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, July 10th, 2023 starting at 11:15 AM US Eastern Time (Monday, July 10th, 2023 15:15 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.339.83<br />New TAMobile 7 iOS Version: 7.339.6.2064<br />New TAMobile 7 Android Version: 7.339.6.2064</p>
Production Environment Patch Deployment for the AWS0338 release (2023-05-18)<p><small>May <var data-var='date'>18</var>, <var data-var='time'>15:07</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>May <var data-var='date'>18</var>, <var data-var='time'>14:30</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>May <var data-var='date'>18</var>, <var data-var='time'>14:23</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Thursday, May 18th, 2023 starting at 2:30 PM US Eastern Time (Thursday, May 18th, 2023 18:30 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.338.54</p>
Production Environment Deployment for the AWS0338 release (2023-04-24)<p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>15:24</var> EDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>13:45</var> EDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress.
We will provide updates as necessary.</p><p><small>Apr <var data-var='date'>24</var>, <var data-var='time'>13:32</var> EDT</small><br><strong>Scheduled</strong> - TrackAbout will release new code to the Production environment on Monday, April 24th, 2023 starting at 1:45 PM US Eastern Time (Monday, April 24th, 2023 17:45 UTC).<br /><br />There will be no downtime or service interruption.<br /><br />New Application Web Site Version: 15.0.338.51<br />New TAMobile 7 iOS Version: 7.388.2.1912<br />New TAMobile 7 Android Version: 7.388.2.1912</p>
Test Environment - Failed Deploy (2023-03-21)<p><small>Mar <var data-var='date'>21</var>, <var data-var='time'>13:47</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved.<br /><br />We updated the deployment scripts that failed during the last deploy and redeployed.</p><p><small>Mar <var data-var='date'>21</var>, <var data-var='time'>12:25</var> EDT</small><br><strong>Investigating</strong> - The TrackAbout Customer-Facing Test environment (test.trackabout.com) is currently in a degraded state due to a failed code deploy. We're working to bring it back online ASAP.<br /><br />A new method we created for deploying database schema changes to all customer databases failed because the Test environment has more databases than our development and QA environments.<br /><br />We know where the problem lies, and we're working on a fix.<br /><br />We're aware that some of our customers use this environment for employee training.<br /><br />Please note that because this is a Test environment, the test.trackabout.com site does not have the same Service Level Agreement (SLA) as our Production environment.
Regardless, we're working to get it operational as quickly as possible.</p>
Slow Performance (2023-03-14)<p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>11:02</var> EDT</small><br><strong>Resolved</strong> - This incident has been resolved. We'll be publishing a post-mortem soon with our findings and the improvements we'll be making going forward.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>05:22</var> EDT</small><br><strong>Update</strong> - The system appears to have stabilized. We are monitoring.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>05:04</var> EDT</small><br><strong>Update</strong> - We're continuing to see some slow performance here and there. We currently think it's due to the backlog of work from earlier. We're monitoring the situation and keeping an eye on individual high-load databases.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>04:04</var> EDT</small><br><strong>Monitoring</strong> - We believe we have identified the root cause of the sudden spike in worker usage from 3% to 100% that caused the outage.<br /><br />We have a separate team working on building a data warehouse/BI offering. They had scheduled a number of heavy data migrations for 1 AM Eastern Daylight Time, the exact time we started experiencing problems.<br /><br />We found the data warehouse job and killed it. The system began recovering almost immediately.<br /><br />We expect operations to return to normal shortly.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>03:41</var> EDT</small><br><strong>Update</strong> - The Azure SQL Elastic Pool transition is 86% complete.<br /><br />We've determined that at about 1:00 AM Eastern Daylight Time, our SQL pool worker percentage (the percentage of the SQL environment's available worker processes that are in use) went from 3% right up to 100% and stayed there.
We're trying to find the cause of this very sudden spike in load.<br /><br />When the SQL engine's workers are exhausted, it cannot process further work until the load subsides.<br /><br />Increasing the size of our Azure SQL Elastic Pool, as we have done, raises the maximum number of workers.<br /><br />Finding the cause of this spike is our highest priority.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>03:09</var> EDT</small><br><strong>Update</strong> - While the Azure SQL Elastic Pool is migrating, we are seeing numerous database connection timeouts, which indicate that some of our customers are unable to use the application. The capacity migration is 45% complete at this time. It started at 2:31 AM EDT. We hope to have operations restored in less than an hour.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>02:48</var> EDT</small><br><strong>Identified</strong> - We've identified that the Azure SQL Elastic Pool is again suffering poor performance under load. The pool is currently transitioning to a higher capacity tier. This may result in periods of inaccessibility while the transition takes place.</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>02:37</var> EDT</small><br><strong>Investigating</strong> - We are investigating a report of site slowness and are boosting the capacity of our Azure SQL instance to compensate.</p>
Recent Performance Issues (2023-03-07)<p><small>Mar <var data-var='date'> 7</var>, <var data-var='time'>03:00</var> EST</small><br><strong>Resolved</strong> - Twice over the past two days, at around 3:00 AM Eastern US Time, we experienced a significant increase in load from a small number of large customers.
Specifically, we saw very high CPU consumption within our Azure SQL database environment.<br /><br />This resulted in application slowness, inaccessibility at times, and understandable customer frustration.<br /><br />Both times this occurred, we increased capacity, but a capacity increase in an Azure SQL Database Elastic Pool can take over an hour to complete.<br /><br />When the performance problems began, our database systems appear to have hit a critical breaking point, resulting in a cascade failure. With more demand, database queries were returning more slowly, and users were getting impatient.<br /><br />Impatient users retry. We saw one of our users request the same Load Truck page URL over 100 times in a rather short time span.<br /><br />Every page request represents a workload that must be queued and eventually processed, regardless of whether the user waited for the response. The work queued up, the system slowed down further, and it could not recover.<br /><br />We can't point to any one thing that is "broken". We are simply getting more load from our customers than we're used to.<br /><br />It is a constant challenge to maintain an optimum level of capacity that is both cost-effective and yields acceptable performance.
In a large, dynamic system with an uneven workload, you sometimes, unfortunately, learn where that line is by crossing it.<br /><br />The options at this time are (1) throw money at the problem by increasing capacity, and (2) tune databases and database queries to perform better, which is a slow, methodical exercise.<br /><br />Since the problem cannot wait, we have already increased our database capacity to a level that should handle the load.<br /><br />We also opened a Severity 1 ticket with Azure Engineering, and they came back with a few worthwhile suggestions to explore: database statistics updates, reindexing, and modifying the MAXDOP (maximum degree of parallelism) setting in the databases. We are working through all their suggestions.<br /><br />More to come, if needed. Let's see how the next 24 hours go.</p>
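The retry-driven cascade described in this report can be illustrated with a toy queue model. The Python sketch below is hypothetical and is not TrackAbout's code or workload. It assumes requests are served first-in, first-out at a fixed `capacity_per_tick`, that a user who has waited `patience_ticks` re-requests the same page (up to `max_retries` times), and that abandoned requests are never cancelled, which mirrors the point that every request is processed whether or not the user waited. The function name `simulate_backlog` and all numbers are illustrative assumptions.

```python
from collections import deque


def simulate_backlog(arrivals, capacity_per_tick, patience_ticks, max_retries):
    """Return the queue length left at the end of each tick.

    Hypothetical toy model (not TrackAbout's real workload):
      * arrivals[t] new users each issue one page request at tick t;
      * the server completes at most capacity_per_tick requests per tick, FIFO;
      * a user still waiting patience_ticks after their latest request
        re-requests the same page, up to max_retries times;
      * abandoned requests are never cancelled, so they still use capacity.
    """
    queue = deque()   # pending requests as (user_id, attempt_number)
    waiting = {}      # user_id -> (tick of latest request, attempts so far)
    next_user = 0
    backlog = []

    for tick, new_users in enumerate(arrivals):
        # 1. New users issue their first request.
        for _ in range(new_users):
            queue.append((next_user, 0))
            waiting[next_user] = (tick, 0)
            next_user += 1

        # 2. Impatient users retry; their earlier request stays in the queue.
        for user_id, (sent_at, attempts) in list(waiting.items()):
            if tick - sent_at >= patience_ticks and attempts < max_retries:
                queue.append((user_id, attempts + 1))
                waiting[user_id] = (tick, attempts + 1)

        # 3. Work through the queue. A response only counts if it answers
        #    the user's latest attempt; anything else is wasted effort.
        for _ in range(min(capacity_per_tick, len(queue))):
            user_id, attempt = queue.popleft()
            latest = waiting.get(user_id)
            if latest is not None and latest[1] == attempt:
                del waiting[user_id]

        backlog.append(len(queue))
    return backlog


# A one-tick spike (30 users) on top of a steady 10 users/tick, capacity 12.
spike = [30] + [10] * 29
print(simulate_backlog(spike, 12, patience_ticks=2, max_retries=0))
print(simulate_backlog(spike, 12, patience_ticks=2, max_retries=3))
```

With `max_retries=0`, the one-tick spike drains and the backlog is back to zero by tick 9. With up to three retries per user, the same spike never drains in this model: once the queue is long enough that new requests wait past `patience_ticks`, each user adds extra requests, demand exceeds the fixed capacity, and the backlog climbs instead of recovering.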