
txtWeb Blog

txtWeb Outage During IPL Final

 

On Sunday (5/27), as the 76th and final match of the IPL was being played, the txtWeb platform experienced an outage.

 

The match lasted about 4 hours (from 8 pm IST to 12 am IST). At around 8:27 pm, many user requests started receiving delayed responses or no response at all, and this lasted until the end of the match. Our end-to-end monitoring alerted us to the problem immediately, as expected.

 

Impact:

 

- Approximately half of our user base accessing txtWeb during the match through our primary aggregator (including our primary number 9243342000 as well as several local numbers) was affected. Based on expected traffic, this would be close to 125k users (exact figures are awaited from our SMS aggregator).

 

There were two things that went wrong: 

 

Platform slowness due to database bottlenecks:

  • While processing a user request, we store the request details and related information in a database server. Under high load, the database writes became backlogged, which built up a queue of requests that were either dropped or eventually serviced with a delay. The long backlog time also caused db connections to time out and become unusable. This in turn added to the backlog and delayed more user requests.
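
The backlog dynamics described above can be illustrated with a small queue model. This is only a sketch: the arrival rate uses the 18K-per-minute peak reported later in this post (300 requests per second), while the write capacities and the timeout are hypothetical numbers chosen for illustration, not measurements from the txtWeb platform.

```python
def simulate_backlog(arrival_per_sec, writes_per_sec, seconds, timeout_sec):
    """Model a FIFO write queue, advancing one second per step.

    Returns (backlog, estimated_wait_sec, timed_out) after `seconds` seconds.
    """
    backlog = 0
    for _ in range(seconds):
        backlog += arrival_per_sec               # new requests queued for a write
        backlog -= min(backlog, writes_per_sec)  # writes the database completed
    # A request arriving now has to wait behind the whole backlog.
    estimated_wait = backlog / writes_per_sec
    return backlog, estimated_wait, estimated_wait > timeout_sec


PEAK_ARRIVALS = 18_000 // 60   # 300 requests/sec, from the reported peak
SLOW_WRITES = 250              # hypothetical write capacity below the peak
FAST_WRITES = 400              # hypothetical write capacity above the peak
TIMEOUT = 10                   # hypothetical connection timeout, in seconds

print(simulate_backlog(PEAK_ARRIVALS, SLOW_WRITES, 60, TIMEOUT))  # (3000, 12.0, True)
print(simulate_backlog(PEAK_ARRIVALS, FAST_WRITES, 60, TIMEOUT))  # (0, 0.0, False)
```

Under these assumed numbers, a shortfall of only 50 writes per second leaves 3,000 requests queued after one minute at peak load, with a 12-second wait that exceeds the assumed timeout. Once write capacity is above the arrival rate, the backlog stays at zero.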

 

Latency on outgoing messaging on our aggregator: 

  • Even when the platform recovered, our SMS aggregator was unable to send out messages in a timely fashion, so many requests were still delayed. We notified the aggregator, which investigated and addressed the issue. However, we lost critical time during this period and the impact was felt by the users.

 

Learnings: 

  • We had done extensive load testing and performance tuning in preparation for the event. However, due to the complexity of the ecosystem for sending and receiving SMSes, several components were simulated, and some of the underlying assumptions broke down under actual load. For the scale we are observing, we have implemented a faster write-through server architecture. In the short term, we have replaced direct database writes with writes to a faster, more scalable db (DynamoDB). This allows us to write to the DB at high scale without facing backlogs. We are running several load tests in order to identify other failure scenarios.

 

  • We have built aggregator redundancy that allows requests to be piped to different channels. This will go live soon and will let us fall back to alternate aggregators.
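
As a rough illustration of the aggregator redundancy described above, the sketch below tries a list of aggregators in order and falls back to the next one when a send fails. Everything here is hypothetical: the `AggregatorError` type, the `send_with_failover` helper, and the two stand-in send functions are invented for the example and do not reflect txtWeb's actual implementation or any real aggregator API.

```python
class AggregatorError(Exception):
    """Raised when a (hypothetical) aggregator cannot accept a message."""


def send_with_failover(message, aggregators):
    """Try each aggregator in order; return the name of the one that accepted.

    `aggregators` is a list of (name, send_function) pairs. Each send_function
    is a stand-in that raises AggregatorError on failure.
    """
    errors = []
    for name, send in aggregators:
        try:
            send(message)
            return name
        except AggregatorError as exc:
            errors.append(f"{name}: {exc}")
    raise AggregatorError("all aggregators failed: " + "; ".join(errors))


def primary_send(message):
    # Simulate a primary aggregator whose outgoing queue is backed up.
    raise AggregatorError("outgoing queue is backed up")


delivered = []


def alternate_send(message):
    # Simulate a successful hand-off to an alternate aggregator.
    delivered.append(message)


used = send_with_failover(
    "SCORE update",
    [("primary", primary_send), ("alternate", alternate_send)],
)
print(used)  # alternate
```

A real deployment would also need to detect slow (rather than failed) sends, for example with a per-send timeout, since the incident above involved latency rather than outright errors.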

 

We were overwhelmed by the number of requests we received from our users that day, with peaks of 18K per minute from across India. Our sincere apologies to the users who tried accessing txtWeb during the final IPL match. This was an important learning experience for us, one that will help us improve as we continue to strive to delight our users.

 


 

 

New to txtWeb?

txtWeb provides FREE SMS-based mobile apps for ALL mobile phones and telecom service providers. ‘Imagine Internet on SMS’ with txtWeb. Every day, users make more than 500,000 requests from across India to use txtWeb apps for information and services. txtWeb is FREE, with NO extra or hidden charges. No download is required: simply SMS relevant keywords to 9243342000. There is also no need for an internet data plan or GPS on your phone.

 


 

Comments (3)

suhastech | June 25, 2012

Awesome! 18k per min is amazing. Scaling is a very good problem to have. :)

Gaurav Bhatia | June 12, 2012

Arjun, Thanks. The platform does support caching for app responses. Please see: http://www.txtweb.com/tutorial...-your-apps Feel free to post in the forum in case you have questions on how it works.

Arjun S Bharadwaj | June 12, 2012

Wow, 18k per min is a LOT of traffic. It's pretty impressive. I am curious to know if there was any delay in the response from the apps, and if caching of responses is done by the txtWeb platform? If there were some delays from the apps, caching (on the txtWeb platform) can help reduce the bottleneck. The cache can be invalidated after 30-45 sec (after each delivery is bowled). In this way the response for 9000-13500 users will be the same, which will improve performance. If caching is already supported, then it'd be great if APIs for developers are opened up, so that we can set the time interval for cache invalidation.


Gaurav Bhatia
