Saturday, September 13, 2008

S3 as a CDN

I work for a company that provides widgets as the primary content for our product, the chumby. These widgets are purely static content: Flash SWF files, each with an associated JPEG thumbnail. We serve this content from our own servers, out of both dynamic (database) and static (file server) sources. These resources can scale to a certain calculated load before we have to worry about more servers, bandwidth, etc. We try to stay ahead of the growth curve.

The scaling numbers show that we can do one of two things: expand our servers and use more bandwidth, or use a CDN that serves our content from its caches. In short, the most cost-effective solution is S3. Our content, widgets that can change instantly whenever someone uploads a new version, needs to reach all users within a reasonable time. A normal CDN could take minutes to hours to propagate changes, and integrating one takes time. Expanding our servers would mean more time and maintenance on our end.

The architecture we decided on is a two-tier distribution, which provides redundancy for widgets. Each widget will exist both in our database and on S3. We keep widgets in the database because it is easy to back up, restore, and replicate. With our current system, when a user uploads or updates a widget, it is saved directly to the database, so the newest version can be pushed to users as soon as it is approved.
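A minimal sketch of the two-tier idea, written in Ruby to match the Rails stack. It assumes the database holds the authoritative copy of each widget along with a version number, and that the S3 copy is a mirror that may lag behind. Plain hashes stand in for both the widgets table and the bucket, and the class and method names (`TwoTierWidgetStore`, `save`, `mirror!`, `fetch`) are hypothetical. Reads prefer the mirror only when it holds the newest version, so a missing or stale S3 copy never prevents a widget from being served.

```ruby
# Hypothetical two-tier widget store: the database is the source of truth,
# and the S3 copy is a mirror that may lag behind the latest version.
class TwoTierWidgetStore
  def initialize
    @database = {} # stand-in for the widgets table: id => { version:, data: }
    @mirror = {}   # stand-in for the S3 bucket:     id => { version:, data: }
  end

  # Tier 1: every upload/update is written to the database first.
  # Returns the new version number.
  def save(id, data)
    current = @database[id]
    version = current ? current[:version] + 1 : 1
    @database[id] = { version: version, data: data }
    version
  end

  # Tier 2: copy the current database version to the mirror.
  def mirror!(id)
    @mirror[id] = @database.fetch(id).dup
  end

  # Serve from the mirror only when it holds the newest version;
  # otherwise fall back to the database copy.
  def fetch(id)
    record = @database.fetch(id)
    copy = @mirror[id]
    if copy && copy[:version] == record[:version]
      [:mirror, copy[:data]]
    else
      [:database, record[:data]]
    end
  end
end

store = TwoTierWidgetStore.new
store.save(42, "clock-v1.swf")
p store.fetch(42) # => [:database, "clock-v1.swf"]
store.mirror!(42)
p store.fetch(42) # => [:mirror, "clock-v1.swf"]
store.save(42, "clock-v2.swf")
p store.fetch(42) # => [:database, "clock-v2.swf"]
```

In a real deployment the "is the mirror current?" check would more likely be a stored flag or version column on the widget record than a lookup against S3 itself.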

Transferring files to S3 has proven quite simple to implement. The main problem has been adapting our architecture to external URLs. On the frontend (website), changing URLs is trivial, and all browsers support loading content across domains.
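One way to contain the URL change is to generate every widget URL from a configurable base, so moving content to S3 (or to a different CDN later) touches configuration rather than every page. This is a hypothetical helper: the hostnames, the `/widgets/<id>/<version>.swf` path layout, and the `mirrored:` flag are assumptions for illustration.

```ruby
# Hypothetical URL helper: widget URLs are built from a configurable base,
# so pointing clients at S3 (or another CDN) is a configuration change.
ORIGIN_BASE = "http://widgets.example.com"              # assumed origin host
CDN_BASE    = "http://example-bucket.s3.amazonaws.com"  # assumed S3 bucket URL

def widget_url(id, version, mirrored:, cdn_base: CDN_BASE, origin_base: ORIGIN_BASE)
  # Use the CDN only once the widget has been mirrored there.
  base = mirrored ? cdn_base : origin_base
  # The version in the path gives each upload a distinct URL,
  # so a cached copy of an older version is never served under it.
  "#{base}/widgets/#{id}/#{version}.swf"
end

puts widget_url(42, 3, mirrored: true)
# => http://example-bucket.s3.amazonaws.com/widgets/42/3.swf
puts widget_url(42, 4, mirrored: false)
# => http://widgets.example.com/widgets/42/4.swf
```

Putting the version in the path is one design choice among several; it trades slightly longer URLs for never having to invalidate a cached SWF.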

Pushing widgets to the database is easy: a simple create/update with ActiveRecord and you're done. When a user uploads a widget, the file is saved to the database in the same POST request, so there is no delay, and any problems with the file are reported in real time. This is a blocking operation for Rails, but with size limits imposed on the database, model, and web server, it shouldn't be too slow.
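The synchronous path can be sketched as validate-then-save inside the request handler, so a rejected file produces an error in the same response. The 512 KB limit, the `.swf` extension check, and the `handle_upload` / `UploadResult` names are assumptions, and a hash stands in for the ActiveRecord create/update; a Rails app would more likely express these checks as model validations.

```ruby
# Hypothetical synchronous upload path: validate and store inside the same
# request so errors are reported to the uploader immediately.
MAX_WIDGET_BYTES = 512 * 1024 # assumed limit; the real limits are not stated

UploadResult = Struct.new(:ok, :errors)

def handle_upload(database, id, filename, data)
  errors = []
  errors << "file is empty" if data.empty?
  errors << "file exceeds #{MAX_WIDGET_BYTES} bytes" if data.bytesize > MAX_WIDGET_BYTES
  errors << "only .swf files are accepted" unless filename.end_with?(".swf")
  return UploadResult.new(false, errors) unless errors.empty?

  database[id] = data # stand-in for an ActiveRecord create/update
  UploadResult.new(true, [])
end

db = {}
p handle_upload(db, 1, "clock.swf", "FWS...").ok # => true
p handle_upload(db, 2, "clock.swf", "x" * 600_000).errors
# => ["file exceeds 524288 bytes"]
```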

Transferring widgets from our database to S3 in "real time" is the tricky part. It is a blocking operation that depends on factors beyond our control: the S3 servers could be down, our bandwidth could be saturated with web traffic so that uploads to an outside server are slow, etc. It blocks no matter what, but we don't want the user to wait for it when they upload a new widget. The solution was to hand the S3 transfer off to a job server, whose main purpose is to queue long-running tasks. The job server was built with BackgroundRB, which integrates well with Ruby on Rails.
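The hand-off to a job server boils down to this: the request enqueues a widget id and returns immediately, and a separate worker performs the slow S3 transfer, retrying on failure. The sketch is not BackgroundRB's API; it uses only Ruby's standard `Thread` and `Queue`, the `TransferQueue` class and three-attempt limit are hypothetical, and the upload itself is a stub that fails once to exercise the retry.

```ruby
require "thread"

# Hypothetical job queue built on Ruby's stdlib (not BackgroundRB's API).
class TransferQueue
  MAX_ATTEMPTS = 3

  def initialize(&transfer)
    @transfer = transfer # the slow, blocking S3 upload (stubbed below)
    @jobs = Queue.new
    @results = Queue.new
    @worker = Thread.new { run }
  end

  # Called from the web request: returns immediately, the user never waits.
  def enqueue(widget_id)
    @jobs << widget_id
  end

  # Stop the worker after queued jobs finish; return [id, status, attempts].
  def shutdown
    @jobs << :stop
    @worker.join
    results = []
    results << @results.pop until @results.empty?
    results
  end

  private

  def run
    while (id = @jobs.pop) != :stop
      attempts = 0
      begin
        attempts += 1
        @transfer.call(id)
        @results << [id, :ok, attempts]
      rescue StandardError
        retry if attempts < MAX_ATTEMPTS # re-runs the begin block
        @results << [id, :failed, attempts]
      end
    end
  end
end

calls = Hash.new(0)
flaky_upload = lambda do |id|
  calls[id] += 1
  raise IOError, "S3 timeout" if id == 7 && calls[id] < 2 # fails once, then succeeds
  sleep 0.01 # pretend the network is slow
end

queue = TransferQueue.new(&flaky_upload)
queue.enqueue(7)
queue.enqueue(8)
p queue.shutdown # => [[7, :ok, 2], [8, :ok, 1]]
```

A production job server would also persist its queue so jobs survive a restart, and would likely back off between retries while S3 is unreachable.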

This post will be continued in follow-up posts. There is still much more to cover: the problems we had with Flash, and the framework we built to white-label CDNs.


Roger said...

Hi jt,

Interesting post on using a CDN, but I am not sure which "normal" CDNs you have used, tested, or spoken to that take minutes or hours to propagate your files. The whole point of a CDN is speed and redundancy; this includes pulling from origin and caching on the edge of the network. Any CDN worth its salt delivers from origin and propagates its edge cache as fast as, if not faster than, either your existing setup or S3. You can also use origin storage or permanent cache services, though naturally these add to the cost.

From your description of using S3 as a "CDN", it sounds like this process is taking you minutes or hours to set up. With a CDN, you just create a CNAME that the CDN maps to their system, then use that CNAME in any object URLs that you want the CDN to deliver and cache for you.


Anonymous said...

You are a rockstar programmer.

Pooran said...

I don't understand why you were facing problems with S3, but one thing is certain: you cannot use S3 as a CDN, at least at this stage of its development.