Sunday, November 16, 2008

Sinatra speaks a different language, Perl

So, as may be obvious to some, I have a thing for web frameworks. I like to understand how and why they work, and for me that usually involves more than just reading the source code. My first attempt was various iterations of Puddy -- a Rails/Merb-like framework in Perl. The second iteration has led me down the path of the Ruby Sinatra framework.

Sinatra is a minimalist framework for creating web applications. Its scope really only extends into the realm of controllers. It does support views, but it is still far from the enormous range of options that Rails provides. The controller layer lets you define the route right in the method call of the action -- often meaning your entire application is just one file. I have used it to experiment with building APIs for some of my projects.
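
For comparison, here is roughly what the canonical hello-world looks like in the Ruby version of Sinatra (this follows Sinatra's own classic example):

require 'rubygems'
require 'sinatra'

# GET / -- the route is declared right on the handler
get '/' do
  'Hello World'
end

# GET /:name -- named parameters from the URL land in the params hash
get '/:name' do
  "Hello, #{params[:name]}"
end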

In short, it's light, quick, and not at all a memory hog, so of course I wanted to dissect it. That led me to just reading the source code, but I wanted to do more to understand how it worked. I should inform you that I do indeed know Ruby. I've used it on a daily basis for the past 2 years at my job(s), and well beyond just the Rails environment. I have also been a big supporter of Perl, mainly because it was my first language beyond good old Q-BASIC.

With that in mind, I would like to introduce a project that I have been working on: Sinatra for Perl. Yes, it is a work in progress, but I have put a lot of effort into making sure the code base was somewhat solid before I released it to the world. The repository contains a working example of the feature set that you can run on your local machine.

I like the ease of extending Sinatra, in Ruby and now in Perl. I have added simple extensions to support page caching, running tasks from the command line, and a simple background job server. Many of these are lacking optimizations, but I just wanted to show how easy the framework is to extend.

The simplest example of a Sinatra app in the Perl port:


#!/usr/bin/env perl
use strict;
use warnings;

# pull in the framework (path relative to the app)
require 'lib/sinatra.pl';

# GET / -- return a plain string as the response body
get('', {}, sub {
    return 'Hello World';
});

# GET /:name -- the handler receives a request object carrying the params
get(':name', {}, sub {
    my $r = shift;
    return 'Hello, ' . $r->params->{name};
});
A simple DSL that defines what action to perform on a route. LOVE IT!

This has been an educational experience (albeit a nerdy one), and I hope to continue with this project. I will probably have to go through a name change in the near future so as not to confuse people with the Ruby version.

Monday, November 10, 2008

rails plugin to white label CDNs

This is a continuation of my previous post about using S3 as a CDN. This post covers some of the issues we faced in making Rails play nicely with a CDN.

Rails is made to do great things, but as many before me have said, handling concurrent connections is not one of them. Once a request comes into Rails, that process (Mongrel, FastCGI, etc.) is blocked until the request is done. Actions like sending emails, transferring files, or running large calculations need to be pushed away from the user request into another process. There are many solutions, such as BackgrounDRb and Starling, that let you offload long-running tasks so they don't block Rails.
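
To make the pattern concrete, here is a rough sketch of offloading work through Starling. Starling speaks the memcached protocol, so a plain memcached client works; the queue name and payload here are made up for illustration:

require 'rubygems'
require 'memcache'

# Starling listens on port 22122 by default and speaks memcached
queue = MemCache.new('localhost:22122')

# in the Rails action: enqueue the slow work and return immediately
queue.set('emails', { :to => 'user@example.com', :template => 'welcome' })

# in a separate worker process: pull jobs off the queue and run them
job = queue.get('emails')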

The task of handling files on a remote server is always a tricky one. Each CDN has its own interface for interacting with files -- delete, update, move, etc. This proved to be a problem when trying to test which one would work cleanly with our setup -- widgets stored in the database, which are pushed immediately to all users on the Chumby network.

I took a top-down approach to the problem: designing how I wanted the widgets to move from our servers to another server by building out the API as a set of methods. These methods were just a skeleton and did nothing. That let me write the code I expected to call instead of working around another CDN's or module's API. The methods could then be filled in with the appropriate code for a given CDN, as sketched below.
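
The skeleton looked something like this (the method names here are illustrative, not the plugin's final API):

# the transfer API, written first as stubs and filled in per CDN
class CdnStore
  # push a widget's bytes up to the CDN under the given path
  def store(path, data)
    raise NotImplementedError
  end

  # remove a file from the CDN
  def delete(path)
    raise NotImplementedError
  end

  # the public URL clients should use for this path
  def url_for(path)
    raise NotImplementedError
  end
end

# each backend (S3, rsync, ssh) subclasses and fills in the blanks
class S3Store < CdnStore
  # ...
end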

Originally, the CDN of choice was S3 -- not a true CDN, but it suited our purpose of relieving our servers from serving dynamic content. The necessary API calls were filled in to support S3. It's then that I realized that, with the top-down approach, the work I had could easily be adapted to any CDN. I've written extensions for rsync, ssh, and plain S3, and it has let us test multiple environments and services quickly.

This plugin, data_fu, is currently a work in progress, but I hope to adapt it for widespread use and finalize it.

NOTE: At the time of writing, there are many CDN solutions that handle transparent proxying and caching of content. The reason these weren't used is that widget content needs to be updated instantly. Sticking to vanilla CDN mechanisms also meant we could change CDN providers and switch over immediately if one went down.

Monday, September 15, 2008

project euler repo

I have recently gotten back into the Project Euler problems. The website provides a bunch of math and logic problems that can be solved any way you like -- paper and pen or programming. I like the challenge every once in a while, so I thought I might share some of the code I have written to help solve the problems. There are comments in most of these. They are just quick, dirty hacks to help me get the answer. Sometimes the output will not be the answer itself and you might need to go look for it, but the logic is there.
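
To give a flavor of what these hacks look like, Problem 1 (sum all the multiples of 3 or 5 below 1000) reduces to a couple of lines of Ruby:

# Project Euler, Problem 1: sum of all multiples of 3 or 5 below 1000
puts (1...1000).select { |n| n % 3 == 0 || n % 5 == 0 }.inject(0) { |sum, n| sum + n }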

Saturday, September 13, 2008

new job and location

I have neglected my duties as the maintainer of this blog for a few months. It has been for the better, though. About a month ago, I left my position at Chumby and moved to San Francisco to start working at [context]. I had a great time and a great experience working at Chumby, but I felt that San Francisco is where I needed to be, for work but also for the experience. I grew up in one of the largest (and best) cities in the world, and I missed the lifestyle that came with it. San Diego has great weather, but I hated driving everywhere.

Now I am in San Francisco. Trying to sell my car. And enjoying my new job, people, and experience.

If you are in the area, please contact me -- it would be nice to meet some readers. :)

Flash on S3

This is a continuation of my previous post about using S3 as a CDN. This post discusses some of the issues that occurred with hosting Flash content on S3, and the solutions to them.

Problems started to occur once Flash SWFs were loaded from both the Chumby device and the Chumby website. SWFs have a built-in security mechanism known as the cross-domain policy, which lets the owner of a domain specify which other domains have access to its content. Think of it as a robots.txt for Flash SWFs.

With S3 there are two ways to access content in a bucket -- an AWS-based URL, or a CNAME from your domain that points to AWS (Amazon Web Services). When the Flash content is on S3, the Flash player looks for the crossdomain.xml by going through the AWS URL path. We set up a CNAME 'swf.chumby.com' and placed a crossdomain.xml that could be accessed via http://swf.chumby.com/crossdomain.xml, plus one at the top level, http://chumby.com/crossdomain.xml. This allowed us to control which SWF movies could load the widgets.
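
For reference, a minimal crossdomain.xml granting access to the chumby domains looks something like this (the exact domain list is illustrative):

<?xml version="1.0"?>
<cross-domain-policy>
  <!-- allow SWFs served from chumby.com and its subdomains to load this content -->
  <allow-access-from domain="chumby.com" />
  <allow-access-from domain="*.chumby.com" />
</cross-domain-policy>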

Playing the SWF as a standalone Flash movie never showed any problems. When it was loaded via http://swf.chumby.com, the SWF would claim its domain to be swf.chumby.com instead of an AWS domain. From the chumby website there is a way to preview the content that will appear on your Chumby -- the Virtual Chumby. The SWF for the Virtual Chumby lives on the main chumby.com website. With the crossdomain in place, it was able to load the widgets from swf.chumby.com no problem, but a problem occurred when it wanted to send parameters to the widget.

Flash has various sandbox models for SWF files. This is good because it lets a SWF maintain its security state and ensures your data is protected. It bit us in the ass, though. Since a SWF can grant only certain (sub)domains the ability to send it parameters, we had 1000s of widgets that we could play, but they didn't have access to any of the information that made them work well within the Virtual Chumby. There were two possible solutions. Change every widget to call allowDomain, which would take weeks of contacting 3rd-party developers, countless resources, etc. The second solution was even tougher: move the Virtual Chumby SWF over to the swf.chumby.com domain and update the links to it on our website. :)

s3 as a CDN

I worked for a company that provides widgets as the primary resource for its product, the chumby. These widgets are purely static content in the form of Flash SWF files and an associated jpeg thumbnail. This content is served from both dynamic (database) and static (file server) resources. These resources are ready to scale to a certain calculated amount before we have to worry about more servers, bandwidth, etc. We try to stay ahead of the growth curve.

The scaling numbers show that we can do one of two things -- expand our servers and utilize more bandwidth, or use a CDN to serve our content with caching. In short, the most cost-effective solution is S3. Our content -- widgets that can change instantaneously when someone uploads a new one -- needs to reach all users within a reasonable time. A normal CDN could take minutes to hours to propagate and would take time to integrate. Expanding our own servers would mean more time and maintenance on our end.

The architecture we decided on is a two-tier distribution, which provides redundancy for widgets. The widgets exist on our servers in the database and on S3. Our database server holds the widgets because it's easy to back up, restore, and replicate. With our current system, when a user uploads/updates a widget, it is saved to the database directly, so the newest version can be pushed to users as soon as it gets approved.

Transferring files to S3 has proven quite simple to implement. The main problem has been adjusting our architecture to external URLs. On the frontend (website), changing URLs is pretty trivial, and all browsers support cross-domain loading of content.

Pushing widgets to the database is easy. A simple create/update with ActiveRecord and you're done. When a user uploads a widget, the file is saved to the database in the same POST request, so there is no delay, and problems and errors with the file are reported in real time. It's a blocking operation for Rails, but with size limits imposed on the database, model, and web server, it shouldn't be too slow.
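
In code, the upload handler boils down to something like this (Widget and its column names are stand-ins for the real model):

# in the controller action handling the upload POST
def create
  upload = params[:widget_file]
  # the SWF bytes go straight into a BLOB column on the model
  @widget = Widget.create(
    :name         => params[:name],
    :content_type => upload.content_type,
    :data         => upload.read
  )
end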

Transferring the widgets from our database to S3 in 'real time' is the tricky part. It is a blocking operation that depends on factors beyond our control: the S3 servers could be down, our bandwidth pipe could be saturated with web hits so the upload to an outside server is slow, etc. It is a blocking operation no matter what, but not one we want the user to have to wait on when they upload a new widget. The solution was to push the transfer of a widget to S3 onto a job server, whose main purpose is to queue long-running tasks. The job server was built using BackgrounDRb, which integrates well with Ruby on Rails.
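
Stripped of the job-server scaffolding, the queued task itself is just an S3 PUT with the aws/s3 gem -- a sketch, with the bucket name and model methods assumed:

require 'rubygems'
require 'aws/s3'

# runs in the job server, outside the Rails request cycle
def transfer_widget(widget_id)
  widget = Widget.find(widget_id)
  # push the SWF bytes from the database out to the bucket
  AWS::S3::S3Object.store(
    "widgets/#{widget.id}.swf",  # key within the bucket
    widget.data,                 # blob pulled from the database
    'chumby-widgets',            # bucket name (illustrative)
    :access => :public_read
  )
end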

This post is to be continued in follow-up posts. There is still so much more to cover about the problems we had with Flash and the framework built to white label CDNs.

Thursday, August 28, 2008

very accurate description

It's not too often that I read something that describes a certain type of person so well. This blog posting describes a particular type of programmer personality. Programmers come in many shapes, sizes, and mentalities. I am not saying this is a perfectly accurate description of who I am, but there is one paragraph that I read and was like, whoa! I am not going to quote the paragraph, because I really think this is a blog posting everyone should read -- about me. :)

Monday, May 26, 2008

s3 logs with webalizer

I recently saw the site s3stat.com, a simple service that takes your s3 logs and pushes them through webalizer to get nice graphics and stats. The service is $2/month. I thought to myself that this can surely be done for free.

An hour or so later, I think I have something that's pretty comparable in features to s3stat. I am in no way trying to put them out of business. They maintain their website and are improving s3stat with more features. I purely wanted and needed a way to view s3 stats and didn't want to pay for it.

This script requires that the bucket on s3 has logging enabled. Please do so before using this script, or you will just get an error that logging is not turned on. The rubygem AWS::S3 is required to run this script. Please look at the options hash for the required parameters.


#!/usr/bin/env ruby
require 'rubygems'
require 'aws/s3'
require 'date'

# work around a missing constant in the aws/s3 gem
Date::ABBR_MONTHS = Date::Format::ABBR_MONTHS

# default arguments
options = {
  :access_key  => '',          # the Amazon access key
  :secret_key  => '',          # the Amazon secret key
  :bucket_name => '',          # bucket name to pull logs from
  :folder_name => 'webalizer'  # folder name for webalizer output
  #:clear_webalizer_folder => true # delete local webalizer data first
}

# establish the connection to s3
AWS::S3::Base.establish_connection!(
  :access_key_id     => options[:access_key],
  :secret_access_key => options[:secret_key]
)

# find the bucket containing the log files
puts "Checking for logging for bucket #{options[:bucket_name]}"
if AWS::S3::Bucket.logging_enabled_for?(options[:bucket_name])
  new_log = File.new('bucket_log.log', 'w+')
  log_status = AWS::S3::Bucket.logging_status_for(options[:bucket_name])
  puts "Processing log files"
  AWS::S3::Bucket.logs(options[:bucket_name]).each do |log|
    # convert each line of the amazon s3 log to CLF (Common Log Format),
    # e.g. [10/Oct/2000:13:55:36 -0700], which uses the abbreviated month
    log.lines.each do |line|
      new_log << "#{line.remote_ip} - - [#{line.time.strftime("%d/%b/%Y:%H:%M:%S %z")}] \"#{line.request_uri}\" #{line.http_status || '-'} #{line.bytes_sent || '-'} \"#{line.referrer}\" \"#{line.user_agent}\"\n"
    end
  end
  new_log.close
  # optionally clear out old local webalizer output
  if options[:clear_webalizer_folder] && File.exists?(options[:folder_name])
    Dir["#{options[:folder_name]}/*"].each { |f| puts f; File.delete(f) }
    Dir.delete(options[:folder_name])
  end
  # make sure the webalizer output folder exists
  Dir.mkdir(options[:folder_name]) unless File.exists?(options[:folder_name])
  # run webalizer on the generated log file
  webalizer_output = `webalizer -o #{options[:folder_name]}/ -D dns.db -N 5 -F clf bucket_log.log`
  puts "output from webalizer:"
  puts webalizer_output
  # upload the webalizer output to the logging target bucket
  puts "updating webalizer to s3 bucket #{log_status.target_bucket}"
  Dir["#{options[:folder_name]}/*"].each do |filename|
    puts "uploading file #{filename}"
    AWS::S3::S3Object.store(filename, open(filename), log_status.target_bucket, :access => :public_read)
  end
end



If you have any improvements please let me know in the comments.

Tuesday, May 20, 2008

Rails top 100

This wiki maintains a list of the top 100 websites (by Alexa ranking) that use Rails. Chumby is ranked number 67 on the list.

NOTE: This is a shameless plug because I work for Chumby doing the RoR development. :)

Tuesday, April 29, 2008

criteria for a recruiter

The most dangerous part of finding a job online is having to put your contact information out there. When I first graduated from college, I posted my resume on every job website that existed. This is what the (useless) career center told me to do. The next day my inbox was flooded with job descriptions from companies and recruiters. To a college graduate this is the most misleading part of a job search. I thought I would have my choice of the perfect job.

Each email was an automated message sent based on the criteria that I had an email address and certain keywords in my resume. They described competitive salaries with great benefits, an awesome work environment, and a great company. Obviously no human had ever read my resume, because they would have made the immediate observation that I had just graduated. So my amazement at the emails turned to disappointment, which has now turned to annoyance.

There have been a few exceptional recruiters who have emailed me. Most recruiters follow the same tactics of trying to lure you in -- these are the spamming recruiters. I would like to point out some guidelines for recruiters that would make both our lives easier.


  1. Read my resume

    It's obvious to me when you haven't: the job description has nothing to do with my previous work, experience, skills, or interests.


  2. Be Personal

    If you want to use me as a possible commission, be a little more personal and address me -- not the masses. I tend to ignore recruiters who send me an email with my name obviously copied and pasted into the Dear field. The biggest giveaway is when, at the end of the message, they ask for an updated resume, desired salary, newest contact information, current employer, etc. Asking for updated information just seems really tacky to me -- especially when you have no idea what I am looking for.


  3. Describe the Job

    Tell me more than the requirements of the job. I can see they want PHP/Perl/Java/C++ experience, but what does the job entail? Will projects be coming my way willy-nilly, so I have to be able to organize my time? Will I be working with someone else on the same project? What is their development style (extreme|agile|pair)? A job description should describe what a typical day might be like for the employee.


  4. Describe the company

    Tell me something positive about the company. Please don't copy and paste its description from the corporate profile. If I were a potential investor that might mean something, but I am a potential employee. Tell me they have pick-up games of basketball, allow flexible hours, provide lunch, let employees be open about their ideas, have people staying 10 years, etc. I want to know what the employee culture at the company is like.


  5. Understand the different titles

    I have learned that an engineer, developer, programmer, and architect are totally different things. Then there are different levels, from junior to senior and I-V. Understand that I might be an engineer with experience building a software system abstractly, based off an idea someone had. More often than not I get a job description that reads as if it were for someone who picked up PHP in 24 hours -- with no previous computer experience ever. I am a software engineer. I enjoy the challenge of creating something new and putting the puzzle together.




Two years later, after removing my resume and contact information from said job sites, I still get the same emails. My email and resume are being passed around (like wine coolers at a middle school party) to recruiter contact databases everywhere. I have learned to ask to be unsubscribed from these lists.

Don't get too discouraged. There are good recruiters out there. I had one excellent recruiter ask me to fill out a questionnaire with geeky questions about what I have done, when I started, and what I mess around with on computers, plus a few logic problems. Usually that is something you get in an interview, but that's what a recruiter should be doing: interviewing you to help find the best possible job for you and the best candidate for that job. They will either find you or you will find them.

Monday, March 17, 2008

dirty fields in ActiveRecord

Managing databases from a programming environment is always an ordeal. Rails has an ORM (ActiveRecord) that transparently provides a programming environment mapping directly to the database and generating all the SQL. ActiveRecord sacrifices efficiency for ease of use in some cases. I believe one of the most neglected cases is how ActiveRecord handles updating records in a table.

When ActiveRecord updates a record in a table, it updates all fields whether or not they have actually changed. In most cases this is fine, since a record is so small that setting the fields again takes a trivial amount of time. I happen to have one of those rare cases where this is not a valid solution: I use my database for file storage, with files from 10KB to 1MB, so changing the name of a file causes the large chunk of data to be written again.

Only a field that has changed (a dirty field) should be updated. ActiveRecord has no flags for which fields have changed. I took the time to extend ActiveRecord to support dirty flags by modifying the write_attribute method to mark, in a hash, that an attribute of a record has changed.

With the knowledge of dirty fields, the appropriate UPDATE statement needed to be generated. The UPDATE statement is currently built by converting all fields and their values into SQL assignments via the attributes_with_quotes method. I extended this method to check that an attribute had been dirtied before adding it to the assignment list. Since an UPDATE syncs the database with the current model, all the dirty fields are then flagged as no longer dirty.

Show code:


module DirtyAttributes
  def self.included(base)
    base.class_eval do
      alias_method_chain :write_attribute, :dirty
      alias_method_chain :attributes_with_quotes, :dirty
      after_update :reset_dirty
    end
  end

  # clear all dirty flags once the UPDATE has gone through
  def reset_dirty
    @dirtied_attrs = nil
  end

  # intercept attribute writes to flag the attribute as dirty
  def write_attribute_with_dirty(attr_name, value)
    dirtied(attr_name)
    write_attribute_without_dirty(attr_name, value)
  end

  def dirtied(attr_name = nil)
    @dirtied_attrs ||= {}
    @dirtied_attrs[attr_name.to_s] = true if attr_name
  end

  # with no argument: is anything dirty? with an argument: is that field dirty?
  def dirty?(attr_name = nil)
    @dirtied_attrs ||= {}
    return (attr_name.nil? && !@dirtied_attrs.empty?) || @dirtied_attrs.has_key?(attr_name.to_s)
  end

  # drop clean attributes from the SQL assignment list (except on INSERT)
  def attributes_with_quotes_with_dirty(*args)
    quoted = attributes_with_quotes_without_dirty(*args)
    quoted.delete_if { |key, value| !dirty?(key) } unless self.new_record?
    return quoted
  end
end

ActiveRecord::Base.send(:include, DirtyAttributes)
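
With the module included, the partial UPDATE falls out automatically. For example (Widget standing in for a model with a large data column):

widget = Widget.find(1)
widget.name = 'new name'  # write_attribute flags :name as dirty
widget.dirty?(:name)      # => true
widget.dirty?(:data)      # => false
widget.save               # UPDATE sets only the name column, not the blob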

I would just like to emphasize how easy this was to do -- no more than 30 minutes. This is why I love Rails and Ruby. Yet, what gets me is why this wasn't done before. Sorry to say, but this is really one of those DUH! things that should have been implemented from the start. If there is a reason it wasn't, please let me know, because I am really curious.

NOTE: I have not made this a plugin yet. This code has not been tested in a production environment either.

UPDATE #1: Because of the way ActiveRecord handles method chaining with alias_method_chain, there are some edge cases to solve. Mainly, I need to find a way to override the original update method defined in ActiveRecord::Base. You'd think it would be easy, but alias_method_chain renames functions, and since it is also used to add support for 'updated_at/updated_on' timestamps, it has proven to be difficult. I have found ways to make it work, but other plugins doing alias_method_chain on update could potentially cause problems. I think I will just submit a patch for ActiveRecord.

UPDATE #2: I have been able to edit the code to work correctly in both the 1.2.3 and 2.0 versions of ActiveRecord. I had to learn a little more about the internals, but the final code shows that it is possible and pretty easy to do.

June 5, 2008 - It looks like Rails 2.1 is supporting dirty fields. Hurray, but it's weird, I had it first. ;)

Thursday, January 10, 2008

Puddy framework ideology

In case you have no idea what I am talking about, familiarize yourselves here. I have been working on my own little mini framework for a while, observing things I have seen mainly in Ruby on Rails and other frameworks and incorporating them.

The thing I have realized about many frameworks is how much of the web application they integrate with. I am not complaining at all. Having so many different options available to you is great, but what has always gotten me is the overhead of such a framework. The kitchen-sink scenario is nice, but a minimalist framework can work just as well, allowing you to pick and choose just the features that you need. Rails 2.0 has gone in this direction, and I have seen smaller Ruby frameworks like Camping and Sinatra, which are great examples of minimalism.

Yet, what I have always wanted was access to each individual module of a framework, with the ability to pick and choose which get used. I believe the modules should not wholly depend on one another. That can't always be the case, but the dependency should be minimal.

I have tried to keep this mentality with Puddy. If anyone has actually kept up with it, the modules are each individually coded, so one doesn't require functionality from another. I guess I am doing an MVC for an MVC framework.

Anyways, a quick update on the framework. I am making progress on it. It has been a learning experience. There are some parts that I have completely written myself and some that I have copied from Rails verbatim, but only because they did such a good job with them. I hope to create an example app real soon to actually demonstrate the framework. I really just want to show something convenient, light, and fast.

Update: I forgot to mention Merb, which seems to be a nice framework and follows this ideology too. Just when I think I am being innovative -- someone else has already done it. ;)