Saturday, November 24, 2007

Last post on this blog, starting my own at http://rajith.2rlabs.com/

I had some web space available for over an year and decided to use that space to run my own blog at http://rajith.2rlabs.com/.

I have imported all my previous posts into my new blog. I found Word Press easy to work with yet flexible and powerful enough to customize the way I want. I am using blueprint as my theme and found it to be very elegant. It's built on the Blueprint CSS framework. If anybody is interested you can download it from here.

Many thanks to the Blogger team for improving blogger over the last few months and I thorughly enjoyed my blogging experiance here. I like to be in control of my everyday life, including blogging. Hence my move to my own thing.

Friday, November 16, 2007

Scaling your system - What I learnt from Dan Pritchett's (eBay) talk

I was lucky enough to attend both talks given by Dan Pritchett during the Colorado Software Summit this year. The first one was, "You Scaled your what". If you want to know what scalability is, here is an excellent introduction by Werner Vogels. After listening to what Dan had to say, the following points stood out in my mind
  • You need to understand and take complete control of your architecture. Read my post on Architecture is your responsibility for more thoughts on this.

  • You need to have scalability in mind right from the beginning. Trying to achieve scalability later can be time consuming and very costly. Quoting from Werner Vogels post
  • Why is scalability so hard? Because scalability cannot be an after-thought. It requires applications and platforms to be designed with scaling in mind, such that adding resources actually results in improving the performance or that if redundancy is introduced the system performance is not adversely affected. Many algorithms that perform reasonably well under low load and small datasets can explode in cost if either requests rates increase, the dataset grows or the number of nodes in the distributed system increases.
  • Transactional scaling is just one dimension. You need to think about Scalability of data, operational,deployment, power ..etc. This is a minimalistic set. Try to figure out what dimensions are important to your organization.

  • All scalability dimensions are related and impacts each other. Any dimension ignored can could evolve into a problem for your application

  • Prefer vertical over horizontal scaling. Vertical scaling is better for your vendors and is not a viable long term strategy. There is so much you can get by increasing memory, CPU etc..

Transactional Scaling
Usually measured in TPS and is a traditional indicator for application performance.

  • Keep asking the question "How long can the business survive?" based on,

    • Time-to-live on current resources.

    • Time-to-live on maximum plausible configuration.

  • These metrics should be taken regularly to anticipate possible production bottle necks and identify issues before they become a crisis.

Data Scaling
How well does your data scale? Think about,

  • Functional Decomposition, group data by logical relationships, business importance,transactional volumes etc.

  • Think about partitioning data (sharding).

  • Is all data equally important? prioritizing your data and allocating resources accordingly will help you scale better.

Operational Scalability
How hard is it to run your software? Operational scalability is a software problem and you need to think about operational concerns right from the beginning. Pay attention to,

  • Logging metrics, Monitoring.

  • Controlling/updating/tuning live apps without disrupting traffic.

Deployment Scalability
You need to design/architect your systems while keeping the following in mind,

  • Ability to do incremental roll outs (and rollback if there are problems) without disrupting live traffic.

  • Managing component dependencies during deployment without disrupting live traffic.

  • Your architecture shouldn't assume or decouple itself to any hardware,network topology or data center topology. This allows you to take advantage of new hardware, network topologies ..etc without significant changes.

Power Scalability
Power can be a limiting factor in a data center and may put bounds on transactional scaling.

  • How efficient is your software?, wasted clock cycles == wasted watts.

  • Consider vitalization for best utilization of your hardware resources.

Some good tips I managed to note down

  • Run old and new schema parallelly and then take out the old schema after a while when you gain enough confidence.

  • Prioritize services, willing to take a hit on certain services over others.

  • Incremental rollouts is a very good way to roll out new features while managing risk and also prevents taking the system offline.

  • Schedule deployment during working hours instead of weekends/nights as this enables your developers, support staff to attend to problems while they are alert and without being distracted by non work issues.If you do incremental rollouts this is possible as you are not disrupting traffic.

Wednesday, November 14, 2007

The value of principled design - REST is just one example

To me the value of Roy Fielding's dissertation goes beyond REST. Steve Vinoksi summarised it very well in one of his comments while answering a comment I made on his blog.
It’s(Roy's dissertation) not really primarily about REST; rather, it’s about principled design. Much of his dissertation is about architectural elements, principles, constraints, properties, and the relationships between them all. REST is used as a very clear example in chapter 5 of what principled design is all about.
Why can't we use a principled design approach when we do SOA or for that matter any other architecture?

When we add contraints or relax constraints we induce certain properties in our architecture. As an architect you make an educated desision as to what constraints make sense in your environment and what doesn't. When designing systems don't we go through decisions like "should we make these services stateless or statefull ..etc" during our design meetings ?

I think in what ever system you design as an architect you should think through and note down the constraints you want to impose on your system. This will provide a proper foundation to your system and an excellent guideline to your developers which will clearly communicate the desired goals of your system. Then later on when somebody else wants to relax any of these constraints or add more constraints they already have a guideline and can see how the "relaxing of an existing constraint" or the "addition of a new constraint" can impact the overall system.

REST is just a name coined by Roy to identify a set of constraints, and they are not the only constraints, nor the best combination of constraints in every situation. As Steve mentioned Roy spends the first few chapters providing an excellent analysis about "architectural elements, principles, constraints, properties, and the relationships between them all" and of course the value of a principled design approach.
To me the value of Roy's thesis goes beyond REST and I hope most people would realize the same.

Thursday, November 01, 2007

Global Warming ...is it really?

Dwight Hornbecker, the only geologist I've even known, had some very interesting facts about global warming. They sound kinda crazy, but I wonder if they are true. Here is what he told.

  • One volcanic eruption can contribute to global warming more than what humans can do in an year
  • The earth was actually warmer than what it is now.
Dwight went onto explain that there is evidence to support that, a few million years ago the earth was a lot more warmer than what it is today. Then came the ice age. And he maintains that we are still recovering from the ice age and earth is slowly returning back to it what it was. Not sure whom to believe. Is it Al Gore and his crew or Dwight?

All I know is that humans haven't really figured it out yet. We think we do and try to mess around with nature, but I don't think we are even close at guessing/figuring out the real situation.

Monday, October 29, 2007

Architecture is YOUR responsibility

I read this post on Steve Vinoski's blog that quoted Ron Schmelzer of ZapThink, who makes an excellent point. "Architecture is YOUR responsibility". Well guess what, as much as vendors would like to say it is not, the reality is that you need to make the critical decisions about the architecture. Instead of some vendor, you need to be in charge of the direction and overall vision in terms of the architecture. Instead of choosing a vendor/product and building your strategy/architecture around it, you need to think through your strategy/architecture and choose the right vendor/product that can help you achieve your vision. If anybody was lucky enough to attend a talk given by Dan Pritchett (eBay), you would have realized that companies who understood this reality and took responsibility for the architectural decisions eventually made it big.

Dan's comments on architecture was very insightful (I want to write a separate post on what I learned from his talk at the recently concluded Colorado Software Summit). The underlying truth of everything he said, was that they understood and took responsibility for the architectural decisions they made, instead of relying on some vendor to provide direction and overall vision.

There is no vendor out there, that can provide you with some ESB that can magically transform your enterprise into a SOA platform or some messaging middleware that can help you scale your enterprise to whatever limits you want unless you know what you are doing and take ownership of the overall vision. You need to understand the overall architecture, make decisions and take responsibility for them. An ESB or a messaging middleware are merely a bunch of tools that help you get there or in other words they are just a means to an end not the end itself.

There is no framework out there that can force architectural decisions on your solutions that you are not willing to make yourself. During my REST in peace talk, there was a surprising number of folks who asked me about a framework that can help them develop RESFTful services. Guess what, the road to a RESTful approach (or for that matter any architectural style) starts with the architectural decisions you make (the way you think/design your services) and not with some framework where you have to flip a switch or use a bunch of annotations that turns your code into a RESTful service. That is precisely why the contract first approach is recommended over a code first approach when you do web services. You need to think about how you design your service first and then use some framework to generate your WSDL and your code from that, not the other way around.

We all remember how the EJB mania deceived us. Many companies paid millions of dollars to App Server vendors to solve their architectural problems. The whole notion of "you only need to think/write the business logic, and we will take care of the remoting, transactions, persistence, scalability ..etc" was just an illusion. Neither did it preclude people from making extremely stupid architectural decisions nor did it provide anymore scalability than the simple tomcat web server for most of the use cases.

You need to think carefully about the architectural decisions you make and understand the impact it has on the overall goals/vision of your enterprise. You need to be aware of operational, load, managerial and geographical scalability from day one. You cannot offset your lack of architectural vision by using some framework, product or vendor. It will only make your vendor happy, but not your customers.

Competition in open source is healthy

I read Sanjiva's post on the $subject and like to add my own observations. It is true that the Apache Web Server is the statue of liberty that stands tall among all the commercial web servers out there and has no peers in open source. However that is an isolated use case. We need to think more pragmatically. The predictions are such that, most companies, in the future will have some form of involvement with Open Source. Naturally there will be some form of competition, and it is unavoidable. But the more choice (open source or otherwise) a user has the better it gets, especially if there are several open source alternatives instead of one. If there is only one open source alternative and the user ends up with a bad experience with that solution, it can color the perspective of how that company, will look at open source in general.

Competition provides choice and facilitates continuous growth and innovation in open source solutions. It drives a community to be more responsive and responsible towards it's end users. This results in better support in the form of fixing bugs or answering questions on the list. Bcos if you are not growing or innovative or if you are not responsive or responsible towards the end users then they will look elsewhere. One could argue that there are companies that provide support. However one should not forget that these companies are built on top of the community and rely heavily on the community for it's success. And any fixes/patches they make usually go upstream. Companies that don't usually have problems and fade away.

Sometimes you would find that some community members are unhappy with the current direction of a project and they go ahead and form another project. The difference in direction or focus is perhaps an integral part of the evolutionary process. Some of these projects eventually create a company behind it. One could also argue that these companies fragment a community and promote competition. As long as this competition is both ethical and within the norms of standard industry practice, then the end users benefit from it. Why?? Bcos these companies will drive innovation, creativity and quality of the solutions they support, as their business model is based on it.

Therefore some form of competition that is ethical (not mud slinging or cut throat competition) is healthy for making open source a viable option in enterprise software. The process of evolution will weed out inferior solutions and ensure the survival of the fittest. However this should not be based on how much marketing muscle a project/company behind it has, but rather be based on the community aspect and technical merits.

Saturday, October 13, 2007

AMQP in 10 mins : Part4 - Standard Exchange Types And Supporting Common Messaging Use Cases

AMQP defines four standard exchange types (routing algorithms) that covers most of the common messaging use cases. All AMQP brokers are required to support each of these exchange types and pre declare an instance of it identified by a standard name. The idea is to provide a simple out-of-the-box solution to most users. Users are free to create more instances of these exchange types with their own names. Also as mentioned in the previous post, users can create different exchange types and instances of them.

It is important to note that with any exchange type, a message can be matched with more than one queue if two or more queues are bound with the same routing criteria.

Direct Exchange
The exchange does a direct match between the routing key provided in the message and the routing criteria used when a queue is bound to this exchange.






(Click on image)

The most common use case is to bind the queue to the exchange using the queue name. However it is important to note that you could use any value for the binding.

A broker is required to provide an instance of this exchange named "amq.direct". The Nameless Exchange is a special instance of the above exchange type where all queues are bound to this exchange automatically using the queue name as the routing criteria. This exchange instance has no public name, hence messages sent without specifying an exchange name are directed to this exchange.

Topic Exchange
The exchange does a wildcard match between the routing key and the routing pattern specified in the binding. The routing key is treated as zero or more more words, delimited by '.' and supports special wildcard characters. "*" matches a single word and '#' matches zero or more words.






(Click on image)

A broker is required to provide an instance of this exchange named "amq.topic".

Fanout Exchange
Queues are bound to this exchange with no arguments. Hence any message sent to this exchange will be forwarded to all queues bound to this exchange.


(Click on image)
  • One use case, is to use exchange chaining in a tree like hierarchy that can be used to push messages to a large number of subscribers.
  • Another use case is where a direct exchange or a topic exchange can do the initial filtering which then forwards the message to a fannout exchange which will push the messages to all it's queues.
A broker is required to provide an instance of this exchange named "amq.fanout".

Headers Exchange
Queues are bound to this exchange with a table of arguments containing headers and values (optional). A special argument named "x-match" determines the matching algorithm, where "all" implies an AND (all pairs must match) and "any" implies OR (at least one pair must match).


(Click on image)

A broker is required to provide an instance of this exchange named "amq.match".

How AMQP Supports Common Messaging Use Cases
The most common messaging use cases are point-to-point (or store and forward) and publisher/subscriber models. These models can be easily built on top of AMQP.

Point-to-Point
routing_key == queue_name

Pub/Sub
routing_key == topic_heirarchy_value

Next Part : Part5 - Lets look at some code - Python examples

Prev Part : Part3 - Flexible Routing Model

AMQP in 10 mins : Part3 - Flexible Routing Model

Background
Most pre-AMQP models had several issues with their routing models.
  • Opaque routing models that were not explicitly defined.
  • Since the semantics are not visible or explicit manipulating the routing model through the protocol was difficult.
  • Rigid monolithic routing engines that had limited or no extensibility or compose-ability.
The AMQP Routing Model
One of AMQP 's primary goals was to define a flexible, extensible and transparent routing model where the semantics are explicitly defined. This permits the definition of management commands to manipulate the routing model. The AMQP model consists of three components
  • Exchange
  • Queue
  • Binding
AMQP defines a set of rules on how to compose these components in to processing chains. The routing model is analogues to how email works. The following diagram illustrates the routing model from a publisher and consumer's point of view.








(Click on image)



Exchange
This is analogues to a Mail Transfer Agent. Queues (or other exchanges) are bound to an exchange using a 'Binding'. A publisher sends a message to an exchange. The exchange will accept the message and routes it to one or more queues (or another exchange) based on the bindings. An exchange completely decouples a publisher from queues and the consumers that consumes from those queues.

An exchange type defines a routing algorithm to match the bindings with a given message. Hence an exchange type represents a class of routing algorithm. An instance of an exchange type can be thought if as an instance of a routing algorithm. A broker can have multiple instances of an exchange type which are identified by there name. An exchange instance can have the following properties.
  • Durable/Temporary
  • Auto-Delete
Queue
This is analogues to a mail box. A queue will store the messages in memory or disk and deliver them to consumers. A queue binds itself to an exchange using a 'Binding' which describes the criteria for the type of messages it is interested in. Queues can have the following properties,
  • Durable/Temporary
  • Shared/Private (exclusive)
  • Auto-Delete
Binding
This is analogues to a Routing Table. A binding defines the relationship between an exchange and a queue. In other words it defines the routing criteria. The most simple case is where the binding equals the queue name. A binding decouples a queue from an exchange. The same queue can be bound to any number of exchanges using the same criteria or different criteria. Different queues can be bound to the same exchange using the same routing criteria as well.

Routing Key
Is a special field (Header) present in the Message Delivery Properties. It can be thought of as a virtual address, analogues to a 'To' field in an email. An exchange may use this field to route a message. The standard exchange types defined in AMQP use the routing key in different ways to route messages.

Standard Exchange Types
AMQP defines several standard exchange types that are described in detail in the next blog entry.

Extending The Routing Model
One can define new exchange types with arbitrary routing criteria (routing algorithms). For example one can define an exchange that routes messages based on content (content based routing). Thus AMQP provides a standard way of extending the routing model without impacting interoperability.

Next Part : Part4 - Standard Exchange Types And Supporting Common Messaging Use Cases

Prev Part : Part2 - Achieving Interoperability And Avoiding Vendor Lock-in

Friday, October 12, 2007

AMQP in 10 mins : Part2 - Achieving Interoperability And Avoiding Vendor Lock -in

Background
One of the key issues with any software is non-interoperability and vendor lock in. Most messaging systems prior to AMQP did not interoperate with each other. For example messages from Tibco's Rendezvous couldn't be routed through IBM's MQSeries. If two messaging systems need to be connected, there are two options.
  • Using a message bridge you could convert from one format to the other. However a bridge would be slow as the conversion adds latency. Also you would need to understand the wire format of each of those systems.
  • Replacing one system with the other, which is costly and risky. Downtime can have a severe impact on the company's revenue model.
Therefore once a messaging system is chosen, users are reluctant to change and locked in with the same vendor while spending large sums of money as licensing costs.

What if we have messaging systems(from different vendors) that can understand each other? If so connecting two messaging systems or replacing one system with the other can be done with minimum costs and risk. Since the semantics (behaviour) are the same the chance of something going wrong is relatively low. This is a key goal for the AMQP protocol.

So what does it take to achieve interoperability and avoid vendor lock-in?
  • All brokers need to behave the same way
  • All clients need to behave the same way
  • Use a standard for commands on the wire
  • Use a language neutral type System
  • Use open standards and permit royalty free usage of such a protocol.
AMQP satisfies the above requirements by
  • Defining a network wire-level protocol
  • A defined set of messaging capabilities (The AMQP Model)
  • A simple language neutral type system
  • Using open, existing, unencumbered, widely implemented standards
  • Providing royalty free usage of the protocol
Broker semantics are defined explicitly. One can partially imply the semantics from the wire-protocol. However we need to define the semantics explicitly in order to guarantee exact behaviour in each broker/client implementation. So the protocol defines a set of commands to manipulate state in a peer. These commands are grouped by functionality into classes. For example the Queue class has various methods to manipulate state within a broker.
Ex. queueDeclare, queueBind, queuePurge, queueDelete & queueQuerry

More details on the wire protocol will be discussed later on. The next few posts will focus on discussing the semantic model.

Next Part : Part3 - Flexible Routing Model

Prev Part : Part1 - Introduction