Some Background

At Wayfair, we are working on the next generation of systems to power our business. The decade-old systems that currently keep us running in stride have allowed Wayfair.com to vault from nothing to where it is today. But, as with all systems, they have started to show their age.

Until just a few years ago, our entire engineering team was fewer than 40 people. At that size, we could all easily work on the single shared code base that our entire business ran on. This monolithic architecture was a conscious decision, given our bootstrapped startup beginnings.

The good news is that the ‘single code base’ approach successfully bootstrapped us to millions of customers and products, 400 million dollars in revenue, and ‘a zillion things home’ (wink) over 9 years. We’re proud to say that we accomplished all of this with a small, sharp team of engineers. We are a successful startup, and sooner or later every successful startup has to go through step-function changes in its technology. For Wayfair.com, that time is now. Call it technical debt, re-architecture, version 2.0, whatever you want. The reality is that we need to change our approach. If we want our already large business to keep growing, we need teams that can work in parallel, and for that to work, we need to decouple and distribute our organization.

One of our initial steps was to reorganize the Engineering department into ‘platform teams’: small, efficient groups of domain-specific spec-ops developers. Each platform team is responsible for the maintainability, scalability, and evolution of a discrete segment of the business. We are excited about this change and can already see the scaling advantages of this organizational model.

This change has exposed an issue at the core of our software architecture. Prior to the Great Platforms Restructure of 2012, we could think of our code base as one big system that was either working or not working. The complexities of graceful degradation and multiple failure modes could be ignored. But no more! We need to ‘un-ignore’ these complexities and take them into account as we develop.

We can look around at all sorts of large internet success stories and see that the obvious solution is to move to a distributed services architecture. The high-level plan turns out to be the easy part. The hard part is the details.

Look at What Others Have Done

There are two leading paradigms for web services in production today: SOAP (Simple Object Access Protocol) and REST (Representational State Transfer).

SOAP

SOAP, as it is typically used, is tantamount to Remote Procedure Call (RPC), a concept in computer science that allows a program on one machine to invoke a function or subroutine on a separate machine. With RPC, programmers create innumerable specific procedures that are accessible from any node in a distributed system, a truly attractive concept. Despite the physical gap between two machines, RPC hides the ‘distributed’ part of a ‘distributed system’ behind a highly malleable, customized protocol. One downside of this customized protocol is the tight coupling between client and service, which reduces serendipitous reuse and impedes total scalability of a system.
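To make the RPC idea concrete, here is a minimal sketch using Python's standard-library XML-RPC support. XML-RPC is a simpler predecessor of SOAP, swapped in here only because it needs no external tooling. The `get_inventory` procedure and its data are hypothetical, and the network hop is simulated by handing the request string straight to the dispatcher so the example runs on its own.

```python
import xmlrpc.client

# Hypothetical server-side data and procedure; names are illustrative only.
INVENTORY = {"TH1254": "in_stock"}


def get_inventory(sku):
    return INVENTORY.get(sku, "unknown")


# The server publishes a table of named procedures.
PROCEDURES = {"get_inventory": get_inventory}


def server_dispatch(request_xml):
    """Unmarshal a call, run the named procedure, and marshal the reply."""
    params, methodname = xmlrpc.client.loads(request_xml)
    # Tight coupling: the client must know this exact name and argument order.
    result = PROCEDURES[methodname](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)


def call_remote(methodname, *params):
    """Client stub: makes a remote procedure look like a local function call."""
    request_xml = xmlrpc.client.dumps(params, methodname=methodname)
    # In a real system this XML would travel over HTTP; here it is passed
    # directly to the dispatcher so the example needs no network.
    response_xml = server_dispatch(request_xml)
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result


print(call_remote("get_inventory", "TH1254"))  # prints in_stock
```

The client has to name the exact procedure and pass arguments in the order the server expects. Renaming `get_inventory` or changing its signature on the server breaks every caller, which is the client-service coupling described above.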

We all felt that SOAP was too heavyweight and came with too much overhead. Plus, what is this, the 20th century? Just kidding. Wayfair is a fast-paced business, so this client-service coupling could come at a very high price: because these interfaces are specialized, they would constantly need to change! ‘RPC impedes total scalability of a system’ is a bold conclusion to draw, but it is one partially supported by the largest distributed system known to mankind: the World Wide Web.

REST

REST is an architectural style that characterizes what makes the WWW the world’s largest distributed system. To our team, it felt lightweight and approachable. We did, though, struggle with what a resource should be in the context of our application, and resources are central to REST architectures.

Any information that can be named can be a resource: a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author’s hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.

- Roy Fielding, Architectural Styles and the Design of Network-based Software Architectures (2000)

With this in mind, and given our database-centric history, our initial reaction was to make every database table a resource. Historically, we’d been extremely thoughtful about our database structure: we had very clean data with minimal duplication. However, we’ve got 2 million lines of code and over 6,000 database tables. ‘Resource’-ing our application with a relational database attitude quickly started to feel overwhelming and misdirected. It soon became apparent that, because we do a ton of complex joins in SQL, a resource-per-table approach would mean trying to reproduce the highly tuned join capabilities of the best SQL databases in PHP. Not a good idea.
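A small sketch of why the resource-per-table idea breaks down, written in Python for illustration. The `ORDERS` and `CUSTOMERS` resources, their fields, and the request counter are hypothetical, and each "HTTP GET" is simulated with a dictionary lookup. The point is that a join the database performs in a single query turns into one round trip per row per table in application code.

```python
# Hypothetical table-per-resource endpoints, simulated with in-memory dicts.
ORDERS = {
    1: {"order_id": 1, "customer_id": 10},
    2: {"order_id": 2, "customer_id": 11},
    3: {"order_id": 3, "customer_id": 10},
}
CUSTOMERS = {
    10: {"customer_id": 10, "name": "Ada"},
    11: {"customer_id": 11, "name": "Grace"},
}

request_count = 0


def get_resource(table, key):
    """Stand-in for one HTTP GET against a table-shaped resource."""
    global request_count
    request_count += 1
    return table[key]


def orders_with_customer_names(order_ids):
    """Re-implements an ORDERS JOIN CUSTOMERS query in application code."""
    rows = []
    for order_id in order_ids:
        order = get_resource(ORDERS, order_id)
        customer = get_resource(CUSTOMERS, order["customer_id"])
        rows.append({"order_id": order_id, "customer_name": customer["name"]})
    return rows


rows = orders_with_customer_names([1, 2, 3])
print(rows)
print(request_count)  # 6 round trips for what SQL does in one query
```

Batching or caching can reduce the round trips, but the application is still doing the database's job, and real queries join far more than two tables.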

Go with your Gut

Our thinking was something to the effect of: let’s start by putting our core functionality (mainly Classic ASP functions and stored procedures in SQL Server) behind URLs.

We’ll illustrate with one of the first services we created, the inventory service.

Millions of times a day, we look up the stock status of products in our catalog. The basic use case: given a SKU, return its stock status. Our service-less version of this was a SQL Server stored procedure, called like this:

EXECUTE spGetInventory @SKU = 'TH1254'

This approach, which we had grown comfortable with, is super simple to use as inline code.

Our first pass at moving this behind a service looked like this:

http://services.wayfair.com/get_inventory.php?sku=TH1254&format=json
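Roughly what might sit behind that URL, sketched in Python rather than the PHP the team used: the script reads `sku` and `format` from the query string, calls the existing lookup, and serializes the result. The `sp_get_inventory` stand-in, its in-memory data, and the response fields (`in_stock`, `quantity`) are assumptions for illustration.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for spGetInventory; the real service called SQL Server.
STOCK = {"TH1254": 12}


def sp_get_inventory(sku):
    return STOCK.get(sku, 0)


def get_inventory(url):
    """Handle a request shaped like /get_inventory.php?sku=...&format=json.

    Returns an (HTTP status code, response body) pair.
    """
    query = parse_qs(urlsplit(url).query)
    sku = query.get("sku", [None])[0]
    fmt = query.get("format", ["json"])[0]
    if sku is None:
        return 400, "missing sku parameter"
    quantity = sp_get_inventory(sku)
    result = {"sku": sku, "in_stock": quantity > 0, "quantity": quantity}
    if fmt == "json":
        return 200, json.dumps(result)
    # Any other format falls back to a plain-text line in this sketch.
    return 200, f"{sku} in_stock={result['in_stock']} quantity={quantity}"


status, body = get_inventory(
    "http://services.wayfair.com/get_inventory.php?sku=TH1254&format=json"
)
print(status, body)
```

The procedure name has simply moved from the `EXECUTE` statement into the URL path, which is why this first pass still behaves like RPC over HTTP.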

At this point we were pretty proud of ourselves. We felt like we were moving in the services direction. We agreed, as a team, that all services would support a common set of query string parameters, such as format, debug, verbosity, logging, and help.
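One way such a convention might be shared across services is a small helper that every endpoint calls first. This Python sketch keeps the first value of each parameter and uses string-valued defaults; the default values themselves are hypothetical, since the post does not specify them.

```python
from urllib.parse import parse_qs

# Hypothetical defaults for the shared parameters every service agreed to accept.
COMMON_DEFAULTS = {
    "format": "json",
    "debug": "0",
    "verbosity": "1",
    "logging": "0",
    "help": "0",
}


def split_common_params(query_string):
    """Split a query string into (shared parameters, service-specific parameters)."""
    # Keep only the first value of each parameter for simplicity.
    parsed = {name: values[0] for name, values in parse_qs(query_string).items()}
    # pop() removes each shared parameter, leaving only service-specific ones.
    common = {name: parsed.pop(name, default) for name, default in COMMON_DEFAULTS.items()}
    return common, parsed


common, specific = split_common_params("sku=TH1254&format=xml&debug=1")
print(common)
print(specific)
```

Each service then only has to handle its own leftover parameters, such as `sku` for the inventory service.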

Fast forward a few months, and we start to adopt this approach in earnest. As we move from talking directly to the database to talking through a service, we find ourselves basically wrapping what had previously been stored procedures. It starts to look like this:

http://services.wayfair.com/create_customer.php
http://services.wayfair.com/get_customer.php
http://services.wayfair.com/update_customer.php
http://services.wayfair.com/delete_customer.php
http://services.wayfair.com/create_basket.php
http://services.wayfair.com/get_basket.php
http://services.wayfair.com/update_basket.php
http://services.wayfair.com/delete_basket.php
http://services.wayfair.com/create_order.php
….

You get the gist. We provide the standard CRUD operations through URLs. However, our business is complicated, so on top of the CRUD operations we start doing things like this:

http://services.wayfair.com/does_order_have_returns.php
http://services.wayfair.com/is_order_cancellable.php
http://services.wayfair.com/apply_promotion_code.php
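A quick way to see the problem: the CRUD scripts map mechanically onto one resource URL plus an HTTP method, but the business-specific ones do not. This Python sketch is purely illustrative; the naive "add an s" pluralization and the `/{entity}s/{id}` path layout are assumptions, not Wayfair's actual URL scheme.

```python
# Map the CRUD verb prefix of a script name onto an HTTP method and path template.
# Doubled braces survive str.format() as a literal "{id}" placeholder.
CRUD_TO_HTTP = {
    "create": ("POST", "/{entity}s"),
    "get": ("GET", "/{entity}s/{{id}}"),
    "update": ("PUT", "/{entity}s/{{id}}"),
    "delete": ("DELETE", "/{entity}s/{{id}}"),
}


def to_uniform_interface(script_name):
    """Return (method, path) for a CRUD-style script, or None if it does not fit."""
    stem = script_name.removesuffix(".php")
    verb, _, entity = stem.partition("_")
    if verb not in CRUD_TO_HTTP or not entity:
        return None
    method, template = CRUD_TO_HTTP[verb]
    return method, template.format(entity=entity)


for script in ["create_customer.php", "get_basket.php", "create_order.php",
               "is_order_cancellable.php", "apply_promotion_code.php"]:
    print(script, "->", to_uniform_interface(script))
```

The scripts that come back as `None` are exactly the business operations that a verb-per-script naming scheme keeps multiplying.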

On one hand, we were happy because we were clearly separating our data storage from our application logic, creating a path toward one day changing our database structure without directly affecting the application. On the other hand, we weren’t actually making the code any simpler, improving its organization, or separating functionality into more discrete areas. We were just moving complexity around.

So we decided to give our services a pinch of RESTful-ness.

One of the stages of adopting something new always seems to involve fighting to make it look like the old thing you were comfortable with. Our first step toward a more RESTful approach is described in our earlier blog post: “REST, REST, REST, you can never get enough REST”.

The path we were on was “REST-like” because it sent HTTP messages, was stateless, and returned loosely structured data. Still, we were not convinced that our “REST variant” was the right long-term path. We did not comply with the RFC that formally describes HTTP, so we weren’t RESTful. So we revisited our approach and, as we constantly do in software engineering, we evolved.

We have evolved

At some point ‘comfortable’ paradigms wear you out, and you have that aha moment of clarity where you start to understand a different paradigm.

We’re happy to say we’re over the hump of fighting some of the “pure” concepts of REST and have chosen to adopt them. In particular, we’ve made a few very important changes to our home-grown REST approach. We’ve embraced HTTP and are using a broader set of the features it was designed to enable. Namely, we’ve standardized on:

  1. Identification of resources in our system (Resource Addressing): We have started to create resources that conform to our business domain, e.g. Orders, Order Products, etc. That is, resources have started to represent our business domain objects and business logic.
  2. Uniform Interface (GET, POST, PUT, DELETE): We are conforming to the method semantics defined by HTTP/1.1 instead of encoding operations in URLs such as create_customer.php.
  3. HATEOAS (Linking it all together): Resources are now defined across platform boundaries, and links to these resources ensure that we are not duplicating data unnecessarily (e.g. the Order representation links to the Customer resource representation).
  4. Self-descriptive messages (Headers, Response Codes): Using HTTP headers and response codes lets our messages describe themselves.
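A minimal sketch of how the four points above can fit together for a single resource, in Python with only the standard library. The `/orders/{id}` layout, the field names, and the `services.example.com` host are placeholders rather than Wayfair's real API, and the "server" is a plain function instead of a web framework. It shows a business-domain resource (1), dispatch on HTTP methods rather than verbs in the URL (2), a representation that links to the customer resource instead of embedding it (3), and status codes and headers that describe the response (4).

```python
import json
from http import HTTPStatus

# Hypothetical host and in-memory store; in practice orders and customers
# would belong to different platform teams.
BASE_URL = "http://services.example.com"
ORDERS = {"1001": {"id": "1001", "customer_id": "42", "status": "placed"}}


def order_representation(order):
    """Represent an order, linking to its customer instead of copying it."""
    return {
        "id": order["id"],
        "status": order["status"],
        "links": {
            "self": f"{BASE_URL}/orders/{order['id']}",
            "customer": f"{BASE_URL}/customers/{order['customer_id']}",
        },
    }


def handle(method, path, body=None):
    """Dispatch on HTTP method and resource path; return (status, headers, body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "orders":
        return HTTPStatus.NOT_FOUND, {}, ""
    order_id = parts[1]
    json_headers = {"Content-Type": "application/json"}

    if method == "GET":
        order = ORDERS.get(order_id)
        if order is None:
            return HTTPStatus.NOT_FOUND, {}, ""
        return HTTPStatus.OK, json_headers, json.dumps(order_representation(order))

    if method == "PUT":
        # PUT creates or replaces the order stored at this URL.
        created = order_id not in ORDERS
        ORDERS[order_id] = {**json.loads(body), "id": order_id}
        status = HTTPStatus.CREATED if created else HTTPStatus.OK
        return status, json_headers, json.dumps(order_representation(ORDERS[order_id]))

    if method == "DELETE":
        if ORDERS.pop(order_id, None) is None:
            return HTTPStatus.NOT_FOUND, {}, ""
        return HTTPStatus.NO_CONTENT, {}, ""

    # Tell the client which methods this resource does support.
    return HTTPStatus.METHOD_NOT_ALLOWED, {"Allow": "GET, PUT, DELETE"}, ""


status, headers, body = handle("GET", "/orders/1001")
print(status.value, json.loads(body)["links"]["customer"])
```

Creating orders via POST on the `/orders` collection is left out to keep the sketch short; a client following the `customer` link would reach a resource owned by another platform team.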