The SQL Developer’s Guide to REST Services

This is a practical guide (with lots of examples) to help SQL developers quickly learn the basics of RESTful Web Services.

Data Storage: Tables versus Resources

Both SQL and RESTful Web Services are centered around data.

In SQL, data is stored in tables; in REST Services, it is exposed as resources.

For example, in a database you could have a customer table:

SQL> SELECT * FROM customer;

ID FIRST_NAME LAST_NAME OCCUPATION
-- ---------- --------- -----------
1  Luke       Skywalker Jedi Master
2  Leia       Organa    Princess

In REST Services, you would have a /customers resource instead of a customer table.

For example, if you want to get all customers (similar to the SQL statement above), you do it like this:

GET /customers

The response to this request would be a JSON array with an object for each customer:

[
  {
    "id": 1,
    "firstName": "Luke",
    "lastName": "Skywalker",
    "occupation": "Jedi Master"
  },
  {
    "id": 2,
    "firstName": "Leia",
    "lastName": "Organa",
    "occupation": "Princess"
  }
]

CRUD Operations

The HTTP methods used for RESTful Web Services map neatly to the common SQL statements:

CRUD Operation  HTTP Method  SQL Statement
--------------  -----------  -------------
Create          POST         INSERT
Read            GET          SELECT
Update          PUT / PATCH  UPDATE
Delete          DELETE       DELETE

The following sections will explain each of them in more detail.

Create

To create a new customer, you use the INSERT statement in SQL. For example:

INSERT INTO customer (first_name, last_name, occupation) 
     VALUES ('Han', 'Solo', 'Smuggler');

In REST, you create a new customer by sending a POST request with the new customer as a JSON object:

POST /customers

{
  "firstName": "Han",
  "lastName": "Solo",
  "occupation": "Smuggler"
}

Read

To read data in SQL, you use the SELECT statement.

For example, to get a complete list of all customers, you simply call:

SELECT * FROM customer;

The corresponding HTTP method is GET, which you can call like this to get the same result:

GET /customers

If you want to look up a specific customer using the primary key, you would do it like this in SQL:

SELECT * FROM customer WHERE id = 2;

In REST you would append the id to the REST resource:

GET /customers/2

But what if you want to look up something using a non-primary key?

In SQL you would just add a WHERE clause to your SELECT statement:

SELECT * FROM customer WHERE first_name = 'Luke';

In REST, you append a query parameter to the GET request:

GET /customers?firstName=Luke

Note: The specific query parameters available depend on the REST service you are using.
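
Since the query string is plain URL encoding, a client can build it with standard tooling. Here is a small Python sketch; the parameter names are just examples, as noted above:

```python
from urllib.parse import urlencode

# Build the query string for a filtered customer lookup.
# The parameter names (firstName, occupation) are whatever the
# particular REST service defines -- they are not standardized.
params = {"firstName": "Luke", "occupation": "Jedi Master"}
query = urlencode(params)  # handles escaping, e.g. spaces become '+'
url = "/customers?" + query
print(url)  # /customers?firstName=Luke&occupation=Jedi+Master
```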

You may want to limit the number of fields returned by a query, because you don’t need to display all the fields, or because you want to improve performance.

In SQL, you just specify what columns should be returned:

SELECT first_name, last_name FROM customer;

In REST, you request a partial response:

GET /customers?fields=firstName,lastName

Note: Partial responses are not available in all RESTful Web Services, but usually in those where performance is key. For example, mobile apps that may need to operate in an environment with limited bandwidth.
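
How a service implements partial responses varies, but the idea can be sketched in a few lines of Python (the comma-separated fields syntax here mirrors the example above and is hypothetical):

```python
def partial(representation, fields_param):
    """Return only the requested fields of a representation.

    Sketch of how a server might honor a ?fields= query parameter;
    real services each define their own syntax for this.
    """
    wanted = fields_param.split(",")
    return {k: v for k, v in representation.items() if k in wanted}

customer = {"id": 1, "firstName": "Luke",
            "lastName": "Skywalker", "occupation": "Jedi Master"}
print(partial(customer, "firstName,lastName"))
# {'firstName': 'Luke', 'lastName': 'Skywalker'}
```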

Update

If you want to update all columns on a customer via SQL, you use the UPDATE statement:

UPDATE customer
   SET id = 2, 
       first_name = 'Leia', 
       last_name = 'Organa', 
       occupation = 'General'
 WHERE id = 2;

In REST, you do the same by using the PUT method:

PUT /customers/2

{
  "id": 2, 
  "firstName": "Leia", 
  "lastName": "Organa", 
  "occupation": "General"
}

But what if you only want to update some of the fields?

In SQL you simply limit the fields to those you want to update:

UPDATE customer
   SET occupation = 'General'
 WHERE id = 2;

In REST, you use the PATCH method:

PATCH /customers/2

{
  "occupation": "General"
}

Note: The difference between PUT and PATCH is that PUT replaces the entire resource, which makes it idempotent. This fancy word basically means that you get the same result no matter how many times the request is executed. This is important in network traffic, because if you’re in doubt whether your request has been lost during transmission, you can just send it again without worrying about messing up the resource’s data.
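
As a quick illustration, here is a minimal in-memory sketch of the two semantics (plain Python dictionaries standing in for resources, not any real framework):

```python
def put(resource, representation):
    # PUT: the new representation replaces the resource completely.
    return dict(representation)

def patch(resource, changes):
    # PATCH: only the supplied fields are modified.
    return {**resource, **changes}

leia = {"id": 2, "firstName": "Leia", "lastName": "Organa",
        "occupation": "Princess"}

full = {"id": 2, "firstName": "Leia", "lastName": "Organa",
        "occupation": "General"}

# Sending the same PUT twice gives the same result (idempotent):
once = put(leia, full)
twice = put(once, full)
assert once == twice

# PATCH only touches the fields it names:
patched = patch(leia, {"occupation": "General"})
assert patched["firstName"] == "Leia"
assert patched["occupation"] == "General"
```

Note that this merge-style PATCH happens to be idempotent too; PATCH in general is not guaranteed to be (think of a patch that appends to a list).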

Delete

If you need to delete a customer, you use the DELETE statement in SQL:

DELETE FROM customer WHERE id = 2;

Similarly, in REST you use the DELETE method:

DELETE /customers/2

That’s it! This is my attempt to map the key concepts in RESTful Web Services to the corresponding key concepts in SQL. If you understand these, you already have a pretty good head start towards learning REST Services.

What are RESTful Web Services?

To put it mildly, the World Wide Web was an unexpected success.

What had started out as a convenient way for research labs to connect with each other suddenly exploded in size. Jakob Nielsen estimated that between 1991 and 1997 the number of web sites grew by a staggering 850% per year!

This incredible growth worried some of the early web pioneers, because they knew that the underlying software was never designed with such a massive number of users in mind.

So they set out to define the web standards more clearly, and enhance them so that the web would continue to flourish in this new reality where it was suddenly the world’s most popular network.

One of these web pioneers was Roy Fielding, who set out to look at what made the web’s software so successful in the first place and where it was lacking, and in his fascinating PhD dissertation he formalized his findings into six constraints, which he collectively called REpresentational State Transfer (REST).

Fielding’s observation was that if your architecture satisfies these six constraints then it will exhibit a number of desirable properties (like scalability, decoupling, simplicity), which are absolutely essential in an Internet-sized system.

His idea was that the constraints should be used as a checklist to evaluate new potential web standards, so that poor design could be spotted early, and way before it was suddenly deployed to millions of web servers.

He successfully used the constraints to evaluate new web standards, such as HTTP 1.1 (where he was one of the principal authors) and URI (where he was also one of the authors). These standards have both stood the test of time, despite the immense pressure of being essential protocols on the web and used by billions of people each day.

So a natural question to ask is: if following these REST constraints leads to such great systems, why only use them for browsers and web sites? Why not also create web services that conform to them, so we can enjoy the desirable properties that they lead to?

This thinking led to the idea of RESTful Web Services, which are basically web services that satisfy the REST constraints, and are therefore well-suited for Internet-scale systems.

So what are these six REST constraints?

1. Client-Server

The first constraint is that the system must be made up of clients and servers.

Servers have resources that clients want to use. For example, a server has a list of stock prices (i.e. a resource) and the client would like to display these prices in some nice graphs.

There is a clear separation of concerns between the two. The server takes care of the back-end stuff (data storage, business rules, etc.) and the client handles the front-end stuff (user interfaces).

The separation means that there can be many different types of clients (web portals, mobile apps, BPM engines, etc.) that access the same server, and each of these can evolve independently of the other clients and the server (assuming that the interface between the clients and server is stable).

The separation also seriously reduces the complexity of the server, as it doesn’t need to deal with UI stuff, which improves scalability.

This is probably the least controversial constraint of REST as client-server is so ubiquitous today that we almost forget that there are other styles to consider (like event-based protocols).

It is important to note that while HTTP is almost always used when people develop RESTful Web Services, there is no constraint that forces us to use it. We could use FTP as the underlying protocol if we really wanted, even though intellectual curiosity is probably the only good reason for trying that.

2. Stateless

To further simplify interactions between clients and servers, the second constraint is that the communication between them must be stateless.

This means that all information about the client’s session is kept on the client, and the server knows nothing about it (so no cookies, session variables, or other naughty stuff!). The consequence is that each request must contain all the information necessary to perform it (i.e. it cannot rely on any context information).

The stateless constraint simplifies the server, as it no longer needs to keep track of client sessions or hold resources between requests, and it does wonders for scalability because the server can free resources as soon as each request is finished.

It also makes the system easier to reason about as you can easily see all the input data for a request and what output data it resulted in. You no longer need to lookup session variables and other stuff that makes the system harder to understand.

In addition, it will also be easier for the client to recover from failures, as the session context on the server has not suddenly gotten corrupted or out of sync with the client. Roy Fielding even goes as far as writing in an old newsgroup post that reliance on server-side sessions is one of the primary reasons behind failed web applications and on top of that it also ruins scalability.

So far nothing too controversial in the constraints. Many RPC implementations could probably satisfy both the Client-Server and Stateless constraints.

3. Cache

The last constraint on the client-server communication is that responses from servers must be marked as cacheable or non-cacheable.

An effective cache can reduce the number of client-server interactions, which contributes positively to the performance of the system (at least as perceived by the user).

Protocols like SOAP, which only use HTTP as a convenient way to get through firewalls (by using POST for all requests), miss out on the improved performance of HTTP caching, which reduces their performance (and also slightly undermines the basic purpose of a firewall).

4. Uniform Interface

What really separates REST from other architectural styles is the Uniform Interface enforced by the fourth constraint.

We don’t usually think about it, but it’s pretty amazing that you can use the same Internet browser to read the news and to do your online banking, despite these being fundamentally different applications. You don’t even need a browser extension to do any of this!

We can do this because the Uniform Interface decouples the interface from the implementation, which makes interactions so simple that it’s easy for somebody familiar with the style to understand it, even automatically (like Googlebot).

The Uniform Interface constraint is made up of four sub-constraints:

4.1 Identification of Resources

The REST style is centered around resources. This is unlike SOAP and other RPC styles that are modeled around procedures (or methods).

So what is a resource? A resource is basically anything that can be named: from a static picture to a feed with real-time stock prices.

But in enterprise software the resources are usually the entities from the business domain (i.e. customers, orders, products, etc.). On an implementation level, it is often the database tables (with business logic on top) that are exposed as resources. But you can also model a business process or workflow as a resource.

Each resource in a RESTful design must be uniquely identifiable via a URI (Uniform Resource Identifier), and the identifier must remain stable even when the underlying resource is updated (i.e. “Cool URIs don’t change”).

This means that each resource you want to expose through a RESTful web service must have its own URI. Normally, you would use the first URI below to access a collection of resources (i.e. several customers) and the second URI to access a specific resource inside that collection (i.e. a specific customer):

1) https://api.example.com/customers
2) https://api.example.com/customers/932612

Some well-known APIs that claim to be RESTful fail this sub-constraint. For example, Twitter’s REST API uses RPC-like URIs such as statuses/destroy/:id, and it’s the same with Flickr.

The problem is that they break the Uniform Interface requirement, which adds unnecessary complexity to their APIs.

4.2 Manipulation of Resources through Representations

The second sub-constraint in the Uniform Interface is that resources are manipulated through representations.

This means that the client does not interact directly with the server’s resource. For example, we don’t allow the client to run SQL statements against our database tables.

Instead, the server exposes a representation of the resource’s state. It can sound complicated, but it’s not.

It just means that we show the resource’s data (i.e. state) in a neutral format. This is similar to how the data for a web page can be stored in a database, but is always sent to the browser as HTML.

The most common format for RESTful web services is JSON, which is used in the body of the HTTP requests and responses:

{
  "id":12,
  "firstname":"Han",
  "lastname":"Solo"
}

When a client wants to update a resource, it gets a representation of that resource from the server, updates the representation with the new data, sends the updated representation to the server, and asks the server to update its resource so it corresponds to the new representation.

The benefit is that you avoid a strong coupling between the client and server (like with RMI in Java), so you can change the underlying implementation without affecting the clients. It also makes it easier for clients as they don’t need to understand the underlying technology used by each server that they interact with.

4.3 Self-Descriptive Messages

The third sub-constraint in the Uniform Interface is that each message (i.e. request/response) must include enough information for the receiver to understand it in isolation.

Each message must have a media type (for instance, application/json or application/xml) that tells the receiver how the message should be parsed.

HTTP is not formally required for RESTful web services, but if you use the HTTP methods you should follow their formal meaning, so the user won’t need to rely on out-of-band information to understand them (i.e. don’t use POST to retrieve data, or GET to save data).

So for the Customer URIs, which we defined earlier, we can expose the following methods for the client to use:

Task                         Method  Path
---------------------------  ------  ---------------
Create a new customer        POST    /customers
Delete an existing customer  DELETE  /customers/{id}
Get a specific customer      GET     /customers/{id}
Search for customers         GET     /customers
Update an existing customer  PUT     /customers/{id}

The benefit is that the four HTTP methods are clearly defined, so an API user who knows HTTP but doesn’t know our system can quickly guess what a service does just by looking at the HTTP method and URI path (i.e. if you hide the first column, a person who knows HTTP can guess what it says based on the other two columns).

Another cool thing about self-descriptive messages is that (similar to statelessness) you can understand and reason about a message in isolation. You don’t need out-of-band information to decipher it, which again simplifies things.

4.4 Hypermedia as the Engine of Application State

The fourth and final sub-constraint in the Uniform Interface is called Hypermedia as the Engine of Application State (HATEOAS). It sounds a bit overwhelming, but in reality it’s a simple concept.

A web page is an instance of application state, and hypermedia is text with hyperlinks. The hypermedia drives (i.e. is the engine of) the application state. In other words, we click on links to move to new pages (i.e. application states).

So when you are surfing the web, you are using hypermedia as the engine of application state!

So it basically means that we should use links (i.e. hypermedia) to navigate through the application. The opposite would be to take a customer ID from one service call and then use it as an input parameter to another service call.

It should work like a good web site where you just enter the URI and then you just follow the links that are provided on the web pages. You don’t need to know more than the initial URI.

For example, inside a customer representation there could be a links section with links to the customer’s orders:

{
  "id":12,
  "firstname":"Han",
  "lastname":"Solo",
  "_links": {
    "self": {
      "href":"https://api.example.com/customers/12"
    },
    "orders": {
      "href":"https://api.example.com/customers/12/orders"
    }
  }
}

The service can also provide the links in the Link HTTP header, and there is a standardized registry of link relation types (maintained by IANA), so we can use standardized meanings, which further helps the user.

An enormous benefit is that the API user doesn’t need to look in the API documentation to see how to find the customer’s orders, so he or she can easily explore while developing without having to refer to out-of-band API documentation.

It also means that the API user doesn’t need to hardcode (and manually construct) the URIs that he or she wants to call. It might sound like a trivial thing, but Craig McClanahan (co-designer of The Sun Cloud API) wrote in an informative blog post that in his experience 90% of client defects were caused by badly constructed URIs.
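
To illustrate, here is a small Python sketch of a client that follows the orders link from the representation above instead of constructing the URI by hand (the _links layout follows the earlier example; real services may use other conventions, such as the Link header):

```python
import json

# A response body shaped like the customer representation shown above.
response_body = """
{
  "id": 12,
  "firstname": "Han",
  "lastname": "Solo",
  "_links": {
    "self":   {"href": "https://api.example.com/customers/12"},
    "orders": {"href": "https://api.example.com/customers/12/orders"}
  }
}
"""

customer = json.loads(response_body)

# Follow the link by relation name instead of building the URI by hand:
orders_url = customer["_links"]["orders"]["href"]
print(orders_url)  # https://api.example.com/customers/12/orders
```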

Roy Fielding didn’t write that much about the hypermedia sub-constraint in his PhD dissertation (due to lack of time), but he later wrote a blog post where he clarified some of the details.

5. Layered System

The fifth constraint is another constraint on top of the Uniform Interface, which says that the client should only know the immediate layer it is communicating with, and not be aware of any layers behind it.

This means that the client doesn’t know if it’s talking with an intermediary or the actual server. So if we place a proxy or load balancer between the client and server, it wouldn’t affect their communications, and we wouldn’t need to update the client or server code.

It also means that we can add security as a layer on top of the web services, and then clearly separate business logic from security logic.

6. Code-On-Demand (optional)

The sixth and final constraint is the only optional constraint in the REST style.

Code-On-Demand means that a server can extend the functionality of a client at runtime, by sending it code that it should execute (like Java Applets or JavaScript).

I have not heard of any RESTful web services that actually send code from the server to the client (after deployment) and get it executed on the client, but it could be a powerful way to beef up the client.

A really nice feature of the simplicity enforced by these six constraints (especially the uniform interface and stateless interactions) is that the client code becomes really easy to write.

Most modern web frameworks can figure out what to do if we follow the conventions above, and they can take care of most of the boilerplate code for us.

For example, in the new Oracle JET toolkit, we simply need the JavaScript below to create a customer (and it would be just as easy in AngularJS):

// Only code needed to configure the RESTful Web Service
var Customer = oj.Model.extend({
  urlRoot: "http://api.example.com/customers",
  idAttribute: "id"
});

// Create a new customer representation
var customer = new Customer();
customer.attributes.firstName = "Han";
customer.attributes.lastName  = "Solo";

// Ask the server to save it
customer.save().then(function() {
  console.log("Saved!");
});

And it’s just as easy to call the other HTTP methods.

So the front-end engineer just needs to add a few more lines for an HTML form where the user can enter the values, and voilà, we have a basic web app!

And if we use one of the many gorgeous UI frameworks (like Twitter’s Bootstrap or Materialize, which is based on Google’s Material Design), we can quickly develop something really nice looking in a really short time.

That’s it for now. Thank you for reading and take good care until next time!

Boost Your REST API with HTTP Caching

It’s a core part of the REST architectural style to use caching!

That’s nice, you might think, but why should I use it?

Because it will allow you to show off against other API Designers by claiming that your REST services are twice as RESTful as theirs 😉

But more seriously, Roy Fielding, who invented the REST architectural style, didn’t add caching as a requirement just for the fun of it! He added it because it can seriously boost performance, which is also shown in the numbers in Tom Christie’s great post on performance tuning of Django REST services.

So, how do you get started with HTTP caching?

It’s only for HTTP GET requests!

At first it may seem like an overwhelming task to implement HTTP Caching; especially if you have already developed a huge number of services.

The good news is that you only need to think about caching for GET methods, as it doesn’t really make much sense to cache POST, PUT or DELETE responses.

The even better news is that if you simply specify the right HTTP header then the browser will do all the heavy lifting for you!

Code, please!

That’s all very nice! But can you please show us the code?

Definitely, but the only code you need is the Cache-Control header in your HTTP response. There are a number of directives in this header you can use to control the caching:

max-age

The maximum time that the cached response should be used (in seconds). The maximum value is one year.

Example:

Cache-Control: max-age=3600

Kyle Young writes that a rule of thumb is to use between 60 seconds and 1 hour for most content, but for pseudo-dynamic content, use less than 60 seconds (or don’t cache it at all).

s-maxage

This directive overrides max-age for shared caches, such as proxy servers. You usually have more control over the proxy cache than the client’s local cache, so you can use longer values here.

Example:

Cache-Control: max-age=0, s-maxage=3600

Thuva Tharma has some interesting thoughts on why s-maxage may be better than max-age.

public / private

Is the response specific to the client, so it cannot be reused for other clients? For example, /tasks/myTasks is client-specific.

If the response is client-specific, use private. Otherwise, use public (for most responses this matches the default behavior).

Examples:

Cache-Control: private, max-age=3600
Cache-Control: public, max-age=3600

no-store

Used for sensitive data (like credit card details) that must not be stored in caches or proxies under any circumstances.

Example:

Cache-Control: no-store

no-cache

The client must not use a cached response without first sending a conditional GET (with an ETag) to the server to check if the data has been updated in the meantime.

Example:

Cache-Control: no-cache

must-revalidate

If the cached response has expired, it must be revalidated at the server. HTTP may under some circumstances serve cached responses that have expired (for instance, under poor network connectivity), but this directive ensures that this won’t happen.

Example:

Cache-Control: max-age=3600, must-revalidate

proxy-revalidate

Same as must-revalidate, but for proxy servers.

Example:

Cache-Control: s-maxage=3600, proxy-revalidate

So let’s say that the client sends a request for some metadata, and we want the client to cache it for 1 hour:

GET /customers/metadata HTTP/1.1
Host: api.example.com
Accept: application/json
Accept-Language: en

To do this, we just add the Cache-Control header to our response:

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=3600
Content-Length: 88
ETag: "6d82cbb050ddc7fa9cbb659014546e59"

{
  "languageCodes": [
    {"da":"Danish"},
    {"no":"Norwegian"},
    {"en":"English"}
  ]
}

As you can see, it’s pretty easy to add caching to your RESTful services…

So if you have performance issues then HTTP caching could be the power tool you are looking for to seriously reduce your response times!
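
To make the mechanics concrete, here is a small Python sketch of the freshness decision a cache makes with max-age (real caches follow the full HTTP caching rules; this only handles the simple case):

```python
import re

def max_age_seconds(cache_control):
    """Extract the max-age directive from a Cache-Control header value."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None

def is_fresh(cache_control, age_seconds):
    """A cached response may be reused while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age

header = "public, max-age=3600"
assert is_fresh(header, 1800)       # 30 minutes old: serve from cache
assert not is_fresh(header, 7200)   # 2 hours old: go back to the server
```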

Avoid Data Corruption in Your REST API with ETags

There are few things worse than a really nasty data corruption issue.

Especially if it has occurred silently over a long period of time, so when it’s discovered it’s too late to rollback to a backup before the defect was introduced. It’s even worse if it has also occurred randomly, so there is no pattern to base your fix upon.

Yet an awful lot of REST APIs ignore concurrency control, so if they are used by multiple clients who modify the same data at the same time, it can lead to lost updates and stale deletes, which slowly ruin the data in the database.

This is totally unacceptable in most enterprise applications where data integrity is something you just don’t fool around with…

So how can you avoid this messed up situation?

You use the concurrency control designed into the HTTP protocol as a simple, yet effective way to protect the integrity of your data.

Meet the ETag Header

If you want to use the concurrency control in the HTTP protocol, you need to use the optional Entity Tag (ETag) header.

The ETag is kind of like a version stamp for a resource, and it’s returned as part of the HTTP response.

For example, if you send a request to get a specific customer:

GET /customers/987123 HTTP/1.1

Then the ETag (if used) is included in the header in the response:

HTTP/1.1 200 OK
Date: Sat, 30 Jan 2016 09:38:34 GMT
ETag: "1234"
Content-Type: application/json

{
  "id":987123,
  "firstname":"Han",
  "lastname":"Solo",
  "job":"Smuggler"
}

Each time the resource is updated on the server, the ETag header will be changed to reflect the content of the new version of the resource.

So to avoid lost updates, you simply take the value of the ETag header and put it into the If-Match header of the PUT request:

PUT /customers/987123 HTTP/1.1

If-Match: "1234"
Content-Type: application/json

{
  "id":987123,
  "firstname":"Han",
  "lastname":"Solo",
  "job":"General"
}

The PUT request above says that you want to update the customer resource on the server, but only if the ETag matches 1234, which ensures that the customer hasn’t been updated since you sent the GET request. In this way, your request won’t accidentally overwrite other users’ updates.

Over at the Server

When the server gets the PUT request, it will execute the logic below:

First, the server checks that the If-Match header is included in the request. If not, it tells the client that the resource cannot be updated without the If-Match header.

Second, the server checks if the resource actually exists, as it must exist before it can be updated.

Third, it checks if the ETag supplied in the If-Match header is the same as the latest ETag on the resource. If not, it tells the client that the precondition has failed.

Finally, if the request passes all three validations, the server updates its resource.

If the server uses the same approach on DELETE requests, you can also avoid stale deletes.

Note: The logic above assumes that the server doesn’t allow you to create new resources with a PUT request. If this is allowed then the first step should be to check if the resource exists, and if not then branch out and create it and skip the other steps.
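
The validation steps above can be sketched in a few lines of Python (in-memory dicts stand in for the database, and the status codes 428, 404 and 412 are one reasonable choice for the three rejections):

```python
def handle_put(store, etags, resource_id, if_match, new_representation):
    """Sketch of the validation steps for a PUT with If-Match.

    store and etags are plain dicts standing in for the database;
    returns an HTTP-style status code.
    """
    if if_match is None:
        return 428            # Precondition Required: If-Match is mandatory
    if resource_id not in store:
        return 404            # Not Found: can't update what doesn't exist
    if etags[resource_id] != if_match:
        return 412            # Precondition Failed: someone updated it first
    store[resource_id] = new_representation
    etags[resource_id] = str(int(etags[resource_id]) + 1)  # naive new version
    return 200

store = {987123: {"job": "Smuggler"}}
etags = {987123: "1234"}

# A stale ETag is rejected; the matching ETag goes through:
assert handle_put(store, etags, 987123, "9999", {"job": "General"}) == 412
assert handle_put(store, etags, 987123, "1234", {"job": "General"}) == 200
assert store[987123]["job"] == "General"
```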

Implementation Hints

There are several ways to implement ETags on the server.

One way is to make a hash of the resource and put it in the ETag header. But you need to make sure that the hash includes all updatable fields in the response, that hash collisions are sufficiently unlikely, and that the hashing algorithm doesn’t impact performance too much.

Another really simple way to implement ETags is to add a read-only etag column to the underlying database table, with a trigger that increases the value each time the row is updated. Of course, the server needs to be aware that the database changes this value behind the scenes, so that it always uses the latest version.
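
Here is a minimal sketch of the hashing approach in Python (assuming the representation is JSON-serializable; MD5 is used purely as a version marker, not for security):

```python
import hashlib
import json

def etag_for(representation):
    """Derive an ETag by hashing the resource's representation.

    Serialize deterministically (sorted keys) so the same state
    always produces the same tag, then hash the result.
    """
    canonical = json.dumps(representation, sort_keys=True)
    return '"%s"' % hashlib.md5(canonical.encode("utf-8")).hexdigest()

v1 = {"id": 987123, "firstname": "Han", "job": "Smuggler"}
v2 = {"id": 987123, "firstname": "Han", "job": "General"}

assert etag_for(v1) == etag_for(v1)   # stable for the same state
assert etag_for(v1) != etag_for(v2)   # changes when the state changes
```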

Why not timestamps instead of ETags?

An easier way to implement concurrency control in HTTP is to use the Last-Modified and If-Unmodified-Since headers instead of ETag and If-Match. The difference is simply that these two headers use timestamps instead of ETags.

So, if it’s easier why not use it?

The problem is that the timestamps use seconds as their finest precision, so if you have fast, high-frequency updates then there is a risk that two updates occur within the same second and you lose one of them.
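
You can see the problem directly: two updates a few hundred milliseconds apart produce identical Last-Modified values, because the HTTP date format has one-second resolution. A Python sketch:

```python
from datetime import datetime, timedelta
from email.utils import format_datetime

# Two updates 300 ms apart...
first  = datetime(2016, 1, 30, 9, 38, 34, 100000)
second = first + timedelta(milliseconds=300)

# ...produce the same Last-Modified value, because HTTP dates
# only carry whole seconds:
print(format_datetime(first))   # Sat, 30 Jan 2016 09:38:34 -0000
print(format_datetime(second))  # the exact same string
assert format_datetime(first) == format_datetime(second)
```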

So in enterprise software, where data integrity is an absolute requirement, ETags are the safe choice.

7 Tips for Designing a Better REST API

If you need to develop a REST API for a database-driven application, it’s almost irresistible to use the database tables as REST resources, the four HTTP methods as CRUD operations, and then simply expose your thinly-wrapped database as a REST API:

Operation  HTTP    SQL
---------  ------  ------
CREATE     POST    INSERT
READ       GET     SELECT
UPDATE     PUT     UPDATE
DELETE     DELETE  DELETE

The problem is that one of the foundations of the REST architecture is that the client-facing representation of a resource must be independent of the underlying implementation, and implementation details should definitely not be leaked to the client, which is all too easy with the database-driven approach.

It’s also important to ask yourself whether an almost raw database is the best interface you can offer your API users. I mean, there is already a near-perfect language for doing CRUD operations on database tables; it’s called SQL… And you probably have some business logic on top of those tables that your API users would appreciate not having to re-implement in their own code.

So how do you move beyond this database-oriented thinking and closer to a more RESTful design for your API?

Let’s find out…

1. Begin with the API User in Mind

Bestselling author and architect Sam Newman’s great book on microservices provides a powerful alternative to the database-driven approach for designing REST web services. It’s useful even if you don’t plan to use microservices.

Newman suggests that you divide your application into bounded contexts (similar to business areas). Each bounded context should provide an explicit interface for those who wish to interact with it. Implementation details of the bounded context that don’t need to be exposed to the outside world are hidden behind the interface.

You should use this explicit interface as the basis for your API design. Start by asking yourself what business capabilities the API user needs, rather than what data should be shared. In other words, ask yourself what does this bounded context do? and then ask yourself what data does it need to do that?

The promise is that if you postpone thinking about shared data until you know what business capabilities you need to offer, you will end up with a less database-oriented design.

I think his approach is a good way to jolt you out of the database-driven mindset, but you need to be careful that you don’t end up designing a REST-RPC hybrid.

What I also like about this approach is that it minimizes the interface and doesn’t expose all data by default, but hides internal data (like logging and configuration tables) from the client and instead focuses on what the client actually needs.

This also fits beautifully with veteran API designer Joshua Bloch’s maxim saying that When in doubt, leave it out (from his highly popular presentation on API design), and it also harmonizes with the REST principle that a representation of a resource doesn’t need to look like the underlying resource, but can be changed to make it easier for the client.

So feel free to think about what would be the easiest interface for the API user, and then let your resource take data from multiple tables and leave out columns that are irrelevant to the job that clients need to perform.
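For example, a customer representation could combine the name from the customer table with a balance pulled from a separate account table, while leaving out audit columns like created_by or row_version entirely (all field names here are made up for illustration):

```json
{
  "id": 123,
  "firstName": "Han",
  "lastName": "Solo",
  "accountBalance": 2500.00
}
```

The API user never learns, and never needs to care, that this data lives in two tables.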

2. Use Subresources to Show Relationships

An attractive alternative to only using top-level resources is to use subresources to make the relationships between resources more obvious to the API user, and to reduce dependencies on keys inside the resource representation.

So how do you decide what resources should be subresources? A rule of thumb is that if the resource is a part of another resource then it should be a subresource (i.e. composition).

For example, if you have a customer, an order and an order line, then an order line is a part of an order, but an order is not a part of a customer (i.e. the two exist independently, and a customer is not made up of orders!)

So the URIs would look like this:

/customers/{id}
/orders/{id}/lines/{id}

A different rule of thumb is to also include aggregations as subresources, that is, a “belongs to” relationship. If we use this rule, then an order belongs to a customer, so the path would look like this:

/customers/{id}/orders/{id}/lines/{id}

So what rule should you pick?

The idea with subresources is to make your API more readable. For example, even if you don’t know the API, you can quickly guess that POST /customers/123/orders will create a new order for customer 123.

However, if you end up with more than about two levels then the URI starts to become really long and the readability is reduced.

You also need to be aware that subresources cannot be used outside the scope of their parent resource. In the second example, you need a customer id before you can look up an order, so if you want a list of all open orders (regardless of customer), the second design cannot give it to you.

Ehh, so what to pick?

If you want a flexible API, aim for fewer subresources. If you want a more readable API, aim for more subresources.

The important thing is that whatever rule of thumb you pick, you apply it consistently. I mean, the API user might disagree with your decision, but if you use it consistently throughout your API, he or she will probably forgive you.

3. Use Snapshots for Dashboard Data

A deal-breaker for using subresources is that the client might need to access data across subresources to get data for a dashboard or something similar. For example, a manager might want to get some statistics about orders across all customers.

Before you go ahead and flatten your whole API, there are two alternatives you should consider.

First, remember that there is nothing that prevents you from having multiple URIs that point to the same underlying resource, so besides /customers/{id}/orders/{id}, you could add an extra URI to query orders outside of the customer scope:

/orders/{id}

To minimize the duplication of functionality, you can limit the top-level /orders URI to only accept GET requests, so if clients want to create a new order, they will always do it in the context of a customer.
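Expressed as an HTTP exchange, the read-only top-level resource would simply reject writes. Per the HTTP specification, a 405 response should carry an Allow header listing the permitted methods:

```http
POST /orders

HTTP/1.1 405 Method Not Allowed
Allow: GET
```

A client that wants to create an order would instead POST to /customers/{id}/orders as usual.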

Of course, you need to be careful not to duplicate resources unnecessarily, but if there is a customer need, then there is a customer need, and we need to find the best possible solution.

In RESTful Web Services Cookbook, Chief Engineer at eBay (and former Yahoo Architect) Subbu Allamaraju suggests an alternative approach called snapshots.

For example, if an order manager wants to see some specific statistics (5 latest orders, 5 biggest clients, etc.), then we can create a snapshot resource that finds and returns all this information:

/orders/snapshot

I personally like the snapshot approach better, because it doesn’t feel like querying a database. But with that said, the snapshot approach requires an intimate knowledge of the API user, and the extra order top-level resource will offer more flexibility.
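A response from such a snapshot resource could look something like this (the exact fields depend on what the order manager actually needs; these are just an illustration):

```json
{
  "latestOrders": [
    { "id": 1001, "customer": "Han Solo", "total": 450.00 },
    { "id": 1000, "customer": "Leia Organa", "total": 120.00 }
  ],
  "biggestClients": [
    { "customer": "Luke Skywalker", "orderCount": 42 }
  ]
}
```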

4. Use Links for Relationships

Another way to show relationships between resources, without falling back on using keys in an SQL-like manner, is to embed links inside your responses:

{
  "id": 123,
  "title": "Mr.",
  "firstname": "Han",
  "surname": "Solo",
  "emailPromotion": "No",
  "_links": {
    "self": {
      "href": "https://api.example.com/v1/customers/123"
    },
    "contactDetails": {
      "href": "https://api.example.com/v1/customers/123/contactDetails"
    },
    "orders": {
      "href": "https://api.example.com/v1/orders?customerId=123"
    }
  }
}

A cool thing about links is that they allow autodiscovery by clients! When the client gets the response back, it can see in the _links section what other actions it can take from here. This is just like surfing the web, where you come to a page and can then follow its links to new pages.

Another nice thing is that clients will have fewer hard-coded links in their code, which will make the code more robust. If the client wants to see the customer’s orders, it can just follow the orders link to get them.

However, there are different opinions about whether you should use links or not…

Vinay Sahni writes in his excellent blog post Best Practices for Designing a Pragmatic RESTful API that links are a good idea, but we are not ready to use them yet. On the other hand, the RESTful maturity model says that when you start using links you have reached the highest level of REST maturity.

So what to do?

Well, Dr. Roy Fielding, an expert on software architectures and the inventor of the REST architectural style, flatly said on his blog that if you don’t use links, it ain’t REST services, and he kindly encourages you to use another buzzword for your API!

5. Hide Internal Codes

In an earlier post, I incidentally leaked internal codes in the job_id column on the employees table:

{
  "jobId":"SH_CLERK" 
} 

Needless to say, this is an implementation detail that gives away that we are using a relational database, and experienced Oracle users would instantly spot that it’s Oracle’s sample HR schema. This leaking makes it harder to switch to a document-oriented database, like MongoDB, where there is no concept of foreign keys.

But even if the chance of switching to MongoDB is zero, it still makes the response harder to read.

So a better approach is to let the REST API translate the internal code to the human-readable value that the code represents (i.e. “Shipping Clerk”) and then also remove the Id part of the field name.

{
  "job":"Shipping Clerk" 
} 

This version is definitely more readable, but a fair concern is whether the service will be slower now that it needs to look up the value. I used to be an avid reader of Tom Kyte, the Oracle DB expert, and I still remember that you should always optimize from measurements. I mean, there’s a good chance that the HTTP cache will help us out and make this less of a bottleneck than it appears at first glance.

As a rule of thumb, if performance means everything to you (or you have a lot of lookup fields) then you might consider leaking the internal codes. Otherwise, you should provide a more readable API by hiding them.
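A minimal sketch of the translation step, assuming a simple in-memory lookup table (in practice this would be the underlying jobs table, or a cache in front of it):

```python
# Hypothetical lookup table mapping internal codes to readable values.
JOB_TITLES = {
    "SH_CLERK": "Shipping Clerk",
    "ST_CLERK": "Stock Clerk",
}

def to_representation(row):
    """Translate the internal jobId code and drop the 'Id' suffix,
    falling back to the raw code if no translation exists."""
    return {"job": JOB_TITLES.get(row["jobId"], row["jobId"])}

print(to_representation({"jobId": "SH_CLERK"}))  # {'job': 'Shipping Clerk'}
```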

6. Translate Automatically

But what about translations? What if you have a multilingual application that has translations of the internal code in multiple languages? How do you handle that?

Simple! You let API users specify their preferred language in the Accept-Language HTTP header. For example, given Accept-Language: da, the REST API should automatically translate it into Danish:

{
  "job":"Shippingmedarbejder" 
}

7. Create a Resource for Metadata

So what if the API user needs to show a drop-down list that shows all possible jobs? How will he or she get a complete list of all possible values for the job field?

The easy solution for you is simply to write the list of possible values in your API documentation and then the API user can hardcode them in the drop-down list. But this will lead to fragile client apps that need to be updated when new job types are added, and it just doesn’t feel very web-like to rely on offline metadata.

A more robust solution is to create a metadata subresource that provides lists of values and other metadata that are needed when using the resource. For example, an /employees/metadata subresource could provide the API user with all the metadata needed to interact with the employees resource.

This solution is similar to what Atlassian does in the JIRA API. If you make sure that the response from the metadata subresource is cached properly, it shouldn’t affect performance adversely, and you will provide a more flexible API that leads to more stable client apps.
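For example, the response from a hypothetical /employees/metadata subresource could include the list of valid jobs for the drop-down (the exact shape is up to you; this is just a sketch):

```json
{
  "jobs": [
    "Shipping Clerk",
    "Stock Clerk",
    "Sales Representative"
  ]
}
```

When a new job type is added on the server, the client’s drop-down picks it up automatically on the next request.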

That’s it for now. I really hope that some of these tips will help you design better REST APIs. Thanks for reading and take good care until next time!

6 Ways to Increase Your Industry Knowledge

Ever since the Tower of Babel, complicated projects have failed due to poor communication. If you have a basic understanding of an industry, you can communicate much more effectively with business people, and reduce the risk of failure.

You can even create an attractive niche brand by combining your technical expertise with a deep understanding of how your industry works.

That’s nice, you might think, but how do you get this industry knowledge?

Here are six tips to get you started:

  1. Create a mental model: Make a boxes and arrows diagram of your current mental model of the domain. The remaining tips are all about growing and refining your model.
  2. Google it: A simple Google search, like “domain knowledge banking”, will give you a lot of good reads to get started with.
  3. Find a mentor: Find someone who really understands the domain and listen curiously to what they say about their work. People usually enjoy talking about what they are good at.
  4. Follow the experts: Find out who are the thought leaders of the industry, and follow their Twitter accounts and blogs.
  5. Read the trade journal: Find out what the major publication is in the industry and read it.
  6. Watch an employee at work: See somebody perform one of the key jobs in that industry. For example, spending a few hours with a call center employee will give you a whole new understanding of their fast-paced job.

The good news is that the fundamentals of most industries remain pretty stable over time, so once you have a basic understanding it won’t require a lot of maintenance.

And since most software developers don’t bother to learn much about the industries they work in, you can easily get ahead with only a small investment of your time.

How to Boost Your Productivity

A simple yet effective way to increase your productivity is to divide your work into 90-minute blocks and take relaxing breaks between them.

In each 90-minute block you focus intensely on a given task, so no social media, chatting, emailing, news reading, etc. You just work, work, work, and then work some more on the task at hand.

Some people have found that they can get as much done in a focused 90-minute block as they normally get done in a whole day!

Research shows that 90 minutes is the upper limit for how long we can stay focused. After that our minds start wandering and become less productive.

This is why you need a break after you have worked intensely for 90 minutes. It needs to be a real break away from the computer, so your brain can recover and become ready to perform at its maximum level again.

So how does such a 90-minute schedule look in practice?

For a blogger it can look like this:

  • 08:00-09:30 Block 1: Bulldoze through first draft.
  • 09:30-10:00 Break: Go for a run.
  • 10:00-11:30 Block 2: Rewrite and make ready for publication.
  • 11:30-12:00 Break: Enjoy lunch.
  • 12:00-13:30 Block 3: Publish and promote.

While 3×90 minutes doesn’t sound like a lot, you must remember that this is highly effective work without any procrastination. Bestselling author Tony Schwartz wrote in Harvard Business Review that he used to write for 12 hours each day and finish a book in about a year. He then switched to writing in three 90-minute blocks per day and was able to finish a book in less than 6 months!

But if you really want to push it, personal development expert Steve Pavlina throws down the challenge of doing 5×90-minute blocks per day to get a whole week’s work done in a single day!

For me it works best if I have a goal for each 90-minute block. It doesn’t have to be a specific, measurable goal; just a rough idea of what I would like to achieve within the 90 minutes.

I also discovered that I need a physical kitchen clock with a countdown feature to make this work for me. At first I used the timer on my iPad, but after some time the screen goes dark, and I must be able to see the clock counting down all the time for maximum motivation.

My secret weapon, a kitchen timer!

Another essential element for success is to take breaks seriously(!) So don’t sit down and “do” Facebook for half an hour. Take a real break and give your mind a chance to relax and renew itself.

Here’s a quick list of ideas for break activities:

  • Take a run, stretch or some other exercise.
  • Go for a walk.
  • Meditate, pray or visualize.
  • Enjoy a healthy snack.
  • Take a nap.
  • Talk with a friend.

If you work in a management position (or another job with many interruptions) it can be difficult to organize your work into uninterrupted 90-minute blocks. But you should still carve out one 90-minute block per day to get the important, but not (yet!) urgent, stuff done. Everybody can do that!

9 Tips for Effective One-on-Ones

The purpose of a one-on-one meeting is to listen to what your people have to say.

In this way, you can learn about potential issues before they turn into serious problems.

Your managerial reward of effective one-on-ones is a lack of drama in your team, which enables your people to spend their time on doing great work!

So how do you do effective one-on-ones? Here are 9 tips to get you started.

1. Be consistent and never-ever cancel it

You should schedule one-on-one meetings as a recurring event in your calendar. The meeting should be 30-60 minutes. You should schedule it for every week, but if you have more than 6 employees, then consider scheduling it biweekly due to the time involved.

You should never-ever cancel it, as that will basically communicate that your people are not that important to you.

If you keep it consistent your people will know that this is the time to discuss issues. This will make your schedule more predictable as you are less likely to be interrupted at other times if your people know that there’s a one-on-one where they can bring up their concerns.

2. Start with “How are you?”

Michael Lopp, a veteran software engineering manager, writes in The Update, The Vent, The Disaster that you start the one-on-one meeting with a simple, “How are you?”

It’s a soft opener, but the idea is that whatever the reply, you can learn something useful and take the conversation from there.

3. Listen, Listen and Listen Some More

The one-on-one should be a bottom-up meeting where your people share information with you, rather than a top-down where you are pushing information down their throats!

Most managers are extremely busy and, honestly, don’t want to hear about more problems, which will only make them busier and perhaps even rock the boat! But the hard truth is that the earlier a problem is detected, analyzed and dealt with, the cheaper it will be to solve. And all practical experience shows that ignoring problems will (unfortunately!) not make them disappear.

4. It Ain’t a Status update!

The one-on-one is not for operational stuff. There are project meetings and stand-up meetings for handling the daily status of tasks.

One-on-ones are for strategic stuff: looking into how we are doing things and how we can do them better.

If your employee starts to go through status, interrupt after about 5 minutes and say, “It sounds like you have the project/task under control, but how about the [insert strategic activity] that we talked about earlier?”

5. Ask Powerful Questions

Sometimes it can be difficult to get the conversations rolling. For example, when you have new people, or very quiet people, or people who have had a poor manager in the past.

So to get the talking started, it always helps to have some powerful questions prepared. For example:

  1. What excites you at work?
  2. Do you have ideas for how we can do things better?
  3. Where is your job satisfaction on a scale 1-10? Follow-up with: What would it take to make it a 10?
  4. Do you have ideas for the next release of our product?
  5. How’s the family?

A one-on-one is also an excellent opportunity to debrief recent work. For example, to discuss what went well, what didn’t go well, and what the lessons learned are for next time.

6. Ask for Gossip (really!)

It’s always good to ask for gossip. Especially, if there has been a recent re-org or any other major event.

Ask your people for their opinion of the event, rather than starting with your own. Then, afterwards, you can always offer your own interpretation of the event and explain how it fits into the big picture.

7. Follow-up (action speaks louder than words!)

If there are any actions coming out of the meeting, be sure to follow up and execute them.

If actions are agreed and nothing ever happens with them, it will destroy trust in the relationship.

Be the manager who shows that good ideas are acted upon. Show with your actions that what’s being said in a one-on-one is important to you. Build trustworthiness through your actions.

8. A Dedicated Day for One-on-Ones

Software development manager and blogger Ed Gibbs writes in One-on-Ones in a Single Day that he prefers to have all his one-on-ones in a single day (Wednesday).

The benefits are that it’s easier to defend a full day (than large chunks of time throughout the week), it frees large chunks of time on the other days for other activities, you are better prepared for the one-on-ones, and it gives you a better sense of how the whole team is doing.

9. Try a Different Location

Executive coach Beth Miller recommends in Revamp Your Approach to Monthly One-to-One Meetings that you sometimes move away from the office to shake up the one-on-one, add a more casual theme to it and get to know your people on a different level.

For example, by going out for lunch or by going for a walk outside, which was a favorite of Steve Jobs.

At the end of the day, a one-on-one is an opportunity for your people to share their ideas, concerns and need for personal development. Your task as a manager is to ask good questions, listen to what is being said, provide sensible feedback, and follow-up constructively.

Some managers will undoubtedly read this and think that it sounds like a humongous time stealer from another dimension! But they must remember that their job is management and that one-on-ones are one of the most effective management tools out there.

In the words of Johanna Rothman, a bestselling author of software management books, “Managers who use one-on-one meetings consistently find them one of the most effective and productive uses of their management time.”