Friday, January 12, 2018

When should a REST Resource get its own Address?

Background 

Author's note

In a purist REST approach, all endpoints (except the starting endpoint) are opaque and their various details shouldn't need to be published.  Even if this approach is being used, the points in this article are relevant, as Server logic will have to determine whether something requires an endpoint or not.

Introduction 

In a REST architecture, an entity or a resource (for the rest of the article the term entity will be used) may or may not have its own address.  For example, suppose we have an inventory application merchants use to sell their products. Immediately it is possible to see a Product entity.  Its URL will look something like: /product/{id}

Now, it is possible for the merchant selling the Products to add his / her own comments to the Products.  For example, "Sells very well on Fridays" or "Consider changing price if product doesn't start selling".   A Product can have 0..* Comments.  As stated, the Product has its own address: /product/{id} for example /product/1231233

and a response payload like this:
{
    "id":"1231233",
    "type":"Beer",
    "comments": [{
        "id":"1",
        "comment":"Sells very well on Fridays"
    }, {
        "id":"2",
        "comment":"Consider changing price if product doesn't start selling"
    }]
}
As can be seen, the payload returns a collection of Comment Objects. Should the individual comments each have their own address as well, or is it okay that they are just embedded in the Product response? To help answer this question, the following should be considered.

Does the Entity have any meaning outside the Containing Entity Context? 

If an Entity (for example Comment) has meaning outside its containing Entity (for example Product) then it should have its own address.  For example, suppose the Entity was Student and the Student returned a list of Universities he / she had studied at.   These Universities have their own meaning outside the Student, so the University should have its own address. In the Product / Comments scenario, the Comments only exist for the Product.  No other Entity will ever reference them or need to reference them. Therefore further aspects need to be considered.

Is it desirable to perform actions on the individual entities? 

Should the client be allowed to create, read, update or delete the individual entity? These have to be considered separately.

Writes: Create, Update, Delete 

In the Product / Comments scenario, a Comment would never be created outside or without a Product. It is essentially added to a Product. This could be considered a partial update to the Product.  However, an update or delete of an existing Comment could also be considered a partial update to the Product.   This makes it complex to differentiate between Creates, Updates and Deletes of a Comment when all of them arrive as a partial update on the Product.  If these operations are required, it would be much simpler to create a contextual address for the Comment (which indicates the hierarchical nature of the Product / Comment) and then allow the Client to send POST, PUT, PATCH and DELETE requests to that address.

Example URL: /product/1231233/comment/1
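As a rough illustration of why a contextual address keeps writes simple, here is a minimal sketch. It assumes an in-memory store; the class name ProductCommentStore and its method names are hypothetical and only stand in for whatever handlers sit behind the HTTP verbs. Each write targets exactly one Comment, so there is no need to work out what a partial update of the Product was trying to do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model: comments only ever live inside their parent product.
class ProductCommentStore {
    // productId -> (commentId -> comment text)
    private final Map<String, Map<String, String>> comments = new LinkedHashMap<>();
    private int nextId = 1;

    // Stands in for: POST /product/{productId}/comment
    String create(String productId, String text) {
        String id = String.valueOf(nextId++);
        comments.computeIfAbsent(productId, k -> new LinkedHashMap<>()).put(id, text);
        return "/product/" + productId + "/comment/" + id; // address of the new comment
    }

    // Stands in for: PUT /product/{productId}/comment/{commentId}
    boolean update(String productId, String commentId, String text) {
        Map<String, String> forProduct = comments.get(productId);
        if (forProduct == null || !forProduct.containsKey(commentId)) {
            return false; // a real server would probably answer 404 here
        }
        forProduct.put(commentId, text);
        return true;
    }

    // Stands in for: DELETE /product/{productId}/comment/{commentId}
    boolean delete(String productId, String commentId) {
        Map<String, String> forProduct = comments.get(productId);
        return forProduct != null && forProduct.remove(commentId) != null;
    }

    // Stands in for: GET /product/{productId}/comment/{commentId}
    String read(String productId, String commentId) {
        Map<String, String> forProduct = comments.get(productId);
        return forProduct == null ? null : forProduct.get(commentId);
    }
}
```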

Reads 

In some scenarios the parent containing Entity may not return all the information about the child Entities. For example, again consider the Product --> Comment scenario. Suppose the comment was very large. This would mean the payload for the Product was also very large.  In such cases, it might be more prudent for the Product to just return a summary of the Comment and, if the client wants the full Entity, for the client to make an individual request.  Similarly, if there's a big performance cost to get an individual Entity (for example a 3rd party API has to be invoked to get all the information about the comment), it can make more sense to just send a URL link to the Entity rather than the actual entity contents.
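A minimal sketch of the summary-plus-link idea follows. The class name, the "summary" and "href" field names and the 20 character cut-off are all assumptions made for illustration; the point is only that the Product payload carries a short form of each Comment and the address where the full one can be fetched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CommentSummary {
    static final int SUMMARY_LENGTH = 20; // assumed cut-off, purely illustrative

    // Builds the entry that would be embedded in the Product response
    // instead of the full comment.
    static Map<String, String> summarise(String productId, String commentId, String fullText) {
        Map<String, String> summary = new LinkedHashMap<>();
        summary.put("id", commentId);
        String shortText = fullText.length() <= SUMMARY_LENGTH
                ? fullText
                : fullText.substring(0, SUMMARY_LENGTH) + "...";
        summary.put("summary", shortText);
        // The client follows this address only if it needs the full comment.
        summary.put("href", "/product/" + productId + "/comment/" + commentId);
        return summary;
    }
}
```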

N+1 Problem 

If individual Reads are required, be careful that the N+1 problem doesn't then get introduced. For example, suppose a Product could have 100 Comments. The Product API will only return a summary of each Comment and a link to the individual Comment in case the client wants all the information. However, if the client wants every single comment, this means there will now be 100 further HTTP requests. If this is a potential scenario, then a secondary endpoint which aggregates all the comments into the Product should be considered. This is similar to the API Gateway pattern.
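To make the cost concrete, here is a small sketch that simply counts requests. Nothing here does real HTTP: RequestCounter and its methods are hypothetical stand-ins, and the aggregate address /product/{id}/comments is an assumed name for the secondary endpoint.

```java
import java.util.ArrayList;
import java.util.List;

class RequestCounter {
    int requests = 0;

    // Stands in for GET /product/{id}: returns one link per comment.
    List<String> getProductCommentLinks(String productId, int commentCount) {
        requests++;
        List<String> links = new ArrayList<>();
        for (int i = 1; i <= commentCount; i++) {
            links.add("/product/" + productId + "/comment/" + i);
        }
        return links;
    }

    // Stands in for GET on an individual comment address.
    String getComment(String link) {
        requests++;
        return "full comment at " + link;
    }

    // Stands in for a hypothetical aggregate endpoint: GET /product/{id}/comments
    List<String> getAllComments(String productId, int commentCount) {
        requests++;
        List<String> all = new ArrayList<>();
        for (int i = 1; i <= commentCount; i++) {
            all.add("full comment at /product/" + productId + "/comment/" + i);
        }
        return all;
    }

    public static void main(String[] args) {
        RequestCounter nPlusOne = new RequestCounter();
        for (String link : nPlusOne.getProductCommentLinks("1231233", 100)) {
            nPlusOne.getComment(link);
        }
        System.out.println(nPlusOne.requests);   // prints 101

        RequestCounter aggregated = new RequestCounter();
        aggregated.getAllComments("1231233", 100);
        System.out.println(aggregated.requests); // prints 1
    }
}
```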

Surface Area of Endpoints

In any architecture where contracts are published, too many of them quickly become unwieldy for developers to understand. Most well known APIs (e.g. PayPal, Amazon, Twitter, Google) usually only have about 20 - 30 addresses. This is a good aim to have. If there are 5,000 different addresses, the API becomes far too large to understand and difficult to control.

In summary, the decision diagram provides guidance on what you should do.


Sunday, January 7, 2018

Are you forgetting your Agile values?

A while back I wrote why sometimes Agile will fail.  In this post, I will focus on the specific misunderstandings of Agile values.   When people ask if you're Agile, they basically think:
  • Do you have stand ups?
  • Do you have retrospectives?
  • Do you have stories?  
  • Do you use yellow post its?
  • etc
Such ideas belong to an Agile process called Scrum which is the most popular Agile process but isn't the only one.   There are other processes: Kanban, RUP, XP, etc...  You don't necessarily have to be Scrum.

For me the most important thing that came from Agile was the actual manifesto.  It should be read by everyone working in software at the beginning of every year and at the beginning of every project. Then it should be discussed with team members and people should be encouraged to give specific examples they have of the values and principles detailed in the manifesto.    In this blog post,  we'll have a look at the values.

8 Values with emphasis on 4

There are four points which detail 8 values in the manifesto
  1. Individuals and interactions over processes and tools
  2. Working software over comprehensive documentation
  3. Customer collaboration over contract negotiation
  4. Responding to change over following a plan
Then there is the very important sentence: "That is, while there is value in the items on the right, we value the items on the left more."  It is worth saying that sentence twice.  Maybe three times... four times... whatever it takes so you never forget it.  Why? Well let's take them one by one.
 

Individuals and interactions over processes and tools

So does this mean that if you are Agile you stop worrying about process?  No.  No project can ever be successful without process.   From an Agile perspective, it means you value process, but you value individuals and interactions more.   For example, say you spend lots of man hours on a process that isn't really adding value to the project or customer. When you look at this critically, it is because developers and testers don't talk to each other.  Instead they have a convoluted process so that they can blame each other when a production defect happens.  Lots of man hours go into this.  It would be much more efficient if they talked to each other regularly, which would remove the need for the aspects of the process that make it convoluted.  So the challenge is twofold:
  • Ensure people talk to each other regularly
  • Ensure your processes are efficient 
So for example, it makes more sense to have a regular show and tell than no interaction whatsoever followed by a massive delivery at the end, where you hope some long, complex process will have produced customer satisfaction.   Aim to put individuals and interactions at the heart of your processes.

Working software over comprehensive documentation

This is a classic misunderstanding.  Somebody wants comprehensive documentation for some complex functionality and a developer retorts: "Hey Dinosaur, this isn't waterfall, there is no need for detailed documentation".   Wrong.  Documentation is still required. The point here is that with Agile you shouldn't be spending massive amounts of man hours on documentation when your software is riddled with bugs.   It is more important that your software works.  This means more time should be spent on excellent automated tests that have sufficient functional coverage.  This isn't always easy to do but you should ensure it is done.

Customer collaboration over contract negotiation

Thirdly, do you stop agreeing contracts with customers on your projects?  No. The point here is that you spend more man hours collaborating with customers than you do teasing out a contract with lawyers.  Instead of spending a massive amount of time getting sign-off on a massive project, you should collaborate through the life cycle of the project, breaking it up into small increments, taking feedback on board and working together towards a common goal: project success.

Responding to change over following a plan

Lastly, do you stop planning?  Of course not.  But again, what is the point of planning down to the nitty gritty if you cannot even respond to change? Could you imagine a customer asking for a tiny change to how a UI displays data and the development team responding with: "Sorry, that wasn't in our 6 month plan"? It is paramount that the architecture facilitates reasonable change; if it can't, the project will struggle to really embrace an agile philosophy.

Summary

Agile is actually about putting more constraints on your software methodology.    As an analogy, the REST architecture style puts constraints on your architecture for example the Uniform Interface, Statelessness and Caching capabilities.  The idea is then by sticking to these constraints (and that's a technical challenge) you get benefits.  In the case of REST, your architecture will be more scalable, extensible and lead to much higher developer productivity for API consumers.  In the case of Agile, by sticking to the constraints of valuing 8 key concepts but putting even more emphasis on 4, your project will have greater chance of success. 




Wednesday, November 15, 2017

More Fail early - Java 8

Fail fast or Fail early is a software engineering concept that tries to prevent complex problems happening by stopping execution as soon as something that shouldn't happen, happens.   In a previous blog post and presentation I go into more detail about the merits of this approach. In this blog post I will just detail another use of this idea in Java 8.
In Java, Iterators returned by Collection classes, e.g. ArrayList, HashSet, Vector etc., are fail fast. This means that if you add() to or remove() from the underlying data structure while iterating over it, you get a ConcurrentModificationException. Let's see:
import static java.util.Arrays.asList;
import java.util.ArrayList;
import java.util.List;

List<Integer> ints = new ArrayList<>(asList(1,2,3,4,5,6,9,15,67,23,22,3,1,4,2));

for (Integer i : ints) {
    // some code
    ints.add(57);  // the next step of the iteration throws java.util.ConcurrentModificationException
}
In Java 8u20, the Collections.sort() API is also fail fast. This means you can't invoke it inside an iteration either. For example:
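import static java.util.Arrays.asList;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Integer> ints = new ArrayList<>(asList(1,2,3,4,5,6,9,15,67,23,22,3,1,4,2));

for (Integer i : ints) {
    // some code
    Collections.sort(ints); // the next step of the iteration throws java.util.ConcurrentModificationException
}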
This makes sense. Iterating over a data structure and sorting it during the iteration is not only counter intuitive but also likely to lead to unpredictable results.  Now, you can get away with this and not get the exception if you have a break immediately after the sort invocation.
import static java.util.Arrays.asList;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Integer> ints = new ArrayList<>(asList(1,2,3,4,5,6,9,15,67,23,22,3,1,4,2));

for (Integer i : ints) {
    // some code
    Collections.sort(ints); // no exception: the loop exits before the iterator is used again
    break;
}
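The break version gets away with it because the check lives in the iterator, not in sort() itself. The following is a minimal sketch of that mechanism using an explicit Iterator. It assumes an ArrayList on Java 8u20 or later, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SortDuringIteration {
    // Reads one element, sorts the list, then optionally asks the iterator
    // for another element. Returns true if that triggers the fail-fast check.
    static boolean failsAfterSort(List<Integer> ints, boolean keepIterating) {
        Iterator<Integer> it = ints.iterator();
        it.next();
        Collections.sort(ints); // on ArrayList (8u20+) this bumps the list's modification count
        if (!keepIterating) {
            return false;       // equivalent to the break: the iterator is never consulted again
        }
        try {
            it.next();          // the iterator compares modification counts here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAfterSort(new ArrayList<>(Arrays.asList(3, 1, 2)), true));
        System.out.println(failsAfterSort(new ArrayList<>(Arrays.asList(3, 1, 2)), false));
    }
}
```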
But, that's hardly great code. Try to avoid old skool iterations and use Lambdas when you can. But, if you are stuck, just do the sort outside the iteration:
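import static java.util.Arrays.asList;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Integer> ints = new ArrayList<>(asList(1,2,3,4,5,6,9,15,67,23,22,3,1,4,2));
Collections.sort(ints); // sort once, before iterating

for (Integer i : ints) {
    // some code
}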
or use a data structure which sorts when you add.
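As a sketch of those two alternatives (class and method names are hypothetical): a stream pipeline that produces a sorted copy, and a TreeSet, which keeps its elements ordered as they are added. Note that a TreeSet silently drops duplicates, so it only fits when that is acceptable.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedAlternatives {
    // Stream pipeline: returns a new sorted list and leaves the source list untouched.
    static List<Integer> sortedCopy(List<Integer> ints) {
        return ints.stream().sorted().collect(Collectors.toList());
    }

    // TreeSet keeps elements in order as they are added, but discards duplicates.
    static TreeSet<Integer> sortedOnAdd(List<Integer> ints) {
        return new TreeSet<>(ints);
    }
}
```

For the list used in this post, sortedCopy keeps both occurrences of values such as 1 and 3, whereas sortedOnAdd keeps only one of each.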

This new behaviour of the Collections.sort() API came in Java 8 release 20.   It is worth having a look at the specific section that details the change in the API:
"
Area: core-libs/java.util.collections
Synopsis: Collection.sort defers now defers to List.sort
Previously Collection.sort copied the elements of the list to sort into an array, sorted that array, then updated list, in place, with those elements in the array, and the default method List.sort deferred to Collection.sort. This was a non-optimal arrangement.
From 8u20 release onwards Collection.sort defers to List.sort. This means, for example, existing code that calls Collection.sort with an instance of ArrayList will now use the optimal sort implemented by ArrayList.
"

I think it would have helped if Oracle had been a little more explicit here about how this change could cause runtime problems.   Considering everybody uses the Collections framework, if an API that previously didn't throw an exception now can in the same situation (bad code though it is), it would be better if the release notes made it easier for developers to find that out.

  

Thursday, October 5, 2017

Book Review: RESTful Web Clients

RESTful Web Clients is written by guru Mike Amundsen who amongst other things co-authored RESTful Web APIs with REST guru Leonard Richardson and Sam Ruby.

The book's primary focus is on the hypermedia aspect of REST, particularly from the client's perspective.   As Roy Fielding detailed in this famous blog post, "if the engine of application state (and hence the API) is not being driven by hypertext, then it cannot be RESTful and cannot be a REST API". And let's face it, we have all seen APIs purporting to be REST with no hypermedia whatsoever and lots of coupling between client and server.  Some of this is just down to basic ignorance and some of it is probably down to misunderstanding the Richardson Maturity Model.

Rather than begin with a summary of Fielding's dissertation like most material on REST, this book begins with details of a simple web application that uses JSON RPC APIs.  From the simple example Amundsen shows that while the JSON RPC approach functionally works, it results in a lot of coupling between client and server, meaning that if the APIs need to change it will be difficult to do so, as the client(s) with all their hardcoded contracts will be impacted.  And we know software does need to change from time to time, right?

Amundsen distills the coupling with the JSON RPC approach into three distinct types which can be considered and assessed individually:
  • Objects - the JSON objects that appear in API responses.  Clients need to be able to understand them to handle even a simple response to a GET request
  • Addresses - the URLs clients need to know to invoke requests
  • Actions  - the methods and arguments for all non-trivial operations. Again clients need to know these before invoking requests.
With the coupling clearly demonstrated, the scene is nicely set to move on to one of the key advantages of a REST style architecture: reducing coupling through hypermedia.

To explain this advantage, Amundsen again uses specific examples, firstly by detailing the JSON hypermedia type HAL.   Using HAL reduces the Address coupling, and the book details how generic response handling can be written on the client to take advantage of this decoupling.  However HAL doesn't solve everything.  Without a custom extension there is still coupling to the JSON Objects and the possible Actions available to the client.  A workaround for this is given and I would highly recommend anyone considering using HAL to read Chapter 4.
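To give a flavour of what reduced Address coupling means on the client, here is a minimal sketch. It is not taken from the book: it assumes the HAL response has already been parsed into nested Maps, the class name is hypothetical, and it only handles a link relation that maps to a single link object (HAL also allows an array of links, which is ignored here). The client knows a link relation name and looks the address up in the response's _links section instead of hardcoding the URL.

```java
import java.util.Map;

class HalLinks {
    // Returns the href for the given link relation in a parsed HAL document,
    // or null if the relation is absent or not a single link object.
    static String hrefFor(Map<String, Object> halDocument, String rel) {
        Object links = halDocument.get("_links");
        if (!(links instanceof Map)) {
            return null;
        }
        Object link = ((Map<?, ?>) links).get(rel);
        if (!(link instanceof Map)) {
            return null;
        }
        Object href = ((Map<?, ?>) link).get("href");
        return href instanceof String ? (String) href : null;
    }
}
```

If the server later moves the comments to a different address, a client written this way keeps working as long as the relation name stays the same.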

Next up is another JSON hypermedia type known as Siren. Kevin Swiber designed Siren and registered it with IANA in 2012.

Siren splits response entities into:
  • class  - this is an array, the values of which indicate what the current resource represents e.g. Customer, Person
  • properties - set of name-value pairs
  • entities - a list of linked and representational sub entities
  • actions - contains a set of valid operations of the associated entity and how to invoke those operations including a list of fields which match HTML5 input types (hidden, text, number).  This is something not in HAL that helps reduce client-server coupling further
  • links - links to other resources.  Each link has a class, href, rel, title, type property
Siren  reduces coupling to Addresses and Actions, however it does not reduce coupling to Objects.  There is no meta-data specification for the class type meaning the client has to hardcode the structure of the object somewhere.   Like HAL it is possible to create a custom extension but this is not part of the Siren specification.
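As a rough sketch of how Siren actions reduce Action coupling (again not from the book, with a hypothetical class and a made-up "add-comment" action in mind): the client reads the method, href and field names from the response and builds the request from them, rather than hardcoding any of those. The body building is deliberately simplified and does no URL encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side view of one entry in a Siren "actions" array.
class SirenAction {
    final String name;
    final String method;
    final String href;
    final List<String> fieldNames;

    SirenAction(String name, String method, String href, List<String> fieldNames) {
        this.name = name;
        this.method = method;
        this.href = href;
        this.fieldNames = fieldNames;
    }

    // The verb and address come from the response, not from client code.
    String requestLine() {
        return method + " " + href;
    }

    // Builds a form-style body using only the fields the action declares.
    // Simplification: values are not URL-encoded.
    String buildBody(Map<String, String> input) {
        List<String> pairs = new ArrayList<>();
        for (String field : fieldNames) {
            if (input.containsKey(field)) {
                pairs.add(field + "=" + input.get(field));
            }
        }
        return String.join("&", pairs);
    }
}
```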
The third hypermedia type detailed is the Collection+JSON format (Cj). Interestingly, this format was designed by the author himself.    The basic elements of a Cj message are:
  • Links - Similar to HAL and Siren links
  • Items - Similar to HAL and Siren properties and also includes meta data about the properties
  • Queries - Information on how to construct various reads (HTTP GETs)
  • Templates - Information on how to construct various writes (HTTP POSTs, PUTs, DELETEs...)
  • Error - information about errors
The key point here is that since Cj includes the metadata about the items, it decouples the client from the Objects in the JSON responses, something both HAL and Siren could only achieve with custom extensions.
So which format? Well two good points to make here:
  1. That can be a practical decision and not just a technical one.  You may prefer Cj because out of the box it achieves most decoupling, but your customer may be used to and prefer HAL.
  2. Rather than trying to support every possible format, think about architecting so it is possible to support extra formats if you need to.  The approach suggested is described in Amundsen's Representor pattern - which is inspired by the Message Translator Pattern
So in summary, this is another great REST book from O'Reilly.  The style of the book in general is pragmatic rather than academic.  It really emphasizes and demonstrates the importance of hypermedia in REST APIs and is backed up with practical examples.  The central argument in the book is that Cj achieves the most decoupling.  Even though Cj was designed by the author, the argument is well made and I don't think it would be fair to make accusations of any selection bias, since he does detail how you can extend Siren and HAL to achieve the same level of decoupling.

Bottom line - if you want to understand the hypermedia aspects of REST, read this book.