Identifying Resources and URI design.

Identifying Resources

The key parts of a resource-oriented architecture are resources, identifiers, representations, and the links between them. Designing a RESTful system starts with identifying resources. Resource identification is generally the most flexible aspect of designing a REST-based system. There is no exact science to it, and no single right or wrong set of resources. We can generally identify resources from domain nouns. A resource could be a document, a video, a business process, or even a device. A resource is any entity that can be uniquely identified and manipulated. Other factors, such as resource granularity and resource composition, also play a key role in identifying resources.

Resources thus identified can be classified as documents, collections, and controllers. A document resource is a singular concept and is the base for all the other types of resource classifiers. Examples of document resource types are

http://api.onlinestore.com/customer/1234
http://api.onlinestore.com/order/o3576

A collection is a server managed set of resources. Clients can request addition, removal and modification of resources in the collection. The server can accept or deny such requests based on the domain/server logic employed. The collection also defines the identifiers for each resource in the collection. A collection resource can be used to retrieve a paginated list of contained resources. It can also be used to search and retrieve a filtered list of resources. A collection resource can be used as a factory to create member resources and perform bulk operations on them. Examples of collection resources are

http://api.ordermangement.com/orders
http://api.customermanagement.com/customers

A controller models a processing function or a procedural concept as a resource. Controllers provide separation of concerns between the client and the server and allow the server to perform atomic operations. A controller is used to perform domain-specific actions that cannot be mapped to the standard CRUD operations. Examples of controller resources are

http://api.ordermangement.com/customer/1234/sendmail
http://api.customermanagement.com/pricingengine/order/o3576/calculateprice

Every resource thus identified exposes the same interface and works the same way. To get the value of a resource, you send a GET request to that resource’s URI. To delete the resource, you send a DELETE request to the resource URI. The standard HTTP verbs described in Rest – Communicating with Verbs and status codes suffice for all interactions with these resources without the need to invent a new vocabulary.

Designing Identifiers

REST uses URIs to identify resources uniquely. URIs are opaque identifiers: they identify the name and the address of a resource. Ideally, URIs should be designed to convey the resource model logically to client developers. They should enable intuitive identification of resources, and every URI should designate exactly one resource. This makes each resource addressable, which is key in a resource-oriented architecture. The generic URI syntax defined by RFC 3986 is:

scheme ":" hier-part [ "?" query ] [ "#" fragment ]

The authority portion of the URI is used to identify the service owner of the API. It is generally represented by the top-level domain and subdomain identifiers of the API service owner (e.g. http://azure.microsoft.com/api). The rest of the URI path identifies the API’s resource model. Each forward-slash-separated path segment within the URI corresponds to a unique resource in the model’s hierarchy. Path segments can be static or variable. The static segments are specified by the API designer, while the variable segments hold identifiers that are filled in by the client at runtime.

URI templates provide a way to parameterize URIs with variables that can be substituted at runtime. The URI template syntax allows designers to clearly identify and name both the static and variable segments of the URI. A URI template includes variables that must be substituted before resolution. The mark-up in curly braces, such as {orderId}, tells client developers to “fill in the gaps” with sensible values. The URI template example below has three variables (customerId, orderId and productId):

http://api.ordermangement.com/customer/{customerId}/orders/{orderId}/products/{productId}

Be warned, however, that when used poorly, URI templates increase coupling between systems and lead to brittle integrations.
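As a minimal sketch of how a client might fill in the template shown earlier, the snippet below substitutes each {name} variable with a percent-encoded value. The expand helper is a hypothetical name introduced here for illustration, and it covers only simple {variable} substitution, not the full URI Template syntax.

```python
import re
from urllib.parse import quote

# The template from the text; expand() below is a hypothetical helper.
TEMPLATE = (
    "http://api.ordermangement.com/customer/{customerId}"
    "/orders/{orderId}/products/{productId}"
)


def expand(template: str, **values: str) -> str:
    """Replace every {name} in the template with a percent-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for template variable {name!r}")
        # safe="" also encodes "/", so a value cannot add extra path segments.
        return quote(str(values[name]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


url = expand(TEMPLATE, customerId="1234", orderId="o3576", productId="p 42")
print(url)
```

Encoding every substituted value keeps a client-supplied identifier inside its own path segment, which is one small way to limit the brittleness mentioned above.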

The query component of a URI contains a set of parameters that correspond to a variation or derivative of the resource identified hierarchically by the path component. The query component can provide clients with additional interaction capabilities such as ad hoc searching and filtering, which makes it a natural fit for supplying search criteria to a collection. A REST API client should also use the query component to paginate collection results with the pageSize and pageStartIndex parameters.

For example:  GET /orders?pageSize=10&pageStartIndex=1
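A rough sketch of how a client could build such a paginated URI, and how a server could read the parameters back, using only the Python standard library. The page_uri helper and the status filter are assumptions made for illustration; only pageSize and pageStartIndex come from the text.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def page_uri(collection_path: str, page_size: int, page_start_index: int, **filters: str) -> str:
    """Build a collection URI with pagination and optional filter parameters."""
    params = {"pageSize": page_size, "pageStartIndex": page_start_index, **filters}
    return f"{collection_path}?{urlencode(params)}"


first_page = page_uri("/orders", 10, 1)
print(first_page)  # /orders?pageSize=10&pageStartIndex=1

# "status" is a hypothetical filter parameter, not one defined in the text.
shipped = page_uri("/orders", 10, 11, status="shipped")

# On the server side the same parameters can be read back from the query.
query = parse_qs(urlsplit(shipped).query)
print(query["pageSize"], query["status"])
```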

Richardson’s Maturity Model

The Richardson Maturity Model was developed by Leonard Richardson. It grades REST services according to their adherence to the REST constraints. Starting from a basic level zero, the model identifies three further levels of service maturity based on the service’s support for URIs, HTTP, and hypermedia.


Level Zero Services – Level zero services have only one URI and use a single request type, mainly POST. The message contains both the operation to be performed and the data needed for that operation. At this level, services have no concept of representations or of uniquely identifiable resources. They also do not use HTTP verbs and status codes to provide rich interaction between the client and the server. For example, most WS-* based web services are level zero, since they use only POST requests to transmit the SOAP message body. This is also called the swamp of POX (plain old XML) model. HTTP constructs are not used to communicate between the client and the server.

Level One Services – A level one service has many URIs but uses a single HTTP verb. Level one services expose multiple resources through unique URIs. However, operations on these resources are performed using a single HTTP verb, primarily POST. For example, URI tunneling is only at level one in Richardson’s maturity model.

Level Two Services – Level two services host representations and resources at uniquely identifiable URIs and also use the full range of HTTP verbs for communicating between the client and the server. The URI specifies the resource being operated on, and the operation is performed using the standard HTTP verbs GET, POST, PUT, DELETE, etc. Level two services also use the standard HTTP status codes for responses and adhere to the idempotency and safety principles of the HTTP verbs.

Level Three Services – Level three services implement the concept of Hypermedia as the Engine of Application State (HATEOAS). Representations hosted at unique URIs contain links to other representations that may be of interest to the client, or indicate a choice of actions that the client can perform. These choices lead the client through the application’s resources, causing state transitions to occur based on the action chosen by the client.

Services that are at Level three are truly RESTful and adhere to the REST principles as defined by Roy Fielding in his thesis.
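To make the level three idea a little more concrete, here is a small sketch of a representation that carries its own links, and a client that picks its next action by link relation rather than by building URIs itself. The shape of the representation (a links list with rel and href), the cancel link, and the link_for helper are all assumptions made for illustration, not a standard format.

```python
from typing import Optional

# A hypothetical order representation; the "links" layout is an assumption.
order = {
    "id": "o3576",
    "status": "pending",
    "links": [
        {"rel": "self", "href": "http://api.onlinestore.com/order/o3576"},
        {"rel": "cancel", "href": "http://api.onlinestore.com/order/o3576/cancel"},
        {"rel": "customer", "href": "http://api.onlinestore.com/customer/1234"},
    ],
}


def link_for(representation: dict, rel: str) -> Optional[str]:
    """Return the href of the first link with the given relation, if any."""
    for link in representation.get("links", []):
        if link["rel"] == rel:
            return link["href"]
    return None


# The client only offers actions the server has advertised.
cancel_uri = link_for(order, "cancel")
if cancel_uri is not None:
    print("order can be cancelled via", cancel_uri)
```

If the server stops advertising the cancel link once the order ships, the client needs no change: the choice of actions travels with the representation.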

REST architectural constraints

 

The above constraints describe what a truly RESTful API should look like.

REST - Idempotency and Safety

Implementing HTTP’s uniform interface as discussed in the previous posts has a surprisingly good architectural side effect. If it is implemented as specified in the HTTP specification (RFC 2616), you get two useful properties: idempotency and safety.

Safety is the ability to invoke an operation without the invocation causing any side effects on the resource. It means that the client can invoke an operation with the explicit knowledge that it is not going to change the state of the resource on the server. However, it does not mean that the server must return the same response every time. The server can also perform additional actions when these methods are invoked, such as logging calls or incrementing counters, but these should not change the state of the resource being acted upon. Generally, read-only methods are safe methods. GET, HEAD, and OPTIONS are safe methods.

Idempotency means that the effect of doing something multiple times is the same as the effect of doing it only once. A simple example from maths is multiplying a number by one: 5 x 1 = 5, which is the same as 5 x 1 x 1 = 5. Similarly, an API operation that sets a user’s name is typically idempotent. Whether it is called once or multiple times, the effect of the operation is that the user’s name is set to the target value. Deleting a resource is another example: the first time you invoke the delete, the object is deleted. The second time you attempt to delete it, there is no change in the state of the system, and the object remains deleted as requested. Repeating an idempotent operation generates no additional side effects.
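The difference can be sketched with a tiny in-memory store. CustomerStore and snapshot_after are hypothetical names used only for this illustration: put and delete leave the store in the same state however often they are repeated, while post adds a new member on every call.

```python
import itertools


class CustomerStore:
    """A hypothetical in-memory collection used to compare the operations."""

    def __init__(self):
        self.customers = {}
        self._ids = itertools.count(1)

    def put(self, customer_id, state):
        # Absolute update: the stored state is replaced outright.
        self.customers[customer_id] = dict(state)

    def delete(self, customer_id):
        # Deleting something that is already gone changes nothing.
        self.customers.pop(customer_id, None)

    def post(self, state):
        # The store picks the identifier, so every call adds a new member.
        new_id = str(next(self._ids))
        self.customers[new_id] = dict(state)
        return new_id


def snapshot_after(operation, times):
    """Apply the operation `times` times to a fresh store and return its state."""
    store = CustomerStore()
    store.put("1234", {"name": "Old Name"})
    for _ in range(times):
        operation(store)
    return store.customers


def put_op(store):
    store.put("1234", {"name": "Ada"})


def delete_op(store):
    store.delete("1234")


def post_op(store):
    store.post({"name": "Ada"})


print(snapshot_after(put_op, 1) == snapshot_after(put_op, 3))        # idempotent
print(snapshot_after(delete_op, 1) == snapshot_after(delete_op, 3))  # idempotent
print(snapshot_after(post_op, 1) == snapshot_after(post_op, 3))      # not idempotent
```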

Idempotency improves reliability and concurrency, helps prevent data loss, and provides the ability to automatically retry and recover from failures. It improves reliability by making it safe to retry requests that may or may not have been processed. This helps tide over network glitches and load spikes by replaying requests. Load balancers like HAProxy can retry requests when the server disconnects abruptly, providing automatic recovery from failures. Because repeating an idempotent call does not change the outcome, such calls can often be run concurrently with less locking to synchronize operations on data. This can increase concurrency and system throughput, resulting in better performance.

Safety and Idempotency let a client make reliable HTTP requests over an unreliable network. If you make a GET request and never get a response, just make another one. It’s safe even if your earlier request was fulfilled since it didn’t have any real effect on the state of the resource server. If you make a PUT request and never get a response, just make another one. If your earlier request got through, your second request will have no additional effect since PUT is idempotent and the operation can be repeated.
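The retry rule described above can be sketched as a small wrapper that resends a request only when the method is idempotent. The names send_with_retry and flaky are hypothetical, and the transport is faked with a plain callable so the sketch runs without a network.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}


def send_with_retry(method, send, attempts=3):
    """Call send(); on ConnectionError, retry only if the method is idempotent."""
    max_tries = attempts if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(1, max_tries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_tries:
                raise  # out of tries, or the method must not be replayed


def flaky(failures):
    """Build a fake transport that fails `failures` times, then returns 200."""
    state = {"left": failures}

    def send():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("no response received")
        return 200

    return send


print(send_with_retry("PUT", flaky(2)))  # succeeds on the third try

try:
    send_with_retry("POST", flaky(1))
except ConnectionError:
    print("POST was not retried; the caller must decide what to do")
```

Leaving the POST failure to the caller is deliberate: replaying it blindly could create a second subordinate resource.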

The following table shows which HTTP methods are safe and which are idempotent.

HTTP Method   Safe   Idempotent
GET           Yes    Yes
POST          No     No
PUT           No     Yes
DELETE        No     Yes
HEAD          Yes    Yes
OPTIONS       Yes    Yes

GET, HEAD, OPTIONS, PUT, and DELETE requests are idempotent. If you DELETE a resource, it’s gone. If you DELETE it again, it’s still gone. The response codes of the two requests can differ to indicate that the resource being deleted is already gone. For example, the first DELETE may return 200 (OK) or 204 (No Content), while a repeated DELETE may return 404 (Not Found) or 410 (Gone). Idempotency describes the resulting state on the server, not the response code.

If you create a new resource with PUT and then resend the PUT request, the resource is still there and it has the same properties you gave it when you created it. Making an absolute update to a resource’s state or deleting it outright has the same outcome whether the operation is attempted once or many times. A PUT request can also have differing return codes based on the validation done. It can be a 200 (OK) for a successful PUT or a 409 (Conflict) for a PUT where the server resource state is different from the one referenced by the client.

A GET or HEAD request should be safe: a client that makes a GET or HEAD request is not requesting any changes to server state. Making any number of GET requests to a certain URI should have the same practical effect as making no requests at all. The safe methods, such as GET and HEAD, are automatically idempotent as well.

POST is neither safe nor idempotent. Making two identical POST requests will probably result in two subordinate resources containing the same information.

Developing APIs requires us to adhere to the HTTP semantics, which specify the safety and idempotency requirements for the various verbs as shown in the table above. API consumers will and should expect GET to be safe and idempotent. Similarly, since POST is neither safe nor idempotent, API consumers will need to incorporate their own logic to handle failed or repeated POST requests.

Rest - Verbs and Status Codes

Rest – Communicating with Verbs and status codes

Overview of HTTP

In my previous post we talked about REST resources, identifiers, and representations. In this post we move on to how we can connect the various resources and their representations through their identifiers using the HTTP protocol.

In a RESTful system, clients and servers interact only by sending each other messages that follow a predefined protocol. The REST architectural style is primarily associated with designs that use HTTP as the transfer protocol. Even though we always associate the web with HTTP, it is only one of several long-lived protocols. Protocols and services such as FTP, Gopher, Archie, Veronica, Jughead, and Prospero were part of the ecosystem but gave way to HTTP as it emerged as the dominant protocol. Some of the good ideas in these protocols were also folded into the HTTP specification.

HTTP is an open protocol that defines a standard way for user agents to interact with both resources and the servers that produce the resources. It is an application-level protocol that defines operations for transferring representations between clients and servers. HTTP is a document-based protocol, in which the client puts a document in an envelope and sends it to the server. The server responds by putting a response document in another envelope and sending it back to the client. As an application protocol, HTTP is designed to keep interactions between clients and servers visible to libraries, servers, proxies, caches, and other tools. Visibility is a key characteristic of HTTP. When a protocol is visible, caches, proxies, firewalls, etc. can monitor and even participate in the protocol. Features such as caching responses, automatically invalidating cached responses, detecting concurrent writes and preventing resource changes, content negotiation, and safety and idempotency depend entirely on keeping requests and responses visible. In the HTTP protocol, methods such as GET, POST, PUT, and DELETE are operations on resources.

HTTP has strict standards for what the envelopes should look like but is not concerned about what goes inside them. When I visit the web site http://pradeeploganathan.com, the request below is generated (we can see this using the Chrome developer tools, opened with F12 in the Chrome browser).

Request

GET /2016/09/10/git-basics/ HTTP/1.1
Host: pradeeploganathan.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

The first line in the above request message describes the method (GET), the path, and the protocol version (HTTP/1.1) used by the client. The next few lines are request headers. By inspecting these headers, any piece of software that understands HTTP can work out not only the intent of the request but also how to parse the body of the message.

Major parts of the HTTP request.

The HTTP method – In this request, the method is “GET”. It is also called the “HTTP verb” or “HTTP action”. User agents only interact with resources using the prescribed HTTP verbs.

The path – In terms of the envelope metaphor, the path is the address on the envelope. In the above request the path is /2016/09/10/git-basics/, with the Host header completing the address.

The request headers – These are metadata: key-value pairs that act like informational stamps on the envelope. This request has the headers Host, Connection, User-Agent, Accept, and so on.

The entity-body/document/representation – This is the document inside the envelope, also called the payload. A GET request such as this one usually has no entity-body.
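The parts listed above can be pulled out of the raw request text with a few lines of Python. This is only a rough sketch for illustration: parse_request is a hypothetical helper, the request is a shortened copy of the one shown earlier, and a real server would rely on a proper HTTP parser.

```python
# A shortened copy of the request above; HTTP lines end with CRLF.
RAW_REQUEST = (
    "GET /2016/09/10/git-basics/ HTTP/1.1\r\n"
    "Host: pradeeploganathan.com\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
    "\r\n"
)


def parse_request(raw: str) -> dict:
    """Split a raw HTTP request into method, path, version, headers and body."""
    # A blank line separates the request line and headers from the entity-body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


parsed = parse_request(RAW_REQUEST)
print(parsed["method"], parsed["path"], parsed["headers"]["host"])
```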

The HTTP response is also a document in an envelope. It’s almost identical in form to the HTTP request.

Response

HTTP/1.1 200 OK
Date: Tue, 20 Sep 2016 02:36:14 GMT
Server: Apache Phusion_Passenger/4.0.10 mod_bwlimited/1.4 mod_fcgid/2.3.9
X-Powered-By: PHP/5.3.29
Vary: Accept-Encoding,Cookie
Content-Encoding: gzip
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

Major parts of the HTTP response.

The HTTP response code – This three-digit numeric code tells the client whether its request went well or poorly, and how the client should regard this envelope and its contents. In addition to verbs, HTTP also defines a collection of response codes, such as 200 OK, 201 Created, and 404 Not Found. Verbs and status codes provide a general framework for operating on resources over the network. Detailed information on response codes and verbs is below.

The response headers – Just as with the request headers, these are informational stickers on the envelope. This response has header data such as Date, Server, Content-Encoding, and so on.

The entity-body or representation – This is the document inside the envelope containing any response from the server. The entity-body is the fulfilment of the GET request and contains the representation requested.

A defining feature of RESTful architecture is its use of HTTP response codes. Since the response code isn’t part of the document or the metadata, the client can see whether or not an error occurred just by looking at the status line, the first line of the response. HTTP response codes are underused on the human web but are key on the programmable web, where API clients can take different paths based on the response codes. HTTP defines dozens of official response codes, but only about 10 are used frequently in daily use.

The status codes are divided into five categories based on the intent to be communicated. Each category contains detailed status codes that express that intent more precisely.

  • 1xx: Informational – Communicates transfer protocol-level information.
  • 2xx: Success – Indicates that the client’s request was accepted successfully.
  • 3xx: Redirection – Indicates that the client must take some additional action in order to complete its request.
  • 4xx: Client Error – Indicates issues with the client’s request.
  • 5xx: Server Error – Indicates that the server takes responsibility for the error.

An HTTP response can return various status codes, including the following:

  • 200 OK – The request went fine and the content requested was returned. This is normally used on GET requests. 200 (OK) must not be used to communicate errors in the response body.
  • 201 Created – The resource was created and the server has acknowledged it. It is useful on responses to POST or PUT requests. Additionally, the new resource could be returned as part of the response body.
  • 204 No Content – The action was successful but there is no content returned. Useful for actions that do not require a response body, such as a DELETE action.
  • 301 Moved Permanently – The resource was moved to another location and the new location is returned. This status is especially useful when URLs change over time (maybe due to a change of version, a migration, or some other disruptive change). Keeping the old URLs and returning a redirection to the new location allows old clients to update their references in their own time.
  • 400 Bad Request – The request issued has problems (it might be lacking some required parameters, for example). The server cannot or will not process the request due to something that is perceived to be a client error. A good addition to a 400 response is an error message that a developer can use to fix the request. 400 is the generic client-side error status, used when no other 4xx error code is appropriate.
  • 401 Unauthorized – Used when the request lacks valid authentication credentials for the requested resource. A 401 error response indicates that the client tried to operate on a protected resource without providing proper authentication. It may have provided the wrong credentials or none at all.
  • 403 Forbidden – The resource is not accessible, but unlike 401, authenticating will not change the response. This indicates that the server understood the request but refuses to authorize it. A 403 response is not a case of insufficient client credentials; that would be 401 (“Unauthorized”). REST APIs use 403 to enforce application-level permissions.
  • 404 Not Found – The URL provided does not identify any resource. A good addition to this response could be a set of valid URLs that the client can use to get back on track (root URL, previous URL used, etc.). If a representation or resource has been permanently moved or deleted, a 410 (Gone) status is the preferred way of indicating this to the client.
  • 405 Method Not Allowed – The HTTP verb used on a resource is not allowed, for instance doing a PUT on a resource that is read-only.
  • 500 Internal Server Error – A generic error code indicating that the server encountered an unexpected condition that prevented it from fulfilling the request. Normally, this response is accompanied by an error message explaining what went wrong.
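A client usually branches on the category first and on the specific code second. The sketch below shows one way to derive both from a numeric code with Python’s standard http module; the describe helper and the CATEGORIES table are names made up for this illustration.

```python
from http import HTTPStatus

# The five categories from the list above, keyed by the first digit.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code with its reason phrase and category."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not in Python's table of registered status codes.
        phrase = "Unregistered"
    return f"{code} {phrase} ({category})"


for code in (200, 201, 204, 301, 404, 500):
    print(describe(code))
```

Falling back to the category means a client can still react sensibly to a code it has never seen, for example by treating any unknown 4xx as a client error.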

The Uniform Interface

HTTP provides a uniform interface to operate on resource representations. Its methods are standardized and provide expected results across all implementations. The HTTP/1.1 standard (RFC 2616) defines eight methods. The uniform interface makes services similar across the web. All clients know what a GET on a resource would result in, and this knowledge is implicit in the uniform interface.

GET – The HTTP standard says that a GET request is a request for a representation. It is intended to access a resource in read-only mode and not change any resource state on the server. GET is probably the most commonly used and well-known verb. A GET for a particular URI returns a copy of the representation of the resource that URI identifies. GET is both safe and idempotent: according to the HTTP specification, a GET should never cause a change to a resource. The infrastructure of the Web strongly relies on this, and clients count on being able to repeat GET requests without causing side effects. One of the most important features of GET requests is that their results can be cached, which contributes to the scalability of the Web. The most common response code to a GET request is 200 (OK). Redirect codes like 301 (Moved Permanently) are also common when the underlying resource location has moved.

POST – Normally used to send a new resource to the server (create action). POST is probably the next most commonly used verb. A POST is generally used to create a subordinate resource, that is, a resource that exists in relation to a parent resource. Examples are using a POST to create a customer record, where the individual customer record is a subordinate of the customer table, or to create a single blog entry, where it is a subordinate of the blog posts. A POST is used to create or append a resource identified by a service-generated URI, so the client can create a new resource without having to know its exact URI. The common response to a POST is 201 (Created), with the Location header specifying the location of the newly created subordinate resource. Another common response code is 202 (Accepted), which means that the server intends to create a new resource based on the given representation asynchronously at a later point in time. POST is neither safe nor idempotent: multiple POST calls with the same representation will create multiple copies.

PUT – A PUT is used to create or replace a resource at a URI computed by the client. PUT is an idempotent operation and expects the whole resource to be supplied rather than just the changes to the resource state. This guarantees that if you use PUT to change the state of a resource, you can resend the PUT request and the resource state won’t change again. Because PUT carries the whole state of the representation, the operation can be safely repeated after a failure caused by a transient network or server error. PUT replaces the resource at the known URL if it already exists. If you make a PUT request and never get a response, just make another one. If your earlier request got through, your second request will have no additional effect.

DELETE – Used to delete a resource. On successful deletion, the operation returns an HTTP status 200 (OK) code, or 204 (No Content) when there is no response body. A DELETE operation is idempotent. However, a repeated call to delete a resource should result in a 404 (Not Found) or, better still, a 410 (Gone), since the resource was already removed and is therefore no longer available.

HEAD – Not part of the CRUD actions, but the verb is used to ask if a given resource exists without returning any of its representations. HEAD gives you exactly what a GET request would give you, but without the entity-body: it retrieves only the metadata of the resource. HEAD returns the full headers, so we can check Last-Modified or Content-Length to decide if we want to re-download a given resource. HEAD can be used for existence and cache checks.

OPTIONS – Not part of the CRUD actions, but used to retrieve a list of available verbs on a given resource (i.e., What can the client do with a specific resource?).

PATCH – Modify part of the state of this resource based on the given representation. PATCH is similar to PUT, but allows for fine-grained changes to the resource state. PATCH is neither safe nor guaranteed to be idempotent. PATCH is not defined in the core HTTP specification (RFC 2616). It’s an extension designed specifically for web APIs, defined in RFC 5789, and it’s relatively recent. A successful PATCH returns response code 200 (OK), or 204 (No Content) if the response carries no body. If the resource to be modified does not exist, the server typically responds with 404 (Not Found).
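The difference between PUT and PATCH is easiest to see side by side. The sketch below models resources as dictionaries keyed by URI; the put and patch functions are hypothetical, and the simple field-by-field merge used for PATCH is an assumption made for illustration, since the format of a patch document is left to the API.

```python
def put(resources: dict, uri: str, representation: dict) -> int:
    """Replace the whole resource, creating it if absent. Returns a status code."""
    created = uri not in resources
    resources[uri] = dict(representation)
    return 201 if created else 200


def patch(resources: dict, uri: str, changes: dict) -> int:
    """Merge the given fields into an existing resource. Returns a status code."""
    if uri not in resources:
        return 404
    resources[uri].update(changes)
    return 200


resources = {}
uri = "/customer/1234"

print(put(resources, uri, {"name": "Ada", "email": "ada@example.com"}))  # 201
print(patch(resources, uri, {"email": "ada@new.example.com"}))           # 200
print(resources[uri])  # name is kept, only email changed

print(put(resources, uri, {"email": "ada@example.com"}))                 # 200
print(resources[uri])  # name is gone: PUT replaced the whole state
```

Because PUT always carries the full state, resending it is harmless, whereas a PATCH is only as repeatable as the changes it describes.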

The above methods define the protocol semantics of HTTP. Understanding the HTTP verbs, status codes, and request and response messages is key to defining RESTful architectures successfully on the HTTP protocol. We now have an understanding of the basic plumbing involved in RESTful architectures, and in the next blog post we will move on to defining REST constraints and hypermedia.
