In the entire solution, the simplest part is the implementation of the application logic layer. It
is trivial and is illustrated by the following Python code:
import MySQLdb

def current(req, cache, urlComponents):
    # Serve the response as an Atom document.
    req.content_type = configuration.Atom.mimeType
    conn = MySQLdb.connect(host=configuration.Database.server,
                           user=configuration.Database.user,
                           passwd=configuration.Database.password,
                           db=configuration.Database.db)
    cursor = conn.cursor()
    # Newest entries first.
    cursor.execute("SELECT * from entries order by post_date desc")
    # The newest entry is passed to the header generator and emitted first.
    row = cursor.fetchone()
    generateHeader(req, row)
    generateEntry(req, row)
    # Emit the remaining entries, up to the configured count.
    row = cursor.fetchmany(configuration.Blog.entryCount - 1)
    for entry in row:
        generateEntry(req, entry)
    generateFooter(req)
Don’t be misled by the code’s simplicity: most of the work happens in the infrastructure of
functions that it calls. The function illustrates what a Web service does for the most part,
which is to accept data, process it, and persist it. At times you will implement algorithms that
perform some type of calculation, and during the processing, business rules are applied to
the sent and received data. For the scope of this solution, the data is pushed and
pulled with very little in-between processing.
An external process drives the application logic, which in the case of the solution is a
browser. The application logic is triggered by a number of requests that are defined using URLs.
There are other ways of creating Web services using other technologies, but this article’s focus is
on using Representational State Transfer (REST). Using REST implies designing URLs and using
the HTTP protocol. From the perspective of the Ajax client, REST is a perfect protocol.
Let’s start the application design process by correlating what the sample source code is
referencing. The name of the function in the sample source code is current, and the last n
blog entries are returned. When using a blog reader or an Ajax client, you will want to see the
current blog entries, therefore the simplest approach is to associate the URL http://myserver.com/
to the function current.
Ignoring the correctness of the URL for now (I discuss that in the next section), a valid
question is, how does the server know to cross-reference the URL with a specific functionality?
By default, when an HTTP server executes the URL processor, the URL is mapped to a file.
If the request was [http://myserver.com]/dir/file.html, then the HTTP server attempts to
find the file [base directory]/dir/file.html. If the file is found, the file extension processor
is loaded, which in the case of .html happens to be a static file processor. (If the extension is
.php or .aspx, then the PHP or ASP.NET processors are executed and generate content based
on the instructions in the file.) This algorithm is easy to implement, but from the perspective
of REST it is the wrong URL processor algorithm.
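The file-based mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular HTTP server is implemented; the base directory /var/www, the PROCESSORS table, and the function name map_url_to_file are all assumptions made for the example.

```python
import posixpath

# Hypothetical table: file extension -> name of the processor that handles it.
PROCESSORS = {".html": "static", ".php": "php", ".aspx": "aspnet"}

def map_url_to_file(url_path, base_directory="/var/www"):
    """Return (file path, processor name) the way a file-based server would."""
    # The URL path is appended to the base directory to locate a file.
    relative = posixpath.normpath(url_path).lstrip("/")
    file_path = posixpath.join(base_directory, relative)
    # The file extension alone decides which processor runs.
    extension = posixpath.splitext(file_path)[1]
    return file_path, PROCESSORS.get(extension, "static")
```

Notice that the URL here names a file and a technology (through the extension), not a resource of the application.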
Understanding Why We Want REST
From the perspective of REST, all URLs represent resources on the server side; however, while
a file is a resource, it is not a resource from the perspective of the application. This is a very big
distinction that must be understood: REST URLs are application-specific resources. Using an
application-specific resource means that you are exposing functionality based on the business
logic, not technology. Using REST, you can separate the resource from the implementation, much
like interface- or contract-driven development.
To illustrate the separation of the resource from the implementation, consider the following
C# code:
interface IBase {
    void Method();
}
class Implementation1 : IBase {
    public void Method() { }
}
class Implementation2 : IBase {
    public void Method() { }
}
The IBase interface defines a method and is implemented by two classes, Implementation1
and Implementation2. This process is referred to as interface-driven development, because when
the client uses either implementation, the client doesn’t use the actual implementation but the
interface of the implementation, as illustrated by the following source code:
class Factory {
    public static IBase Instantiate() {
        return new Implementation1();
    }
}
class UseIt {
    public void Method() {
        IBase obj = Factory.Instantiate();
        // ...
    }
}
In the example source code, the Factory class has a static method, Instantiate, that
creates an instance of IBase by instantiating Implementation1. In the UseIt.Method method, an
instance of IBase is obtained by calling the Factory.Instantiate method. The UseIt class
has no idea whether Implementation1 or Implementation2 is instantiated, and it uses the
interface as defined by IBase, expecting the interface methods to be implemented correctly.
When using dynamic languages, you use duck typing, and the defined contracts result in
implied functionality.
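For comparison, here is a minimal Python sketch of the same arrangement using duck typing. The names mirror the C# example but are otherwise hypothetical, and the return values exist only so that the behavior can be observed.

```python
class Implementation1:
    def method(self):
        return "implementation 1"

class Implementation2:
    def method(self):
        return "implementation 2"

def instantiate():
    # The factory decides which implementation the caller receives.
    return Implementation1()

def use_it():
    obj = instantiate()
    # No interface is declared: any object that offers method() will do.
    return obj.method()
```

There is no IBase here. The contract is implied by the fact that use_it calls method(), which is what makes the functionality "implied" rather than declared.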
Let’s relate interface-driven development to URLs and separate the resource from the representation.
The resource is the interface, and the representation is the implementation. Currently,
most Web technologies bind together the resource and representation or use implementations
directly, as the URLs http://mydomain.com/item.aspx and http://mydomain.com/item.jsp illustrate.
The direct bindings are the .aspx and .jsp extensions, and the proper interface-defined
URL would have been http://mydomain.com/item.
Ironically, all Web technologies implement the separation of resource from representation
for the root URL /, as illustrated by the following HTTP conversation. (Note that the conversation
has been abbreviated for explanation purposes.)
Request
GET / HTTP/1.1
Host: 192.168.1.242:8100
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-US; rv:1.7.8) Gecko/20050511
Response
HTTP/1.1 200 OK
Server: Apache/2.0.53 (Ubuntu) PHP/4.3.10-10ubuntu4
The execution flow of a REST URL processor is simple in that, to determine which user
code to execute, we simply dissect the URL. Based on the structure of the URL, the appropriate
server-side code handler is called. If the client calls the URL /blog/entries/current,
the tokenized URL is blog, entries, and current. In case of the blog software, the Python
handler requires at least three pieces of the URL that are translated into the Python call
[module namespace].[module].[function]. The URL called from the client is then translated
to blog.entries.current(). Any URL pieces after the first three pieces are passed to the
called function to fine-tune the required information.
The rule of requiring at least three URL pieces is purely specific to my Python handler
framework. Your framework might need two, or five, or some other number of URL pieces; the
number is arbitrary. In the example blog software, the algorithm used to cross-reference the
URL to the custom code used convention over configuration techniques. Yet there is nothing
wrong with using a lookup table to cross-reference certain pieces of the URL with a piece of
custom functionality. Again, how you cross-reference the URL pieces to the custom functionality
is up to you.
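The two cross-referencing approaches can be sketched as follows. This is not the code of my handler framework; it is a simplified illustration, and the function names resolve, find_handler, and lookup_handler are assumptions made for the example.

```python
import importlib

def resolve(url_path):
    """Split a URL path into (module name, function name, extra pieces)."""
    pieces = [piece for piece in url_path.split("/") if piece]
    if len(pieces) < 3:
        raise ValueError("at least three URL pieces are required")
    namespace, module, function = pieces[:3]
    return namespace + "." + module, function, pieces[3:]

def find_handler(url_path):
    """Convention over configuration: import the module named by the URL."""
    module_name, function_name, extra = resolve(url_path)
    # A real handler should only import from a trusted namespace.
    module = importlib.import_module(module_name)
    return getattr(module, function_name), extra

def lookup_handler(url_path, table):
    """The lookup-table alternative: URL prefix -> handler, configured by hand."""
    for prefix, handler in table.items():
        if url_path.startswith(prefix):
            return handler
    raise LookupError("no handler registered for " + url_path)
```

With the convention-based approach, /blog/entries/current resolves to the function current in the module blog.entries, and anything after the third piece is handed to that function.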
When implementing your own REST URL processor, the actual implementation will vary
according to the technology used. There is no common theme, but there are two ways to
implement a REST URL processor:
• Find a way to associate a base URL with a specific handler. For example, I tend to use
/services as a base for all my Web services. The server needs to support the notion that
whenever the requested URL starts with /services, a specific handler is called. The specific
handler called is the REST URL processor that then makes the handler call.
• If you cannot associate a base handler for a base URL, then you need to write an HTTP
filter. The difference between an HTTP filter and an HTTP handler is that a filter is called
before a handler. The idea of HTTP filters is to enable user code to perform certain
common steps on all requests. A common example is authentication. Using an HTTP
filter, you have the ability to define which handler is called. In the context of the REST
URL processor, the processor would be embedded as a last step after all of the other filters
have executed. It is the last step because you want steps such as authentication to
execute on the requested URL, and not the redirected execution URL.
When implementing your REST URL processor, you must remember that only the URL
determines which functionality is called. You must not use an HTTP cookie as part of your
decision. As outlined later in this article, using HTTP cookies to decide which functionality
to execute is very bad design. For the scope of this solution, this is all I’ll discuss regarding the
REST URL processor.
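The ordering of filters and the REST URL processor can be sketched like this. The request is modeled as a plain dictionary, and authentication_filter, rest_url_processor, and process are hypothetical names; a real server exposes its own filter and handler interfaces.

```python
def authentication_filter(request):
    # Hypothetical check; a real filter would validate credentials.
    if "user" not in request:
        raise PermissionError("authentication required")

def rest_url_processor(request, handlers):
    # Only the URL decides which handler runs; cookies are never consulted.
    path = request["path"]
    if not path.startswith("/services/"):
        return None  # not a Web service URL; let the server handle it
    return handlers[path[len("/services"):]]

def process(request, filters, handlers):
    # Ordinary filters see the URL that the client requested ...
    for http_filter in filters:
        http_filter(request)
    # ... and the REST URL processor runs as the last step.
    handler = rest_url_processor(request, handlers)
    return handler(request) if handler else "static"
```

Because the processor reads nothing but request["path"], two requests for the same URL always reach the same functionality, regardless of any cookies they carry.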
Implementing the URLs
Assuming you have implemented your own REST URL processor, the next step is to define the
URLs used. In this section, I outline the URLs used in the blog application and explain them in
a way that applies generally. All of my URLs have a minimum of three pieces, but that is
particular to the REST URL processor I’m using; don’t think you must use the same number
of pieces.
The two base URLs are /blog and /services/blog. You need these two base URLs because
you are serving two different types of content: static HTML files and Web service content. In
theory, you could use only one base URL and have the Web service generate everything, but
I am wary of doing that because it complicates the implementation.
Let’s step back and think about why having a single URL complicates the implementation.
When implementing an Ajax SOA application, you have a client coder and a server coder. The
client coder does not do any server programming and does not want to; he or she wants to focus on
the client side of things. Thus the client-side programmer has to be able to do everything he or
she wants with static files served by an HTTP server. If the programmer needs a Web service to
serve static files, then he or she would be dependent on the server programmer, and that
dependency is not desired. By having two base URLs, the client programmer can do what
he or she deems appropriate, and the server programmer can do what he or she deems appropriate,
independent of one another.
Note There is no reason you could not have a base URL such as /blog and then subdivide that base
namespace to /blog/static and /blog/services. The point to remember is that you have two URL
namespaces: one for the client side and one for the server side.
Because I use mod_python for the server side and Apache HTTPD for static content, it makes
sense for my Web services to all be Python based. Thus, I have a base Web service handler and
specific Web services, such as the blog software, all implemented as a Python namespace. The
base client-side URL for all applications is /, and the base server-side URL for all applications is
/services. With other architectures such as ASP.NET, the base URL would be /blog, and then that
URL could be further subdivided.
I am not going to focus on the client-side URLs, because they are driven by the Web
service URLs for the scope of the blog application. For example, if you had the entry URL
/services/blog/entries/archive/2006, there would be an appropriate static file URL
/blog/entries/archive/2006.
Before I explain the nature of the URLs, let’s review the four common HTTP verbs. Usually
you use two HTTP verbs, GET and POST, often for the same purpose. For example, an HTML
form can post its data using either GET or POST. From a REST perspective, it is bad practice to
use GET to send data to the server.
I explain how to use each HTTP verb in the following list. The best way to understand the
individual verbs is to think of them as instructions much like SQL commands. The difference
between the HTTP verbs and SQL commands is that SQL manipulates tables and rows, and
HTTP manipulates resources associated with URLs.
• DELETE (SQL equivalent delete from): A rarely used verb that deletes a resource
on the server side. For example, if the DELETE verb is used for the URL
/services/blog/entries/archive, the result is that all blog entries on the server are deleted.
If the URL has query parameters associated with it (e.g., ticker=DELL&value=23), then all entries
that match the query parameters are deleted. In SQL-speak (and in terms of the delete
command), the query parameters are the SQL where parameters.
• GET (SQL equivalent select): A commonly used verb that is used to retrieve content
from the server. The specified URL retrieves the resources associated with the URL.
If there are any query parameters, a selection of items associated with the URL that
match the query parameters is performed. In SQL-speak the query parameters are the
where parameters associated with the select statement.
• POST (SQL equivalent stored procedures): A commonly used verb that is used to send data
to the server. It is important to consider an HTTP POST as a stored procedure. Things get
funny with an HTTP POST in interpreting the role of the URL. You could say that the URL
defines the resource that is manipulated, and the parameters define how to manipulate
the resource, but that is not the nature of a SQL stored procedure. The name of the stored
procedure does not impact which tables are manipulated. So another view of POST could
be to define a resource that manipulates other resources, and which resources are manipulated
depends on the implementation of an HTTP POST. Choose whichever definition
makes sense to you. I find an HTTP POST to be both too general and too specific to nail
down to a single idea. I personally choose the first definition, where the URL defines the
resource to manipulate, and which algorithm is used to manipulate the resource depends
on the parameters. An HTTP POST can generate data, even though it is not generally used that way.
In SQL-speak, stored procedures can generate results, even though for the most part you
would use the select command.
• PUT (SQL equivalent insert): A rarely used verb that is used to replace the content of
a resource. If the resource associated with the URL does not exist, then it is created.
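The SQL analogy for GET, DELETE, and PUT can be made concrete with a small sketch that uses Python's built-in sqlite3 module. The table layout, the COLUMNS whitelist, and the function names are assumptions made for the example, and INSERT OR REPLACE is SQLite syntax. POST is left out because, like a stored procedure, its behavior depends entirely on the logic you attach to it.

```python
import sqlite3

COLUMNS = ("year", "author")  # hypothetical columns a client may filter on

def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries "
                 "(id INTEGER PRIMARY KEY, year INTEGER, author TEXT, title TEXT)")
    return conn

def where_clause(params):
    # Query parameters play the role of the SQL where parameters.
    names = [name for name in params if name in COLUMNS]
    clause = " AND ".join(name + " = ?" for name in names)
    return (" WHERE " + clause if clause else ""), [params[name] for name in names]

def http_get(conn, params):
    clause, values = where_clause(params)
    sql = "SELECT id, title FROM entries" + clause + " ORDER BY id"
    return conn.execute(sql, values).fetchall()

def http_delete(conn, params):
    # Without query parameters, every entry is deleted.
    clause, values = where_clause(params)
    conn.execute("DELETE FROM entries" + clause, values)

def http_put(conn, entry_id, year, author, title):
    # PUT replaces the resource, creating it when it does not exist.
    conn.execute("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                 (entry_id, year, author, title))
```

Only whitelisted parameter names reach the SQL text, and the values are always bound as parameters, so the query string cannot inject SQL.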
The URLs used by the application are described in the sections that follow.
/services/blog/entries/current
Specifically, this URL represents the last n entries of a blog. With the passing of time, the last n
entries change, and you can use a single URL to reference the latest and greatest information. In an
abstract sense, the URL represents a view on some data. The problem you will have in your application
is that people want a single URL they can use until the end of time. For instance, going back
to the blog example, if the month is 06, the day is 07, and the year is 2006, then to get the latest and
greatest blog entries, you only have to reference the appropriate year, month, and day URLs. The
problem is that nobody would do this, as it is too complicated and requires knowledge of how data
is organized. Another way to organize blog entries is to use incremental numbers or long values
that count the seconds since 1970.
By having a “view” URL, you create a reference to data organized by an embedded server-side
algorithm. The server-side algorithm is not obvious to the user, and it does not need to be.
Be very careful of allowing query parameters to select data from the view. The aim of the view
URL is to provide an easy-to-remember and easy-to-use URL. Allowing query parameters with a view
URL is silly because the same effect can be achieved using a resource URL, as you will see
shortly.
View URLs for the most part only accept HTTP GET. View URLs should not accept an
HTTP PUT or an HTTP DELETE because the data retrieved is a reference to another URL. If you
want to support an HTTP PUT or an HTTP DELETE on a view URL, you need to delete or replace
the logic associated with the URL, and not the data. It’s more difficult to determine whether
a view URL should accept an HTTP POST. It could be argued that an HTTP POST of view URLs
does not make sense because for the most part you cannot update data generated by a SQL
view. I counter that since a view URL contains some logic to extract the appropriate data, and
an HTTP POST contains logic, an HTTP POST could be used to insert data.
With regard to the blog application, POSTing to the URL /services/blog/entries/current
would have the effect of adding a blog entry at the time that it was posted. If you had to HTTP
PUT a new blog entry, the client would have to know how the server organizes the underlying
blog data. The blog application discussed in this article is organized by date, but it need not be.
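A minimal sketch of the view URL's two behaviors follows. Entries are modeled as dictionaries in a list rather than database rows, and the function names and the date-based archive URL layout are assumptions based on the URLs used in this article.

```python
import datetime

def current(entries, count):
    """GET on the view URL: the last `count` entries, newest first."""
    newest_first = sorted(entries, key=lambda entry: entry["post_date"], reverse=True)
    return newest_first[:count]

def post_to_current(entries, title, now=None):
    """POST on the view URL: the server, not the client, decides where the entry lives."""
    post_date = now or datetime.datetime.now()
    entry = {
        "title": title,
        "post_date": post_date,
        # The client never needs to know that the archive is organized by date.
        "url": post_date.strftime("/services/blog/entries/archive/%Y/%m/%d"),
    }
    entries.append(entry)
    return entry
```

If the server later organized entries by an incremental number instead of by date, only post_to_current would change; clients posting to the view URL would be unaffected.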
/services/blog/entries/archive
This URL specifies the root URL for all blog entries stored on the HTTP server. In an
abstract sense, all Web services have a notion of a root resource URL. The purpose of the root
resource URL is to define a main entry point to all of the resources in the Web service. Think
of this URL as the index.html of your Web service. Some URLs (e.g., view URLs) may be
disconnected from the root URL hierarchy, but an end device can still iterate all of the
resources using the root URL.
The root URL is also an example of a collection URL. Collection URLs behave a bit differently
than regular URLs. For example, consider the following two responses for the root URL.
Note that the URL need not return XML; XML is used here for simplicity.
<root>
    <item href="/root/item1" id="item1" />
</root>
<root>
    <item id="item1" url="/root/item1">
        <data>embedded data</data>
    </item>
</root>
There are two responses in the example code. The first response is a root element that
has as a child a single item element. The item element contains no child elements and has
two attributes, href and id. The second response is like the first, except that item has a data
child element, and there is no href attribute. Instead, in the second response, there is a url
reference. The difference between the two responses is the interpretation of what should be
returned when a collection URL is referenced.
A collection URL is a URL that itself does not contain any data, but serves as a reference
to a collection of data pieces. When the collection URL is referenced, the server can return
a set of URL references to the actual data or the actual data itself. In the blog application,
for example, the Atom format means that referencing the collection URL returns all of the
data pieces. However, it is often impractical to return all of the data pieces, as the returned
data stream could be gigantic. To reduce traffic, link references are returned instead. The catch
is that various formats do not allow links. As a rule of thumb, return what is best suited for
your application. Regardless of how you return the data, be consistent: if one of your
collection URLs returns data, then all of them should return data, and likewise for links.
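The two styles of collection response can be generated from the same data, as the following sketch shows. It uses Python's xml.etree.ElementTree module; the function name collection_response and the embed_data flag are hypothetical, and the element and attribute names follow the two example responses.

```python
import xml.etree.ElementTree as ET

def collection_response(items, embed_data):
    """Build a collection document with either links or embedded data."""
    root = ET.Element("root")
    for item_id, data in items:
        item = ET.SubElement(root, "item", {"id": item_id})
        if embed_data:
            # Second style: the data travels with the collection.
            item.set("url", "/root/" + item_id)
            ET.SubElement(item, "data").text = data
        else:
            # First style: only a link to the data is returned.
            item.set("href", "/root/" + item_id)
    return ET.tostring(root, encoding="unicode")
```

Being consistent then amounts to choosing one value of embed_data for every collection URL in the Web service.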
A root or collection URL for the most part will be called using the HTTP GET verb. There
will probably be query parameters to select specific entries, which reduces the document length
for URLs with a large number of entries. For example, the view URL
/services/blog/entries/current could also be expressed as /services/blog/entries/archive?last=35.
The query parameter last is used to select the last n entries.
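Handling the last query parameter might look like the following sketch, which uses the standard urllib.parse module. The function name select_entries is hypothetical, and the entries are assumed to be ordered from oldest to newest.

```python
from urllib.parse import urlsplit, parse_qs

def select_entries(url, entries):
    """Apply the `last` query parameter to a list ordered oldest to newest."""
    query = parse_qs(urlsplit(url).query)
    if "last" not in query:
        return entries  # no filter: the whole collection
    count = int(query["last"][0])
    # Guard against last=0, because entries[-0:] would return everything.
    return entries[-count:] if count > 0 else []
```

The parameter only filters what is returned; it never changes the underlying collection.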
The HTTP verbs PUT and DELETE can apply if valid collection entries are added. While it is
possible to convert a collection URL to a data-resource URL, it makes sense to do so only if the
server does not dynamically generate the collection. For example, in the case of the blog software,
the collection is generated from a database. Executing PUT or DELETE does not make sense
unless there is logic that processes the data sent by the commands. And finally, it is possible to
process an HTTP POST if there is associated server-side logic.
/services/blog/entries/archive/2006/07/06
This URL specifies a data-resource URL that when referenced contains the data a user is interested
in. The data that is sent is determined by the client in the Accept HTTP headers (this is
discussed further in Article 5). A data-resource URL is capable of processing all of the HTTP
verbs (GET, PUT, POST, and DELETE), as explained in the previous sections.
When the URL is referenced, a piece of data is returned, but you have to ask yourself what
the format of the data is. Using the blog application as an example, the piece of data returned
must make it seem that the URL is referencing a collection URL, because the Atom format is
a single format that assumes every URL is a collection of blog entries, even though there might
be only a single entry.
So when you create data-resource URLs, keep in mind that although logically they are
a single piece of data, the format of the data could make the URL seem like a collection URL.
When the client executes an HTTP GET, the server will have no problem generating the data.
The server can get confused, however, if the client executes an HTTP PUT or an HTTP POST. The
data sent to the server might contain multiple pieces of data, even though the server expected
only a single piece of data. Thus, the logical solution is to generate an error saying that users
cannot post multiple entries. Another solution might involve looking through the list of entries
and posting the first entry. The problem with that strategy is that it is inconsistent with the
intent of the client. When the client sends data that contains multiple pieces of information,
they expect those multiple pieces of information to be saved. If the server saves only one
piece, the client is left wondering what went wrong. When an error is generated, the client is
not left wondering what went wrong.
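The "generate an error" strategy can be sketched as follows. The example assumes that the client sends an Atom 1.0 feed (namespace http://www.w3.org/2005/Atom); the exception class EntryCountError and the function single_entry are hypothetical names.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

class EntryCountError(ValueError):
    """Raised when a data-resource URL receives anything but one entry."""

def single_entry(feed_xml):
    """Return the one entry in an Atom feed, or raise if there is not exactly one."""
    feed = ET.fromstring(feed_xml)
    entries = feed.findall(ATOM + "entry")
    if len(entries) != 1:
        # Refuse outright instead of silently saving only the first entry.
        raise EntryCountError("expected exactly one entry, received %d" % len(entries))
    return entries[0]
```

The server would translate the exception into an HTTP error response, so the client learns immediately that nothing was saved.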
Data-resource URLs need to be as specific as possible. You do not want any ambiguity arising
at some later point in time. For example, if today your blog application allows only a single
user, but at some point in the future it could allow multiple users, then add that functionality.
In the case of the blog application, that would mean /user/services. For the initial release of
the application, /user/services might be hard-coded and not relevant when processing the
URL. However, you have created a placeholder for the case when you have multiple users. I am
not saying you should compensate for every potential change in the future, because after all
you can use server redirection (e.g., http://user.server/services).
What you want to remember is that data-resource URLs will outlive your server, your technology,
and even your company. URLs are like pieces of real estate and are part of your brand
recognition, so you must choose them carefully. Once people know and associate a certain
URL with a Web site or company, it is very difficult for those people to switch to using another
URL. For example, imagine if tomorrow Google decided to call itself ReallyCoolServer. The
company ReallyCoolServer would not immediately have the same impact or brand recognition
as Google has today.
/services/blog/entries/archive?delete=35 and
/services/blog/entries/archive/2005?past=35
The examples in this article show that URLs can have query parameters. However, the two
URLs presented in this section’s heading do not have the same intention; only the second one
is an acceptable URL. The first URL is not acceptable because it
implies changing the data.
You’ve previously used query parameters to perform a filtering operation of an HTTP GET.
Using query parameters in that context is acceptable because it does not change the underlying
nature of the data; you are specifying a filter. The filter could be used to convert the result
set from one language to another, and it may contain complex algorithms. But regardless of
the algorithm, there is no change to the data that is being filtered.
The first URL, /services/blog/entries/archive?delete=35, is different in that it uses the
word “delete,” and “delete” means to delete one or more records. Thus the query parameter will change
the underlying data and is not acceptable. Of course, there is an exception if the word “delete”
doesn’t mean delete in the sense of “delete from the data source,” but instead means delete
from the generated result set. Then the delete keyword becomes a filtering operation and is
acceptable. Changes to the underlying data are the result of executing the HTTP POST, PUT, or
DELETE verb.
The URLs used by the blog application are relatively generic and illustrate most of the variations
that you will encounter when building REST-based Web services. Overall, you should
remember that building REST-based Web services is like interacting with a database that supports
SQL. You have a number of HTTP verbs that can be used to add, manipulate, delete, and
retrieve data. How that data is managed is the responsibility of the REST Web service designer.
I am not going to walk through the server implementation because it is an application
issue. In the case of the blog application, you are manipulating blog entries, which have very
little business logic. For example, if you contrast a blog application with a mortgage application,
you can see that a mortgage application has quite a bit of logic, and it also has a type of
URL that does not contain data and is defined as follows.
/services/mortgage/calculate/payments
The defined URL is not a data-resource URL, but a “question in, answer out” URL. There is no
server-side data, or view of data, or collection of data. There is only a calculation; hence, the
only HTTP verb that can be used is HTTP POST. The other HTTP verbs do not make sense and
should not be used in this context.
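A "question in, answer out" handler might be sketched like this. The calculation is the standard fixed-rate amortization formula; the parameter names principal, annual_rate, and years, the function names, and the choice of returning a (status, result) pair are all assumptions made for the example.

```python
def monthly_payment(principal, annual_rate, years):
    """Standard fixed-rate amortization formula."""
    months = years * 12
    rate = annual_rate / 12.0
    if rate == 0:
        return principal / months
    factor = (1 + rate) ** months
    return principal * rate * factor / (factor - 1)

def calculate_payments(verb, params):
    # "Question in, answer out": only POST is meaningful for this URL.
    if verb != "POST":
        return 405, None  # 405 Method Not Allowed
    payment = monthly_payment(float(params["principal"]),
                              float(params["annual_rate"]),
                              int(params["years"]))
    return 200, round(payment, 2)
```

Nothing is stored and nothing is retrieved; the response depends only on the posted parameters.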