This article focuses on building a Web service architecture. In the previous article, you
learned how to create an overall architecture that included the basics of how to construct
a Web service and its appropriate URLs. However, it didn’t explain the details of what a Web
service does and why a Web service should do what it does.
This article outlines the guidelines of how to develop a Representational State Transfer
(REST)-based Web service. Specifically, it covers the following points:
• Structuring a Web service to solve a specific task
• Combining Web services to filter and modify data
• Combining Web services to create mashups
Problem
Let’s look at a stock-trading application as an example of the problems you face when building
Web services. In this application, a number of clients need access to real-time, historical,
and order information.
Adding a Web service front end allows you to leverage the middleware socket server without
making any changes to it. While it may seem like a waste of resources to add a front end to
a service in an overall Web service context, it makes sense here. This article will explore why.
Solution Part 1
The implementation of the solution involves taking one of the already existing socket technologies
and using that technology to build a Web service. For illustration purposes, this
article uses a single technology, Java. You could choose .NET or C++ instead; the actual technology
is not important, because the exposed Web service can be consumed by any technology that
is Web service–aware. The initial approach to building the Web service is to define the general
operations and then implement those general operations using some technology.
In the high-level view of the trading architecture, a class type called TradeServlet that
implements a Java servlet provides the Web service. For those not versed in Java technologies,
a Java servlet is a way of implementing an HTTP handler. TradeServlet executes the historical
requests, real-time data requests, and order requests using an interface named ITrader.
From a programming perspective, using an interface is the correct approach because it
allows you to use the Bridge pattern. The Bridge pattern lets you decouple the intention of
trading from the implementation of trading. In the high-level view of the Web services architecture,
the type ProviderTrader1 implements the calling of the appropriate functionality via
the socket layer. The Bridge pattern theory allows you to use a socket call today as a stopgap
solution, but tomorrow replace ProviderTrader1 with a new implementation (such as replacing
the middleware with the Web service) without having to change the implementation of
TradeServlet or the interface ITrader.
The class ProviderTrader1 implements the ITrader interface and provides a bridge from the
trader-neutral subsystem to the socket-based trading system. Another provider could be used to
access a legacy database. Regardless of the number of providers, the Web service interacts only
with the trader-neutral subsystem, and the trader-neutral subsystem interacts with the
socket-based trading system.
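The Bridge arrangement described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the method names on ITrader, the canned return values, and the class TradeFrontEnd (a stand-in for TradeServlet that avoids the servlet API) are hypothetical and do not come from the trading system itself.

```java
import java.util.List;

// Trader-neutral contract. The method names are hypothetical.
interface ITrader {
    List<Double> historical(String ticker);
    double realtime(String ticker);
    String submitOrder(String traderId, String ticker, int quantity);
}

// Stand-in for the socket-backed provider. A real implementation would
// call the middleware socket server; this one returns canned values.
class ProviderTrader1 implements ITrader {
    private int nextOrderId = 1;

    @Override
    public List<Double> historical(String ticker) {
        return List.of(30.1, 30.4, 29.8);
    }

    @Override
    public double realtime(String ticker) {
        return 30.2;
    }

    @Override
    public String submitOrder(String traderId, String ticker, int quantity) {
        return traderId + "/" + (nextOrderId++);
    }
}

// Stand-in for TradeServlet. It depends only on ITrader, so the provider
// can be swapped without touching this class.
class TradeFrontEnd {
    private final ITrader trader;

    TradeFrontEnd(ITrader trader) {
        this.trader = trader;
    }

    String handleRealtime(String ticker) {
        return ticker + "=" + trader.realtime(ticker);
    }
}
```

Replacing ProviderTrader1 with another ITrader implementation requires changing only the constructor argument passed to TradeFrontEnd.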
In theory, this approach is sound, but it suffers from being too complex. The problem of
the approach is a question of focus. In the example of the trader subsystem, the focus is the
subsystem, and the Web service layer is an add-on to the trader subsystem. In theory, the Web
service add-on is not even needed, because the trader subsystem manages everything. In the
context of an Ajax Web service application, this is the wrong approach because the focus is the
Web service, and it should not be an add-on.
Solution Part 2
The solution is to neither define nor implement a trader-neutral subsystem
using a specific technology. Instead, the trader subsystem is converted
into a series of Web services that you can assemble into a subsystem. A Web service–based
trader subsystem still requires an interface, but the interface is defined at the HTTP Web service
level. In architectural terms, the ITrader implementations are converted into Web services.
The architecture relies on reusing the already existing implementations
of ProviderTrader1 and ProviderTrader2 directly from the class TradeServlet. Each
implementation provides a set of methods, properties, and result sets that the trader-neutral
subsystem defines. Using a Web service, the methods, properties, and result sets are converted
into something that is Web service–compatible. Then at a higher level, another technology
assembles the Web services into a trader subsystem.
Implementing the Trader-Universal Web Service
The trading Web service is an example of what to expect when implementing a complete Web
service solution. This article covers these remaining pieces:
• Defining the URLs
• Identifying the formats that can be sent and received
• Supporting relative URLs
Defining the URLs for the Trader Application
Let’s continue with the evolution of the trader Web service and outline the important pieces,
namely the URLs and supported data formats. The trader application exposes the following
base URLs:
• /services/controller
• /services/realtime
• /services/orders
• /services/orders/trader123/order345
• /services/historical
/services/controller is the base URL used to manage the engine on the server side.
The controller URL lets you reset the code behind the Web service manually. For example, in
the case of the trader application, /services/controller would connect the Web service
implementation to the socket that provides the trading interface.
You could use the URL /services/controller/start to start the server code, and you could
use the URL /services/controller/stop to stop the server code. However, this approach
wouldn’t be usable, because it would seem that the identifiers start and stop are resources,
which they are not. Compare it to a light switch. A light switch is a single resource that has two
states: on and off. It does not have two resources, one for each state.
Instead, you use parameters to start and stop the server code. To start the server code, you
execute the verb POST on the URL /services/controller with the Common Gateway Interface
(CGI) parameter action=start. To stop the server code, the URL remains the same, as does
the verb, but the CGI parameter changes to action=stop. The verb POST is appropriate, because
you’re executing some server-side process, and what the process should do depends on
the data sent. Calling the verb GET on the URL /services/controller returns the status of the
server code.
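The light-switch idea can be sketched as a single resource whose state changes according to the action parameter. The class name ControllerResource, the status strings, and the numeric status codes returned for bad input are assumptions made for this illustration. The sketch is plain Java, not servlet code.

```java
import java.util.Map;

// One resource, two states. The URL never changes; only the verb and
// the "action" parameter decide what happens.
class ControllerResource {
    private boolean running = false;

    // Returns an HTTP-style status code for the request.
    int handle(String verb, Map<String, String> params) {
        switch (verb) {
            case "GET": {
                // Status query; the caller reads status() for the body.
                return 200;
            }
            case "POST": {
                String action = params.get("action");
                if ("start".equals(action)) {
                    running = true;
                    return 200;
                }
                if ("stop".equals(action)) {
                    running = false;
                    return 200;
                }
                // Missing or unknown action (assumed to be a client error).
                return 400;
            }
            default:
                // Verb not supported on this resource.
                return 405;
        }
    }

    String status() {
        return running ? "running" : "stopped";
    }
}
```

A POST with action=start followed by a GET would report the state as running, and the URL is the same for both requests.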
If you’d like to control multiple pieces of server code in your application, then create child
URLs such as the following: /services/controller/code1 and /services/controller/code2.
The guidelines for starting, stopping, and retrieving the status of the individual server code
pieces remain the same.
Often, server-side code pieces require configuration directives, such as the location of the
base directory, how many threads to start, and so on. These configuration directives are typically
stored in a configuration file. You should be able to specify these directives when the server code
starts or stops. For example, if you want to specify a thread count, then you could use the CGI
parameter action=start&threadcount=12 for starting the server code.
If you have the ability to define configuration directives, then they can be queried and
retrieved when the status of the server is requested. If you want to query the individual
value of a server variable, you could filter it using query parameters such as /services/
controller?status=threadcount+uptime.
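As a rough sketch, the status filter could be implemented by decoding the status parameter and returning only the requested directives. In a query string, the plus sign decodes to a space, so threadcount+uptime becomes two names. The class name StatusFilter and the directive names other than threadcount and uptime are assumptions.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class StatusFilter {
    // Returns only the directives named in the raw "status" parameter
    // value, for example "threadcount+uptime". Unknown names are ignored.
    static Map<String, String> select(Map<String, String> all, String rawStatusValue) {
        // URL decoding turns '+' into a space.
        String decoded = URLDecoder.decode(rawStatusValue, StandardCharsets.UTF_8);
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : decoded.trim().split("\\s+")) {
            if (all.containsKey(name)) {
                result.put(name, all.get(name));
            }
        }
        return result;
    }
}
```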
You use the base URL /services/realtime to manage the real-time stock-ticker data. For
example, if you’re interested in the ticker GM, you’d use the URL /services/realtime/GM to
retrieve the real-time information. It would seem that this base URL is the simplest, but the
simplicity is misleading.
For example, if users execute GET on the URL /services/realtime, what is returned? This
is a tricky question, because you’re approaching the limits of a piece of software. From a theoretical
perspective, calling GET results in the return of real-time data for all tickers. This sounds
good in theory, but it’s completely impractical. There are literally thousands of stocks on multiple
exchanges. Getting all tickers in one request in real time using a GET request is practically
unfeasible.
This is an example of a URL where the theory and practice are in conflict. The solution
doesn’t support the root URL as a reference to real-time data. The root URL is used to return
a list of all available real-time links. The root URL is not used to indicate what the real-time
information is because doing so would require following thousands of tickers. The root URL
will return links to where you can retrieve real-time data. This could mean returning links to
thousands of stock tickers. You could also use the root URL to return both the link to the ticker
and an abbreviated corporate description. This would help in building a search engine, as
most people don’t know the ticker but do know the name of the corporation.
Delegating the root URL to individual URLs creates a problem in that the Web service
cannot manage the real-time feeds for all stock tickers on all exchanges. To put it simply, you
cannot track all stocks on a single computer. Tracking all stocks requires massive amounts of
horsepower that this article won’t get into. The only solution is to use a track-if-asked solution.
In a track-if-asked solution, no stocks are tracked initially for real-time data. Real-time
data will be tracked only if an HTTP GET is executed on a particular stock. An HTTP POST or
DELETE or PUT makes no sense on the real-time feed, because a real-time feed comprises data
that goes from the server to the client. The server is not interested in any information from the
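client other than which ticker to generate real-time data for. If a verb other than GET is executed,
the server will reject the request with an HTTP error. HTTP defines the status code 405 (Method
Not Allowed) for this case, which is more precise than a generic HTTP 500 error.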
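A minimal sketch of the track-if-asked idea follows. No ticker is tracked until the first GET arrives for it, and every other verb is rejected. The class name RealtimeResource is an assumption, and the sketch answers unsupported verbs with 405 (Method Not Allowed), the status code HTTP defines for a method that a resource does not support.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RealtimeResource {
    // Tickers that some client has asked for at least once.
    private final Set<String> tracked = ConcurrentHashMap.newKeySet();

    // Returns an HTTP-style status code for a request on
    // /services/realtime/{ticker}.
    int handle(String verb, String ticker) {
        if (!"GET".equals(verb)) {
            // A real-time feed only flows from server to client.
            return 405;
        }
        // Start tracking on the first request; later requests are no-ops here.
        tracked.add(ticker);
        return 200;
    }

    boolean isTracked(String ticker) {
        return tracked.contains(ticker);
    }
}
```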
/services/orders specifies the root URL for order processing. In the context of stocks,
order processing makes use of all HTTP verbs. You use HTTP POST to submit an order, HTTP
PUT to modify an order, HTTP GET to retrieve the status of an order, and HTTP DELETE to delete
an order.
Each order is identified by a unique identifier, for example /services/orders/1232445.
The unique identifier doesn’t have to be numeric; it can be alphanumeric or even a more
complicated Globally Unique Identifier (GUID).
The root URL can be the host of many orders, which could literally mean millions of
orders. For the order URL, it’s important that you have the ability to filter orders according to
a specific status. You might be tempted to organize orders according to a date, but I would
advise against that. Whenever you’re creating a root URL, the data in the collection should be
accessible in its natural form. In a blog application, it’s natural to organize by date. However,
the natural order of a stock application is not by date but rather by order ID. Thus, the root
orders URL will literally have millions of orders associated with it. If an application happens
to ask for all orders, the server will need to give all of those orders. In the case of a SQL database,
if a table has millions upon millions of records, and somebody executes the query select
* from table, the database won’t ask, “Are you sure about this?” The database will go ahead
and select all of the records, even though it might not be efficient.
You create filters to optimize access to the orders. For example, if you want to find all
orders in 2006, you could execute the URL /services/orders/?year=2006. You could also convert
the query parameters into a view URL, such as /services/orders/2006. Whether you use
the query parameter or the view URL approach depends on your preference.
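Whichever form you prefer, both URLs can resolve to the same filter on the server. The sketch below extracts the year from either form. The class name OrderYearFilter is an assumption, as is the rule that only a four-digit path segment counts as a year, which keeps a view URL such as /services/orders/2006 from being confused with a longer order identifier.

```java
import java.util.OptionalInt;

class OrderYearFilter {
    static final String BASE = "/services/orders";

    // Accepts "/services/orders/?year=2006" or "/services/orders/2006"
    // and returns the year, or an empty value when no year filter is present.
    static OptionalInt yearOf(String url) {
        int q = url.indexOf('?');
        String path = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q + 1);

        // Query parameter form.
        for (String pair : query.split("&")) {
            if (pair.startsWith("year=")) {
                return parseYear(pair.substring("year=".length()));
            }
        }
        // View URL form.
        if (path.startsWith(BASE + "/")) {
            return parseYear(path.substring(BASE.length() + 1));
        }
        return OptionalInt.empty();
    }

    private static OptionalInt parseYear(String text) {
        return text.matches("\\d{4}")
                ? OptionalInt.of(Integer.parseInt(text))
                : OptionalInt.empty();
    }
}
```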
There is one filter that will prove problematic, and it relates to users. In any order system,
you have multiple users. A stock-trading application is no different. What makes a stock-trading
order application more complicated is that an order is not fulfilled automatically. It might not
ever be fulfilled, and it might even be canceled. If an order system doesn’t have the ability to filter
per trader, you could potentially run into a situation where one trader might open a position
and another trader closes a position.
In theory, you could buy and sell a future stock at the same time (called wash trading).
By buying and selling at the same time, you neither gain nor lose anything other than
your brokerage fee. This technique of buying and selling at the same time through two different
brokers is illegal, because it makes it seem like there is action on a position when in fact
there is not. Therefore, traders are tied to their orders, and the orders are tied to their traders.
A logical refinement of the orders URL would be /services/orders/[trader]. This refined URL
illustrates that sometimes you have to create URLs that fulfill other needs, like, in this case, the
legal department’s needs.
With this refinement of the URL, does the root URL /services/orders become obsolete?
No. Everything, including the query parameters and the view URLs, still applies. The difference is
that the URL to access the order information will contain the unique identifier of the trader.
Assuming that you’re going to use the refinement to the URL, let’s go through what the
individual verbs will do at the different URL levels. At the root URL level (/services/orders),
only the HTTP GET applies. At this level, you can only filter out the orders you want to see. You
cannot POST, because the root URL is missing the trader ID, and you cannot PUT, because the
root URL is a collection URL. Finally, you cannot DELETE, because that would cause the deletion
of all the orders and the traders.
One level down, you would have the root URL for an individual trader (/services/orders/
trader-abc). At the root URL for an individual trader, only the HTTP GET and POST apply. You
would use GET to retrieve and potentially filter all trades that a trader has made. For example,
you could filter for trades made in a particular month, year, or day. You could use the URLs
/services/orders/trader-abc/2006 or /services/orders/trader-abc?year=2006. The HTTP
POST verb applies, because it allows the users to submit an order without an order ID. The
submission of the order returns the URL where you can retrieve the status of an order. If an
HTTP POST to the URL /services/orders/trader-abc is sent, the URL /services/orders/
trader-abc/123456 could be returned.
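The POST behavior at the trader level can be sketched as follows: the client submits an order without an identifier, and the server assigns one and returns the URL of the new order. The class name TraderOrders, the starting identifier, and the use of a plain string as the order body are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TraderOrders {
    private final String traderId;
    private final Map<String, String> orders = new LinkedHashMap<>();
    // Hypothetical starting value, chosen to match the example URL.
    private int nextId = 123456;

    TraderOrders(String traderId) {
        this.traderId = traderId;
    }

    // Handles POST on /services/orders/{trader}. The server, not the
    // client, decides the order ID and returns the URL of the new order.
    String post(String orderBody) {
        String id = Integer.toString(nextId++);
        orders.put(id, orderBody);
        return "/services/orders/" + traderId + "/" + id;
    }

    int count() {
        return orders.size();
    }
}
```

In an HTTP response, the returned URL would typically be sent in the Location header.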
Applying the verb DELETE at the root URL is a bit tricky because of what the verb means.
If you were to apply the DELETE verb, it would delete all of the orders at the root URL. Practically
speaking, this is very ill-advised. One reason to support the DELETE verb is to be able to
delete items selectively via a query parameter that acts as a filter. For example, to delete all
orders in a year, you could use the URL /services/orders/trader-abc?year=2006. Notice the
URL used to selectively delete is the same as the URL used to selectively select. The difference
is the verb (DELETE vs. GET). It is a common occurrence that URLs will match but exhibit different
behavior depending on the verb. In the context of the trading system, deleting orders
would have restrictions. If an order is executing, you cannot delete the order.
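A selective DELETE with the business restriction applied might look like the sketch below. It removes the orders that match the year filter but leaves executing orders alone. The classes StoredOrder and SelectiveDelete and their fields are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Minimal order record for the sketch.
class StoredOrder {
    final String id;
    final int year;
    final boolean executing;

    StoredOrder(String id, int year, boolean executing) {
        this.id = id;
        this.year = year;
        this.executing = executing;
    }
}

class SelectiveDelete {
    // Handles DELETE on /services/orders/{trader}?year=N.
    // Returns the number of orders removed from the (mutable) list.
    static int deleteByYear(List<StoredOrder> orders, int year) {
        int removed = 0;
        Iterator<StoredOrder> it = orders.iterator();
        while (it.hasNext()) {
            StoredOrder order = it.next();
            // An executing order cannot be deleted.
            if (order.year == year && !order.executing) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```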
The remaining verb PUT is for the most part not applicable at the root URL level. You use
the verb PUT to send a complete representation of the resource to the server. In the case of the
root URL, this means sending all orders to the server. The problem with sending orders to the
server is that you cannot send complete orders. The order is complete, but the order identifier
(calculated at the time an order is posted) is missing. Thus, you cannot use PUT to send a new
order to the server.
Another reason for using PUT would be to modify an existing order. In general, this is
a legitimate use, but it is incorrect in the context of the trading system. What happens if you
attempt to modify an order that is executing currently? There is no simple recourse, and thus
in the context of the order system, modifying an order can cause more problems than solutions.
The appropriate trading solution is to delete the order and create a new one.
/services/orders/trader123/order345 represents a URL referencing the order resource.
In general, you can apply all HTTP verbs, but you would have to create limits to reflect business
processes. In the case of the trader application, you could not use the PUT verb on a new
order, because the order application does not allow you to determine an order ID ahead of
time. You also cannot PUT an existing order, because that would mean modifying the order,
and in the context of a trading system, an order can either execute or be canceled. You could
apply and use the DELETE verb to define a cancellation of the order. A POST to an order would
only make sense if the POST represents an order that is a cancellation. A GET would be used to
retrieve the execution status of an order.
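The verb rules for the three order URL levels can be summarized in a small lookup. This sketch takes a conservative reading of the discussion above: it leaves out the filtered DELETE at the trader level and the cancellation POST at the order level, and it does not recognize view URLs such as /services/orders/trader-abc/2006. The class name OrderVerbs is an assumption.

```java
import java.util.Set;

class OrderVerbs {
    private static final String PREFIX = "/services/orders";

    // Returns the verbs accepted at the given path, based on how many
    // path segments follow /services/orders.
    static Set<String> allowed(String path) {
        if (!path.startsWith(PREFIX)) {
            return Set.of();
        }
        String rest = path.substring(PREFIX.length());
        if (!rest.isEmpty() && !rest.startsWith("/")) {
            // Something like /services/ordersX is a different resource.
            return Set.of();
        }
        int segments = 0;
        for (String part : rest.split("/")) {
            if (!part.isEmpty()) {
                segments++;
            }
        }
        switch (segments) {
            case 0:
                return Set.of("GET");            // collection of all orders
            case 1:
                return Set.of("GET", "POST");    // one trader's orders
            case 2:
                return Set.of("GET", "DELETE");  // one order; DELETE cancels
            default:
                return Set.of();
        }
    }
}
```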
/services/historical represents a root URL used to retrieve the historical data from the
middleware. Getting a historical feed is unique in that only one verb always applies,
namely GET. The word historical implies something that already happened, and you cannot
rewrite history. Rewriting history would occur if you attempted to use the PUT or DELETE verb.
A POST applies only if you use it to create a sophisticated query. For example, you can
use a POST to scan and filter historical data according to a set of criteria.
To make the historical Web service as effective as possible, you need the ability to define
sophisticated queries. REST is not equipped to do that, because REST relies on the HTTP protocol.
This is not to say that you cannot use REST to query the data, but that you need to write
the plumbing. For example, say you want to find all stocks that traded in a specific range for
five days out of 10. You would need to code this sort of query in the form of a REST call that
delegates to a relational query, assuming that the data is stored in a relational database.
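To give a sense of the plumbing involved, the range query mentioned above can be sketched in plain Java over an in-memory array of closing prices. A real service would delegate to the database. The class name RangeQuery, the use of closing prices, and the inclusive bounds are assumptions.

```java
class RangeQuery {
    // Returns true if, among the last `window` closing prices, at least
    // `minDays` fall inside the inclusive range [low, high].
    static boolean tradedInRange(double[] closes, double low, double high,
                                 int window, int minDays) {
        int start = Math.max(0, closes.length - window);
        int hits = 0;
        for (int i = start; i < closes.length; i++) {
            if (closes[i] >= low && closes[i] <= high) {
                hits++;
            }
        }
        return hits >= minDays;
    }
}
```

A REST call for this query would need to carry the range, the window, and the minimum number of days as parameters, which is the kind of plumbing you have to design yourself.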
Another approach is to use an XML-based database, though you’d need to decide this
ahead of time. The advantage of storing your data in an XML-based database is that you can
easily map the HTTP queries to the XML hierarchy. Using XML Query Language (XQL) and
XPath on the XML database, you can easily execute sophisticated queries without having to
write the plumbing. You need to remember that the power of a historical Web service lies in
how you implement the queries.
What Format of Data to Send?
Thus far, all of the example Web services have been explained in terms of URLs, but not
in terms of the content that is accepted and generated. In the case of the blogging application
introduced in Article 4, the Web service generated Atom data using the MIME
type application/atom+xml. When building REST Web services, the MIME type is important,
because it determines how the data is received and sent.
In the case of the blogging application, if the Atom URL is called, it will generate an XML
stream. In theory, the REST development strategy is to create a Web service that is
technology-neutral and will generate the right content for the right query.