Problem
You want a fix to turn toSource into a complete serialization solution.
Theory
Mozilla has developed a very clever method called toSource. Using toSource, it is possible to
serialize the state of an object into a buffer. Consider the following type declaration example.
Source: /website/ROOT/ajax articles/javascript/tosource.html
function DefinedClass() {
    this.localvalue = 10;
    this.localmethod = function(param) {
        info("DefinedClass.localmethod", "called me");
    }
}
The DefinedClass type defines a localmethod method and a localvalue data member.
When the type is instantiated, the instance has a toSource method that can be called,
as illustrated in the following source code.
Source: /website/ROOT/ajax articles/javascript/tosource.html
var cls = new DefinedClass();
cls.prototypemethod.value = 100;
info("mozilla_tosource", cls.toSource());
When toSource is called, the following buffer is generated:
{
    localvalue:10,
    localmethod:(function (param) {info("DefinedClass.localmethod", "called me");})
}
The generated buffer is a serialized form of an object instance. Missing from the serialization
is the type definition. When the object instance is re-created, only the state of the object is
re-created. Yet even this is not completely true, because the following prototype declaration
is missed by toSource.
Source: /website/ROOT/ajax articles/javascript/tosource.html
DefinedClass.prototype.prototypevalue = new StaticClass();
Anything that is declared via the DefinedClass.prototype property is missed by the
toSource implementation. Missing the base properties and methods would make sense if the toSource
buffer contained a reference to the type that created the instance. Yet the buffer contains no
such reference, the prototype methods and properties are missing, and toSource is not supported
on any browser other than Mozilla/Firefox. So what is the use of toSource?
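The omission described above can be reproduced without calling toSource, which is non-standard. The following is a minimal sketch, assuming that a toSource-style dump sees only the properties defined on the instance itself. StaticClass is an empty placeholder here, because its definition is not shown in this article, and the names ownNames and allNames are hypothetical.

```javascript
// Placeholder type: StaticClass is not defined in the article text.
function StaticClass() {}

function DefinedClass() {
    this.localvalue = 10;
    this.localmethod = function(param) {};
}
DefinedClass.prototype.prototypevalue = new StaticClass();

var cls = new DefinedClass();
var ownNames = [];   // properties defined on the instance itself
var allNames = [];   // everything that for...in reaches, prototype included
for (var name in cls) {
    allNames.push(name);
    if (Object.prototype.hasOwnProperty.call(cls, name)) {
        ownNames.push(name);
    }
}
console.log("own: " + ownNames.join(","));
console.log("all: " + allNames.join(","));
```

The prototypevalue identifier appears only in allNames, which is the difference between an instance-only serialization and a serialization that also iterates the prototype.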
The toSource method by itself is limited, but the idea behind toSource is good. We want
the ability to serialize an object for later consumption, and as will be shown in later articles,
serialization is the key to implementing various object-oriented techniques, such as mixins.
As illustrated by Mozilla’s implementation of toSource, serialization can have different facets.
Before beginning an implementation of serialization, let’s identify the different contexts
of serialization:
• Plain vanilla serialization like toSource: The default serialization provided by Mozilla is
not available on other browsers. For those Web applications that do use toSource, there
needs to be an implementation for other browsers. Not serializing the prototype properties
is useful when you want to serialize the additional information and not the base
information.
• Full instance declaration serialization: A complete instance serialization is when all
methods, properties, and data members are converted into a buffer that, when executed,
will completely re-create the object. When the instance is re-created, its original
type information is lost.
• Instance state serialization: In contrast to other serializations, state serialization is the
generation of a buffer that contains only the state of an object; the function declarations
are not generated. The JavaScript Object Notation (JSON) protocol is an example
of an instance state–only serialization. When serializing the instance state, data members
defined in the prototype property are included.
• Variable assignment serialization: A full instance declaration serialization includes all of
the state of an object instance, but the function properties are missing because they are not
used often. If a function property is used, then the serialized state has to be assigned to
a variable; otherwise, it is very difficult to assign function properties.
• Object-oriented serialization: Object-oriented serialization is an extension of the plain
vanilla serialization. The reason for defining an object-oriented serialization is to
enable the separation of class-specific data and instance-specific data. Using object-oriented
serialization, an object could be serialized and re-created with different
default behavior.
Each of the contexts is a specific flavor of serialization. The common feature among all of
the contexts is that they serialize the same information and filter out what is not necessary.
For example, plain vanilla serialization filters out any property referenced by prototype. State
serialization filters out all functions, but iterates properties referenced by prototype.
Solution
As a first step, we’ll create a general “serialize everything” implementation. The serialize
everything implementation will include filtering capabilities and output generation control
capabilities. Using the general serialization requires quite a bit of understanding of the serialization
process, so that fine-tuning is possible.
As a second step, we’ll implement the specific serialization contexts with the appropriate
filtering implementations.
Serializing everything means iterating everything that is stored in the object instance, and
then asking the caller if it is OK to serialize the information. From a high level, the serialize
everything function is implemented as follows. In the serialize everything functionality, two
pieces of functionality have been cut out for clarity purposes and are noted by the comments
// Removed for clarity.
Source: /website/ROOT/scripts/common.js
serialize : function(obj, callbacks) {
    var buffer = "{";
    // Returns "" on the first call and "," on every later call
    var comma = function() {
        comma = function() {
            return ",";
        }
        return "";
    }
    var quoteProperties = "";
    var canProcessFilter = function() { return true; }
    var functionPropertyCallback = function() { }
    var callingStack;
    if (typeof(arguments[2]) == "undefined") {
        callingStack = new Array();
        callingStack.push("cls");
    }
    else {
        callingStack = arguments[2];
    }
    if (callbacks) {
        // Removed for clarity
    }
    // Declare the loop variable so that it does not leak into the global scope
    for (var property in obj) {
        if (canProcessFilter(obj[property], obj, property)) {
            switch (typeof(obj[property])) {
                // Removed for clarity
            }
        }
    }
    buffer += "}";
    return buffer;
}
The serialize function declares two parameters, but for certain contexts (explained later) there
is a third parameter. The third parameter has been left out of the declaration so as to not confuse
those people who want to use the function; it is read through arguments[2]. The first parameter,
obj, represents the object instance that is serialized. The second parameter, callbacks, represents
the customization methods that are called when the data is serialized.
Before the for loop starts, the variables are initialized. They are defined
as follows:
• buffer: This variable is used to create the complete text that represents the serialized
object.
• comma: This variable uses the technique outlined in article 2-7 to determine whether
a comma is needed when creating the JavaScript serialized object format. As a reference,
each property declaration in a serialized object format (e.g., {prop1:true,prop2:false})
is separated using a comma. The function implements a technique where the first time
it is called, no comma is necessary, but for every call thereafter, a comma is necessary.
Without using the technique in article 2-7, a decision block and flag would be necessary.
• quoteProperties: This variable holds the text placed on either side of each property
identifier. It is empty by default and is set to a double quote when serializing the object
to JSON format.
• canProcessFilter: This variable is a function callback that is called for each property
value found. The callback returns either true to serialize the property or false to ignore
the property. The callback has three parameters: property, the actual property reference;
obj, the object being serialized; and propertyIdentifier, the string identifier of
the property.
• functionPropertyCallback: This variable is a function callback that is called when iterating
the properties of a function. The properties of the function cannot be stored in
the buffer variable because the serialized format does not allow for the definition of
function properties. The properties of a function need to be assigned after the definition
of the serialized JavaScript object buffer. This is why the complete serialization of
a JavaScript object requires a variable definition.
• callingStack: To assign a function property embedded within another JavaScript object
declaration, you need the serialized object reference (e.g., variable.embeddedobj.
function.value). To create the reference, a stack is used, where each element in the
stack is an object reference.
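The comma variable is the least obvious of these declarations, so here is a minimal, standalone sketch of the self-replacing function it relies on. The function name joinWithSelfReplacingComma and the sample items are hypothetical and exist only for this illustration.

```javascript
// Builds "{item,item,...}" without a flag or a decision block.
function joinWithSelfReplacingComma(items) {
    var comma = function() {
        // After the first call, every later call returns a comma.
        comma = function() { return ","; };
        // The first call itself contributes nothing.
        return "";
    };
    var buffer = "{";
    for (var i = 0; i < items.length; i++) {
        buffer += comma() + items[i];
    }
    return buffer + "}";
}

var joined = joinWithSelfReplacingComma(["prop1:true", "prop2:false"]);
console.log(joined); // {prop1:true,prop2:false}
```

Because comma is a local variable, every call to the enclosing function starts with a fresh first-call state, which is what allows the recursive serialization of embedded objects to work.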
After the declarations, the properties of the object are iterated in a loop (for(property...)).
Before explaining the details of the loop, I’ll cover the missing callback initialization.
Source: /website/ROOT/scripts/common.js
if (callbacks) {
    if (callbacks.canProcessFilter) {
        canProcessFilter = callbacks.canProcessFilter;
    }
    if (callbacks.functionPropertyCallback) {
        functionPropertyCallback = callbacks.functionPropertyCallback;
    }
    // Rename the root entry only on the outermost call; a recursive call
    // receives a stack whose last entry is a property name that must be kept
    if (callbacks.variablename && typeof(arguments[2]) == "undefined") {
        callingStack.pop();
        callingStack.push(callbacks.variablename);
    }
    if (callbacks.quoteProperties == true) {
        quoteProperties = "\"";
    }
}
The caller of the serialization does not need to provide a value for callbacks. If no value is
provided, a default serialization of everything is assumed, except the function properties. The
function properties are not serialized because there is no way in the serialized JavaScript
object format to associate a function property with the function. More of the code will be
explained shortly.
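To see why a function property forces a variable assignment, consider the following sketch. The buffer and the assignment statement are written by hand here and are assumptions for illustration, not output captured from ops.serialize. The variable name cls matches the default entry on the calling stack.

```javascript
// Hand-written stand-in for a serialized object with one method.
var buffer = "{localmethod:function(param) { return param; }}";

// A function property cannot be written inside the object literal, so it
// has to be assigned afterward through a variable reference.
var assignments = "cls.localmethod.value = 100;";

var cls;
eval("cls = " + buffer + ";" + assignments);

console.log(cls.localmethod("echo"));  // echo
console.log(cls.localmethod.value);    // 100
```

The reference cls.localmethod.value is the kind of dotted path that the calling stack makes it possible to build.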
The callbacks object has four members, two callback functions and two values:
• canProcessFilter: Used to determine whether the property can be serialized.
• functionPropertyCallback: Called whenever a function property is serialized.
• variablename: Represents the variable identifier used when a serialization to a variable
is generated.
• quoteProperties: Represents a value that when set to true generates quotes around the
property identifier. This is typically used when generating a serialization format for JSON.
Now that we’ve looked at the initialization details, let’s move on to examine the serialization
logic. The loop is responsible for serializing the object instance, and the details of the loop
have been abbreviated. At this stage, I’ll explain the overall strategy.
In JavaScript, each method and data member can be accessed on an object using the following
notation:
obj.datamember = ...
This notation is the most common way of accessing a method or data member when writing
source code. For serialization purposes, the notation is not useful because the programmer is
expected to know what the individual methods and data members are. For serialization purposes,
reflection is needed. Reflection in JavaScript is a two-step process:
1. The string value property identifiers are available using an enumeration and iterated
using a loop (e.g., for(property in obj)).
2. The actual property is accessed using an array notation, where the array is the object
instance and the index is the string value property identifier (e.g., obj[property]).
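The two steps can be sketched in a few lines. The point object and the described array are hypothetical names used only for this illustration.

```javascript
var point = { x: 1, label: "origin", visible: true };
var described = [];

// Step 1: enumerate the string identifiers of the properties.
for (var property in point) {
    // Step 2: use the identifier as an index to reach the actual property.
    described.push(property + " is a " + typeof(point[property]));
}
console.log(described.join("; "));
// x is a number; label is a string; visible is a boolean
```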
As each property is iterated, the serialization first queries if the property should be serialized
by calling the canProcessFilter callback function. If the property can be serialized, then
a switch statement is executed that tests the type of the property. The typeof operator returns
a string identifier, and five identifiers are of interest here: boolean, function, number, object,
and string (we are not interested in undefined, as undefined should not be serialized). The details
of the switch statement are as follows.
Source: /website/ROOT/scripts/common.js
switch (typeof(obj[property])) {
    case "boolean":
        buffer += comma() + quoteProperties +
            property + quoteProperties + ":" + obj[property];
        break;
    case "function":
        buffer += comma() + quoteProperties +
            property + quoteProperties + ":" + obj[property].toString();
        // Let the caller handle properties attached to the function itself
        callingStack.push(property);
        functionPropertyCallback( obj[property], obj, property, callbacks,
            callingStack);
        callingStack.pop();
        break;
    case "number":
        buffer += comma() + quoteProperties +
            property + quoteProperties + ":" + obj[property];
        break;
    case "object":
        // Serialize the embedded object recursively, passing the stack along
        callingStack.push(property);
        buffer += comma() + quoteProperties +
            property + quoteProperties + ":" +
            ops.serialize(obj[property], callbacks, callingStack);
        callingStack.pop();
        break;
    case "string":
        buffer += comma() + quoteProperties +
            property + quoteProperties + ":" + obj[property];
        break;
}
In the implementation of the switch statement, the types number, string, and boolean
have a straightforward serialization implementation. The serialization of those types follows
the convention [property identifier] : [property value]. function and object are more
complicated.
When a function is encountered, its source text is added to the buffer, and the
functionPropertyCallback is called with the current calling stack so that the properties of
the function can be handled by the caller.
When an object is encountered, then an embedded JavaScript object serialization occurs
and the ops.serialize function is called recursively. The result of the serialization is a property
value that is added to the buffer to be returned to the caller. The remaining parts of the
serialize function add a curly bracket to close off the serialization and return the generated
buffer to the caller.
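Putting the pieces together, the following is a condensed, runnable sketch of the serialize-everything routine, not the listing from common.js. It makes two assumptions that are also marked in the comments: string values are quoted with JSON.stringify so that the buffer can be evaluated again, and the variablename handling is left out. The sample object and the stateOnly filter are hypothetical.

```javascript
var ops = {
    serialize: function(obj, callbacks) {
        var buffer = "{";
        var comma = function() {
            comma = function() { return ","; };
            return "";
        };
        var quoteProperties = "";
        var canProcessFilter = function() { return true; };
        var functionPropertyCallback = function() {};
        // Third, undeclared argument: the identifiers leading to obj.
        // Assumption: variablename handling is omitted in this sketch.
        var callingStack = (typeof(arguments[2]) == "undefined") ? ["cls"] : arguments[2];
        if (callbacks) {
            if (callbacks.canProcessFilter) {
                canProcessFilter = callbacks.canProcessFilter;
            }
            if (callbacks.functionPropertyCallback) {
                functionPropertyCallback = callbacks.functionPropertyCallback;
            }
            if (callbacks.quoteProperties == true) {
                quoteProperties = "\"";
            }
        }
        for (var property in obj) {
            var value = obj[property];
            if (!canProcessFilter(value, obj, property)) {
                continue;
            }
            var label = quoteProperties + property + quoteProperties + ":";
            switch (typeof(value)) {
                case "boolean":
                case "number":
                    buffer += comma() + label + value;
                    break;
                case "string":
                    // Assumption: quote and escape the value via JSON.stringify.
                    buffer += comma() + label + JSON.stringify(value);
                    break;
                case "function":
                    buffer += comma() + label + value.toString();
                    callingStack.push(property);
                    functionPropertyCallback(value, obj, property, callbacks, callingStack);
                    callingStack.pop();
                    break;
                case "object":
                    callingStack.push(property);
                    buffer += comma() + label + ops.serialize(value, callbacks, callingStack);
                    callingStack.pop();
                    break;
            }
        }
        return buffer + "}";
    }
};

// Hypothetical sample data and a filter that skips functions.
var sample = { count: 2, name: "demo", nested: { ok: true }, run: function() {} };
var stateOnly = function(property) { return typeof(property) != "function"; };

var text = ops.serialize(sample, { canProcessFilter: stateOnly });
var jsonText = ops.serialize(sample, { canProcessFilter: stateOnly, quoteProperties: true });
console.log(text);     // {count:2,name:"demo",nested:{ok:true}}
console.log(jsonText); // {"count":2,"name":"demo","nested":{"ok":true}}
```

The same callbacks object is handed to every recursive call, so the filter and the quoting apply to embedded objects as well.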
The presented serialization is complete, and each of the contexts uses the serialization
function to generate its own buffer.
Let’s consider the implementation of the Serializer.toSource function, which mimics
the Mozilla toSource serialization. This means that any function or data member defined as
part of the prototype property is not processed. What is being asked is to determine whether
a property should be serialized using a filter. The complete implementation of
Serializer.toSource follows.
Source: /website/ROOT/scripts/jaxson/commons.js
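Serializer.toSource = function(obj) {
    // The cache lives in closure variables: ops.serialize calls the filter
    // as a plain function, so "this" would not refer to the callbacks object
    var currProcessedObject = null;
    var iterPrototype = null;
    return ops.serialize(obj,
    {
        canProcessFilter: function(property, currObj, propertyIdentifier) {
            if (currProcessedObject != currObj) {
                // A new object is being serialized; look up its prototype once
                iterPrototype = null;
                GetPrototypeObject(currObj, function(prototype) {
                    iterPrototype = prototype;
                });
                currProcessedObject = currObj;
            }
            if (iterPrototype != null && typeof(iterPrototype) == "object") {
                for (var prototypeIdentifier in iterPrototype) {
                    if (prototypeIdentifier == propertyIdentifier) {
                        return false;
                    }
                }
            }
            return true;
        }
    });
}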
The implementation of Serializer.toSource is a single method call, and it is to
ops.serialize. By default, ops.serialize will serialize everything, and that should be
avoided. To be able to distinguish between an instance property and a property defined by the
prototype property, the implementation of canProcessFilter has to figure out what properties
are associated with the instance. The canProcessFilter implementation contains a reference to
GetPrototypeObject. GetPrototypeObject is a convenience function used to retrieve the prototype
property associated with the object. I cover the implementation of GetPrototypeObject
shortly.
For the moment, let’s focus on what happens in the filter. When ops.serialize is called,
it will iterate the properties of the object to be serialized. When a property is retrieved, the
user-defined filter function canProcessFilter is called. canProcessFilter has as a second parameter
the object to which the soon-to-be-serialized property belongs.
The canProcessFilter implementation contains a reference to currProcessedObject. The reference
is necessary for performance purposes. An object to be serialized can reference other
objects that have a prototype property. Because objects can contain objects to be serialized,
you cannot retrieve the prototype property for obj once and verify every property against it. The
only solution is to retrieve the prototype property for every object that is passed to the
canProcessFilter implementation. Without a cache, if an object has five properties, the
GetPrototypeObject function is called five times, and that is a waste of resources. Thus,
currProcessedObject is used to cache the last used object reference and retrieve the prototype
property only when a new object instance is being serialized.
To verify if a property belongs to prototype, a loop is started that iterates the properties of the
prototype property reference (iterPrototype). If a property from prototype (prototypeIdentifier)
matches the property to be serialized, then a value of false is returned, indicating that the property
should not be serialized. A returned value of true indicates that the property should be
serialized.
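A filter with a similar intent can also be sketched with hasOwnProperty, a standard method that reports whether a property is defined on the instance itself. Note that this is a swapped-in technique and not the approach used by Serializer.toSource. The names ownPropertyFilter and Shape are hypothetical.

```javascript
// Hypothetical filter with the same three parameters as canProcessFilter.
function ownPropertyFilter(property, currObj, propertyIdentifier) {
    // True only for properties defined directly on the instance.
    return Object.prototype.hasOwnProperty.call(currObj, propertyIdentifier);
}

function Shape() {
    this.sides = 4;
}
Shape.prototype.kind = "polygon";

var shape = new Shape();
var kept = [];
for (var id in shape) {
    if (ownPropertyFilter(shape[id], shape, id)) {
        kept.push(id);
    }
}
console.log(kept.join(",")); // sides
```

The two filters differ in one case. When an instance redefines a property that also exists on the prototype, the name comparison described above rejects it, whereas hasOwnProperty accepts it.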
Before we move on to look at another context, let’s examine the implementation of
GetPrototypeObject, which is complicated because it involves the dynamic nature of JavaScript.
The problem with JavaScript is that you cannot figure out what the type of an instance is once
it has been instantiated. The only reference information you have is the constructor property,
which is the function used to instantiate the object. This little piece of information helps us
because in JavaScript the name of the constructor function also happens to be the name of the
type. So the strategy is to extract the identifier of the constructor function and then reference
the prototype property, as shown in the following code.
Source: /website/ROOT/scripts/jaxson/commons.js
function GetPrototypeObject(obj, callback) {
    if (typeof(obj.constructor) == "function") {
        // Capture the identifier between "function" and the first "("
        var funcMatch = /function\s+([^\s(]+)\s*\(/;
        var result = obj.constructor.toString().match(funcMatch);
        if (result != null) {
            if (typeof(callback) == "function") {
                if (typeof(result[1]) == "string") {
                    // Convert the identifier into a reference to the prototype object
                    eval("var prototypeProperty = " + result[1] + ".prototype;");
                    callback(prototypeProperty, result[1]);
                }
            }
        }
    }
}
In the implementation of GetPrototypeObject, the first test is the verification that the
obj.constructor property actually exists. If the function does not exist, then there is no constructor,
and there is no need to continue. If the function does exist, then a regular expression
is used to extract the function name. The regular expression in the code example is assigned to
funcMatch and is recognized as a regular expression because of the slashes.
When using regular expressions in the context of a string, the match function is called
and returns the results of the match. If there are results, then an identifier is found that can be
used to reference the prototype property. But the identifier is a text buffer and not an object. The
text buffer has to be converted into an object by using the eval statement. The dynamically executed
buffer will assign the locally declared prototypeProperty to reference the prototype property.
Then, through the callback, the prototype object and the type identifier are passed to the caller.
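The name extraction and the eval step can be tried in isolation. The following is a minimal sketch, assuming a constructor declared with a plain function declaration that is visible where eval runs. Account, acct, and the regular expression shown are hypothetical choices for this illustration.

```javascript
function Account() {
    this.balance = 0;
}
Account.prototype.currency = "EUR";

var acct = new Account();

// The constructor's source text starts with "function Account(".
var match = acct.constructor.toString().match(/function\s+([^\s(]+)\s*\(/);
var typeName = match[1];

// The identifier is only text; eval turns it into an object reference.
var proto = eval(typeName + ".prototype");

console.log(typeName);                    // Account
console.log(proto === Account.prototype); // true
console.log(proto.currency);              // EUR
```

In this sketch, acct.constructor.prototype refers to the same object, which gives a way to check the result. The name-based lookup cannot work for a constructor that has no name or whose name is not visible at the point where eval is executed.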
Another context is the serialization of an object instance that only includes state and no
functions. Without yet seeing the code, you can probably guess what the filter does. The filter
code tests if the property to be filtered is a function object. If the property is a function object,
then the property should not be serialized. And, in fact, that is how the filter code is written, as
shown in the following listing.
Source: /website/ROOT/scripts/jaxson/commons.js
Serializer.toSourceState = function(obj) {
    return ops.serialize(obj,
    {
        canProcessFilter: function(property, obj, propertyIdentifier) {
            if (typeof(property) == "function") {
                return false;
            }
            else {
                return true;
            }
        }
    });
}
The if statement in the filter shows how to test the type of object using the typeof operator.
Another context is the serialization of state to JSON notation. Serializing to JSON is like
serializing to a state, except that the property identifiers have quotes around them. The serialization
code is identical to the state serialization code, except that the quoteProperties data
member is set to true.
Source: /website/ROOT/scripts/jaxson/commons.js
Serializer.toSourceJSON = function(obj) {
    return ops.serialize(obj,
    {
        quoteProperties : true,
        canProcessFilter: function(property, obj, propertyIdentifier) {
            if (typeof(property) == "function") {
                return false;
            }
            else {
                return true;
            }
        }
    });
}
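The reason for the quotes becomes visible when the buffer is handed to a strict JSON parser. The sketch below uses the standard JSON.parse function, which is not part of this article's code, and two hand-written buffers that stand in for serializer output.

```javascript
// Hand-written buffers standing in for serializer output.
var quoted = '{"localvalue":10}';
var unquoted = '{localvalue:10}';

// Quoted property identifiers are valid JSON.
var parsed = JSON.parse(quoted);

// Unquoted identifiers are valid JavaScript but are rejected as JSON.
var rejected = false;
try {
    JSON.parse(unquoted);
}
catch (e) {
    rejected = true;
}

console.log(parsed.localvalue); // 10
console.log(rejected);          // true
```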
After examining the previous three serialization contexts, you are probably thinking that
the code is relatively similar, but the results are very different. This is an example of how code
blocks can be used to separate the general iteration from a specific processing.
Another serialization context that you will use that is similar to toSource is serializing with
a reference. The context of this serialization is as follows. You are creating a system where a type
serves as a basic functionality. After having instantiated the type, customizations are performed.
Then you decide to serialize the object, but you want to serialize only the customizations, the
reason being that when the object is re-created on another computer or program, you want
a different base functionality to be used. Thus, the same class could operate with different
base functionalities. The solution is to not serialize the prototype properties and then generate
a buffer that instantiates the type.
Note: I don’t explain the implementation of the other serialization context types because they don’t
illustrate any new techniques. I cover only how to use GetPrototypeObject in a different context and
more complicated filter code. If you’re interested in learning more, take a look at the test code in
the file /website/ROOT/ajax articles/javascript/tosource.html, and in particular the test method
jaxson_tosource_oo.
Serialization in JavaScript seems to be a simple thing, and the toSource method looks
extremely useful. Yet as discussed in this article, toSource is incomplete. When you write
JavaScript code to serialize, keep the following points in mind:
• Serialization in JavaScript means to generate a buffer that is formatted in the JavaScript
Object format.
• In this article, we didn’t look at how to re-create the serialized object. This is because to
do so only requires passing the buffer to the eval statement and assigning the results of
eval to a variable.
• Serialization has many different contexts. The ops.serialize function implements
a very general serialization that needs to be specialized.
• When serializing, there is no type information. To have type information, you need to
extract it and then store it somewhere. Remember that JavaScript is a prototype-based
programming language, and JavaScript types are different in concept when compared
to types in languages like C# and Java.
• This article’s serialization techniques show how to define an algorithm that uses code
blocks to separate a general iteration code block from a specific context code block.
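As a brief illustration of the re-creation step mentioned in the points above, here is a minimal sketch with a hand-written buffer. One detail matters: when the buffer is evaluated on its own, it is wrapped in parentheses, because a leading curly bracket would otherwise be read as the start of a block statement. The names restored and copy are hypothetical.

```javascript
// Hand-written buffer standing in for serializer output.
var buffer = "{localvalue:10,nested:{flag:true}}";

// Form 1: wrap the buffer in parentheses and use the result of eval.
var restored = eval("(" + buffer + ")");

// Form 2: place the buffer on the right side of an assignment.
var copy;
eval("copy = " + buffer + ";");

console.log(restored.localvalue);  // 10
console.log(restored.nested.flag); // true
console.log(copy.localvalue);      // 10
```

Because eval executes whatever the buffer contains, this approach is only appropriate for buffers that come from a trusted source.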