Recently I had to implement a portal that embedded content from other web applications. I tried to build a framework that made it easy to add remote applications without requiring any changes to them. The framework had the following design:
  • For each feature or page, we stored a feature link and its URL in configuration. That configuration told us where to fetch the HTML/XHTML page or form to render.
  • Instead of displaying the raw HTML, I first converted it into valid XHTML using Tidy. I then transformed it so that all form submissions go through our proxy server. As part of the transformation, I added the original form URLs, methods and cookies as hidden fields. This kept the proxy server free of client state, which was nice for clustering.
  • When a form is submitted, the proxy server intercepts it, makes the request to the remote application and sends back the response. The proxy server reads the original target URL, method type and cookies from the form, so the remote application can manage state if it uses sessions.
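The configuration in the first bullet can be as simple as a properties file that maps a feature link to a remote URL. The sketch below assumes that format. The class name `FeatureRegistry` and the `guestbook.sign` key are hypothetical illustrations, not part of the framework's code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical registry: maps a feature link (e.g. "guestbook.sign") to the
// URL of the remote page or form that implements it.
public class FeatureRegistry {
    private final Properties features = new Properties();

    public FeatureRegistry(String config) {
        try {
            // each line has the form feature.link=http://remote.host/path
            features.load(new StringReader(config));
        } catch (IOException e) {
            // StringReader does not fail, but load() declares IOException
            throw new IllegalArgumentException("invalid feature configuration", e);
        }
    }

    // Returns the remote URL for a feature, or throws if it is not configured.
    public String urlFor(String featureLink) {
        String url = features.getProperty(featureLink);
        if (url == null) {
            throw new IllegalArgumentException("unknown feature " + featureLink);
        }
        return url;
    }

    public static void main(String[] args) {
        FeatureRegistry registry = new FeatureRegistry(
                "guestbook.sign=http://plexrails.plexobject.com/guest_book/sign\n");
        System.out.println(registry.urlFor("guestbook.sign"));
    }
}
```

In this sketch, adding a remote application means adding one configuration line. The proxy code itself does not change.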

The following are a few simplified classes that I developed for the proxy framework:

Content Transformation

First, I developed the HTML-to-XHTML conversion and the transformation that modifies forms. The following is the XSLT that I used:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:java="http://xml.apache.org/xslt/java"
  exclude-result-prefixes="xhtml java">

  <!-- Parameters passed in from Java via Transformer.setParameter() -->
  <xsl:param name="callbackHandler"/>
  <xsl:param name="callbackUser"/>
  <xsl:param name="callbackState"/>

  <!-- Identity template: copy everything that is not matched below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Match forms with or without the XHTML namespace (pages that declare
       xmlns="http://www.w3.org/1999/xhtml" keep it after tidying) -->
  <xsl:template match="form | xhtml:form">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Point the form at the proxy. The Xalan extension call records the
           original action for the Java caller and returns an empty string. -->
      <xsl:attribute name="action">
        <xsl:text>/web/myproxy?myarg=xxx&amp;otherarg=2</xsl:text>
        <xsl:value-of select="java:com.plexobject.transform.XslContentTransformer.setAction($callbackHandler, string(@action))"/>
      </xsl:attribute>
      <xsl:attribute name="method">
        <xsl:text>POST</xsl:text>
        <xsl:value-of select="java:com.plexobject.transform.XslContentTransformer.setMethod($callbackHandler, string(@method))"/>
      </xsl:attribute>
      <xsl:attribute name="id">_Form</xsl:attribute>
      <xsl:attribute name="name">_Form</xsl:attribute>
      <!-- Hidden fields that carry the original request details through the browser -->
      <input type="hidden" name="_user" value="{$callbackUser}"/>
      <input type="hidden" name="_originalActionUrl" value="{@action}"/>
      <input type="hidden" name="_orginalMethodType" value="{@method}"/>
      <input type="hidden" name="_userState" value="{$callbackState}"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the embedded page's title -->
  <xsl:template match="title | xhtml:title"/>

</xsl:stylesheet>
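A few things to note:
  • xsl:param declares the parameters that are passed in from Java at runtime (callbackHandler, callbackUser and callbackState).
  • The xsl:template that matches the “form” tag copies the original attributes and replaces the action and method attributes, so the form posts to the proxy. It also adds id and name attributes, then adds a few hidden input fields that carry the original action, method, user and state.
  • Finally, the empty template for title removes the title tag.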
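To try the form rewrite in isolation, the sketch below applies a cut-down version of the stylesheet with the JDK's built-in JAXP processor. It keeps the identity template, the form override and the title removal. It drops the Xalan `java:` callbacks and the `xhtml:` variants. `FormRewriteDemo` and the sample page are hypothetical.

The explicit `xsl:output method='xml'` matters. Without it, a result whose root element is an un-namespaced `html` defaults to the HTML output method.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FormRewriteDemo {
    // Cut-down stylesheet: identity copy, form rewrite, title removal.
    // The Xalan java: callbacks are omitted so it runs on any JAXP processor.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='callbackState'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='form'><xsl:copy>"
        + "<xsl:apply-templates select='@*'/>"
        + "<xsl:attribute name='action'>/web/myproxy</xsl:attribute>"
        + "<xsl:attribute name='method'>POST</xsl:attribute>"
        + "<input type='hidden' name='_originalActionUrl' value='{@action}'/>"
        + "<input type='hidden' name='_userState' value='{$callbackState}'/>"
        + "<xsl:apply-templates select='node()'/>"
        + "</xsl:copy></xsl:template>"
        + "<xsl:template match='title'/>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to well-formed XHTML, passing callbackState in.
    public static String rewrite(String xhtml, String callbackState) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("callbackState", callbackState);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Sign</title></head><body>"
                + "<form action='/guest_book/sign' method='get'>"
                + "<input name='message'/></form></body></html>";
        System.out.println(rewrite(page, "state123"));
    }
}
```

The second `xsl:attribute` named `action` replaces the copied original, as XSLT 1.0 requires. The original value survives only in the hidden field.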

ContentTransformer interface

package com.plexobject.transform;

import java.util.Map;

public interface ContentTransformer {
    /**
     * This method transforms given contents
     * 
     * @param contents
     *            - input contents
     * @param properties
     *            - input/output properties for transformation
     * @return transformed contents
     * @throws TransformationException
     *             - when error occurs while transforming content.
     */
    public String transform(String contents, Map<String, String> properties)
            throws TransformationException;
}

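The listings use `TransformationException` without defining it; it is presumably in the downloadable code. `XslContentTransformer` throws it from a method that has no `throws` clause, so it must be unchecked. A minimal version could look like the sketch below. It assumes only the two constructors the listings call, and it would live in `com.plexobject.transform`.

```java
/**
 * Thrown when content cannot be converted or transformed. Unchecked, because
 * the transformer throws it from methods that do not declare it.
 */
public class TransformationException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    public TransformationException(String message) {
        super(message);
    }

    public TransformationException(String message, Throwable cause) {
        super(message, cause);
    }
}
```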
ContentTransformer implementation

A few things to note in the following implementation:
  • I use JTidy to convert HTML to XHTML.
  • I pass some parameters to the XSL stylesheet, and I also read a few properties back. Reading properties back is a bit kludgy, but it works.
package com.plexobject.transform;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.tidy.Tidy;

public class XslContentTransformer implements ContentTransformer {
    public static final String ACTION = "form_action";
    public static final String METHOD = "form_method";

    // Values reported back by the stylesheet, keyed by callback handler.
    // Concurrent, because several request threads can transform at once.
    private static final Map<String, Map<String, String>> xslProperties = new ConcurrentHashMap<String, Map<String, String>>();

    // Compiled stylesheet. Templates is thread-safe but Transformer is not,
    // so a new Transformer is created for every transform() call.
    private volatile Templates templates;
    private final String xslUri;
    private final boolean useTidy;

    public XslContentTransformer(final String xslUri, final boolean useTidy) {
        this.xslUri = xslUri;
        this.useTidy = useTidy;
    }

    // Called from the stylesheet via the Xalan java: extension. Returns ""
    // so nothing is added to the attribute being generated.
    public static final String setAction(final String callbackHandler,
            final String action) {
        getPropertiesForCallback(callbackHandler).put(ACTION, action);
        return "";
    }

    public static final String getAction(final String callbackHandler) {
        return getPropertiesForCallback(callbackHandler).get(ACTION);
    }

    public static final String setMethod(final String callbackHandler,
            final String method) {
        getPropertiesForCallback(callbackHandler).put(METHOD, method);
        return "";
    }

    public static final String getMethod(final String callbackHandler) {
        return getPropertiesForCallback(callbackHandler).get(METHOD);
    }

    /**
     * This method transforms given contents
     * 
     * @param contents
     *            - input contents
     * @param properties
     *            - input/output properties for transformation
     * @return transformed contents
     * @throws TransformationException
     *             - when error occurs while transforming content.
     */
    public String transform(String contents, Map<String, String> properties)
            throws TransformationException {
        // Strip HTML comments; (?s) lets a comment span multiple lines.
        contents = contents.replaceAll("(?s)<!--.*?-->", "");

        InputStream in = new ByteArrayInputStream(contents.getBytes());
        if (useTidy) {
            in = tidy(in, contents.length());
        }

        final Source xmlSource = new StreamSource(in);
        final ByteArrayOutputStream out = new ByteArrayOutputStream(contents.length());
        final Result result = new StreamResult(out);

        String callbackHandler = properties.get("callbackHandler");
        if (callbackHandler == null) {
            callbackHandler = Thread.currentThread().getName();
        }
        xslProperties.put(callbackHandler, new HashMap<String, String>());
        try {
            final Transformer transformer = getTemplates().newTransformer();
            transformer.setParameter("callbackHandler", callbackHandler);
            for (Map.Entry<String, String> e : properties.entrySet()) {
                transformer.setParameter(e.getKey(), e.getValue());
            }
            transformer.transform(xmlSource, result);
            // read back what the stylesheet recorded through the callbacks
            properties.put(ACTION, getAction(callbackHandler));
            properties.put(METHOD, getMethod(callbackHandler));
        } catch (TransformerException e) {
            throw new TransformationException("Failed to transform " + contents, e);
        } finally {
            // always clean up, even when the transformation fails
            xslProperties.remove(callbackHandler);
        }
        return new String(out.toByteArray());
    }

    private static final Map<String, String> getPropertiesForCallback(
            String callbackHandler) {
        final Map<String, String> props = xslProperties.get(callbackHandler);
        if (props == null) {
            throw new NullPointerException(
                    "Failed to find properties for callback " + callbackHandler);
        }
        return props;
    }

    // No synchronization needed: at worst the stylesheet is compiled twice.
    private Templates getTemplates() {
        Templates t = templates;
        if (t == null) {
            InputStream in = getClass().getResourceAsStream(xslUri);
            if (in == null) {
                throw new TransformationException("failed to find xslt " + xslUri);
            }
            try {
                t = TransformerFactory.newInstance().newTemplates(new StreamSource(in));
            } catch (TransformerException e) {
                throw new TransformationException(
                        "Failed to initialize XSL transformer", e);
            }
            templates = t;
        }
        return t;
    }

    // Converts arbitrary HTML into well-formed XML that the XSLT can parse.
    private final InputStream tidy(InputStream in, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(length);
        Tidy converter = new Tidy();
        converter.setTidyMark(false);
        converter.setXmlOut(true);
        converter.setXmlPi(true);
        converter.setXmlPIs(true);
        converter.setNumEntities(true);
        converter.setDocType("omit");
        converter.setUpperCaseTags(false);
        converter.setUpperCaseAttrs(false);
        converter.setFixComments(true);
        converter.parse(in, out);
        return new ByteArrayInputStream(out.toByteArray());
    }
}

Proxy

The following interfaces and classes show how GET and POST requests are proxied:

Proxy Interface

package com.plexobject.web.proxy;

import java.io.IOException;
import java.util.Map;

public interface HttpProxy {
    /**
     * This method issues a GET or POST request based on the method and URI
     * specified in the ProxyState and adds the given parameters to the request.
     * 
     * @param state
     *            - proxy state
     * @param params
     *            - name/value pairs of parameters that are sent with the
     *            request (may be null)
     * @return response code, contents and updated proxy state
     */
    public ProxyResponse request(ProxyState state, Map<String, String[]> params)
            throws IOException;
}

Proxy Implementation

The following class implements the HttpProxy interface using the Commons HttpClient 3.x library:
package com.plexobject.web.proxy;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethodBase;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

import com.plexobject.io.IoUtil;

public class HttpProxyImpl implements HttpProxy {
    private static final int CONNECTION_TIMEOUT_MILLIS = 30000;

    /**
     * This method issues a GET or POST request based on the method and URI
     * specified in the ProxyState and adds the given parameters to the request.
     * 
     * @param state
     *            - proxy state
     * @param params
     *            - name/value pairs of parameters that are sent with the
     *            request (may be null)
     */
    public ProxyResponse request(ProxyState state, Map<String, String[]> params)
            throws IOException {
        if (state.getMethod() == MethodType.GET) {
            return get(state, params);
        } else {
            return post(state, params);
        }
    }

    /**
     * Issues a GET request on the URI specified in the ProxyState, sending the
     * given parameters as the query string.
     */
    private ProxyResponse get(ProxyState state, Map<String, String[]> params)
            throws IOException {
        GetMethod method = new GetMethod(state.getUri());
        method.setQueryString(toNameValues(params));
        return doRequest(state, method);
    }

    /**
     * Issues a POST request on the URI specified in the ProxyState, sending the
     * given parameters as a form-encoded request body.
     */
    private ProxyResponse post(ProxyState state, Map<String, String[]> params)
            throws IOException {
        PostMethod method = new PostMethod(state.getUri());
        method.setRequestBody(toNameValues(params));
        return doRequest(state, method);
    }

    private ProxyResponse doRequest(ProxyState proxyState, HttpMethodBase method)
            throws IOException {
        HttpClient client = new HttpClient();
        client.getHttpConnectionManager().getParams().setConnectionTimeout(
                CONNECTION_TIMEOUT_MILLIS);
        client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
        // retry up to 3 times on recoverable I/O errors, but not after the
        // request has been fully sent
        method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(3, false));

        // seed the client with the cookies carried in the proxy state
        HttpState initialState = new HttpState();
        for (Cookie cookie : proxyState.getCookies()) {
            initialState.addCookie(cookie);
        }
        client.setState(initialState);

        try {
            int statusCode = client.executeMethod(method);
            String contents = IoUtil.read(method.getResponseBodyAsStream());
            // copy back any cookies the remote application set or updated
            for (Cookie cookie : client.getState().getCookies()) {
                proxyState.addCookie(cookie);
            }
            return new ProxyResponse(statusCode, contents, proxyState);
        } catch (RuntimeException e) {
            throw e;
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException("failed to process request", e);
        } finally {
            method.releaseConnection();
        }
    }

    // Flattens multi-valued parameters into one NameValuePair per value.
    private NameValuePair[] toNameValues(Map<String, String[]> params) {
        if (params == null || params.isEmpty()) {
            return new NameValuePair[0];
        }
        List<NameValuePair> nvPairs = new ArrayList<NameValuePair>();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            for (String value : e.getValue()) {
                nvPairs.add(new NameValuePair(e.getKey(), value));
            }
        }
        return nvPairs.toArray(new NameValuePair[nvPairs.size()]);
    }
}

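`toNameValues()` flattens the servlet-style `Map<String, String[]>` into one pair per value. HttpClient then sends those pairs as `application/x-www-form-urlencoded` data: in the query string for GET and in the body for POST.

The standalone sketch below reproduces that encoding with `java.net.URLEncoder`, to show what a multi-valued field looks like on the wire. `FormEncoder` is a hypothetical name. The sketch assumes UTF-8, whereas HttpClient 3.x defaults to ISO-8859-1 for POST bodies, so non-ASCII values would be encoded differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: encodes a servlet-style parameter map the way a form
// submission is encoded, one name=value pair per value (as toNameValues() does).
public class FormEncoder {
    public static String encode(Map<String, String[]> params) {
        if (params == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String[]> e : params.entrySet()) {
                for (String value : e.getValue()) {
                    if (sb.length() > 0) {
                        sb.append('&');
                    }
                    sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                      .append('=')
                      .append(URLEncoder.encode(value, "UTF-8"));
                }
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<String, String[]>();
        params.put("name", new String[] {"Jane Doe"});
        params.put("tag", new String[] {"java", "xslt"});
        System.out.println(encode(params)); // name=Jane+Doe&tag=java&tag=xslt
    }
}
```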
ProxyState

The following class maintains the URL, method type and cookies for a web request. It also serializes them into a string that can travel in a hidden form field:
package com.plexobject.web.proxy;

import java.io.Serializable;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Collection;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

import org.apache.commons.httpclient.Cookie;

/**
 * Class: ProxyState
 * 
 * Description: This class stores state needed to make a proxy request including
 * method type and cookies.
 */
public class ProxyState implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final String DATA_DELIMITER = "\n";
    private static final String COOKIE_DELIMITER = ";";
    private static final String NULL = "null";

    private String uri;
    private MethodType method;
    private Map<String, Cookie> cookies;

    /**
     * Creates a ProxyState from a method name such as an HTML form's method
     * attribute, which is case-insensitive and defaults to GET when missing.
     */
    public ProxyState(String uri, String method) {
        this(uri, method == null || method.trim().length() == 0 ? MethodType.GET
                : MethodType.valueOf(method.trim().toUpperCase(Locale.ENGLISH)));
    }

    public ProxyState(String uri, MethodType method) {
        this.uri = uri;
        this.method = method;
        this.cookies = new HashMap<String, Cookie>();
    }

    /**
     * @return uri
     */
    public String getUri() {
        return this.uri;
    }

    /**
     * @return method
     */
    public MethodType getMethod() {
        return this.method;
    }

    /**
     * @return cookies
     */
    public Collection<Cookie> getCookies() {
        return this.cookies.values();
    }

    /**
     * @param cookies
     *            - cookies to add
     */
    public void addCookies(Collection<Cookie> cookies) {
        for (Cookie cookie : cookies) {
            addCookie(cookie);
        }
    }

    /**
     * @param cookie
     *            - to add; replaces any existing cookie with the same name
     */
    public void addCookie(Cookie cookie) {
        this.cookies.put(cookie.getName(), cookie);
    }

    /**
     * Writes one line per cookie: domain;name;value;path;expiryMillis;secure,
     * using "null" for a missing domain, path or expiry.
     */
    public String getCookieString() {
        StringBuilder sb = new StringBuilder(512);
        for (Cookie cookie : cookies.values()) {
            if (cookie.getDomain() != null) {
                sb.append(cookie.getDomain()).append(COOKIE_DELIMITER);
            } else {
                sb.append(NULL).append(COOKIE_DELIMITER);
            }
            sb.append(cookie.getName()).append(COOKIE_DELIMITER).append(
                    cookie.getValue()).append(COOKIE_DELIMITER);

            if (cookie.getPath() != null) {
                sb.append(cookie.getPath()).append(COOKIE_DELIMITER);
            } else {
                sb.append(NULL).append(COOKIE_DELIMITER);
            }
            if (cookie.getExpiryDate() != null) {
                sb.append(String.valueOf(cookie.getExpiryDate().getTime()))
                        .append(COOKIE_DELIMITER);
            } else {
                sb.append(NULL).append(COOKIE_DELIMITER);
            }
            sb.append(String.valueOf(cookie.getSecure()))
                    .append(DATA_DELIMITER);
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(512);
        sb.append(uri).append(DATA_DELIMITER);
        sb.append(method.toString()).append(DATA_DELIMITER);
        sb.append(getCookieString());
        return sb.toString();
    }

    /**
     * This method converts proxy state into string based serialized state
     * 
     * @return string based serialized state
     */
    public String toExternalFormat() {
        try {
            return URLEncoder.encode(toString(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("failed to encode", e);
        }
    }

    /**
     * This method converts a string based serialized state into the proxy state
     * 
     * @param ser
     *            - string based serialized state
     * @return ProxyState
     * @throws IllegalArgumentException
     *             - if serialized state is null or corrupted.
     */
    public static ProxyState valueOf(String ser) {
        if (ser == null)
            throw new IllegalArgumentException("Null serialized object");
        String decoded;
        try {
            decoded = URLDecoder.decode(ser, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding " + ser, e);
        }
        String[] lines = decoded.split(DATA_DELIMITER);
        if (lines.length < 2)
            throw new IllegalArgumentException(
                    "Insufficient number of tokens in serialized object ["
                            + decoded + "]");
        ProxyState state = new ProxyState(lines[0], lines[1]);
        for (int i = 2; i < lines.length; i++) {
            String[] cookieFields = lines[i].split(COOKIE_DELIMITER);
            if (cookieFields.length < 6)
                throw new IllegalArgumentException(
                        "Insufficient number of tokens 6 in serialized cookies ["
                                + lines[i] + "]/[" + decoded + "]");
            String domain = cookieFields[0];
            if (NULL.equals(domain)) {
                domain = null;
            }
            String name = cookieFields[1];
            String value = cookieFields[2];
            String path = cookieFields[3];
            if (NULL.equals(path)) {
                path = null;
            }
            Date expires = null;
            if (!NULL.equals(cookieFields[4])) {
                expires = new Date(Long.parseLong(cookieFields[4]));
            }
            boolean secure = Boolean.parseBoolean(cookieFields[5]);
            Cookie cookie = new Cookie(domain, name, value, path, expires,
                    secure);
            state.addCookie(cookie);
        }
        return state;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof ProxyState))
            return false;
        final ProxyState other = (ProxyState) o;
        if (uri != null ? !uri.equals(other.uri) : other.uri != null)
            return false;
        if (method != null ? !method.equals(other.method)
                : other.method != null)
            return false;
        return true;
    }

    @Override
    public int hashCode() {
        int result;
        result = (uri != null ? uri.hashCode() : 0);
        result = 29 * result + (method != null ? method.hashCode() : 0);
        return result;
    }
}

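`getCookieString()` writes each cookie as one line of six `;`-separated fields: domain, name, value, path, expiry in milliseconds and the secure flag. The literal `null` stands in for a missing domain, path or expiry. `valueOf()` parses the lines back.

The sketch below isolates that line format without the HttpClient `Cookie` dependency, so the round trip can be tried on its own. `CookieLine` is a hypothetical class, not part of the framework. The format assumes a cookie value never contains `;` or a newline. RFC 6265 excludes both from cookie values.

```java
import java.util.Date;

// Hypothetical stand-in for the per-cookie line format used by ProxyState:
// domain;name;value;path;expiryMillis;secure, with "null" for missing fields.
public class CookieLine {
    private static final String D = ";";
    private static final String NULL = "null";

    final String domain, name, value, path;
    final Date expires;
    final boolean secure;

    CookieLine(String domain, String name, String value, String path,
               Date expires, boolean secure) {
        this.domain = domain;
        this.name = name;
        this.value = value;
        this.path = path;
        this.expires = expires;
        this.secure = secure;
    }

    String format() {
        return (domain != null ? domain : NULL) + D + name + D + value + D
                + (path != null ? path : NULL) + D
                + (expires != null ? String.valueOf(expires.getTime()) : NULL) + D
                + secure;
    }

    static CookieLine parse(String line) {
        String[] f = line.split(D);
        if (f.length < 6) {
            throw new IllegalArgumentException("expected 6 fields in [" + line + "]");
        }
        return new CookieLine(
                NULL.equals(f[0]) ? null : f[0],
                f[1],
                f[2],
                NULL.equals(f[3]) ? null : f[3],
                NULL.equals(f[4]) ? null : new Date(Long.parseLong(f[4])),
                Boolean.parseBoolean(f[5]));
    }

    public static void main(String[] args) {
        CookieLine c = new CookieLine("plexrails.plexobject.com", "_session_id",
                "abc123", "/", new Date(1217462400000L), false);
        String line = c.format();
        System.out.println(line);
        CookieLine back = parse(line);
        System.out.println(back.name + "=" + back.value + " expires " + back.expires.getTime());
    }
}
```

In the real class, `toExternalFormat()` URL-encodes the whole state (URL, method and these cookie lines) before it goes into the `_userState` hidden field.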
ProxyResponse

The following class stores the response returned by the HttpProxy interface:
package com.plexobject.web.proxy;

import java.io.Serializable;

/**
 * Class: ProxyResponse
 * 
 * Description: This class stores proxy state and response.
 */
public class ProxyResponse implements Serializable {
    private static final long serialVersionUID = 1L;
    private int responseCode;
    private String contents;
    private ProxyState state;

    /**
     * Constructor for ProxyResponse
     */
    public ProxyResponse(int responseCode, String contents, ProxyState state) {
        this.responseCode = responseCode;
        this.contents = contents;
        this.state = state;
    }

    /**
     * @return http response code
     */
    public int getResponseCode() {
        return this.responseCode;
    }

    /**
     * @return response body as returned by the remote application (before
     *         any tidying or transformation)
     */
    public String getContents() {
        return this.contents;
    }

    /**
     * @return state associated with the proxy web request
     */
    public ProxyState getState() {
        return this.state;
    }

    @Override
    public String toString() {
        return this.responseCode + "\n" + this.state + "\n" + this.contents;
    }
}

MethodType

The following enum defines the supported HTTP method types:
package com.plexobject.web.proxy;

/**
 * Class: MethodType
 * 
 * Description: Defines supported method types for proxy request.
 * 
 */
public enum MethodType {
    GET, POST;
}

Service Example

The following classes show how the HttpProxy and ContentTransformer interfaces above can be used with the Servlet/Portlet APIs:

ProxyService Interface

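package com.plexobject.web.service;

import java.io.IOException;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public interface ProxyService {
    // fetches a remote page and renders it with its forms rewritten
    public void render(HttpServletRequest request, HttpServletResponse response) throws IOException;

    // relays a rewritten form submission to the remote application
    public void submit(HttpServletRequest request, HttpServletResponse response) throws IOException;
}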

ProxyService Implementation

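package com.plexobject.web.service;

import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.plexobject.transform.ContentTransformer;
import com.plexobject.web.proxy.HttpProxy;
import com.plexobject.web.proxy.MethodType;
import com.plexobject.web.proxy.ProxyResponse;
import com.plexobject.web.proxy.ProxyState;

public class ProxyServiceImpl implements ProxyService {
    // Hidden fields added by the stylesheet; consumed here, not forwarded.
    private static final Set<String> PROXY_FIELDS = new HashSet<String>(Arrays.asList(
            "_user", "_originalActionUrl", "_orginalMethodType", "_userState"));

    private final HttpProxy httpProxy;
    private final ContentTransformer contentTransformer;

    public ProxyServiceImpl(HttpProxy httpProxy, ContentTransformer contentTransformer) {
        this.httpProxy = httpProxy;
        this.contentTransformer = contentTransformer;
    }

    public void render(HttpServletRequest request, HttpServletResponse response) throws IOException {
        String url = "http://plexrails.plexobject.com/guest_book/sign";
        ProxyState state = new ProxyState(url, MethodType.GET);
        writeTransformed(httpProxy.request(state, null), response);
    }

    public void submit(HttpServletRequest request, HttpServletResponse response) throws IOException {
        // Names must match the hidden fields generated by the stylesheet
        // ("_orginalMethodType" is spelled exactly as it is there).
        String originalActionUrl = request.getParameter("_originalActionUrl");
        String originalMethodType = request.getParameter("_orginalMethodType");
        ProxyState userState = ProxyState.valueOf(request.getParameter("_userState"));

        // Form actions are often relative, so resolve them against the page URL.
        String targetUrl = new URL(new URL(userState.getUri()), originalActionUrl).toString();

        // Forward the user's fields, but not the proxy's own hidden fields.
        Map<String, String[]> allParams = request.getParameterMap();
        Map<String, String[]> params = new HashMap<String, String[]>();
        for (Map.Entry<String, String[]> e : allParams.entrySet()) {
            if (!PROXY_FIELDS.contains(e.getKey())) {
                params.put(e.getKey(), e.getValue());
            }
        }

        ProxyState state = new ProxyState(targetUrl, originalMethodType);
        state.addCookies(userState.getCookies());
        writeTransformed(httpProxy.request(state, params), response);
    }

    // Rewrites the forms in the remote response so the next submission also
    // goes through the proxy, carrying the updated cookies in _userState.
    private void writeTransformed(ProxyResponse proxyResponse, HttpServletResponse response)
            throws IOException {
        Map<String, String> properties = new HashMap<String, String>();
        properties.put("callbackState", proxyResponse.getState().toExternalFormat());
        String transformedXhtml = contentTransformer.transform(proxyResponse.getContents(), properties);
        response.setContentType("text/html");
        response.getWriter().println(transformedXhtml);
    }
}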

Download Code

You can download the above code from here.

Acknowledgement

I would like to thank the folks at the XSLT forum of Programmer-to-Programmer (http://p2p.wrox.com/forum.asp?FORUM_ID=79) for answering my XSLT questions.
U.S. Killed 90, Including 60 Children, in Afghan Village, U.N. Finds

U.S. soldiers say they executed Iraqis on riverbank

I saw Damien Katz’s blog post “REST, I just don’t get it” and was a bit surprised that he doesn’t get REST, especially since he built CouchDB on REST. I admit there are a lot of bad examples of REST services that are really RPC over POST, but resource-oriented services can be quite simple and powerful. REST principles are the building blocks of the web, which has proven to be quite scalable and efficient. I have been developing REST-based services for a number of years, in some ways since before I learned about Roy Fielding’s thesis and REST principles. Back in the 90s, I worked on traffic websites and used CORBA to publish and subscribe to traffic events. We also published that data on the website, but we soon found that a number of people were scraping it. So I wrote a simple XML-over-HTTP service that other groups could use to download the data. I have found the following benefits when using REST-based services:
  • separating reads from writes. I have worked on large ecommerce and travel website and one of the lesson is to keep your read/query services separate from your transactional services. REST APIs define separate operations for reads and write/updates.
  • caching: you can find tons of off the shelf solutions for caching GET requests including hardware solutions. There are tons of features like ETags and cache headers that provide this feature.
  • compression: Since REST uses HTTP, you can use compression such as gzip. This can improve the performance of the services.
  • idempotency: GET, PUT, DELETE and HEAD are idempotent, which means if designed correctly the request can be retried without any worries about side effects. POST on the other hand is not idempotent and may have side effects.
  • bookmarking: GET requests can be easily bookmarked. It is important not to use GET to change state of application.
  • security: Though, security has been weakest area of REST as compared to SOAP, but HTTPS and simple authentication surfice. Though, there are better standards like oauth.
  • big response size: REST/HTTP is the only service platform that I have seen supports gigabytes of responses. I have done a lot of CORBA based services in 90s, EJBs/SOAP in early 2000s and messaging based services for over ten years. None of those platforms support large size responses.
  • simplicity: I find this is the biggest reason for using REST. I can use browser to call GET based requests and write client in any language.
  • resources: REST response can include URIs for other APIs and client can change state through these resources. You can use XHTML to embed all these resources that can be easily tested with browsers.
  • No need for additional jars: When I used CORBA, EJBs, RMI or JINI, I always had to put client/skeleton jar files. Having worked in large companies where I had to import dozens of these jar files became maintenance problem. With REST, I can simply call the service without importing anything.
  • Error codes: HTTP comes with a number of response codes for real life services including thrashing requests such as server busy (503).
  • Meta data: As opposed to CORBA, JINI, RMI services, I can pass meta data easily as HTTP supports predefined and user-defined headers. These headers can include information on authentication, quality of service, timeout or other context related data. Occasionally, I add Map<String,String> to APIs when I use Java based services, but it polutes pure APIs.
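To make the caching point concrete, here is a minimal sketch, using only Ruby’s standard library, of how a service might compute a strong ETag for a GET response and answer a conditional request (If-None-Match) with 304 Not Modified. The `etag_for` and `conditional_get` helpers and the `[status, headers, body]` response shape are hypothetical, not part of any particular framework.

```ruby
require 'digest'

# Hypothetical helper: derive a strong ETag (a quoted string) from the body.
def etag_for(body)
  %("#{Digest::SHA1.hexdigest(body)}")
end

# Returns [status, headers, body] for a GET, honoring If-None-Match.
# If one of the client's cached ETags still matches, reply 304 with an
# empty body so the client (or an intermediate cache) reuses its copy.
def conditional_get(body, if_none_match = nil)
  etag = etag_for(body)
  headers = { 'ETag' => etag, 'Cache-Control' => 'max-age=60' }
  if if_none_match && if_none_match.split(/\s*,\s*/).include?(etag)
    [304, headers, '']
  else
    [200, headers, body]
  end
end

status, headers, _ = conditional_get('{"user":"alice"}')
puts status   # first request: 200
status2, _, _ = conditional_get('{"user":"alice"}', headers['ETag'])
puts status2  # revalidation with the cached ETag: 304
```

Because the 304 carries no body, repeated reads of unchanged resources cost little bandwidth, and any HTTP cache in between can apply the same logic.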
The only real drawbacks I see for REST-based services are that they are generally synchronous and blocking, which can waste threads (though some of this can be solved with async I/O or event-based dispatching). Personally, I like to use messaging underneath REST services, which provides asynchrony, persistence and better reliability.

Love and Hate with Java

July 31st, 2008

For the past few years, bashing Java has been really popular, though some of the criticism has merit. But in general, due to its popularity, Java has become “the man” who tries to bring everyone down. There are millions of programmers who work for “Java the man”. I saw a recent post from an NYU professor who called Java-savvy college grads tomorrow’s pizza delivery men. I know Joel Spolsky often mentions teaching C in universities to help students understand pointers and memory management. I agree with the notion of teaching multiple languages in universities so that graduates have a broad understanding of different programming paradigms. I started learning programming back in the 80s on an Atari with BASIC. I then moved to the PC and learned GW-BASIC, and then learned C, FORTRAN, Assembler, COBOL, Pascal and C++ in college. I also learned Lisp, Prolog and Perl on my own. In the late 80s and early 90s, I also learned dBase III, RPG and SAS, which were called fourth-generation languages. Similarly, C, FORTRAN, Pascal, etc. were called third-generation languages, and assembly languages were second-generation languages. I learned Java in ’95 when it came out and found it much easier to program in than C/C++. I have also learned Python, Ruby and Erlang over the past few years and have been learning Haskell and Scala these days.

For the most part, Java has been my primary language, with some use of C++, Perl, Ruby and Python/Jython (and Erlang on my own). I wish I could use more Erlang, but I don’t have the same experience with Erlang as I have with Java. Over time, Java managed to take a lot of C/C++’s share of the market. Java has also managed to build a large ecosystem of open source and commercial libraries and frameworks. I often hear that Java is so enterprisey and popular in large companies, but the truth is that Java has proven itself to be a reliable language. Steve Yegge also mentioned in his blog how Google primarily uses Java, C++, Python and JavaScript.

I like the polyglot environment, where I write performance-critical code in a system language like Java and use Ruby/Python for high-level glue code or the web tier. I often find the criticism of Java dishonest. For example, people rave about metaprogramming in Ruby but forget to mention all the overhead that goes with it, not to mention security holes and memory leak issues. The truth is that none of the hot languages like Python, Ruby, Erlang and Haskell provides the same performance as Java; in fact, Java’s HotSpot compiler can beat C++ in production. I am going to ignore the static vs. dynamic language debate, but I’ve found static languages work better with a large number of developers. Again, I like these languages, but I would prefer to see a balanced comparison. The real reason Java is popular is that there are tons of jobs. Here is a quick comparison of jobs in Java, C++, C#, Erlang, Haskell, OCaml, Ruby, Python and Factor:

As Bruce Lee said:

I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.

I find that the way you can distinguish yourself is by learning more about the design and architecture of systems and more about the ecosystem. It takes years to learn the ins and outs of a programming language and all the tools and libraries that come with it. I do agree with learning a number of different languages like Haskell, Factor, Erlang, Scala and Groovy, and I have been trying to learn all of those for many years. However, for a system language my first choice is still Java, simply because I have found it to be a reliable and efficient language. As James Gosling said, Java is a blue-collar language. Sure, it does not have closures (yet), actors, transactional memory, metaprogramming or AST/macros, but it is well suited for building large applications with hundreds or thousands of programmers. I just started a large project in my division at Amazon, and sure enough I chose Java, because I have been using it for over twelve years and I know it can do the job. It wasn’t simply because Java is a safe choice (nobody ever got fired for choosing IBM); practically, Java has more mature solutions for business needs. For example, my project needs to integrate with 20+ applications and is aimed at reducing manual work, so it needed a portal server, a workflow engine, a rules engine and a messaging service, and there are tons of options for those in the Java community.

Finally, the JVM is proving to be a neat platform for building new languages like JRuby, Jython, Groovy, Scala, Clojure, etc. that bring cool features and high interoperability with existing systems. As Guy Steele said in his recent interview, you can’t expect one language to solve all problems.

Reaction vs Preparedness

July 7th, 2008

I have struggled with the culture of fire-fighting and heroism and have discussed these issues several times in my blog [1], [2], [3], [4]. Over the weekend, I watched Bruce Schneier’s talk on security. He talked about Hurricane Katrina’s devastation and said that politicians don’t like to invest in infrastructure because it does not get noticed. He showed how reaction is considered more important than preparedness. Another thing he mentioned was that investing in new projects is received better than investing in existing ones. I find that IT culture has a very similar value system, where heroes or martyrs who react (on a daily basis) to emerging crises and fires are noticed by managers and receive promotions and accolades. The same culture does not value investing in better preparation for disasters or emergencies. Unfortunately, once people get promoted, they move on and leave the pile of shit for someone else. Of course, the next team would rather rewrite the old system than keep it. In the last two years, I had to rewrite two major systems whose previous versions had been written less than two years earlier. Maybe they will be rewritten by someone else when I leave. The cycle will continue…

Introduction

I have been a Twitter user for a while and have observed or heard about Twitter’s downtime and scalability problems. The scalability of Twitter has become a topic of a lot of discussions and blogs and has also offered a useful exercise in designing scalable systems. A common root cause identified in Twitter’s blogs is that the architecture is based on a CMS, because it was written in Ruby on Rails and that is what Rails is good at. The solution to the scalability problem, as pointed out by other people, is a messaging-based architecture. There has also been a lot of blame for Twitter’s problems on Ruby and Rails, because Ruby is a slow language compared to other static and dynamic languages and Rails is not built for scalability. There is some truth to it, but I don’t think they are the sole bottlenecks. In fact, I am going to show a small prototype written in Ruby and (partially) Rails that integrates with messaging middleware and can be scaled easily. I have been using messaging middleware such as the CORBA Event Service, IBM MQSeries, WebSphere, WebLogic, Tibco and ActiveMQ for over ten years and have long been a proponent of messaging-based systems for scalability [1] [2]. So, I spent a few hours putting together a prototype based on messaging middleware and Ruby on Rails to see how such a system can be developed.

Design Principles

Before describing my design, I am going to review some of the design principles that I have used for building large scale systems ([1], [2]), which are:
  • Coupling/Cohesion - loosely coupled, shared-nothing architecture and partitioning based on functionality.
  • Messaging or SEDA architecture to implement reliable and scalable services and avoid temporal dependencies.
  • Resource management - good old practices of avoiding leaks of memory, database connections, etc.
  • Data replication especially read-only data.
  • Partition data (Sharding) - using multiple databases to partition the data.
  • Avoid single point of failure.
  • Bring processing closer to the data.
  • Design for service failures and crashes.
  • Dynamic Resources - Design service frameworks so that resources can be removed or added automatically and clients can automatically discover them. Use virtualization and horizontal scalability whenever possible.
  • Smart Caching - Cache expensive operations and contents as much as possible.
  • Avoid distributed transactions; use optimistic compensating transactions (see the CAP theorem).
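The last principle can be illustrated with a small Ruby sketch of a compensating transaction: each step is paired with an undo action, and if a later step fails, the already-completed steps are undone in reverse order instead of being coordinated through a distributed two-phase commit. The class and step names are hypothetical and are not taken from the prototype.

```ruby
# Hypothetical saga-style runner: every step has an action and a compensation.
class CompensatingTransaction
  Step = Struct.new(:name, :action, :compensation)

  def initialize
    @steps = []
  end

  # Registers a step; returns self so calls can be chained.
  def step(name, action, compensation)
    @steps << Step.new(name, action, compensation)
    self
  end

  # Runs the steps in order. Returns true if all succeed; on the first
  # failure, runs the compensations of completed steps in reverse order
  # and returns false.
  def run
    completed = []
    @steps.each do |s|
      s.action.call
      completed << s
    end
    true
  rescue StandardError
    completed.reverse_each { |s| s.compensation.call }
    false
  end
end

log = []
ok = CompensatingTransaction.new
  .step(:store_message, -> { log << :stored },        -> { log << :unstored })
  .step(:notify,        -> { raise 'queue is down' }, -> { log << :unnotified })
  .run
puts ok  # false
p log    # [:stored, :unstored]
```

Only the store step completed before the failure, so only its compensation runs; each service stays consistent on its own without a global lock.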

Architecture & Design

The following is the high-level architecture of the microblogging system:

First, I selected a REST architecture as the entry point to our system for both the Web UI and third-party applications, and used messaging middleware for implementing the services. This gives us ease of access with REST APIs and scalability with messaging. In my implementation I chose JRuby/Rails to implement most of the code, Derby for the database and ActiveMQ for the message store. In addition to scalability, the messaging middleware gives you many of the advantages of functional languages like Erlang, such as immutability, message passing and fault tolerance (via persistent queues). You can even build support for versioning and hot code swapping by adding a version number to each message and creating routers (see integration patterns) to direct messages to different handlers.
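The versioning idea can be sketched as a simple content-based router in plain Ruby. This is an illustration under assumptions: messages are hashes carrying a `:version` key, and `VersionRouter` is a hypothetical class, not part of ActiveMQ or the prototype. Registering a handler for a new version lets old and new message formats coexist while producers are upgraded.

```ruby
# Hypothetical content-based router keyed on the message's version number.
class VersionRouter
  def initialize
    @handlers = {}
  end

  # Registers (or hot-swaps) the handler for a given message version.
  def register(version, &handler)
    @handlers[version] = handler
  end

  # Dispatches the message to the handler registered for its version.
  def route(message)
    handler = @handlers.fetch(message[:version]) do
      raise ArgumentError, "no handler for version #{message[:version]}"
    end
    handler.call(message)
  end
end

router = VersionRouter.new
router.register(1) { |m| "v1 status: #{m[:body]}" }
router.register(2) { |m| "v2 status: #{m[:body]} (#{m[:channel]})" }

puts router.route(version: 1, body: 'hello')               # v1 status: hello
puts router.route(version: 2, body: 'hi', channel: 'web')  # v2 status: hi (web)
```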

APIs

The following REST APIs will be exposed to third-party apps, the Web and other kinds of UI:
Create User
POST /users where the user information is passed in the body in the form of parameters.
Login/Authenticate User
POST /users/userid/sessions This will authenticate the user and create a session. The session-id is stored in the (sharded) database, is used to retrieve user information, and must be sent as a header with most of the following APIs.
Logout
HTTP-HEADER session-id DELETE /users/userid/sessions
Get User information
HTTP-HEADER session-id GET /users/userid This API will return detailed user information
Anonymous User information
GET /users/userid This API will return public user information
List of Followings
HTTP-HEADER session-id GET /followings/userid This API will return a summary of the people the user is following.
Create Followers
HTTP-HEADER session-id POST /followers/followerid This API will create a one-way follower relationship between the user and the follower.
Enable notification for Followers
HTTP-HEADER session-id POST /followers/followerid/notifications
Disable notification for Followers
HTTP-HEADER session-id DELETE /followers/followerid/notifications
Block Followers
HTTP-HEADER session-id POST /followers/followerid/blocking
Unblock Followers
HTTP-HEADER session-id DELETE /followers/followerid/blocking
Follower Exist
HTTP-HEADER session-id GET /followers/followerid This API will return HTTP code 200 if the follower exists.
List of Followers
HTTP-HEADER session-id GET /followers This API will return a summary of the people following the user.
Archive Messages
HTTP-HEADER session-id GET /messages?offset=xxx&limit=yyy&since=date This API will return archived messages for the user, where offset and limit will be optional.
DELETE Messages
HTTP-HEADER session-id DELETE /messages/message-id This API will delete the given message.
Send Direct Messages
HTTP-HEADER session-id POST /directmessages/targetuserid This API will send a direct message to the given user.
Send Reply
HTTP-HEADER session-id POST /reply/message-id This API will post a reply to the given message-id, passing the contents of the reply in the body (as parameters).
Direct Messages Received
HTTP-HEADER session-id GET /directmessages/userid?offset=xxx&limit=yyy&since=date This API will return messages received by the user.
Replies Received
HTTP-HEADER session-id GET /replies?offset=xxx&limit=yyy&since=date This API will return replies received by the user.
Update Status
HTTP-HEADER session-id POST /statuses This API will update status of the user and pass the contents of the message in the body (as parameters).
Get Statuses
HTTP-HEADER session-id GET /statuses?offset=xxx&limit=yyy&since=date This API will return the status updates of the user.
User Timeline
HTTP-HEADER session-id GET /timeline/username?offset=xxx&limit=yyy&since=date This API will return the timeline of the user. It compares the given username with the authenticated username and returns the detailed timeline if they match; otherwise it returns the public timeline.
Public Timeline
GET /timeline/username?offset=xxx&limit=yyy&since=date This API will return public timeline of the user.
Friends Timeline
HTTP-HEADER session-id GET /friendstimeline?offset=xxx&limit=yyy&since=date This API will return timeline of the friends of the user.
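As an illustration of how a client might call these APIs, the following sketch uses Ruby’s standard Net::HTTP library to build (without sending) a friends-timeline request and a status update. The base URL, the `message_body` form field and the helper names are assumptions for illustration; only the paths and the `session-id` header come from the API list.

```ruby
require 'net/http'
require 'uri'

# Hypothetical base URL of the prototype's REST service.
BASE = URI('http://localhost:3000')

# Builds GET /friendstimeline with paging parameters and the session header.
def friends_timeline_request(session_id, offset: 0, limit: 20)
  uri = BASE.dup
  uri.path = '/friendstimeline'
  uri.query = URI.encode_www_form(offset: offset, limit: limit)
  req = Net::HTTP::Get.new(uri)
  req['session-id'] = session_id
  req
end

# Builds POST /statuses; the text travels as a form parameter
# ('message_body' is an assumed field name, matching the schema column).
def update_status_request(session_id, text)
  req = Net::HTTP::Post.new(BASE.dup.tap { |u| u.path = '/statuses' })
  req['session-id'] = session_id
  req.set_form_data('message_body' => text)
  req
end

# To actually send a request:
#   Net::HTTP.start(BASE.host, BASE.port) { |http| http.request(req) }
req = friends_timeline_request('abc123', offset: 10, limit: 5)
puts req.path  # /friendstimeline?offset=10&limit=5
```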

Request Flow

Here is an illustration of how information flows through the different components:

I am not showing the request flow of every API, but they all follow a similar pattern.

Detailed Design

Domain Classes

Primary domain classes are:
  • User
  • Message, which has four subclasses DirectMessage, ReplyMessage, Tweet and Status for various kind of messages in the system.
  • Follower - creates one-way relationship between two users, where follower can choose to be notified when the user changes his/her status.
I identify each message with a GUID, using a simple scheme to generate GUIDs for request-ids and message-ids, but for a real project I would recommend a better library such as UUIDTools.
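The prototype’s own GUID scheme is not shown here; as one stdlib-only alternative, Ruby’s `SecureRandom.uuid` returns a random (version 4) UUID. The `Guid` module and the `msg-`/`req-` prefixes below are hypothetical conventions that make the two kinds of ids easy to tell apart in logs; the result (40 characters) fits the 64-character `message_id` column.

```ruby
require 'securerandom'

# Hypothetical id helper built on SecureRandom.uuid, which returns a
# random version-4 UUID such as "2d931510-d99f-494a-8c67-87feb05e1594".
module Guid
  def self.message_id
    "msg-#{SecureRandom.uuid}"
  end

  def self.request_id
    "req-#{SecureRandom.uuid}"
  end
end

puts Guid.message_id
puts Guid.request_id
```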

Schema

Followers
class CreateFollowers < ActiveRecord::Migration
  def self.up
    # One row per one-way relationship: follower_username follows username.
    create_table :followers do |t|
      t.column :username,            :string, :limit => 16
      t.column :follower_username,   :string, :limit => 16
      t.column :relation_type,       :string, :default => 'Follower', :limit => 32
      t.column :blocked,             :boolean, :default => false
      t.column :notifications,       :boolean, :default => false
      t.column :created_at,          :datetime
      t.column :updated_at,          :datetime
      t.column :deleted_at,          :datetime
    end
    add_index :followers, :username
    add_index :followers, :follower_username
  end

  def self.down
    drop_table :followers
  end
end

Messages
class CreateMessages < ActiveRecord::Migration
  def self.up
    # Single-table inheritance: ActiveRecord stores the Message subclass name
    # (DirectMessage, ReplyMessage, Tweet or Status) in the 'type' column.
    create_table :messages do |t|
      t.column :message_id,          :string, :limit => 64
      t.column :type,                :string, :limit => 32
      t.column :message_type,        :string, :default => 'Say', :limit => 32
      t.column :reply_message_id,    :string, :limit => 64
      t.column :username,            :string, :limit => 16
      t.column :channel_name,        :string, :limit => 32
      t.column :message_body,        :string, :limit => 140
      t.column :favorite,            :boolean, :default => false
      # No default: Time.now.utc would be evaluated once, when the migration
      # runs, so sent_at should be set by the model when a message is created.
      t.column :sent_at,             :datetime
      t.column :created_at,          :datetime
      t.column :deleted_at,          :datetime
    end
    add_index :messages, :message_id
    add_index :messages, :username
  end

  def self.down
    drop_table :messages
  end
end

Users
class CreateUsers < ActiveRecord::Migration
  def self.up
    create_table :users do |t|
      t.column :username,            :string, :limit => 16
      # Prototype only: a real system should store a salted hash, not the raw password.
      t.column :password,            :string, :limit => 16
      t.column :name,                :string, :limit => 64
      t.column :email,               :string, :limit => 64
      t.column :time_zone_id,        :string, :limit => 32
      t.column :created_at,          :datetime
      t.column :updated_at,          :datetime
      t.column :deleted_at,          :datetime
    end
    add_index :users, :username
  end

  def self.down
    drop_table :users
  end
end

Persistence

I used Rails’ ActiveRecord library to provide persistence, though I could alternatively have used ActiveHibernate. These libraries provide a quick way to add persistence capabilities with minimal configuration and boilerplate. This prototype uses multiple levels of partitioning: first at the service level and second at the persistence level. I am using multiple Derby databases to store objects, with a simple hashing scheme for load distribution. This prototype also shows how to connect to multiple databases in Rails, which was a difficulty in Twitter’s early implementation.
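The prototype’s exact hashing scheme is not given; a minimal sketch of the idea, assuming shards are chosen by hashing the username with `Zlib.crc32` (Ruby stdlib) modulo the number of databases, could look like this. `ShardSelector` and the database names are placeholders; in Rails, each name could map to a configuration passed to ActiveRecord’s `establish_connection`.

```ruby
require 'zlib'

# Hypothetical shard selector: every row for a given username lands on the
# same database, and users spread roughly evenly across databases.
class ShardSelector
  def initialize(shards)
    raise ArgumentError, 'need at least one shard' if shards.empty?
    @shards = shards
  end

  # CRC32 is deterministic across processes, unlike String#hash,
  # which is randomized per Ruby process.
  def shard_for(username)
    @shards[Zlib.crc32(username.to_s) % @shards.size]
  end
end

selector = ShardSelector.new(%w[microblog_db0 microblog_db1 microblog_db2])
puts selector.shard_for('alice')
```

One design caveat: with plain modulo hashing, changing the number of databases remaps most users, so a system expected to grow would typically move to consistent hashing or a lookup directory.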

Domain Services

The core model and services use domain-driven design and apply the principle of fat models and thin services (as opposed to fat services and an anemic model). The domain services implement the external REST APIs and use the underlying ActiveRecord for most of the functionality.

Messaging Middleware

The REST-based web services don’t invoke domain services directly; instead they use messaging middleware. In a real application, I might use an ESB or integration patterns such as intelligent routing to partition the system across multiple machines and send each request to the suitable queue. In this prototype, I am simply using ActiveMQ, which is fairly robust and easy to use. I am also using separate queues for different kinds of operations. Another lesson I have learned in building large systems is to separate reads from writes so that you can scale them independently and also offer different qualities of service, e.g. read queues can be non-persistent, but write queues can be persistent.
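The read/write split can be expressed as a small routing table. In this sketch the queue names, the operation lists and the `queue_for` helper are hypothetical; the `persistent` flag stands for the delivery mode the producer would request (persistent delivery for writes, non-persistent for reads), not for any specific ActiveMQ setting.

```ruby
# Hypothetical routing table: reads and writes use separate queues with
# different delivery guarantees, so each side can be scaled independently.
QueueSpec = Struct.new(:name, :persistent)

READ_OPERATIONS  = %i[get_user get_statuses friends_timeline public_timeline].freeze
WRITE_OPERATIONS = %i[create_user update_status send_direct_message reply].freeze

def queue_for(operation)
  if READ_OPERATIONS.include?(operation)
    QueueSpec.new('microblog.reads', false)   # losing a read is cheap: client retries
  elsif WRITE_OPERATIONS.include?(operation)
    QueueSpec.new('microblog.writes', true)   # writes must survive a broker restart
  else
    raise ArgumentError, "unknown operation: #{operation}"
  end
end

spec = queue_for(:update_status)
puts "#{spec.name} persistent=#{spec.persistent}"  # microblog.writes persistent=true
```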

Business Delegate

The REST-based web services don’t interact with the messaging middleware directly; instead they use business delegates that hide all the details of sending messages, creating temporary queues and receiving replies. The business delegates have the same interface as the services.
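A business delegate doing request/reply can be sketched in plain Ruby. To keep the example self-contained, an in-process `Queue` (Ruby’s `Thread::Queue`) stands in for ActiveMQ; that substitution is an assumption, as are the class and method names. With a real broker, the per-request reply queue would be a temporary queue and the request id would travel as a correlation header.

```ruby
require 'securerandom'

# Hypothetical business delegate: exposes the service's interface while
# hiding request/reply messaging behind it.
class StatusServiceDelegate
  def initialize(request_queue)
    @request_queue = request_queue
  end

  # Same signature a caller would use on the service itself.
  def update_status(username, text)
    reply_queue = Queue.new  # stands in for a per-request temporary queue
    @request_queue << { request_id: SecureRandom.uuid, reply_to: reply_queue,
                        op: :update_status, args: [username, text] }
    reply = reply_queue.pop  # blocks until the service replies
    raise reply[:error] if reply[:error]
    reply[:result]
  end
end

# Hypothetical service side: consumes requests and replies on reply_to.
def start_status_service(request_queue)
  Thread.new do
    loop do
      req = request_queue.pop
      break if req == :shutdown
      username, text = req[:args]
      req[:reply_to] << { request_id: req[:request_id],
                          result: "#{username}: #{text}" }
    end
  end
end

requests = Queue.new
worker = start_status_service(requests)
delegate = StatusServiceDelegate.new(requests)
puts delegate.update_status('alice', 'hello')  # alice: hello
requests << :shutdown
worker.join
```

Because the REST layer only sees `update_status`, the transport underneath can change (in-process queue, ActiveMQ, or a direct call) without touching the web code.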

Benchmark Results

Performance was not the objective of my prototype, but I tried to check how many requests I could process on my development machine. I chose to benchmark only the messaging middleware and not the REST server, because the JVM uses native threads and web containers such as Tomcat use a small thread pool to serve requests. Since our architecture is heavily I/O-bound, that would not scale. Alternatively, I could have built reactive or event-based APIs for HTTP, or used Yaws/Mochiweb as a container for the REST-based web services, because creating a process in Erlang is pretty cheap. For example, an Erlang process takes about 300 words of memory, whereas a Java thread takes 2MB by default (though it can be reduced to 256KB on most machines). Here are the results of running a simple server with embedded ActiveMQ and a load test, both running JRuby on my Pentium Linux machine. I used the default VM size for both JRuby processes and didn’t tune any options:
What | Elapsed Time (secs) | Throughput | Invocation Times
load_test_create_users