Implementing HTTP Proxy Service with XSL Transformation
August 27th, 2008
- For each feature and page, we stored a feature link and URL in configuration. That configuration told us where to fetch the rendered HTML/XHTML page or form.
- Instead of displaying the raw HTML, I first converted it into valid XHTML using Tidy and then transformed it so that all form submissions go through our proxy server. As part of the transformation, I added the original form URLs, methods and cookies as hidden fields. This kept the proxy server free of client state, which was nice for clustering.
- When the form is submitted, the proxy server intercepts it, makes the request to the remote application and sends back the response. The proxy server reads the original target URL, method type and cookies from the form so that the remote application can manage state if it’s using sessions.
Following are a few simplified classes that I developed for the proxy framework:
Content Transformation
First, I developed the HTML-to-XHTML conversion and the transformation that modifies forms. Following is the XSLT that I used:

    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="1.0"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns="http://www.w3.org/1999/xhtml"
        exclude-result-prefixes="xhtml">

      <!-- Parameters supplied by the Java runtime -->
      <xsl:param name="callbackHandler"/>
      <xsl:param name="callbackUser"/>
      <xsl:param name="callbackState"/>

      <!-- Identity template: copy everything that is not handled below -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Point every form at the proxy and remember its original target -->
      <xsl:template match="form">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="action" xmlns:java="http://xml.apache.org/xslt/java">
            <xsl:text>/web/myproxy?myarg=xxx&amp;otherarg=2</xsl:text>
            <!-- Xalan Java extension call; it returns an empty string -->
            <xsl:value-of select="java:com.plexobject.transform.XslContentTransformer.setAction($callbackHandler, string(@action))"/>
          </xsl:attribute>
          <xsl:attribute name="method" xmlns:java="http://xml.apache.org/xslt/java">
            <xsl:text>POST</xsl:text>
            <xsl:value-of select="java:com.plexobject.transform.XslContentTransformer.setMethod($callbackHandler, string(@method))"/>
          </xsl:attribute>
          <xsl:attribute name="id">_Form</xsl:attribute>
          <xsl:attribute name="name">_Form</xsl:attribute>
          <input type="hidden" name="_user" value="{$callbackUser}"/>
          <input type="hidden" name="_originalActionUrl" value="{@action}"/>
          <input type="hidden" name="_orginalMethodType" value="{@method}"/>
          <input type="hidden" name="_userState" value="{$callbackState}"/>
          <xsl:apply-templates select="node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Drop the title of the remote page -->
      <xsl:template match="title"/>

    </xsl:stylesheet>

A few things to note:
- xsl:param allows passing parameters from the runtime (Java).
- The xsl:template that matches the “form” tag replaces the action/method attributes and adds id/name attributes. It then adds a few hidden input fields.
- Finally, I remove the title tag.
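To make the first two notes concrete, here is a minimal sketch of how a parameter reaches a stylesheet from Java through the standard JAXP API. The class name `XslParamDemo` and the trimmed-down inline stylesheet are assumptions made for illustration; they are not part of the proxy framework, and the Xalan extension calls are left out so that the sketch runs on a plain JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslParamDemo {
    // Hypothetical stylesheet: identity copy plus one hidden field per form.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='callbackUser'/>"
      + "<xsl:template match='@* | node()'>"
      + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='form'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<input type='hidden' name='_user' value='{$callbackUser}'/>"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xhtml, String user) {
        try {
            // Compile the stylesheet, then bind the parameter on a fresh Transformer.
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
            Transformer transformer = templates.newTransformer();
            transformer.setParameter("callbackUser", user);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Failed to transform", e);
        }
    }

    public static void main(String[] args) {
        // Prints the form with an extra hidden _user field as its first child.
        System.out.println(transform("<form action='/sign'><input name='msg'/></form>", "alice"));
    }
}
```

The parameter name passed to `setParameter` has to match the top-level `xsl:param` name; a parameter the stylesheet does not declare is simply ignored.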
ContentTransformer interface
    package com.plexobject.transform;

    import java.util.Map;

    public interface ContentTransformer {
        /**
         * This method transforms given contents
         *
         * @param contents
         *            - input contents
         * @param properties
         *            - input/output properties for transformation
         * @return transformed contents
         * @throws TransformationException
         *             - when error occurs while transforming content.
         */
        public String transform(String contents, Map<String, String> properties)
                throws TransformationException;
    }
ContentTransformer implementation
A few things to note in the following implementation:
- I use JTidy to convert HTML to XHTML.
- I pass some parameters to the XSL stylesheet and also read a few properties back. Reading properties back is a bit kludgy, but it works.
    package com.plexobject.transform;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Templates;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    import org.w3c.tidy.Tidy;

    public class XslContentTransformer implements ContentTransformer {
        public static final String ACTION = "form_action";
        public static final String METHOD = "form_method";

        // Scratch space per callback handler; the stylesheet writes into it
        // through the static setAction/setMethod extension functions.
        private static final Map<String, Map<String, String>> xslProperties =
                new ConcurrentHashMap<String, Map<String, String>>();
        // Compiled stylesheet. Templates can be shared between threads,
        // whereas a Transformer instance cannot.
        private volatile Templates templates;
        private final String xslUri;
        private final boolean useTidy;

        public XslContentTransformer(final String xslUri, final boolean useTidy) {
            this.xslUri = xslUri;
            this.useTidy = useTidy;
        }

        // Called from the stylesheet; returns "" so nothing is written to the output.
        public static final String setAction(final String callbackHandler,
                final String action) {
            getPropertiesForCallback(callbackHandler).put(ACTION, action);
            return "";
        }

        public static final String getAction(final String callbackHandler) {
            return getPropertiesForCallback(callbackHandler).get(ACTION);
        }

        // Called from the stylesheet; returns "" so nothing is written to the output.
        public static final String setMethod(final String callbackHandler,
                final String method) {
            getPropertiesForCallback(callbackHandler).put(METHOD, method);
            return "";
        }

        public static final String getMethod(final String callbackHandler) {
            return getPropertiesForCallback(callbackHandler).get(METHOD);
        }

        /**
         * This method transforms given contents
         *
         * @param contents
         *            - input contents
         * @param properties
         *            - input/output properties for transformation
         * @return transformed contents
         * @throws TransformationException
         *             - when error occurs while transforming content.
         */
        public String transform(String contents, Map<String, String> properties)
                throws TransformationException {
            initTemplates();

            // Strip HTML comments; (?s) lets '.' match across line breaks.
            contents = contents.replaceAll("(?s)<!--.*?-->", "");
            InputStream in = new ByteArrayInputStream(contents.getBytes());
            if (useTidy) {
                in = tidy(in, contents.length());
            }

            final Source xmlSource = new StreamSource(in);
            final ByteArrayOutputStream out = new ByteArrayOutputStream(
                    contents.length());
            final Result result = new StreamResult(out);

            String callbackHandler = properties.get("callbackHandler");
            if (callbackHandler == null) {
                callbackHandler = Thread.currentThread().getName();
            }
            xslProperties.put(callbackHandler, new HashMap<String, String>());
            try {
                final Transformer transformer = templates.newTransformer();
                transformer.setParameter("callbackHandler", callbackHandler);
                for (Map.Entry<String, String> e : properties.entrySet()) {
                    transformer.setParameter(e.getKey(), e.getValue());
                }
                transformer.transform(xmlSource, result);
                // Read back what the stylesheet recorded for the form.
                properties.put(ACTION, getAction(callbackHandler));
                properties.put(METHOD, getMethod(callbackHandler));
            } catch (TransformerException e) {
                throw new TransformationException("Failed to transform " + contents, e);
            } finally {
                // Always release the scratch space, even when the transform fails.
                xslProperties.remove(callbackHandler);
            }
            return new String(out.toByteArray());
        }

        private static final Map<String, String> getPropertiesForCallback(
                String callbackHandler) {
            final Map<String, String> props = xslProperties.get(callbackHandler);
            if (props == null) {
                throw new NullPointerException(
                        "Failed to find properties for callback " + callbackHandler);
            }
            return props;
        }

        // No synchronization needed; compiling the stylesheet more than once is acceptable.
        private void initTemplates() throws TransformationException {
            if (templates == null) {
                InputStream in = getClass().getResourceAsStream(xslUri);
                if (in == null) {
                    throw new TransformationException("Failed to find XSLT " + xslUri);
                }
                try {
                    templates = TransformerFactory.newInstance().newTemplates(
                            new StreamSource(in));
                } catch (TransformerException e) {
                    throw new TransformationException(
                            "Failed to initialize XSL transformer", e);
                }
            }
        }

        // Converts possibly malformed HTML into well-formed XML using JTidy.
        private final InputStream tidy(InputStream in, int length) {
            ByteArrayOutputStream out = new ByteArrayOutputStream(length);
            Tidy converter = new Tidy();
            converter.setTidyMark(false);
            converter.setXmlOut(true);
            converter.setXmlPi(true);
            converter.setXmlPIs(true);
            converter.setNumEntities(true);
            converter.setDocType("omit");
            converter.setUpperCaseTags(false);
            converter.setUpperCaseAttrs(false);
            converter.setFixComments(true);
            converter.parse(in, out);
            return new ByteArrayInputStream(out.toByteArray());
        }
    }
Proxy
Following interfaces and classes show how GET/POST requests are proxied:
Proxy Interface

    package com.plexobject.web.proxy;

    import java.io.IOException;
    import java.util.Map;

    public interface HttpProxy {
        /**
         * This method issues a GET or POST request based on the method and URI
         * specified in the ProxyState and adds given parameters to the request.
         *
         * @param state
         *            - proxy state
         * @param params
         *            - name/value pairs of parameters that are sent with the
         *            request
         */
        public ProxyResponse request(ProxyState state, Map<String, String[]> params)
                throws IOException;
    }
Proxy Implementation
Following class implements the HttpProxy interface using the HttpClient library:

    package com.plexobject.web.proxy;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.commons.httpclient.Cookie;
    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpMethodBase;
    import org.apache.commons.httpclient.HttpState;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.cookie.CookiePolicy;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    import com.plexobject.io.IoUtil;

    public class HttpProxyImpl implements HttpProxy {
        private static final int CONNECTION_TIMEOUT_MILLIS = 30000;

        /**
         * This method issues a GET or POST request based on the method and URI
         * specified in the ProxyState and adds given parameters to the request.
         *
         * @param state
         *            - proxy state
         * @param params
         *            - name/value pairs of parameters that are sent with the
         *            request
         */
        public ProxyResponse request(ProxyState state, Map<String, String[]> params)
                throws IOException {
            if (state.getMethod() == MethodType.GET) {
                return get(state, params);
            } else {
                return post(state, params);
            }
        }

        /**
         * This method issues a GET request on the URI specified in the ProxyState
         * and adds given parameters to the request.
         *
         * @param state
         *            - proxy state
         * @param params
         *            - name/value pairs of parameters that are sent to the GET
         *            request
         */
        private ProxyResponse get(ProxyState state, Map<String, String[]> params)
                throws IOException {
            GetMethod method = new GetMethod(state.getUri());
            method.setQueryString(toNameValues(params));
            return doRequest(state, params, method);
        }

        /**
         * This method issues a POST request on the URI specified in the ProxyState
         * and adds given parameters to the request.
         *
         * @param state
         *            - proxy state
         * @param params
         *            - name/value pairs of parameters that are sent to the POST
         *            request
         */
        private ProxyResponse post(ProxyState state, Map<String, String[]> params)
                throws IOException {
            PostMethod method = new PostMethod(state.getUri());
            method.setRequestBody(toNameValues(params));
            return doRequest(state, params, method);
        }

        private ProxyResponse doRequest(ProxyState proxyState,
                Map<String, String[]> params, HttpMethodBase method)
                throws IOException {
            HttpClient client = new HttpClient();
            client.getHttpConnectionManager().getParams().setConnectionTimeout(
                    CONNECTION_TIMEOUT_MILLIS);
            client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
            method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                    new DefaultHttpMethodRetryHandler(3, false));

            // Replay the cookies carried in the proxy state.
            HttpState initialState = new HttpState();
            for (Cookie cookie : proxyState.getCookies()) {
                initialState.addCookie(cookie);
            }
            client.setState(initialState);

            try {
                int statusCode = client.executeMethod(method);
                String contents = IoUtil.read(method.getResponseBodyAsStream());
                // Remember cookies set by the remote application.
                Cookie[] cookies = client.getState().getCookies();
                for (Cookie cookie : cookies) {
                    proxyState.addCookie(cookie);
                }

                return new ProxyResponse(statusCode, contents, proxyState);
            } catch (RuntimeException e) {
                throw e;
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                throw new IOException("failed to process request", e);
            } finally {
                method.releaseConnection();
            }
        }

        // Flattens multi-valued parameters into HttpClient name/value pairs.
        private NameValuePair[] toNameValues(Map<String, String[]> params) {
            if (params == null || params.size() == 0) {
                return new NameValuePair[0];
            }
            List<NameValuePair> nvPairs = new ArrayList<NameValuePair>();
            for (Map.Entry<String, String[]> e : params.entrySet()) {
                String[] values = e.getValue();
                for (int j = 0; j < values.length; j++) {
                    nvPairs.add(new NameValuePair(e.getKey(), values[j]));
                }
            }
            return nvPairs.toArray(new NameValuePair[nvPairs.size()]);
        }
    }
ProxyState
Following class maintains the URL, method type and cookies related to a web request:

    package com.plexobject.web.proxy;

    import java.io.Serializable;
    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;
    import java.util.Collection;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.commons.httpclient.Cookie;

    /**
     * Class: ProxyState
     *
     * Description: This class stores state needed to make a proxy request including
     * method type and cookies.
     */
    public class ProxyState implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final String DATA_DELIMITER = "\n";
        private static final String COOKIE_DELIMITER = ";";
        private static final String NULL = "null";

        private String uri;
        private MethodType method;
        private Map<String, Cookie> cookies;

        /**
         * Constructors for ProxyState
         */
        public ProxyState(String uri, String method) {
            this(uri, MethodType.valueOf(method));
        }

        public ProxyState(String uri, MethodType method) {
            this.uri = uri;
            this.method = method;
            this.cookies = new HashMap<String, Cookie>();
        }

        /**
         * @return uri
         */
        public String getUri() {
            return this.uri;
        }

        /**
         * @return method
         */
        public MethodType getMethod() {
            return this.method;
        }

        /**
         * @return cookies
         */
        public Collection<Cookie> getCookies() {
            return this.cookies.values();
        }

        /**
         * @param cookies
         */
        public void addCookies(Collection<Cookie> cookies) {
            for (Cookie cookie : cookies) {
                addCookie(cookie);
            }
        }

        /**
         * @param cookie
         *            - to add
         */
        public void addCookie(Cookie cookie) {
            this.cookies.put(cookie.getName(), cookie);
        }

        // One line per cookie: domain;name;value;path;expiry;secure
        public String getCookieString() {
            StringBuilder sb = new StringBuilder(512);
            for (Cookie cookie : cookies.values()) {
                if (cookie.getDomain() != null) {
                    sb.append(cookie.getDomain()).append(COOKIE_DELIMITER);
                } else {
                    sb.append(NULL).append(COOKIE_DELIMITER);
                }
                sb.append(cookie.getName()).append(COOKIE_DELIMITER).append(
                        cookie.getValue()).append(COOKIE_DELIMITER);

                if (cookie.getPath() != null) {
                    sb.append(cookie.getPath()).append(COOKIE_DELIMITER);
                } else {
                    sb.append(NULL).append(COOKIE_DELIMITER);
                }
                if (cookie.getExpiryDate() != null) {
                    sb.append(String.valueOf(cookie.getExpiryDate().getTime()))
                            .append(COOKIE_DELIMITER);
                } else {
                    sb.append(NULL).append(COOKIE_DELIMITER);
                }
                sb.append(String.valueOf(cookie.getSecure()))
                        .append(DATA_DELIMITER);
            }
            return sb.toString();
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(512);
            sb.append(uri).append(DATA_DELIMITER);
            sb.append(method.toString()).append(DATA_DELIMITER);
            sb.append(getCookieString());
            return sb.toString();
        }

        /**
         * This method converts proxy state into string based serialized state
         *
         * @return string based serialized state
         */
        public String toExternalFormat() {
            try {
                return URLEncoder.encode(toString(), "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException("failed to encode", e);
            }
        }

        /**
         * This method converts a string based serialized state into the proxy state
         *
         * @param ser
         *            - string based serialized state
         * @return ProxyState
         * @throws IllegalArgumentException
         *             - if serialized state is null or corrupted.
         */
        public static ProxyState valueOf(String ser) {
            if (ser == null)
                throw new IllegalArgumentException("Null serialized object");
            String decoded;
            try {
                decoded = URLDecoder.decode(ser, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("Unsupported encoding " + ser, e);
            }
            String[] lines = decoded.split(DATA_DELIMITER);
            if (lines.length < 2)
                throw new IllegalArgumentException(
                        "Insufficient number of tokens in serialized object ["
                                + decoded + "]");
            ProxyState state = new ProxyState(lines[0], lines[1]);
            for (int i = 2; i < lines.length; i++) {
                String[] cookieFields = lines[i].split(COOKIE_DELIMITER);
                if (cookieFields.length < 6)
                    throw new IllegalArgumentException(
                            "Expected 6 tokens in serialized cookie ["
                                    + lines[i] + "]/[" + decoded + "]");
                String domain = cookieFields[0];
                if (NULL.equals(domain)) {
                    domain = null;
                }
                String name = cookieFields[1];
                String value = cookieFields[2];
                String path = cookieFields[3];
                if (NULL.equals(path)) {
                    path = null;
                }
                Date expires = null;
                if (!NULL.equals(cookieFields[4])) {
                    expires = new Date(Long.parseLong(cookieFields[4]));
                }
                boolean secure = Boolean.parseBoolean(cookieFields[5]);
                Cookie cookie = new Cookie(domain, name, value, path, expires,
                        secure);
                state.addCookie(cookie);
            }
            return state;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o)
                return true;
            if (!(o instanceof ProxyState))
                return false;
            final ProxyState other = (ProxyState) o;
            if (uri != null ? !uri.equals(other.uri) : other.uri != null)
                return false;
            if (method != null ? !method.equals(other.method)
                    : other.method != null)
                return false;
            return true;
        }

        @Override
        public int hashCode() {
            int result;
            result = (uri != null ? uri.hashCode() : 0);
            result = 29 * result + (method != null ? method.hashCode() : 0);
            return result;
        }
    }
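The round trip through a hidden form field is what keeps the proxy stateless, so it is worth seeing in isolation. The following is a simplified sketch, not the class above: `MiniProxyState` is a hypothetical stand-in that keeps only cookie names and values so that it runs without the HttpClient jar, and it assumes Java 10 or later for the `Charset` overloads of `URLEncoder` and `URLDecoder`.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, cut-down version of ProxyState used only for illustration.
final class MiniProxyState {
    final String uri;
    final String method;
    final Map<String, String> cookies;

    MiniProxyState(String uri, String method, Map<String, String> cookies) {
        this.uri = uri;
        this.method = method;
        this.cookies = new LinkedHashMap<>(cookies);
    }

    // Newline-separated records, URL-encoded so they fit in a form field.
    String toExternalFormat() {
        StringBuilder sb = new StringBuilder();
        sb.append(uri).append('\n').append(method).append('\n');
        for (Map.Entry<String, String> c : cookies.entrySet()) {
            sb.append(c.getKey()).append(';').append(c.getValue()).append('\n');
        }
        return URLEncoder.encode(sb.toString(), StandardCharsets.UTF_8);
    }

    // Inverse of toExternalFormat; rejects input with missing records.
    static MiniProxyState valueOf(String external) {
        String[] lines = URLDecoder.decode(external, StandardCharsets.UTF_8).split("\n");
        if (lines.length < 2) {
            throw new IllegalArgumentException("Corrupted state: " + external);
        }
        Map<String, String> cookies = new LinkedHashMap<>();
        for (int i = 2; i < lines.length; i++) {
            String[] fields = lines[i].split(";", 2);
            if (fields.length < 2) {
                throw new IllegalArgumentException("Corrupted cookie: " + lines[i]);
            }
            cookies.put(fields[0], fields[1]);
        }
        return new MiniProxyState(lines[0], lines[1], cookies);
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("_session_id", "abc123");
        MiniProxyState state = new MiniProxyState("http://example.com/sign", "POST", cookies);
        String hiddenFieldValue = state.toExternalFormat();
        MiniProxyState restored = MiniProxyState.valueOf(hiddenFieldValue);
        System.out.println(restored.uri + " " + restored.method + " " + restored.cookies);
    }
}
```

Because the delimiters are plain characters, a cookie value that itself contains a newline would corrupt this format; the sketch leaves that limitation in place to stay close to the original design.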
ProxyResponse
Following class stores the response from the HttpProxy interface:

    package com.plexobject.web.proxy;

    import java.io.Serializable;

    /**
     * Class: ProxyResponse
     *
     * Description: This class stores proxy state and response.
     */
    public class ProxyResponse implements Serializable {
        private static final long serialVersionUID = 1L;
        private int responseCode;
        private String contents;
        private ProxyState state;

        /**
         * Constructor for ProxyResponse
         */
        public ProxyResponse(int responseCode, String contents, ProxyState state) {
            this.responseCode = responseCode;
            this.contents = contents;
            this.state = state;
        }

        /**
         * @return http response code
         */
        public int getResponseCode() {
            return this.responseCode;
        }

        /**
         * @return XHTML contents
         */
        public String getContents() {
            return this.contents;
        }

        /**
         * @return state associated with the proxy web request
         */
        public ProxyState getState() {
            return this.state;
        }

        @Override
        public String toString() {
            return this.responseCode + "\n" + this.state + "\n" + this.contents;
        }
    }
MethodType
Following class defines an enum for HTTP method types:

    package com.plexobject.web.proxy;

    /**
     * Class: MethodType
     *
     * Description: Defines supported method types for proxy request.
     */
    public enum MethodType {
        GET, POST;
    }
Service Example
Following classes show how the above HttpProxy and ContentTransformer interfaces can be used with the Servlet/Portlet APIs:
ProxyService Interface

    package com.plexobject.web.service;

    import java.io.IOException;

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public interface ProxyService {
        public void render(HttpServletRequest request, HttpServletResponse response) throws IOException;

        public void submit(HttpServletRequest request, HttpServletResponse response) throws IOException;
    }
ProxyService Implementation
    package com.plexobject.web.service;

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import com.plexobject.transform.ContentTransformer;
    import com.plexobject.web.proxy.HttpProxy;
    import com.plexobject.web.proxy.MethodType;
    import com.plexobject.web.proxy.ProxyResponse;
    import com.plexobject.web.proxy.ProxyState;

    public class ProxyServiceImpl implements ProxyService {
        private HttpProxy httpProxy;
        private ContentTransformer contentTransformer;

        public ProxyServiceImpl(HttpProxy httpProxy, ContentTransformer contentTransformer) {
            this.httpProxy = httpProxy;
            this.contentTransformer = contentTransformer;
        }

        // Fetches the remote page and rewrites its forms to post back to the proxy.
        public void render(HttpServletRequest request, HttpServletResponse response) throws IOException {
            String url = "http://plexrails.plexobject.com/guest_book/sign";
            ProxyState state = new ProxyState(url, MethodType.GET);
            String inputXhtml = httpProxy.request(state, null).getContents();
            Map<String, String> properties = new HashMap<String, String>();
            properties.put("callbackState", state.toExternalFormat());
            String transformedXhtml = contentTransformer.transform(inputXhtml, properties);
            response.getWriter().println(transformedXhtml);
        }

        // Replays a submitted form against its original target.
        @SuppressWarnings("unchecked")
        public void submit(HttpServletRequest request, HttpServletResponse response) throws IOException {
            // These names must match the hidden fields written by the stylesheet,
            // including the leading underscore and the "_orginalMethodType" spelling.
            String originalActionUrl = request.getParameter("_originalActionUrl");
            String orginalMethodType = request.getParameter("_orginalMethodType");
            ProxyState userState = ProxyState.valueOf(request.getParameter("_userState"));
            Map<String, String[]> params = request.getParameterMap();
            // HTML forms default to GET when no method attribute is given,
            // and the attribute value may be lower case.
            MethodType methodType = MethodType.GET;
            if (orginalMethodType != null && orginalMethodType.trim().length() > 0) {
                methodType = MethodType.valueOf(orginalMethodType.trim().toUpperCase());
            }
            ProxyState state = new ProxyState(originalActionUrl, methodType);
            state.addCookies(userState.getCookies());
            ProxyResponse proxyResponse = httpProxy.request(state, params);
            response.getWriter().println(proxyResponse.getContents());
        }
    }
Download Code
You can download the above code from here.
Acknowledgement
I would like to thank the folks at the XSLT forum of Programmer-to-Programmer (http://p2p.wrox.com/forum.asp?FORUM_ID=79) for answering my XSLT questions.
U.S. Killed 90, Including 60 Children, in Afghan Village, U.N. Finds
August 26th, 2008
Benefits of REST based services
August 14th, 2008
- separating reads from writes: I have worked on large e-commerce and travel websites, and one of the lessons is to keep your read/query services separate from your transactional services. REST APIs define separate operations for reads and for writes/updates.
- caching: You can find plenty of off-the-shelf solutions for caching GET requests, including hardware solutions. HTTP features such as ETags and cache headers support this.
- compression: Since REST uses HTTP, you can use compression such as gzip, which can improve the performance of the services.
- idempotency: GET, PUT, DELETE and HEAD are idempotent, which means that, if the service is designed correctly, a request can be retried without any worries about side effects. POST, on the other hand, is not idempotent and may have side effects.
- bookmarking: GET requests can be easily bookmarked. It is important not to use GET to change the state of the application.
- security: Security has been the weakest area of REST compared to SOAP, but HTTPS and simple authentication suffice in many cases, and there are better standards like OAuth.
- big response size: REST/HTTP is the only service platform I have seen that supports gigabytes of responses. I built a lot of CORBA-based services in the 90s, EJB/SOAP services in the early 2000s and messaging-based services for over ten years. None of those platforms supported responses that large.
- simplicity: I find this to be the biggest reason for using REST. I can use a browser to call GET-based requests and write a client in any language.
- resources: A REST response can include URIs for other APIs, and the client can change state through these resources. You can use XHTML to embed all these resources so that they can be easily tested with browsers.
- No need for additional jars: When I used CORBA, EJBs, RMI or Jini, I always had to add client/skeleton jar files. In the large companies where I have worked, importing dozens of these jar files became a maintenance problem. With REST, I can simply call the service without importing anything.
- Error codes: HTTP comes with a number of response codes for real-life services, including codes for throttling requests such as server busy (503).
- Meta data: As opposed to CORBA, Jini and RMI services, I can pass meta data easily because HTTP supports predefined and user-defined headers. These headers can include information on authentication, quality of service, timeouts or other context-related data. Occasionally, I add a Map<String,String> to APIs when I use Java-based services, but it pollutes otherwise pure APIs.
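The idempotency point in the list above has a direct consequence for client code: a failed GET, HEAD, PUT or DELETE can be retried, while a failed POST should not be retried blindly. Here is a minimal sketch of that rule. The class `IdempotentRetry`, its nested `HttpMethod` enum and the use of `UncheckedIOException` to signal a failed call are all assumptions made for this example, not part of any HTTP library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

final class IdempotentRetry {
    enum HttpMethod { GET, HEAD, PUT, DELETE, POST }

    // Methods that are safe to repeat when the service honors HTTP semantics.
    private static final Set<HttpMethod> IDEMPOTENT =
        EnumSet.of(HttpMethod.GET, HttpMethod.HEAD, HttpMethod.PUT, HttpMethod.DELETE);

    // Runs the request, retrying on I/O failure only for idempotent methods.
    static <T> T call(HttpMethod method, int maxAttempts, Supplier<T> request) {
        int attempts = IDEMPOTENT.contains(method) ? Math.max(1, maxAttempts) : 1;
        UncheckedIOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return request.get();
            } catch (UncheckedIOException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated service that answers "server busy" twice before succeeding.
        String body = call(HttpMethod.GET, 3, () -> {
            if (++calls[0] < 3) {
                throw new UncheckedIOException(new IOException("server busy (503)"));
            }
            return "ok";
        });
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

A real client would usually add a delay between attempts; the sketch leaves that out to keep the retry rule itself visible.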
Love and Hate with Java
July 31st, 2008
For the past few years, bashing Java has been really popular, though some of the criticism has merit. In general, due to its popularity, Java has become “the man” who tries to bring everyone down, and there are millions of programmers who work for “Java the man”. I saw a recent post from an NYU professor who compared Java-savvy college grads to tomorrow’s pizza delivery men. I know Joel Spolsky often recommends teaching C in universities to help students understand pointers and memory management. I agree with the notion of teaching multiple languages in universities so that graduates have a broad understanding of different programming paradigms. I started programming back in the 80s on an Atari, where I learned BASIC. I then moved to the PC and learned GW-BASIC, and then learned C, FORTRAN, Assembler, COBOL, Pascal and C++ in college. I also learned Lisp, Prolog and Perl on my own. In the late 80s and early 90s, I also learned dBase III, RPG and SAS, which were called fourth-generation languages. Similarly, C, FORTRAN, Pascal, etc. were called third-generation languages, and assembly languages were second-generation languages. I learned Java in ‘95 when it came out and found it much easier to program in than C/C++. I have also learned Python, Ruby and Erlang over the past few years and have been learning Haskell and Scala these days.
For the most part, Java has been my primary language, with some use of C++, Perl, Ruby and Python/Jython (and Erlang on my own). I wish I could use Erlang more, but I don’t have the same experience with Erlang as I have with Java. Over time, Java has managed to take a lot of the C/C++ share of the market. Java has also managed to build a large ecosystem of open source and commercial libraries and frameworks. I often hear that Java is too enterprisey and popular only in large companies, but the truth is that Java has proven itself to be a reliable language. Steve Yegge also mentioned in his blog that Google primarily uses Java, C++, Python and JavaScript.
I like a polyglot environment, where I write performance-critical code in a system language like Java and use Ruby/Python for high-level glue code or the web tier. I often find the criticism of Java dishonest. For example, people rave about metaprogramming in Ruby but forget to mention all the overhead that goes with it, not to mention security holes and memory leaks. The truth is that none of the hot languages like Python, Ruby, Erlang and Haskell provide the same performance as Java; in fact, Java’s HotSpot compiler can beat C++ in production. I am going to ignore the static vs dynamic language debate, but I’ve found that static languages work better with a large number of developers. Again, I like these languages, but I prefer to see a balanced comparison. The real reason Java is popular is that there are tons of jobs. Here is a quick comparison of jobs in Java, C++, C#, Erlang, Haskell, OCaml, Ruby, Python and Factor:
As Bruce Lee said:
I find that the way to distinguish yourself is by learning more about the design and architecture of systems and about the ecosystem. It takes years to learn the ins and outs of a programming language and all the tools and libraries that come with it. I totally agree with learning a number of different languages like Haskell, Factor, Erlang, Scala and Groovy, and I have been trying to learn all of those for many years. However, for a system language my first choice is still Java, simply because I have found it to be a reliable and efficient language. As James Gosling said, Java is a blue-collar language. Sure, it does not have closures (yet), actors, transactional memory, metaprogramming or AST/macros, but it is well suited for building large applications with hundreds or thousands of programmers. I just started a large project in my division at Amazon, and sure enough I chose Java because I have been using it for over twelve years and I know it can do the job. It wasn’t simply because Java is the safe choice (no one got fired for choosing IBM); in practice, Java has more mature solutions for business needs. For example, my project needs to integrate with 20+ applications and is aimed at reducing manual work, so it needed a portal server, a workflow engine, a rules engine and a messaging service, and there are tons of options for those in the Java community.
Finally, the JVM is proving to be a neat platform for building new languages like JRuby, Jython, Groovy, Scala, Clojure, etc. that can bring cool features and high interoperability with existing systems. As Guy Steele said in his recent interview, you can’t expect one language to solve all problems.
Reaction vs Preparedness
July 7th, 2008
Designing Microblogging system for Scalability
July 1st, 2008
Introduction
I have been a Twitter user for a while, have observed or heard about downtime and scalability problems with Twitter. The scalability of Twitter has become a topic for a lot of discussions and blogs and has also offered a useful excercise to design scalable systems. A common root cause as identified from Twitter’s blogs is that the architecture is based on CMS because it was written in Ruby on Rails and that is what Rails good at. The solution to the scalability problem as pointed by other people is messaging based architecture. There’s also been a lot of blame for Twitter’s problems on Ruby and Rails because Ruby is a slow language compared to other static and dynamic languages and Rails is not built for scalability. Though, there is some truth to it, but I don’t think there are the sole bottlenecks. In fact, I am going to show a small prototype written in Ruby and Rails (partially) that integrate with the messaging middlewares, which can be scaled easily. I have been using messaging middlwares such as CORBA Event service, IBM MQ series, Websphere, Weblogic, Tibco and ActiveMQ for over ten years and have long been proponent of messaging based sysems for scalable systems [1] [2]. So, I spent a few hours to put together a prototype based on messaging middleware and Ruby on Rails to see how such system can be developed.Design Principles
Before describing my design, I am going to review some of the design principles that I have used for building large scale systems ([1], [2]), which are:
- Coupling/Cohesion - loosely coupled and shared nothing architecture and partitioning based on functionality.
- Messaging or SEDA architecture to implement reliable and scalable services and avoid temporal dependencies.
- Resource management - good old practices of avoiding leaks of memory, database connections, etc.
- Data replication especially read-only data.
- Partition data (Sharding) - using multiple databases to partition the data.
- Avoid single point of failure.
- Bring processing closer to the data.
- Design for service failures and crashes.
- Dynamic Resources - Design service frameworks so that resources can be removed or added automatically and clients can automatically discover them. Use virtualization and horizontal scalability whenever possible.
- Smart Caching - Cache expensive operations and contents as much as possible.
- Avoid distributed transactions; use optimistic, compensating transactions instead (see the CAP theorem).
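To make the partitioning principle concrete, here is a minimal sketch of hash-based shard selection. The shard names and the choice of CRC32 are my assumptions for illustration; the prototype only says that it uses a simple hashing scheme across multiple databases.

```ruby
require 'zlib'

# Hypothetical shard names; the prototype's real database names are not given.
SHARDS = %w[microblog_shard_0 microblog_shard_1 microblog_shard_2 microblog_shard_3].freeze

# Map a username to a shard with a stable hash. Zlib.crc32 is used instead of
# String#hash because String#hash is randomized per Ruby process.
def shard_for(username, shards = SHARDS)
  shards[Zlib.crc32(username.downcase) % shards.size]
end

# Group usernames by shard, e.g. to batch lookups per database.
def group_by_shard(usernames, shards = SHARDS)
  usernames.group_by { |name| shard_for(name, shards) }
end
```

A plain modulo scheme like this remaps most keys when the number of shards changes, so a system that expects to add databases often would probably want consistent hashing or a lookup table instead.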
Architecture & Design
Following is the high level architecture for the microblogging system:
First, I selected a REST architecture as the entry point to the system for both the Web UI and 3rd party applications, and used messaging middleware for implementing the services. This gives us ease of access with REST APIs and scalability with messaging. In my implementation I chose JRuby/Rails to implement most of the code, Derby for the database and ActiveMQ for the messaging store. In addition to scalability, the messaging middleware gives you many of the advantages of functional languages like Erlang, such as immutability, message passing and fault tolerance (via persistent queues). You can even build support for versioning and hot code swapping by adding a version number to each message and creating routers (see integration patterns) to direct messages to different handlers.
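The versioning idea can be sketched with a small router that dispatches on a version field. The class name VersionRouter, the :version key and the handler bodies are hypothetical; in the real system the router would sit between ActiveMQ queues rather than in a single process.

```ruby
# Routes a message to the handler registered for its version number.
class VersionRouter
  def initialize
    @handlers = {}
  end

  # Register a block that handles one message version.
  def register(version, &handler)
    @handlers[version] = handler
  end

  # Dispatch on the :version field; unknown versions raise ArgumentError.
  def route(message)
    handler = @handlers.fetch(message[:version]) do
      raise ArgumentError, "no handler for version #{message[:version].inspect}"
    end
    handler.call(message)
  end
end

router = VersionRouter.new
router.register(1) { |msg| "v1 status: #{msg[:body]}" }
router.register(2) { |msg| "v2 status: #{msg[:body]} (#{msg[:channel]})" }
```

Because old and new handlers are registered side by side, a new version can be rolled out while messages of the previous version are still in flight.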
APIs
Following are the REST APIs that will be exposed to 3rd party apps, the Web UI and other kinds of UI:
Create User
POST /users
The user information is passed in the body in the form of parameters.
Login/Authenticate User
POST /users/userid/sessions
This will authenticate the user and create a session. Note that most of the following APIs send this session-id back in an HTTP header; it is stored in the (sharded) database and is used to retrieve user information.
Logout
HTTP-HEADER session-id DELETE /users/userid/sessions
Get User information
HTTP-HEADER session-id GET /users/userid
This API will return detailed user information.
Anonymous User information
GET /users/userid
This API will return public user information.
List of Followings
HTTP-HEADER session-id GET /followings/userid
This API will return a summary of the people the user is following.
Create Followers
HTTP-HEADER session-id POST /followers/followerid
This API will create a one-way follower relationship between the user and the follower.
Enable notification for Followers
HTTP-HEADER session-id POST /followers/followerid/notifications
Disable notification for Followers
HTTP-HEADER session-id DELETE /followers/followerid/notifications
Block Followers
HTTP-HEADER session-id POST /followers/followerid/blocking
Unblock Followers
HTTP-HEADER session-id DELETE /followers/followerid/blocking
Follower Exists
HTTP-HEADER session-id GET /followers/followerid
This API will return a 200 HTTP code if the follower exists.
List of Followers
HTTP-HEADER session-id GET /followers
This API will return a summary of the people who follow the user.
Archive Messages
HTTP-HEADER session-id GET /messages?offset=xxx&limit=yyy&since=date
This API will return archived messages for the user, where offset and limit are optional.
Delete Messages
HTTP-HEADER session-id DELETE /messages/message-id
This API will delete the given message.
Send Direct Messages
HTTP-HEADER session-id POST /directmessages/targetuserid
This API will send a direct message to the given user.
Send Reply
HTTP-HEADER session-id POST /reply/message-id
This API will post a reply to the given message-id; the contents of the message are passed in the body (as parameters).
Direct Messages Received
HTTP-HEADER session-id GET /directmessages/userid?offset=xxx&limit=yyy&since=date
This API will return messages received by the user.
Replies Received
HTTP-HEADER session-id GET /replies?offset=xxx&limit=yyy&since=date
This API will return replies received by the user.
Update Status
HTTP-HEADER session-id POST /statuses
This API will update the status of the user; the contents of the message are passed in the body (as parameters).
Get Statuses
HTTP-HEADER session-id GET /statuses?offset=xxx&limit=yyy&since=date
This API will return the status updates of the user.
User Timeline
HTTP-HEADER session-id GET /timeline/username?offset=xxx&limit=yyy&since=date
This API will return the timeline of the user. It compares the given username with the authenticated username and returns the detailed timeline if they match; otherwise it returns the public timeline.
Public Timeline
GET /timeline/username?offset=xxx&limit=yyy&since=date
This API will return the public timeline of the user.
Friends Timeline
HTTP-HEADER session-id GET /friendstimeline?offset=xxx&limit=yyy&since=date
This API will return the timeline of the friends of the user.
Request Flow
Here is an illustration of how information flows through the different components:
Though I am not showing the request flow of every API, the others follow a similar pattern.
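As a rough sketch of that flow for a status update, the following uses Ruby's in-process Queue as a stand-in for ActiveMQ. The names StatusService, StatusDelegate and start_worker are hypothetical, and a real deployment would put the worker in a separate process behind the broker.

```ruby
# Stand-in for the domain service that sits behind the write queue.
class StatusService
  def update_status(username, body)
    { username: username, body: body[0, 140] }
  end
end

# Business delegate: same interface as the service, but it sends a message
# and waits for the answer instead of calling the service directly.
class StatusDelegate
  def initialize(request_queue)
    @request_queue = request_queue
  end

  def update_status(username, body)
    reply_queue = Queue.new # plays the role of a temporary reply queue
    @request_queue << { op: :update_status, args: [username, body], reply_to: reply_queue }
    reply_queue.pop # block until the worker answers
  end
end

# Worker loop: pops requests, invokes the service, replies on reply_to.
def start_worker(request_queue, service)
  Thread.new do
    while (request = request_queue.pop) != :stop
      result = service.public_send(request[:op], *request[:args])
      request[:reply_to] << result
    end
  end
end

write_queue = Queue.new
worker = start_worker(write_queue, StatusService.new)
delegate = StatusDelegate.new(write_queue)

# What the REST handler for POST /statuses would do:
reply = delegate.update_status('alice', 'hello world')

write_queue << :stop # shut the worker down cleanly
worker.join
```

The REST layer only sees the delegate, so the service can be moved to another machine or scaled to several workers without changing the web code.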
Detailed Design
Domain Classes
Primary domain classes are:
- User
- Message, which has four subclasses (DirectMessage, ReplyMessage, Tweet and Status) for the various kinds of messages in the system.
- Follower - creates a one-way relationship between two users, where the follower can choose to be notified when the user changes his/her status.
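The hierarchy can be sketched in plain Ruby as follows. This is only an illustration of the class relationships, not the ActiveRecord models from the prototype; the constructor signatures are my assumptions, and the 140-character limit is taken from the message_body column in the messages schema.

```ruby
# Base class for all messages.
class Message
  MAX_LENGTH = 140 # mirrors the limit on the message_body column
  attr_reader :username, :body

  def initialize(username, body)
    raise ArgumentError, "body exceeds #{MAX_LENGTH} characters" if body.length > MAX_LENGTH
    @username = username
    @body = body
  end
end

class Tweet < Message; end
class Status < Message; end

# A message addressed to one specific user.
class DirectMessage < Message
  attr_reader :target_username

  def initialize(username, target_username, body)
    super(username, body)
    @target_username = target_username
  end
end

# A message that answers an earlier message.
class ReplyMessage < Message
  attr_reader :reply_message_id

  def initialize(username, reply_message_id, body)
    super(username, body)
    @reply_message_id = reply_message_id
  end
end

# One-way relationship: follower_username follows username.
Follower = Struct.new(:username, :follower_username, :notifications, :blocked)
```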
Schema
Followers
```ruby
class CreateFollowers < ActiveRecord::Migration
  def self.up
    create_table :followers do |t|
      t.column :username, :string, :limit => 16
      t.column :follower_username, :string, :limit => 16
      t.column :relation_type, :string, :default => 'Follower', :limit => 32
      t.column :blocked, :boolean, :default => false
      t.column :notifications, :boolean, :default => false
      t.column :created_at, :datetime
      t.column :updated_at, :datetime
      t.column :deleted_at, :datetime
    end
    add_index :followers, :username
    add_index :followers, :follower_username
  end

  def self.down
    drop_table :followers
  end
end
```
Messages
```ruby
class CreateMessages < ActiveRecord::Migration
  def self.up
    create_table :messages do |t|
      t.column :message_id, :string, :limit => 64
      # :type is the column Rails uses for single table inheritance
      t.column :type, :string, :limit => 32
      t.column :message_type, :string, :default => 'Say', :limit => 32
      t.column :reply_message_id, :string, :limit => 64
      t.column :username, :string, :limit => 16
      t.column :channel_name, :string, :limit => 32
      t.column :message_body, :string, :limit => 140
      t.column :favorite, :boolean, :default => false
      # Note: Time.now.utc is evaluated once, when the migration runs,
      # so this default is a fixed timestamp rather than the insert time.
      t.column :sent_at, :datetime, :default => Time.now.utc
      t.column :created_at, :datetime
      t.column :deleted_at, :datetime
    end
    add_index :messages, :message_id
    add_index :messages, :username
  end

  def self.down
    drop_table :messages
  end
end
```
Users
```ruby
class CreateUsers < ActiveRecord::Migration
  def self.up
    create_table :users do |t|
      t.column :username, :string, :limit => 16
      t.column :password, :string, :limit => 16
      t.column :name, :string, :limit => 64
      t.column :email, :string, :limit => 64
      t.column :time_zone_id, :string, :limit => 32
      t.column :created_at, :datetime
      t.column :updated_at, :datetime
      t.column :deleted_at, :datetime
    end
    add_index :users, :username
  end

  def self.down
    drop_table :users
  end
end
```
Persistence
I used Rails’ ActiveRecord library to provide persistence, though alternatively I could have used ActiveHibernate. These libraries provide a quick way to add persistence capabilities with minimum configuration and boilerplate. This prototype uses multiple levels of partitioning: first at the service level, second at the persistence level. I am using multiple Derby databases to store objects, with a simple hashing scheme for load distribution. This prototype also shows how to connect to multiple databases in Rails, which was difficult in the early implementation of Twitter.
Domain Services
The core model and services use domain driven design and apply the principle of fat model and thin service (as opposed to fat services and an anemic model). The domain services implement the external REST APIs and use the underlying ActiveRecord for most of the functionality.
Messaging Middleware
The REST based web services don’t invoke domain services directly; instead they use messaging middleware. In a real application, I might use ESB/integration patterns such as intelligent routing to partition the system across multiple machines and send each request to the suitable queue. In this prototype, I am simply using ActiveMQ, which is fairly robust and easy to use. I am also using separate queues for different kinds of operations. Another lesson I have learned in building large systems is to separate reads from writes so that you can scale them independently and also offer different qualities of service, e.g. read queues can be non-persistent, but write queues can be persistent.
Business Delegate
The REST based web services don’t interact with the messaging middleware directly; instead they use business delegates that hide all the details of sending out messages, creating temporary queues and receiving messages. The interface of the business delegates is the same as that of the services.
Benchmark Results
Though performance was not the objective of my prototype, I tried to check how many requests I can process on my development machine. I chose to benchmark only the messaging middleware and not the REST server, because the JVM uses native threads and web containers such as Tomcat use a small thread pool to serve requests. Since our architecture is heavily IO-bound, that would not scale. Alternatively, I could have built reactive or event based APIs for HTTP or used Yaws/Mochiweb as a container for the REST based web services, because creating a process in Erlang is pretty cheap. For example, an Erlang process takes roughly 300 words of memory (a kilobyte or two), whereas a Java thread takes 2M by default (though it can be reduced to 256K on most machines). Here are the results of running a simple server with embedded ActiveMQ and a load test, both running JRuby on my Pentium Linux machine. I used the default VM size for both JRuby processes and didn’t tune any options:
| What | Elapsed Time (secs) | Throughput | Invocation Times |
| load_test_create_users |
