Aventail Web Translation
Web Developer Guide
May 2007
© Aventail Corporation 2007. All rights reserved.
Aventail, Aventail.Net, AventailExtraNetCenter, Aventail ExtraWeb, Aventail ExtraNet, Aventail Connect, and their respective logos are trademarks, service marks or registered trademarks of Aventail Corporation.
Other product and company names mentioned in this publication are the trademarks of their respective owners.
Table of Contents
Overview
Introduction
Version Compliance
Who is this document for?
How does the Aventail Web Translation server work?
Content-Type of Web Pages
Recommendations
Character Encoding
Recommendations
Cookie translation
Recommendations
URLs
Recommendations
HTML translation
Recommendations
CSS translation
JavaScript translation
Translation rules
Recommendations
VBScript translation
Java applet, ActiveX and Flash translation
Recommendations
XML translation
Recommendations
Web aliases
Recommendations
Miscellaneous
Referrer lookup
Overview
Introduction
A truly clientless VPN appliance requires a robust web-content translation engine. The reason is simple: all network references within the web content must be changed to point to the VPN appliance instead of internal hosts. With full-client VPNs or pseudo-clientless VPN appliances that use web-deployed ActiveX or Java clients, this host mapping can be done on the client. For VPN use on the broadest possible browser base, however, web content translation is indispensable.
A simple example illustrates how web content is translated. Imagine an HTML page with the following anchor tag that links to an internal resource:
<a href=“ Web Access</a>
Within the corporate network, such a link works perfectly. When the user clicks the link, the browser asks the internal DNS server for the IP address of “owa.in.aventail.com” and retrieves the desired page.
Outside the corporate network, however, say at an employee’s home, this link does not work. The browser asks the DNS server of the local ISP which IP address corresponds to “owa.in.aventail.com” and is told that the name does not exist. Even if the link pointed to a routable IP address within the corporate network, the corporate firewall would probably prevent the browser from accessing the desired resource.
Web content translation is the process of changing (translating) the link above into something like:
<a href=“ Web Access</a>
The hostname is changed from the internal hostname to the DNS-resolvable hostname of the VPN appliance. However, the appliance doesn’t hold the desired resource; therefore that end resource must be encoded in some way within the URL. In our example, it is encoded within the path portion of the URL.
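The rewriting described above can be sketched in JavaScript. This is only an illustration, not the appliance's implementation: the function name toApplianceUrl, the appliance hostname vpn.example.com, the alias "owa", and the "/exchange/" path are all assumptions, and the encoding the appliance actually uses may differ.

```javascript
// Hypothetical sketch: rewrite an internal URL so that it points at the VPN
// appliance, with the original host represented by an alias in the path.
function toApplianceUrl(internalUrl, applianceHost, aliasByHost) {
  const url = new URL(internalUrl);
  const alias = aliasByHost.get(url.hostname);
  if (alias === undefined) {
    return null; // no alias configured for this host, so nothing to translate to
  }
  // Keep the original path, query string, and fragment behind the alias.
  return "https://" + applianceHost + "/" + alias + url.pathname + url.search + url.hash;
}

// Assumed alias table and appliance hostname, for illustration only.
const aliases = new Map([["owa.in.aventail.com", "owa"]]);
const translated = toApplianceUrl("http://owa.in.aventail.com/exchange/", "vpn.example.com", aliases);
console.log(translated); // https://vpn.example.com/owa/exchange/
```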
If HTML links such as the one above were the only thing that needed translating, the task would be easy. Unfortunately, they are not. HTML alone offers numerous ways to reference network resources, and JavaScript, the now-ubiquitous web scripting language, widens the scope of the problem tremendously. In fact, JavaScript makes the problem intractable in general: it executes code in the browser, and it allows the user to supply input that is unknown at the time the server-side translation is done. For example, a script can prompt the user for a URL and then instruct the browser to go to that URL.
Version Compliance
This document is updated for each ASAP version that Aventail releases. Check that the version running on your Aventail appliance is one of the versions covered by this document.
This document is valid for releases ASAP 8.6 through ASAP 8.8.
Who is this document for?
This document is for Web Application Developers who wish to make their software easy to translate by the Aventail translation engine. It provides a set of guidelines to achieve this goal and gives a brief overview of certain aspects of the translation engine.
How does the Aventail Web Translation server work?
The Aventail Web Translation server is part of the Aventail VPN appliance which sits at the network perimeter. It isolates and protects private Web-based resources from unauthorized external access.
A user first logs in to the Aventail appliance and is presented with the Workplace page. The user then follows a link on that page to request a resource from the internal network, or enters a URL on the Workplace page. All URLs point to the Aventail appliance.
The Aventail Web Translation server translates an incoming URL using an "alias" contained in the URL. Aliases are used to obscure the URLs that point to resources on your internal (or “downstream”) servers. Because all requests are directed to the Aventail appliance, the user only sees the incoming URL that contains the alias. The Aventail Web Translation server matches the alias to a list it stores in memory and translates the URL.
Once it determines that the URL submitted by the user is valid and points to a resource on the network, the Aventail appliance checks its access control and authentication rules to make sure the user is authorized to access the requested resource.
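The sequence described above (match the alias, then check authorization) might be sketched as follows. Everything here is assumed for illustration: the alias table contents, the backend hostname, the per-user access rules, and the use of HTTP-style status codes are not taken from the appliance.

```javascript
// Hypothetical in-memory alias table and access rules (illustrative data only).
const aliasTable = new Map([["morty", "http://intranet.in.aventail.com"]]);
const allowedAliasesByUser = new Map([["alice", new Set(["morty"])]]);

// Resolves a request path such as "/morty/dir/file.html" to a backend URL,
// or reports an error status when the alias is unknown or access is denied.
function resolveRequest(user, requestPath) {
  const segments = requestPath.split("/"); // ["", "morty", "dir", "file.html"]
  const alias = segments[1];

  // Step 1: match the alias against the list held in memory.
  const backend = aliasTable.get(alias);
  if (backend === undefined) {
    return { status: 404 };
  }

  // Step 2: only after the URL is known to be valid, check authorization.
  const allowed = allowedAliasesByUser.get(user);
  if (allowed === undefined || !allowed.has(alias)) {
    return { status: 403 };
  }

  return { status: 200, target: backend + "/" + segments.slice(2).join("/") };
}
```

In this sketch, resolveRequest("alice", "/morty/dir/file.html") yields the target "http://intranet.in.aventail.com/dir/file.html", while the same request from a user with no matching rule is refused.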
Content-Type of Web Pages
Although the Aventail translation engine possesses heuristics to guess the type of content in an HTTP response from the backend web server, it is best to avoid relying on this and to instead specify the type explicitly.
Recommendations
The single most important thing you can do to ensure proper translation is to make sure that all pages are served up with the correct “Content-Type” header. In particular, it is imperative that:
- HTML content is served up with the “text/html” Content-Type.
- JavaScript content is served up with the “application/x-javascript” Content-Type.
- XML content is served up with the “text/xml” Content-Type.
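One way to apply these recommendations on the backend is to choose the header from a fixed table rather than leaving it to server defaults. The sketch below assumes a lookup by file extension; the table name and function name are hypothetical, and only the three content types come from this guide.

```javascript
// Content types recommended by this guide, keyed by file extension.
// The extension-based lookup itself is an assumption made for illustration.
const CONTENT_TYPES = {
  ".html": "text/html",
  ".js": "application/x-javascript",
  ".xml": "text/xml",
};

// Returns the Content-Type for a file name, or undefined for unknown
// extensions so that the caller can decide how to handle them.
function contentTypeFor(fileName) {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return CONTENT_TYPES[extension];
}
```

A backend written for Node.js could then set the header explicitly, for example with response.setHeader("Content-Type", contentTypeFor(path)), before writing the body.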
Character Encoding
As an internationalized network device, the Aventail appliance uses UTF-8 exclusively for its internal work.
Recommendations
- Use UTF-8 exclusively for all your Web content. Do not use the Microsoft code pages. This is particularly important when POSTing form data.
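For form data built in script, the standard encodeURIComponent function is one way to guarantee UTF-8, because it always percent-encodes the UTF-8 bytes of its input. The helper name encodeFormBody and the sample field names below are assumptions made for this sketch.

```javascript
// encodeURIComponent percent-encodes the UTF-8 bytes of its input,
// regardless of the character set of the page that runs the script.
function encodeFormBody(fields) {
  return Object.entries(fields)
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&");
}

// "\u00fc" is u-umlaut and "\u00e9" is e-acute; escapes keep this file ASCII-only.
const body = encodeFormBody({ city: "Z\u00fcrich", note: "caf\u00e9" });
console.log(body); // city=Z%C3%BCrich&note=caf%C3%A9
```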
Cookie translation
The path portion of a “Set-Cookie” header is translated. The domain portion of this header is discarded. For example, if the backend web server sends the header:
Set-cookie: x=y; path=/; domain=.in.aventail.com
and the alias associated with the web resource is “morty”, then this header is translated to:
Set-Cookie: x=y; path=/morty/
This forces the web browser to send this cookie back only to the alias (and therefore the web server) that set the cookie.
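The header rewrite in this example can be sketched as a small function. This is an approximation written for illustration, not the appliance's code: the function name is hypothetical, and how the appliance treats a cookie that has no path attribute is not covered by this guide, so the sketch leaves such cookies unchanged.

```javascript
// Sketch of the Set-Cookie rewrite described above: the domain attribute is
// discarded and the path attribute is prefixed with the alias.
function translateSetCookie(headerValue, alias) {
  const parts = headerValue.split(";").map((part) => part.trim());
  const translated = [parts[0]]; // the first part is always name=value
  for (const part of parts.slice(1)) {
    const lower = part.toLowerCase();
    if (lower.startsWith("domain=")) {
      continue; // discard the domain attribute
    }
    if (lower.startsWith("path=")) {
      const path = part.slice("path=".length);
      translated.push("path=/" + alias + (path.startsWith("/") ? path : "/" + path));
      continue;
    }
    translated.push(part); // keep any other attribute as it is
  }
  return translated.join("; ");
}

console.log(translateSetCookie("x=y; path=/; domain=.in.aventail.com", "morty"));
// x=y; path=/morty/
```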
Recommendations
- Avoid sophisticated client-side cookie manipulation using JavaScript.
- Avoid using URLs in cookies. Although an attempt is made to translate those URLs, some of them may pass through untranslated.
URLs
The Aventail translation engine can handle URLs in any form:
- Fully-qualified URLs (e.g. “
- Absolute paths (e.g. “/dir1/dir2/file.html”)
- Relative paths (e.g. “../dir2/file.html”)
Recommendations
- It is best to use relative paths exclusively in your web application. This of course also has the advantage of making your web application more portable (e.g. to another web server and directory).
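The benefit of relative paths can be seen by resolving links against a translated page URL with the standard URL class. The appliance hostname vpn.example.com, the alias "morty", and the page path below are assumptions chosen for the illustration.

```javascript
// Hypothetical translated page URL: appliance host, alias "morty", then the path.
const pageUrl = "https://vpn.example.com/morty/dir1/dir3/page.html";

// A relative path resolves against the page URL, so the alias prefix is kept
// even if the link itself was never rewritten.
const relative = new URL("../dir2/file.html", pageUrl).href;
console.log(relative); // https://vpn.example.com/morty/dir1/dir2/file.html

// An absolute path resolves against the host only, so the alias prefix is
// lost unless the link has been translated.
const absolute = new URL("/dir1/dir2/file.html", pageUrl).href;
console.log(absolute); // https://vpn.example.com/dir1/dir2/file.html
```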
HTML translation
HTML translation is handled very reliably by the Aventail appliance.
Recommendations
- Make sure your HTML conforms to the standard, especially with regard to the quotation marks around attribute values. Ideally, use XHTML formatting. HTML attributes containing a value (for example, src="path") may not be translated if they contain any of the following errors:
  - Spaces before or after the equal sign: src ="path" or src= "path"
  - Leading or trailing spaces within the value: src=" path" or src="path "
  - Missing opening or closing quotation mark: src="path or src=path"
- Avoid base tags, such as:
<base href=" />
in your HTML code.
- The “meta” tag is commonly used to redirect users to another page. For example:
<meta http-equiv="refresh" content="5;url=redirectURL.html" />
The meta tag’s content attribute must be formatted carefully; don’t include line breaks or spaces.
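The attribute formatting errors listed above can be caught before deployment with a simple check. The function below is a hypothetical lint helper, not part of any Aventail tool; it tests one attribute at a time and accepts only the name="value" form.

```javascript
// Returns true when a single attribute has the form name="value" with no
// spaces around the equal sign, no leading or trailing spaces inside the
// value, and both quotation marks present.
function isCleanAttribute(attribute) {
  return /^[A-Za-z][\w:-]*="(?:[^\s"](?:[^"]*[^\s"])?)?"$/.test(attribute);
}

console.log(isCleanAttribute('src="path"'));  // true
console.log(isCleanAttribute('src ="path"')); // false
console.log(isCleanAttribute('src=" path"')); // false
console.log(isCleanAttribute('src="path'));   // false
```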
CSS translation
CSS content should be handled without difficulty.
JavaScript translation
JavaScript translation is complex, but certain coding practices help ensure that your JavaScript code translates correctly.
Translation rules
The current Aventail JavaScript translation engine is a parse-tree based engine that can handle complex syntax. It is a rule-based translator that makes use of Aventail’s client-side JavaScript library. The rules are stored in:
/usr/local/extranet/etc/jstrans.cfg
The translation rules are divided into four categories:
- Assignment statements (type ASSIGNMENT)
- Function calls (type CALL)
- Substitution of one language token with another (type SUBSTITUTION)
- Special kind of substitution in a function call (type SUBARGS)
You should not need to write any new rules. It is however useful to be aware of the rules as you follow the recommendations below.
Here are most of the JavaScript rules as of September 2006:
# Javascript Translation
# Assignment Statement Translation
#
# Type        Left Hand Side (LHS)  Encapsulate RHS with
#
ASSIGNMENT    location              aventail.translate_url
ASSIGNMENT    .location             aventail.translate_url
ASSIGNMENT    .href                 aventail.translate_url
ASSIGNMENT    .src                  aventail.translate_url
ASSIGNMENT    .action               aventail.translate_url
ASSIGNMENT    document.domain       aventail.setDomain
ASSIGNMENT    document.cookie       aventail.setCookie
ASSIGNMENT    .innerHTML            aventail.postText
ASSIGNMENT    .url                  aventail.translate_url
# Function Call Translation
#
# Type        Function Name         Param  Encapsulate param with
#
CALL          .addBehavior          1      aventail.translate_url
CALL          .showModalDialog      1      aventail.translate_url
CALL          .showModelessDialog   1      aventail.translate_url
CALL          .insertAdjacentHTML   2      aventail.postText
CALL          location.replace      1      aventail.translate_url
CALL          location.assign       1      aventail.translate_url
CALL          eval                  1      aventail.post
# Substitution of one token with another
#
# lvalue/rvalue: 0: substitute always
#                1: substitute only if token is an rvalue (read from)
#                2: substitute only if token is an lvalue (written to)
#
# Type        Token                 lval/  Replacement
#                                   rval
SUBSTITUTION  location.pathname     0      aventail.location.pathname
SUBSTITUTION  .location.pathname    0      .aventail.location.pathname
SUBSTITUTION  document.domain       1      document.aventail.getDomain()
SUBSTITUTION  document.domain       2      aventail.junk
SUBSTITUTION  .execCommand          0      .aventail.execCommand
SUBSTITUTION  location.pathname     0      aventail.location.pathname
SUBSTITUTION  .location.pathname    0      .aventail.location.pathname
SUBSTITUTION  location.host         0      aventail.location.host
SUBSTITUTION  .location.host        0      .aventail.location.host
SUBSTITUTION  location.hostname     0      aventail.location.hostname
SUBSTITUTION  .location.hostname    0      .aventail.location.hostname
SUBSTITUTION  location.port         0      aventail.location.port
SUBSTITUTION  .location.port        0      .aventail.location.port
SUBSTITUTION  location.protocol     0      aventail.location.protocol
SUBSTITUTION  .location.protocol    0      .aventail.location.protocol
SUBSTITUTION  location.href         1      aventail.location.href
SUBSTITUTION  .location.href        1      .aventail.location.href
SUBSTITUTION  location.search       1      aventail.location.search
SUBSTITUTION  .location.search      1      .aventail.location.search
SUBSTITUTION  location              1      aventail.location
SUBSTITUTION  .scripts              1      .aventail.getScripts()
# Substitution of one token with another, with a twist:
# Take the "stem" of the call and make it the first argument in the new function.
# For example:
#   If we have the token "foo.bar" and the replacement "aventail.ourFoo":
#   We will replace the construction "anObject.foo.bar(arg1, arg2)" with:
#     aventail.ourFoo(anObject, arg1, arg2)
# This allows us to verify the type of the anObject object prior to operating on it
#
# lvalue/rvalue: 0: substitute always
#                1: substitute only if token is an rvalue (read from)
#                2: substitute only if token is an lvalue (written to)
#                3: special case, turn a flat lvalue into a function call
#
# The "3" case above is used in cases such as "foo.location" to allow us to ensure
# that "foo" is an object such as a document, window, or frame, and not some user-defined
# object that just happens to have a "location" member.
#
# Type        Token                 lval/  Replacement
#                                   rval
SUBARGS       document.close        0      aventail.docClose
SUBARGS       document.write        0      aventail.docWrite
SUBARGS       document.writeln      0      aventail.docWrite
SUBARGS       .open                 0      aventail.objOpen
SUBARGS       .Open                 0      aventail.objOpen
SUBARGS       .location             3      aventail.objLocation
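To make the effect of the ASSIGNMENT and SUBARGS rules concrete, the following toy rewriter applies one rule of each kind to a line of source text. It uses regular expressions purely for brevity, whereas the real engine works on a parse tree, so this sketch will mishandle many inputs that the engine handles correctly. The function names are hypothetical; the rule values come from the listing above and from its "foo.bar" comment.

```javascript
// Toy illustration only: the real engine works on a parse tree, not regexes.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// ASSIGNMENT rule: wrap the right-hand side of "<lhs> = <rhs>;" in a call.
// The [^=;] guard keeps comparisons such as "a.href == b" untouched.
function applyAssignmentRule(source, lhs, wrapper) {
  const pattern = new RegExp("(" + escapeRegExp(lhs) + "\\s*=\\s*)([^=;][^;]*);", "g");
  return source.replace(pattern, (match, left, rhs) => left + wrapper + "(" + rhs.trim() + ");");
}

// SUBARGS rule: move the "stem" of the call into the first argument.
function applySubargsRule(source, token, replacement) {
  const pattern = new RegExp("([A-Za-z_$][\\w$]*)\\." + escapeRegExp(token) + "\\(([^()]*)\\)", "g");
  return source.replace(pattern, (match, stem, args) => {
    const argList = args.trim() === "" ? stem : stem + ", " + args.trim();
    return replacement + "(" + argList + ")";
  });
}

console.log(applyAssignmentRule('link.href = "page.html";', ".href", "aventail.translate_url"));
// link.href = aventail.translate_url("page.html");
console.log(applySubargsRule("anObject.foo.bar(arg1, arg2)", "foo.bar", "aventail.ourFoo"));
// aventail.ourFoo(anObject, arg1, arg2)
```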
Recommendations
- Do not use DOM references as variable names. For example, do not call any of your variables “location”. The translation rules above show which names to avoid.
- Avoid the “with” construct: with(object) {statements}.
- Avoid passing DOM objects as parameters to functions. For example, avoid writing functions of the form:
function test(mywin) { mywin.location = “ }
Instead, make sure that the network-sensitive JavaScript appears verbatim, e.g.
window.location = “
In other words, do not hide the names of the underlying DOM objects.
- Do not set a base tag using JavaScript. This invalidates all the translated URLs on the page.
- Do not use conditional compilation for Internet Explorer (e.g. “@if …”).
- Do not use Microsoft Script Encoding (e.g. language “JScript.Encode”).
VBScript translation
VBScript translation is no longer supported.
Java applet, ActiveX and Flash translation
No explicit translation of Java applets, ActiveX or Flash objects is performed. If possible, avoid using them entirely.
Recommendations
- If it is not possible to avoid using these objects entirely, consider constructing the network references they need from the URL of the page they are on. Perform this construction dynamically at run time.
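The idea of deriving network references from the hosting page can be sketched as follows. The function name and the sample URLs are assumptions; the point is only that the embedded object receives a URL computed at run time from the page URL instead of a hard-coded internal hostname.

```javascript
// Build the URL an embedded object should contact from the URL of the page
// that hosts it, instead of hard-coding an internal hostname.
function endpointFromPage(pageUrl, resourceName) {
  // Resolving a bare name against the page URL keeps the scheme, host,
  // and directory (including any alias prefix) of the hosting page.
  return new URL(resourceName, pageUrl).href;
}

// Hypothetical page URL as seen through the appliance.
console.log(endpointFromPage("https://vpn.example.com/morty/app/viewer.html", "data.xml"));
// https://vpn.example.com/morty/app/data.xml
```

Inside a Java applet, the same approach would start from the applet's getDocumentBase() value rather than from a string supplied by script.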
XML translation
Because XML element and attribute names carry no built-in meaning, the translation engine cannot tell on its own which values are URLs. You need to identify the portions of the XML content that require translation. This is done in the file:
/usr/local/extranet/etc/custom-xmltrans.cfg
The format of the rules to add to this file is:
ELEMENT ATTR1 ATTR2 ... ATTRn
This instructs the translation engine to look for element “ELEMENT” in the XML and to translate its attributes “ATTR1”,“ATTR2”,...,“ATTRn”. These attributes are URLs, of course.
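To show how rules in this format read, the sketch below parses rule lines into a table that maps each element to its URL-bearing attributes. The element and attribute names are invented for the example, and the parser reflects only the format shown above; whether the real file accepts comments or other syntax is not covered here.

```javascript
// Parse rule lines of the form "ELEMENT ATTR1 ATTR2 ... ATTRn" into a Map
// from element name to the list of attributes that hold URLs.
function parseXmlRules(text) {
  const rules = new Map();
  for (const line of text.split("\n")) {
    const tokens = line.trim().split(/\s+/);
    if (tokens[0] === "") {
      continue; // skip blank lines
    }
    const [element, ...attributes] = tokens;
    rules.set(element, attributes);
  }
  return rules;
}

// Hypothetical rules: translate "link" and "thumbnail" on <item> elements
// and "homepage" on <channel> elements.
const xmlRules = parseXmlRules("item link thumbnail\nchannel homepage\n");
console.log(xmlRules.get("item")); // [ 'link', 'thumbnail' ]
```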
Recommendations
- Add your XML translation rules to custom-xmltrans.cfg
Web aliases
Web aliases are declared when you configure a resource. They are used to hide the hostname of the internal server.
Recommendations
- Avoid using the same name for the alias as for the top-level directory of your application. For example, if your web application lives in “ do not use “coolapp” as the alias for the Aventail resource.
Miscellaneous
Referrer lookup
When a request comes in for an absolute or relative URL for which there is no matching alias, the Aventail Web Translation server looks at the “Referer” HTTP header or the referrer cookie that it sets. This header or cookie is used to assemble the correct destination URL. This is a best-effort attempt and should not be relied upon.
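The fallback can be pictured with the sketch below, which borrows the alias from the Referer URL when the request path does not begin with a known alias. The alias set, the function name, and the URL layout (alias as the first path segment) are assumptions made for illustration; the referrer cookie is not modeled.

```javascript
// Known aliases (hypothetical data).
const knownAliases = new Set(["morty"]);

// If the request path does not start with a known alias, borrow the alias
// from the Referer URL. Returns null when no alias can be determined.
// The request path is assumed to start with "/".
function applyRefererAlias(requestPath, refererUrl) {
  const firstSegment = requestPath.split("/")[1];
  if (knownAliases.has(firstSegment)) {
    return requestPath; // already addressed to an alias
  }
  if (!refererUrl) {
    return null; // nothing to fall back on
  }
  const refererAlias = new URL(refererUrl).pathname.split("/")[1];
  if (!knownAliases.has(refererAlias)) {
    return null;
  }
  return "/" + refererAlias + requestPath;
}

console.log(applyRefererAlias("/images/logo.gif", "https://vpn.example.com/morty/index.html"));
// /morty/images/logo.gif
```

The null results show why the mechanism is only best effort: a request that arrives without a usable Referer cannot be mapped to any alias.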