Index

Abstract

Background

Introduction

Specification

Web Platform Component Research

Linux

Apache

Access to Web Services API (REST, XML-RPC and SOAP)

REST

XML-RPC

SOAP

Comparison of MySQL and PostgreSQL

PHP and Perl

Overview of Design

Description of the Subsystems

Hardware

Software

Description of Entire Design

How the subsystems work together

Load Testing Software

Implementation

Installation and Configuration of Web Server, PHP and MySQL Server

Prototype Application

Amazon Web Services with REST

XML-RPC

SOAP

The SOAP Envelope Element:

The xmlns:soap Namespace:

The SOAP Header Element:

The actor Attribute:

Testing and Evaluation

Conclusion and Recommendations

Appendix

References

Figures

Figure 1: Design Overview

Figure 2: Dual Xeon processor server

Figure 3: Overall system design

Figure 4: Shop.php

Figure 5: ShopCart.php

Figure 6: Login.php

Figure 7: Guest_reg.php

Figure 8: Register.php

Figure 9: Order.php

Figure 10: Load testing stages

Figure 11: Example of analysis report

Figure 12: The different Graphical Analysis Reports

Tables

Table 1: Top web server developers (November 2004)

Table 2: Machines Used to test

Table 3: Components of platform used to test application

Abstract

AOL needed to change its web platform to match the industry-standard web platform. The motivation was to reduce the cost of development and maintenance. AOL is interested in how the platform choice is made. This project evaluates different combinations of web platform components (operating systems, web servers, databases, and web applications) and tests their performance with load-benchmarking software to determine the best combination. We recommend that AOL switch to a web platform consisting of Linux as the operating system, Apache as the web server, and MySQL as the database.

Background

AOL is one of the largest Internet service providers in the United States and has its own proprietary web platform, which is the underlying software for its web applications. AOL has 22 million dial-up and 2.5 million broadband customers today. In 2003, the company lost 2.2 million dial-up customers as more and more people switched to broadband [1]. It is estimated that half of online home connections will use broadband by 2005 [2]. When AOL started its services, the Internet was not commonly used, so AOL did not consider the need to communicate with other servers, such as those of Amazon or MSN. It served only its own customers, so it selected and developed a single platform. Nowadays, Internet services all over the world use standard web platforms. In addition, new employees are not familiar with AOL's non-standard platform, so AOL has to spend two to three months training them. AOL has recently realized that it needs to replace its own platform with a standard one in order to be compatible with the protocols of widely used web browsers, such as Internet Explorer and Netscape Navigator. At present, AOL uses different kinds of hardware and software systems for its web servers, mail servers, and database servers. To be efficient and thus cost-effective, AOL has decided that it is necessary to change to one standard platform.

Introduction

AOL provides a premium online experience to both narrowband and broadband users through different technologies and products; one of the most familiar is the Windows AOL client (WALL). These services are built using standard operating system clients such as Win32 or its equivalent on the Mac. AOL uses a custom engine called FDO (Forms Document Objects), created 12 years ago, that handles all the networking, display, and protocol parsing. This in-house web platform technology works well over narrowband connections such as dial-up modems; however, the Internet as a whole is now built on standards very different from AOL’s FDO platform. The Internet is commonly built using HTTP as the networking transport, with HTML, DHTML, JavaScript, and other languages used to present content and applications to users. Therefore, there has been a movement inside AOL to shift from its own FDO web platform infrastructure to industry-standard web platforms. Since October 2003, AOL has been researching which platform it should adopt.

Specification

The setup for our design solution must satisfy the following pre-set constraints:

The design must be implemented using a Unix/Linux Operating System.

The web application must use HTTP over port 80.

The web application should be written with a standard web service API (REST, XML-RPC, or SOAP).

The web applications must access a database.

The applications must be able to communicate with other web service APIs.

Web Platform Component Research

Linux

This section compares Redhat (Fedora), Debian, and Mandrake. Each of these Linux distributions has advantages and disadvantages relative to the others.

Installation: Redhat and Mandrake are easy to install; a complete installation takes 10-15 minutes. RedHat has a polished graphical installer and offers kickstart as well. Debian, on the other hand, takes much longer to install. Its installer is text-based and has no partitioning tools, but the default options are usually reasonable and intelligent, so most of the installation consists of pressing the Enter key.

Ease of use: Of all the Linux distributions, Mandrake is the easiest to use, especially for dual booting and disk partitioning. (Dual booting means installing multiple operating systems, such as Windows and Linux, on one computer and choosing which one to run at boot time. Disk partitioning means deciding where on the hard disk the different parts of the operating system go.) RedHat is far flashier but more error-prone, whereas Debian is the conservative, slowly evolving platform and is harder to learn. Once set up, however, Debian is easy to use and easy to update, which is good for security.

Stability: Debian has advanced features and is the most stable of the three distributions, as it is very easily upgradeable. Redhat lacks these features and is not as stable as the other two. For this reason, Debian is preferred over Redhat.

Selection: What packages come with the distribution, and, more importantly, how easy is it to update them? RedHat comes with rpm and up2date, both very new packages. up2date requires its GUI counterpart, up2date-gnome, and Python as well. We had constant problems with both of them; in addition, up2date requires registration (like Microsoft's Passport) and offers only one mirror site (like Microsoft's Windows Update). Debian, on the other hand, has five times as many packages on its network as RedHat, and its latest release ships on one more CD than RedHat 7.2. These packages can be downloaded from any Debian mirror using apt-get. apt-get also has a simple advantage over rpm: if rpm cannot find a package to satisfy a dependency, you have to fetch it yourself or use up2date, which is, on top of everything else, very unreliable. apt-get can find the latest package from a number of sources, including CDs, hard disk partitions, or the Internet, and if it needs another package, it gets it automatically.

Compatibility: RedHat ships its own variant of the kernel and its own version of gcc to work with that kernel. This is not necessarily bad, but it can be. Debian, by contrast, is developed by a large number of volunteers from around the world. RedHat gets a lot of support from the open-source community but is mainly a proprietary distribution, even if it is "free software".

Corporate Standpoint: From a corporate standpoint, Debian's main drawback is the lack of a company to support it; it is free software in the truest sense. Mandrake and Redhat, on the other hand, each have a company behind them.

Business Standpoint: Debian is widely used in business. This is because of the Debian Free Software Guidelines, which are a critical component from a business standpoint. They specify the requirements for the license of any package that is to be included with Debian. Debian conforms to the official GNU definition of free software, which means that every package included in Debian can be redistributed freely. Mandrake and Redhat lack these features from a business standpoint.

Packages: APT (Advanced Package Tool) is Debian's answer to the Red Hat Package Manager (RPM). It can be used to add, update, and remove software either locally (from CDs) or from remote FTP servers. Once properly configured, a system can check for updates and install the latest versions of all system software with one command.
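The automatic dependency handling described for APT can be illustrated with a minimal sketch. This is not APT's actual algorithm; the package names and the `PACKAGE_INDEX` structure are hypothetical, and the sketch assumes the dependency graph has no cycles.

```python
# Hypothetical package index: package name -> list of direct dependencies.
# The names are illustrative only, not real Debian dependency data.
PACKAGE_INDEX = {
    "webapp": ["httpd", "scripting"],
    "httpd": ["libssl"],
    "scripting": ["libxml", "libssl"],
    "libssl": [],
    "libxml": [],
}


def install_order(package, index, installed=None):
    """Return packages ordered so that dependencies precede dependents.

    Assumes the dependency graph is acyclic; a cycle would recurse forever.
    """
    if installed is None:
        installed = []
    if package in installed:
        return installed  # already scheduled, nothing to do
    if package not in index:
        raise KeyError(f"no such package in index: {package}")
    # Schedule every dependency first, then the package itself.
    for dependency in index[package]:
        install_order(dependency, index, installed)
    installed.append(package)
    return installed


if __name__ == "__main__":
    print(install_order("webapp", PACKAGE_INDEX))
```

Requesting a single package is enough: everything it depends on is pulled in first, which is the convenience the text attributes to apt-get.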

Upgrade: It is difficult to upgrade a system from one RedHat release to another. Debian provides simple, well-trodden migration paths, so there is no need to reinstall the operating system just to upgrade to a new release. Debian's tools can perform recursive upgrades of systems.

Open Source: Compared to Redhat and Mandrake, Debian is much more dedicated to the open-source way, and GNU is part of its full name (Debian GNU/Linux).

Applications/Utilities: The only downside to Debian is that not all applications, utilities, and programs will be the very latest versions, as they are with RedHat; in return, you get stability and avoid the segmentation fault errors you get with RedHat.

Performance: Debian runs on more hardware platforms than either Redhat or Mandrake.

Connection: Debian has a utility to install RedHat packages, but the reverse is not true.

Development of Software: Compared to Redhat and Mandrake, Debian is the best choice for developing software for all distributions of GNU/Linux. Because Debian's processes, in terms of policies and packaging, are fair, visible, and conform to open standards, Debian is a very clean and very carefully constructed distribution. Software developed on a Debian platform can thus easily be delivered or transferred to other GNU/Linux (and UNIX) platforms.

RedHat uses a binary database for its package data, while Debian uses text files. Debian's approach is more robust (if a single file gets corrupted, it is less of a problem), and it is possible to fix or modify things by hand using a normal text editor.
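To show why text-based package data is easy to inspect and repair, here is a minimal sketch that parses blank-line-separated "Field: value" stanzas of the kind Debian uses for its package records. The sample data and package names are hypothetical, and the parser is simplified: it ignores the continuation lines that real records may contain.

```python
# Hypothetical sample in the style of a Debian package status file.
SAMPLE_STATUS = """\
Package: example-httpd
Status: install ok installed
Version: 1.0-1

Package: example-db
Status: install ok installed
Version: 2.3-4
"""


def parse_stanzas(text):
    """Parse blank-line-separated 'Field: value' stanzas into dictionaries."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Split on the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        packages.append(fields)
    return packages


if __name__ == "__main__":
    for package in parse_stanzas(SAMPLE_STATUS):
        print(package["Package"], package["Version"])
```

Because each record is plain text, a damaged entry can be located and corrected with an ordinary editor, which is the robustness argument made above.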

Apache

Apache has been the most popular web server on the Internet since 1996. According to current surveys, Apache is more widely used than all other web servers combined: over 67% of the web sites on the Internet use Apache (Table 1) [3].

Developer    October 2004   Percent   November 2004   Percent   Change
Apache       37620349       67.92     38028642        67.77     -0.15
Microsoft    11679222       21.09     11923566        21.25      0.16
Sun           1685325        3.04      1761705         3.14      0.10
Zeus           748561        1.35       739006         1.32     -0.03

Table 1: Top web server developers (November 2004)
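The Change column in Table 1 is the difference between the two monthly shares, in percentage points. A small sketch reproduces it from the Percent columns; the variable and function names are our own.

```python
# (developer, October 2004 share %, November 2004 share %) from Table 1.
SHARES = [
    ("Apache", 67.92, 67.77),
    ("Microsoft", 21.09, 21.25),
    ("Sun", 3.04, 3.14),
    ("Zeus", 1.35, 1.32),
]


def share_changes(rows):
    """Return the month-over-month change in share, in percentage points."""
    return {name: round(november - october, 2) for name, october, november in rows}


if __name__ == "__main__":
    for name, change in share_changes(SHARES).items():
        print(f"{name}: {change:+.2f}")
```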

The Apache server runs on a wide variety of platforms, including Windows, Mac OS, and many Unix flavors, and is also available in source-code form. One of its key features is its modular design, which makes it easy to add new features to the platform without touching the core system.

Through its broad use by the web community, Apache has become known as more stable, more feature-rich, and considerably faster than many other web servers. Even though some commercial servers claim to surpass Apache's speed, none of their benchmarks is a precise measure of web server speed [4].

The Apache server has been used on sites that handle millions of clients per day without experiencing any performance difficulties. The fact that Apache is free of cost gives it the upper hand over extremely fast web servers that can cost thousands of dollars. According to Mike Prettejohn, director of Netcraft, “Most Apache servers run on Unix. Linux is the most popular followed by Sun Microsystems' Solaris. Apache is the leading Unix server and Linux is becoming the most popular operating system on the internet.” [5]

Apache server is a great choice from the developer point of view. It directly supports CGI scripts and server-side includes. It also provides support for Perl, PHP, emulated ASP, and other page-generation and Web-scripting languages through its wide variety of modules as well as NSAPI and Java servlet support.

Access to Web Services API (REST, XML-RPC and SOAP)

REST

REST stands for Representational State Transfer. It is a style of network software architecture built around the notion of a ‘resource’: anything that can be referenced by a Uniform Resource Identifier (URI) is a resource. A Uniform Resource Locator (URL) is one kind of URI. REST interactions are ‘stateless’: each request carries all the information needed to process it.

The goals of REST are:

  • scalability of component interactions,
  • generality of interfaces,
  • independent deployment of components, and
  • intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.

REST evolved from academic research.
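The ideas of URI-addressed resources and stateless requests can be sketched as follows. This is a simplified illustration, not a web server: the resource table, the URIs, and the `handle` function are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical in-memory resources keyed by URI path; not a real service.
RESOURCES = {
    "/products/1": {"name": "book", "price": 12.5},
    "/products/2": {"name": "cd", "price": 9.0},
}


def handle(method, uri, resources):
    """Answer one request using only the information carried in the request.

    No session or per-client state is kept between calls, which is the
    stateless property of REST-style interactions.
    """
    path = urlparse(uri).path
    if method != "GET":
        return 405, None  # only reading resources is sketched here
    if path not in resources:
        return 404, None
    return 200, resources[path]


if __name__ == "__main__":
    print(handle("GET", "http://shop.example/products/1", RESOURCES))
    print(handle("GET", "http://shop.example/products/9", RESOURCES))
```

Because each call depends only on its arguments, any server holding the same resources could answer it, which is what makes intermediaries and independent deployment practical.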

XML-RPC

XML-RPC is a remote procedure call protocol that uses HTTP as the transport and XML as the encoding.
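The XML encoding of a call can be seen without any network traffic by using Python's standard `xmlrpc.client` module. The method name `sample.add` and its arguments are hypothetical; only the encoding step is shown, not the HTTP transport.

```python
import xmlrpc.client


def encode_call(method_name, *args):
    """Encode a procedure call as an XML-RPC request document."""
    return xmlrpc.client.dumps(args, methodname=method_name)


def decode_call(request_xml):
    """Decode an XML-RPC request back into (parameters, method name)."""
    return xmlrpc.client.loads(request_xml)


if __name__ == "__main__":
    request = encode_call("sample.add", 2, 3)
    print(request)               # the XML that would be sent in an HTTP POST body
    print(decode_call(request))  # what the receiving side recovers
```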

SOAP

SOAP stands for Simple Object Access Protocol. It is an XML-based protocol designed for communication between two different systems on the web.

SOAP was created by vendors to address the problems of accessing objects through firewalls. Some of its creators see it as RPC (Remote Procedure Call) middleware that uses web protocols.

SOAP is commonly used together with WSDL (Web Services Description Language), which describes web services and how to access them. SOAP is descriptive, whereas web standards and architectures are prescriptive [Prescod].

SOAP is gaining extensions for security, routing, and many other aspects that may not be addressed in HTTP or, if they are, may not be addressed robustly.
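As a minimal sketch of what a SOAP message looks like, the following builds an envelope with a Header and a Body using Python's standard XML library. It assumes the SOAP 1.1 envelope namespace; the service namespace, the operation name `GetPrice`, and its argument are hypothetical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the service being called.
SERVICE_NS = "http://shop.example/service"


def build_envelope(operation, arguments, service_ns=SERVICE_NS):
    """Build a SOAP envelope whose Body carries one operation call."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # optional; left empty here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in arguments.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_envelope("GetPrice", {"item": "book"}))
```

The resulting XML document would normally be sent as the body of an HTTP POST, which is how SOAP passes through firewalls that already allow web traffic.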

80% of Amazon Web Services API uses REST.

Comparison of MySQL and PostgreSQL

  • Features. Here PostgreSQL has the upper hand. The stable version of MySQL does not support subqueries, stored procedures, cursors, or views, all of which PostgreSQL does. One of the MySQL developers' more serious mistakes was to justify the exclusion of many of these features (and of even more fundamental features such as referential integrity, still only partially integrated) by claiming that they were not necessary. Of course this is true in many cases, but to hardened DBAs many of these features are vital, and their absence gave MySQL a reputation as a 'toy' database, from which it is still recovering today. However, many contributors to the MySQL-versus-PostgreSQL flame wars have not kept up with what MySQL now offers: it does support transactions and referential integrity, in spite of what people read all the time. MySQL publishes a roadmap and has committed to implementing all ANSI SQL (standard SQL) features, so in about two years MySQL should support all of the listed features. So PostgreSQL seems to 'win' this one, but people need to consider whether they actually need these features. The Open Source databases claim that the database market is becoming commoditized and that most databases offer all the features people need, so the other factors assume greater importance.
  • Support. Support can mean many things. MySQL is much more widely used, so many more applications support it, and there is a larger community ready to assist with problems, as well as more books and resources. MySQL AB, the commercial company that guides MySQL and employs most of its developers, offers various levels of support contracts. Of course, PostgreSQL has active mailing lists, and commercial companies offer support for it as well, so people are not likely to go too far wrong with either.
  • Ease-of-use. Another highly contentious issue. Debate usually goes along the lines of "A: MySQL/PostgreSQL is much easier to use because... B: You idiot. PostgreSQL/MySQL is just as easy because...". Often the database a person already uses is simply the one that person finds easiest, which is not that helpful. An astronaut may find flying the space shuttle easier than writing a document on a PC, but that tells us more about the astronaut than about how easy we would find either. For people migrating to one of the databases, it depends on where they come from and on what they are doing. If they regularly use sub-selects or triggers, rewriting them in MySQL or a scripting language will seem unnecessarily complex. Conversely, PostgreSQL's extra functionality can translate into complexity for people who do not require any of it. It also depends on the tools in use: phpMyAdmin for MySQL is a well-developed tool, while phpPgAdmin is not as fully featured. So for people looking for a web interface in PHP, and for none of the features MySQL lacks, MySQL would be the choice here. But perhaps they do not need the extra features of phpMyAdmin, in which case both tools do everything they want.
  • Stability. MySQL claims in its press releases to be extremely stable, but the 'word on the street' is that this isn't true. It is easy to blindly repeat mantras, but again, it depends on the user's needs. Running a website with 10 users a day? Even MS-Access would be stable! We have experienced table corruption numerous times, but this could always be blamed on faulty hardware, and we have never had a problem recovering (with the simple REPAIR TABLE command turning us into master DBAs). MySQL is used in extremely high-volume environments without problems. PostgreSQL's advanced features are more likely to be stable than the newer MySQL equivalents, having been implemented for longer. However, replication is much newer in PostgreSQL than in MySQL, so there the reverse applies. But here again, the supposed commoditization of databases means that database stability is largely taken for granted, and the software tends to be a lot more stable than the hardware it relies on.
  • Speed. MySQL aimed first to be a fast database, while PostgreSQL aimed to be a fully-featured database, and both are converging in the other's direction. Used appropriately, MySQL's MyISAM tables are indeed extremely lightweight.
  • Existing skills. One of our team had MySQL skills, and it made sense to continue with it. There was an ill-conceived attempt to move to Informix, but while the team battled with the move, others learned to tune MySQL, and the move was eventually shelved.
  • Licensing. MySQL is often used as a model for Open Source companies attempting to make money. MySQL is released under the GNU GPL (General Public License), which requires derivative works to be similarly licensed, but also offers commercial licenses for those who do not want to be restricted in this way. PostgreSQL is distributed under the BSD license, which basically allows any use of the code as long as the credits are maintained. BSD vs GPL is another topic for a flame war!
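The remark about rewriting sub-selects can be made concrete with a small sketch that expresses the same query once as a subquery and once as a join. SQLite (via Python's standard `sqlite3` module) is used here only as a portable stand-in; it is neither MySQL nor PostgreSQL, and the tables and data are hypothetical.

```python
import sqlite3


def customers_with_orders():
    """Return customers who have at least one order, computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """)
    # Version 1: a sub-select, which needs subquery support in the database.
    with_subquery = conn.execute(
        "SELECT name FROM customers "
        "WHERE id IN (SELECT customer_id FROM orders) ORDER BY name"
    ).fetchall()
    # Version 2: the same result rewritten as a join, for databases without subqueries.
    with_join = conn.execute(
        "SELECT DISTINCT c.name FROM customers c "
        "JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
    ).fetchall()
    conn.close()
    return with_subquery, with_join


if __name__ == "__main__":
    print(customers_with_orders())
```

Simple cases like this one rewrite cleanly as joins; the complexity the text warns about arises with more deeply nested sub-selects or with triggers, which have no such direct rewrite.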

PHP and Perl

PHP and Perl are both programming languages used for web development, and in a way they are competitors in the programming world. Both languages have relatively similar learning curves, work well in the server environment, and have similar overall capabilities.

As one would expect, each language has its pros and cons. Ideally, the webmaster would be able to use either language and choose the best one for a given project. However, most programmers have a preference for one over the other and will tend to use their favorite.

In recent years PHP has become more popular with new programmers, in particular web designers learning their first programming language. This may be because PHP is slightly easier to learn from a web design point of view. PHP pages are constructed like HTML pages, with standard HTML markup; PHP code is inserted into the page and executed when the page is requested. Conversely, Perl scripts run as stand-alone programs and create HTML pages when the script is run.

Another issue is efficiency. PHP is generally faster than Perl (although there are ways to make Perl perform as fast). PHP supporters often cite this as a good reason to choose PHP, but in reality it is not normally a major concern. Perl is a very powerful, robust language with more history than PHP. Although a newcomer might think that Perl is more complicated than it needs to be for web development, experienced programmers will appreciate the vast array of options available with Perl.
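The difference between the two page-generation styles can be sketched by analogy. The example below is written in Python rather than PHP or Perl, so it only imitates the two approaches: a page written as markup with a placeholder (the embedded style) versus a program that assembles the markup itself (the stand-alone style). All names are hypothetical.

```python
from string import Template

# Embedded style: the page is written as markup, with code-like placeholders in it.
PAGE_TEMPLATE = Template("<html><body><h1>Hello, $name</h1></body></html>")


def render_embedded(name):
    """Fill the placeholder in a page that is otherwise plain markup."""
    return PAGE_TEMPLATE.substitute(name=name)


def render_standalone(name):
    """Build the same page piece by piece from inside a program."""
    parts = ["<html><body>"]
    parts.append("<h1>Hello, " + name + "</h1>")
    parts.append("</body></html>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_embedded("guest"))
    print(render_standalone("guest"))
```

Both functions produce the same page; the embedded form keeps the markup readable for a web designer, while the stand-alone form keeps all control in the program.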