GWD Grid User Services Best Practices    February 2001

Grid Working Draft / J. Towns, J. Ferguson, D. Fredrick, G. Myers
February 2001

Grid User Support Best Practices

Status of this Draft

This draft invites discussions and suggestions for improvements. The distribution of the document is currently within the Grid User Services Working Group.

1 Introduction

As Grid environments develop, it is recognized that a variety of support functions will be needed, analogous to those found in computer center helpdesks, software support organizations and application development services. This document surveys some of the current and planned practices in developing distributed environments and suggests best practices, as appropriate, for the various elements of the stated support model. The intent is to provide recommendations on how best to support users and applications in these nascent environments.

This document is expected to require regular review and updating as Grids develop and mature, since these changes will certainly induce changes in the support requirements. Also, this document is closely related to another document under development by the Grid User Services Working Group, which is intended to define requirements for services, information and tools in order to enable applications and their support in Grid environments. Finally, this document does not address support for the use of specific resources embedded within the Grid environment, nor for the entire Grid itself; it addresses the use and support of a particular grid computing environment.

2 A Support Model

As a basis for outlining the best practices, a model of support is given. The elements of this support model are based on the current practices and expected needs for Grid environments.

2.1 Elements of a Support Model

Here we delineate a number of elements of a support model.

[Need to provide definitions of these.]

2.1.1 User Information and Tools

2.1.2 Service Level Agreements

2.1.3 User Accounts and Allocation Procedures

2.1.4 Education and Training

2.1.5 Help Desk Process

2.1.6 Support Staff Information and Tools

2.1.7 Metrics (measuring success)

2.1.8 User Feedback

2.2 Identify Current Support Models in Use

Appendix A of this document describes current and planned practices in developing Grid environments. It is not intended to be all-inclusive, but to give a flavor of current activities.

3 User Information and Tools

3.1 Providing/Disseminating Information

There is a clear need to disseminate certain types of information, and there must be a set of mechanisms available to deliver that information to users and application developers. Here we outline the types of information and the modes of delivery that are seen as most important and most effective.

3.1.1 Types of Information for Users and Support Staff

Users need certain information in order to target the resources they wish to use. Frequently users can make use of various resources to accomplish the task at hand, but need the ability to decide which resources to use. In addition, this information tells the user something about the state of execution of a particular task or set of tasks.

It is equally important, if not more important, for support staff to have access to this information. It allows them to assist users in selecting resources, and also to help determine what has gone wrong when there is a problem with the execution of a task or set of tasks. The following is a list of information considered to be of great importance to make available to users and support staff, via some mechanism, in order to support the use of grid environments. The list is not intended to specify in detail all the information needed, but to give a sense of the types of information. In general, the greatest level of detail possible is required. The items are broken into two general categories:

  • Quasi-static information:
      • Grid connected resource information
          • Specification/configuration
          • Access/availability/use policies
      • Infrastructure information
          • Connectivity information between any set of resources
          • Latency/Bandwidth of pipes
          • Feature set (QoS, etc)
          • Access/availability/use policies
      • Grid services
          • Availability
      • Software
          • Availability on resources
  • Dynamic information:
      • Grid connected resource information
          • Up/down
          • Availability of, or “load” on, a resource
          • Job information (queue status information)
          • Availability interrupts
          • Resource component status information (i.e., disk available, memory free, etc)
      • Infrastructure information
          • Link status (up/down)
          • Current measured available latency/bandwidth/packet loss/etc.
          • Availability interrupts
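
As a rough illustration of how the two categories above might be combined when a user or support staff member selects a resource, consider the following sketch. It is only a sketch under stated assumptions: the class names, field names and the selection rule are hypothetical and are not taken from any existing grid information service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuasiStaticInfo:
    """Slowly changing information about a resource (hypothetical fields)."""
    name: str
    cpu_count: int                  # specification/configuration
    use_policy: str                 # access/availability/use policy
    software: List[str] = field(default_factory=list)  # installed software


@dataclass
class DynamicInfo:
    """Rapidly changing information about a resource (hypothetical fields)."""
    up: bool                        # up/down
    load: float                     # fraction of capacity in use, 0.0 to 1.0
    queued_jobs: int                # queue status
    disk_free_gb: float             # resource component status


@dataclass
class Resource:
    static: QuasiStaticInfo
    dynamic: DynamicInfo


def candidate_resources(resources, required_software, max_load):
    """Return the names of resources that are up, have the required software
    installed, and are loaded no more than max_load, least loaded first."""
    usable = [
        r for r in resources
        if r.dynamic.up
        and required_software in r.static.software
        and r.dynamic.load <= max_load
    ]
    usable.sort(key=lambda r: r.dynamic.load)
    return [r.static.name for r in usable]
```

The quasi-static record answers the question of whether a resource could run the task at all, while the dynamic record answers whether it is a good choice right now. Support staff could use the same records to explain why a task did not run where the user expected.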

All facets of the Grid environment with which the user will come into contact must be documented at a level that provides an adequate understanding of their function and use. This information changes slowly over time as the environment develops. These are the information resources that users will rely on to understand how to operate in the environment, how to develop their applications and how to actually make use of resources. A representative set of required documentation is:

  • Access:
      • Overall grid environment documentation
      • Applying for an account
      • Obtaining an allocation of resources
      • Management of allocations
      • Service level agreements
  • Application development:
      • APIs for developing grid-based applications
      • APIs available (installed)
      • User and reference manuals
  • Software tools:
      • Debugging tools
      • Performance tools
  • Application execution:
      • Usage policies and procedures
      • Job submission and monitoring
      • Scheduling and meta-scheduling

3.1.2 Method(s) of Disseminating Information

In recent years, the methods of delivering information to end users have evolved, and they are expected to continue to evolve.

The de facto standard for delivery of end-user documentation in Grid environments, and in computing environments in general, has become the Web. This is true for many reasons, the most compelling of which is that users already work through interfaces that include a Web browser when making use of Grid environments. There is little reason to believe that the Web will not continue to be the preferred method of content delivery for this type of information. It is recognized that as Grid computing environment interfaces develop, extensions to this notion will be required. Most notably, wireless devices are becoming more commonplace, and delivery of Web content to these devices requires special considerations. Nonetheless, the Web remains the preferred method. Another significant advantage is the ability to provide search capabilities on the content of each document and across documents. Online materials should be developed with particular attention to supporting effective search.

It is recognized that there are two cases in which hardcopy materials come into play in the support of users and applications. The first is documentation provided by some software suppliers. It is still the case, particularly for some independent software vendor (ISV) applications, that the documentation is provided only in hardcopy form. While this was historically common for documentation used in the support role in computing centers, the practice is rapidly declining.

The second case is a user preference for hardcopy documentation. Users fairly frequently prefer to have hardcopy versions of documentation, particularly of reference documents. As such, it is considered beneficial, though not critical, that indexed, formatted, printable versions of documentation be made available in addition to the on-line forms when reasonable.

3.2 Portals

3.2.1 General Grid Portals

It is recognized that end users, particularly users of distributed environments, often derive value from having access to an interface that provides a base set of functionality. In most cases, this functionality provides a single interface for many of the actions that a user would otherwise have to carry out by accessing each of the distributed resources individually. In addition, a general grid portal provides a central location for access to the various online information and documentation of interest to the grid user community, with an integrated presentation. A basic grid computing environment portal should provide support services to users in two general categories:

  • Information services
      • Quasi-static and dynamic
      • Accounting and allocation information
      • Help desk
      • Training
  • Interactive services
      • Helpdesk problem submission
      • Knowledge base searching
      • FAQ
      • Web-based access to resources
      • File browsing
      • Job submission
      • Account management
      • Development environments
3.2.2 Applications-Specific Portals

There is a small but growing number of efforts to build graphical interfaces to applications that, in the past, were accessed and used through command-line driven interfaces. Currently there is an increase in the development of web-based interfaces, or application portals, though there are certainly other kinds of interface. Web-based interfaces do provide greater flexibility at this point in time. These application portals fall into two general categories: interfaces for specific applications developed within a research group, and interfaces for more broadly used applications such as community codes or ISV applications.

Application portals developed for the use of specific research groups will certainly allow those groups to be more effective in utilizing resources within a grid environment, but are typically useful only for those groups’ activities. Some more general application portals are being developed (e.g., a GAUSSIAN98 portal), and such interfaces should be adopted and made available via the general user portal for use by interested members of the user community. Such application portals not only make use of resources easier; a well constructed application portal also typically reduces the number of errors users might make in the process of using the applications. This allows the researcher to be more productive and to have a better experience, and it lowers the effort required to support such applications.

4 End User Service Level Expectations

One of the most difficult issues in providing good support and in giving users a good experience with that support is managing their expectations. To complicate matters, currently most users of developing grid environments have no formal contractual arrangement with the providers of services and support within the grid environment. As such, there are rarely any well-defined agreements on the shared expectations the users of these environments and those providing support can count on. A clear statement that accurately delineates these expectations for both the users and support operations in a grid computing environment is therefore critical. It is a requirement that the following things be delineated for the users:

[Might follow structure of information on resources available to state expectations]

  • Who is supported?
  • What is supported?
  • When is it supported?
  • What is the commitment to acknowledge problem reports?
  • What is the commitment to solve problem reports?

Clear Grid User Service Level Agreements (GUSLA) must be arranged among cooperating sites providing services and support within the grid environment. The establishment of such agreements, through a specific, concrete and well-documented mechanism such as a memorandum of understanding, must be part of the generic arrangement among sites, as with security and accounting. Ideally, user accounts should not be authorized until this arrangement and the necessary minimum grid user services infrastructure are in place.

Service level agreements should delineate user services goals from the user perspective and be agreed to by all participating sites. Areas covered should include the following support services infrastructure:

  • Consulting/Technical Support
      • Accounts/password problems – solved within x working hours
      • Mechanisms for contacting support
          • Web problem report forms
          • Email
          • Phone contacts during specified times of the day
      • Resolve x% of user problems within x working days
      • Problems not resolved within x working days are escalated
      • Mechanisms for users to track problem reports
  • Documentation – provide accurate, complete information on:
      • Grid resources and services
      • Use of the grid computing environment, particularly resource access and security
      • Software development
      • Software optimization
      • Allocation procedures
  • Training
      • Software development for Grid systems
      • Software optimizations
      • Software performance measurement
  • User Service Performance Metrics
      • User Surveys
      • Other User feedback, formal and informal
      • Support contacts / trouble ticket statistics
      • Annual summaries of metrics made available to users
  • System Resource and Grid Environment Notices
      • Timely notice of regularly scheduled system downtimes
      • Notice of major system downtimes for upgrades, etc., “X” days in advance

[I’m wondering if we want to limit this list to just the high-level bullets. I can think of many other things that can be added in these areas. –JT]
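
To make the quantitative targets in the list above concrete (resolving x% of problems within x working days, and escalating those that are not), the following sketch shows one way such figures might be computed from help desk records. It is a minimal illustration only: the Ticket record, its field names and the two functions are hypothetical, and the actual thresholds are left to the agreement among sites.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ticket:
    """A problem report (hypothetical record layout)."""
    ticket_id: str
    working_days_open: float                          # working days since filing
    working_days_to_resolve: Optional[float] = None   # None while unresolved


def resolution_rate(tickets: List[Ticket], target_days: float) -> float:
    """Percentage of all tickets resolved within target_days working days.
    An empty list is treated as fully compliant."""
    if not tickets:
        return 100.0
    on_time = sum(
        1 for t in tickets
        if t.working_days_to_resolve is not None
        and t.working_days_to_resolve <= target_days
    )
    return 100.0 * on_time / len(tickets)


def needs_escalation(tickets: List[Ticket], target_days: float) -> List[str]:
    """IDs of unresolved tickets that have been open longer than target_days."""
    return [
        t.ticket_id for t in tickets
        if t.working_days_to_resolve is None
        and t.working_days_open > target_days
    ]
```

Comparing the value returned by resolution_rate with the agreed percentage gives a simple compliance check. The same records could feed the trouble ticket statistics and annual summaries listed under User Service Performance Metrics.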

5 User Accounts and Allocation Procedures

Fundamentally, all users need to obtain some type of account and some form of authorization to use specific resources within any grid environment. As grid environments are rapidly developing, the definitions of and policies for these items are evolving. While we therefore cannot state best practices, we can cover what is currently done and make recommendations on how these might best be done in the future. Accounts for users typically take the form of logins for individuals on specific resources. This is primarily an artifact of the process by which grids are being created: they are typically aggregations of pre-existing resources under sufficiently separate control that each has had its own independent process for establishing accounts. The same is largely true for the processes by which allocations of resources are made. While the umbrella organization providing the basis for establishing the grid environment helps to unify some of these issues, there are still many implications for users and, in fact, these processes are still evolving.

5.1 Grid Policies Affecting Accounts and Allocation

It is very important that the policies under which the grid environment will operate are well defined as early as possible. Clearly, these will evolve over time, but they are another important piece in establishing a shared understanding of many issues amongst all those involved in supporting and using a grid computing environment. This section addresses a number of questions and issues prospective users must deal with when trying to work in a grid computing environment.

5.1.1 Trust

One of the most difficult issues in dealing with the creation of accounts to access multiple resources in an emerging grid environment is the establishment of a trust relationship between sites and a formalization of that trust that minimizes impact on the user community. A very effective means of accomplishing this has been through the establishment of a Public Key Infrastructure (PKI) as the basis of this trust relationship. The establishment of a Certificate Policy (CP) is the basis of these trust relationships and provides for either the creation of a trusted Certificate Authority (CA) or the enlistment of an existing CA for issuance of certificates. Given the common agreement to the CP, participating sites can reliably accept certificates issued by the CA to allow for authentication to local resources.

Certainly, there are many other issues to be dealt with, but this basis for trust allows many of them to be addressed in a relatively straightforward and logical manner. A PKI does not, by itself, establish a single sign-on capability, but makes it possible in a sensible way. In addition, the existence of a PKI lays the foundation for trust relationships between grid environments. This, in turn, can allow users access to resources in other grid environments without necessarily requiring the user to go through an account acquisition process.

It should be noted that this trust relationship does not address the issue of authorization to use a resource. It simply provides a mechanism for authentication.
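
The distinction between authentication and authorization can be sketched as two separate checks. The following is a conceptual illustration only and assumes a great deal: the Certificate record, the TRUSTED_CAS and AUTHORIZED tables and all names in them are hypothetical, and no cryptographic verification is performed. A real site would verify the CA's signature and the certificate's revocation status using an established PKI library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Certificate:
    """Simplified stand-in for an identity certificate (hypothetical)."""
    subject: str          # identity of the user
    issuer: str           # name of the issuing CA
    not_before: datetime  # start of validity period
    not_after: datetime   # end of validity period


# CAs operating under the Certificate Policy this site has agreed to (hypothetical).
TRUSTED_CAS = {"Example Grid CA"}

# Local authorization table, maintained separately by each site (hypothetical).
AUTHORIZED = {"cluster-a": {"/O=Example/CN=Alice"}}


def authenticate(cert: Certificate, now: datetime) -> bool:
    """Accept the identity if a trusted CA issued the certificate and it is
    within its validity period. Signature and revocation checks are omitted."""
    return cert.issuer in TRUSTED_CAS and cert.not_before <= now <= cert.not_after


def authorize(subject: str, resource: str) -> bool:
    """Check the site's own policy for whether this identity may use the resource."""
    return subject in AUTHORIZED.get(resource, set())


def may_use(cert: Certificate, resource: str, now: datetime) -> bool:
    """Access requires both a trusted identity and local authorization."""
    return authenticate(cert, now) and authorize(cert.subject, resource)
```

In this sketch a user holding a valid certificate from the trusted CA is authenticated everywhere the CP is accepted, yet may still be refused by a particular resource whose local table does not list that identity.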

5.1.2 Acceptable Use

As users begin to explore the resources and services they can potentially make use of in a grid environment, they must be guided by a clear acceptable use policy (AUP) for each resource or service, or for collections of these. Typically, such statements already exist for the use of resources, but they address issues in the context of an isolated site. These must be reviewed and extended to address the acceptable use, by grid users, of resources and services provided to the grid environment.

5.1.3 Memorandum of Understanding

[I’m drawing a blank on what was intended here… -JT]

5.2 Account Acquisition Process

At some level, a user must acquire an account of some type to ultimately be able to access resources within a grid environment. As grid environments are rapidly developing and the policies surrounding access and accounts mature, the exact processes by which a user obtains an account in any particular environment will evolve. The general consensus is that it is desirable that these environments develop single sign-on capabilities. Given a PKI, it is possible to develop an environment that does not require individual local accounts on the resources to which users have access.

In reality, the near term is dominated by the need to accommodate the policy restrictions of local sites participating in a grid environment in allowing users access to their local resources. This typically means that local accounts are required today, and often means that local policy will require local accounts for some time.

As a result, a number of issues must be clearly documented so that users can understand what they are required to do in order to access the resources of the grid environment. A number of these are delineated here.

It must be made clear whether an allocation of resources is required before an account can be issued, either in the grid environment or on any particular resource within the environment. Issues relating to the allocation of resources are addressed below. When an account is associated directly with an allocation of resources, there is the implication that the account will be deactivated when the user no longer has access to an active allocation of resources.
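
The link between an account and its allocations can be expressed as a simple rule. The sketch below is one possible reading, not a prescribed policy: the Allocation record, its fields and the definition of an "active" allocation (units remaining and not yet expired) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Allocation:
    """An award of resources to a user (hypothetical record layout)."""
    resource: str
    units_remaining: float   # e.g. service units or CPU hours left
    expires: date            # last day on which the allocation may be used

    def is_active(self, today: date) -> bool:
        """Assumed rule: active while units remain and the award has not expired."""
        return self.units_remaining > 0 and today <= self.expires


def account_is_active(allocations: List[Allocation], today: date) -> bool:
    """An account tied to allocations stays active only while the user holds
    at least one active allocation."""
    return any(a.is_active(today) for a in allocations)
```

Whatever rule a grid environment adopts, documenting it in this kind of explicit form helps users anticipate when an account will be deactivated.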

The mechanics of requesting an account must be clearly defined. It should be possible to obtain an account through a centralized account management system, although it might also be possible to obtain accounts within the grid computing environment from any of the participating sites individually. Either approach is possible, but they have distinct implications for the account management process within the grid environment. The policy for this must be decided early.