DataGrid
WP1 - WMS Software Administrator and User Guide
Document identifier: / DataGrid-01-TEN-0118-1_28
Date: / 24/11/2003
Work package: / WP1
Partner: / Datamat SpA
Document status
Deliverable identifier:
Abstract: This note provides the administrator and user guide for the WP1 WMS software.
IST-2000-25182 / PUBLIC / 107 / 107
/ WP1 - WMS Software Administrator and User Guide / Doc. Identifier:
DataGrid-01-TEN-0118-1_2
Date: 24/11/2003
Delivery Slip
Name / Partner / Date / Signature
From / Fabrizio Pacini / Datamat SpA / 24/11/2003
Verified by / Stefano Beco / Datamat SpA / 24/11/2003
Approved by
Document Log
Issue / Date / Comment / Author
0_0 / 21/12/2001 / First draft / Fabrizio Pacini
0_1 / 14/01/2002 / Draft / Fabrizio Pacini
0_2 / 24/01/2002 / Draft / Fabrizio Pacini
0_3 / 05/02/2002 / Draft / Fabrizio Pacini
0_4 / 15/02/2002 / Draft / Fabrizio Pacini
0_5 / 08/04/2002 / Draft / Fabrizio Pacini
0_6 / 13/05/2002 / Fabrizio Pacini
0_7 / 19/07/2002 / Fabrizio Pacini
0_8 / 16/09/2002 / Fabrizio Pacini
0_9 / 03/12/2002 / Fabrizio Pacini
1_0 / 13/06/2003 / First issue for Release 2.0 / Fabrizio Pacini, Massimo Sgaravatto
1_1 / 04/09/2003 / Fabrizio Pacini
1_2 / 24/11/2003 / Fabrizio Pacini
Document Change Record
Issue / Item / Reason for Change
0_1 / General update / - Take into account changes in the rpm generation procedure.
- Add missing info about daemons (RB/JSS/CondorG) starting accounts
- Some general corrections
0_2 / General Update / - Add Cancelling and Cancel Reason information.
- Add OUTPUTREADY job state.
- Add new profile rpms.
- Remove /etc/workload* shell scripts.
- Add summary map table (user / daemon).
- Add CEId format check.
- Add new job cancel notification.
0_3 / General Update / - Modified RB/JSS start-up procedure
- Add gridmap-file users/groups issues
- Add proxy certificate usage by daemons
- Job attribute CEId changed to SubmitTo
- Add DGLOG_TIMEOUT setting
- Add workload-profile and userinterface-profile rpms
0_4 / General Update / - Add configure option --enable-wl for system configuration files
- Add installation checking option --with-globus for Globus to the Workload configure
- Add new Information Index configure options
- Remove edg-profile and edg-user-env rpms from II and UI dependencies
- Add security configuration rpms for all the Certificate Authorities to UI dependencies
- Add new parameters to RB configuration file
- Add new Job Exit Code field to the returned job status info
- Remove dependency on SWIG from the userinterface binary rpm
0_5 / General Update / - Modify command options syntax (getopt-like style)
- Add MyProxy server and client package installation/utilisation
- Modify job cancel notification
- Add Userguide rpm
0_6 / General Update / - Modify configure options for the various components
- UI commands modified to use python2 executable
- Clarify myproxy usage
- Explain how RB/LB addresses in the UI config file are used by the commands
- Add --logfile option to the UI commands
0_7 / General Update / - Modify configure options for the various components
- Clarify UI commands --notify option usage
- Add make test target for UI
0_8 / General Update / - Specified dependencies of profile rpms
- Update needed env vars for UI
- Explain how to include default constraints in the job requirements
- Explain that the lc field in the ReplicaCatalog address is now mandatory
- Explain how to specify wildcards and special chars in "Arguments" in the JDL expression
0_9 / General Update / - Defaults for Rank and Requirements in the UI config file
- Added reference to the “.BrokerInfo” file document
- other.CEId in Requirements vs --resource option
- Explain MyProxy Server configuration
- Added description of new parameters in RB configuration file
- RB/JSS databases clean-up procedure added
- Explain usage of RetryCount JDL attribute
- Better explain how to specify wildcards and special chars in "Arguments" in the JDL expression
- Updated reference to JDL Attributes note
- Added Annex on Submission failures analysis
1_0 / General Update / - Refer to WMS release 2
1_1 / General Update / - Description of new UI commands options for interactive jobs (--nogui, --nolisten)
- Added annexes section on job re-submission
1_2 / General Update / - Add voms client APIs rpms among WMS components dependencies
- Update commands description due to the integration with VOMS
- Remove proxy credential creation from UI commands
- Remove --hours option from UI edg-job-submit command
Files
Software Products / User files
Word 2000 / DataGrid-01-TEN-0118-1_2.doc
Acrobat Exchange 5.0 / DataGrid-01-TEN-0118-1_28.pdf
Content
1. Introduction
1.1. Objectives of this document
1.2. Application area
1.3. Applicable documents and reference documents
1.4. Document evolution procedure
1.5. Terminology
2. Executive summary
3. Workload Management System Overview
3.1. Deployment of the WMS software
4. Installation and Configuration
4.1. Logging and Bookkeeping services
4.1.1. Required software
4.1.1.1. LB local-logger and LB APIs
4.1.1.2. LB Server
4.1.2. Configuration
4.1.2.1. LB Local-Logger
4.1.2.2. LB Server
4.1.3. Environment Variables
4.2. Services running in the “RB node”: NS, WM, JC, LM
4.2.1. Required software
4.2.1.1. Globus installation and configuration
4.2.1.1.1. Condor-G installation and configuration
4.2.1.2. ClassAd installation and configuration
4.2.1.3. Boost installation and configuration
4.2.1.4. Replica Manager installation and configuration
4.2.2. Configuration
4.2.2.1. Configuration of the “common” attributes
4.2.2.2. NS configuration
4.2.2.3. WM configuration
4.2.2.4. JC configuration
4.2.2.5. LM configuration
4.2.3. Environment variables
4.2.4. Other requirements and configurations for the “RB node”
4.2.4.1. Customized Gridftp server
4.2.4.2. Grid-mapfile
4.2.4.3. Disk Quota
4.3. Security Services
4.3.1. MyProxy Server
4.3.2. Proxy renewal service
4.3.2.1. Required software
4.3.2.2. Configuration
4.3.2.3. Environment variables
4.4. Grid accounting services
4.4.1. Required software
4.4.1.1. Creating the MySQL databases for the HLR server
4.4.1.2. Creating the MySQL database for the PA server
4.4.2. Configuration
4.4.2.1. Configuring the HLR server
4.4.2.2. Configuring the PA server
4.4.2.3. Configuring the ATM client software
4.4.3. Environment variables
4.5. User Interface
4.5.1. Required software
4.5.1.1. Python Command Line Interface
4.5.1.2. C++ API
4.5.1.3. Java API
4.5.1.4. Java GUI
4.5.2. RPM installation
4.5.2.1. Python Command Line Interface
4.5.2.2. C++ API
4.5.2.3. Java API
4.5.2.4. Java GUI
4.5.3. Configuration
4.5.3.1. Python Command Line Interface
4.5.3.2. Java GUI
4.5.4. Environment variables
4.5.4.1. Python Command Line Interface
4.5.4.2. Java GUI
5. Operating the System
5.1. LB local-logger
5.1.1. Starting and stopping daemons
5.1.2. Troubleshooting
5.2. LB Server
5.2.1. Starting and stopping daemons
5.2.2. Creating custom indices
5.2.3. Purging the LB database
5.2.4. Experimental R-GMA Interface
5.2.5. Troubleshooting
5.3. Services running in the “RB node”: NS, WM, JC, LM
5.3.1. Starting and stopping NS, WM, JC and LM daemons
5.3.2. NS, WM, JC, LM troubleshooting
5.4. Proxy renewal
5.4.1. Starting and stopping daemon
5.4.2. Troubleshooting
5.5. Purger
5.6. Grid Accounting
5.6.1. Starting and stopping daemon
5.6.1.1. HLR server
5.6.1.2. PA Server
5.6.2. HLR server administration
5.6.2.1. Creating a Fund account
5.6.2.2. Creating a Group account
5.6.2.3. Creating a User account
5.6.2.4. Creating a Resource account
5.6.2.5. Deleting accounts
5.6.3. Troubleshooting
5.7. User Interface (Java GUI)
5.7.1. Troubleshooting
6. User Guide
6.1. User interface
6.1.1. Security
6.1.1.1. MyProxy
6.1.1.1.1. MyProxyClient
6.1.2. Common behaviours
6.1.2.1. The --input option
6.1.3. Commands description
6.1.3.1. edg-job-submit
6.1.3.2. edg-job-get-output
6.1.3.3. edg-job-list-match
6.1.3.4. edg-job-cancel
6.1.3.5. edg-job-status
6.1.3.6. edg-job-get-logging-info
6.1.3.7. edg-job-attach
6.1.3.8. edg-job-get-chkpt
7. Annexes
7.1. JDL Attributes
7.2. Job Status Diagram
7.3. Job Event Types
7.4. Submission Failures Analysis
7.5. Job Resubmission and RetryCount
7.6. Wildcard patterns
7.7. The Match Making Algorithm
7.7.1. Direct Job Submission
7.7.2. Job submission without data-access requirements
7.7.3. Job submission with data-access requirements
1. Introduction
This document provides a guide to the installation, configuration and usage of the WP1 WMS software released within the DataGrid project.
1.1. Objectives of this document
The goal of this document is to describe the complete process by which the WP1 WMS software can be installed and configured on the DataGrid test-bed platforms.
Guidelines for operating the whole system and accessing provided functionalities are also provided.
1.2. Application area
Administrators can use this document as a basis for installing, configuring and operating WP1 WMS software. Users can refer to the User Guide chapter for accessing provided services through the User Interface.
1.3. Applicable documents and reference documents
Applicable documents
[A1] / JDL Attributes – DataGrid-01-TEN-0142-0_0 – 13/06/2003 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_0.{doc,pdf})
[A2] / Definition of the architecture, technical plan and evaluation criteria for the resource co-allocation framework and mechanisms for parallel job partitioning (http://www.infn.it/workload-grid/docs/DataGrid-01-D1.4-0127-1_0.{doc,pdf})
[A3] / DataGrid Accounting System - Architecture v 1.0 (http://www.infn.it/workload-grid/docs/DataGrid-01-TED-0126-1_0.pdf)
[A4] / Logging and Bookkeeping Architecture – DataGrid-01-TED-0141 (http://lindir.ics.muni.cz/dg_public/lb_draft2_formatted.pdf)
[A5] / Job Description Language HowTo – DataGrid-01-TEN-0102-0_2 – 17/12/2001 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2.pdf)
[A6] / The Glue CE Schema (http://www.cnaf.infn.it/~sergio/datatag/glue/v11/CE/index.htm)
Reference documents
[R1] / The Resource Broker Info file – DataGrid-01-TEN-0135-0_0 (http://www.infn.it/workload-grid/docs/DataGrid-01-TEN-0135-0_0.{doc,pdf})
[R2] / LB-API Reference Document – DataGrid-01-TED-0139-0_0 (http://lindir.ics.muni.cz/dg_public/lb_api.pdf)
[R3] / Job Partitioning and Checkpointing – DataGrid-01-TED-0119-0_3 (https://edms.cern.ch/file/347730/1/DataGrid-01-TED-0119-0_3.pdf)
1.4. Document evolution procedure
The content of this document is subject to modification in response to the following events:
· Comments received from DataGrid project members,
· Changes/evolutions/additions to the WMS components.
1.5. Terminology
Definitions
Condor / Condor is a High Throughput Computing (HTC) environment that can manage very large collections of distributively owned workstations.
Globus / The Globus Toolkit is a set of software tools and libraries aimed at the building of computational grids and grid-based applications.
Glossary
class-ad / Classified advertisement
CE / Computing Element
CLI / Command Line Interface
DB / Data Base
DGAS / Datagrid Grid Accounting Service
EDG / European DataGrid
FQDN / Fully Qualified Domain Name
GIS / Grid Information Service, aka MDS
GSI / Grid Security Infrastructure
GUI / Graphical User Interface
HLR / Home Location Register
IS / Information Service
job-ad / Class-ad describing a job
JA / Job Adapter
JC / Job Controller
JDL / Job Description Language
LB / Logging and Bookkeeping Service
LM / Log Monitor
LRMS / Local Resource Management System
MDS / Metacomputing Directory Service, aka GIS
MPI / Message Passing Interface
NS / Network Server
OS / Operating System
PA / Price Authority
PID / Process Identifier
PM / Project Month
RB / Resource Broker
SE / Storage Element
SI00 / Spec Int 2000
SMP / Symmetric Multi Processor
TBC / To Be Confirmed
TBD / To Be Defined
UI / User Interface
VO / Virtual Organisation
VOMS / Virtual Organisation Membership server
WM / Workload Manager
WMS / Workload Management System
WP / Work Package
2. Executive summary
This document comprises the following main sections:
Section 3: Workload Management System Overview
Briefly introduces the revised Workload Management System architecture and discusses the deployment of the WMS components.
Section 4: Installation and Configuration
Describes changes that need to be made to the environment and the steps to be performed for installing the WMS software on the test-bed target platforms. The resulting installation tree structure is detailed for each system component.
Section 5: Operating the System
Provides the procedures for starting and stopping the WMS component processes and utilities.
Section 6: User Guide
Describes, in Unix man-page style, all the User Interface commands that give the user access to the services provided by the WMS.
Section 7: Annexes
Expands on topics introduced in the User Guide section that help the user better understand the behaviour of the system.
3. Workload Management System Overview
The revised (release 2) architecture of the EDG Workload Management System (WMS), which is described in detail in [A2], is represented in Figure 1.
Figure 1: UML diagram describing the new (rel. 2) WMS architecture
The User Interface (UI) is the component that allows users to access the functionalities offered by the Workload Management System.
The Network Server (NS) is a generic network daemon responsible for accepting incoming requests from the UI (e.g. job submission, job removal), which, if valid, are then passed to the Workload Manager.
The Workload Manager (WM) is the core component of the Workload Management System. Given a valid request, it has to take the appropriate actions to satisfy it. To do so, it may need support from other components, which are specific to the different request types.
All these components that offer support to the Workload Manager provide a class whose interface is inherited from a Helper class. Essentially the Helper, given a JDL expression, returns a modified one, which represents the output of the required action. For example, if the request was to find a suitable resource for a job, the input JDL expression will be the one specified by the user, and the output will be the JDL expression augmented with the CE choice.
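The Helper pattern described above can be sketched as follows. This is only an illustration, not the WMS implementation: the class names, the resolve method, the use of a Python dictionary to stand for a JDL expression, the ce_choice key and the CE identifier are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Helper(ABC):
    """Hypothetical base interface: takes a JDL expression, returns a modified one."""

    @abstractmethod
    def resolve(self, jdl: dict) -> dict:
        """Return a new JDL expression representing the output of the action."""


class FixedResourceHelper(Helper):
    """Toy helper that augments the job description with a CE choice."""

    def __init__(self, ce_id: str):
        self.ce_id = ce_id

    def resolve(self, jdl: dict) -> dict:
        augmented = dict(jdl)                # copy, so the user's expression is untouched
        augmented["ce_choice"] = self.ce_id  # hypothetical attribute name
        return augmented


# Input: the expression as specified by the user (dictionary form is an assumption).
user_jdl = {"Executable": "/bin/hostname", "StdOutput": "out.txt"}

# Output: the same expression augmented with the chosen resource.
helper = FixedResourceHelper("ce01.example.org")  # illustrative identifier
result = helper.resolve(user_jdl)
```

Returning a modified copy rather than changing the input in place keeps the user's original expression available to the caller, which matches the description of the Helper as a function from one JDL expression to another.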
The Resource Broker (RB) or Matchmaker is one of these classes offering support to the Workload Manager. It provides a matchmaking service: given a JDL expression (e.g. for a job submission), it finds the resources that best match the request. It interacts with the Information Service and with the data management services.
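As a rough illustration of the matchmaking idea, the sketch below filters a list of resources with a requirements predicate and then picks the one with the highest rank. It is a simplification under stated assumptions: the real service evaluates JDL expressions against data from the Information Service, whereas here the resources are plain dictionaries, and the match function and the attribute names (free_cpus, max_cpu_time) are invented for the example.

```python
from typing import Callable, Optional

Resource = dict  # a resource description; plain dictionary used for illustration


def match(
    resources: list,
    requirements: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> Optional[Resource]:
    """Return the suitable resource with the highest rank, or None if none fits."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical resource descriptions (attribute names are assumptions).
resources = [
    {"id": "ce-a", "free_cpus": 0, "max_cpu_time": 720},
    {"id": "ce-b", "free_cpus": 12, "max_cpu_time": 120},
    {"id": "ce-c", "free_cpus": 4, "max_cpu_time": 720},
]

best = match(
    resources,
    requirements=lambda r: r["max_cpu_time"] >= 600 and r["free_cpus"] > 0,
    rank=lambda r: r["free_cpus"],
)
```

In this example only ce-c satisfies both conditions, so it is selected even though ce-b has more free CPUs.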
The Job Adapter (JA) is responsible for making the final “touches” to the JDL expression for a job before it is passed to CondorG for the actual submission. Besides preparing the CondorG submission file, this module is also responsible for creating the wrapper script and the appropriate execution environment on the CE worker node (this includes the transfer of the input and output sandboxes).
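The role of the wrapper script can be pictured with the sketch below, which generates a shell script that brings in the input sandbox, runs the job and then sends back the output sandbox. It is a hypothetical outline only: the build_wrapper function is invented for the example, and fetch_input and store_output are placeholders standing for whatever transfer mechanism is actually used, not real commands.

```python
def build_wrapper(executable, arguments, input_sandbox, output_sandbox):
    """Build the text of an illustrative wrapper script for one job."""
    lines = ["#!/bin/sh"]

    # 1. Bring the input sandbox files to the worker node.
    for name in input_sandbox:
        lines.append(f"fetch_input '{name}'")   # placeholder, not a real command

    # 2. Run the user's executable and remember its exit status.
    lines.append(" ".join([executable] + list(arguments)))
    lines.append("status=$?")

    # 3. Send the output sandbox files back.
    for name in output_sandbox:
        lines.append(f"store_output '{name}'")  # placeholder, not a real command

    # 4. Report the job's own exit status.
    lines.append("exit $status")
    return "\n".join(lines) + "\n"


script = build_wrapper(
    executable="./analysis.sh",
    arguments=["--events", "100"],
    input_sandbox=["analysis.sh", "config.dat"],
    output_sandbox=["out.txt", "err.txt"],
)
```

Saving the exit status before the output transfer means that the status reported by the wrapper is that of the user's job rather than that of the last transfer step.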