OCFS

Oracle Cluster FileSystem for Linux

Users Guide

Table of Contents

1 - Introduction

2 - Obtaining OCFS

3 - Installing OCFS

3.1 - Automatically mount OCFS during boot

3.2 - Configuring the OCFS

3.2.1 - /etc/ocfs.conf file

3.2.2 - Automatic Configuration using ocfstool

3.2.3 - Manual configuration

3.3 - Loading ocfs

3.4 - Formatting OCFS Partition

3.4.1 - Format using ocfstool

3.4.2 - Format using mkfs.ocfs

3.5 - Mount OCFS partitions

3.5.1 - Mounting manually

3.5.2 - Mounting automatically

4 - Tools

4.1 - ocfstool

4.1.1 - Format

4.1.2 - Generate Config

4.2 - Extfinder

4.3 - Debugocfs

5 - Support Utilities

5.1 - /etc/init.d/ocfs

5.2 - /sbin/fsck.ocfs

5.3 - /sbin/load_ocfs

5.4 - /sbin/mkfs.ocfs

5.5 - /sbin/mounted.ocfs

5.6 - /sbin/ocfs_uid_gen

6 - Best Practices

7 - Frequently asked Questions & Answers

Q. Can I install the Oracle Distribution in an OCFS partition?

Q. My partitions don't mount automatically during boot. What's wrong?

Q. When running fsck.ocfs, it returns the error “WARNING: nonzero bytes after the disk header structure”. What does it mean?

Q. What are the most appropriate (recommended) tool IO and capacity analysis of OCFS?

Q. Can I use ocfs on a NAS (Network Attached Storage) device like NetApp?

Q. Can I use LVM or MD to create my OCFS filesystem on top of it?

Q. I want to perform some tests using RAC/OCFS, but I don't want to spend too much money on hardware. Do I have any other solution?

Q. Why do I have to include the option _netdev in the fstab on RedHat?

Q. What is the best way to archive the logs to an OCFS directory?

Q. Is OCFS supported on 64 bit platform like Itanium?

Q. Can I run ocfs on non enterprise Linux distributions like RedHat 9?

Q. Do I need any specific version of United Linux to run OCFS?

Q. Can I run the latest OCFS on a plain RedHat AS 2.1, without any errata applied?

Q. What is the advantage of running ocfs against raw devices?

Q. What happen if I have to change the IP Address of my systems?

Q. My Network Interface Card had to be replaced. Do I need to do something?

Q. I have a partition that is not mounted. How do I know if it is an ocfs partition or not?

Q. Can I use my OCFS partition to store regular files?

Q. How much do I lose in terms of performance compared to raw devices?

Q. How do I enable async I/O on Oracle using OCFS?

Q. How do I backup my OCFS files? Can I use tar or other OS command?

Q. Is it possible to resize an existing OCFS partition?

Q. I'm having problems with OCFS. How can I debug OCFS?

Q. Can I run OCFS in a stand-alone system? What are the advantages of running it?

Q. I have a database running on OCFS in a stand-alone node. Why it is so slow compared to other stand-alone systems running on ext3?

Q. How can I obtain more information about OCFS?

Q. I have a customized RedHat AS kernel on my system. Does Oracle support OCFS and the RDBMS on it?

Q. How do I know if my OCFS version is officially supported by Oracle?

8 - Appendix A

Introduction.

OCFS is a shared disk cluster filesystem. The current version (version 1) released for Linux is specifically designed to alleviate the need for managing raw devices. It can contain all the Oracle datafiles, archive log files and controlfiles. It is, however, not designed as a general-purpose filesystem.

This document describes the steps required to install OCFS on Linux. It also gives guidelines for optimization and a more in-depth explanation of how the filesystem works.

Downloading OCFS.

OCFS can be downloaded from “” for the following distributions: RedHat Advanced Server 2.1 and United Linux 1.0 (Conectiva, SuSE, TurboLinux and SCO). Oracle officially supports the Oracle database on OCFS if it is installed from the binary packages that are available for download.

If the user decides to download the source code and compile it, then there will be no formal support provided by Oracle.

In addition to the OCFS binaries, we also provide a collection of utilities (cp, dd, tar and textutils) that enable O_DIRECT. Using the updated tools is recommended, as they make more efficient use of the operating system in conjunction with OCFS.

Binary distributions for ia32 and ia64 can be found under each one of the supported platforms.


There are basically three rpm packages to download in order to install OCFS. Those packages are:

OCFS-Support

OCFS-Tools

OCFS Module

Before downloading the OCFS Module, make sure it is compatible with the kernel version in use (uname -a).

Installing OCFS


Installing OCFS is an easy process. After downloading the packages, issue the following command in the directory where the packages were downloaded:

# rpm -Uhv ocfs*.rpm

This will install the support tools, the actual kernel module for the filesystem and a graphical configuration tool.

Automatically mount OCFS during boot.

After installing the OCFS packages, verify that the module will be properly initialized on startup using the command:

# chkconfig --list |grep -i ocfs

If the output looks like:

ocfs 0:off 1:off 2:off 3:on 4:on 5:on 6:off

then no action is required. If the output does not show “on” for run levels 3, 4 and 5, issue the following command to enable automatic startup of OCFS during boot:

# chkconfig ocfs on
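The check above can also be scripted. The following is a minimal sketch, not part of the OCFS packages: the helper name ocfs_boot_enabled is hypothetical, and it assumes the chkconfig output has the same layout as the sample line shown above.

```shell
# Sketch: succeed (exit status 0) only if the "ocfs" line read from stdin
# shows runlevels 3, 4 and 5 all set to "on".
# ocfs_boot_enabled is a hypothetical helper name.
ocfs_boot_enabled() {
    awk '
        $1 == "ocfs" {
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")            # "3:on" -> kv[1]="3", kv[2]="on"
                if (kv[1] ~ /^[345]$/ && kv[2] == "on")
                    n++
            }
        }
        END { if (n == 3) exit 0; exit 1 }
    '
}

# Possible use on a real system (run as root):
#   chkconfig --list | ocfs_boot_enabled || chkconfig ocfs on
```

The function only inspects text, so it can be tried against a saved copy of the chkconfig output before being used on a live system.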

Configuring OCFS.

OCFS depends on a node-specific configuration file. This file is named ocfs.conf and is located in the /etc directory. It can be generated automatically using ocfstool or created manually. The following sections describe the /etc/ocfs.conf file and both methods of configuration in detail. This file is needed on every node in the cluster, and it is highly recommended to use ocfstool to configure each node.

3.2.1 - /etc/ocfs.conf file.

The /etc/ocfs.conf file can have the following parameters:

ip_address – Specify the IP Address to be used by the OCFS DLM. The server must be able to reach all nodes participating in the cluster through the interface associated with the IP Address specified in this field.

ip_port – Specify the port to be used by the OCFS DLM to communicate with the other nodes in the cluster. The port must be the same on all nodes in the cluster.

node_name – Specify the server hostname associated with the IP Address specified in the ip_address parameter.

comm_voting – Specify which voting method OCFS will use. If set to 0 (default), OCFS votes using the disk; if set to 1, it votes using the network. If OCFS is set to vote over the network and the network becomes unavailable for some reason, it will automatically (and transparently) fall back to disk. Enabling comm_voting will drastically increase performance for regular filesystem operations such as rm, mv, mkdir etc.

guid – This parameter is automatically filled by the ocfs_uid_gen utility and should never be manually changed.
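The parameters above use a simple "name = value" layout, so a file can be inspected with standard tools. The sketch below is illustrative only: the helper names conf_get and conf_missing are hypothetical, and treating node_name, ip_address and ip_port as the required set is an assumption (comm_voting has a default and guid is generated by ocfs_uid_gen).

```shell
# Sketch: print the value of one parameter from an ocfs.conf-style file.
# Usage: conf_get FILE KEY   (conf_get is a hypothetical helper name)
conf_get() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }                    # skip comment lines
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim spaces around the name
            gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim spaces around the value
            if (k == key) print v
        }
    ' "$1"
}

# Sketch: list parameters that are absent or empty in FILE.
# ASSUMPTION: these three names are the required ones.
conf_missing() {
    for key in node_name ip_address ip_port; do
        [ -n "$(conf_get "$1" "$key")" ] || echo "$key"
    done
}

# Possible use:
#   conf_get /etc/ocfs.conf ip_port
#   conf_missing /etc/ocfs.conf
```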

3.2.2 - Automatic Configuration using ocfstool.

To perform the automatic configuration with ocfstool, the GUI environment must be properly set up and enabled.

Open a GUI session as root and execute the command “ocfstool”. Make sure the DISPLAY variable is set before starting the tool.

When the ocfstool window opens (see Illustration 1), invoke the Generate Config task (see Illustration 7) by either pressing <CTRL-G> or choosing Tasks>Generate Config from the menu. For more information on the parameters, see the Generate Config item in the ocfstool section.

3.2.3 - Manual configuration

Although manual configuration is possible, Oracle strongly recommends using ocfstool, since it provides a reliable, consistent and easy way to properly configure OCFS.

To manually configure OCFS, create the file /etc/ocfs.conf, based on the sample below, on each of the nodes participating in the cluster. Make sure the parameters are consistent among nodes.

# ocfs config


# Ensure this file exists in /etc directory #

node_name = ca-test2.us.oracle.com

ip_address = 10.0.0.1

ip_port = 7000

comm_voting = 1

After the file is created, execute the utility ocfs_uid_gen with the -c argument as root to generate the unique identification key that OCFS needs to identify itself in the cluster. After the uid key is generated, the /etc/ocfs.conf file should look like:

#

# ocfs config

# Ensure this file exists in /etc#

node_name = ca-test2.us.oracle.com

ip_address = 10.0.0.1

ip_port = 7000

comm_voting = 1

guid = 9B2996991BCB25DF4CBB0003470CFE75
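Because ip_port must be the same on all nodes, it can be worth comparing the files once they have been created. The following is a rough sketch that assumes copies of each node's /etc/ocfs.conf have been collected into one place; the function name and the file names in the usage comment are hypothetical.

```shell
# Sketch: print each distinct ip_port value found in the given files.
# More than one line of output means the nodes disagree on the port.
# distinct_ports is a hypothetical helper name.
distinct_ports() {
    awk -F'=' '
        /^[ \t]*#/ { next }                    # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)               # strip blanks from the name
            gsub(/[ \t]/, "", v)               # strip blanks from the value
            if (k == "ip_port" && !(v in seen)) {
                seen[v] = 1
                print v
            }
        }
    ' "$@"
}

# Possible use (file names are examples only):
#   distinct_ports node1-ocfs.conf node2-ocfs.conf
```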

Loading ocfs.

The /etc/init.d/ocfs startup script is provided in the package and automatically loads the OCFS module if there is an entry for it in /etc/fstab. Using this startup script is the preferred method to load OCFS as it does all the verification before loading the module and mounting the partitions.

If for some reason the OCFS module still needs to be loaded manually, issue the command load_ocfs as root. If the command executes successfully, it shows a message like:

# load_ocfs

/sbin/insmod ocfs node_name=ca-test2.us.oracle.com ip_address=10.0.0.1 ip_port=7000 cs=1859 guid=9B2996991BCB25DF4CBB0003470CFE75

Using /lib/modules/2.4.9-e-enterprise-ABI/ocfs/ocfs.o
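To avoid loading the module twice, one can first check whether it is already present. The sketch below assumes the kernel lists loaded modules in /proc/modules with the module name in the first column; the helper name ocfs_loaded is hypothetical.

```shell
# Sketch: succeed if a module named "ocfs" appears in the module list.
# $1: file in /proc/modules format (defaults to /proc/modules).
# ocfs_loaded is a hypothetical helper name.
ocfs_loaded() {
    awk '$1 == "ocfs" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Possible use (run as root):
#   ocfs_loaded || load_ocfs
```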

Formatting an OCFS Partition.

As with configuring OCFS, there are two ways to format an OCFS partition. One is to use the “ocfstool” command in the GUI environment, and the other is to use the “mkfs.ocfs” command from the shell prompt. Both commands need to be executed as root.

None of the ocfs utilities will partition the disk at any time. So, before formatting, choose the utility of your preference and partition the disk according to the needs of your implementation, and make sure the disk/partition is not being used by anything else to avoid data loss.

With OCFS, a partition has to be formatted only once, on one node. After that, every node will be able to mount the filesystem, provided the device is visible on every node in the cluster.

3.4.1 - Format using ocfstool.
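As a partial safeguard before formatting, one can check that the device is not currently mounted. The sketch below is illustrative only: the helper name device_mounted is hypothetical, it assumes the usual /proc/mounts layout with the device in the first column, and it does not detect other kinds of use (for example raw device access or swap), nor mounts on other nodes.

```shell
# Sketch: succeed if DEVICE appears as a mounted device on this node.
# $1: device path, $2: mounts file (defaults to /proc/mounts).
# device_mounted is a hypothetical helper name.
device_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Possible use:
#   if device_mounted /dev/sdb1; then
#       echo "/dev/sdb1 is mounted, refusing to format" >&2
#   fi
```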

Start by invoking the ocfstool command as described in section “3.2.2 Automatic configuration using ocfstool”. When the ocfstool window appears (see Illustration 6), either press <CTRL-F> or choose Tasks>Format from the menu. Fill in all fields and click the “OK” button. For more information on the parameters, see the Format item in the ocfstool section.

3.4.2 - Format using mkfs.ocfs.

For people who do not have a GUI available, use mkfs.ocfs. If a GUI is available, the preferred method to format an OCFS partition is to use the ocfstool. The mkfs.ocfs command has the following syntax:

# mkfs.ocfs

usage: mkfs.ocfs -b block-size [-C] [-F] [-g gid] [-h] -L volume-label

-m mount-path [-n] [-p permissions] [-q] [-u uid] [-V] device

-b Block size in kilo bytes

-C Clear all data blocks

-F Force format existing OCFS volume

-g GID for the root directory

-h Help

-L Volume label

-m Path where this device will be mounted

-n Query only

-p Permissions for the root directory

-q Quiet execution

-u UID for the root directory

-V Print version and exit

When using the mkfs.ocfs command, the user has to provide all the information that the ocfstool utility prompts for.

The “-C” argument forces mkfs.ocfs to clear all blocks. Depending on the size of the partition, this may take a long time.

The “-F” argument should be used only if the partition was previously formatted as OCFS.

The “-b” argument specifies the blocksize with which the partition will be formatted. The blocksize determines the maximum size of the partition that can be mounted: supported blocksizes range from 4KB to 1MB and allow volumes from 32GB up to 8TB. A blocksize of 128KB is optimal. Smaller blocksizes carry a performance penalty, but will be useful in the future when regular files are supported. A 128KB blocksize means that every file created with content uses a minimum of 128KB of space on disk, even if there is only 1 byte of data in the file, because the filesystem allocates space in blocksize-sized chunks.

The example below shows an ordinary partition being formatted and its output.
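The space consequences of a given blocksize can be estimated with simple arithmetic. The sketch below is not an OCFS tool and both function names are hypothetical. It assumes that an empty file consumes no data block, and that the volume limit scales linearly with the blocksize, which is what the two stated endpoints (4KB for 32GB, 1MB for 8TB) suggest but the text does not state outright.

```shell
# Sketch: bytes consumed on disk by a file of SIZE bytes when space is
# allocated in whole blocks of BLOCK_KB kilobytes.
# Usage: ocfs_alloc_bytes SIZE BLOCK_KB
ocfs_alloc_bytes() {
    size=$1
    block=$(( $2 * 1024 ))
    if [ "$size" -eq 0 ]; then
        echo 0                                 # ASSUMPTION: empty file uses no block
        return
    fi
    echo $(( (size + block - 1) / block * block ))   # round up to a whole block
}

# Sketch: largest volume, in GB, for a blocksize given in KB.
# ASSUMPTION: linear scaling between the two documented endpoints.
ocfs_max_volume_gb() {
    echo $(( $1 * 8 ))
}

ocfs_alloc_bytes 1 128        # 1-byte file on a 128KB-block volume
ocfs_max_volume_gb 128        # estimated limit for 128KB blocks, in GB
```

Under these assumptions a 1-byte file on a 128KB-block volume occupies 131072 bytes, which matches the minimum described above.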

# mkfs.ocfs -F -b 128 -g dba -u oracle -L /u01 -m /u01 -p 775 /dev/sdb1

Checking heart beat on volume ......

Clearing volume header sectors...Cleared volume header sectors

Clearing node config sectors...Cleared node config sectors

Clearing publish sectors...Cleared publish sectors

Clearing vote sectors...Cleared vote sectors

Clearing bitmap sectors...Cleared bitmap sectors

Clearing data block...Cleared data block

Writing volume header...Wrote volume header

#

Mount OCFS partitions.

3.5.1 - Mounting manually.

At least the first time, it is a good idea to mount the partitions manually instead of automatically. This gives the user control over the whole process and a chance to check that everything works.

To mount the new OCFS partition, use the mount command with the “-t” argument followed by “ocfs”. The example below shows how an OCFS partition is mounted:

# mount -t ocfs /dev/sdb1 /u01

3.5.2 - Mounting automatically

To mount the OCFS partitions automatically, add the partition information to the /etc/fstab file. The example below shows entries for RedHat AS 2.1.


/dev/sdb1 /u01 ocfs _netdev

/dev/sdd1 /u02 ocfs _netdev
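The entries above can be checked mechanically. The following sketch lists OCFS entries in an fstab-format file whose options field does not include _netdev; the helper name is hypothetical, and the check assumes the standard fstab column order (device, mount point, type, options).

```shell
# Sketch: print the device of every ocfs entry that lacks the _netdev option.
# $1: fstab-format file (defaults to /etc/fstab).
# fstab_missing_netdev is a hypothetical helper name.
fstab_missing_netdev() {
    awk '
        /^[ \t]*#/ { next }                                  # skip comments
        $3 == "ocfs" && $4 !~ /(^|,)_netdev(,|$)/ { print $1 }
    ' "${1:-/etc/fstab}"
}

# Possible use:
#   fstab_missing_netdev            # no output means every ocfs entry has _netdev
```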

Tools

ocfstool

ocfstool is a GUI front end for managing and debugging OCFS volumes on the system, and it is the preferred method for managing OCFS. With it, one can mount and unmount volumes, format partitions, view information about volumes and individual files, and see the current node map and block bitmap.

After starting the tool, you are presented with a window consisting of two sections.

The top portion lists all known partitions that are OCFS formatted and allows users to mount and unmount these partitions. The mount operation will try to mount the filesystem at the mountpoint specified during the format operation. As with any other filesystem, the umount operation will only succeed if no process is using it.

The bottom portion has a series of folders that are divided by areas of management and browsing. It will show the information related to the device selected in the top portion of the screen.


The information that can be obtained is divided into:

General – Contains general information about the filesystem (see Illustration 1), such as the OCFS version that formatted the device, the mountpoint, the size of the filesystem, the number of extents, the userid and groupid with privileges on the filesystem, and the corresponding permissions.

File Listing – Shows file information about the filesystem (see Illustration 2). Selecting a file or directory shows its information, such as size, allocation unit, ownership and protection. One can also see which nodes have the partition mounted at that specific point in time.

Configured Nodes – On this folder, one can see which nodes are configured for the selected partition (see Illustration 3).

Bitmap View – On this folder, one can see the bitmap allocation for the selected partition (see Illustration 4).

Free Space – On this folder, one can see a list of free space for the selected partition (see Illustration 5). This list shows the size and the bit# of the free space.


Illustration 1

Illustration 2

Illustration 3

Illustration 4

Illustration 5

In addition to the two portions shown, there are tasks that can be reached from the menu. The available tasks on the menu are:

4.1.1 - Format.

There are two ways to invoke the format window: pressing the <CTRL+F> key sequence, or selecting Tasks>Format from the menu.

When the format window is invoked, the following options will be available:

Device – From the pull-down menu, select the device that is going to be OCFS formatted. Make sure the device is not in use by any other application or filesystem before proceeding.

Blocksize – Select the blocksize with which the partition is going to be formatted. Valid values are 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k and 1024k. The optimal size suggested by Oracle is 128k. Smaller blocksizes can be selected, but they carry some performance penalty. The blocksize also limits the size of the partition that can be formatted with OCFS (32GB to 8TB).

Volume Label – This will specify the volume label. This is useful if the user wants to use the volume label to mount the filesystem.

Mountpoint – Specify the location where the partition is going to be mounted. The mount point must exist on all nodes in the cluster that will share the device.

User – Specify the user that will own the filesystem. When the filesystem is mounted, it is automatically owned by the user specified in this field.

Group – Specify the group that the filesystem will belong to. Similar to the User field.

Protection – Sets the default permission of the filesystem when mounted. Usually set to 0755.

Clear all Data Blocks – When checked, this option makes ocfs format block by block, zeroing the entire filesystem. This option considerably increases the time necessary to format the partition. (SLOW)

Force – This option needs to be checked if the partition to be formatted was previously formatted by OCFS.


Illustration 6

4.1.2 - Generate Config.

Invoking this task will promptly generate the /etc/ocfs.conf file according to the server configuration. Mandatory fields will be automatically filled with the server information. No further steps are needed, as the process automatically generates the guid in the configuration file.