Lab 2 – MapReduce and the Web Console

Lab objectives

In this lab you will practice what you have learned in this lesson. Specifically, you will run MapReduce jobs and learn about the BigInsights Web Console.

Lab instructions

This lab has been developed as a tutorial. Simply execute the commands provided, and analyze the results.

BigInsights should already be started before you work on this lab (refer to “Lab 0 – Setup” for instructions on starting BigInsights).

What is the WordCount job?

WordCount is an example Hadoop MapReduce program that is included with the Apache open source distribution and documentation. It counts how many times each word occurs in a set of input files.
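To make the idea concrete before running the real job, here is a minimal local sketch in Python of the three phases a WordCount job goes through: map, shuffle (partition and group), and reduce. It is not the Hadoop implementation. The function names, the sample lines, and the use of a CRC32 hash to choose a reducer are assumptions made for illustration only.

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def partition(word, num_reducers):
    """Choose a reduce task for a word.

    Assumption: a stable CRC32 hash stands in for the partitioner
    that Hadoop itself would use.
    """
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def word_count(lines, num_reducers=2):
    """Return one {word: count} dict per simulated reduce task."""
    # Shuffle: group the emitted 1s by word inside each reducer's partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for word, one in map_phase(line):
            partitions[partition(word, num_reducers)][word].append(one)
    # Reduce: sum the grouped values; keys come out sorted per reducer.
    return [{w: sum(v) for w, v in sorted(p.items())} for p in partitions]


if __name__ == "__main__":
    # Hypothetical sample input, not the lab's data files.
    sample = ["the quick brown fox", "the lazy dog the end"]
    for i, part in enumerate(word_count(sample, num_reducers=3)):
        print("reducer", i, part)
```

Each simulated reduce task produces its own result, and every word lands in exactly one of them. This is why a job that runs with several reduce tasks writes several output files rather than one.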

Running the WordCount job

1. First, we need to copy the data files to HDFS:
> hadoop fs -put /BigDataUniversity/input/statsFed/ /input

2. Now we can run the wordcount job with the following command, where “/input” is the HDFS directory that holds the input files and “output” is the directory where the output of the job will be stored.
> hadoop jar /opt/ibm/biginsights/IHC/hadoop-0.20.2-examples.jar wordcount /input output

Note:
If you are working on IBM SmartCloud Enterprise, the examples JAR that contains wordcount is in a different path, so execute this command instead:
> hadoop jar /mnt/biginsights/opt/ibm/biginsights/IHC/hadoop-0.20.2-examples.jar wordcount /input output



3. Now review the output of step 2:
> hadoop fs -ls output

You can see that the output was split into multiple files. Now view the contents of one of the files:
> hadoop fs -cat output/*00




4. You can also inspect the job further in the BigInsights Web Console. Open a web browser (e.g. Firefox) and navigate to:

Note:
If you are working on the Cloud, ensure you replace “localhost” with the appropriate IP address of your Cloud instance.


5. Then click the jobs tab at the top left of the page.

6. You should see a table containing a summary of the jobs you’ve run. Select the row with the Name “word count” (this should be the top row), then click the View Job button.

You should then be presented with a screen like the following:

The top pink section displays the general information about the job (e.g. Start/Finish time). The blue section displays a summary of all tasks run for that job. For example, you can see that 4 Map tasks and 8 Reduce tasks were run. The 8 Reduce tasks correspond to the 8 output files from Step 3.

7. Click on the Job Counters button to display details about the number of bytes read and written, the number of various types of input and output records produced by the MapReduce framework, and so on. Scroll through the pop-up window, if needed, to become familiar with the various statistical data collected.

8. Finally, click on the Job Conf… button to display information about the configuration parameters associated with this job.

Inspect the .xml file displayed in the lower pane, scrolling down as needed to review the information collected. Configuration information is provided through property elements that consist of name/value pairs. About midway through the file, you’ll find a property with a name of mapred.job.name and a value of word count. This corresponds to your job’s name.

9. Spend a few minutes exploring the logs and details available on the Job Details pages.
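The configuration file from step 8 is a list of property elements, each holding a name and a value. As a minimal sketch of that layout, the Python below reads such a file into a dictionary. The sample XML is a hypothetical two-property excerpt written for this example, not a copy of the file shown in the console. Only mapred.job.name and its value come from this lab. The mapred.reduce.tasks entry is an assumed illustration chosen to match the 8 Reduce tasks reported earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the name/value layout described in step 8.
SAMPLE_CONF = """<configuration>
  <property>
    <name>mapred.job.name</name>
    <value>word count</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>8</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict built from the <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is stored as an empty string.
            props[name] = prop.findtext("value", default="")
    return props


if __name__ == "__main__":
    props = read_properties(SAMPLE_CONF)
    print(props["mapred.job.name"])
```

Looking up a property by name this way can be quicker than scrolling through the full file when you only need one or two settings.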

Working in the Web Console

Inspecting the overall health of your system

  1. Click on the Administration tab in the upper left corner. Most administrative tasks are performed from this view.
  2. Inspect the contents of the Dashboard Summary immediately beneath the Administration tab. Note that this dashboard reports the number of nodes in your cluster as well as the number of errors and warnings reported. The final line reports on the overall status of your system. In this example, the dashboard reports a healthy status for a single-node environment.

  3. Inspect the Start Stop Summary to the right of the dashboard. This indicates whether all installed components are up and running. Note that your BigInsights environment may be healthy even if one or more optional components have been stopped, as shown in the figure below.
  4. Inspect the Server Administration pane at the bottom. Click on the drop-down list (Select View) to see the Components view. This view summarizes the installed components and their start/stop status.
  5. Select the checkbox of a component of your choice and click on the Status Details button. A pop-up window appears with information about your selected component, as shown in the following figure. (The report you see will vary depending on the component you selected and its operational status.)

Starting and stopping a component

  1. In the Components section on the lower left, highlight HBase and click the Stop button to stop the HBase server.

  2. After the console reports that HBase has been stopped, click on the Start button to start the service again.

------This is the end of this lab ------

©2011 BigDataUniversity.com