Question1. Which of the following is a monitoring solution for Hadoop?
- Sirona
- Sentry
- Slider
- Streams
Question2. ______ is a distributed machine learning framework on top of Spark
- MLlib
- Spark Streaming
- GraphX
- RDDs
Question3. Point out the correct statement?
- Knox is a stateless reverse proxy framework
- Knox also intercepts REST/HTTP calls and provides authentication
- Knox scales linearly by adding more knox nodes as the load increases
- All of the mentioned
Question4. PCollection, PTable, and PGroupedTable all support a ______operation.
- Intersection
- Union
- OR
- None of the mentioned
Question5. How many types of mode are present in Hama?
- 2
- 3
- 4
- 5
Question6. The IBM ______Platform provides all the foundational building blocks of trusted information, including data integration, data warehousing, master data management, big data and information governance.
- Infostream
- Infosphere
- Infosurface
- Infodata
Question7. ______is the name of the archive you would like to create.
- Archive
- Archive name
- Name
- None of the mentioned
Question8. Ambari provides a ______ API that enables integration with existing tools, such as Microsoft System Center.
- Restless
- Web services
- Restful
- None of the mentioned
Question9. ______ is forge software for the development of software projects.
- Oozie
- Allura
- Ambari
- All of the mentioned
Question10. The postings format now uses a ______ API when writing postings, just like doc values.
- Push
- Pull
- Read
- All of the mentioned
Question11. Point out the correct statement:
- Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene, and a C++ compiler
- PyLucene is supported on Mac OS X, Linux, Solaris and Windows
- Use of setuptools is recommended for Lucene
- All of the mentioned
Question12. ______ builds virtual machines of branches trunk and 0.3 for KVM, VMware and VirtualBox.
- Bigtop-trunk-packagetest
- Bigtop-trunk-repository
- Bigtop-VM-matrix
- None of the mentioned
Question13. ZooKeeper is used for configuration and leader election in the cloud edition of
- Solr
- Solur
- Solar101
- None of the above
Question14. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
- Keys are presented to reducer in sorted order; values for a given key are not sorted
- Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order
- Keys are presented to reducer in random order; values for a given key are not sorted
- Keys are presented to reducer in random order; values for a given key are sorted in ascending order
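The contract in Question14 can be illustrated with a small pure-Python simulation (a sketch of the semantics, not Hadoop's implementation): keys reach the reducer sorted, while values for a given key arrive in no guaranteed order.

```python
from collections import defaultdict

def shuffle_and_sort(mapper_output):
    """Group (key, value) pairs by key and present keys in sorted order.

    Mirrors the MapReduce contract: keys reach the reducer sorted,
    but values for a given key arrive in no guaranteed order.
    """
    groups = defaultdict(list)
    for key, value in mapper_output:
        groups[key].append(value)  # insertion order, i.e. unsorted
    return [(key, groups[key]) for key in sorted(groups)]

pairs = [("b", 2), ("a", 9), ("b", 1), ("a", 3)]
print(shuffle_and_sort(pairs))  # keys sorted: [('a', [9, 3]), ('b', [2, 1])]
```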
Question15. DataStage RTI is a real-time integration pack for:
- STD
- ISD
- EXD
- None of the above
Question16. Which MapReduce stage serves as a barrier, where all the previous stages must be completed before it may proceed?
- Combine
- Group (a.k.a. ‘shuffle’)
- Reduce
- Write
Question17. Which of the following formats is the most compression-aggressive?
- Partition compressed
- Record compressed
- Block compressed
- Uncompressed
Question18. ______is the way of encoding structured data in an efficient yet extensible format.
- Thrift
- Protocol buffers
- Avro
- None of the above
Question19. Which of the following arguments is not supported by the import-all-tables tool?
- Class name
- Package name
- Database name
- Table name
Question20. Which of the following operating systems is not supported by Bigtop?
- Fedora
- Solaris
- Ubuntu
- SUSE
Question21. Distributed modes are mapped in the _____ file.
- Groomservers
- Grervers
- Grsvers
- Groom
Question22. ______ is the architectural center of Hadoop that allows multiple data processing engines.
- YARN
- Hive
- Incubator
- Chukwa
Question23. Users can easily run Spark on top of Amazon's ______
- Infosphere
- EC2
- EMR
- None of the above
Question24. Which of the following projects is an interface definition language for Hadoop?
- Oozie
- Mahout
- Thrift
- Impala
Question25. Output of the mapper is first written to the local disk for the sorting and _____ process.
- Shuffling
- Secondary sorting
- Forking
- Reducing
Question26. HDT projects work with Eclipse version _____ and above
- 3.4
- 3.5
- 3.6
- 3.7
Question27. Which of the following languages is not supported by Spark?
- Java
- Pascal
- Scala
- Python
Question28. Data analytics scripts are written in ______
- Hive
- CQL
- Pig Latin
- Java
Question29. Ripple is a browser-based mobile phone emulator designed to aid in the development of ______-based mobile applications.
- JavaScript
- Java
- C++
- HTML5
Question30. If you set the inline LOB limit to ____, all large objects will be placed in external storage.
- 0
- 1
- 2
- 3
Question31. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require _____ storage on hosts.
- RAID
- Standard RAID levels
- ZFS
- Operating system
Question32. The configuration file must be owned by the user running the ______
- Data manager
- Node manager
- Validation manager
- None of the above
Question33. ______ is a non-blocking, asynchronous, event-driven high-performance web framework
- AWS
- AWF
- AWT
- ASW
Question34. Falcon provides seamless integration with
- HCatalog
- Metastore
- HBase
- Kafka
Question35. One supported datatype that deserves special mention is:
- Money
- Counters
- Smallint
- Tinyint
Question36. ______ are Chukwa processes that actually produce data
- Collectors
- Agents
- Hbase table
- HCatalog
Question37. Which of the following Hadoop file formats is supported by Impala?
- SequenceFile
- Avro
- RCFile
- All of the above
Question38. Avro is said to be the future ______ layer of Hadoop
- RMC
- RPC
- RDC
- All of the above
Question39. ______nodes are the mechanism by which a workflow triggers the execution of a computation/processing task
- Server
- Client
- Mechanism
- Action
Question40. The ______attribute in the join node is the name of the workflow join node
- Name
- To
- Down
- All of the above
Question41. YARN commands are invoked by the _____ script
- Hive
- Bin
- Hadoop
- Home
Question42. Which of the following functions is used to read data in Pig?
- Write
- Read
- Load
- None of the above
Question43. Which of the following Hive commands is not supported by HCatalog?
- Alter index rebuild
- Create new
- Show functions
- Drop table
Question44. Apache Hadoop Development Tools is an effort undergoing incubation at
- ADF
- ASF
- HCC
- AFS
Question45. Kafka uses key-value pairs in the ______ file format for configuration
- RFC
- Avro
- Property
- None of the above
Question46. Facebook tackles big data with ______ based on Hadoop
- Project prism
- Prism
- Project big
- Project data
Question47. The default size of a block in HDFS is
- 512 bytes
- 64 MB
- 1024 KB
- None of the above
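Question47's 64 MB figure (the classic default; newer Hadoop releases default to 128 MB) determines how a file is split into blocks. A quick sketch of the arithmetic, using a hypothetical file size:

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the classic HDFS default

def num_blocks(file_size_bytes):
    """An HDFS file occupies ceil(size / block_size) blocks;
    the last block may be smaller than the block size."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)

# A hypothetical 200 MB file spans 4 blocks (3 full + 1 partial of 8 MB).
print(num_blocks(200 * 1024 * 1024))  # 4
```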
Question48. Which is the most popular NoSQL database for scalable big data stores with Hadoop?
- Hbase
- mongoDB
- Cassandra
- None of the above
Question49. A ______ can route requests to multiple Knox instances
- Collector
- Load balancer
- Comparator
- All of the above
Question50. HCatalog is installed with Hive, starting with Hive release
- 0.10.0
- 0.9.0
- 0.11.0
- 0.12.0
Question51. Table metadata in Hive is:
- Stored as metadata on the NameNode
- Stored along with the data in HDFS
- Stored in the metastore
- Stored in ZooKeeper
Question52. Avro schemas are defined with ______
- JSON
- XML
- JAVA
- All of the above
Question53. Spark was initially started by ______ at UC Berkeley's AMPLab in 2009
- Matei Zaharia
- Mahek Zaharia
- Doug Cutting
- Stonebraker
Question54. ______ rewrites data and packs rows into columns for certain time periods
- OpenTS
- OpenTSDB
- OpenTSD
- OpenDB
Question55. Which of the following phases occur simultaneously?
- Shuffle and sort
- Reduce and sort
- Shuffle and map
- All of the above
Question56. The ______ command fetches the contents of a row or a cell
- Select
- Get
- Put
- None of the above
Question57. ______ are encoded as a series of blocks
- Arrays
- Enum
- Unions
- Maps
Question58. Hive also supports custom extensions written in
- C#
- Java
- C
- C++
Question59. How many types of nodes are present in a Storm cluster?
- 1
- 2
- 3
- 4
Question60. All decision nodes must have a ______element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.
- Name
- Default
- Server
- Client
Question61. ______ is a REST API for HCatalog
- WebHCat
- Wbhcat
- Inphcat
- None of the above
Question62. Streaming supports streaming command options as well as ______ command options
- Generic
- Tool
- Library
- Task
Question63. By default, collectors listen on port
- 8008
- 8070
- 8080
- None of the above
Question64. ______ communicate with the client and handle data-related operations.
- Master server
- Region server
- Htable
- All of the above
Question65. We can declare the schema of our data either in a ______ file
- JSON
- XML
- SQL
- VB
Question66. ______ provides a Couchbase Server Hadoop connector by means of Sqoop
- Memcache
- Couchbase
- Hbase
- All of the above
Question67. Storm integrates with ______ via Apache Slider
- Scheduler
- YARN
- Compaction
- All of the above
Question68. An Avro-backed table can simply be created by using ______ in a DDL statement
- Stored as avro
- Stored as hive
- Stored as avrohive
- Stored as serd
Question69. Drill analyzes semi-structured/nested data coming from ______ applications
- RDBMS
- NoSQL
- newSQL
- none of the above
Question70. The Hadoop list includes the HBase database, the Apache Mahout ______ system, and matrix operations.
- Machine learning
- Pattern recognition
- Statistical classification
- Artificial classification
Question71. Oozie workflow jobs are directed ______graphs of actions
- Acyclical
- Cyclical
- Elliptical
- All of the above
Question72. ___ is an open source SQL query engine for Apache HBase
- Pig
- Phoenix
- Pivot
- None of the above
Question73. $ pig -x tez_local will enable _____ mode in Pig
- Mapreduce
- Tez
- Local
- None of the above
Question74. In comparison to SQL, Pig uses
- Lazy evaluation
- ETL
- Supports pipeline splits
- All of the above
Question75. For Apache ______ users, Storm utilizes the same ODBC interface
- cTAKES
- Hive
- Pig
- Oozie
Question76. If one or more actions started by the workflow job are executing when the ______ node is reached, the actions will be killed.
- Kill
- Start
- End
- Finish
Question77. Which of the following data types is supported by Hive?
- Map
- Record
- String
- Enum
Question78. HCatalog supports reading and writing files in any format for which a _____ can be written
- SerDE
- SaerDear
- Doc Sear
- All
Question79. ______ is a Python port of the Core project
- Solr
- Lucene core
- Lucy
- Pylucene
Question80. Apache Storm added open source, stream data processing to the ______ data platform
- Cloudera
- Hortonworks
- Local cloudera
- Map R
Question81. Which of the following is a spatial information system?
- Sling
- Solr
- SIS
- All of the above
Question82. ______properties can be overridden by specifying them in a job-xml file or configuration element.
- Pipe
- Decision
- Flag
- None of the above
Question83. CDH processes and controls sensitive data and facilitates:
- Multi-tenancy
- Flexibility
- Scalability
- All of the above
Question84. Avro supports ______ kinds of complex types
- 3
- 4
- 6
- 7
Question85. With ______we can store data and read it easily with various programming languages.
- Thrift
- Protocol buffers
- Avro
- None of the above
Question86. A float parameter defaults to 0.0001f, which means we can deal with 1 error every ______ rows
- 1000
- 10000
- 1 million
- None of the above
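The arithmetic behind Question86 is straightforward: an error fraction of 0.0001 means one tolerated error per 1/0.0001 = 10,000 rows. A one-line check:

```python
error_fraction = 0.0001  # the 0.0001f default from the question

# One error is tolerated every 1/fraction rows.
rows_per_error = round(1 / error_fraction)
print(rows_per_error)  # 10000
```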
Question87. The ______ data mapper framework makes it easier to use a database with Java or .NET applications
- iBix
- Helix
- iBATIS
- iBAT
Question88. ______ is the most popular high-level Java API in the Hadoop ecosystem
- Scalding
- HCatalog
- Cascalog
- Cascading
Question89. Spark includes a collection of over ______ operations for transforming data and familiar data frame APIs for manipulating semi-structured data
- 50
- 60
- 70
- 80
Question90. ZooKeeper's architecture supports high ______ through redundant services
- Flexibility
- Scalability
- Availability
- Interactivity
Question91. The Lucene ______ is pleased to announce the availability of Apache Lucene 5.0.0 and Apache Solr 5.0.0
- PMC
- RPC
- CPM
- All of the above
Question92. EC2 capacity can be increased or decreased in real time from as few as one to more than ______ virtual machines simultaneously
- 1000
- 2000
- 3000
- None of the above
Question93. HDT has been tested on ______ and Juno, and can work on Kepler as well
- Rainbow
- Indigo
- Idiavo
- Hadovo
Question94. Each Kafka partition has one server which acts as the ______
- Leader
- Follower
- Starter
- All of the above
Question95. The right number of reduces seems to be
- 0.9
- 0.8
- 0.36
- 0.95
Question96. Which of the following is a configuration management system?
- Alex
- Puppet
- Acem
- None of the above
Question97. Which of the following is used only for storage, with limited compute?
- Hot
- Cold
- Warm
- All_SSD
Question98. Groom servers start up with a ______ instance and an RPC proxy to contact the BSP master
- RPC
- BSP Peer
- LPC
- None of the above
Question99. A ______ represents a distributed, immutable collection of elements of type T.
- PCollect
- PCollection
- PCol
- All of the above
Question100. ______ is used to read data from byte buffers
- Write{}
- Read{}
- Readwrite{}
- All of the above
Q101- Which is the default InputFormat defined in Hadoop?
- SequenceFileInputFormat
- ByteInputFormat
- KeyValueInputFormat
- TextInputFormat
Q102. Which of the following is not an input format in Hadoop?
- TextInputFormat
- ByteInputFormat
- SequenceFileInputFormat
- KeyValueInputFormat
Q103. Which of the following is a valid flow in Hadoop?
- Input -> Reducer -> Mapper -> Combiner -> Output
- Input -> Mapper -> Reducer -> Combiner -> Output
- Input -> Mapper -> Combiner -> Reducer -> Output
- Input -> Reducer -> Combiner -> Mapper -> Output
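The valid flow in Q103 (Input -> Mapper -> Combiner -> Reducer -> Output) can be sketched as a toy word count in plain Python, where the combiner pre-aggregates each mapper's output before the reducer sees it (a simulation of the flow, not the Hadoop API):

```python
from collections import Counter
from itertools import chain

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Local pre-aggregation on a single mapper's output.
    return list(Counter(k for k, _ in pairs).items())

def reducer(all_pairs):
    # Final aggregation across all combined outputs.
    totals = Counter()
    for key, count in all_pairs:
        totals[key] += count
    return dict(totals)

lines = ["big data big", "data flow"]
combined = [combiner(mapper(line)) for line in lines]  # per-split combine
print(reducer(chain.from_iterable(combined)))  # {'big': 2, 'data': 2, 'flow': 1}
```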
Q104. MapReduce was devised by ...
- Apple
- Microsoft
- Samsung
- Google
Q105. Which of the following is not a phase of the Reducer?
- Map
- Reduce
- Shuffle
- Sort
Q106. How many instances of JobTracker can run on a Hadoop cluster?
- 1
- 2
- 3
- 4
Q107. Which of the following is not a daemon process that runs on a Hadoop cluster?
- JobTracker
- DataNode
- TaskTracker
- TaskNode
Q108-As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including:
- Improved data storage and information retrieval
- Improved extract, transform and load features for data integration
- Improved data warehousing functionality
- Improved security, workload management and SQL support
Q109- Point out the correct statement:
- Hadoop does need specialized hardware to process the data
- Hadoop 2.0 allows live stream processing of real-time data
- In the Hadoop programming framework, output files are divided into lines or records
- None of the mentioned
Q110- According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
- Big data management and data mining
- Data warehousing and business intelligence
- Management of Hadoop clusters
- Collecting and storing unstructured data
Q111- Point out the wrong statement:
- Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
- Hadoop uses a programming model called "MapReduce"; all programs should conform to this model in order to work on the Hadoop platform
- The programming model, MapReduce, used by Hadoop is difficult to write and test
- All of the mentioned
Q112- What was Hadoop named after?
- Creator Doug Cutting’s favorite circus act
- Cutting’s high school rock band
- The toy elephant of Cutting’s son
- A sound Cutting’s laptop made during Hadoop’s development
Q113- All of the following accurately describe Hadoop, EXCEPT:
- Open source
- Real-time
- Java-based
- Distributed computing approach
Q114- ______can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
- MapReduce
- Mahout
- Oozie
- All of the mentioned
Q115- ______ has the world's largest Hadoop cluster.
- Apple
- Datamatics
- Facebook
- None of the mentioned
Q116- Facebook tackles big data with ______ based on Hadoop.
- ‘Project Prism’
- ‘Prism’
- ‘Project Big’
- ‘Project Data’
Q117- What is the main problem faced while reading and writing data in parallel from multiple disks?
- Processing high volume of data faster.
- Combining data from multiple disks.
- The software required to do this task is extremely costly.
- The hardware required to do this task is extremely costly.
Q118 - Under Hadoop High Availability, Fencing means
- Preventing a previously active namenode from starting to run again.
- Preventing the start of a failover in the event of network failure with the active namenode.
- Preventing the power down to the previously active namenode.
- Preventing a previously active namenode from writing to the edit log.
Q119 - The default replication factor for HDFS file system in hadoop is
- 1
- 2
- 3
- 4
Q120 - The hdfs command put is used to
- Copy files from local file system to HDFS.
- Copy files or directories from local file system to HDFS.
- Copy files from HDFS to local filesystem.
- Copy files or directories from HDFS to local filesystem.
Q121 - The namenode knows that the datanode is active using a mechanism known as
- heartbeats
- datapulse
- h-signal
- Active-pulse
Q122 - When a machine is declared as a datanode, the disk space in it
- Can be used only for HDFS storage
- Can be used for both HDFS and non-HDFS storage
- Cannot be accessed by non-hadoop commands
- cannot store text files.
Q123 - The data from a remote Hadoop cluster can
- not be read by another Hadoop cluster
- be read using http
- be read using hhtp
- be read using hftp
Q124 - Which one is not a big data feature?
- Velocity
- Veracity
- Volume
- Variety
Q125 - What is HBASE?
- HBase is a separate set of the Java API for Hadoop cluster.
- HBase is a part of the Apache Hadoop project that provides an interface for scanning large amounts of data using Hadoop infrastructure.
- HBase is a "database"-like interface to Hadoop cluster data.
- HBase is a part of the Apache Hadoop project that provides a SQL-like interface for data processing.
Q125 - Which of the following is false about RawComparator?
- Compare the keys by byte.
- Performance can be improved in the sort and shuffle phase by using RawComparator.
- Intermediary keys are deserialized to perform a comparison.
Q126 - ZooKeeper ensures that
- All the namenodes are actively serving the client requests
- Only one namenode is actively serving the client requests
- A failover is triggered when any of the datanodes fails.
- A failover can not be started by hadoop administrator.
Q127 - Which scenario demands the highest bandwidth for data transfer between nodes in Hadoop?
- Different nodes on the same rack
- Nodes on different racks in the same data center.
- Nodes in different data centers
- Data on the same node.
Q128 - The Hadoop framework is written in
- C++
- Python
- Java
- GO
Q129 - When a client contacts the namenode for accessing a file, the namenode responds with
- Size of the file requested.
- Block ID of the file requested.
- Block ID and hostname of any one of the data nodes containing that block.
- Block ID and hostname of all the data nodes containing that block.
Q130 - Which of the following is not a goal of HDFS?
- Fault detection and recovery
- Handle huge dataset
- Prevent deletion of data
- Provide high network bandwidth for data movement
Q131 - In HDFS, files cannot be
- read
- deleted
- executed
- archived
Q132 - The number of tasks a task tracker can accept depends on
- Maximum memory available in the node
- Not limited
- Number of slots configured in it
- As decided by the jobTracker
Q133 - When using HDFS, what occurs when a file is deleted from the command line?
- It is permanently deleted if trash is enabled.
- It is placed into a trash directory common to all users for that cluster.
- It is permanently deleted and the file attributes are recorded in a log file.
- It is moved into the trash directory of the user who deleted it if trash is enabled.
Q134 - The org.apache.hadoop.io.Writable interface declares which two methods? (Choose 2 answers.)
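For Q134: the org.apache.hadoop.io.Writable interface declares write(DataOutput) and readFields(DataInput). The round-trip pattern can be sketched in Python with the struct module (an illustrative analogue, not the Java API):

```python
import io
import struct

class IntWritable:
    """Python analogue of Hadoop's Writable contract:
    write() serializes to a stream, read_fields() repopulates from one."""

    def __init__(self, value=0):
        self.value = value

    def write(self, out):
        out.write(struct.pack(">i", self.value))  # big-endian int, like Java

    def read_fields(self, inp):
        (self.value,) = struct.unpack(">i", inp.read(4))

buf = io.BytesIO()
IntWritable(42).write(buf)
buf.seek(0)
w = IntWritable()
w.read_fields(buf)
print(w.value)  # 42
```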