Question1. Which of the following is a monitoring solution for Hadoop?

  1. Sirona
  2. Sentry
  3. Slider
  4. Streams

Question2. ______ is a distributed machine learning framework on top of Spark.

  1. MLlib
  2. Spark Streaming
  3. GraphX
  4. RDDs

Question3. Point out the correct statement:

  1. Knox is a stateless reverse proxy framework
  2. Knox also intercepts REST/HTTP calls and provides authentication
  3. Knox scales linearly by adding more Knox nodes as the load increases
  4. All of the mentioned

Question4. PCollection, PTable, and PGroupedTable all support a ______ operation.

  1. Intersection
  2. Union
  3. OR
  4. None of the mentioned
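
PCollection, PTable, and PGroupedTable are Apache Crunch types. A minimal sketch of the union operation using Crunch's Java API (`MRPipeline`, `readTextFile`, `union`); the input and output paths are illustrative:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class UnionExample {
    public static void main(String[] args) {
        // A Crunch pipeline backed by MapReduce (paths are illustrative).
        Pipeline pipeline = new MRPipeline(UnionExample.class);
        PCollection<String> first = pipeline.readTextFile("/data/part1");
        PCollection<String> second = pipeline.readTextFile("/data/part2");
        // union is the operation shared by PCollection, PTable and PGroupedTable.
        PCollection<String> combined = first.union(second);
        pipeline.writeTextFile(combined, "/data/combined");
        pipeline.done();
    }
}
```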

Question5. How many types of mode are present in Hama?

  1. 2
  2. 3
  3. 4
  4. 5

Question6. The IBM ______ Platform provides all the foundational building blocks of trusted information, including data integration, data warehousing, master data management, big data and information governance.

  1. Infostream
  2. Infosphere
  3. Infosurface
  4. Infodata

Question7. ______ is the name of the archive you would like to create.

  1. Archive
  2. Archive name
  3. Name
  4. None of the mentioned

Question8. Ambari provides a ______ API that enables integration with existing tools, such as Microsoft System Center.

  1. Restless
  2. Web services
  3. Restful
  4. None of the mentioned

Question9. ______ is a software forge for the development of software projects.

  1. Oozie
  2. Allura
  3. Ambari
  4. All of the mentioned

Question10. The postings format now uses a ______ API when writing postings, just like doc values.

  1. Push
  2. Pull
  3. Read
  4. All of the mentioned

Question11. Point out the correct statement:

  1. Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene, and a C++ compiler
  2. PyLucene is supported on Mac OS X, Linux, Solaris and Windows
  3. Use of setuptools is recommended for PyLucene
  4. All of the mentioned

Question12. ______ builds virtual machines of branches trunk and 0.3 for KVM, VMware and VirtualBox.

  1. Bigtop-trunk-packagetest
  2. Bigtop-trunk-repository
  3. Bigtop-VM-matrix
  4. None of the mentioned

Question13. ZooKeeper is used for configuration and leader election in the cloud edition of

  1. Solr
  2. Solur
  3. Solar101
  4. None of the above

Question14. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

  1. Keys are presented to reducer in sorted order; values for a given key are not sorted
  2. Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order
  3. Keys are presented to reducer in random order; values for a given key are not sorted
  4. Keys are presented to reducer in random order; values for a given key are sorted in ascending order
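
A short Java sketch of the behavior in option 1, which is the standard contract: `reduce()` receives each key in sorted order, but the values for a key arrive with no ordering guarantee (a secondary sort is needed if value order matters). The `MaxReducer` class is illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Keys arrive at the reducer in sorted order; the Iterable of values
// for each key carries no ordering guarantee.
public class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable v : values) {    // unsorted values for this key,
            max = Math.max(max, v.get()); // so we scan them all ourselves
        }
        context.write(key, new IntWritable(max));
    }
}
```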

Question15. DataStage RTI is a real-time integration pack for:

  1. STD
  2. ISD
  3. EXD
  4. None of the above

Question16. Which MapReduce stage serves as a barrier, where all the previous stages must be completed before it may proceed?

  1. Combine
  2. Group (a.k.a. ‘shuffle’)
  3. Reduce
  4. Write

Question17. Which of the following formats is the most compression-aggressive?

  1. Partition compressed
  2. Record compressed
  3. Block compressed
  4. Uncompressed

Question18. ______ is the way of encoding structured data in an efficient yet extensible format.

  1. Thrift
  2. Protocol buffers
  3. Avro
  4. None of the above

Question19. Which of the following arguments is not supported by the import-all-tables tool?

  1. Class name
  2. Package name
  3. Database name
  4. Table name

Question20. Which of the following operating systems is not supported by Bigtop?

  1. Fedora
  2. Solaris
  3. Ubuntu
  4. SUSE

Question21. Distributed modes are mapped in the _____ file.

  1. Groomservers
  2. Grervers
  3. Grsvers
  4. Groom

Question22. ______ is the architectural center of Hadoop that allows multiple data processing engines.

  1. YARN
  2. Hive
  3. Incubator
  4. Chukwa

Question23. Users can easily run Spark on top of Amazon's ______

  1. Infosphere
  2. EC2
  3. EMR
  4. None of the above

Question24. Which of the following projects is an interface definition language for Hadoop?

  1. Oozie
  2. Mahout
  3. Thrift
  4. Impala

Question25. Output of the mapper is first written on the local disk for the sorting and ______ process.

  1. Shuffling
  2. Secondary sorting
  3. Forking
  4. Reducing

Question26. HDT projects work with Eclipse version ______ and above

  1. 3.4
  2. 3.5
  3. 3.6
  4. 3.7

Question27. Which of the following languages is not supported by Spark?

  1. Java
  2. Pascal
  3. Scala
  4. Python

Question28. Data analytics scripts are written in ______

  1. Hive
  2. CQL
  3. Pig Latin
  4. Java

Question29. Ripple is a browser-based mobile phone emulator designed to aid in the development of ______-based mobile applications.

  1. JavaScript
  2. Java
  3. C++
  4. HTML5

Question30. If you set the inline LOB limit to ____, all large objects will be placed in external storage.

  1. 0
  2. 1
  3. 2
  4. 3

Question31. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require ______ storage on hosts.

  1. RAID
  2. Standard RAID levels
  3. ZFS
  4. Operating system

Question32. The configuration file must be owned by the user running

  1. Data manager
  2. Node manager
  3. Validation manager
  4. None of the above

Question33. ______ is a non-blocking, asynchronous, event-driven, high-performance web framework

  1. AWS
  2. AWF
  3. AWT
  4. ASW

Question34. Falcon provides seamless integration with

  1. HCatalog
  2. Metastore
  3. HBase
  4. Kafka

Question35. One supported datatype that deserves special mention is:

  1. Money
  2. Counters
  3. Smallint
  4. Tinyint

Question36. ______ are Chukwa processes that actually produce data

  1. Collectors
  2. Agents
  3. HBase table
  4. HCatalog

Question37. Which of the following Hadoop file formats is supported by Impala?

  1. SequenceFile
  2. Avro
  3. RCFile
  4. All of the above

Question38. Avro is said to be the future ______ layer of Hadoop

  1. RMC
  2. RPC
  3. RDC
  4. All of the above

Question39. ______ nodes are the mechanism by which a workflow triggers the execution of a computation/processing task

  1. Server
  2. Client
  3. Mechanism
  4. Action

Question40. The ______ attribute in the join node is the name of the workflow join node

  1. Name
  2. To
  3. Down
  4. All of the above

Question41. YARN commands are invoked by the ______ script

  1. Hive
  2. Bin
  3. Hadoop
  4. Home

Question42. Which of the following functions is used to read data in Pig?

  1. Write
  2. Read
  3. Load
  4. None of the above
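
For context, LOAD is Pig's read operation. A minimal sketch using Pig's Java embedding API (`PigServer`), running in local mode; the path and alias are illustrative:

```java
import org.apache.pig.PigServer;

public class PigLoadDemo {
    public static void main(String[] args) throws Exception {
        // Start Pig in local execution mode.
        PigServer pig = new PigServer("local");
        // LOAD reads the input; PigStorage splits each line on tabs.
        pig.registerQuery("logs = LOAD '/tmp/logs.txt' USING PigStorage('\\t');");
        // Write the relation back out to verify the load.
        pig.store("logs", "/tmp/logs_out");
    }
}
```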

Question43. Which of the following Hive commands is not supported by HCatalog?

  1. Alter index rebuild
  2. Create new
  3. Show functions
  4. Drop table

Question44. Apache Hadoop Development Tools is an effort undergoing incubation at

  1. ADF
  2. ASF
  3. HCC
  4. AFS

Question45. Kafka uses key-value pairs in the ______ file format for configuration

  1. RFC
  2. Avro
  3. Property
  4. None of the above

Question46. Facebook tackles big data with ______ based on Hadoop

  1. Project prism
  2. Prism
  3. Project big
  4. Project data

Question47. The size of a block in HDFS is

  1. 512 bytes
  2. 64 MB
  3. 1024 KB
  4. None of the above

Question48. Which is the most popular NoSQL database for a scalable big data store with Hadoop?

  1. HBase
  2. MongoDB
  3. Cassandra
  4. None of the above

Question49. A ______ can route requests to multiple Knox instances

  1. Collector
  2. Load balancer
  3. Comparator
  4. All of the above

Question50. HCatalog is installed with Hive, starting with Hive release

  1. 0.10.0
  2. 0.9.0
  3. 0.11.0
  4. 0.12.0

Question51. Table metadata in Hive is:

  1. Stored as metadata on the NameNode
  2. Stored along with the data in HDFS
  3. Stored in the metastore
  4. Stored in ZooKeeper

Question52. Avro schemas are defined with ______

  1. JSON
  2. XML
  3. JAVA
  4. All of the above
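
Because Avro schemas are JSON documents, they can be parsed directly with the Avro Java library. A minimal sketch; the "User" record here is a made-up example:

```java
import org.apache.avro.Schema;

public class SchemaDemo {
    public static void main(String[] args) {
        // An Avro schema is just a JSON document; this "User" record is made up.
        String json = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}";
        Schema schema = new Schema.Parser().parse(json);
        System.out.println(schema.getName());   // User
        System.out.println(schema.getFields()); // the two declared fields
    }
}
```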

Question53. Spark was initially started by ______ at UC Berkeley’s AMPLab in 2009

  1. Matei Zaharia
  2. Mahek Zaharia
  3. Doug Cutting
  4. Stonebraker

Question54. ______ does rewrite data and pack rows into columns for certain time periods

  1. OpenTS
  2. OpenTSDB
  3. OpenTSD
  4. OpenDB

Question55. Which of the following phases occur simultaneously?

  1. Shuffle and sort
  2. Reduce and sort
  3. Shuffle and map
  4. All of the above

Question56. The ______ command fetches the contents of a row or a cell

  1. Select
  2. Get
  3. Put
  4. None of the above

Question57. ______ are encoded as a series of blocks

  1. Arrays
  2. Enum
  3. Unions
  4. Maps

Question58. Hive also supports custom extensions written in

  1. C#
  2. Java
  3. C
  4. C++

Question59. How many types of nodes are present in a Storm cluster?

  1. 1
  2. 2
  3. 3
  4. 4

Question60. All decision nodes must have a ______element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.

  1. Name
  2. Default
  3. Server
  4. Client

Question61. ______ is a REST API for HCatalog

  1. WebHCat
  2. Wbhcat
  3. Inphcat
  4. None of the above

Question62. Streaming supports streaming command options as well as ______ command options

  1. Generic
  2. Tool
  3. Library
  4. Task

Question63. By default, collectors listen on port

  1. 8008
  2. 8070
  3. 8080
  4. None of the above

Question64. ______ communicate with the client and handle data-related operations.

  1. Master server
  2. Region server
  3. HTable
  4. All of the above

Question65. We can declare the schema of our data in a ______ file

  1. JSON
  2. XML
  3. SQL
  4. VB

Question66. ______ provides a Couchbase Server Hadoop connector by means of Sqoop

  1. Memcache
  2. Couchbase
  3. HBase
  4. All of the above

Question67. Storm integrates with ______ via Apache Slider

  1. Scheduler
  2. YARN
  3. Compaction
  4. All of the above

Question68. An Avro-backed table can simply be created by using ______ in a DDL statement

  1. Stored as avro
  2. Stored as hive
  3. Stored as avrohive
  4. Stored as serde
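
As a usage sketch of option 1: in recent Hive releases (0.14 and later), a table can be declared Avro-backed with STORED AS AVRO in the DDL. Here the statement is issued through the Hive JDBC driver; the endpoint and table definition are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateAvroTable {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint; requires the hive-jdbc driver on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {
            // STORED AS AVRO lets Hive derive the Avro schema from the column definitions.
            stmt.execute("CREATE TABLE episodes (title STRING, air_date STRING) STORED AS AVRO");
        }
    }
}
```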

Question69. Drill analyzes semi-structured/nested data coming from ______ applications

  1. RDBMS
  2. NoSQL
  3. NewSQL
  4. None of the above

Question70. The Hadoop list includes the HBase database, the Apache Mahout ______ system, and matrix operations.

  1. Machine learning
  2. Pattern recognition
  3. Statistical classification
  4. Artificial classification

Question71. Oozie workflow jobs are directed ______ graphs of actions

  1. Acyclical
  2. Cyclical
  3. Elliptical
  4. All of the above

Question72. ______ is an open source SQL query engine for Apache HBase

  1. Pig
  2. Phoenix
  3. Pivot
  4. None of the above

Question73. $ pig -x tez_local will enable ______ mode in Pig

  1. Mapreduce
  2. Tez
  3. Local
  4. None of the above

Question74. In comparison to SQL, Pig uses

  1. Lazy evaluation
  2. ETL
  3. Supports pipelines splits
  4. All of the above

Question75. For Apache ______ users, Storm utilizes the same ODBC interface

  1. cTAKES
  2. Hive
  3. Pig
  4. Oozie

Question76. If one or more actions started by the workflow job are executing when the ______ node is reached, the actions will be killed.

  1. Kill
  2. Start
  3. End
  4. Finish

Question77. Which of the following data types is supported by Hive?

  1. Map
  2. Record
  3. String
  4. Enum

Question78. HCatalog supports reading and writing files in any format for which a ______ can be written

  1. SerDe
  2. SaerDear
  3. Doc Sear
  4. All

Question79. ______ is a Python port of the Core project

  1. Solr
  2. Lucene core
  3. Lucy
  4. PyLucene

Question80. Apache Storm added open source stream data processing to the ______ Data Platform

  1. Cloudera
  2. Hortonworks
  3. Local cloudera
  4. MapR

Question81. Which of the following is a spatial information system?

  1. Sling
  2. Solr
  3. SIS
  4. All of the above

Question82. ______ properties can be overridden by specifying them in a job-xml file or configuration element.

  1. Pipe
  2. Decision
  3. Flag
  4. None of the above

Question83. CDH processes and controls sensitive data and facilitates:

  1. Multi-tenancy
  2. Flexibility
  3. Scalability
  4. All of the above

Question84. Avro supports ______ kinds of complex types

  1. 3
  2. 4
  3. 6
  4. 7

Question85. With ______ we can store data and read it easily with various programming languages.

  1. Thrift
  2. Protocol buffers
  3. Avro
  4. None of the above

Question86. A float parameter defaults to 0.0001f, which means we can deal with 1 error every ______ rows

  1. 1000
  2. 10000
  3. 1 million
  4. None of the above

Question87. The ______ data mapper framework makes it easier to use a database with Java or .NET applications

  1. iBix
  2. Helix
  3. iBATIS
  4. iBAT

Question88. ______ is the most popular high-level Java API in the Hadoop ecosystem

  1. Scalding
  2. HCatalog
  3. Cascalog
  4. Cascading

Question89. Spark includes a collection of over ______ operations for transforming data and familiar data frame APIs for manipulating semi-structured data

  1. 50
  2. 60
  3. 70
  4. 80

Question90. ZooKeeper’s architecture supports high ______ through redundant services

  1. Flexibility
  2. Scalability
  3. Availability
  4. Interactivity

Question91. The Lucene ______ is pleased to announce the availability of Apache Lucene 5.0.0 and Apache Solr 5.0.0

  1. PMC
  2. RPC
  3. CPM
  4. All of the above

Question92. EC2 capacity can be increased or decreased in real time from as few as one to more than ______ virtual machines simultaneously

  1. 1000
  2. 2000
  3. 3000
  4. None of the above

Question93. HDT has been tested on ______ and Juno, and can work on Kepler as well

  1. Raibow
  2. Indigo
  3. Idiavo
  4. Hadovo

Question94. Each Kafka partition has one server which acts as the ______

  1. Leaders
  2. Followers
  3. Staters
  4. All of the above

Question95. The right number of reduces seems to be

  1. 0.9
  2. 0.8
  3. 0.36
  4. 0.95

Question96. Which of the following is a configuration management system?

  1. Alex
  2. Puppet
  3. Acem
  4. None of the above

Question97. Which of the following is used only for storage, with limited compute?

  1. Hot
  2. Cold
  3. Warm
  4. All_SSD

Question98. Groom servers start up with a ______ instance and an RPC proxy to contact the BSP master

  1. RPC
  2. BSP Peer
  3. LPC
  4. None of the above

Question99. A ______ represents a distributed, immutable collection of elements of type T.

  1. Pcollect
  2. PCollection
  3. Pcol
  4. All of the above

Question100. ______ is used to read data from byte buffers

  1. Write()
  2. Read()
  3. Readwrite()
  4. All of the above

Q101 - Which is the default InputFormat defined in Hadoop?

  1. SequenceFileInputFormat
  2. ByteInputFormat
  3. KeyValueInputFormat
  4. TextInputFormat

Q102 - Which of the following is not an input format in Hadoop?

  1. TextInputFormat
  2. ByteInputFormat
  3. SequenceFileInputFormat
  4. KeyValueInputFormat

Q103 - Which of the following is a valid flow in Hadoop?

  1. Input -> Reducer -> Mapper -> Combiner -> Output
  2. Input -> Mapper -> Reducer -> Combiner -> Output
  3. Input -> Mapper -> Combiner -> Reducer -> Output
  4. Input -> Reducer -> Combiner -> Mapper -> Output
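
A sketch of the valid flow in option 3 as a standard Hadoop driver: input feeds the mapper, an optional combiner pre-aggregates map output, and the reducer writes the final output. The `TokenizerMapper` and `SumReducer` classes are ordinary word-count examples, not Hadoop built-ins:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also reused as the combiner): sums the counts per word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setInputFormatClass(TextInputFormat.class); // also the default InputFormat (Q101)
        job.setMapperClass(TokenizerMapper.class);      // Input -> Mapper
        job.setCombinerClass(SumReducer.class);         // Mapper -> Combiner (local aggregation)
        job.setReducerClass(SumReducer.class);          // Combiner -> Reducer -> Output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```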

Q104 - MapReduce was devised by ...

  1. Apple
  2. Google
  3. Microsoft
  4. Samsung

Q105 - Which of the following is not a phase of Reducer?

  1. Map
  2. Reduce
  3. Shuffle
  4. Sort

Q106 - How many instances of JobTracker can run on a Hadoop cluster?

  1. 1
  2. 2
  3. 3
  4. 4

Q107 - Which of the following is not a daemon process that runs on a Hadoop cluster?

  1. JobTracker
  2. DataNode
  3. TaskTracker
  4. TaskNode

Q108 - As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including:

  1. Improved data storage and information retrieval
  2. Improved extract, transform and load features for data integration
  3. Improved data warehousing functionality
  4. Improved security, workload management and SQL support

Q109 - Point out the correct statement:

  1. Hadoop does need specialized hardware to process the data
  2. Hadoop 2.0 allows live stream processing of real-time data
  3. In the Hadoop programming framework, output files are divided into lines or records
  4. None of the mentioned

Q110 - According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?

  1. Big data management and data mining
  2. Data warehousing and business intelligence
  3. Management of Hadoop clusters
  4. Collecting and storing unstructured data

Q111 - Point out the wrong statement:

  1. Hadoop’s processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
  2. Hadoop uses a programming model called “MapReduce”; all the programs should conform to this model in order to work on the Hadoop platform
  3. The programming model, MapReduce, used by Hadoop is difficult to write and test
  4. All of the mentioned

Q112 - What was Hadoop named after?

  1. Creator Doug Cutting’s favorite circus act
  2. Cutting’s high school rock band
  3. The toy elephant of Cutting’s son
  4. A sound Cutting’s laptop made during Hadoop’s development

Q113 - All of the following accurately describe Hadoop, EXCEPT:

  1. Open source
  2. Real-time
  3. Java-based
  4. Distributed computing approach

Q114 - ______ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

  1. MapReduce
  2. Mahout
  3. Oozie
  4. All of the mentioned

Q115 - ______ has the world’s largest Hadoop cluster.

  1. Apple
  2. Datamatics
  3. Facebook
  4. None of the mentioned

Q116 - Facebook tackles big data with ______ based on Hadoop.

  1. ‘Project Prism’
  2. ‘Prism’
  3. ‘Project Big’
  4. ‘Project Data’

Q117 - What is the main problem faced while reading and writing data in parallel from multiple disks?

  1. Processing high volume of data faster.
  2. Combining data from multiple disks.
  3. The software required to do this task is extremely costly.
  4. The hardware required to do this task is extremely costly.

Q118 - Under Hadoop High Availability, fencing means

  1. Preventing a previously active namenode from starting to run again.
  2. Preventing the start of a failover in the event of network failure with the active namenode.
  3. Preventing the power-down of the previously active namenode.
  4. Preventing a previously active namenode from writing to the edit log.

Q119 - The default replication factor for the HDFS file system in Hadoop is

  1. 1
  2. 2
  3. 3
  4. 4

Q120 - The hdfs command put is used to

  1. Copy files from the local file system to HDFS.
  2. Copy files or directories from the local file system to HDFS.
  3. Copy files from HDFS to the local filesystem.
  4. Copy files or directories from HDFS to the local filesystem.

Q121 - The namenode knows that the datanode is active using a mechanism known as

  1. heartbeats
  2. datapulse
  3. h-signal
  4. active-pulse

Q122 - When a machine is declared as a datanode, the disk space in it

  1. Can be used only for HDFS storage
  2. Can be used for both HDFS and non-HDFS storage
  3. Cannot be accessed by non-Hadoop commands
  4. Cannot store text files

Q123 - The data from a remote Hadoop cluster can

  1. not be read by another Hadoop cluster
  2. be read using http
  3. be read using hhtp
  4. be read using hftp

Q124 - Which one is not one of the big data features?

  1. Velocity
  2. Veracity
  3. Volume
  4. Variety

Q125 - What is HBase?

  1. HBase is a separate set of Java APIs for the Hadoop cluster.
  2. HBase is a part of the Apache Hadoop project that provides an interface for scanning large amounts of data using Hadoop infrastructure.
  3. HBase is a "database"-like interface to Hadoop cluster data.
  4. HBase is a part of the Apache Hadoop project that provides a SQL-like interface for data processing.

Q125 - Which of the following is false about RawComparator?

  1. Compares the keys byte by byte.
  2. Performance can be improved in the sort and shuffle phase by using RawComparator.
  3. Intermediary keys are deserialized to perform a comparison.

Q126 - ZooKeeper ensures that

  1. All the namenodes are actively serving the client requests
  2. Only one namenode is actively serving the client requests
  3. A failover is triggered when any of the datanodes fails
  4. A failover cannot be started by the Hadoop administrator

Q127 - Which scenario demands the highest bandwidth for data transfer between nodes in Hadoop?

  1. Different nodes on the same rack
  2. Nodes on different racks in the same data center
  3. Nodes in different data centers
  4. Data on the same node

Q128 - The Hadoop framework is written in

  1. C++
  2. Python
  3. Java
  4. Go

Q129 - When a client contacts the namenode for accessing a file, the namenode responds with

  1. Size of the file requested.
  2. Block ID of the file requested.
  3. Block ID and hostname of any one of the data nodes containing that block.
  4. Block ID and hostname of all the data nodes containing that block.

Q130 - Which of the following is not a goal of HDFS?

  1. Fault detection and recovery
  2. Handle huge datasets
  3. Prevent deletion of data
  4. Provide high network bandwidth for data movement

Q131 - In HDFS the files cannot be

  1. read
  2. deleted
  3. executed
  4. archived

Q132 - The number of tasks a task tracker can accept depends on

  1. Maximum memory available in the node
  2. Not limited
  3. Number of slots configured in it
  4. As decided by the JobTracker

Q133 - When using HDFS, what occurs when a file is deleted from the command line?

  1. It is permanently deleted if trash is enabled.
  2. It is placed into a trash directory common to all users for that cluster.
  3. It is permanently deleted and the file attributes are recorded in a log file.
  4. It is moved into the trash directory of the user who deleted it if trash is enabled.

Q134 - The org.apache.hadoop.io.Writable interface declares which two methods? (Choose 2 answers.)
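
For reference, the two methods declared by `org.apache.hadoop.io.Writable` are `write(DataOutput)` and `readFields(DataInput)`. A minimal sketch of a custom type implementing both; `PointWritable` is illustrative, not a Hadoop class:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Writable declares exactly two methods: write(DataOutput) for serialization
// and readFields(DataInput) for deserialization.
public class PointWritable implements Writable {
    private int x;
    private int y;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x); // serialize the fields in a fixed order
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt(); // read them back in the same order
        y = in.readInt();
    }
}
```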