Question1. Which of the following is a monitoring solution for Hadoop?
- Sirona
- Sentry
- Slider
- Streams
Question2. ______ is a distributed machine learning framework on top of Spark
- MLlib
- Spark Streaming
- GraphX
- RDDs
Question3. Point out the correct statement?
- Knox is a stateless reverse proxy framework
- Knox also intercepts REST/HTTP calls and provides authentication
- Knox scales linearly by adding more knox nodes as the load increases
- All of the mentioned
Question4. PCollection, PTable, and PGroupedTable all support a ______operation.
- Intersection
- Union
- OR
- None of the mentioned
Question5. How many types of mode are present in Hama?
- 2
- 3
- 4
- 5
Question6. The IBM ______Platform provides all the foundational building blocks of trusted information, including data integration, data warehousing, master data management, big data and information governance.
- Infostream
- Infosphere
- Infosurface
- Infodata
Question7. ______is the name of the archive you would like to create.
- Archive
- Archive name
- Name
- None of the mentioned
Question8. Ambari provides a ______ API that enables integration with existing tools, such as Microsoft System Center.
- Restless
- Web services
- Restful
- None of the mentioned
Question9. ______ is forge software for the development of software projects.
- Oozie
- Allura
- Ambari
- All of the mentioned
Question10. The postings format now uses a ______ API when writing postings, just like doc values.
- Push
- Pull
- Read
- All of the mentioned
Question11. Point out the correct statement:
- Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene, and a C++ compiler
- PyLucene is supported on Mac OS X, Linux, Solaris and Windows
- Use of setuptools is recommended for Lucene
- All of the mentioned
Question12. ______ builds virtual machines of branches trunk and 0.3 for KVM, VMware and VirtualBox.
- Bigtop-trunk-packagetest
- Bigtop-trunk-repository
- Bigtop-VM-matrix
- None of the mentioned
Question13. ZooKeeper is used for configuration and leader election in the cloud edition of
- Solr
- Solur
- Solar101
- None of the above
Question14. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
- Keys are presented to reducer in sorted order; values for a given key are not sorted
- Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order
- Keys are presented to reducer in random order; values for a given key are not sorted
- Keys are presented to reducer in random order; values for a given key are sorted in ascending order
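The contract in Question14 can be illustrated with a small pure-Python simulation (a sketch of the semantics, not Hadoop's implementation): keys reach the reducer sorted, while values for a given key arrive in no guaranteed order.

```python
from collections import defaultdict

def shuffle_and_sort(mapper_output):
    """Group (key, value) pairs by key and present keys in sorted order.

    Mirrors the MapReduce contract: keys reach the reducer sorted,
    but values for a given key arrive in no guaranteed order.
    """
    groups = defaultdict(list)
    for key, value in mapper_output:
        groups[key].append(value)  # insertion order, i.e. unsorted
    return [(key, groups[key]) for key in sorted(groups)]

pairs = [("b", 2), ("a", 9), ("b", 1), ("a", 3)]
print(shuffle_and_sort(pairs))  # keys sorted: [('a', [9, 3]), ('b', [2, 1])]
```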
Question15. DataStage RTI is a real-time integration pack for:
- STD
- ISD
- EXD
- None of the above
Question16. Which MapReduce stage serves as a barrier, where all the previous stages must be completed before it may proceed?
- Combine
- Group (a.k.a. ‘shuffle’)
- Reduce
- Write
Question17. Which of the following formats is the most compression-aggressive?
- Partition compressed
- Record compressed
- Block compressed
- Uncompressed
Question18. ______is the way of encoding structured data in an efficient yet extensible format.
- Thrift
- Protocol buffers
- Avro
- None of the above
Question19. Which of the following arguments is not supported by the import-all-tables tool?
- Class name
- Package name
- Database name
- Table name
Question20. Which of the following operating systems is not supported by Bigtop?
- Fedora
- Solaris
- Ubuntu
- SUSE
Question21. Distributed modes are mapped in the _____ file.
- Groomservers
- Grervers
- Grsvers
- Groom
Question22. ______ is the architectural center of Hadoop that allows multiple data processing engines.
- YARN
- Hive
- Incubator
- Chukwa
Question23. Users can easily run Spark on top of Amazon's ______
- Infosphere
- EC2
- EMR
- None of the above
Question24. Which of the following projects is an interface definition language for Hadoop?
- Oozie
- Mahout
- Thrift
- Impala
Question25. Output of the mapper is first written to the local disk for the sorting and _____ process.
- Shuffling
- Secondary sorting
- Forking
- Reducing
Question26. HDT projects work with Eclipse version _____ and above
- 3.4
- 3.5
- 3.6
- 3.7
Question27. Which of the following languages is not supported by Spark?
- Java
- Pascal
- Scala
- Python
Question28. Data analytics scripts are written in ______
- Hive
- CQL
- Pig Latin
- Java
Question29. Ripple is a browser-based mobile phone emulator designed to aid in the development of ______-based mobile applications.
- JavaScript
- Java
- C++
- HTML5
Question30. If you set the inline LOB limit to ____, all large objects will be placed in external storage.
- 0
- 1
- 2
- 3
Question31. Hadoop achieves reliability by replicating the data across multiple hosts, and hence does not require _____ storage on hosts.
- RAID
- Standard RAID levels
- ZFS
- Operating system
Question32. The configuration file must be owned by the user running the ______
- Data manager
- Node manager
- Validation manager
- None of the above
Question33. ______ is a non-blocking, asynchronous, event-driven high-performance web framework
- AWS
- AWF
- AWT
- ASW
Question34. Falcon provides seamless integration with
- HCatalog
- Metastore
- HBase
- Kafka
Question35. One supported datatype that deserves special mention is:
- Money
- Counters
- Smallint
- Tinyint
Question36. ______ are Chukwa processes that actually produce data
- Collectors
- Agents
- Hbase table
- HCatalog
Question37. Which of the following Hadoop file formats is supported by Impala?
- SequenceFile
- Avro
- RCFile
- All of the above
Question38. Avro is said to be the future ______ layer of Hadoop
- RMC
- RPC
- RDC
- All of the above
Question39. ______nodes are the mechanism by which a workflow triggers the execution of a computation/processing task
- Server
- Client
- Mechanism
- Action
Question40. The ______attribute in the join node is the name of the workflow join node
- Name
- To
- Down
- All of the above
Question41. YARN commands are invoked by the _____ script
- Hive
- Bin
- Hadoop
- Home
Question42. Which of the following functions is used to read data in Pig?
- Write
- Read
- Load
- None of the above
Question43. Which of the following Hive commands is not supported by HCatalog?
- Alter index rebuild
- Create new
- Show functions
- Drop table
Question44. Apache Hadoop Development Tools is an effort undergoing incubation at
- ADF
- ASF
- HCC
- AFS
Question45. Kafka uses key-value pairs in the ______ file format for configuration
- RFC
- Avro
- Property
- None of the above
Question46. Facebook tackles big data with ______ based on Hadoop
- Project prism
- Prism
- Project big
- Project data
Question47. The default size of a block in HDFS is
- 512 bytes
- 64 MB
- 1024 KB
- None of the above
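Question47's 64 MB figure (the classic default; newer Hadoop releases default to 128 MB) determines how a file is split into blocks. A quick sketch of the arithmetic, using a hypothetical file size:

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the classic HDFS default

def num_blocks(file_size_bytes):
    """An HDFS file occupies ceil(size / block_size) blocks;
    the last block may be smaller than the block size."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)

# A hypothetical 200 MB file spans 4 blocks (3 full + 1 partial of 8 MB).
print(num_blocks(200 * 1024 * 1024))  # 4
```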
Question48. Which is the most popular NoSQL database for scalable big data stores with Hadoop?
- Hbase
- mongoDB
- Cassandra
- None of the above
Question49. A ______ can route requests to multiple Knox instances
- Collector
- Load balancer
- Comparator
- All of the above
Question50. HCatalog is installed with Hive, starting with Hive release
- 0.10.0
- 0.9.0
- 0.11.0
- 0.12.0
Question51. Table metadata in Hive is:
- Stored as metadata on the NameNode
- Stored along with the data in HDFS
- Stored in the metastore
- Stored in ZooKeeper
Question52. Avro schemas are defined with ______
- JSON
- XML
- JAVA
- All of the above
Question53. Spark was initially started by ______ at UC Berkeley's AMPLab in 2009
- Matei Zaharia
- Mahek Zaharia
- Doug Cutting
- Stonebraker
Question54. ______ rewrites data and packs rows into columns for certain time periods
- OpenTS
- OpenTSDB
- OpenTSD
- OpenDB
Question55. Which of the following phases occur simultaneously?
- Shuffle and sort
- Reduce and sort
- Shuffle and map
- All of the above
Question56. The ______ command fetches the contents of a row or a cell
- Select
- Get
- Put
- None of the above
Question57. ______ are encoded as a series of blocks
- Arrays
- Enum
- Unions
- Maps
Question58. Hive also supports custom extensions written in
- C#
- Java
- C
- C++
Question59. How many types of nodes are present in a Storm cluster?
- 1
- 2
- 3
- 4
Question60. All decision nodes must have a ______element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.
- Name
- Default
- Server
- Client
Question61. ______ is a REST API for HCatalog
- WebHCat
- Wbhcat
- Inphcat
- None of the above
Question62. Streaming supports streaming command options as well as ______ command options
- Generic
- Tool
- Library
- Task
Question63. By default, collectors listen on port
- 8008
- 8070
- 8080
- None of the above
Question64. ______ communicate with the client and handle data-related operations.
- Master server
- Region server
- Htable
- All of the above
Question65. We can declare the schema of our data either in a ______ file
- JSON
- XML
- SQL
- VB
Question66. ______ provides a Couchbase Server Hadoop connector by means of Sqoop
- Memcache
- Couchbase
- Hbase
- All of the above
Question67. Storm integrates with ______ via Apache Slider
- Scheduler
- YARN
- Compaction
- All of the above
Question68. An Avro-backed table can simply be created by using ______ in a DDL statement
- Stored as avro
- Stored as hive
- Stored as avrohive
- Stored as serd
Question69. Drill analyzes semi-structured/nested data coming from ______ applications
- RDBMS
- NoSQL
- newSQL
- none of the above
Question70. The Hadoop list includes the HBase database, the Apache Mahout ______ system, and matrix operations.
- Machine learning
- Pattern recognition
- Statistical classification
- Artificial classification
Question71. Oozie workflow jobs are directed ______graphs of actions
- Acyclical
- Cyclical
- Elliptical
- All of the above
Question72. ___ is an open source SQL query engine for Apache HBase
- Pig
- Phoenix
- Pivot
- None of the above
Question73. $ pig -x tez_local will enable _____ mode in Pig
- Mapreduce
- Tez
- Local
- None of the above
Question74. In comparison to SQL, Pig uses
- Lazy evaluation
- ETL
- Supports pipeline splits
- All of the above
Question75. For Apache ______ users, Storm utilizes the same ODBC interface
- cTAKES
- Hive
- Pig
- Oozie
Question76. If one or more actions started by the workflow job are executing when the ______ node is reached, the actions will be killed.
- Kill
- Start
- End
- Finish
Question77. Which of the following data types is supported by Hive?
- Map
- Record
- String
- Enum
Question78. HCatalog supports reading and writing files in any format for which a _____ can be written
- SerDE
- SaerDear
- Doc Sear
- All
Question79. ______ is a Python port of the Core project
- Solr
- Lucene core
- Lucy
- Pylucene
Question80. Apache Storm added open source, stream data processing to the ______ data platform
- Cloudera
- Hortonworks
- Local cloudera
- Map R
Question81. Which of the following is a spatial information system?
- Sling
- Solr
- SIS
- All of the above
Question82. ______properties can be overridden by specifying them in a job-xml file or configuration element.
- Pipe
- Decision
- Flag
- None of the above
Question83. CDH processes and controls sensitive data and facilitates:
- Multi-tenancy
- Flexibility
- Scalability
- All of the above
Question84. Avro supports ______ kinds of complex types
- 3
- 4
- 6
- 7
Question85. With ______we can store data and read it easily with various programming languages.
- Thrift
- Protocol buffers
- Avro
- None of the above
Question86. A float parameter defaults to 0.0001f, which means we can deal with 1 error every ______ rows
- 1000
- 10000
- 1 million
- None of the above
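The arithmetic behind Question86 is straightforward: an error fraction of 0.0001 means one tolerated error per 1/0.0001 = 10,000 rows. A one-line check:

```python
error_fraction = 0.0001  # the 0.0001f default from the question

# One error is tolerated every 1/fraction rows.
rows_per_error = round(1 / error_fraction)
print(rows_per_error)  # 10000
```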
Question87. The ______ data mapper framework makes it easier to use a database with Java or .NET applications
- iBix
- Helix
- iBATIS
- iBAT
Question88. ______ is the most popular high-level Java API in the Hadoop ecosystem
- Scalding
- HCatalog
- Cascalog
- Cascading
Question89. Spark includes a collection of over ______ operations for transforming data and familiar data frame APIs for manipulating semi-structured data
- 50
- 60
- 70
- 80
Question90. ZooKeeper's architecture supports high ______ through redundant services
- Flexibility
- Scalability
- Availability
- Interactivity
Question91. The Lucene ______ is pleased to announce the availability of Apache Lucene 5.0.0 and Apache Solr 5.0.0
- PMC
- RPC
- CPM
- All of the above
Question92. EC2 capacity can be increased or decreased in real time from as few as one to more than ______ virtual machines simultaneously
- 1000
- 2000
- 3000
- None of the above
Question93. HDT has been tested on ______ and Juno, and can work on Kepler as well
- Rainbow
- Indigo
- Idiavo
- Hadovo
Question94. Each Kafka partition has one server which acts as the ______
- Leader
- Follower
- Starter
- All of the above
Question95. The right number of reduces seems to be
- 0.9
- 0.8
- 0.36
- 0.95
Question96. Which of the following is a configuration management system?
- Alex
- Puppet
- Acem
- None of the above
Question97. Which of the following is used only for storage, with limited compute?
- Hot
- Cold
- Warm
- All_SSD
Question98. Groom servers start up with a ______ instance and an RPC proxy to contact the BSP master
- RPC
- BSP Peer
- LPC
- None of the above
Question99. A ______ represents a distributed, immutable collection of elements of type T.
- PCollect
- PCollection
- PCol
- All of the above
Question100. ______ is used to read data from byte buffers
- Write{}
- Read{}
- Readwrite{}
- All of the above
Q101- Which is the default InputFormat defined in Hadoop?
- SequenceFileInputFormat
- ByteInputFormat
- KeyValueInputFormat
- TextInputFormat
Q102. Which of the following is not an input format in Hadoop?
- TextInputFormat
- ByteInputFormat
- SequenceFileInputFormat
- KeyValueInputFormat
Q103. Which of the following is a valid flow in Hadoop?
- Input -> Reducer -> Mapper -> Combiner -> Output
- Input -> Mapper -> Reducer -> Combiner -> Output
- Input -> Mapper -> Combiner -> Reducer -> Output
- Input -> Reducer -> Combiner -> Mapper -> Output
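The valid flow in Q103 (Input -> Mapper -> Combiner -> Reducer -> Output) can be sketched as a toy word count in plain Python, where the combiner pre-aggregates each mapper's output before the reducer sees it (a simulation of the flow, not the Hadoop API):

```python
from collections import Counter
from itertools import chain

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Local pre-aggregation on a single mapper's output.
    return list(Counter(k for k, _ in pairs).items())

def reducer(all_pairs):
    # Final aggregation across all combined outputs.
    totals = Counter()
    for key, count in all_pairs:
        totals[key] += count
    return dict(totals)

lines = ["big data big", "data flow"]
combined = [combiner(mapper(line)) for line in lines]  # per-split combine
print(reducer(chain.from_iterable(combined)))  # {'big': 2, 'data': 2, 'flow': 1}
```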
Q104. MapReduce was devised by ...
- Apple
- Microsoft
- Samsung
- Google
Q105. Which of the following is not a phase of the Reducer?
- Map
- Reduce
- Shuffle
- Sort
Q106. How many instances of JobTracker can run on a Hadoop cluster?
- 1
- 2
- 3
- 4
Q107. Which of the following is not a daemon process that runs on a Hadoop cluster?
- JobTracker
- DataNode
- TaskTracker
- TaskNode
Q108-As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including:
- Improved data storage and information retrieval
- Improved extract, transform and load features for data integration
- Improved data warehousing functionality
- Improved security, workload management and SQL support
Q109- Point out the correct statement:
- Hadoop does need specialized hardware to process the data
- Hadoop 2.0 allows live stream processing of real-time data
- In the Hadoop programming framework, output files are divided into lines or records
- None of the mentioned
Q110- According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?
- Big data management and data mining
- Data warehousing and business intelligence
- Management of Hadoop clusters
- Collecting and storing unstructured data
Q111- Point out the wrong statement:
- Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes & petabytes of data
- Hadoop uses a programming model called "MapReduce"; all programs should conform to this model in order to work on the Hadoop platform
- The programming model, MapReduce, used by Hadoop is difficult to write and test
- All of the mentioned
Q112- What was Hadoop named after?
- Creator Doug Cutting’s favorite circus act
- Cutting’s high school rock band
- The toy elephant of Cutting’s son
- A sound Cutting’s laptop made during Hadoop’s development
Q113- All of the following accurately describe Hadoop, EXCEPT:
- Open source
- Real-time
- Java-based
- Distributed computing approach
Q114- ______can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
- MapReduce
- Mahout
- Oozie
- All of the mentioned
Q115- ______ has the world's largest Hadoop cluster.
- Apple
- Datamatics
- Facebook
- None of the mentioned
Q116- Facebook tackles big data with ______ based on Hadoop.
- ‘Project Prism’
- ‘Prism’
- ‘Project Big’
- ‘Project Data’
Q117- What is the main problem faced while reading and writing data in parallel from multiple disks?
- Processing high volume of data faster.
- Combining data from multiple disks.
- The software required to do this task is extremely costly.
- The hardware required to do this task is extremely costly.
Q118 - Under Hadoop High Availability, Fencing means
- Preventing a previously active namenode from starting to run again.
- Preventing the start of a failover in the event of network failure with the active namenode.
- Preventing the power down to the previously active namenode.
- Preventing a previously active namenode from writing to the edit log.
Q119 - The default replication factor for HDFS file system in hadoop is
- 1
- 2
- 3
- 4
Q120 - The hdfs command put is used to
- Copy files from local file system to HDFS.
- Copy files or directories from local file system to HDFS.
- Copy files from HDFS to local filesystem.
- Copy files or directories from HDFS to local filesystem.
Q121 - The namenode knows that the datanode is active using a mechanism known as
- heartbeats
- datapulse
- h-signal
- Active-pulse
Q122 - When a machine is declared as a datanode, the disk space in it
- Can be used only for HDFS storage
- Can be used for both HDFS and non-HDFS storage
- Cannot be accessed by non-hadoop commands
- cannot store text files.
Q123 - The data from a remote Hadoop cluster can
- not be read by another Hadoop cluster
- be read using http
- be read using hhtp
- be read using hftp
Q124 - Which one is not a big data feature?
- Velocity
- Veracity
- Volume
- Variety
Q125 - What is HBASE?
- HBase is a separate set of the Java API for Hadoop cluster.
- HBase is a part of the Apache Hadoop project that provides an interface for scanning large amounts of data using Hadoop infrastructure.
- HBase is a "database"-like interface to Hadoop cluster data.
- HBase is a part of the Apache Hadoop project that provides a SQL-like interface for data processing.
Q125 - Which of the following is false about RawComparator?
- Compare the keys by byte.
- Performance can be improved in the sort and shuffle phase by using RawComparator.
- Intermediary keys are deserialized to perform a comparison.
Q126 - ZooKeeper ensures that
- All the namenodes are actively serving the client requests
- Only one namenode is actively serving the client requests
- A failover is triggered when any of the datanodes fails.
- A failover can not be started by hadoop administrator.
Q127 - Which scenario demands the highest bandwidth for data transfer between nodes in Hadoop?
- Different nodes on the same rack
- Nodes on different racks in the same data center.
- Nodes in different data centers
- Data on the same node.
Q128 - The Hadoop framework is written in
- C++
- Python
- Java
- GO
Q129 - When a client contacts the namenode for accessing a file, the namenode responds with
- Size of the file requested.
- Block ID of the file requested.
- Block ID and hostname of any one of the data nodes containing that block.
- Block ID and hostname of all the data nodes containing that block.
Q130 - Which of the following is not a goal of HDFS?
- Fault detection and recovery
- Handle huge dataset
- Prevent deletion of data
- Provide high network bandwidth for data movement
Q131 - In HDFS, files cannot be
- read
- deleted
- executed
- archived
Q132 - The number of tasks a task tracker can accept depends on
- Maximum memory available in the node
- Not limited
- Number of slots configured in it
- As decided by the jobTracker
Q133 - When using HDFS, what occurs when a file is deleted from the command line?
- It is permanently deleted if trash is enabled.
- It is placed into a trash directory common to all users for that cluster.
- It is permanently deleted and the file attributes are recorded in a log file.
- It is moved into the trash directory of the user who deleted it if trash is enabled.
Q134 - The org.apache.hadoop.io.Writable interface declares which two methods? (Choose 2 answers.)
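For Q134: the org.apache.hadoop.io.Writable interface declares write(DataOutput) and readFields(DataInput). The round-trip pattern can be sketched in Python with the struct module (an illustrative analogue, not the Java API):

```python
import io
import struct

class IntWritable:
    """Python analogue of Hadoop's Writable contract:
    write() serializes to a stream, read_fields() repopulates from one."""

    def __init__(self, value=0):
        self.value = value

    def write(self, out):
        out.write(struct.pack(">i", self.value))  # big-endian int, like Java

    def read_fields(self, inp):
        (self.value,) = struct.unpack(">i", inp.read(4))

buf = io.BytesIO()
IntWritable(42).write(buf)
buf.seek(0)
w = IntWritable()
w.read_fields(buf)
print(w.value)  # 42
```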