Medha Banda

Senior Hadoop Developer

PROFESSIONAL SUMMARY:

Ø  9+ years of overall IT experience, including 4+ years of comprehensive experience as an Apache Hadoop Developer.

Ø  Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie and Talend ETL.

Ø  Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce concepts.

Ø  Experience in developing different kinds of MapReduce programs using Hadoop for Big Data analysis.

Ø  Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java (a minimal mapper sketch follows this list).

Ø  Experience in importing/exporting data between Relational Database Systems and HDFS using Sqoop.

Ø  Working experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.

Ø  Experience in supporting data analysts in running Pig and Hive queries.

Ø  Experience in writing shell scripts to dump the shared data from MySQL servers to HDFS.

Ø  Experience in designing both time driven and data driven automated workflows using Oozie.

Ø  Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.

Ø  Experience in automating Hadoop installation and configuration and maintaining the cluster using tools like Puppet.

Ø  Experience in working with Flume to load log data from multiple sources directly into HDFS.

Ø  Strong debugging and problem solving skills with excellent understanding of system development methodologies, techniques and tools.

Ø  Worked across the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) in different application domains, involving technologies ranging from object-oriented technology to Internet programming on Windows NT, Linux and UNIX/Solaris platforms, and RUP methodologies.

Ø  Familiar with RDBMS concepts and worked on Oracle 8i/9i, SQL Server 7.0 and DB2 8.x/7.x.

Ø  Involved in writing shell scripts and Ant scripts on UNIX for application deployments to the production region.

Ø  Very good POC and development experience with Apache Flume, Kafka, Spark, Storm, and Scala.

Ø  Ability to articulate complex statistical concepts and identify key insights from data.

Ø  Familiar with Data Mining, machine learning and modeling.

Ø  Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.

Ø  Good working knowledge of Hue and the Hadoop ecosystem.

Ø  Good knowledge in evaluating big data analytics libraries and using Spark SQL for exploratory data analysis.
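
As context for the custom Java MapReduce work mentioned above, here is a minimal sketch of a cleansing mapper; the class name, field layout and delimiter are illustrative assumptions, not taken from any actual project code.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: drops malformed comma-delimited records and emits a cleaned,
// tab-delimited value keyed by the record's first field (e.g. an id column).
public class CleansingMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3 || fields[0].trim().isEmpty()) {
            return; // skip records that do not match the expected layout
        }
        String cleaned = fields[1].trim() + "\t" + fields[2].trim();
        context.write(new Text(fields[0].trim()), new Text(cleaned));
    }
}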

Technical Skills:

Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Impala, HBase, Kafka, Storm.
Big Data Frameworks: HDFS, YARN, Spark.
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR, EC2.
Programming Languages: Java, Shell scripting, Scala.
Databases: RDBMS, MySQL, Oracle 11g/10g, Microsoft SQL Server, Teradata, DB2, PL/SQL, Cassandra, MongoDB.
IDEs and Tools: Eclipse, NetBeans, Tableau.
Operating Systems: Windows, Linux/UNIX.
Frameworks: Spring, Hibernate, JSF, EJB, JMS.
Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python.
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss.
Methodologies: Agile, SDLC, Waterfall.
Web Services: RESTful, SOAP.
ETL Tools: Talend, Informatica.
Others: Solr, Elasticsearch.

EDUCATION:

B. Tech from Jawaharlal Nehru Technological University.

PROFESSIONAL EXPERIENCE:

Cerner Corporation, Kansas City, Missouri. May’14 – Till Date

Sr. Hadoop Developer

Description:

Cerner Corporation (CERN), a supplier of health care information technology solutions, differentiates itself from its competitors with its population health solutions.

Responsibilities:

Ø  Worked on analyzing the Hadoop cluster using different big data analytic tools, including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.

Ø  Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.

Ø  Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

Ø  Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.

Ø  Coordinated with the team that analyzed large data sets to provide strategic direction to the company.

Ø  Experienced with batch processing of data sources using Apache Spark and Elasticsearch.

Ø  Migrated MapReduce programs into Spark transformations using Spark and Scala.

Ø  Configured Sqoop and developed scripts to extract data from MySQL into HDFS.

Ø  Hands-on experience with production Hadoop applications, including development, configuration management, monitoring, debugging and performance tuning.

Ø  Translated functional and technical requirements into detailed programs running on Hadoop MapReduce and Spark.

Ø  Implemented real-time streaming of data using Spark Streaming with Kafka (a sketch follows this list).

Ø  Worked on Amazon Web Services (EC2, ELB, VPC, S3, CloudFront, IAM).

Ø  Developed software using core Java with integration of Apache Storm and Apache Kafka.

Ø  Worked on analytics dashboards that process large amounts of data.

Ø  Managing CDN on Amazon CloudFront (Origin Path: Server / S3) to improve site performance.

Ø  Experience with NoSQL databases like MongoDB and Elasticsearch.

Ø  Worked on various CMS systems like Joomla and WordPress.

Ø  Coordinated with a team that worked on concurrency collections and closure traits of Akka for the processing of PDL files.

Ø  Implemented event sourcing with Akka.

Ø  Integrated various state-of-the-art Big Data technologies into the overall architecture; experienced in designing, reviewing and optimizing data transformation processes using Hadoop and Apache Storm.

Ø  Created HBase tables to store various data formats of PII data coming from different portfolios.

Ø  Provided cluster coordination services through ZooKeeper.

Ø  Was part of the development of a high-throughput message processing system using Kafka and Spark.

Ø  Spark Streaming collects this data from Kafka in near real time and performs the necessary processing.

Ø  Installed and configured Hive, and wrote Hive UDFs in Java and Python.

Ø  Helped with the sizing and performance tuning of the HBase cluster.

Ø  Involved in the process of HBase data modeling and building efficient data structures.

Ø  Trained and mentored analysts and test teams on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.

Ø  Used different Hadoop components such as Hive, Pig and Spark.

Ø  Responsible for architecting Hadoop clusters.

Ø  Wrote shell scripts and Python scripts for job automation.

Ø  Assisted with the addition of Hadoop processing to the IT infrastructure.

Ø  Performed data analysis using Hive and Pig.
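
A minimal sketch of the Spark Streaming with Kafka pattern referenced above, written in Java against the spark-streaming-kafka-0-10 integration; the broker address, topic, group id and the per-batch count are placeholder assumptions rather than the production logic.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaSparkStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaSparkStreamingSketch");
        // 10-second micro-batches; the real interval would be tuned to the workload.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "event-consumers");            // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("events");       // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per micro-batch as a stand-in for the real processing.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}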

Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Python, Java, SQL Scripting and Linux Shell Scripting, Cloudera, Cloudera Manager, EC2, EMR, S3, AWS

BNY Mellon, Uniondale, NY Oct’12 – May’14

Sr. Hadoop Developer

Description: BNY Mellon acquired a lot of business over time, which led to duplication of processes. There were 17 different file transfer servers used to send reports and feeds from BNY Mellon to users and external systems. This project was aimed at building a single solution that reports on and monitors the 17 systems.

Responsibilities:

Ø  Involved in review of functional and non-functional requirements.

Ø  Facilitated knowledge transfer sessions.

Ø  Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a job driver sketch follows this list).

Ø  Developed MapReduce jobs for users; maintained, updated and scheduled periodic jobs, ranging from updates to recurring MapReduce jobs to creating ad hoc jobs for business users.

Ø  Imported and exported data into HDFS and Hive using Sqoop.

Ø  Experienced in defining job flows.

Ø  Created mappings from source to target in Talend.

Ø  Experienced in managing and reviewing Hadoop log files.

Ø  Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.

Ø  Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.

Ø  Loaded and transformed large sets of structured, semi-structured and unstructured data.

Ø  Responsible for managing data coming from various sources.

Ø  Gained good experience with NoSQL databases.

Ø  Supported MapReduce programs running on the cluster.

Ø  Involved in loading data from UNIX file system to HDFS.

Ø  Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Ø  Developed a custom file system plugin for Hadoop so it can access files on the Data Platform.

Ø  This plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.

Ø  Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.

Ø  Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.

Ø  Setup and benchmarked Hadoop/HBase clusters for internal use.

Ø  Gained very good business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
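
As a companion to the cleaning and pre-processing jobs referenced above, here is a minimal Java driver sketch for a map-only cleansing job; the class names and command-line layout are illustrative assumptions (the mapper shape is the one sketched in the summary section).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative driver that wires a cleansing mapper into a map-only job; input and
// output HDFS paths are passed on the command line.
public class CleansingJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "data-cleansing");
        job.setJarByClass(CleansingJobDriver.class);

        job.setMapperClass(CleansingMapper.class);
        job.setNumReduceTasks(0);               // map-only: cleaned records go straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}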

Environment: Java (JDK 1.6), Eclipse, Subversion, Hadoop, Hive, HBase, Linux, MapReduce, HDFS, Hadoop distributions from Hortonworks and Cloudera, DataStax, IBM DataStage 8.1, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.

Client: GTL Limited, India Jul’11 – Sep’12

Hadoop Developer

GTL, a Global Group Enterprise, is a leading infrastructure services company focused on telecom and power. In the telecom segment, the company provides Network Services to telecom operators, OEMs and tower companies. In the power sector, the company offers EPC services, Distribution Franchisee and Smart Grid solutions to utilities and distribution companies.

Responsibilities:

Ø  Extracted data from flat files and other RDBMS databases into the staging area and populated it into the data warehouse.

Ø  Worked on Cassandra for user behavior analysis and low-latency execution.

Ø  Developed mapping parameters and variables to support SQL override.

Ø  Used existing ETL standards to develop these mappings.

Ø  Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.

Ø  Imported and exported data into HDFS and Hive using Sqoop.

Ø  Used UDFs to implement business logic in Hadoop (a sample Hive UDF sketch follows this list).

Ø  Extracted files from Oracle and DB2 through Sqoop, placed them in HDFS and processed them.

Ø  Loaded and transformed large sets of structured, semi-structured and unstructured data.

Ø  Responsible for managing data coming from different sources.

Ø  Supported MapReduce programs running on the cluster.

Ø  Involved in loading data from UNIX file system to HDFS.

Ø  Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.

Ø  Worked on JVM performance tuning to improve MapReduce job performance.
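
A minimal sketch of the kind of Hive UDF referred to above; the class name and the normalization rule are illustrative assumptions. It uses the classic org.apache.hadoop.hive.ql.exec.UDF style and would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF that normalizes free-text codes (trim + upper-case) so that
// downstream joins and group-bys behave consistently.
public final class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}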

Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.

VJ Soft tech, Hyderabad, India Sep’09 – Jun’11

Sr. Java Developer

The project was to design and implement web applications and develop concise, clear technical design documents based on analysis of business requirements.

Responsibilities:

Ø  Involved in designing Class and Sequence diagrams with UML and Data flow diagrams.

Ø  Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio.

Ø  Worked on server-side implementation using the Struts MVC framework.

Ø  Developed JSPs with Struts custom tags and implemented JavaScript data validation.

Ø  Developed programs for accessing the database using the JDBC thin driver to execute queries, prepared statements and stored procedures, and to manipulate data in the database (a prepared-statement sketch follows this list).

Ø  Used JavaScript for validating front-end web pages.

Ø  Wrote SQL code blocks using cursors to shift records across various tables based on checks.

Ø  Wrote procedures and triggers for validating the consistency of metadata.

Ø  Used AJAX to make RESTful web service calls.

Ø  Developed Message Driven Beans for asynchronous processing of alerts.

Ø  Used IBM ClearCase for source code control and JUnit for unit testing.

Ø  Used Log4j as the logging framework.

Ø  Application versions were managed by SVN.

Ø  Followed coding and documentation standards.
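
A minimal sketch of JDBC access with the Oracle thin driver and a prepared statement, as referenced above; the connection URL, credentials, table and column names are placeholder assumptions, and try-with-resources is used here for brevity even though the original work likely targeted an older JDK.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative DAO method that looks up a single value with a parameterized query.
public class AccountDao {

    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder

    public String findAccountName(long accountId) throws Exception {
        Class.forName("oracle.jdbc.OracleDriver");            // load the thin driver
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT account_name FROM accounts WHERE account_id = ?")) {
            ps.setLong(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("account_name") : null;
            }
        }
    }
}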

Environment: Java, JSP, Struts MVC, Oracle 10G, SQL, PL/SQL, JBOSS, JUnit, SQL Developer, Ajax, MAVEN, Eclipse, SVN, Log4j, REST.

Bravo Minds, Bangalore, India Jul’07 – Sep’09

Java Developer

Bravominds is a software services company with operations in India. Since 2005, Bravominds has provided comprehensive IT solutions to clients around the world in the public and private sectors. Bravominds is focused on helping governments and enterprises leverage information technology to achieve business goals.

Responsibilities:

Ø  Used message-driven beans for asynchronous processing of alerts to the customer (a minimal sketch follows this list).

Ø  Used Struts framework to generate Forms and actions for validating the user request data.

Ø  Developed server-side validation checks using Struts validators and JavaScript validations.

Ø  Developed and implemented data validations with JSPs and Struts custom tags.

Ø  Developed applications, which access the database with JDBC to execute queries, prepared statements, and procedures.

Ø  Developed programs to manipulate the data and perform CRUD operations on request to the database.

Ø  Worked on developing Use Cases, Class Diagrams, Sequence diagrams, and Data Models.

Ø  Developed and deployed SOAP-based web services on Tomcat server.

Ø  Coded SQL, PL/SQL and views using IBM DB2 for the database.

Ø  Worked on issues while converting Java to AJAX.

Ø  Supported development of the business tier using stateless session beans.

Ø  Extensively used JDBC to access the database objects.

Ø  Used ClearCase for source code control and JUnit for unit testing.

Ø  Reviewed code and performed integrated module testing.
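
A minimal sketch of a message-driven bean for asynchronous alert processing, as referenced above; the queue name and handling logic are placeholder assumptions, and EJB 3-style annotations are used here for brevity (the original J2EE 1.4 work would have declared the bean in a deployment descriptor instead).

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Illustrative MDB that consumes alert messages from a JMS queue and processes them
// asynchronously, outside the request/response path.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destinationLookup",
                                  propertyValue = "jms/CustomerAlerts")   // placeholder queue
})
public class AlertMessageBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                // Placeholder for the real alert handling (e.g. notifying the customer).
                System.out.println("Processing alert: " + payload);
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to process alert message", e);
        }
    }
}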

Environment: Java 5, J2EE 1.4, AJAX, Struts 1.0, Web Services, SOAP, HTML, XML, JSP, JDBC, ANT, IBM, Tomcat, JUnit, DB2, Rational Rose, Eclipse Helios, CVS.