Sreekanth
Sr. Big Data/Hadoop Developer
Summary:
- Over 9 years of professional IT experience, including design and development of object-oriented, web-based enterprise applications and the Hadoop/Big Data ecosystem.
- Excellent knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Hands on experience with Hadoop ecosystem components like Hadoop Map Reduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Kafka, Spark, and Zookeeper.
- Experience in developing applications using Waterfall, RAD, Agile (XP and Scrum) and Test-Driven methodologies, and a good understanding of Service-Oriented Architecture.
- Experience running Hadoop streaming jobs to process terabytes of XML-format data using Flume and Kafka.
- Deep knowledge of and experience with Cassandra, its related tooling and performance optimization.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Excellent understanding of and hands-on experience with NoSQL databases like Cassandra, MongoDB and HBase.
- Experience with statistical software, including SAS, MATLAB, and R
- Experienced in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and core Java design patterns.
- Experience in deploying, configuring and administering application servers such as IBM WebSphere, BEA WebLogic Server, JBoss and Apache Tomcat.
- Extensive knowledge in creating PL/SQL stored procedures, packages, functions and cursors against Oracle (12c/11g) and MySQL Server.
- Experienced in preparing and executing unit test plans and unit test cases using JUnit and MRUnit.
- Good knowledge in integration of various data sources like RDBMS, spreadsheets, text files and XML files.
- Experience in developing interoperable web services and related technologies like SOAP, WSDL and UDDI, and XML technologies/tools such as JAXB, JAXP, ExtJS, XSL, XQuery and XPath.
- Good understanding of JAX-WS, JAX-RS, ETL and JAX-RPC interoperability issues.
- Experienced working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
- Experienced with build tools like Maven and Ant, and CI tools like Jenkins.
- Hands-on experience in advanced big data technologies like the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics (MLlib and R ML packages, including 0xdata's ML library H2O).
- Experienced in working with QA on Hadoop projects to develop test plans, test scripts and test environments, and to understand and resolve defects.
- Experienced in database development, ETL and reporting tools using SQL Server DTS, SQL, SSIS, SSRS, Crystal XI and SAP BO.
- Experience in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, JQuery, CSS, XML and HTML.
- Experience in using IDEs like Eclipse and Visual Studio, and experience with DBMSs like SQL Server and MySQL.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Good experience in handling different file formats like text files, Sequence files and ORC data files using different SerDes in Hive.
- Experience in optimization of MapReduce algorithms using combiners and partitioners to deliver the best results.
Technical Skills:
- Big data/Hadoop: HDFS, Map Reduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark
- NoSQL Databases: HBase, MongoDB & Cassandra
- Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
- Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, Unix Shell Scripting, Scala
- Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)
- Database: Oracle 12c/11g, MySQL, SQL Server 2016/2014
- Web/ Application Servers: WebLogic, Tomcat, JBoss
- Web Technologies: HTML5, CSS3, XML, JavaScript, JQuery, AJAX, WSDL, SOAP
- Tools and IDE: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio
Nike, Seattle, WA Aug 16 - Present
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Worked with Hadoop ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig on the Cloudera Hadoop distribution.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet, JSON and CSV formats.
- Developed Pig and Hive UDFs in Java for extended use of Pig and Hive (a minimal UDF sketch follows this list).
- Wrote Pig scripts for sorting, joining, filtering and grouping the data.
- Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Implemented partitioning, dynamic partitioning and bucketing in Hive.
- Created a Hive aggregator to update the Hive table after running the data-profiling job.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Exported the analyzed data to databases such as Teradata, MySQL and Oracle using Sqoop for visualization and to generate reports for the BI team.
- Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Used Cassandra to store the analyzed and processed data for scalability.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into HDFS and to run multiple Hive and Pig jobs.
- Developed Oozie workflows, scheduled to run monthly through a scheduler.
- Managed and reviewed Hadoop log files.
- Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in the Amazon S3 bucket.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performed data validation and transformation using Python and Hadoop streaming.
- Involved in big data requirements review meetings and partnered with business analysts to clarify specific scenarios.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
- Involved in daily meetings to discuss the development/progress and was active in making meetings more productive.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Involved in requirements gathering and test-plan creation, and constructed and executed positive/negative test cases in order to surface and resolve bugs within the QA environment.
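As referenced in the Pig/Hive UDF bullet above, the following is a minimal, illustrative sketch of a Hive UDF written in Java; the class name and the normalization it performs are hypothetical, not taken from the project code.

    // Hypothetical Hive UDF: trims and upper-cases a text value so that a
    // free-text column can be grouped consistently in HiveQL.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                         // pass Hive NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a jar, a UDF like this is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query.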
Environment: Hadoop, MapReduce, Flume, Impala, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Zookeeper, Cassandra, Teradata, MySQL, Oracle, Scala, JAVA, UNIX Shell Scripting, AWS.
Charter Communications, St. Louis, MO Jun 15 – Jul 16
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail ETL applications to Hadoop.
- Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables.
- Extracted data from different databases and copied it into HDFS using Sqoop, with expertise in using compression techniques to optimize data storage.
- Developed the technical strategy of using Apache Spark on Apache Mesos as a next-generation big data and "fast data" (streaming) platform.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Implemented Flume and the Spark framework for real-time data processing.
- Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
- Used different SerDes for converting JSON data into pipe-separated data.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work along with the Cassandra database.
- Developed a big data ingestion framework to process multi-TB data, including data quality checks and transformations, stored in efficient formats like Parquet and loaded into Amazon S3 using the Spark Scala API (a minimal batch-ingest sketch follows this list).
- Worked on cloud computing infrastructure (e.g., Amazon Web Services EC2) and considerations for scalable, distributed systems.
- Created Spark Streaming code to take the source files as input.
- Used Oozie workflows to automate all the jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Spark programs using Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
- Built analytics for structured and unstructured data and managed large data ingestion using Avro, Flume, Thrift, Kafka and Sqoop.
- Developed Pig UDFs to understand customer behavior and Pig Latin scripts for processing the data in Hadoop.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm.
- Ingested streaming data into Hadoop using Spark, the Storm framework and Scala.
- Copied data from HDFS to MongoDB using Pig/Hive/MapReduce scripts and visualized the processed streaming data in a Tableau dashboard.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
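As a companion to the ingestion-framework bullet above, here is a minimal sketch of a Spark batch job, written against Spark's Java API rather than the Scala API used on the project, that applies a simple data-quality filter and writes Parquet to S3; the paths, bucket name and column names are hypothetical.

    // Hypothetical Spark batch job: read raw delimited data, drop records missing
    // key fields, and write the cleaned result to S3 as Parquet.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class RetailIngestJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("retail-ingest")                       // illustrative app name
                    .getOrCreate();

            Dataset<Row> raw = spark.read()
                    .option("header", "true")
                    .csv("hdfs:///landing/retail/");                // hypothetical landing path

            // Basic data-quality check: keep only rows with the key columns populated.
            Dataset<Row> clean = raw.filter(
                    col("order_id").isNotNull().and(col("order_date").isNotNull()));

            clean.write()
                    .mode("overwrite")
                    .parquet("s3a://example-bucket/retail/clean/"); // hypothetical bucket

            spark.stop();
        }
    }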
Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Java (JDK 1.6), Hadoop (Cloudera), AWS S3, Tableau, Eclipse
Apria Healthcare - Keene, NH Apr 13 – May 15
Sr. Java/Hadoop Developer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API (a minimal job sketch follows this list).
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data from mainframe datasets to HDFS using Sqoop; also handled importing of data from various data sources (Oracle, DB2, Cassandra and MongoDB) to Hadoop and performed transformations using Hive and MapReduce.
- Created mock-ups using HTML and JavaScript to understand the flow of the web application.
- Integration of Cassandra with Talend and automation of jobs.
- Used the Struts framework to develop the MVC architecture and modularize the application.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in managing and reviewing Hadoop log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Utilized the Agile Scrum methodology to help manage and organize work with developers, along with regular code review sessions.
- Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Optimized the mappings using various optimization techniques and debugged existing mappings using the Debugger to test and fix them.
- Used SVN version control to maintain the different versions of the application.
- Updated maps, sessions and workflows as part of ETL changes, modified existing ETL code and documented the changes.
- Involved in coding, maintaining and administering EJB, Servlet and JSP components deployed on a WebLogic server.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Extracted meaningful data from unstructured data in the Hadoop ecosystem.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
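To illustrate the MapReduce bullet above, the sketch below shows a small job written against the MapReduce Java API; the input layout (pipe-delimited records with a status field in the fourth column) and the class names are assumptions for the example, not details from the project.

    // Hypothetical MapReduce job: counts records per status code in delimited log lines.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCount {
        public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|");
                if (fields.length > 3) {                 // assumed layout: status in column 4
                    ctx.write(new Text(fields[3]), ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-count");
            job.setJarByClass(StatusCount.class);
            job.setMapperClass(StatusMapper.class);
            job.setCombinerClass(SumReducer.class);      // combiner cuts shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }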
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, DB2, SQL Server, Oracle 11g, MYSQL, Web Logic Application Server 8.1, EJB 2.0, Struts 1.1
Wells Fargo - St. Louis, MO Nov 11 – Mar 13
Sr. Java/J2EE Developer
Responsibilities:
- Developed high-level design documents, use case documents, detailed design documents and unit test plan documents, and created use cases, class diagrams and sequence diagrams using UML.
- Extensive involvement in database design, development, and coding of stored procedures, DDL/DML statements, functions and triggers.
- Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL server.
- Developed a portlet-style user experience using Ajax and jQuery.
- Used Spring IoC for creating the beans to be injected at runtime (a minimal wiring sketch follows this list).
- Involved in Use Case Realization, Use Case Diagrams, Sequence Diagrams and Class Diagram for various modules.
- Involved in writing Ant scripts for building the web application.
- Used SVN for version control of the code and configuration files.
- Created a POJO layer to facilitate the sharing of data between the front end and the J2EE business objects.
- Used the server-side Spring framework and Hibernate for object-relational mapping of the database structure created in Oracle.
- Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
- Used the Apache Tomcat application server for application deployment in the clustered Windows environment.
- Developed web services using the Restlet API and a Restlet implementation as a RESTful framework.
- Created JUnit test suites with related test cases (includes set up and tear down) for unit testing application.
- Implemented message-driven beans to develop an asynchronous mechanism to invoke the provisioning system when a new service request is saved in the database, using JMS for this.
- Transformed XML documents using XSL.
- Used JavaScript for client-side validation and Expression Language for server-side validation.
- Created PL/SQL stored procedures and functions for the database layer by studying the required business objects and validating them with stored procedures in Oracle; also used JPA with Hibernate as the provider.
- Built a custom cross-platform architecture using Java, Spring Core/MVC and Hibernate through the Eclipse IDE.
- Involved in writing PL/SQL for the stored procedures.
- Designed UI screens using JSP, Struts tags, HTML, jQuery.
- Used JavaScript for client side validation.
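As referenced in the Spring IoC bullet above, here is a minimal sketch of container-managed bean wiring using Java-based configuration; the DAO and service names are purely illustrative and not from the project.

    // Hypothetical Spring IoC wiring: a DAO bean is defined in a @Configuration class
    // and injected into a service bean by the container at runtime.
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    public class IocExample {

        static class AccountDao { /* JDBC/Hibernate access would live here */ }

        static class AccountService {
            private final AccountDao dao;
            AccountService(AccountDao dao) { this.dao = dao; }
            boolean isWired() { return dao != null; }
        }

        @Configuration
        static class AppConfig {
            @Bean
            public AccountDao accountDao() { return new AccountDao(); }

            @Bean
            public AccountService accountService() { return new AccountService(accountDao()); }
        }

        public static void main(String[] args) {
            AnnotationConfigApplicationContext ctx =
                    new AnnotationConfigApplicationContext(AppConfig.class);
            AccountService service = ctx.getBean(AccountService.class);
            System.out.println("Service wired by the container: " + service.isWired());
            ctx.close();
        }
    }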
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, Cloudera, Linux, XML, MySQL Workbench, Java 6, Eclipse, Cassandra.
Glansa Solutions – Hyderabad, IN Jul 07 – Oct 11
Java Developer
Responsibilities:
- Developed Enterprise JavaBeans (EJB) classes to implement various business functionalities (session beans).
- Developed various end users screens using JSF, Servlet technologies and UI technologies like HTML, CSS and JavaScript.
- Performed necessary validations on each screen using AngularJS and jQuery.
- Configured the Spring configuration file to make use of the DispatcherServlet provided through Spring IoC.
- Separated secondary functionality from primary functionality using Spring AOP.
- Developed stored procedures for regular cleaning of the database.
- Prepared test cases and provided support to the QA team during UAT.
- Consumed web services for transferring data between different applications using RESTful APIs along with the Jersey API and JAX-RS (a minimal resource sketch follows this list).
- Built the application using the TDD (Test-Driven Development) approach and was involved in different phases of testing, such as unit testing.
- Responsible for fixing bugs based on the test results.
- Involved in writing SQL statements and stored procedures, handled SQL injection, and persisted data using Hibernate Sessions, Transactions and SessionFactory objects.
- Responsible for Hibernate configuration and integration of the Hibernate framework.
- Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
- Extensively used the Java Collections API, including Lists, Sets and Maps.
- Used PVCS for version control and deployed the application on the JBoss server.
- Used Jenkins to deploy the application in testing environment.
- Involved in unit testing of the application using JUnit.
- Used SharePoint for collaborative work.
- Involved in configuring JMS and JNDI in Rational Application Developer (RAD).
- Implemented Log4j to maintain system log.
- Used Spring Repository to load data from the Oracle database to implement the DAO layer.
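As referenced in the RESTful services bullet above, the sketch below shows a minimal JAX-RS resource of the kind Jersey can expose; the resource path, method and payload are hypothetical, not from the project.

    // Hypothetical JAX-RS resource: returns a JSON representation of a customer by id.
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("/customers")
    public class CustomerResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getCustomer(@PathParam("id") long id) {
            // A real implementation would delegate to the Hibernate-backed DAO layer.
            String body = "{\"id\": " + id + ", \"status\": \"ACTIVE\"}"; // placeholder payload
            return Response.ok(body).build();
        }
    }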
Environment: JDK1.5, EJB, JSF, Servlets, Html, CSS, JavaScript, AngularJS, JQuery, Spring IOC & AOP, REST, Jersey, JAX-RS, JBOSS, JUnit, Log4J, JMS, JNDI, SharePoint, RAD, JMS API.