HARSHA

Sr. Hadoop Developer

PROFESSIONAL SUMMARY:

  • Adept and experienced Hadoop developer with over 9 years of experience in software development and 5 years of proficiency in the Hadoop ecosystem and Big Data systems.
  • In-depth experience and solid working knowledge of HDFS, MapReduce, Hive, Pig, Sqoop, YARN/MRv2, Spark, Kafka, Impala, HBase and Oozie.
  • Currently working on Spark and Spark Streaming frameworks extensively using Scala as the main programming language.
  • Used Spark DataFrames, Spark SQL and the RDD API of Spark for performing various data transformations and building datasets (see the sketch after this list).
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Has strong fundamental understanding of distributed computing and distributed storage concepts for highly scalable data engineering.
  • Worked with Pig and Hive and developed custom UDFs for building various datasets.
  • Worked on MapReduce framework using Java programming language extensively.
  • Strong experience in troubleshooting and performance tuning of Spark, MapReduce and Hive applications.
  • Worked extensively with clickstream data to derive visitor behavioral patterns and enable the data science team to run various predictive models.
  • Worked on NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration.
  • Extensively worked on data migrations from diverse databases into HDFS and Hive using Sqoop.
  • Implemented dynamic partitions and buckets in Hive for efficient data access.
  • Significant experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Strong expertise in Unix shell script programming.
  • Expertise in creating shell scripts, regular expressions and cron automation.
  • Skilled in visualizing data using Tableau, QlikView, MicroStrategy and MS Excel.
  • Exposure to Mesos and Zookeeper cluster environments for application deployments and Docker containers.
  • Knowledge of Enterprise Data Warehouse (EDW) architecture and various data modeling concepts like star schema, snowflake schema and Teradata.
  • Highly proficient in Scala programming.
  • Experience with web technologies including HTML, CSS, JavaScript, Ajax, JSON and frameworks like J2EE, AngularJS and Spring.
  • Good knowledge of REST web services, SOAP programming, WSDL, XML parsers like SAX and DOM, AngularJS, and responsive design/Bootstrap.
  • Acquaintance with Agile and Waterfall methodologies. Handled several client-facing meetings with strong communication skills.
  • Good experience in customer support roles, including training and resolving production issues based on priority.
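
The following is a minimal, illustrative sketch of the Spark DataFrame / Spark SQL / RDD usage described above (Spark 1.x style); the object name, table, column names and output path are hypothetical and not taken from any specific project:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Illustrative sketch only: build a dataset from a Hive table using DataFrames,
// Spark SQL and the RDD API. Table, column and path names are hypothetical.
object ClickstreamRollup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClickstreamRollup"))
    val hc = new HiveContext(sc)

    // DataFrame over a Hive table, filtered and aggregated
    val clicks = hc.table("web.click_events")
    val pageViews = clicks
      .filter(clicks("event_type") === "page_view")
      .groupBy("visitor_id")
      .count()
      .withColumnRenamed("count", "views")

    // Expose the result to Spark SQL and query it
    pageViews.registerTempTable("page_views")
    val top = hc.sql("SELECT visitor_id, views FROM page_views ORDER BY views DESC LIMIT 100")

    // Drop down to the RDD API for a custom transformation and write out
    top.rdd
      .map(r => s"${r.getString(0)}\t${r.getLong(1)}")
      .saveAsTextFile("/tmp/top_visitors")

    sc.stop()
  }
}
```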

TECHNICAL SKILLS:

Hadoop Ecosystems / HDFS, MapReduce, Pig, Hive, Sqoop, Flume, YARN, Oozie, Zookeeper, Impala, Spark, Spark SQL, Spark Streaming, Storm, HUE, SOLR
Languages / C, C++, Java, Scala, Python, Swift, C#, SQL, PL/SQL
Frameworks / J2EE, Spring, Hibernate, AngularJS
Web Technologies / HTML, CSS, JavaScript, jQuery, Ajax, XML, WSDL, SOAP, REST API
NoSQL / HBase, Cassandra, MongoDB
Security / Kerberos, OAuth
Cluster Management and Monitoring / Cloudera Manager, Hortonworks Ambari, Apache Mesos
Relational Databases / Oracle 11g, MySQL, SQL Server, Teradata
Development Tools / Eclipse, NetBeans, Visual Studio, IntelliJ IDEA, Xcode
Build Tools / ANT, Maven, sbt, Jenkins
Application Server / Tomcat 6.0, WebSphere 7.0
Business Intelligence Tools / Tableau, Informatica, Splunk, QlikView
Version Control / GitHub, Bitbucket, SVN

EDUCATION DETAILS:

Bachelor of Engineering, 2003 – 2007, Osmania University

PROFESSIONAL EXPERIENCE:

Client : Wells Fargo Nov 2015 – Present

Location : St. Louis, MO

Role : Sr. Hadoop Developer

Project Description: Wells Fargo & Company is an international banking and financial services holding company. The project involved analysis of huge datasets for utilization and optimization across the company's banking sector. As part of the digital marketing group, we analyzed data from all transactions to identify plans and services profitable to both customers and the company, which assists in determining new strategic plans for the market.

Responsibilities:

  • Ingested clickstream data from FTP servers and S3 buckets using custom input adaptors.
  • Designed and developed Spark jobs to enrich the click stream data.
  • Implemented Spark jobs using Scala and used Spark SQL to access Hive tables in Spark for faster data processing.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Worked with the data science team to gather requirements for data mining projects.
  • Developed a Kafka producer and a Spark Streaming consumer for working with live clickstream feeds (see the consumer sketch after this list).
  • Worked on different file formats (PARQUET, TEXTFILE) and different compression codecs (GZIP, SNAPPY, LZO).
  • Wrote complex Hive queries involving external, dynamically partitioned Hive tables that store rolling-window user viewing history.
  • Worked with the data science team to build various predictive models with Spark MLlib.
  • Experience in troubleshooting various Spark applications using spark-shell and spark-submit.
  • Good experience in writing MapReduce programs in Java on the MRv2/YARN environment.
  • Developed Java code to generate, compare and merge Avro schema files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Designed and developed External and Managed Hive Tables with data formats such as Text, Avro, Sequence File, RC, ORC, Parquet.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Implemented Sqoop jobs to perform import / incremental import of data from relational tables into Hadoop in formats such as Text, Avro and Sequence into Hive tables.
  • Good hands-on experience in writing HQL statements as per requirements.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive table.
  • Used cloud computing on a multi-node cluster, deployed Hadoop applications on AWS S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Responsible for analysis, design and testing phases and for documenting technical specifications.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
  • Used Impala and Tableau to create various reporting dashboards.
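
A minimal, illustrative sketch of the Spark Streaming consumer side of the Kafka pipeline referenced above (Spark 1.x direct-stream API); the broker list, topic name, record layout and output path are hypothetical:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Illustrative sketch only: consume clickstream events from Kafka with the
// direct-stream API and aggregate per visitor. All names and paths are hypothetical.
object ClickstreamConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamConsumer")
    val ssc  = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, String]("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val topics      = Set("clickstream")

    // Direct stream: one RDD partition per Kafka partition, offsets tracked by Spark
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Count page views per visitor within each 30-second batch
    // (assumes the first comma-separated field of each record is the visitor id)
    val views = messages
      .map { case (_, value) => (value.split(",")(0), 1L) }
      .reduceByKey(_ + _)

    views.saveAsTextFiles("/data/clickstream/visitor_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```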

Environment: Spark, Hive, Impala, Sqoop, HBase, Tableau, Scala, Talend, Eclipse, YARN, Oozie, Java, Cloudera Distro, Kerberos.

Client : USAA Aug 2014 – Oct 2015

Location : San Antonio, TX

Role : Sr. Hadoop Developer

Project Description: United Services Automobile Association (USAA) is a Texas-based diversified financial services group of companies, comprising a reciprocal inter-insurance exchange regulated by the Department of Insurance and subsidiaries offering banking, investing and insurance products. The objective of the project was to migrate all data warehouse data to the Hadoop platform and perform ETL transformations.

Responsibilities:

  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Issued SQL queries via Impala to process the data stored in HBase and HDFS.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).
  • Used FastLoad for loading data into empty tables.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Good experience with AWS for accessing Hadoop cluster components.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers and Kafka brokers.
  • Implemented modules using core Java APIs and Java collections, and integrated the modules.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Used Oozie and Zookeeper operational services for coordinating the cluster and scheduling workflows.
  • Knowledge of developing a NiFi flow prototype for data ingestion into HDFS.
  • Created and maintained Technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Developed custom InputFormats in MapReduce jobs to handle custom file formats and convert them into key-value pairs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
  • Developed Sqoop import Scripts for importing reference data from Netezza.
  • Used Shell scripting for Jenkins job automation with Talend.
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation.
  • Worked with the Data Governance team to ensure metadata management and best practices.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Provided cluster coordination services through Zookeeper.
  • Worked with BI teams in generating reports and designing ETL workflows on Tableau.
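
A minimal, illustrative sketch of loading data to and from Cassandra with the Spark-Cassandra Connector, as referenced above; the keyspace, table, column names and connection host are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Illustrative sketch only: read a Cassandra table into an RDD, aggregate it,
// and write the results back to Cassandra. All names are hypothetical.
object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")
      .set("spark.cassandra.connection.host", "cassandra-node1") // placeholder host
    val sc = new SparkContext(conf)

    // Read a Cassandra table into an RDD of CassandraRow
    val events = sc.cassandraTable("policy_ks", "policy_events")

    // Count events per policy
    val countsByPolicy = events
      .map(row => (row.getString("policy_id"), 1L))
      .reduceByKey(_ + _)

    // Write the aggregates back to another Cassandra table
    countsByPolicy.saveToCassandra("policy_ks", "policy_event_counts",
      SomeColumns("policy_id", "event_count"))

    sc.stop()
  }
}
```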

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Maven, Git, Jenkins, Eclipse, Oozie, Sqoop, Flume, SOLR, NiFi, OAuth, Teradata, FastLoad, MultiLoad, Netezza, Zookeeper.

Client : Cerner Corporation Aug 2013 – July 2014

Location : Kansas City, MO

Role : Hadoop Developer

Project Description: Cerner Corporation is an American supplier of health information technology (HIT) solutions, services, devices and hardware. Cerner operates some of the world’s largest health informatics properties, mediating petabyte-scale health data.

Responsibilities:

  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Involved in planning and implementation of an additional 10-node Hadoop cluster for data warehousing, historical data storage in HBase and sampling reports.
  • Used Sqoop extensively to import data from RDBMS sources into HDFS.
  • Performed Data transformations, cleaning, and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
  • Developed Pig UDFs to pre-process data for analysis.
  • Worked with business teams and created Hive queries for ad hoc processing.
  • Responsible for creating Hive tables, partitions, loading data and writing Hive queries.
  • Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
  • Worked on Oozie to automate job flows.
  • Handled Avro and JSON data in Hive using Hive SerDes.
  • Integrated Elasticsearch and implemented dynamic faceted search.
  • Created MapReduce programs to handle semi-structured and unstructured data like XML, JSON and Avro data files and sequence files for log files.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked in an Agile environment.
  • Communicated effectively and made sure the business problem was solved.
  • Created files and tuned SQL queries in Hive utilizing HUE.
  • Created Hive external tables using the Accumulo connector.
  • Generated summary reports utilizing Hive and Pig and exported the results via Sqoop for business reporting and intelligence analysis.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Java, Eclipse, SQL Server, Apache Flume, Shell Scripting, Zookeeper.

Client : Acxiom Nov 2012 – July 2013

Location : Conway, AR

Role : Hadoop Developer

Project Description: Acxiom Corporation provides marketing technology and information management services, including multichannel marketing, addressable advertising and database management. The company wanted to retire its legacy SQL Server database due to an increasing customer base and growing data.

Responsibilities:

  • Developed complex MapReduce jobs in Java to perform data extraction, aggregation, transformation and performed rule checks on multiple file formats like XML, JSON, CSV.
  • Implemented schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted to the cluster.
  • Used Sqoop to import and export the data from HDFS.
  • Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.
  • Participated with the admin team in designing and migrating the cluster from CDH to HDP.
  • Developed helper classes to abstract the Cassandra cluster connection, acting as a core toolkit.
  • Involved in Agile methodologies, daily Scrum meetings and Sprint planning.
  • Wrote query mappers and JUnit test cases; experience with MQ.
  • Created dashboards in Tableau to create meaningful metrics for decision making.
  • Involved in designing the next generation data architecture for the unstructured and semi structured data.
  • Worked with the team that analyzes system failures, identifying root causes and taking necessary action.

Environment: HDFS, MapReduce, Cassandra, Pig, Hive, Sqoop, Maven, Log4j, JUnit, Tableau

Client : Polaris Jan 2009 – Oct 2012

Location : Chennai, India

Role : Java Developer

Project Description: Polaris provides financial technology products, legacy modernization services and consulting for core banking, corporate banking, wealth and asset management, and insurance. This is an FPX maintenance system for handling all internal processes such as users, role-based access and merchants, along with their module-wise charges and all reporting.

Responsibilities:

  • Involved in client meetings to gather the system requirements.
  • Generated use case, class and sequence diagrams using Rational Rose.
  • Wrote JavaScript, HTML, CSS, Servlets and JSP for designing the GUI of the application.
  • Strong hands-on knowledge of core Java, web-based applications and OOP concepts.
  • Developed the application using Agile/Scrum methodology which involves daily stand ups.
  • Developed server-side components using Spring, Hibernate, Servlets/JSP and multi-threading.
  • Extensively worked with the retrieval and manipulation of data from the Oracle database by writing queries using SQL and PL/SQL.
  • Implemented Persistence layer using Hibernate that uses the POJO’s to represent the persistence database.
  • Used JDBC to connect the J2EE server with the relational database.
  • Involved in development of RESTful web services using JAX-RS in a Spring-based project.
  • Developed the web application by setting up the environment and configuring the application and the WebLogic Application Server.
  • Implemented back-end services using Spring annotations to retrieve user data from the database.
  • Involved in writing AJAX scripts so that requests are processed quickly.
  • Used the dependency injection and AOP features of Spring.
  • Implemented unit test cases using the JUnit framework.
  • Identified issues in the production system and provided the information to the app support team.
  • Involved in bug triage meetings with the QA and UAT teams.

Environment: Spring, Hibernate, CSS, AJAX, HTML, JavaScript, Rational Rose, UML, JUnit, Servlets, JDBC, RESTful API, JSF, JSP, Oracle, SQL, PL/SQL.

Client : Ideas April 2008 – Dec 2008

Location : Bangalore, India

Role : Java Developer

Project Description: Ideas is a commercial credit and invoice management hub providing cash flow visibility to both businesses and lenders.

Responsibilities:

  • Involved in various SDLC phases like Requirements gathering and analysis, Design, Development and Testing.
  • Developed the business methods as per the IBM Rational Rose UML Model.
  • Extensively used Core Java, Servlets, JSP and XML.
  • Used HQL, native SQL and Criteria queries to retrieve data from the database.
  • Understood new CRs and service requests, provided development time estimates, and designed the database according to the business requirements.
  • Wrote client-side and server-side validations.
  • Wrote JSPs, Spring controllers, DAO and service classes, and implemented business logic and CRUD screens.
  • Used AJAX for a faster, more interactive front end.
  • Designed and implemented the architecture for the project using OOAD, UML design patterns.
  • Worked with the Testing team in creating new test cases and created the use cases for the module before the testing phase.
  • Provided support to resolve performance testing issues, profiling and cache mechanisms.
  • Developed DAO classes using Hibernate framework for persistence management and involved in integrating the frameworks for the project.
  • Worked with Rational Application Developer (RAD) as the development environment.
  • Designed error logging flow and error handling flow.
  • Used the Apache log4j framework for logging.
  • Followed Scrum development cycle for streamline processing with iterative and incremental development.
  • Performed code reviews to ensure consistency with style standards and code quality.

Environment: Java, Spring MVC, Hibernate, Oracle, Java Script, jQuery, AJAX, Rational Application Developer (RAD), log4j, HTML, CSS.