Big Data (Hadoop/Spark)

SANTOSH MOTUKURI

Summary of experience:

Ø  Over 15 years of experience in all phases of the SDLC (Analysis, Design, Development, Testing and Implementation) on Big Data and other data warehouse applications in the Servicing, ISP, Financial and Manufacturing industries, including over 3 years of extensive experience in Big Data/Hadoop and analytics technologies with expertise in Hive, Pig, Sqoop, Spark, NoSQL databases and Flume.

Ø  Capable of processing large sets of structured and semi-structured data, including data in JSON and Parquet formats.

Ø  Experience in developing strategic methods for deploying Big Data and data warehouse technologies to efficiently solve Big Data/data warehouse processing requirements.

Ø  Hands-on experience with Hadoop ecosystem tools such as Hive, Sqoop, Pig, Flume and Phoenix.

Ø  Exposure to NoSQL databases such as HBase, Cassandra and MongoDB.

Ø  Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python, and in processing the results in memory using Spark and Spark SQL (a brief illustrative sketch follows this summary).

Ø  Configured a 20-node Hadoop cluster in a Linux environment and set up Pig, Hive, Sqoop, ZooKeeper, Python, Scala, Spark, HBase and Phoenix.

Ø  Implemented data integration from different data sources (relational and NoSQL databases) and different data types/formats into HDFS using Sqoop, Flume and Spark Streaming.

Ø  Performed transformations on imported data using Hive, Pig and Spark.

Ø  Experienced working with Ab Initio ETL, Oracle, Sybase, Red Brick, SAS, stored procedures, shell scripting and C.

Ø  Extensive experience in developing UNIX shell scripts, Ab Initio graphs, SQL and PL/SQL (coding procedures, functions and database packages).

Ø  Proficient in ETL processes using Oracle PL/SQL, UNIX scripts and SQL*Loader for migrating large data volumes to an enterprise data warehouse.

Ø  Experienced in developing and tuning complex SQL queries, PL/SQL blocks, stored packages, procedures, functions, partitions, triggers and views.

Ø  Experience in importing and exporting data between HDFS and RDBMS using Sqoop.

Ø  Worked on various data migration projects using SQL*Loader, Ab Initio and Sqoop.

Ø  Hands-on experience working in Agile environments.

Ø  Good leadership qualities, with the ability to lead a team and complete tasks on time, and a personal commitment to delivery.

Ø  Involved in design and development as well as support, maintenance and enhancement of existing applications.
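
Below is a minimal, illustrative Spark (Scala) sketch of the kind of work summarized above: a HiveQL-style aggregation rewritten as Spark transformations, JSON read and persisted as Parquet, and results processed in memory through Spark SQL. It is not code from any of the projects listed in this resume; the table names, paths and columns are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so Spark SQL can read and write Hive tables
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Equivalent of a HiveQL aggregation, expressed as Spark transformations
        val orders = spark.table("sales.orders")          // hypothetical Hive table
        val dailyTotals = orders
          .filter(col("status") === "COMPLETE")
          .groupBy(col("order_date"))
          .agg(sum(col("amount")).as("total_amount"))

        // Semi-structured data: read JSON, persist as Parquet (hypothetical paths)
        val events = spark.read.json("hdfs:///data/raw/events/")
        events.write.mode("overwrite").parquet("hdfs:///data/curated/events/")

        // Keep the result in memory and expose it through Spark SQL
        dailyTotals.cache().createOrReplaceTempView("daily_totals")
        spark.sql("SELECT order_date, total_amount FROM daily_totals ORDER BY order_date").show()

        spark.stop()
      }
    }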

Technical Experience:

Big Data Skills : Spark, Spark SQL, Spark Streaming, Hadoop, HDFS, MapReduce, YARN, ZooKeeper, Hive, HBase, Pig, Sqoop, Flume, Kafka, Oozie

Databases : Oracle 7.x/8.0/8i/9i/10g/11g, MySQL, Sybase 10.x/11.x, Red Brick

NoSQL Databases : HBase, Cassandra, MongoDB

Scripting Languages : UNIX shell scripting, Korn shell

Languages : Python, Scala, C, COBOL, Korn shell, R, Java, Pro*C

ETL : Ab Initio GDE 3.15/1.x/2.x, Co->Op 3/2.8.x/2.x/2.13/2.14

Client/Server : PowerBuilder 5.x/6.0/6.5/7.x, Developer 2000, Forms 4.5/5

OS : Windows NT/95/98/2000, HP-UX 10.x/11i, Linux, AIX, Solaris

Other : TOAD, Rapid SQL, SQL Navigator, PL/SQL, SQL, SQL*Loader

Professional Achievements and Certification:

Ø  Certified PowerBuilder Developer (C.P.D.).

Ø  Certified in UNIX Korn shell script programming by Brainbench.

Professional Experience

Quantum Vision LLC / May 2012 to date
Princeton Information Ltd / Mar 2011 to Dec 2011
Pinnacle Software Inc, MD, USA / Nov 1998 to Mar 2011
Americus Global Software, GA, USA / Oct 1997 to Nov 1998
Nevada Enterprises, India / Feb 1997 to Oct 1997
Sonata, India / Aug 1996 to Feb 1997
Vijaya Industrial Gas Pvt Ltd Aluminum India / Mar 1991 to Aug 1996

Sr. Software Engineer, NPD, New York, Jul 2012 – Dec 2016

NPD's expert industry analysis and advisory services help retailers and manufacturers identify market trends to make smarter business decisions.

Data ingestion and integration from various sources such as RDBMS and SDL files into a Hortonworks Hadoop environment. Data cleansing and business transformations are implemented in the Hadoop ecosystem and in ETL code. Involved in ETL code, Korn shell scripting and promoting code to higher environments to deliver data successfully. Also handled installation and upgrades of Ab Initio.

Responsibilities:

·  Promoted code to higher environments.

·  Wrote Stored Procedures / Functions / Packages using PL/SQL in Oracle.

·  Developed and maintained Ab Initio graphs for cleansing and business transformations; the graph output was written to HDFS and the local file system.

·  Wrote wrapper scripts implementing utilities such as executing Hive and Pig scripts, taking backups and refreshing environments.

·  Worked on scripts to back up and restore data between the Hadoop environment and Linux, and vice versa.

·  Developed scripts to read structured and semi-structured data using Spark.

·  Transferred data from RDBMS to HDFS and Hive tables using Sqoop.

·  Performed data cleansing using Spark and Hive.

·  Used Spark to read data from external files, cleanse it, apply business transformations and load the results into Hive tables (an illustrative sketch follows this list).

·  Maintained code versions in EME.
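
An illustrative Spark (Scala) sketch of the file-to-Hive flow described in the bullets above (read external files, cleanse, apply business transformations, load into Hive). It is a sketch only, not NPD code; the paths, columns and the rollup rule are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object FileToHiveSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("file-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read a pipe-delimited extract landed on HDFS (hypothetical path and layout)
        val raw = spark.read
          .option("header", "true")
          .option("delimiter", "|")
          .csv("hdfs:///landing/pos_sales/")

        // Cleansing: trim key fields and drop records missing mandatory columns
        val cleansed = raw
          .withColumn("retailer_id", trim(col("retailer_id")))
          .na.drop(Seq("retailer_id", "sale_date"))

        // Business transformation (hypothetical rule): weekly units per retailer
        val weekly = cleansed
          .withColumn("sale_week", weekofyear(to_date(col("sale_date"))))
          .groupBy(col("retailer_id"), col("sale_week"))
          .agg(sum(col("units").cast("long")).as("total_units"))

        // Load the curated result into a Hive table
        weekly.write.mode("overwrite").saveAsTable("analytics.weekly_sales")

        spark.stop()
      }
    }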

Environment: Ab Initio version 3.2.2.10, Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64-bit Production, PL/SQL, TOAD/SQL Developer, shell scripting, Linux-x86-gcc4p64, Hadoop, Hive, Pig, Sqoop, Scala, Spark, Python, HBase

Sr. Software Engineer, Citi Bank, Delaware, Mar 2011 – Dec 2011

The project involved designing, developing and writing ETL code to deliver data for the Global Receivables Marts warehouses at Citi. It involved development of the data warehouse application using Ab Initio, SQL, PL/SQL, Autosys and UNIX, as well as maintenance of existing marts that were developed in PL/SQL and Pro*C:

§  LATAM, Asia Pacific (Hong Kong, Singapore), CR192, TD7377

Responsibilities:

·  Tuned SQL statements, created and maintained/modified PL/SQL packages, mentored others in the creation of complex SQL statements, performed data modeling and created/maintained data migration scripts.

·  Optimized and tuned SQL queries and PL/SQL blocks to eliminate full table scans and reduce disk I/O and sorts.

·  Responsible for maturing the data quality processes and technology of the data warehouses.

·  Responsible for ensuring end-to-end operations of the data warehouses.

Environment: Ab-Initio 1.15/1.14, Co->Op: 3/2.14, Oracle 10g/9i, PL/SQL, TOAD/SQL Developer, Shell Scripting, Windows 2000, UNIX (Sun Solaris/AIX), Autosys

Senior ETL Developer, Fannie Mae, Dec 2005 – Mar 2011

The project involved designing, developing and writing ETL code to deliver the data warehouse for the Securities Cost Basis and Subledger work streams at Fannie Mae. It involved development of the data warehouse application using Ab Initio, SQL, PL/SQL, Autosys and UNIX.

Responsibilities:

·  ETL (Extraction, Transformation and Load) architect for a world-class data warehouse.

Environment: Ab-Initio 1.13.20/1.14.39, Co->Op: 2.13/2.14, Oracle 10g/9i, PL/SQL, Toad, Shell Scripting, Windows 2000, UNIX (Sun Solaris), Autosys

ETL Team Lead/Senior Business Analyst/Developer, AOL Time Warner, Jan 2004 – Dec 2005

1. Implementation of Financial warehouse (Billing Revenue Repository)

BRR is a SOX (Sarbanes-Oxley) compliant financial data warehouse system. It provides finance with a unified view of transactions merged from various AOL billing systems, and serves as the auditable, validated billing repository used for finance revenue and non-revenue reporting. There are audit controls, a validation process and an extensive error-handling mechanism. Since the repository is used by finance, there is a robust recycle mechanism for rectified erroneous records.

·  Acted as senior developer lead for business users, technology management and the development team.

Environment/Tools: Ab-initio GDE 1.10.5/1.13.1, Co->Op 2.11.8/2.13.1, UNIX Korn shell, RedBrick, Oracle 9i, Visio, HPUX11i-PARISC-n32.

2. Implementation of Financial Revenue report:

Sr. Business Analyst

The revenue report is one of the most complex reports for finance. It categorizes revenue as A/R or deferred based on the earning/reporting period. Revenue is further detailed in buckets such as BE, BU, UE, UU, PE, PU, BP, PP and UP, and complex calculations are involved in deriving these buckets. The report is then fed to the ledger system to derive the balance sheet and P&L (Profit and Loss). A validation piece was implemented to verify the input and output of the report. The revenue report runs once a month and processes on the order of 120-150 million records (i.e. 30-40 GB). Additional reports, such as Billed and Not-Billed reports, were developed for finance.

·  Involved in Design and development of SOX compliant revenue report.

·  The AB_LOCAL expression was used to parallelize unloading of data from database tables.

·  Ab Initio components were used extensively for the complex calculation of revenue amounts and buckets (earned and unearned revenue).

·  The Write Multi-file component was used to generate files dynamically.

·  Responsibilities spanned requirements analysis, data analysis, design, development and testing.

Environment/Tools: Ab-initio GDE 1.13.1, Co->Op 2.11.8/2.13.1, Unix Shell, RedBrick, Oracle 9i, Visio, HPUX11i-PARISC-n32.

3. Implementation of A/R system (Subscription/Write-off/Payments Modules):

An A/R system to support subscription-based billing of all Narrowband, Broadband and Premium services, partial payment tracking, and line-item-level and payment-method-level balance tracking. It creates a single centralized account balance management and payment processing system.

Environment/Tools: Ab-initio GDE 1.13.1, Co->Op 2.13.1, Shell, Oracle 9i, PL/SQL, Visio, Linux-x86-g32.

Senior ETL Developer, Bank One, Sep 2002 – Jan 2004

Implementation of Business Cards Rehosting, Non Monetary, Automatic scoring engine, and Portfolio offer engine.

I was involved in the multiple projects mentioned above. I was responsible for writing graphs to transfer data from OLTP to OLAP systems, and for processing files coming from FDR and loading them into the data warehouse.

·  Involved in data conversion from COBOL to Oracle. Used the cobol-to-dml utility to convert COBOL data structures to Ab Initio DML.

Environment/Tools: AbInitio GDE 1.10.9.1, 1.11.12, Co>Operating System 2.10.1, AIX UNIX, Oracle 8i, 9i, SAS

Senior Developer/Tech Lead, U.S. Department of Small Business Administration, Apr 2001 – May 2002

Implementation of EDWSBA (Enterprise Data Warehouse for Small Business Admin)

EDWSBA (Enterprise Data Warehouse for Small Business Administration) deals with small business certifications (SDB) and surety bond guaranty (SBG). The application spans 8 databases.

·  Involved in requirements analysis, design and development of the system.

Environment/Tools: Oracle 8i, SQL *Loader, PL/SQL, SQL, ERWIN 5.2, PowerBuilder, Shell Scripting, Sybase SQL Server 11, SQL Anywhere, Windows NT/2000, HPUNIX 10.x

Senior Software Engineer/Tech Lead, American Standard, Jan 1999 – Mar 2001

Implementation of SFCS Data warehouse

The SFCS (Shop Floor Control System) Data Warehouse integrates order entry data from SAP with data from the P&I database. A number of decision-making and management reports are prepared from the SFCS Data Warehouse. Data from the SFCS DW is used enterprise-wide by downstream applications such as the Order Status application and the production process tracking and measurement application, supporting the company's worldwide chinaware manufacturing plants. The reports provide the information needed to identify and improve manufacturing efficiency, which in turn reduces costs.

·  Involved in understanding the requirements of business analysts and corporate managers, and in preparing the baseline documentation for field mapping between the source databases and the target database (data warehouse).

Environment/Tools: Oracle 7.3/8i, SQL *Loader, PL/SQL, SQL, ERWIN 5.2, PowerBuilder, Shell Scripting, Windows NT/95/98, HPUNIX 10.x

Systems Analyst/Developer, Nov 1998 – Jan 1999

Implementation of a Paper Inventory Management and Reporting System

This application reports details about paper manufacturers/retailers, paper types and their inventory levels, tied to the different types of magazines published.

·  Involved in requirements gathering, analysis, design and development of the required reports for the system.

·  Developed stored procedures, functions, triggers and packages in Oracle PL/SQL.

Environment/Tools: Oracle 7.3, Developer 2000, Forms 4.5, PL/SQL, SQL, Windows NT/95, and UNIX SERVER.

Software Developer, U.S. Chamber of Commerce, Dec 1997 – Oct 1998

Implementation of ESCS Data Warehouse

The ESCS is a repository of financial transactions from legacy systems.

·  Involved in Analysis, Mapping and Data Modeling using the Erwin tool.

·  Involved in Extraction, Transformation and Loading (ETL) processes

·  Created a number of Oracle stored procedures, functions and packages required in the ETL process.

·  Made extensive use of PL/SQL, SQL*Loader and UNIX shell scripting for the ETL process.

·  Created the required reports (Circulation, Revenue Receivable and Payable reports, etc.).

Environment/Tools: Oracle Applications Version 10SC, Oracle 7.3, SQL*PLUS, PL/SQL, SQL, SQL *Loader, Developer-2000, Oracle Reports 2.5, Windows 95/NT Workstations, UNIX Server, UNIX Shell Script.

Lead/Developer, Vijaya Industrial Gas, Jan 1991 – Sep 1997

Sales and Stock Information System

The Sales and Stock Information System is aimed at helping users keep track of stock positions and sales details.

Environment: PowerBuilder 5.0, Oracle 7.x, Windows 95

Consultant Information and Invoicing System

Involved in the Consultant Information and Invoicing System that facilitates keeping track of potential consultants being recruited for project assignments.

Environment: Oracle 7.1, Developer 2000, Forms 4.5, SQL *Plus, PL/SQL, SQL * Loader

Stores Management System

This package was developed for the computerization of Vijaya Packaging System, a manufacturer of industrial strength packing bags.

Environment: Oracle 7.1, SQL, PL/SQL, Developer 2000, Forms 4.5, Report 2.5 and Win 95

Industrial Information System

The system maintains comprehensive information on all types of industries situated in Hyderabad, Secunderabad and the suburbs of these cities.

Environment: Oracle, Forms 3.0, PL/SQL, SQL *Plus, Pro* C

Worked on various projects such as inventory, invoicing and accounting using FoxPro, dBase, COBOL and Clipper.

Environment: FoxPro, dBase, Clipper, COBOL.