An experienced Software Developer focused on creating clean, maintainable software following the Software Improvement Group (SIG) guidelines, with a Big Data & Analytics Certification from BITS Pilani. Skilled in developing distributed and scalable ETL, big data, data science, and machine learning solutions.
Hands-on experience with a range of Big Data and Data Science languages and tools, including Core Java, Python, NumPy, pandas, Matplotlib, scikit-learn, RAPIDS AI, PySpark, Hadoop, HDFS, Spark, Hive, HBase, Flume, Sqoop, MapReduce, SQL, MySQL, MongoDB, Neo4j, and cloud platforms (AWS).
Some of the projects and case studies I have worked on:
1. Designed a MapReduce program with a custom partitioner to find the trending songs for each day of the week.
2. Created an ETL pipeline using Sqoop, Hive, and HBase. Ingested data into HDFS with Sqoop, imported it into a Hive-HBase integrated table, and analyzed it using Hive.
3. Created a Spark program to cluster users with similar preferences using the K-Means clustering algorithm and the ALS matrix factorization method.
4. Developed a scalable blog for an educational institute using MongoDB as the backend database. Implemented an MVC architecture on a document-based NoSQL database with replication. Facilitated scalability, solved the impedance mismatch problem, and optimized pagination and sorting.