Shrey Soni is a Data Engineer with 8+ years of experience in Data Engineering and Marketing Automation, specializing in Python, AWS, and Snowflake. He has a proven track record of optimizing data processes and enhancing operational efficiency.
Engineered a data storage solution on Amazon S3 that improved data accessibility.
Automated 15+ ETL processes, reducing data processing time by 45%.
Implemented automated data validation mechanisms to improve data accuracy.
Migrated from SQL Server to Databricks, reducing operational costs by 52.35%.
Designed a data pipeline to organize data from 100+ sources while ensuring 99.8% uptime.
Reduced processing time by 85% through automated bug triage for SQL data ingestion.
Improved performance by 37% by remodeling and migrating ETL processes.
NLP Sentiment Analysis: open-source web scraping, word cloud, and count vectorizer (personal project on GitHub).
Key outcomes:
Saved an average of 23 minutes per product review session by automating sentiment scoring across web-scraped reviews.
Overview: Developed an NLP model to analyze product reviews. Responsibilities: Used web scraping to collect reviews and word cloud techniques to determine sentiment, reducing time spent in review sessions.
Key outcomes:
Reduced data-processing time by 45% via 15+ ETL automations in Databricks.
Reduced operational costs by 52.35% and improved performance by 37% by migrating from SQL Server to Databricks Delta tables.
Designed a pipeline organizing data from 100+ sources with 99.8% uptime.
Overview: Conducted a comprehensive analysis of the ATC dataset using big data tools. Responsibilities: Partitioned the relevant columns in Hive, transformed the data with PySpark, and stored the results in Azure Data Lake for visualization in Power BI.
Key outcomes:
Reduced processing time by 85% via automated SQL-ingestion duplicate detection across merged Anaplan modules.
Improved forecasting precision through automated data validation across the ETL lifecycle.
Key outcomes:
Improved efficiency by 19% through fishbone root-cause analysis (RCA) and increased throughput by 23% through process recommendations aligned with stakeholders.
Personal project: ATC dataset analysis using Hive partitioning, PySpark, and Azure Data Lake.
Key outcomes:
A live Hive and PySpark pipeline feeding Azure Synapse SQL pools and Power BI dashboards for ATC dataset analytics.