Shrey Soni · Senior Snowflake Data Engineer · 8+ yrs

Senior

8+ years experience · Remote

Available within 48 hrs

Proof of scale

99.8% uptime

85% reduction in processing time

52.35% reduction in operating costs

37% increase in performance

Built for

Lululemon · Everyday Health Group

About Shrey

Shrey Soni is a Data Engineer with 8+ years of experience in Data Engineering and Marketing Automation, specializing in Python, AWS, and Snowflake. He has a proven track record of optimizing data processes and enhancing operational efficiency.

8+ years of commercial experience in

EdTech · Logistics & Supply Chain · Manufacturing & Industrial

Skills (24)

Python · AWS · Snowflake · SQL · PySpark · Azure · MongoDB · PostgreSQL · dbt · Apache Spark · NLP · Web Scraping · Count Vectorizer · Word Cloud · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse · Hadoop · Azure Data Factory · Data Modeling · GCP BigQuery · VS Code

Why hire Shrey?

Production deploy authority · Mentored juniors · Recognized contributor in open source

Engineered an advanced data storage solution leveraging Amazon S3, optimizing data accessibility.

Automated 15+ ETL processes, reducing data processing time by 45%.

Implemented robust automated data validation mechanisms, guaranteeing data accuracy.

Migrated from SQL Server to Databricks, reducing operating costs by 52.35%.

Designed a data pipeline to organize data from 100+ sources while ensuring 99.8% uptime.

Reduced processing time by 85% through automated bug triage for SQL data ingestion.

Achieved a 52.35% reduction in operating costs by migrating to Databricks.

Increased performance by 37% after remodeling and migrating ETL processes.

Project highlights (5)

NLP Sentiment Analysis – Personal / Open-Source

Open-source NLP sentiment analysis combining web scraping, a word cloud, and a count vectorizer. Live evidence: GitHub personal project.

Python · NLP · Web Scraping · Count Vectorizer · Word Cloud

Key outcomes:

Saved an average of 23 minutes per product review session by automating sentiment scoring across web-scraped reviews.

NLP Sentiment Analysis – Data Scientist

Overview: Developed an NLP model to analyze product reviews. Responsibilities: Utilized web scraping and word cloud techniques to determine sentiment, saving time in review processes.

Python · NLP · Web Scraping

Key outcomes:

Reduced data-processing time by 45% via 15+ ETL automations in Databricks.
Reduced operational cost by 52.35% and increased performance by 37% by migrating SQL Server to Databricks Delta tables.
Built a pipeline organising 100+ data sources with 99.8% uptime.

ATC Dataset Analysis – Data Engineer

Overview: Conducted a comprehensive analysis of the ATC dataset using Big Data tools. Responsibilities: Created partitioning of desired columns in Hive, performed data transformation using PySpark, and stored data into Azure Data Lake for visualization in Power BI.

PySpark · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse

Key outcomes:

Reduced processing time by 85% via automated SQL-ingestion duplicate detection across merged Anaplan modules.
Improved forecasting precision through automated data-validation across the ETL lifecycle.

ATC Live Dataset — Big Data Analytics – Associate Data Engineer

Comprehensive Data Pipeline using Azure Data Factory + Synapse to study the ATC Live Dataset (aviation).

Worked across 13+ Big Data Modules including Hadoop core, Azure Cloud, PySpark, NoSQL and SQL databases.
Devised event-scheduled ADF pipelines + Azure Synapse SQL pools.
Performed fishbone-diagram root-cause analysis; brainstormed recommendations.

Snowflake · dbt · Python · SQL · PySpark · Hive · Hadoop · MongoDB · Azure Data Factory · Azure Synapse · Apache Spark · Data Modeling · GCP BigQuery

Key outcomes:

Improved efficiency by 19% through fishbone RCA and increased throughput by 23% through stakeholder-aligned process recommendations.

ATC Dataset Analysis using Big Data (2021) – Personal Project

Personal ATC dataset analysis with Hive partitioning + PySpark + Azure Data Lake.

PySpark · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse · VS Code

Key outcomes:

Live Hive + PySpark pipeline feeding Synapse SQL Pools and Power BI dashboards for ATC dataset analytics.

Industry experience

EdTech

Reported in resume

Logistics & Supply Chain

Reported in resume

Manufacturing & Industrial

1 project

• ATC Live Dataset: Big Data Analytics – Associate Data Engineer
Snowflake · dbt · Python · SQL +9

Ready to work with Shrey?

Schedule an interview and onboard within 48 hours. No long hiring cycles.

At a Glance

Experience: 8+ years

Work mode: Remote

Starting from: ₹1.5 L/month

Direct hire: Possible

Start within: 48 hours

From ₹1.5 L/month

Single contract. No agency markup confusion.

Typically responds within 4 business hours.

5-day replacement guarantee

48-hour onboarding, single invoice

Direct chat — no recruiter middleman

Seniority signals

Owns production deploys · Greenfield architect · System owner · Code reviewer · Mentor / leads juniors · Recognised OSS contributor

Vetted by Witarist

Technical skills assessed & verified

Background & identity checked

English communication verified

Ready to onboard in 48 hours

Not sure if this is the right fit?

Tell us your requirements and we'll match you with the best candidates.
