Shrey Soni  ·  Senior Snowflake Data Engineer  ·  8+ yrs

Senior
8+ years experience · Remote
Available within 48 hrs

Proof of scale

99.8% uptime
85% reduction in processing time
52.35% reduction in operation cost
37% increase in performance
Built for
Lululemon · Everyday Health Group

About Shrey

Shrey Soni is a data engineer with 8+ years of experience in data engineering and marketing automation, specializing in Python, AWS, and Snowflake. He has a track record of optimizing data processes and improving operational efficiency.

Skills (24)

Python · AWS · Snowflake · SQL · PySpark · Azure · MongoDB · PostgreSQL · dbt · Apache Spark · NLP · Web Scraping · Count Vectorizer · Word Cloud · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse · Hadoop · Azure Data Factory · Data Modeling · GCP BigQuery · VS Code

Why hire Shrey?

Production deploy authority · Mentored juniors · Recognized contributor in open source

Engineered a data storage solution on Amazon S3 that improved data accessibility.

Automated 15+ ETL processes, reducing data processing time by 45%.

Implemented automated data validation checks to improve data accuracy.

Migrated from SQL Server to Databricks, reducing operational costs by 52.35%.

Designed a data pipeline that consolidates data from 100+ sources with 99.8% uptime.

Reduced processing time by 85% through automated bug triage for SQL data ingestion.

Increased performance by 37% after remodeling and migrating ETL processes.

Project highlights (5)

NLP Sentiment Analysis · Personal / Open-Source

Open-source sentiment analysis project combining web scraping, word clouds, and a count vectorizer. Live evidence: personal project on GitHub.

Python · NLP · Web Scraping · Count Vectorizer · Word Cloud

Key outcomes:

  • Saved an average of 23 minutes per product review session by automating sentiment scoring across web-scraped reviews.

NLP Sentiment Analysis · Data Scientist

Overview: Developed an NLP model to analyze product reviews. Responsibilities: Used web scraping to collect reviews and word-cloud analysis to assess sentiment, reducing time spent on manual review.

Python · NLP · Web Scraping

Key outcomes:

  • Reduced data-processing time by 45% via 15+ ETL automations in Databricks.

  • Reduced operational cost by 52.35% and increased performance by 37% by migrating SQL Server to Databricks Delta tables.

  • Built a pipeline that organizes data from 100+ sources with 99.8% uptime.

ATC Dataset Analysis · Data Engineer

Overview: Conducted a comprehensive analysis of the ATC dataset using big data tools. Responsibilities: Partitioned Hive tables on selected columns, transformed data with PySpark, and stored the results in Azure Data Lake for visualization in Power BI.

PySpark · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse

Key outcomes:

  • Reduced processing time by 85% via automated SQL-ingestion duplicate detection across merged Anaplan modules.

  • Improved forecasting precision through automated data validation across the ETL lifecycle.

ATC Live Dataset: Big Data Analytics · Associate Data Engineer

  • Built a data pipeline using Azure Data Factory and Azure Synapse to analyze the ATC Live Dataset (aviation).
  • Worked across 13+ big data modules, including Hadoop core, Azure Cloud, PySpark, and NoSQL and SQL databases.
  • Designed event-scheduled ADF pipelines and Azure Synapse SQL pools.
  • Performed root-cause analysis with fishbone diagrams and developed recommendations.
Snowflake · dbt · Python · SQL · PySpark · Hive · Hadoop · MongoDB · Azure Data Factory · Azure Synapse · Apache Spark · Data Modeling · GCP BigQuery

Key outcomes:

  • Improved efficiency by 19% through fishbone root-cause analysis and increased throughput by 23% through stakeholder-aligned process recommendations.

ATC Dataset Analysis using Big Data (2021) · Personal Project

Personal analysis of the ATC dataset using Hive partitioning, PySpark, and Azure Data Lake.

PySpark · HDFS · Hive · Azure Data Lake · Power BI · Azure Synapse · VS Code

Key outcomes:

  • Built a live Hive and PySpark pipeline feeding Azure Synapse SQL pools and Power BI dashboards for ATC dataset analytics.

Industry experience

EdTech

Reported in resume

Logistics & Supply Chain

Reported in resume

Manufacturing & Industrial

1 project
  • ATC Live Dataset: Big Data Analytics · Associate Data Engineer · Snowflake · dbt · Python · SQL +9

Ready to work with Shrey?

Schedule an interview and onboard within 48 hours. No long hiring cycles.

At a Glance

Experience: 8+ years
Work mode: Remote
Starting from: ₹1.5 L/month
Direct hire: Possible
Start within: 48 hours

Single contract. No agency markup confusion.

Typically responds within 4 business hours.

5-day replacement guarantee
48-hour onboarding, single invoice
Direct chat — no recruiter middleman
Seniority signals
Owns production deploys · Greenfield architect · System owner · Code reviewer · Mentor / leads juniors · Recognised OSS contributor
Verified · Vetted by Witarist
Technical skills assessed & verified
Background & identity checked
English communication verified
Ready to onboard in 48 hours

Not sure if this is the right fit?

Tell us your requirements and we'll match you with the best candidates.
