Soumyadeep is a Data Engineer with 8+ years of experience building and modernizing data solutions across multiple industries. He has deep expertise in cloud platforms and big data processing, along with technical leadership experience delivering data pipelines and analytics.
Ownership of the full data pipeline lifecycle, from ingestion to visualization, across multiple projects.
Extensive experience with multi-cloud deployments (AWS, Azure, GCP) and hybrid solutions.
Proven ability to migrate and modernize ETL processes using cloud-native services.
Strong background in big data processing and real-time streaming architectures.
Demonstrated technical leadership in managing big data initiatives.
Robust, scalable pipelines with Databricks + Data Factory + Data Fusion
Streaming ingestion with Kafka + Kinesis
CDC into Synapse + Snowflake (via Snowpipe)
Overview: This project focused on modern data engineering practices, using cloud-native services to build robust, scalable data solutions.
Responsibilities:
Installed and configured software, services, VMs, and databases using Terraform with Azure Resource Manager across AWS, Google Cloud, and Azure.
Performed data streaming, filtering, extraction, and analysis using Hive, PySpark, Kafka, AWS EMR, Lambda, Dataproc, and Databricks.
Developed a scalable data pipeline with Databricks and Data Factory to extract Parquet files from ADLS Gen2, transform them, and load them into Synapse for advanced analytics (see the sketch after the key outcomes below).
Key outcomes:
Successfully configured multi-cloud infrastructure using Terraform.
Implemented data streaming and analysis pipelines with a variety of big data tools.
Created a scalable data pipeline for advanced analytics.
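A minimal PySpark sketch of the ADLS Gen2 → Databricks → Synapse pattern described above. Storage account, container, workspace, and table names are placeholders, and the Databricks Synapse connector (com.databricks.spark.sqldw) is assumed to be available on the cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks the session is predefined; getOrCreate() keeps this self-contained.
spark = SparkSession.builder.appName("adls-to-synapse").getOrCreate()

# Read raw Parquet from ADLS Gen2 (path is a placeholder).
raw = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/sales/")

# Illustrative transformations: type the date column and drop invalid rows.
clean = (
    raw.withColumn("order_date", F.to_date("order_date"))
       .filter(F.col("amount") > 0)
)

# Load into Synapse via the Databricks Synapse connector, staging through ADLS.
(clean.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://example-ws.sql.azuresynapse.net:1433;database=dw")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.sales_clean")
    .option("tempDir", "abfss://staging@examplestorage.dfs.core.windows.net/tmp")
    .mode("append")
    .save())
```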
Overview: This project focused on data ingestion, transformation, and analysis in a data warehousing context, including streaming and NoSQL data.
Responsibilities:
Performed data ingestion, transformation, and analysis on the Redshift, BigQuery, Synapse, and Snowflake data warehouse platforms.
Constructed scalable data pipelines with Data Fusion and Dataproc to extract CSV/JSON from GCS, transform it, and load it into BigQuery.
Built a streaming ingestion solution using Kafka to capture CDC events from SQL databases and write the records to Synapse for analysis (see the sketch after the key outcomes below).
Worked with NoSQL databases (HBase, Cassandra, MongoDB) and distributed SQL query engines (Drill, Presto, Phoenix, Athena) for data analysis.
Developed an ETL pipeline using AWS Glue to extract CSV/JSON from S3, transform it, and load it into Redshift.
Key outcomes:
Implemented streaming CDC solutions using Kafka.
Managed diverse data sources including NoSQL and distributed SQL engines.
Developed comprehensive ETL pipelines for cloud data warehouses.
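A hedged sketch of the Kafka CDC flow above, using Spark Structured Streaming. Broker, topic, schema, and table names are placeholders; a Debezium-style payload and the spark-sql-kafka and SQL Server JDBC packages on the classpath are assumed:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-cdc-to-synapse").getOrCreate()

# Illustrative CDC payload; the real schema depends on the connector in use.
schema = StructType([
    StructField("op", StringType()),      # c = create, u = update, d = delete
    StructField("id", StringType()),
    StructField("amount", DoubleType()),
])

# Subscribe to the CDC topic (broker and topic names are placeholders).
cdc = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "sqlserver.dbo.orders")
    .load())

parsed = (cdc
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*"))

def write_batch(df, batch_id):
    # Append each micro-batch to Synapse over JDBC (URL/table are placeholders).
    (df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-ws.sql.azuresynapse.net:1433;database=dw")
        .option("dbtable", "dbo.orders_cdc")
        .mode("append")
        .save())

(parsed.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/checkpoints/orders_cdc")
    .start()
    .awaitTermination())
```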
Overview: This project involved big data transformation, pipeline setup, and visualization using cloud and open-source tools.
Responsibilities:
Wrote PySpark and Scala programs for big data transformation activities.
Set up scalable data pipelines and data ingestion using Databricks, Data Factory, Data Fusion, and Glue to transform unstructured and semi-structured data.
Developed dashboards and visualizations in Power BI and Tableau.
Built a streaming ingestion pipeline using Kinesis Data Firehose to ingest CloudWatch Logs data and write it to Redshift (see the sketch after the key outcomes below).
Key outcomes:
Successfully transformed big data using PySpark and Scala.
Implemented scalable data pipelines for diverse data formats.
Delivered data insights through dashboards and visualizations.
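A boto3 sketch of the CloudWatch Logs → Kinesis Data Firehose → Redshift ingestion described above. The log group, stream name, ARNs, and IAM role are placeholders, and the delivery stream is assumed to already be configured with a Redshift destination:

```python
import boto3

logs = boto3.client("logs")
firehose = boto3.client("firehose")

# Subscribe a CloudWatch log group to the Firehose delivery stream.
logs.put_subscription_filter(
    logGroupName="/app/production",
    filterName="to-redshift",
    filterPattern="",  # an empty pattern forwards every log event
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/app-logs",
    roleArn="arn:aws:iam::123456789012:role/cwl-to-firehose",
)

# Firehose buffers incoming records and COPYs them into Redshift;
# a test record can be pushed directly to verify the delivery stream.
firehose.put_record(
    DeliveryStreamName="app-logs",
    Record={"Data": b'{"level": "INFO", "msg": "pipeline smoke test"}\n'},
)
```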
Overview: This project involved leading big data initiatives focused on streaming, extraction, analysis, and transformation, along with cloud integration.
Responsibilities:
Performed big data streaming, extraction, filtering, analysis, and transformation using HDFS, MapReduce, Pig, Hive, PySpark, Python, Scala, Sqoop, Flume, and HBase.
Used AWS, Azure, and Google Cloud as service providers for big data products.
Loaded data into Snowflake from S3 buckets using Snowpipe, creating external stages and Snowflake streams for change data capture (see the sketch after the key outcomes below).
Wrote Python, Scala, and Spark programs to analyze big data workloads (CSV, JSON, Parquet, relational and NoSQL databases) and store them in data warehouses for analytical purposes.
Key outcomes:
Led big data streaming and transformation efforts across multiple tools.
Implemented Change Data Capture (CDC) into Snowflake from S3.
Developed custom programs for big data workload analysis and warehousing.
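A sketch of the Snowpipe load path above using the Snowflake Python connector. Connection parameters, the storage integration, bucket, and table names are placeholders, and the S3 event notifications required for AUTO_INGEST are assumed to be configured separately:

```python
import snowflake.connector

# Connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="example-account",
    user="etl_user",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage over the S3 landing bucket; the storage integration
# (s3_int) is assumed to exist already.
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw_stage
      URL = 's3://example-landing-bucket/events/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = JSON)
""")

# Snowpipe auto-ingests new files as they land in the stage.
cur.execute("""
    CREATE PIPE IF NOT EXISTS events_pipe AUTO_INGEST = TRUE AS
      COPY INTO RAW.EVENTS FROM @raw_stage
""")

# A stream tracks row-level changes on the landing table for downstream CDC.
cur.execute("CREATE STREAM IF NOT EXISTS events_stream ON TABLE RAW.EVENTS")
```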