Saidulu is a Data Engineer with 6+ years of experience specializing in Azure data engineering. He has a proven track record of building scalable data infrastructure and optimizing performance for large datasets.
Led the migration of on-premises workloads to Azure Cloud, leveraging Azure Data Factory and Azure Databricks.
Implemented performance tuning techniques in Spark and Azure Data Factory, optimizing data processing and pipeline efficiency.
Designed and built scalable infrastructure on Azure for collecting, processing, and analyzing large datasets.
Developed dynamic data pipelines using parameterization and control tables, enhancing flexibility and reusability (sketched after this list).
Collaborated with business users to gather requirements and report project progress, ensuring alignment with business needs.
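The control-table-driven parameterization above can be illustrated with a minimal PySpark sketch. Every name in it is an assumption for illustration: the control table `etl.control_table` and its `source_path`, `target_table`, and `is_active` columns are hypothetical, not taken from the actual projects.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("control-table-pipeline").getOrCreate()

# Hypothetical control table: one row per dataset to move, holding the
# source location, the target table name, and an active flag.
control = spark.table("etl.control_table").where("is_active = true")

for row in control.collect():
    # Each control row parameterizes one copy: read the raw source,
    # then land it in the curated target table.
    df = spark.read.parquet(row["source_path"])
    df.write.mode("overwrite").saveAsTable(row["target_table"])
```

On the ADF side, the same idea is typically expressed with a Lookup activity feeding a ForEach loop over the control rows.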
Overview: This project focused on data extraction, integration, and migration from various sources to Azure Data Lake.
Responsibilities:
Extracted and integrated data from different sources into Azure Data Lake using Azure Data Factory and Azure Databricks ETL pipelines.
Converted data into appropriate formats to optimize reads and memory usage and to compute key metrics.
Implemented Spark using Python in Databricks, leveraging the DataFrame and Spark SQL APIs for faster data processing (sketched below).
Tuned ADF Copy activities and Mapping Data Flow transformations for efficiency.
Migrated data from on-premises databases and legacy applications to Azure using ADF and ADLS.
Converted SQL, T-SQL, and SSIS flows into Spark SQL and DataFrames in Azure Databricks, extracting from MySQL and loading into ADLS to create unmanaged tables.
Validated results and created test documents for migrated tables; pushed notebooks to Azure Repos and managed pipeline changes.
Fetched and processed data from Data Lake Gen2 and SQL databases in Azure Databricks.
Parameterized datasets for dynamic object discovery and movement into curated zones.
Fetched files from raw Data Lake containers, applied transformations, loaded curated data into database objects, and created snapshot and incremental datasets (an incremental-load sketch follows the key outcomes).
Validated, debugged, and published pipelines, and created daily triggers for scheduling.
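A minimal sketch of the Databricks pattern described above, combining the DataFrame and Spark SQL APIs and landing curated output as an unmanaged (external) table whose files remain in ADLS. The storage account, container paths, columns, and table names are all hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-etl").getOrCreate()

# Read raw files from an ADLS Gen2 container (path is illustrative).
raw = (spark.read.format("csv").option("header", "true")
       .load("abfss://raw@examplestore.dfs.core.windows.net/sales/"))

# DataFrame API: normalize types before aggregation.
typed = (raw.withColumn("amount", F.col("amount").cast("double"))
            .withColumn("order_date", F.to_date("order_date")))

# Spark SQL API over the same data.
typed.createOrReplaceTempView("sales")
daily = spark.sql(
    "SELECT order_date, SUM(amount) AS total FROM sales GROUP BY order_date"
)

# Unmanaged table: the metastore tracks metadata, ADLS keeps the files.
(daily.write.mode("overwrite")
      .option("path", "abfss://curated@examplestore.dfs.core.windows.net/daily_sales/")
      .saveAsTable("curated.daily_sales"))
```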
Key outcomes:
Tuned Azure Data Factory Copy activities and Mapping Data Flow transformations for efficiency.
Validated migrated data results and created comprehensive test documents.
Managed notebook version control and deployment using Azure Repos, including branching and merging strategies.
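The snapshot/incremental loading mentioned in the responsibilities could look roughly like the watermark pattern below. The `raw.orders` and `curated.orders` tables and the `modified_at` watermark column are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Highest watermark already present in the curated zone.
last_loaded = (spark.table("curated.orders")
               .agg(F.max("modified_at").alias("wm"))
               .collect()[0]["wm"])

# Pull only rows newer than the watermark from the raw zone.
delta = spark.table("raw.orders")
if last_loaded is not None:
    delta = delta.where(F.col("modified_at") > F.lit(last_loaded))

# Append the delta; a full snapshot would use mode("overwrite") instead.
delta.write.mode("append").saveAsTable("curated.orders")
```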
Overview: This project focused on demand forecasting, processing large datasets from HDFS to identify user behavior and product expectations.
Responsibilities:
Participated in requirements gathering, project inception, and story sizing.
Developed technical specifications based on client requirements.
Analyzed data with Spark SQL queries and scripts to understand user behavior and identify the facilities users wanted, based on product history (sketched below).
Contributed to a demand-forecasting processor that reads data from HDFS and stores results in HBase.
Created partitioned Hive tables to store processed results in tabular form.
Built DataFrames from case classes for the required input data.
Created RDDs and DataFrames from input data and performed transformations with Spark Core.
Wrote SQL queries to process data using Spark SQL.
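A sketch of the Spark SQL analysis and partitioned Hive storage described above. The HDFS path, event schema, and table names are hypothetical; the HBase write is omitted because it needs a separate connector, so a partitioned Hive table stands in for the stored results.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("demand-forecast-analysis")
         .enableHiveSupport()
         .getOrCreate())

# Raw clickstream-style events from HDFS (path and columns illustrative).
events = spark.read.parquet("hdfs:///data/events/")
events.createOrReplaceTempView("events")

# Spark SQL over user behavior: product view counts per day.
views = spark.sql("""
    SELECT event_date, product_id, COUNT(*) AS views
    FROM events
    WHERE action = 'view'
    GROUP BY event_date, product_id
""")

# Partitioned Hive table: date partitions keep daily scans cheap.
(views.write.mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.product_views"))
```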
Key outcomes:
Analyzed user behavior and product expectations using Spark SQL queries and scripts.
Processed demand-forecasting data from HDFS.
Developed DataFrames using case classes for efficient data processing (a PySpark analogue is sketched below).
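Case classes are a Scala construct for typed Spark input; the closest PySpark analogue of that pattern is an explicit StructType schema, sketched below with hypothetical field names.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, DateType)

spark = SparkSession.builder.appName("typed-input").getOrCreate()

# An explicit schema plays the role a Scala case class plays for typed
# input: it pins column names and types instead of relying on inference.
schema = StructType([
    StructField("user_id", StringType(), nullable=False),
    StructField("product_id", StringType(), nullable=False),
    StructField("quantity", IntegerType(), nullable=True),
    StructField("order_date", DateType(), nullable=True),
])

orders = spark.read.schema(schema).csv("hdfs:///data/orders/")
orders.printSchema()
```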
Overview: This project involved creating dynamic data pipelines with Azure Data Factory and Databricks, covering data ingestion, transformation, and export.
Responsibilities:
Created dynamic pipelines using parameterization and control tables.
Created Hive tables, loaded data, and wrote Hive queries.
Imported data from Oracle to Hive using Sqoop.
Replaced Hive's default Derby metastore with MySQL.
Loaded and transformed large sets of structured and semi-structured (log) data.
Implemented business logic using Databricks with PySpark and ADF.
Imported required tables from RDBMS sources to Azure using ADF, mapping data to the target data model with Mapping Data Flows.
Used Hive as an abstraction over structured data in HDFS, implementing partitions, dynamic partitions, and buckets on Hive tables (sketched below).
Exported analyzed data to relational databases using Sqoop for visualization and Power BI reporting.
Applied Spark SQL performance-tuning techniques, including execution plan analysis, caching and broadcasting, the Tungsten execution engine, and the Catalyst optimizer.
Loaded files to HDFS and wrote Hive queries; reused Hive queries in Spark SQL for analysis.
Used Parquet, Avro, and ORC file formats for efficient compression.
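The partition-and-bucket layout described above can be expressed directly from PySpark; a minimal sketch with hypothetical table and column names. Partitions prune scans by date, while buckets pre-organize rows by customer for joins and aggregations.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-layout")
         .enableHiveSupport()
         .getOrCreate())

# Illustrative staging data already registered as a table.
staging = spark.table("sales_staging")

# Partition by date (scan pruning) and bucket by customer (pre-shuffled
# layout for customer-keyed joins); sortBy orders rows within buckets.
(staging.write.mode("overwrite")
        .partitionBy("order_date")
        .bucketBy(16, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("sales_curated"))
```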
Key outcomes:
Created dynamic pipelines with parameterization and control tables, improving pipeline flexibility.
Implemented business logic efficiently using Databricks with PySpark and Azure Data Factory.
Optimized Hive tables with Partitions, Dynamic Partitions, and Buckets for efficient data abstraction and querying.
Improved Spark SQL performance through caching, broadcasting, and execution plan analysis (see the sketch below).
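The tuning techniques named above (caching, broadcasting, and inspection of the plans Catalyst and Tungsten produce) fit in one minimal PySpark sketch; the table names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("spark-sql-tuning").getOrCreate()

facts = spark.table("sales_curated")   # large fact table (illustrative)
dims = spark.table("product_dim")      # small dimension table

# Caching: keep a reused intermediate result in memory across actions.
recent = facts.where("order_date >= '2024-01-01'").cache()

# Broadcasting: ship the small dimension table to every executor so the
# large fact table is never shuffled for this join.
joined = recent.join(broadcast(dims), "product_id")

# Execution plan analysis: inspect the Catalyst/Tungsten physical plan.
joined.explain(mode="formatted")
```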