Nitesh Kumar is a Data Engineer with 5 years of experience developing and managing data integration and ETL pipelines. He is proficient in AWS services and Python and has applied both across a range of data engineering projects.
Expertise in AWS services for data warehousing and ETL pipeline development.
Proficient in Python and SQL, used across multiple data engineering projects.
Developed a full-stack web application for database querying and management.
Implemented AWS Textract for modernizing data solutions by extracting unstructured data.
Delivered ECAN Customer 360 for Holcim, loading plant data from 6+ source systems into Redshift.
Implemented AWS Textract for modernizing unstructured-PDF data extraction in pharma DAP.
Built incremental data-loading mechanisms supporting hourly, daily, and monthly frequencies.
Developed full-stack DATALAB web app for cross-region PostgreSQL CRUD with React.js.
Overview: This project involves loading plant-related data into a Redshift data warehouse for a prominent building materials company. Responsibilities: Performed data ingestion, ETL pipeline development, and view creation as per requirements. Modernized data solutions by implementing AWS Textract to extract unstructured data from PDFs.
Key outcomes:
Developed efficient incremental data pipelines for hourly, daily, and monthly data loads.
Built and deployed scheduled and event-triggered ETL pipelines using AWS Glue.
Overview: This project for a pharmaceutical company involves onboarding medical datasets through a Data Analytics Platform. Responsibilities: Built Airflow pipelines for ETL processes and platform tasks scheduled as cron jobs. Developed Airflow tasks using shell scripts, Python scripts, and Kubernetes pod operations.
Key outcomes:
Managed end-to-end data ingestion and ETL pipeline development for critical medical datasets.
Ensured data quality through source vs. target DQ checks.
Overview: Developed a web application enabling users to perform basic queries on a PostgreSQL database across different EC2 regions. Responsibilities: Implemented user authentication against a PostgreSQL database, enabling users to perform CRUD operations.
Key outcomes:
Developed a full-stack web application for secure database querying and schema management.
Created a serverless framework application for efficient CRUD operations.
Key outcomes:
Automated AWS S3 bucket and Glacier vault creation with a custom Python CLI tool.
Integrated the CLI command into Airflow pipelines, enabling automated resource provisioning.
Implemented input validation and resource tagging/policy application for robust automation.
Nitesh Kumar
Data Engineer