Imran is a Data Engineer with 9+ years of experience in building and executing end-to-end data pipelines. He has a strong background in the healthcare domain and is proficient in AWS, Apache Spark, and Airflow.
Owned end-to-end data pipeline development across multiple projects.
Developed a common big data platform on AWS for product data processing and analytics.
Automated unit testing platforms and contributed to CI/CD pipeline integration using GitHub and Jenkins.
Onboarded country-specific data from 15 countries to data lakes, improving clinical trial success rates.
Implemented Airflow for efficient job scheduling and managing complex data dependencies.
Built and executed end-to-end ETL data pipelines using Apache Spark, Hive, and Presto.
Contributed to building scalable frameworks that reduce healthcare costs and improve patient care.
Overview: This project involved building a common big data platform on AWS to store and process the company's product data.
Responsibilities:
Built data pipelines using Apache Spark (Spark SQL and DataFrames).
Developed a data ingestion pipeline that stored raw JSON events and database extracts in an AWS S3 data lake.
Used Pig, Hive, PySpark, and Presto to process data and create tables.
Employed Airflow to schedule tasks and manage dependencies for ETL processing.
Stored processed output in a data warehouse for analytics and model creation, and developed an automated platform for unit testing.
Key outcomes:
Built end-to-end data pipelines on AWS for product data processing.
Developed an automated platform for unit testing.
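The Airflow-based scheduling described above can be sketched as a small DAG wiring ingest, transform, and load tasks in order. This is a minimal illustration only: the DAG name, task ids, bucket, and script paths are hypothetical, not taken from the project.

```python
# Minimal Airflow DAG sketch of ingest -> transform -> load scheduling.
# All names (dag_id, task ids, S3 bucket, script) are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="product_data_etl",        # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_raw_json",
        # hypothetical bucket; lands raw JSON events in the S3 data lake
        bash_command="aws s3 sync /tmp/events s3://product-data-lake/raw/",
    )
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit transform.py",  # hypothetical job script
    )
    load = BashOperator(
        task_id="load_warehouse",
        bash_command="echo 'load processed output to warehouse'",
    )

    # Declare dependencies so transform waits on ingest, load on transform.
    ingest >> transform >> load
```

The `>>` operator is Airflow's way of expressing the job dependencies the text mentions; each daily run only proceeds to a task once its upstream tasks have succeeded.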
Overview: PIMS is a scalable framework designed to enable WellPoint/Anthem to contract with Primary Care Physicians.
Responsibilities:
Studied Process Design Documents and project requirements for data processing.
Analyzed existing BTEQ scripts and created customized HQLs.
Developed Spark jobs using SparkSQL and DataFrames.
Used Control-M for job scheduling and Informatica to pull data from multiple sources.
Prepared and shared daily status updates with stakeholders and attended project scrum meetings.
Key outcomes:
Developed Spark jobs for data processing using SparkSQL and DataFrames.
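The BTEQ-to-HQL conversion work above amounts to re-expressing legacy transformations in portable SQL that SparkSQL can run. A minimal sketch of that kind of query is below; table and column names are hypothetical, and sqlite3 is used in place of SparkSQL only so the example is self-contained.

```python
import sqlite3

# Hypothetical claims table; in the project, SQL like this ran as
# customized HQL / SparkSQL. sqlite3 is a stand-in for portability.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (provider_id TEXT, amount REAL, status TEXT);
INSERT INTO claims VALUES
  ('P1', 100.0, 'PAID'),
  ('P1',  50.0, 'DENIED'),
  ('P2', 200.0, 'PAID');
""")

# Aggregate paid amounts per provider -- the shape of transformation
# typically migrated from BTEQ scripts into HQL.
rows = conn.execute("""
  SELECT provider_id, SUM(amount) AS paid_total
  FROM claims
  WHERE status = 'PAID'
  GROUP BY provider_id
  ORDER BY provider_id
""").fetchall()
print(rows)  # -> [('P1', 100.0), ('P2', 200.0)]
```

Because the query is standard SQL, the same statement could be registered against a Spark DataFrame with `spark.sql(...)` after creating a temp view.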
Overview: The ACOE project focused on loading country-specific data into individual data lakes and automating syndicated analytics.
Responsibilities:
Studied Process Design Documents and gathered requirements for data pipeline construction.
Extracted data from sources such as SQL Server, flat files, and Oracle using Sqoop and Spark.
Prepared customized HQLs and validated transformation logic.
Scheduled jobs using an internal job scheduler and trained new team members.
Key outcomes:
Onboarded data from 15 countries into data lakes.
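The Sqoop/Spark extraction step above is, at its core, an incremental pull by high watermark (Sqoop's `--incremental lastmodified` mode). The pure-Python sketch below illustrates the pattern with hypothetical record fields; the real sources were SQL Server, Oracle, and flat files.

```python
# Incremental extraction by high watermark: pull only rows updated
# since the last run, then advance the watermark. Field names are
# hypothetical; this stands in for a Sqoop/Spark incremental import.
def extract_incremental(rows, last_watermark):
    """Return rows newer than last_watermark and the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_wm = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_wm

source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
    {"id": 3, "updated_at": "2024-01-05"},
]

# Only rows after the stored watermark are pulled in this batch.
batch, wm = extract_incremental(source, "2024-01-02")
print(len(batch), wm)  # -> 2 2024-01-05
```

Persisting `wm` between runs is what lets a daily job onboard each country's data without re-reading the full source table.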
AppScript — a platform for discovering, prescribing, and tracking digital patient engagement tools, including electronic prescribing of digital health apps, devices, and content.
Key outcomes:
Wrote test cases and prepared SQL logic for data copying and validation.
Performed automation testing using Java.
Prepared data for web and mobile applications.
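The copy-and-validate SQL test cases above typically reconcile row counts and then diff the two tables. A minimal sketch follows, with sqlite3 standing in for the real databases and hypothetical table names.

```python
import sqlite3

# Hypothetical source/target tables; sqlite3 is used only so the
# sketch is self-contained and runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER, val TEXT);
INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
CREATE TABLE tgt (id INTEGER, val TEXT);
""")

# The "copy" step under test.
conn.execute("INSERT INTO tgt SELECT * FROM src")

# Validation 1: row counts must match.
src_count = conn.execute("SELECT COUNT(*) FROM src").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]

# Validation 2: row-level diff in both directions must be empty.
missing = conn.execute("SELECT * FROM src EXCEPT SELECT * FROM tgt").fetchall()
extra = conn.execute("SELECT * FROM tgt EXCEPT SELECT * FROM src").fetchall()

valid = src_count == tgt_count and not missing and not extra
print(valid)  # -> True
```

Checking `EXCEPT` in both directions catches rows dropped by the copy as well as rows that appear only in the target, which a count comparison alone would miss.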
Mobile Intelligence (MI Touch and MI Online) — a web-enabled solution for life science companies to streamline product launches and promotional activities.
Key outcomes:
Loaded flat file data into the database using KOMODO.
Implemented Slowly Changing Dimensions (SCD).
Validated source-to-target transformation logic.
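A Type 2 Slowly Changing Dimension, the most common SCD variant, keeps history by closing out the current row and appending a new current one whenever tracked attributes change. The sketch below shows the pattern in plain Python with hypothetical field names; the source does not state which SCD type the project used.

```python
# Minimal SCD Type 2 sketch: expire the active row on change, then
# append a new current row. Field names are hypothetical illustration.
CURRENT = "9999-12-31"  # open-ended end date marking the active row

def apply_scd2(dim_rows, key, new_attrs, load_date):
    """Expire the active row for `key` if attributes changed; append new."""
    for row in dim_rows:
        if row["key"] == key and row["end_date"] == CURRENT:
            if row["attrs"] == new_attrs:
                return dim_rows          # no change: keep history as-is
            row["end_date"] = load_date  # close out the old version
            break
    dim_rows.append({"key": key, "attrs": new_attrs,
                     "start_date": load_date, "end_date": CURRENT})
    return dim_rows

dim = [{"key": "C1", "attrs": {"city": "Pune"},
        "start_date": "2020-01-01", "end_date": CURRENT}]
apply_scd2(dim, "C1", {"city": "Mumbai"}, "2021-06-01")
print(len(dim))  # -> 2: one expired row plus one current row
```

Queries "as of" a date then filter on `start_date <= d < end_date`, which is what makes the history usable for point-in-time reporting.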