A professional data engineer with over three years of experience, proficient in designing, building, and maintaining optimal data pipeline architectures. Skilled in implementing complex, scalable big data projects, with a deep understanding of database structures, principles, and best practices. Well-versed in multi-cloud platforms.
Seasoned Data Engineer with 3+ years of experience building robust data pipelines and infrastructure that support data-driven decision-making. Proven expertise in extract, transform, load (ETL) processes, cloud platforms, and big data processing.
CGPA: 4.23 / 4.30
CGPA: 3.64 / 4.00 (Cum Laude)
Below are sample data engineering projects built with Python, SQL, and cloud technologies.
Distributed database system for the healthcare industry with ERD design, database fragmentation logic, a Global Data Catalog (GDC), and Java-based data parsing for multi-region data management.
Serverless application generating captions for images using AWS Lambda, S3 for storage, DynamoDB for metadata management, and Streamlit for an interactive frontend interface.
Multi-cloud AI-powered web application transforming Canadian Reddit posts into news articles. Features AI-generated news summaries, a conversational AI assistant, and an interactive dashboard with real-time data visualization using AWS Lambda, Step Functions, GCP, and HuggingFace LLMs.
Comprehensive vacation home management system integrating AWS Lex chatbots, Lambda functions, SNS notifications, and GCP services. Features a virtual assistant, message passing, a notification system, and BigQuery-powered data analytics with Looker Studio visualizations.
Telecommunications customer churn prediction system using ensemble learning (Random Forest, AdaBoost, Stacking Classifier) with explainability features. Implements cost-sensitive learning, LIME explanations, and counterfactual analysis for personalized retention strategies, achieving 88.83% accuracy.
End-to-end Azure data analytics pipeline processing the Tokyo 2021 Olympics dataset. Implements data ingestion via Azure Data Factory, transformation in Azure Databricks, and analytics with Azure Synapse Analytics on Azure Data Lake Storage Gen2 for Olympic athlete and medal insights.
End-to-end multi-cloud data processing pipeline for NYC taxi data. Combines AWS services (Lambda, Glue, S3, Step Functions, EventBridge, SNS) with Google Cloud (BigQuery, Looker Studio). Implements Bronze-Silver-Gold ETL architecture with real-time failure monitoring and CloudWatch alarms.
Comprehensive analysis of temporary residents' impact on the Canadian housing market. Combines PCA dimensionality reduction, k-means clustering, ARIMA time-series forecasting, and LLM-generated insights. Includes an interactive Plotly dashboard with scatter plots, radar charts, and predictive visualizations for housing affordability trends.
You can reach me using the details below.
Calgary, AB, Canada