✨ Available for new opportunities

Hi, I'm Mayank

Data Engineer & Full-stack Developer

About Me

I'm a passionate Data Engineer and Full-stack Developer who loves building robust data pipelines and polished web experiences. Drawing on modern web technologies and data engineering tools, I create scalable applications and data solutions that make a difference.

Python
Apache Spark
AWS
PostgreSQL
React
Next.js
TypeScript
Docker

Featured Projects

Some of my recent work and side projects that showcase my skills

🎨

AI Design Tool

A modern AI-powered design tool that generates logos, color palettes, and design suggestions using OpenAI APIs.

Next.js · TypeScript · OpenAI API · Framer Motion
View Project
🛒

E-Commerce Platform

A full-stack e-commerce solution built with Next.js, Stripe, and PostgreSQL.

Next.js · TypeScript · Stripe · PostgreSQL
View Project
📋

Task Management App

A collaborative task management tool with real-time updates and team features.

React · Node.js · Socket.io · MongoDB
View Project

Interactive Data Pipeline

Click "Run Pipeline" to see how data flows through my ETL process

🗄️
PostgreSQL
🧹
Data Cleaning
⚡
Apache Spark
🔧
Feature Engineering
📊
Data Warehouse

Extract data from PostgreSQL database

5 Pipeline Stages · 99.9% Uptime · 10TB+ Data Processed
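The five stages above can be sketched end to end in plain Python. The stage functions and sample rows below are hypothetical stand-ins for illustration; a real run would use a JDBC extract, Spark transformations, and a warehouse write.

```python
# Hypothetical sketch of the five-stage pipeline; each function stands in
# for a real stage (JDBC read, Spark jobs, warehouse write).

def extract(source):
    # Stage 1: pull raw rows from PostgreSQL (stubbed with sample data)
    return [{"age": 30, "purchase_amount": 1500},
            {"age": None, "purchase_amount": 200}]

def clean(rows):
    # Stage 2: drop rows with missing or invalid ages
    return [r for r in rows if r["age"] is not None and r["age"] > 0]

def transform(rows):
    # Stage 3: Spark-style transformation, shown here in plain Python
    for r in rows:
        r["age_group"] = ("Young" if r["age"] < 25
                          else "Middle" if r["age"] < 50
                          else "Senior")
    return rows

def engineer_features(rows):
    # Stage 4: derive model-ready features
    for r in rows:
        r["is_premium"] = r["purchase_amount"] > 1000
    return rows

def load(rows):
    # Stage 5: write to the warehouse (stubbed as a return here)
    return rows

result = load(engineer_features(transform(clean(extract("postgresql://...")))))
```

Chaining plain functions keeps each stage independently testable before it is ported to Spark.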

Live Code Editor

Try out real data engineering code examples. Click "Run Code" to see the results!

Apache Spark ETL Pipeline - PYTHON
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# Initialize Spark session
spark = SparkSession.builder \
    .appName("CustomerDataETL") \
    .getOrCreate()

# Read data from source
df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/customers") \
    .option("dbtable", "customer_data") \
    .load()

# Data cleaning and transformation
cleaned_df = df \
    .filter(col("age").isNotNull()) \
    .filter(col("age") > 0) \
    .withColumn("age_group", 
        when(col("age") < 25, "Young")
        .when(col("age") < 50, "Middle")
        .otherwise("Senior")) \
    .withColumn("is_premium", col("purchase_amount") > 1000)

# Write to data warehouse
cleaned_df.write \
    .format("parquet") \
    .mode("overwrite") \
    .save("s3://data-lake/customers/cleaned/")

print(f"Processed {cleaned_df.count()} records")

Apache Spark ETL Pipeline

Extract, transform, and load data using PySpark

🐍
Python & PySpark
Real data processing code
⚡
Live Execution
Run code and see results
📊
Real Examples
Production-ready patterns

My Projects

Explore my portfolio of data engineering, web development, and ML projects

Category

Status

Complexity

Sort by:

Showing 8 of 8 projects

Featured · completed
advanced

Real-time ETL Pipeline

Built a scalable ETL pipeline using Apache Spark, Kafka, and PostgreSQL for processing 10TB+ of customer data daily.

Apache Spark · Kafka · PostgreSQL · +2 more
Featured · completed
advanced

ML Recommendation Engine

Developed a collaborative filtering recommendation system using PySpark MLlib and deployed with MLflow.

PySpark · MLlib · MLflow · +2 more
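The engine itself trains ALS with PySpark MLlib; as an illustration only, the core idea of collaborative filtering (score items a user hasn't rated using ratings from similar users) can be sketched in plain Python with made-up data:

```python
import math

# Toy user-item ratings; purely hypothetical data for illustration.
ratings = {
    "alice": {"spark_book": 5, "sql_book": 4},
    "bob":   {"spark_book": 4, "sql_book": 5, "ml_book": 2},
    "carol": {"ml_book": 5, "sql_book": 1},
}

def cosine(u, v):
    # Cosine similarity between two users' rating vectors,
    # taking the dot product over the items both have rated
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(user):
    # Score each unrated item by similarity-weighted ratings from other users
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return max(scores, key=scores.get) if scores else None

print(recommend("alice"))  # prints "ml_book" for this toy data
```

ALS replaces these explicit similarity sums with learned latent factors, which scales far better on sparse rating matrices.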
Featured · completed
intermediate

Interactive Data Dashboard

Created a real-time analytics dashboard using React, D3.js, and FastAPI for visualizing business metrics.

React · D3.js · FastAPI · +2 more
in progress
advanced

ML Feature Store

Built a centralized feature store using Feast for managing ML features across multiple models.

Feast · Redis · PostgreSQL · +2 more
planned
advanced

Real-time Stream Processing

Implemented real-time data processing using Apache Flink and Apache Pulsar for event-driven architecture.

Apache Flink · Apache Pulsar · Java · +2 more
completed
intermediate

A/B Testing Platform

Developed a statistical analysis platform for A/B testing with automated experiment evaluation.

Python · Pandas · SciPy · +2 more
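The automated evaluation step in a platform like this typically reduces to a hypothesis test per experiment. A minimal sketch with SciPy, assuming made-up per-user conversion values for a control and a treatment variant:

```python
from scipy import stats

# Hypothetical per-user conversion values for two experiment arms
control = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
treatment = [0.14, 0.15, 0.13, 0.16, 0.14, 0.15, 0.13, 0.14]

# Welch's t-test: does the treatment mean differ from control?
# equal_var=False avoids assuming the two arms share a variance.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

alpha = 0.05  # significance threshold for the automated decision
decision = "ship" if p_value < alpha else "keep testing"
print(f"t={t_stat:.2f}, p={p_value:.4f} -> {decision}")
```

A production platform would add guardrails such as minimum sample sizes and multiple-comparison corrections before acting on the p-value.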
in progress
intermediate

Data Pipeline Monitoring

Built a comprehensive monitoring system using Grafana, Prometheus, and custom alerting for data pipelines.

Grafana · Prometheus · Python · +2 more
completed
advanced

Cloud Data Warehouse

Designed and implemented a cloud-based data warehouse using Snowflake and dbt for modern analytics.

Snowflake · dbt · Airflow · +2 more

Technical Blog

Insights, tutorials, and best practices from my data engineering journey

Category

Difficulty

Sort by

Showing 8 of 8 articles

Featured Articles

Featured
advanced

Optimizing Apache Spark for Large-Scale Data Processing

Learn advanced techniques to optimize your Spark jobs for processing terabytes of data efficiently.

January 15, 2024 · 12 min read
👁️ 1250 · ❤️ 89
Featured
intermediate

ETL Pipeline Design Patterns and Best Practices

Comprehensive guide to building robust, scalable ETL pipelines with real-world examples.

January 10, 2024 · 8 min read
👁️ 980 · ❤️ 67

Recent Articles

advanced

Optimizing Apache Spark for Large-Scale Data Processing

Learn advanced techniques to optimize your Spark jobs for processing terabytes of data efficiently.

Apache Spark · Performance · +2
12 min
👁️ 1250 · ❤️ 89
intermediate

ETL Pipeline Design Patterns and Best Practices

Comprehensive guide to building robust, scalable ETL pipelines with real-world examples.

ETL · Data Pipeline · +2
8 min
👁️ 980 · ❤️ 67
advanced

MLOps with MLflow: Managing Machine Learning Lifecycle

Complete guide to implementing MLOps practices using MLflow for model management and deployment.

MLOps · MLflow · +2
15 min
👁️ 750 · ❤️ 45
intermediate

Building Interactive Data Visualizations with D3.js

Step-by-step tutorial on creating stunning, interactive data visualizations for your web applications.

D3.js · Data Visualization · +2
10 min
👁️ 650 · ❤️ 38
beginner

SQL Query Optimization: 10 Essential Tips for Data Engineers

Master SQL optimization techniques to write faster, more efficient queries for large datasets.

SQL · Optimization · +2
6 min
👁️ 1200 · ❤️ 92
advanced

Modern Data Warehouse Architecture: From Star Schema to Data Vault

Explore different data warehouse modeling techniques and when to use each approach.

Data Warehouse · Star Schema · +2
14 min
👁️ 890 · ❤️ 56
advanced

Building Real-Time Analytics with Apache Kafka and ClickHouse

Learn how to build a real-time analytics system using Kafka for streaming and ClickHouse for storage.

Real-time · Kafka · +2
11 min
👁️ 720 · ❤️ 43
beginner

Essential Python Libraries for Data Engineering

A curated list of must-know Python libraries for data engineers, from Pandas to Airflow.

Python · Data Engineering · +2
7 min
👁️ 1100 · ❤️ 78