Chirudeep Tupakula

Oklahoma State University '25
Master's in Computer Science

Data Engineer

About Me

I'm Chirudeep Tupakula, a Python Full-Stack GenAI Developer and Data Engineer with 2+ years of experience delivering scalable GenAI APIs and big-data solutions on AWS and Azure. I build enterprise-grade microservices with Flask/FastAPI, OpenAI, RAG, and Llama, then surface insights through modern React.js front-ends.

My toolkit spans Spark, PySpark, Spark ML, and TensorFlow for large-scale data engineering and ML pipelines, plus Databricks for seamless end-to-end workflows. Recent work includes a GenAI-powered FastAPI service on AWS Lambda that cut retrieval latency by 40%, and lakehouse architectures ingesting 1M+ events/day with Kafka, Delta Lake, Redshift, and Athena.

I'm comfortable containerising with Docker, automating releases via CI/CD, and collaborating in agile, cross-functional teams. Currently pursuing an M.S. in Computer Science at Oklahoma State University, I thrive at the intersection of data engineering and AI, delivering robust, customer-focused GenAI products that drive measurable value.

Skills

Python
SQL
PySpark
AWS (S3, Lambda, Redshift)
FastAPI
Flask
Databricks
Apache Spark
Apache Kafka
ETL Pipelines
Docker
Airflow
PostgreSQL
Snowflake
Delta Lake
Tableau
Google BigQuery
Power BI
Azure Databricks
Kubernetes
LangChain
Vector Embeddings (FAISS, Pinecone)

Projects

GenAI-Powered Customer Retention Analysis

Built a RAG solution using LLMs and Pinecone to provide explainable churn predictions. Delivered real-time recommendations via FastAPI endpoints, with Power BI monitoring dashboards connected through AWS Lambda.
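
A minimal sketch of how an endpoint like this could be wired together, assuming the newer openai and pinecone Python clients; the "customer-churn" index name, model choices, and metadata fields are illustrative assumptions, not the production implementation.

```python
# Hypothetical sketch: FastAPI endpoint that retrieves similar historical
# customer records from Pinecone and asks an LLM for an explainable churn call.
import os

from fastapi import FastAPI
from openai import OpenAI
from pinecone import Pinecone

app = FastAPI()
openai_client = OpenAI()                                  # reads OPENAI_API_KEY
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("customer-churn")                        # illustrative index name

@app.post("/churn-explanation")
def churn_explanation(customer_summary: str) -> dict:
    # Embed the customer summary and pull the most similar historical cases.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=customer_summary
    ).data[0].embedding
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    context = "\n".join(str(m.metadata) for m in results.matches)

    # Ask the LLM to ground its churn explanation in the retrieved cases.
    reply = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Explain churn risk using the provided cases."},
            {"role": "user", "content": f"Cases:\n{context}\n\nCustomer:\n{customer_summary}"},
        ],
    )
    return {"explanation": reply.choices[0].message.content}
```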

Sales Analytics Pipeline on Azure Databricks

Implemented a Lakehouse architecture using AWS S3, Delta Lake, AWS Glue, and Azure Databricks. Designed parallelized PySpark ETL jobs and exposed aggregated sales metrics via Dockerized FastAPI endpoints consumed by Power BI.
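
A rough PySpark sketch of the kind of ETL step involved, assuming a raw JSON landing zone on S3 and a curated Delta table read by the Dockerized FastAPI layer; paths, columns, and table layout are placeholders rather than the real pipeline.

```python
# Illustrative PySpark job: ingest raw sales events, aggregate by day and
# region, and write a curated Delta table. Paths and column names are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

raw = spark.read.json("s3://sales-raw/events/")           # raw landing zone

daily_sales = (
    raw.withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date", "region")
       .agg(
           F.sum("amount").alias("revenue"),
           F.countDistinct("order_id").alias("orders"),
       )
)

# Overwrite the curated table exposed to the FastAPI/Power BI layer.
(daily_sales.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://sales-curated/daily_sales"))
```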

Ambiguity Detection with Human-in-the-Loop Validation

Developed a machine learning system to detect ambiguous input cases and validate predictions via crowdsourced survey graphs. Human feedback was looped back to fine-tune the model iteratively, improving explainability and trust in predictions.
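
A toy sketch of this human-in-the-loop cycle, using scikit-learn and random data as stand-ins for the real model and survey feedback; the 0.65 confidence threshold and the logistic-regression model are assumptions for illustration only.

```python
# Flag low-confidence predictions as ambiguous, route them to human reviewers,
# then fold the human-validated labels back into training.
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_by_confidence(model, X, threshold=0.65):
    """Return indices of confident vs. ambiguous (low-confidence) predictions."""
    confidence = model.predict_proba(X).max(axis=1)
    ambiguous = confidence < threshold
    return np.flatnonzero(~ambiguous), np.flatnonzero(ambiguous)

def retrain_with_feedback(model, X_train, y_train, X_reviewed, y_human):
    """Fold human-validated labels back into the training set and refit."""
    return model.fit(np.vstack([X_train, X_reviewed]),
                     np.concatenate([y_train, y_human]))

# Toy end-to-end pass with random data standing in for real features and surveys.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_new = rng.normal(size=(50, 4))
confident_idx, ambiguous_idx = split_by_confidence(model, X_new)
human_labels = rng.integers(0, 2, len(ambiguous_idx))     # stand-in for reviewer labels
model = retrain_with_feedback(model, X_train, y_train,
                              X_new[ambiguous_idx], human_labels)
```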

Cowboy Connect – Volunteer Tracker & LMS with AI Chatbot

Built a full-stack web app to manage student volunteer hours and classroom lessons. Integrated a ChatGPT-style LLM chatbot using OpenAI for real-time help and query routing. Includes authentication, club management, and assignment tracking.
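
A small sketch of how the chatbot call could look on the backend, assuming the openai Python client; the model name, system prompt, and helper signature are illustrative rather than the app's actual code.

```python
# Illustrative helper for a Cowboy Connect-style chatbot: send a student
# question plus app context to OpenAI and return the reply text.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer_student_question(question: str, context: str = "") -> str:
    """Return a chatbot reply grounded in volunteer/LMS context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You assist students with volunteer hours, clubs, "
                        "lessons, and assignment questions."},
            {"role": "user", "content": f"{context}\n\n{question}".strip()},
        ],
    )
    return response.choices[0].message.content
```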

Experience

GenAI Application Developer

Oklahoma State University

09/2023 – 05/2025

  • Designed Python logic to classify, summarise, and answer university-policy queries with Generative AI.
  • Built FastAPI backend using Retrieval-Augmented Generation, OpenAI embeddings, and FAISS for context-aware responses (see the retrieval sketch after this list).
  • Developed React.js frontend (CSS / HTML / Tailwind) enabling seamless interaction with the GenAI service.
  • Wrote pytest suites for API reliability and maintainability.
  • Containerised backend and frontend in Docker; deployed to Azure Kubernetes Service (AKS) for scalable, resilient ops.
  • Leveraged Azure Blob Storage for secure document and application data management.
  • Owned end-to-end Azure CI/CD pipeline from automated testing through production rollout.
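
A minimal sketch of the FAISS retrieval step referenced above, assuming OpenAI embeddings and an in-memory IndexFlatL2; the placeholder policy chunks and the text-embedding-3-small model are assumptions, not the deployed service.

```python
# Embed policy chunks with OpenAI, index them in FAISS, and fetch the closest
# matches for a user query (the retrieval half of the RAG backend).
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and return float32 vectors for FAISS."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data], dtype="float32")

# Build the index once over the policy document chunks (placeholder text here).
policy_chunks = ["Tuition refund policy ...", "Parking permit policy ..."]
vectors = embed(policy_chunks)
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k policy chunks closest to the query embedding."""
    _, ids = index.search(embed([query]), k)
    return [policy_chunks[i] for i in ids[0]]
```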

Python Full Stack Developer

Cloudtaru

05/2022 – 07/2023

  • Developed scalable GenAI APIs/microservices with Flask & FastAPI on AWS, integrating OpenAI, RAG, embeddings, and FAISS.
  • Delivered interactive React.js front-ends and Tableau dashboards for rich analytics.
  • Engineered PySpark and Databricks SQL pipelines processing multi-terabyte datasets on AWS.
  • Integrated Spark ML and TensorFlow models into ETL flows for real-time anomaly detection (see the scoring sketch after this list).
  • Automated ETL pipelines with Airflow and Jenkins, improving reliability and reducing manual effort.
  • Collaborated with US product owners and offshore teams to convert business needs into technical solutions.
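
A hedged sketch of how a pre-trained Spark ML model could be dropped into an ETL flow for anomaly scoring; the Delta paths, model location, and binary "prediction" column are assumptions for illustration.

```python
# Score a staged batch with a Spark ML pipeline model and route suspected
# anomalies to a separate Delta table for downstream alerting.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("etl-anomaly-scoring").getOrCreate()

# Incoming batch from the pipeline's staging layer (illustrative path).
batch = spark.read.format("delta").load("s3://pipeline/staging/transactions")

# Load a model trained offline, e.g. a feature pipeline ending in a classifier.
model = PipelineModel.load("s3://models/anomaly-detector")
scored = model.transform(batch)

# Records the model labels as anomalous go to an alerts table.
anomalies = scored.filter(F.col("prediction") == 1)
anomalies.write.format("delta").mode("append").save("s3://pipeline/alerts/anomalies")
```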

Big Data Engineer

Forsys Inc

05/2021 – 04/2022

  • Re-architected legacy ETL with Spark, PySpark, and Delta Lake on Hadoop, cutting compute costs 30%.
  • Managed large-scale data workflows via Hive, SQL, and advanced scripting for robust analytics.
  • Designed Tableau dashboards using SQL window functions, halving report runtimes (see the example query after this list).
  • Administered Hadoop clusters, Hive Metastore, and Unix shell workflows to maintain 99%+ data availability.
  • Integrated MongoDB, Oracle, and MySQL with modern data platforms, boosting data accessibility.
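
An illustrative example of the kind of window-function query that fed the Tableau extracts, written here against Spark SQL with Hive tables; the sales.monthly_revenue table and its columns are assumed names, not the original warehouse schema.

```python
# Pre-aggregate dashboard metrics with window functions so Tableau reads a
# small extract table instead of scanning raw rows.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dashboard-extract")
         .enableHiveSupport()
         .getOrCreate())

monthly = spark.sql("""
    SELECT
        region,
        sales_month,
        revenue,
        SUM(revenue) OVER (PARTITION BY region ORDER BY sales_month)       AS running_revenue,
        LAG(revenue) OVER (PARTITION BY region ORDER BY sales_month)       AS prev_month_revenue,
        RANK()       OVER (PARTITION BY sales_month ORDER BY revenue DESC) AS region_rank
    FROM sales.monthly_revenue
""")

# Materialise once per run; Tableau connects to this pre-aggregated table.
monthly.write.mode("overwrite").saveAsTable("sales.monthly_revenue_dashboard")
```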

Certifications

AWS Certified Cloud Practitioner

Amazon Web Services

Tableau Desktop Specialist

Tableau

Contact Me