Chirudeep Tupakula

Oklahoma State University '25
Master's in Computer Science

Data Engineer

About Me

I'm Chirudeep Tupakula, a Python Full-Stack GenAI Developer and Data Engineer with 2+ years of experience delivering scalable GenAI APIs and big-data solutions on AWS and Azure. I build enterprise-grade microservices with Flask/FastAPI, OpenAI, RAG, and Llama, then surface insights through modern React.js front-ends.

My toolkit spans Spark, PySpark, Spark ML, and TensorFlow for large-scale data engineering and ML pipelines, plus Databricks for seamless end-to-end workflows. Recent work includes a GenAI-powered FastAPI service on AWS Lambda that cut retrieval latency by 40%, and lakehouse architectures ingesting 1M+ events/day with Kafka, Delta Lake, Redshift, and Athena.

I'm comfortable containerising with Docker, automating releases via CI/CD, and collaborating in agile, cross-functional teams. Currently pursuing an M.S. in Computer Science at Oklahoma State University, I thrive at the intersection of data engineering and AI, delivering robust, customer-focused GenAI products that drive measurable value.

Skills

Python
SQL
PySpark
AWS (S3, Lambda, Redshift)
FastAPI
Flask
Databricks
Apache Spark
Apache Kafka
ETL Pipelines
Docker
Airflow
PostgreSQL
Snowflake
Delta Lake
Tableau
Google BigQuery
Power BI
Azure Databricks
Kubernetes
LangChain
Vector Embeddings (FAISS, Pinecone)

Projects

GenAI-Powered Customer Retention Analysis

Built a RAG solution using LLMs and Pinecone to provide explainable churn predictions. Delivered real-time recommendations via FastAPI endpoints, with Power BI monitoring dashboards connected through AWS Lambda.
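
A minimal sketch of how an endpoint like this could be wired together, assuming the newer openai and pinecone Python clients; the "customer-churn" index name, model choices, and metadata fields are illustrative assumptions, not the production implementation.

```python
# Hypothetical sketch: FastAPI endpoint that retrieves similar historical
# customer records from Pinecone and asks an LLM for an explainable churn call.
import os

from fastapi import FastAPI
from openai import OpenAI
from pinecone import Pinecone

app = FastAPI()
openai_client = OpenAI()                                  # reads OPENAI_API_KEY
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("customer-churn")                        # illustrative index name

@app.post("/churn-explanation")
def churn_explanation(customer_summary: str) -> dict:
    # Embed the customer summary and pull the most similar historical cases.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=customer_summary
    ).data[0].embedding
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    context = "\n".join(str(m.metadata) for m in results.matches)

    # Ask the LLM to ground its churn explanation in the retrieved cases.
    reply = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Explain churn risk using the provided cases."},
            {"role": "user", "content": f"Cases:\n{context}\n\nCustomer:\n{customer_summary}"},
        ],
    )
    return {"explanation": reply.choices[0].message.content}
```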

Sales Analytics Pipeline on Azure Databricks

Implemented a Lakehouse architecture using AWS S3, Delta Lake, AWS Glue, and Azure Databricks. Designed parallelized PySpark ETL jobs and exposed aggregated sales metrics via Dockerized FastAPI endpoints consumed by Power BI.
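
A rough PySpark sketch of the kind of ETL step involved, assuming a raw JSON landing zone on S3 and a curated Delta table read by the Dockerized FastAPI layer; paths, columns, and table layout are placeholders rather than the real pipeline.

```python
# Illustrative PySpark job: ingest raw sales events, aggregate by day and
# region, and write a curated Delta table. Paths and column names are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

raw = spark.read.json("s3://sales-raw/events/")           # raw landing zone

daily_sales = (
    raw.withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date", "region")
       .agg(
           F.sum("amount").alias("revenue"),
           F.countDistinct("order_id").alias("orders"),
       )
)

# Overwrite the curated table exposed to the FastAPI/Power BI layer.
(daily_sales.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://sales-curated/daily_sales"))
```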

Ambiguity Detection with Human-in-the-Loop Validation

Developed a machine learning system to detect ambiguous input cases and validate predictions via crowdsourced survey graphs. Human feedback was looped back to fine-tune the model iteratively, improving explainability and trust in predictions.
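
A toy sketch of this human-in-the-loop cycle, using scikit-learn and random data as stand-ins for the real model and survey feedback; the 0.65 confidence threshold and the logistic-regression model are assumptions for illustration only.

```python
# Flag low-confidence predictions as ambiguous, route them to human reviewers,
# then fold the human-validated labels back into training.
import numpy as np
from sklearn.linear_model import LogisticRegression

def split_by_confidence(model, X, threshold=0.65):
    """Return indices of confident vs. ambiguous (low-confidence) predictions."""
    confidence = model.predict_proba(X).max(axis=1)
    ambiguous = confidence < threshold
    return np.flatnonzero(~ambiguous), np.flatnonzero(ambiguous)

def retrain_with_feedback(model, X_train, y_train, X_reviewed, y_human):
    """Fold human-validated labels back into the training set and refit."""
    return model.fit(np.vstack([X_train, X_reviewed]),
                     np.concatenate([y_train, y_human]))

# Toy end-to-end pass with random data standing in for real features and surveys.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_new = rng.normal(size=(50, 4))
confident_idx, ambiguous_idx = split_by_confidence(model, X_new)
human_labels = rng.integers(0, 2, len(ambiguous_idx))     # stand-in for reviewer labels
model = retrain_with_feedback(model, X_train, y_train,
                              X_new[ambiguous_idx], human_labels)
```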

Cowboy Connect – Volunteer Tracker & LMS with AI Chatbot

Built a full-stack web app to manage student volunteer hours and classroom lessons. Integrated a ChatGPT-style LLM chatbot using OpenAI for real-time help and query routing. Includes authentication, club management, and assignment tracking.
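
A small sketch of how the chatbot call could look on the backend, assuming the openai Python client; the model name, system prompt, and helper signature are illustrative rather than the app's actual code.

```python
# Illustrative helper for a Cowboy Connect-style chatbot: send a student
# question plus app context to OpenAI and return the reply text.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer_student_question(question: str, context: str = "") -> str:
    """Return a chatbot reply grounded in volunteer/LMS context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You assist students with volunteer hours, clubs, "
                        "lessons, and assignment questions."},
            {"role": "user", "content": f"{context}\n\n{question}".strip()},
        ],
    )
    return response.choices[0].message.content
```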

Experience

GenAI Application Developer

Oklahoma State University

09/2023 – 05/2025

  • Designed Python logic to classify, summarise, and answer university-policy queries with Generative AI.
  • Built FastAPI backend using Retrieval-Augmented Generation, OpenAI embeddings, and FAISS for context-aware responses (see the retrieval sketch after this list).
  • Developed React.js frontend (CSS / HTML / Tailwind) enabling seamless interaction with the GenAI service.
  • Wrote pytest suites for API reliability and maintainability.
  • Containerised backend and frontend in Docker; deployed to Azure Kubernetes Service (AKS) for scalable, resilient ops.
  • Leveraged Azure Blob Storage for secure document and application data management.
  • Owned end-to-end Azure CI/CD pipeline from automated testing through production rollout.
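
A minimal sketch of the FAISS retrieval step referenced above, assuming OpenAI embeddings and an in-memory IndexFlatL2; the placeholder policy chunks and the text-embedding-3-small model are assumptions, not the deployed service.

```python
# Embed policy chunks with OpenAI, index them in FAISS, and fetch the closest
# matches for a user query (the retrieval half of the RAG backend).
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts and return float32 vectors for FAISS."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data], dtype="float32")

# Build the index once over the policy document chunks (placeholder text here).
policy_chunks = ["Tuition refund policy ...", "Parking permit policy ..."]
vectors = embed(policy_chunks)
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k policy chunks closest to the query embedding."""
    _, ids = index.search(embed([query]), k)
    return [policy_chunks[i] for i in ids[0]]
```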

Python Full Stack Developer

Cloudtaru

05/2022 – 07/2023

  • Developed scalable GenAI APIs/microservices with Flask & FastAPI on AWS, integrating OpenAI, RAG, embeddings, and FAISS.
  • Delivered interactive React.js front-ends and Tableau dashboards for rich analytics.
  • Engineered PySpark and Databricks SQL pipelines processing multi-terabyte datasets on AWS.
  • Integrated Spark ML and TensorFlow models into ETL flows for real-time anomaly detection (see the scoring sketch after this list).
  • Automated ETL pipelines with Airflow and Jenkins, improving reliability and reducing manual effort.
  • Collaborated with US product owners and offshore teams to convert business needs into technical solutions.
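
A hedged sketch of how a pre-trained Spark ML model could be dropped into an ETL flow for anomaly scoring; the Delta paths, model location, and binary "prediction" column are assumptions for illustration.

```python
# Score a staged batch with a Spark ML pipeline model and route suspected
# anomalies to a separate Delta table for downstream alerting.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("etl-anomaly-scoring").getOrCreate()

# Incoming batch from the pipeline's staging layer (illustrative path).
batch = spark.read.format("delta").load("s3://pipeline/staging/transactions")

# Load a model trained offline, e.g. a feature pipeline ending in a classifier.
model = PipelineModel.load("s3://models/anomaly-detector")
scored = model.transform(batch)

# Records the model labels as anomalous go to an alerts table.
anomalies = scored.filter(F.col("prediction") == 1)
anomalies.write.format("delta").mode("append").save("s3://pipeline/alerts/anomalies")
```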

Big Data Engineer

Forsys Inc

05/2021 – 04/2022

  • Re-architected legacy ETL with Spark, PySpark, and Delta Lake on Hadoop, cutting compute costs 30%.
  • Managed large-scale data workflows via Hive, SQL, and advanced scripting for robust analytics.
  • Designed Tableau dashboards using SQL window functions, halving report runtimes (see the example query after this list).
  • Administered Hadoop clusters, Hive Metastore, and Unix shell workflows to maintain 99%+ data availability.
  • Integrated MongoDB, Oracle, and MySQL with modern data platforms, boosting data accessibility.
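
An illustrative example of the kind of window-function query that fed the Tableau extracts, written here against Spark SQL with Hive tables; the sales.monthly_revenue table and its columns are assumed names, not the original warehouse schema.

```python
# Pre-aggregate dashboard metrics with window functions so Tableau reads a
# small extract table instead of scanning raw rows.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dashboard-extract")
         .enableHiveSupport()
         .getOrCreate())

monthly = spark.sql("""
    SELECT
        region,
        sales_month,
        revenue,
        SUM(revenue) OVER (PARTITION BY region ORDER BY sales_month)       AS running_revenue,
        LAG(revenue) OVER (PARTITION BY region ORDER BY sales_month)       AS prev_month_revenue,
        RANK()       OVER (PARTITION BY sales_month ORDER BY revenue DESC) AS region_rank
    FROM sales.monthly_revenue
""")

# Materialise once per run; Tableau connects to this pre-aggregated table.
monthly.write.mode("overwrite").saveAsTable("sales.monthly_revenue_dashboard")
```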

Certifications

AWS Certified Cloud Practitioner

Amazon Web Services

Tableau Desktop Specialist

Tableau

Contact Me