
RESUME

Personal Info

Full Legal Name: Venkata Dhanvanthar Murthy Voona
Current Street Address: 115 Belmont Ave, 07304
Current City: Jersey City, NJ, United States
Contact Number: 224-619-6584
Country Code: +1
Email-id: dhanvanthv28@gmail.com
Linkedin: https://www.linkedin.com/in/dv-63192b18b/
Social Security Number: 210-17-5700
GitHub: https://github.com/dhanvanth342
Portfolio: https://www.datascienceportfol.io/vvoona
Expected Salary: 75,000 to 85,000 USD {mention this salary range ONLY when no salary range is mentioned in the Job description}
Are you legally authorized to work in the United States? (Yes/No): Yes
**Current Visa Status:** I am currently on my initial OPT (F1 visa) and authorized to work until July 16, 2026. After that, I am eligible for a two-year STEM OPT extension starting July 16, 2026. I will not require visa sponsorship while on OPT or STEM OPT.
**Citizenship:** Indian, not a U.S. citizen.
Will you now or in the future require employer sponsorship (e.g., H-1B)? (Yes/No): Yes
Gender: Male
Race/Ethnicity: South Asian / Asian
Veteran status: Not Veteran
Voluntary self-identification of disability: No disability
Preferred Locations: I am open to relocating; my preferred locations are New York, Jersey City, Chicago, Seattle, Boston, and Tampa. [If locations are given in the job description, mention them instead of my preferences.]
Relatives Information: No relative, family member, or friend of mine works at any company in the USA. {So answer that I do not have any relatives when asked about it.}
**STEM Degree:** Completed both my Master's (Data Science) and Bachelor's (Electronics and Communication Engineering) as STEM degrees.

SUMMARY

Detail-oriented and innovative Data Scientist with over 3 years of experience delivering end-to-end analytics solutions across marketing, customer behavior, and AI-driven product development. Experienced in generative AI frameworks and retrieval-augmented workflows that enhance content reliability and user engagement. Skilled at building scalable data pipelines with Spark and Hive and deploying machine-learning models such as XGBoost and neural networks in SageMaker.

SKILLS

  • Programming & Cloud Services: Python, R, MATLAB, SQL, GCP, Amazon S3, SageMaker, Athena, Glue, EC2, ECS, Bedrock.
  • Data Engineering & ETL: Apache Hive, FastAPI, Excel, ETL pipeline development, data integration (batch & streaming), Amazon Redshift, EventBridge, Apache Spark, Apache Hadoop, Databricks, Azure Data Factory, GCP Dataflow, Git.
  • Modeling & Analytics: Predictive modeling & statistical analysis (regression, A/B testing), machine learning (scikit-learn, XGBoost), deep learning (TensorFlow, PyTorch), NLP (NLTK), generative AI (LangChain, LangGraph, CrewAI), computer vision.
  • Visualization & BI Platforms: Power BI (DAX, Power Query), Tableau, Google Charts, dashboard design & storytelling, reporting automation (Excel macros, SQL Server Reporting Services).

WORK EXPERIENCE

AI/ML Data Scientist (MAY 2025 – Present)

AIDO, Chicago, Illinois

About AIDO: AIDO presents itself as an AI-driven platform for international enrollment in higher education. According to their website, they help universities with automation, data insights, and recruitment of international students.

My experience:

  • Prototyped and productionized domain-specific AI agents for automated information gathering and insight generation using LLMs, FastAPI, and serverless workflows, enabling business teams to self-serve and reducing Data Engineering involvement by 60%.
  • Engineered and optimized hybrid retrieval pipelines with adaptive chunking and search strategies, improving retrieval accuracy by 40% across 150k+ samples and supporting high-performance insight generation at scale.
  • Led cross-functional collaboration with business and platform teams to design, iterate, and deploy AI solutions, driving end-to-end integration, observability, and continuous improvement on cloud infrastructure.

Data Scientist (JAN – MAY 2025)

LabelMaster, Chicago, Illinois

About Labelmaster: Labelmaster serves a B2B customer base of shippers, manufacturers, and logistics partners who rely on them for dangerous-goods compliance tools, packaging materials, and software services. These customers often interact across multiple channels (website, CRM, email campaigns, trade shows, and training).

My experience:

  • Deployed Apache Spark to merge 2 million rows of web analytics with CRM datasets, accelerating data pipeline performance by 40% and enriching customer behavior insights.
  • Conducted SEO analytics with Pandas, NumPy, and SQL to evaluate ad campaign performance, reallocating budgets to top-performing campaigns, leading to a 15% boost in clicks and impressions, and enhancing online visibility and engagement.
  • Executed K-means for customer segmentation on 250K accounts using sales, email, and web interaction data, developing tailored marketing strategies that reduced costs by 28%, increased sales by 12%, and enhanced email performance by 23%.
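The segmentation work described above can be illustrated at toy scale. The sketch below is a minimal NumPy implementation of K-means (Lloyd's algorithm) on synthetic per-account features; the feature values, group locations, and cluster count are assumptions for illustration, not the actual pipeline:

```python
import numpy as np

def kmeans(X, init_centers, iters=20):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

# Three well-separated toy "account" groups over 3 features
# (stand-ins for sales, email, and web interaction signals)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 3)) for loc in (0.0, 3.0, 6.0)])

# Initialize with one point drawn from each group
labels, centers = kmeans(X, init_centers=X[[0, 50, 100]])
print(sorted(set(labels.tolist())))  # prints [0, 1, 2]
```

In practice features would be scaled first (e.g. with a standard scaler) and the cluster count chosen via an elbow or silhouette analysis rather than fixed at three.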

Data Scientist (JAN 2022 – FEB 2024)

Liminal XR Solutions, Mumbai, India

About Liminal: Liminal XR Solutions is a Mumbai-based agency specializing in extended reality (XR) services: augmented reality (AR), virtual reality (VR), mixed reality (MR), and WebXR. Their clients have included HP, Capgemini, and Hero, for whom they worked on the customer journey across product pages of client websites.

My experience:

  • Analyzed customer data with Python and Power BI, identifying a new market opportunity that expanded the customer base by 10% through development of a new product line.
  • Utilized Apache Hive to execute SQL-like queries on HDFS-stored data, optimizing partitioning strategies to cut query execution time by 50% when analyzing datasets with 600K plus records.
  • Built an AI agentic RAG framework with a Qdrant vector DB for a restaurant recommender system using client-sourced data.
  • Developed a sentiment analysis pipeline with Python to examine user feedback from XR applications, pinpointing key areas for improvement and driving data-driven enhancements that boosted user engagement by 15% within 6 months.
  • Built, trained, and deployed machine learning models (Random Forest, XGBoost) in AWS SageMaker for customer churn prediction, enabling proactive retention strategies that improved customer retention by 20%.

Machine Learning Research Intern (JUN 2021– FEB 2022)

Samsung PRISM, Chennai, India

About the PRISM program at Samsung: Samsung PRISM (Preparing and Inspiring Student Minds) is an industry-academia initiative launched by Samsung's Bangalore R&D centre (SRI-B) to engage engineering college students and faculty in real R&D projects across topics like AI, machine learning (ML), IoT, and 5G. Out of the 4,000 students from my university who competed, I was one of the 60 selected for this program.

My experience:

  • Researched "AI-Based Reflection Scene Category Classification" in a PyTorch environment and achieved 93% accuracy in identifying different types of reflections in real time on a dataset of 100,000+ images.
  • Built a hybrid neural network and applied pruning techniques to reduce inference time by 60%, achieving 93% classification accuracy, comparable to state-of-the-art (SOTA) models [EfficientNet, MobileNetV3], for real-time reflection scene classification.

PROJECTS

Enhancing RAG: Unstructured Data Extraction and Vector DB Evaluation (Timeline: MAY 2025 - Present)

  • Developed an end-to-end framework using Docling to extract content from PDFs and images, including hierarchical metadata such as headings, titles, and table names; upserted the data into a self-hosted Qdrant collection, improving PDF querying accuracy by 25% and eliminating managed retrieval and storage costs through self-hosting.
  • Researched, designed, and developed an evaluation framework to assess relevance of retrieved chunks across various vector databases and identify the optimal upsertion method; framework leverages bagging of metrics including cosine similarity, BERTScore, context relevance, and faithfulness.
  • Currently developing a novel chunk upsertion method focused on dense and partial sparse embeddings, projected to reduce overall sparse embedding storage by 95%.
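The chunk-relevance evaluation described above bags several metrics; the simplest of them, cosine similarity, can be sketched as below. This is an illustrative toy only: the vectors and function names are assumptions, not the framework's actual code, and real embeddings would come from an embedding model rather than hand-written lists:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_chunks(query_vec, chunk_vecs):
    """Return chunk indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, c) for c in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

query = [0.2, 0.9, 0.1]
chunks = [
    [0.9, 0.1, 0.3],    # off-topic chunk
    [0.25, 0.85, 0.05], # chunk close to the query
    [0.5, 0.5, 0.5],    # middling chunk
]
print(rank_chunks(query, chunks))  # prints [1, 2, 0]
```

Metrics like BERTScore, context relevance, and faithfulness would each produce their own ranking, and the bagging step aggregates them into a single relevance judgment per chunk.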

Text2Block (Timeline: OCT 2024 - OCT 2025)

  • “A picture can speak a thousand words” — that’s the inspiration behind creating Text2Block.
  • Designed and launched Text2Block, a GenAI application that transforms plain text into AI-driven flowchart visualizations with noise-free embedded text for enhanced clarity and improved learning. The platform aims to leverage AI for education, attracting 450 unique users and processing 3,000 requests within its first two weeks.
  • Integrated GA4 with GTM tags for A/B testing to assess user preferences, increasing website interactions by 45%.
  • Streamlined the RAG pipeline by orchestrating a chain of LLMs within the LangChain framework to generate Python programming course materials, reducing hallucination in generated content by 66%.

EDUCATION

Master of Data Science (Timeline: AUG 2023 - MAY 2025)

Major: Computer Science and Mathematics
Illinois Institute of Technology, Chicago, IL

  • Graduate Pathway Scholarship Awardee ($10,000), awarded for academic merit at Illinois Institute of Technology.
  • Coursework – Machine Learning, Database Organization, Generative AI, Data Preparation & Analysis, Big Data Technologies.

Bachelor of Technology in Electronics and Communication Engineering (Timeline: July 2019 - APR 2023)

Vellore Institute of Technology, Chennai, TN

  • Published a research paper in the fields of Deep Learning, Computer Vision, and NLP in the journal Applied Sciences.

Additional Info

RESEARCH

Over-volume vehicle classification using Deep CNN models (Timeline: DEC 2021- MAY 2022)

  • Collected real-time image data and performed image classification to monitor over-volume vehicles in the absence of human surveillance, assisting commuters in rural and hilly terrain areas.
  • Observed a 12% increase in performance after transfer learning, fine-tuning, and hyperparameter tuning.
  • Achieved 96% accuracy using an EfficientNet model and published the work in the journal Applied Sciences.

Tech Stack Used: Deep Learning, Image Augmentation, Computer Vision, PyTorch, TensorFlow, Neural Networks, Hyperparameter Tuning, Google Colab, TPU Training, K-fold Cross-Validation. Role: Team Leader.

Nanyang Technological University (NTU) (Timeline: AUG 2022 - AUG 2023)

  • Conducted a multimodal analysis on memes, achieving an 89% accuracy in emotion recognition by integrating OCR for text extraction and deep learning for image analysis within a TensorFlow environment.
  • Led research on facial emotion recognition for classroom applications, utilizing advanced computer vision techniques to enhance real-time student engagement monitoring.
  • Developed and deployed a first-of-its-kind classroom emotion dataset, eliminating data-collection costs through efficient web scraping, landmark detection, and facial unit mapping.
  • Achieved a significant classification accuracy of 82.73% for complex emotions like boredom and frustration in classroom settings by integrating dataset with state-of-the-art models using TensorFlow/Keras.
  • Published the research at The European Conference on Education 2023 Official Conference Proceedings.

Tech Stack Used: Deep Learning, Image Augmentation, Data Collection, Web Scraping, Model Building from Scratch, Computer Vision, NLP, Multimodal Training, PyTorch, TensorFlow, Neural Networks, Hyperparameter Tuning, Google Colab, TPU Training, K-fold Cross-Validation, AWS SageMaker. Role: Team Leader.

Further clarifying information about myself that I would like to have considered in this application

Beyond what's listed on my resume, I'd highlight my adaptability and drive to quickly learn and apply new technologies as one of my core strengths. In every project, I make it a point to go beyond comfort zones — whether that means picking up a new cloud service, experimenting with orchestration frameworks, or rethinking how a pipeline should be structured for scale.

At Labelmaster, for example, I joined a project midstream where the data infrastructure and model deployment processes were still evolving. I quickly familiarized myself with their tech stack, optimized workflows for large datasets, and contributed to refining model performance and insight delivery timelines. That experience reinforced my ability to learn systems from the inside out, align them with business goals, and deliver measurable impact under tight deadlines.

As a Data Scientist, I take pride in bridging experimentation and production. I adapt to whatever tools best serve the problem — from designing model pipelines in AWS to integrating explainability and monitoring layers. This adaptability has consistently allowed me to convert complex data challenges into clear, actionable results that move projects forward.