Resume

My experience, education, and professional skills

Do Hoang Vu

AI Engineer

Hoang Van Thu Street

Ho Chi Minh City, Phu Nhuan District

Vietnam

Education

The Saigon International University

Bachelor of Artificial Intelligence

Expected graduation: 2027

GPA: 3.62/4.00

Certifications

Machine Learning - Deep Learning Foundation

cole.vn

Exploratory Data Analysis for Machine Learning

IBM - Coursera

Machine Learning Specialization

Stanford - Coursera

Languages

English: Intermediate
Vietnamese: Native

Projects & Competitions

AcaRead: AI-Powered Academic Assessment System

An advanced educational platform that transforms academic papers (PDFs) into interactive IELTS-style reading exams using Gemini 2.5 Flash, bridging passive reading and active comprehension assessment.

  • AI-Driven Exam Generation: Uses an LLM to analyze a text's logic and arguments and to generate questions with plausible distractors.
  • Authentic Material Processing: Handles complex PDF layouts including multi-column structures and citations via Docling and PyMuPDF.
  • Interactive Exam Engine: Features a split-screen 'Focused Mode' for distraction-free reading and real-time state management.
  • Scientific Validation: Implements a multi-level evaluation framework based on CEFR B2-C1 standards and NLP metrics.
  • Modern Cyberpunk UI: Designed with glassmorphism, neon accents, and micro-animations for a premium user experience.

TwinSelf - Digital Twin Chatbot

A sophisticated RAG-based chatbot system using a triple-memory architecture (Semantic, Episodic, Procedural). This system serves as the intelligence behind the AI Chatbot on this portfolio, providing an authentic digital twin experience.

  • Portfolio Integration: Currently acting as the core engine for this website's personal chatbot assistant.
  • Cognitive Memory System: Implements Semantic (facts), Episodic (experience), and Procedural (behavioral rules) memory layers.
  • Vector Search: Efficient retrieval using the Qdrant vector database and specialized Vietnamese document embeddings.
  • MLOps Integration: Comprehensive tracking with MLflow and automated response quality evaluation using DeepEval.
  • Version Control: Robust data management system supporting snapshots, versioning, and rollback of knowledge bases.

Face Attendance: Kiosk-Based Recognition & Anti-Spoofing

A modular kiosk-based face recognition system for attendance and access control, featuring facial alignment, quality gating, and anti-spoofing mechanisms for near real-time biometric verification.

  • Modular Vision Pipeline: Integrates face detection, 2D similarity transform alignment, and identity matching into a unified flow.
  • Quality Gate System: Automatically filters frames affected by blur, low contrast, or poor lighting to maintain high embedding accuracy.
  • Anti-Spoofing Mechanism: Implements verification layers to detect printed or digital attack attempts in various lighting conditions.
  • High Performance: Optimized to run on limited hardware with a latency of 1-2 seconds from subject detection to result.
  • Robust Data Processing: Standardizes input via similarity transforms, significantly reducing intra-class variation for stable recognition.
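
The quality gate described above can be illustrated with a minimal sketch. It assumes a frame is a grayscale grid of pixel values in [0, 255] and uses the variance of a 4-neighbour Laplacian as a blur measure, the pixel standard deviation as a contrast measure, and the mean as a lighting measure. The function names and all thresholds are hypothetical and are not taken from the project.

```python
from statistics import pstdev, pvariance
from typing import List, Tuple

Gray = List[List[float]]  # grayscale frame, pixel values in [0, 255]


def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Sharp frames have strong edges and a high variance; blurred frames score low.
    """
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses) if responses else 0.0


def passes_quality_gate(
    img: Gray,
    min_sharpness: float = 100.0,  # hypothetical threshold
    min_contrast: float = 20.0,  # hypothetical threshold
    brightness_range: Tuple[float, float] = (40.0, 220.0),  # hypothetical range
) -> bool:
    """Return True only if the frame is sharp, contrasty, and reasonably lit."""
    pixels = [p for row in img for p in row]
    mean_brightness = sum(pixels) / len(pixels)
    contrast = pstdev(pixels)
    low, high = brightness_range
    return (
        laplacian_variance(img) >= min_sharpness
        and contrast >= min_contrast
        and low <= mean_brightness <= high
    )


# Usage: a high-contrast checkerboard passes, a flat gray frame is rejected.
checkerboard = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
print(passes_quality_gate(checkerboard), passes_quality_gate(flat))
```

In a real pipeline the same checks would typically run on the aligned face crop rather than the full frame, so that background texture does not mask a blurred face.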

Ready4RAG: High-Precision Dual-Layer RAG Pipeline

A next-generation ingestion system that converts complex PDFs to Markdown with Vision LLMs and stores the result in a Hybrid Memory (Vector + Graph) for grounded, high-precision answers.

  • Vision-Powered Extraction: Converts PDF to Markdown with near-perfect layout preservation using multimodal Vision LLMs.
  • Dual-Layer Memory: Hybrid storage system using Qdrant for vector similarity and NetworkX for graph-based reasoning.
  • Auto-Graph Construction: Automatically extracts entities (People, Locations, Concepts) and relationships to build a knowledge graph.
  • Hybrid Chatbot Engine: Interactive interface that retrieves and merges context from both vector and graph layers.
  • Multi-Provider Infrastructure: Plug-and-play support for Google Gemini, OpenAI, Groq, and local Ollama models.

Deeplearning-Practice

A comprehensive collection of deep learning implementations coded from scratch, covering everything from computer vision to natural language processing. This repository showcases high-quality implementations of fundamental and advanced deep learning algorithms.

  • Computer Vision models including CNN architectures and ResNet implementations
  • Natural Language Processing models including LSTM with attention mechanisms
  • Regression models with detailed logging for housing price prediction
  • Sentiment analysis on IMDb reviews with deep learning approaches
  • Classification models for Vietnamese news articles
  • All implementations feature clean, well-documented code with detailed explanations

EzClip

A powerful desktop application designed to effortlessly download videos from various online platforms including YouTube, Facebook, and TikTok. Built with Electron.js and leverages yt-dlp for wide format support.

  • Support for multiple platforms including YouTube, Facebook, TikTok
  • Download videos in various formats and resolutions
  • Simple and intuitive user interface with modern design
  • Offline functionality - no server required
  • Built with Electron.js for cross-platform compatibility

Decision Tree Visualization

A web application for building and visualizing decision trees from CSV data. Features include customizing model parameters, interactive visualization of decision trees, and performance metrics calculation.

  • Built with Python, FastAPI, and scikit-learn for the backend
  • Interactive UI with HTML, CSS, JavaScript, and TailwindCSS
  • Supports custom model parameters (max depth, min samples split, criterion)
  • Visualizes decision trees as hierarchical structures
  • Calculates and displays model evaluation metrics

TARS: SOICT 2025

Co-first authored a research paper accepted at SOICT 2025 presenting TARS (Temporal Alignment Retrieval System), a training-free, order-aware framework for multi-segment video event retrieval. TARS applies monotonic dynamic programming alignment over vision-language encoders.

  • Query decomposed into ordered sub-event sequences embedded by complementary vision-language encoders.
  • Monotonic DP alignment finds the best ordered path on the frame-subevent similarity matrix with O(nm) time and O(m) memory.
  • Training-free design requires no additional dataset-specific training beyond base encoders, ensuring robustness under domain shift.
  • Integrates cleanly with standard two-stage candidate retrieval and re-ranking pipelines.
  • Demonstrated 93.15% accuracy on the Ho Chi Minh City AI Challenge 2025 benchmark.
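
The monotonic DP alignment described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact recurrence: it assumes each sub-event is matched to exactly one frame, that matched frames must be strictly increasing in time, and that the score of a path is the sum of its similarities. The name `monotonic_alignment_score` is hypothetical.

```python
from math import inf
from typing import Sequence


def monotonic_alignment_score(sim: Sequence[Sequence[float]]) -> float:
    """Best total similarity of an order-preserving frame/sub-event matching.

    sim[i][j] is the similarity between frame i (n frames) and sub-event j
    (m sub-events). Runs in O(n*m) time and keeps only O(m) state.
    Returns -inf when there are fewer frames than sub-events.
    """
    if not sim or not sim[0]:
        return 0.0
    m = len(sim[0])
    # best[j] = best score for matching the first j sub-events to frames seen so far.
    best = [0.0] + [-inf] * m
    for row in sim:
        # Go from high j to low j so the current frame is used at most once.
        for j in range(m, 0, -1):
            candidate = best[j - 1] + row[j - 1]
            if candidate > best[j]:
                best[j] = candidate
    return best[m]


# Usage: three frames, two ordered sub-events.
# The best ordered path matches frame 0 to sub-event 0 and frame 1 to sub-event 1.
sim = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.1, 0.3],
]
print(round(monotonic_alignment_score(sim), 2))  # 1.7
```

Keeping a single row of state is what gives the O(m) memory bound. Recovering the matched frame indices, rather than only the score, would need extra bookkeeping.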

Viettel AI Race 2025

Competed solo in the NLP track of Viettel AI Race 2025, building a pipeline for complex PDF table extraction, numerical data retrieval, and multiple-choice QA. Achieved Top 5 in Round 1 as a solo competitor among team-based entries.

  • Complex PDF table extraction preserving multi-level headers and merged cells via coordinate-based bounding box grouping.
  • Numeric normalization pipeline handling diverse formatting conventions across financial and technical document types.
  • Chain-of-thought prompting for multi-step numerical reasoning over extracted tabular data.
  • Constrained answer selection strategy reducing hallucination on borderline numerical comparisons.
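
A numeric normalization step of the kind listed above might look like the sketch below. It is an assumption-laden illustration, not the competition code: the function name `normalize_number` and its heuristics are hypothetical. In particular, a single separator followed by exactly three digits (for example "1.234") is treated as a thousands separator, which suits Vietnamese-style documents but would misread an English decimal such as 1.234.

```python
import re
from typing import Optional


def normalize_number(text: str) -> Optional[float]:
    """Parse a table cell such as '1.234,56', '(1,234)' or '12,5%' into a float.

    Returns None when the cell contains no digits or cannot be parsed.
    Percentages are returned as fractions (12.5% -> 0.125).
    """
    s = text.strip()
    # Accounting-style negatives "(1,234)" or a minus sign before the first digit.
    negative = (s.startswith("(") and s.endswith(")")) or bool(re.match(r"[^\d]*-", s))
    percent = "%" in s
    digits = re.sub(r"[^\d.,]", "", s)  # drop currency symbols, spaces, signs
    if not re.search(r"\d", digits):
        return None

    last_dot, last_comma = digits.rfind("."), digits.rfind(",")
    if last_dot != -1 and last_comma != -1:
        # Both separators present: the rightmost one is the decimal mark.
        decimal, thousands = (".", ",") if last_dot > last_comma else (",", ".")
        digits = digits.replace(thousands, "").replace(decimal, ".")
    elif last_dot != -1 or last_comma != -1:
        sep = "." if last_dot != -1 else ","
        parts = digits.split(sep)
        # Heuristic (assumption): repeated separators, or exactly three trailing
        # digits after a non-zero integer part, mean thousands grouping.
        if len(parts) > 2 or (len(parts[-1]) == 3 and parts[0] not in ("", "0")):
            digits = "".join(parts)
        else:
            digits = ".".join(parts)

    try:
        value = float(digits)
    except ValueError:
        return None
    if negative:
        value = -value
    return value / 100 if percent else value


# Usage: mixed conventions from financial and technical tables.
for cell in ["1,234.56", "1.234,56", "(1,234)", "12,5%", "n/a"]:
    print(repr(cell), "->", normalize_number(cell))
```

Returning `None` for unparseable cells, instead of guessing, keeps bad values out of later numerical comparisons.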

AIC25 COMPETITION

Led team SIU Cerberus at Ho Chi Minh City AI Challenge 2025, deploying the TARS (Temporal Alignment Retrieval System) framework for order-aware video event retrieval. Achieved 93.15% accuracy on the benchmark with a training-free monotonic DP alignment approach.

  • Designed and led implementation of TARS: a training-free, order-aware video retrieval system decomposing queries into ordered sub-event sequences.
  • Monotonic dynamic-programming alignment enforcing strict temporal ordering at inference time with O(nm) time and O(m) memory complexity.
  • Complementary vision-language encoders reducing sensitivity to individual encoder weaknesses across diverse query types.
  • Two-stage retrieve-then-rerank pipeline: FAISS coarse retrieval followed by TARS re-ranking for precision-latency balance.

ZALO AI CHALLENGE 2023

Built deep generative models for symbolic and audio-based music generation using Transformer-based architectures. Tuned the models for temporal coherence and structural consistency to match the evaluation metrics for generative audio tasks.

AIC24 COMPETITION

Developed scalable video understanding pipelines for event retrieval, leveraging contrastive learning and multimodal embeddings to enhance temporal-semantic alignment in untrimmed video datasets.

  • Integrated multi-head self-attention, temporal convolutional networks, and cross-modal fusion to improve mAP and retrieval latency on benchmark datasets.

VIZQUEST: ENHANCED VIDEO EVENT RETRIEVAL USING FUSION AND TEMPORAL MODELING

Co-authored a research paper accepted at SOICT 2024, introducing a novel framework combining spatio-temporal attention with hierarchical feature fusion to optimize long-range video event detection.

Technical Skills

Programming

Python
JavaScript
SQL
Bash
C/C++

Libraries/Frameworks

PyTorch
LangChain
FastAPI
Transformers
MLflow
Node.js
Next.js

AI Expertise

LLMs
Multi-Agent Systems
Explainable AI
Graph-RAG
MLOps
Computer Vision

Tools

Git
GitHub Actions
Docker
Cloudflared
VS Code

Soft Skills & Interests

Soft Skills

Analytical & Problem-solving
Effective collaboration
Personable communication
Presentation
Time management
Responsible AI Usage

Interests

Generative AI
LLMs
Multi-Agent Systems
MLOps
Voice Transformation
Mathematics & Physics
AI Ethics