Matome Mbowene

Software & AI Engineer • OCR/CV • Retrieval (RAG) • Backend

Open to: Software • AI/ML • Backend/Cloud
Location: Cape Town
Work mode: Hybrid / Remote

I build production-focused AI features that ship: document automation, retrieval pipelines, and services that are measured, validated, and designed for reliability.

Email: matomepontso@gmail.com

OCR Field-Mapping Accuracy: 100%
Model Accuracy (FashionMNIST): 89.33%
Scheduling Efficiency Gain: 35%

Recent Experience

Computer vision & document automation

Recent
Research collaboration (geomatics)
  • Built production OCR + LLM system achieving 100% field-mapping accuracy on scanned surveying diagrams.
  • Implemented 6-layer validation, confidence scoring, and context-aware mapping with color-coded debug highlighting.
  • Created editable PDF form fields (AcroFields), robust fallbacks, and enterprise-grade logging.
Python OpenCV OCR PDF Forms LLM APIs

Retrieval systems (RAG)

Recent
Early-stage startup (equity-based)
  • Developed a RAG conversational agent using FAISS + sentence-transformers + LLM APIs to connect users to relevant information.
  • Delivered end-to-end pipeline: ingestion → embeddings → persistent index → retrieval → Streamlit demo.
  • Hardened CI/CD, dependency management, and production safeguards.
RAG FAISS sentence-transformers Streamlit CI/CD
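
The pipeline in the bullets above (ingestion → embeddings → index → retrieval) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: a toy bag-of-words vector stands in for sentence-transformers and a brute-force cosine search stands in for FAISS, so the example runs with the standard library only. The names `embed`, `build_index`, and `retrieve`, and the sample documents, are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Ingestion + embedding: pair each document with its vector."""
    return [(doc, embed(doc)) for doc in docs]

def retrieve(index, query, k=2):
    """Brute-force top-k search by cosine similarity (a vector index would replace this)."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical knowledge base, for illustration only.
docs = [
    "funding options for early stage startups",
    "how to register a company",
    "mentorship programs for founders",
]
index = build_index(docs)
top = retrieve(index, "startups funding", k=1)
```

Here `top` holds one `(score, document)` pair for the funding document. In a real system the index would be persisted so ingestion does not have to be repeated on every start.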

Embedded / edge fundamentals

Recent
Mobility safety startup
  • Developing sensor fusion pipelines and edge-AI models for real-time hazard detection on motorcycles.
  • Contributing to firmware and data pipeline architecture for LiDAR, IMU, and camera integration.
Embedded Sensor Fusion Edge AI STM32

Skills

AI / ML

Computer vision, OCR, retrieval (RAG), and production-focused evaluation.

Backend / Data

APIs, validation, data modeling, and maintainable services.

Systems / Shipping

Pragmatic shipping discipline: version control, containers, automation, and reliability checks.

Technical Breadth

I’m comfortable moving across domains, from product-facing web systems to low-level reliability and production AI. This range helps me ship end-to-end solutions and communicate across teams.

AI / ML Systems

Computer vision, document automation, RAG pipelines, evaluation, and safe deployment patterns.

CV OCR RAG PyTorch

Backend / Web

APIs, data modeling, and systems that prioritize correctness, observability, and maintainability.

REST Spring Boot SQL Testing

Embedded / Edge

Latency-aware pipelines and hardware integration with an emphasis on signal integrity and robustness.

STM32 Sensor Fusion C/C++

Systems / Fundamentals

Algorithms, networking, and performance tradeoffs, useful when reliability and efficiency matter.

Networking Scheduling Performance

Featured Projects

NDA note (what I can share publicly)

Built to protect clients and teams
I can discuss
  • Problem framing, constraints, and tradeoffs
  • Architecture patterns and reliability practices
  • Validation, observability, and delivery workflow
I won’t publish
  • Client names, internal documents, or private metrics
  • Confidential datasets, prompts, or implementation details
  • Anything that violates NDA or privacy

OCR Document Automation

Production OCR + LLM pipeline for document-to-structured-data automation.

Python OpenCV LLMs
  • Outcome: 100% field-mapping accuracy with multi-layer validation.
  • Approach: OCR + LLM mapping with confidence scoring and explainable debug outputs.
  • Reliability: validation layers, fallbacks, and audit logging.
  • Stack: Python, OpenCV, OCR, PDF forms, LLM APIs.
Case study details
What I optimized for: correctness and traceability first. The pipeline uses layered validation (format, geometry, constraints, cross-field rules) so extraction failures are caught early and are explainable.
What I’d improve next: expand evaluation sets, add drift checks on input quality, and tighten “confidence-to-review” thresholds to reduce manual review time.
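
As a rough illustration of layered validation, the sketch below runs independent checks over one extracted record and collects every failure, so the result is explainable. It covers three of the layer types named above (format, constraints, cross-field rules) and leaves out geometry. The field names (`parcel_id`, `width_m`, `length_m`, `area_m2`), the ID pattern, and the 1% tolerance are all hypothetical and are not taken from the actual pipeline.

```python
import re

def check_format(rec):
    """Layer 1: the ID must match an expected pattern (pattern is illustrative)."""
    return [] if re.fullmatch(r"[A-Z]{2}-\d{4}", rec.get("parcel_id", "")) else ["format: parcel_id"]

def check_constraints(rec):
    """Layer 2: numeric fields must be present and positive."""
    return [f"constraint: {k}" for k in ("width_m", "length_m", "area_m2")
            if not isinstance(rec.get(k), (int, float)) or rec[k] <= 0]

def check_cross_field(rec, tol=0.01):
    """Layer 3: area must agree with width x length within a relative tolerance."""
    try:
        expected = rec["width_m"] * rec["length_m"]
        ok = abs(rec["area_m2"] - expected) <= tol * expected
    except (KeyError, TypeError):
        return ["cross-field: missing inputs"]
    return [] if ok else ["cross-field: area mismatch"]

LAYERS = [check_format, check_constraints, check_cross_field]

def validate(rec):
    """Run every layer and collect all failures, in layer order."""
    return [err for layer in LAYERS for err in layer(rec)]

good = validate({"parcel_id": "CT-0042", "width_m": 20.0, "length_m": 30.0, "area_m2": 600.0})
bad = validate({"parcel_id": "CT-42", "width_m": 20.0, "length_m": 30.0, "area_m2": 650.0})
```

The first record passes all layers and yields an empty list. The second yields a format failure and an area mismatch, which is the kind of plausible-looking error a single accuracy number would hide.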

RAG AI Assistant

RAG assistant connecting and matching African entrepreneurs via fast retrieval + LLM responses.

Python FAISS Streamlit
  • Outcome: fast, relevant retrieval-backed answers for a focused knowledge base.
  • Approach: ingestion → embeddings → persistent index → retrieval → generation.
  • Reliability: deterministic indexing, persistence, and guardrails when retrieval is weak.
  • Stack: Python, FAISS, sentence-transformers, Streamlit, LLM APIs.
Case study details
Focused on reproducibility and safety: deterministic indexing, persistence, and guardrails to avoid low-confidence or hallucinated outputs where retrieval was weak.
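
One way to read "deterministic indexing" is that the same set of documents always produces the same index input, regardless of ingestion order or duplicates. The sketch below shows that idea with content-derived IDs and canonical JSON. It is an assumption about the general technique, not the project's implementation, and the helper names `doc_id`, `build_manifest`, and `serialize` are hypothetical.

```python
import hashlib
import json

def doc_id(text):
    """Stable ID derived from content, so re-ingesting the same text yields the same key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def build_manifest(docs):
    """Deduplicate by content hash and sort by ID so input order does not matter."""
    entries = {doc_id(d): d for d in docs}
    return [{"id": key, "text": entries[key]} for key in sorted(entries)]

def serialize(manifest):
    """Canonical JSON: fixed key order and separators give identical output per manifest."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"))

# Two ingestion runs with different order and a duplicate produce the same bytes.
a = serialize(build_manifest(["alpha doc", "beta doc", "alpha doc"]))
b = serialize(build_manifest(["beta doc", "alpha doc"]))
```

Because `a` and `b` are identical strings, the serialized manifest can be written to disk and compared or checksummed between runs to confirm the index was rebuilt from the same inputs.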

Embedded Navigation

Sensor fusion + edge-AI foundations for real-time hazard detection.

C++ STM32 Sensor Fusion
  • Outcome: foundations for real-time hazard detection under latency and power constraints.
  • Approach: sensor fusion pipeline design and edge-AI model integration.
  • Reliability: timing integrity, robust data handling, and fault-tolerant interfaces.
  • Stack: C/C++, STM32, LiDAR/IMU/camera integration.
Case study details
Emphasis on signal integrity and latency. Contributed to firmware/data architecture decisions to ensure consistent sensor timing and robust downstream consumption.
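
Consistent sensor timing usually comes down to pairing samples from different streams by timestamp and rejecting pairs that are too far apart. The sketch below shows that idea in Python for readability (the project stack is C/C++). The stream names, the millisecond timestamps, and the 5 ms tolerance are assumptions made for illustration only.

```python
from bisect import bisect_left

def nearest_within(timestamps, t, tolerance):
    """Return the index of the sorted timestamp closest to t, or None if too far."""
    i = bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return best if abs(timestamps[best] - t) <= tolerance else None

def align(camera_ts, imu_ts, tolerance=5):
    """Pair each camera frame with the nearest IMU sample; drop frames with no match."""
    pairs = []
    for frame_idx, t in enumerate(camera_ts):
        imu_idx = nearest_within(imu_ts, t, tolerance)
        if imu_idx is not None:
            pairs.append((frame_idx, imu_idx))
    return pairs

imu = [0, 10, 20, 30, 40]   # hypothetical 100 Hz samples, in milliseconds
cam = [1, 34, 100]          # hypothetical frame timestamps with jitter
pairs = align(cam, imu)
```

The first two frames pair with IMU samples 0 and 3. The third frame has no sample within tolerance and is dropped rather than fused with stale data.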

Confidential AI Product Build (NDA)

Ongoing work on an AI-driven product with details limited by NDA.

Architecture Backend LLMs CI/CD
  • Outcome: production-ready AI product foundations and measurable delivery cadence.
  • Approach: system design, backend integration, and safe LLM usage patterns.
  • Reliability: environment hardening, CI/CD safeguards, and operational readiness.
  • Stack: LLM APIs, backend services, CI/CD.
  • Note: details limited by NDA; happy to discuss at a high level.

FashionMNIST Classifier

Neural network image classifier with an end-to-end training and evaluation pipeline.

PyTorch CV Evaluation
  • Outcome: 89.33% test accuracy on FashionMNIST.
  • Approach: training pipeline with preprocessing and reproducible runs.
  • Reliability: metrics, validation, and clear experiment tracking.
  • Stack: Python, PyTorch.
Case study details
What this demonstrates: the full ML loop (data → training → evaluation) with repeatability.
What I’d improve next: add calibration, stronger baselines, and automated experiment tracking to make comparisons faster and more robust.
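
For the calibration step mentioned above, one common metric is expected calibration error (ECE): predictions are grouped into confidence bins and the gap between average confidence and accuracy is averaged, weighted by bin size. The sketch below is a plain-Python version with made-up inputs. It is not taken from the project, and the confidences and outcomes are illustrative only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece

# Illustrative values: four confident predictions (one wrong), two uncertain ones (one wrong).
confs = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hits = [True, True, True, False, True, False]
ece = expected_calibration_error(confs, hits)
```

A value near zero means the model's stated confidence matches how often it is right. A larger value suggests the confidences need rescaling before they are used for thresholds.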

MyAdvisor (Full‑Stack Web App)

Full-stack web app emphasizing usability, data integrity, and maintainable APIs.

Spring Boot REST MySQL
  • Outcome: improved workflow efficiency and clearer user journeys.
  • Approach: API-first design with solid schema and query patterns.
  • Reliability: validation, error handling, and iterative feedback.
  • Stack: Java, Spring Boot, MySQL.

Scheduling & Networking Systems

Hands-on systems work: scheduling simulations, protocol design, and integrity checks.

Java Python Systems
  • Outcome: 35% scheduling efficiency gain, measured through algorithm comparison.
  • Approach: benchmark-driven analysis with clear metrics.
  • Reliability: deterministic tests and integrity verification patterns.
  • Stack: Java, Python, sockets, hashing.
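
A benchmark-style comparison of scheduling algorithms can be as small as the sketch below. The source does not say which algorithms were compared, so first-come-first-served (FCFS) and shortest-job-first (SJF) are assumed here, with all jobs arriving at time zero and running without preemption. The burst times are made up and the printed gain is not the project's reported result.

```python
def average_wait(burst_times):
    """Average waiting time when jobs run back to back in the given order."""
    waited, clock = 0, 0
    for burst in burst_times:
        waited += clock   # this job waited for everything scheduled before it
        clock += burst
    return waited / len(burst_times)

def compare(burst_times):
    """Compare FCFS (given order) against SJF (sorted by burst time)."""
    fcfs = average_wait(burst_times)
    sjf = average_wait(sorted(burst_times))
    gain = 100 * (fcfs - sjf) / fcfs if fcfs else 0.0
    return {"fcfs": fcfs, "sjf": sjf, "gain_pct": gain}

result = compare([8, 4, 2, 6])  # hypothetical burst times
```

For this input the average wait drops from 8.5 to 5.0 time units. Keeping the metric in one function makes it straightforward to add further algorithms to the same comparison.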

Project timeline (high level)

A quick view of how my work evolved across domains. Details are intentionally high-level where needed.

Full‑stack foundations (Web • APIs • Databases)
Shipped web app features with API design, data modeling, and UX iteration.
Systems + performance (Scheduling • Networking)
Benchmarked algorithms and built integrity-first networking components.
Modeling + evaluation (PyTorch • CV)
Developed training pipelines with reproducible evaluation and clear metrics.
Production AI systems (OCR • RAG • Reliability)
Built document automation and retrieval pipelines with guardrails and operational readiness.

Certifications

Google Cloud Skill Badges

Verified cloud learning and hands-on labs focused on practical, production-relevant skills.

  • Multimodal RAG (Gemini)

Programs

Selected programs that strengthened leadership, communication, and technical execution.

Writing

Short, public-safe notes on how I build and ship reliable systems.

Validation-first OCR: why “accuracy” isn’t enough

In production OCR, errors often look “plausible” and silently poison downstream systems. My default is validation-first extraction: explicit constraints, cross-field rules, confidence thresholds, and human-review hooks. The goal is not just extracting text, but producing outputs you can trust and audit.
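
A confidence threshold with a human-review hook might look like the following sketch. The field names, the sample values, and the 0.90 cut-off are hypothetical. In practice the threshold would be tuned against a labelled evaluation set.

```python
def route_fields(fields, auto_accept=0.90):
    """Split extracted fields into auto-accepted values and a human-review queue."""
    accepted, review = {}, []
    for name, (value, confidence) in fields.items():
        if confidence >= auto_accept:
            accepted[name] = value
        else:
            # Keep the raw value and score so a reviewer can see why it was flagged.
            review.append({"field": name, "value": value, "confidence": confidence})
    return accepted, review

accepted, review = route_fields({
    "parcel_id": ("CT-0042", 0.98),
    "area_m2": ("6OO.0", 0.61),  # illustrative OCR slip: letter O read in place of zero
})
```

Only `parcel_id` is accepted automatically. The low-confidence `area_m2` value goes to the review queue instead of flowing downstream as a plausible-looking error.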

RAG guardrails: being useful without hallucinating

A good RAG system is as much about “when to refuse” as it is about retrieval. I focus on deterministic indexing, conservative thresholds when evidence is weak, and clear provenance (links back to sources). This keeps the assistant helpful while staying honest and grounded.
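
The "when to refuse" rule can be expressed as a small gate in front of generation. The sketch below is one possible shape, assuming retrieval returns scored passages with a source label. The `Hit` type, the `answer_or_refuse` name, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str   # where the passage came from (provenance)
    text: str
    score: float  # retrieval similarity, higher is better

def answer_or_refuse(hits, min_score=0.5):
    """Return grounded context with sources, or refuse when evidence is weak."""
    strong = [h for h in hits if h.score >= min_score]
    if not strong:
        return {"status": "refused", "reason": "no passage met the evidence threshold", "sources": []}
    strong.sort(key=lambda h: h.score, reverse=True)
    return {
        "status": "answered",
        "context": [h.text for h in strong],   # passages that would be given to the LLM
        "sources": [h.source for h in strong], # links shown back to the user
    }

weak = answer_or_refuse([Hit("faq.md", "unrelated text", 0.21)])
good = answer_or_refuse([Hit("guide.md", "relevant passage", 0.83), Hit("faq.md", "noise", 0.10)])
```

The first call refuses because nothing clears the threshold. The second passes only the strong passage forward and keeps its source, so the final answer can cite where it came from.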

Shipping discipline: small checks prevent big failures

What helps me move fast safely: reproducible environments, dependency hygiene, automated checks, and a clear “definition of done”. Even lightweight CI, good error handling, and predictable releases reduce firefighting and increase delivery cadence.
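
As an example of a small automated check, the sketch below flags requirement lines that do not pin an exact version. It is deliberately simplistic (it ignores environment markers, URLs, and include directives) and the package versions shown are illustrative, not a recommendation.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==[A-Za-z0-9_.\-+!]+$")

def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            problems.append(line)
    return problems

problems = unpinned_requirements([
    "faiss-cpu==1.7.4",
    "streamlit>=1.30   # floating lower bound",
    "sentence-transformers",
    "",
])
```

A CI job could fail the build when `problems` is non-empty, which keeps environments reproducible without much process overhead.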

About

How I work

I build end-to-end AI features that hold up in production: ingestion, validation, retrieval, model integration, observability, and delivery. My focus is correctness and reliability: clear interfaces, defensive checks, and measurable outcomes.


What I’m looking for

Software engineering, AI/ML engineering, and backend/cloud roles. Open to hybrid or remote opportunities.

Education

University of Cape Town (Computer Science & Computer Engineering)


Core strengths

Production OCR Computer Vision RAG Backend services CI/CD Reliability

Get In Touch

Open to opportunities