Jakub Szpunar

Data Scientist & Machine Learning Engineer

About Me

I'm a data scientist and machine learning engineer with expertise in building scalable ML systems and data pipelines. With a strong foundation in computer science and intelligent systems, I specialize in developing end-to-end solutions that leverage machine learning, deep learning, and cloud technologies to solve complex business problems.

My experience ranges from developing high-performance data processing pipelines to implementing machine learning models for classification and prediction tasks. I'm passionate about creating efficient, production-ready systems that deliver real business value.

Education

Master of Engineering in Computer Science and Intelligent Systems

AGH University of Krakow • 2024–Present

Bachelor of Engineering in Computer Science and Intelligent Systems

AGH University of Krakow • 2020–2024

Professional Experience

Data Scientist

May 2023–Present

Transmission Dynamics, Kraków

  • Built a high-performance pipeline for processing time series and geospatial data from video-processing IoT devices installed on trains. Containerized it with Docker and deployed it to a private server using GitHub Actions, ensuring reliability and high availability.
  • Designed and optimized database structures handling millions of records: identified performance bottlenecks, tuned indexes, and improved query efficiency at scale.
  • Developed high-accuracy deep learning models for electric arcing severity classification and pantograph detection, leveraging MobileNet, YOLO, and ResNet models.
  • Automated the data reporting system using Jinja-based PDF templates and enhanced the data analysis tools used to monitor and predict railway infrastructure issues, improving operational monitoring and decision-making.

Junior Software Engineer

Mar 2023–Apr 2023

Grid Dynamics, Kraków

  • Conducted an in-depth evaluation of the Vespa.ai search engine to assess its capabilities for integrating machine learning models into advanced text and image search solutions backed by vector databases.

AI Search Intern

May 2022–Feb 2023

Grid Dynamics, Kraków

  • Earned AWS certification with hands-on experience in deploying and managing cloud-based services.
  • Designed and implemented a multi-module indexer and search service for an eCommerce platform using Java and Spring Boot, with Elasticsearch for efficient indexing, querying, and performance optimization. Containerized the services with Docker to streamline deployment and improve operational efficiency.

Projects

Product Reviews Sentiment Analysis
Python, Playwright, Pandas, Scikit-learn, Transformers
Ocular Disease Recognition
Python, TensorFlow, OpenCV, Pandas
Semanthica
Python, TypeScript, FastAPI, Angular, PostgreSQL, Meilisearch, Qdrant, Docker

Skills

Languages

Python
JavaScript
TypeScript
Java
C#

Cloud

AWS Certified (SAA-C03)
S3
EC2
RDS
IAM
CloudWatch

Machine Learning

TensorFlow
PyTorch
scikit-learn
Keras
Transformers
Computer Vision
NLP

Data Engineering

PostgreSQL
MySQL
MongoDB
Qdrant
Pandas

Tools & Workflow

Git
Docker
GitHub Actions
Postman
Jupyter
VS Code

Certifications