Nikolaos Giakoumoglou

Research Postgraduate, Imperial College London, South Kensington, United Kingdom

Biography

Nikolaos Giakoumoglou is a postgraduate researcher in the Communications and Signal Processing (CSP) group of the Department of Electrical and Electronic Engineering (EEE) at Imperial College London, where he is pursuing his PhD under the supervision of Professor Tania Stathaki. Before his current role, he worked as a Research Assistant at the Information Technologies Institute (ITI) of the Centre for Research and Technology Hellas (CERTH). He obtained his Diploma in Electrical and Computer Engineering from Aristotle University of Thessaloniki in 2021. His research focuses primarily on Artificial Intelligence, Machine Learning, and Deep Learning, with a special interest in applications in Computer Vision.

News

11 March 2026: Started working on Joint Embedding Predictive Architectures following the vision of Yann LeCun
1 February 2026: Started working on Vision-Language Models
13 January 2026: Paper accepted (oral) at ISBI 2026 (London, UK)
19 December 2025: Paper accepted at VISAPP 2026 (Marbella, Spain)
4 November 2025: Paper accepted at MedEurIPS 2025 (Copenhagen, Denmark)
26 August 2025: Paper accepted at ICCVW 2025 (Hawaii, US)
20 May 2025: Paper accepted at ICIP 2025 (Alaska, US)
20 April 2025: Paper accepted at CVPRW 2025 (Nashville, US)
2 April 2025: Paper presented at Imperial Research Computing Showcase Day 2025 (London, UK)

Accepted Papers for Publication

Expert Clustering and Knowledge Transfer for Whole Slide Image Classification

K. M. Papadopoulos, N. Giakoumoglou, A. Floros, P. L. Dragotti, T. Stathaki

Accepted for presentation at ISBI 2026 Main Conference (Oral)

Multiple Instance Learning (MIL) is widely adopted for Whole Slide Image (WSI) classification. Existing MIL methods suffer from representation bottlenecks where slide-level aggregation compresses diverse patch information, limiting performance. Our proposed Divide-and-Distill (D&D) framework addresses this by partitioning the feature space into representation-coherent clusters, training specialized expert models on each cluster, and distilling their collective knowledge into a unified model. This reduces information compression loss while maintaining inference efficiency. Experiments across three datasets and six MIL methods demonstrate consistent performance gains without added inference cost.
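As a hedged illustration of the pipeline described above (not the paper's actual implementation), the divide/expert/distill steps can be sketched on toy features; all data, model choices, and hyperparameters here are stand-ins:

```python
# Toy sketch of the Divide-and-Distill recipe:
# 1) partition the feature space into coherent clusters,
# 2) train a specialised expert on each cluster,
# 3) distil the experts' collective predictions into one unified model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 16))            # stand-in patch features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in slide labels

# 1) Divide: cluster the feature space into representation-coherent regions.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# 2) Experts: one classifier per cluster.
experts = {}
for c in range(3):
    mask = km.labels_ == c
    experts[c] = LogisticRegression().fit(X[mask], y[mask])

# 3) Distil: fit a single unified model on the experts' soft outputs,
# so inference needs only one model.
soft = np.zeros(len(X))
for c, clf in experts.items():
    mask = km.labels_ == c
    soft[mask] = clf.predict_proba(X[mask])[:, 1]
student = LogisticRegression().fit(X, (soft > 0.5).astype(int))
```

The point of step 3 is the inference-cost claim in the abstract: after distillation, only `student` is needed at test time.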

A Multimodal Approach for Cross-Domain Image Retrieval

L. Iijima, N. Giakoumoglou and T. Stathaki

Accepted for presentation at VISAPP 2026 Main Conference (Poster)

Cross-Domain Image Retrieval (CDIR) is a challenging task in computer vision, aiming to match images across different visual domains such as sketches, paintings, and photographs. Traditional approaches focus on visual image features and rely heavily on supervised learning with labeled data and cross-domain correspondences, and they often struggle with the significant domain gap. This paper introduces a novel unsupervised approach to CDIR that incorporates textual context by leveraging pre-trained vision-language models. Our method, dubbed Caption-Matching (CM), uses generated image captions as a domain-agnostic intermediate representation, enabling effective cross-domain similarity computation without the need for labeled data or fine-tuning. We evaluate our method on standard CDIR benchmark datasets, demonstrating state-of-the-art performance in unsupervised settings with improvements of 24.0% on Office-Home and 132.2% on DomainNet over previous methods. We also demonstrate our method's effectiveness on a dataset of AI-generated images from Midjourney, showcasing its ability to handle complex, multi-domain queries.
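The caption-as-intermediate-representation idea can be sketched as follows; the `caption` stub below is a hypothetical stand-in for a real pretrained captioner, and TF-IDF cosine similarity stands in for whatever text-matching one prefers:

```python
# Sketch of caption-based cross-domain retrieval: captions act as a
# domain-agnostic representation, so retrieval reduces to text similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def caption(image_id: str) -> str:
    # Stand-in for a pretrained vision-language captioner (hypothetical).
    stub = {
        "sketch_dog": "a pencil sketch of a dog running in a park",
        "photo_dog": "a photograph of a dog running across a grassy park",
        "photo_car": "a photograph of a red car parked on a street",
    }
    return stub[image_id]

query, gallery = "sketch_dog", ["photo_dog", "photo_car"]
texts = [caption(query)] + [caption(g) for g in gallery]

# Compare the query caption against the gallery captions.
tfidf = TfidfVectorizer().fit_transform(texts)
sims = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
best = gallery[sims.argmax()]
```

Because the comparison happens entirely in caption space, the sketch-vs-photo gap never enters the similarity computation, which is the core of the CM idea.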

Mitigating Representation Bottlenecks in Multiple Instance Learning

K. M. Papadopoulos, N. Giakoumoglou, A. Floros, T. Stathaki

Accepted for presentation at NeurIPS 2025 Workshop "Medical Imaging meets EurIPS (MedEurIPS)"

Multiple Instance Learning (MIL) is widely used for Whole Slide Image classification in computational pathology, yet existing approaches suffer from a representation bottleneck where diverse patch-level features are compressed into a single slide-level embedding. We propose Divide-and-Distill (D&D), which clusters the feature space into coherent regions, trains expert models on each cluster, and distills their knowledge into a unified model. Experiments demonstrate that D&D consistently improves six state-of-the-art MIL methods in both accuracy and AUC while maintaining single-model inference efficiency.

Cluster Contrast for Unsupervised Visual Representation Learning

N. Giakoumoglou, T. Stathaki

Accepted for presentation at ICIP 2025 Main Conference

We introduce Cluster Contrast (CueCo), a novel approach to unsupervised visual representation learning that effectively combines the strengths of contrastive learning and clustering methods. Inspired by recent advancements, CueCo is designed to simultaneously scatter and align feature representations within the feature space. This method utilizes two neural networks, a query and a key, where the key network is updated through a slow-moving average of the query outputs. CueCo employs a contrastive loss to push dissimilar features apart, enhancing inter-class separation, and a clustering objective to pull together features of the same cluster, promoting intra-class compactness. Our method achieves 91.40% top-1 classification accuracy on CIFAR-10, 68.56% on CIFAR-100, and 78.65% on ImageNet-100 using linear evaluation with a ResNet-18 backbone. By integrating contrastive learning with clustering, CueCo sets a new direction for advancing unsupervised visual representation learning.
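The two ingredients described above, a slow-moving-average key network and a combined contrastive-plus-clustering objective, can be sketched in toy form. This is illustrative only; `momentum_update`, `cueco_loss`, and all constants are assumptions, not the paper's code:

```python
# Toy sketch of CueCo's two mechanisms on made-up features.
import numpy as np

def momentum_update(key_params, query_params, m=0.99):
    # Key network as a slow-moving average of the query network:
    # key <- m * key + (1 - m) * query.
    return m * key_params + (1 - m) * query_params

def cueco_loss(q, k, assignments, centroids, tau=0.1, lam=1.0):
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    # Contrastive term: align each q with its own k, push apart from others
    # (InfoNCE-style, promoting inter-class separation).
    logits = q @ k.T / tau
    logZ = np.log(np.exp(logits).sum(axis=1))
    contrastive = (logZ - np.diag(logits)).mean()
    # Clustering term: pull each feature towards its assigned centroid
    # (promoting intra-class compactness).
    clustering = ((q - centroids[assignments]) ** 2).sum(axis=1).mean()
    return contrastive + lam * clustering

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))
k = q + 0.05 * rng.normal(size=(8, 4))      # key features track query features
centroids = rng.normal(size=(2, 4))
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
loss = cueco_loss(q, k, rng.integers(0, 2, size=8), centroids)
new_key = momentum_update(np.ones(3), np.zeros(3), m=0.9)
```

The scatter/align tension in the abstract corresponds to the two loss terms: the contrastive term scatters representations, the clustering term aligns members of the same cluster.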

Training Self-Supervised Vision Transformers with Synthetic Data and Synthetic Hard Negatives

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

Accepted for presentation at ICCV 2025 Workshop "Representation Learning with Very Limited Resources: When Data, Modalities, Labels, and Computing Resources are Scarce"

This paper does not introduce a new method per se. Instead, we build on existing self-supervised learning approaches for vision, drawing inspiration from the adage "fake it till you make it". While contrastive self-supervised learning has achieved remarkable success, it typically relies on vast amounts of real-world data and carefully curated hard negatives. To explore alternatives to these requirements, we investigate two forms of "faking it" in vision transformers. First, we study the potential of generative models for unsupervised representation learning, leveraging synthetic data to augment sample diversity. Second, we examine the feasibility of generating synthetic hard negatives in the representation space, creating diverse and challenging contrasts. Our framework—dubbed Syn2Co—combines both approaches and evaluates whether synthetically enhanced training can lead to more robust and transferable visual representations on DeiT-S and Swin-T architectures. Our findings highlight the promise and limitations of synthetic data in self-supervised learning, offering insights for future work in this direction.

Unsupervised Training of Vision Transformers with Synthetic Negatives

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

Accepted for presentation at CVPR 2025 Workshop "Second Workshop on Visual Concepts"

This paper does not introduce a novel method per se. Instead, we address the neglected potential of hard negative samples in self-supervised learning. Previous works explored synthetic hard negatives but rarely in the context of vision transformers. We build on this observation and integrate synthetic hard negatives to improve vision transformer representation learning. This simple yet effective technique notably improves the discriminative power of learned representations. Our experiments show performance improvements for both DeiT-S and Swin-T architectures.
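One common way to synthesize hard negatives in representation space is to mix the hardest existing negatives for a query; the sketch below follows that general recipe with made-up data (`synth_hard_negatives` and its parameters are illustrative assumptions, not necessarily the paper's method):

```python
# Sketch of synthetic hard negatives via feature-space mixing.
import numpy as np

def synth_hard_negatives(q, negatives, n_synth=4, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    # Rank negatives by similarity to the query (hardest first).
    sims = negatives @ q
    hard = negatives[np.argsort(-sims)[: max(2, n_synth)]]
    # Convexly mix random pairs of hard negatives, then re-normalise
    # back onto the unit hypersphere.
    i = rng.integers(0, len(hard), size=n_synth)
    j = rng.integers(0, len(hard), size=n_synth)
    alpha = rng.uniform(size=(n_synth, 1))
    mixed = alpha * hard[i] + (1 - alpha) * hard[j]
    return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

rng = np.random.default_rng(1)
q = rng.normal(size=4)
q /= np.linalg.norm(q)
negs = rng.normal(size=(32, 4))
negs /= np.linalg.norm(negs, axis=1, keepdims=True)
synth = synth_hard_negatives(q, negs)
```

The synthetic negatives are then simply appended to the real negatives in the contrastive loss, which is what makes the technique cheap to bolt onto existing pipelines.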

Under Review

Discriminative and Consistent Representation Distillation

N. Giakoumoglou and T. Stathaki

Under review...

What Makes Pretraining Data Good for Self-Supervised Learning?

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

Under review...

Open-World Semantic Segmentation with Sensitivity Modeling

A. R. Varvarigos, N. Giakoumoglou, T. Stathaki

Under review...

A Review on Discriminative Self-supervised Learning Methods

N. Giakoumoglou, T. Stathaki, A. Gkelias

Under review...

A Review on Artificial Intelligence Methods for Plant Disease and Pest Detection

N. Giakoumoglou, D. Kapetas, K. M. Papadopoulos, P. Christakakis, T. Stathaki, E. M. Pechlivani

Under review...