Nikolaos Giakoumoglou

Research Postgraduate, Imperial College London, South Kensington, United Kingdom

Biography

Nikolaos Giakoumoglou is a postgraduate researcher at Imperial College London, in the Communications and Signal Processing (CSP) group of the Department of Electrical and Electronic Engineering (EEE), where he is pursuing his PhD under the supervision of Professor Tania Stathaki. Before his current role, he worked as a Research Assistant at the Information Technologies Institute (ITI) of the Centre for Research and Technology Hellas (CERTH). He obtained his Diploma in Electrical and Computer Engineering from Aristotle University of Thessaloniki in 2021. His research focuses on Artificial Intelligence, Machine Learning, and Deep Learning, with a special interest in applications in Computer Vision.

News

26 August 2025: Paper accepted at ICCVW 2025 (Hawaii, US)
20 May 2025: Paper accepted at ICIP 2025 (Alaska, US)
20 April 2025: Paper accepted at CVPRW 2025 (Nashville, US)
2 April 2025: Paper presented at Imperial Research Computing Showcase Day 2025 (London, UK)

Research Interests

Artificial Intelligence, Machine Learning, and Deep Learning, with a special interest in Computer Vision; current work focuses on self-supervised representation learning and knowledge distillation.

Education

PhD, Department of Electrical and Electronic Engineering, Imperial College London (in progress)
Diploma in Electrical and Computer Engineering, Aristotle University of Thessaloniki, 2021

Papers Accepted for Publication

Fake & Square: Training Self-Supervised Vision Transformers with Synthetic Data and Synthetic Hard Negatives

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

Accepted for presentation at ICCV 2025 Workshop "Representation Learning with Very Limited Resources: When Data, Modalities, Labels, and Computing Resources are Scarce"

This paper does not introduce a new method per se. Instead, we build on existing self-supervised learning approaches for vision, drawing inspiration from the adage "fake it till you make it". While contrastive self-supervised learning has achieved remarkable success, it typically relies on vast amounts of real-world data and carefully curated hard negatives. To explore alternatives to these requirements, we investigate two forms of "faking it" in vision transformers. First, we study the potential of generative models for unsupervised representation learning, leveraging synthetic data to augment sample diversity. Second, we examine the feasibility of generating synthetic hard negatives in the representation space, creating diverse and challenging contrasts. Our framework—dubbed Syn2Co—combines both approaches and evaluates whether synthetically enhanced training can lead to more robust and transferable visual representations on DeiT-S and Swin-T architectures. Our findings highlight the promise and limitations of synthetic data in self-supervised learning, offering insights for future work in this direction.
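
A minimal sketch of the "synthetic data" half of this framework is shown below: images sampled from a pretrained generative model are mixed into each self-supervised training batch to increase sample diversity. The generator interface (generator.sample) and the mixing ratio are assumptions made for illustration, not the paper's implementation; the second half, synthetic hard negatives in the representation space, is sketched under SynCo further down.

    import torch

    def build_mixed_batch(real_images, generator, synthetic_fraction=0.25):
        # real_images: (B, C, H, W) tensor of real samples.
        num_synthetic = int(real_images.size(0) * synthetic_fraction)
        with torch.no_grad():
            synthetic_images = generator.sample(num_synthetic)  # hypothetical sampling API
        # Synthetic images simply augment sample diversity; the self-supervised
        # objective applied to the batch is left unchanged.
        return torch.cat([real_images, synthetic_images], dim=0)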

Cluster Contrast for Unsupervised Visual Representation Learning

N. Giakoumoglou, T. Stathaki

Accepted for presentation at ICIP 2025 Main Conference

We introduce Cluster Contrast (CueCo), a novel approach to unsupervised visual representation learning that effectively combines the strengths of contrastive learning and clustering methods. Inspired by recent advancements, CueCo is designed to simultaneously scatter and align feature representations within the feature space. The method uses two neural networks, a query and a key, where the key network is updated as a slow-moving average of the query network. CueCo employs a contrastive loss to push dissimilar features apart, enhancing inter-class separation, and a clustering objective to pull together features of the same cluster, promoting intra-class compactness. Our method achieves 91.40% top-1 classification accuracy on CIFAR-10, 68.56% on CIFAR-100, and 78.65% on ImageNet-100 under linear evaluation with a ResNet-18 backbone. By integrating contrastive learning with clustering, CueCo sets a new direction for advancing unsupervised visual representation learning.
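
As a rough illustration of the two ingredients described above, the following PyTorch-style sketch shows a momentum (slow-moving average) update of the key network and a combined contrastive-plus-clustering loss. It is a simplified approximation under assumed defaults (temperature value, cluster centers given as input), not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def ema_update(query_net, key_net, momentum=0.99):
        # The key network's parameters follow a slow-moving average of the query network's.
        for q_param, k_param in zip(query_net.parameters(), key_net.parameters()):
            k_param.data.mul_(momentum).add_(q_param.data, alpha=1.0 - momentum)

    def cueco_style_loss(q, k, centroids, temperature=0.2):
        # q, k: (N, D) features from the query and key networks; centroids: (C, D) cluster centers.
        q = F.normalize(q, dim=1)
        k = F.normalize(k, dim=1)
        centroids = F.normalize(centroids, dim=1)
        # Contrastive term: matching (q_i, k_i) pairs are positives, other samples act as negatives.
        logits = q @ k.t() / temperature                      # (N, N)
        labels = torch.arange(q.size(0), device=q.device)
        contrastive = F.cross_entropy(logits, labels)
        # Clustering term: pull each feature toward its nearest centroid, away from the others.
        assign = q @ centroids.t() / temperature              # (N, C)
        cluster_ids = assign.argmax(dim=1)
        clustering = F.cross_entropy(assign, cluster_ids)
        return contrastive + clustering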

Unsupervised Training of Vision Transformers with Synthetic Negatives

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

Accepted for presentation at CVPR 2025 Workshop "Second Workshop on Visual Concepts"

This paper does not introduce a novel method per se. Instead, we address the neglected potential of hard negative samples in self-supervised learning. Previous works explored synthetic hard negatives but rarely in the context of vision transformers. We build on this observation and integrate synthetic hard negatives to improve vision transformer representation learning. This simple yet effective technique notably improves the discriminative power of learned representations. Our experiments show performance improvements for both DeiT-S and Swin-T architectures.

Under Review

SynCo: Synthetic Hard Negatives for Contrastive Visual Representation Learning

N. Giakoumoglou and T. Stathaki

Contrastive learning has become a dominant approach in self-supervised visual representation learning, but efficiently leveraging hard negatives, which are samples closely resembling the anchor, remains challenging. We introduce SynCo (Synthetic negatives in Contrastive learning), a novel approach that improves model performance by generating synthetic hard negatives in the representation space. Building on the MoCo framework, SynCo introduces six strategies for creating diverse synthetic hard negatives on the fly with minimal computational overhead. SynCo achieves faster training and strong representation learning, surpassing MoCo-v2 by +0.4% and MoCHi by +1.0% on ImageNet ILSVRC-2012 linear evaluation. It also transfers more effectively to detection tasks, achieving strong results on PASCAL VOC detection (57.2% AP) and significantly improving over MoCo-v2 on COCO detection (+1.0% AP) and instance segmentation (+0.8% AP). Our synthetic hard negative generation approach significantly enhances the visual representations learned through self-supervised contrastive learning.
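
For illustration, one simple way to synthesize hard negatives in the representation space is to mix the existing negatives that lie closest to the anchor, as in the sketch below. The paper proposes six distinct strategies; this sketch shows only a generic mixing idea and is not the authors' exact formulation.

    import torch
    import torch.nn.functional as F

    def synthetic_hard_negatives(query, negatives, num_synthetic=16):
        # query: (D,) normalized anchor feature; negatives: (K, D) normalized queue features.
        # Rank existing negatives by similarity to the anchor: the closest ones are "hardest".
        sims = negatives @ query                              # (K,)
        hard = negatives[sims.topk(k=min(32, negatives.size(0))).indices]
        # Mix random pairs of hard negatives to synthesize new, harder ones.
        idx_a = torch.randint(0, hard.size(0), (num_synthetic,))
        idx_b = torch.randint(0, hard.size(0), (num_synthetic,))
        alpha = torch.rand(num_synthetic, 1)
        synthetic = alpha * hard[idx_a] + (1 - alpha) * hard[idx_b]
        return F.normalize(synthetic, dim=1)                  # re-project onto the unit sphere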

Relational Representation Distillation

N. Giakoumoglou and T. Stathaki

Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. The standard approach minimizes the Kullback-Leibler (KL) divergence between the probabilistic outputs of the teacher and student networks using a shared temperature-based softmax function. However, this approach fails to capture important structural relationships in the teacher's internal representations. Recent advances have turned to contrastive learning objectives, but these methods impose overly strict constraints through instance discrimination, forcing apart semantically similar samples even when they should maintain similarity. This motivates an alternative objective that preserves the relative relationships between instances. Our method employs separate temperature parameters for the teacher and student distributions, with sharper student outputs, enabling precise learning of primary relationships while preserving secondary similarities. We show theoretical connections between our objective and both the InfoNCE loss and the KL divergence. Experiments demonstrate that our method significantly outperforms existing knowledge distillation methods across diverse knowledge transfer tasks, and sometimes even outperforms the teacher network.
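
Below is a minimal sketch of a relation-preserving distillation loss with separate temperatures, assuming in-batch pairwise similarities as the "relations" and a sharper (lower) student temperature. It is an illustration of the idea only; the exact objective in the paper may differ.

    import torch
    import torch.nn.functional as F

    def relational_distillation_loss(student_feats, teacher_feats, t_student=0.05, t_teacher=0.1):
        # student_feats, teacher_feats: (N, D) features for the same batch from the two networks.
        s = F.normalize(student_feats, dim=1)
        t = F.normalize(teacher_feats, dim=1)
        # Pairwise similarity of every sample to every other sample in the batch.
        s_sim = s @ s.t()
        t_sim = t @ t.t()
        # Mask self-similarities so each row describes relations to *other* instances.
        mask = ~torch.eye(s.size(0), dtype=torch.bool, device=s.device)
        s_rel = s_sim[mask].view(s.size(0), -1)
        t_rel = t_sim[mask].view(t.size(0), -1)
        # Sharper (lower-temperature) student distribution, softer teacher distribution.
        log_p_student = F.log_softmax(s_rel / t_student, dim=1)
        p_teacher = F.softmax(t_rel / t_teacher, dim=1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean")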

Discriminative and Consistent Representation Distillation

N. Giakoumoglou and T. Stathaki

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its application in knowledge distillation remains limited and focuses primarily on discrimination, neglecting the structural relationships captured by the teacher model. To address this limitation, we propose Discriminative and Consistent Distillation (DCD), which employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations. Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives, replacing the fixed hyperparameters commonly used in contrastive learning approaches. Through extensive experiments on CIFAR-100 and ImageNet ILSVRC-2012, we demonstrate that DCD achieves state-of-the-art performance, with the student model sometimes surpassing the teacher's accuracy. Furthermore, we show that DCD's learned representations exhibit superior cross-dataset generalization when transferred to Tiny ImageNet and STL-10.
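
The sketch below illustrates only the learnable temperature and bias idea in a contrastive student-teacher alignment term; the parameter names and initial values are assumptions, and the consistency regularization term is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContrastiveKDHead(nn.Module):
        def __init__(self, init_temperature=0.07, init_bias=0.0):
            super().__init__()
            # Learnable scalars replace fixed contrastive hyperparameters and are
            # optimized jointly with the student network.
            self.log_temperature = nn.Parameter(torch.log(torch.tensor(init_temperature)))
            self.bias = nn.Parameter(torch.tensor(init_bias))

        def forward(self, student_feats, teacher_feats):
            s = F.normalize(student_feats, dim=1)
            t = F.normalize(teacher_feats, dim=1)
            logits = s @ t.t() / self.log_temperature.exp() + self.bias
            labels = torch.arange(s.size(0), device=s.device)  # matching pairs are positives
            return F.cross_entropy(logits, labels)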

A Multimodal Approach for Cross-Domain Image Retrieval

L. Iijima, N. Giakoumoglou and T. Stathaki

Cross-Domain Image Retrieval (CDIR) is a challenging task in computer vision that aims to match images across different visual domains such as sketches, paintings, and photographs. Traditional approaches focus on visual image features and rely heavily on supervised learning with labeled data and cross-domain correspondences, so they often struggle with the significant domain gap. This paper introduces a novel unsupervised approach to CDIR that incorporates textual context by leveraging pre-trained vision-language models. Our method, dubbed Caption-Matching (CM), uses generated image captions as a domain-agnostic intermediate representation, enabling effective cross-domain similarity computation without the need for labeled data or fine-tuning. We evaluate our method on standard CDIR benchmark datasets, demonstrating state-of-the-art performance in unsupervised settings with improvements of 24.0% on Office-Home and 132.2% on DomainNet over previous methods. We also demonstrate our method's effectiveness on a dataset of AI-generated images from Midjourney, showcasing its ability to handle complex, multi-domain queries.
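
An illustrative pipeline for caption-based cross-domain retrieval is sketched below: caption every image with a pretrained vision-language model, then rank the gallery by text similarity between captions. The specific models (BLIP for captioning, a sentence-transformer for text similarity) are assumptions for this sketch, not necessarily those used in the paper.

    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration
    from sentence_transformers import SentenceTransformer, util

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
    text_encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def caption(image_path):
        # Describe the image in natural language, a domain-agnostic representation.
        inputs = processor(images=Image.open(image_path).convert("RGB"), return_tensors="pt")
        out = captioner.generate(**inputs, max_new_tokens=30)
        return processor.decode(out[0], skip_special_tokens=True)

    def retrieve(query_image, gallery_images):
        # Rank gallery images by similarity between their captions and the query caption.
        query_emb = text_encoder.encode(caption(query_image), convert_to_tensor=True)
        gallery_embs = text_encoder.encode([caption(p) for p in gallery_images], convert_to_tensor=True)
        scores = util.cos_sim(query_emb, gallery_embs)[0]
        return sorted(zip(gallery_images, scores.tolist()), key=lambda x: -x[1])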

A Review on Discriminative Self-supervised Learning Methods

N. Giakoumoglou, T. Stathaki, A. Gkelias

TBA

Expert Clustering and Knowledge Transfer for Whole Slide Image Classification

K. M. Papadopoulos, N. Giakoumoglou, A. Floros, P. L. Dragotti, T. Stathaki

TBA

SynCo-v2

N. Giakoumoglou, A. Floros, K. M. Papadopoulos, T. Stathaki

TBA