Research

Artificial General Intelligence (AGI) & Multimodal Question Answering (QA)

Flipped-VQA: Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP '23)

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA (NeurIPS '23)

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR '23)

RPO: Read-only Prompt Optimization for Vision-Language Few-shot Learning (ICCV '23)

A superintelligence is a hypothetical agent whose intelligence far surpasses that of the brightest and most gifted human minds. A closely related goal is Artificial General Intelligence (AGI): a system that can learn and perform a wide range of tasks rather than a single one. There have been many milestones toward this goal, including GPT-4, ChatGPT, CLIP, and Flamingo. These models perform strongly on diverse tasks compared with task-specific weak AI models, often without task-specific training. Recent foundation models have also become multimodal (images, videos, knowledge graphs, etc.), which supports a deeper understanding of the world. Our overarching goal is to develop a general-purpose learning system capable of learning and performing unseen tasks using every modality available to it.

Our related publications

[EMNLP '23] Large Language Models are Temporal and Causal Reasoners for Video Question Answering
[NeurIPS '23] NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA
[ICCV '23] Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models
[ICCV '23] Distribution-Aware Prompt Tuning for Vision-Language Models
[ICCV '23] Read-only Prompt Optimization for Vision-Language Few-shot Learning
[CVPR '23] MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
[AAAI '23] Relation-aware Language-Graph Transformer for Question Answering
[MedAGI '23] Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification
[CVPR '22] Video-Text Representation Learning via Differentiable Weak Temporal Alignment

Deep Generative Models

DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations (ICLR '24)

CoBO: Advancing Bayesian Optimization via Learning Correlated Latent Space (NeurIPS '23)

CPila: Invertible Monotone Operators for Normalizing Flows (NeurIPS '22)

Generative models are a cornerstone of artificial intelligence, serving as powerful engines for innovation in drug discovery, image generation, and video generation. In drug discovery, these models are combined with machine learning techniques such as Bayesian optimization to design novel molecules, accelerating the identification of potential therapeutic compounds. In image generation, techniques such as diffusion models produce realistic images, enabling creative expression and practical applications across diverse fields. With their ability to generate new data samples, generative models continue to reshape industries and drive progress in science and technology.

Our related publications

[ICLR '24] Domain-agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
[NeurIPS '23] Advancing Bayesian Optimization via Learning Correlated Latent Space
[NeurIPS '22] Invertible Monotone Operators for Normalizing Flows
[ECCV '22] k-SALSA: k-anonymous synthetic averaging of retinal images via local style alignment

Implicit Neural Representation and 3D Computer Vision

UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields (NeurIPS '23)


Visualization Results of UP-NeRF

Semantic-Aware Implicit Template Learning via Part Deformation Consistency (ICCV '23)

3D data (e.g., point clouds, voxels, polygonal meshes) are crucial to diverse fields such as robotics, autonomous driving, AI drones, medical data analysis, and scene reconstruction. We are interested in 3D computer vision and 3D deep learning on such data, which have more complex geometry than 2D data. Shape classification, indoor/outdoor scene semantic segmentation, and shape correspondence/registration are representative tasks for point cloud data. We are also interested in Implicit Neural Representation (INR), an emerging paradigm that offers a novel approach to representing complex geometric shapes and scenes.

Our related publications

[ICLR '24] Domain-agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations
[NeurIPS '23] UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields
[ICCV '23] Semantic-Aware Implicit Template Learning via Part Deformation Consistency
[ICML '23] Robust Camera Pose Refinement for Multi-Resolution Hash Encoding
[CVPR '23] Self-positioning Point-based Transformer for Point Cloud Understanding
[NeurIPS '22] SageMix: Saliency-Guided Mixup for Point Clouds
[ICCV '21] Point Cloud Augmentation with Weighted Local Transformations

Graph Neural Networks and Structured Data Analysis

DGCN: Deformable Graph Convolutional Networks (AAAI '22)


MHAug: Metropolis-Hastings Data Augmentation for Graph Neural Networks (NeurIPS '21)

Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction (NeurIPS '21)

GTN: Graph Transformer Networks (NeurIPS '19)

In modern data analysis, highly structured data occur frequently and can be viewed as data on non-Euclidean spaces (e.g., graphs, Riemannian manifolds, data manifolds, and function spaces). Naive algorithms do not respect the geometry of the data space, so they often break the structure of the data and return invalid predictions that lie in the ambient space rather than in the data space of interest. For structured data analysis, our focus is to develop geometrically inspired machine learning methods and apply them to real-world applications such as computer vision, brain imaging, and recommender systems.
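As a small illustration of the point about invalid predictions, the sketch below averages points on the unit sphere in two ways. It is a toy example under our own assumptions, not a method from the publications listed here: the function names `log_map`, `exp_map`, and `frechet_mean` are hypothetical, and the intrinsic mean is computed with a plain fixed-step iteration. The Euclidean mean leaves the sphere, while the intrinsic (Fréchet) mean stays on it.

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p that points along the geodesic from p to q."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = q - cos_theta * p          # component of q orthogonal to p
    return theta * v / np.linalg.norm(v)

def exp_map(p, v):
    """Point reached by following the geodesic from p in direction v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, iters=50):
    """Intrinsic mean on the unit sphere via repeated tangent-space averaging."""
    mean = points[0]
    for _ in range(iters):
        tangent = np.mean([log_map(mean, q) for q in points], axis=0)
        mean = exp_map(mean, tangent)
    return mean

# Three unit vectors (illustrative data, chosen for this example only).
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

euclidean_mean = points.mean(axis=0)     # naive average in the ambient space
intrinsic_mean = frechet_mean(points)    # average that respects the geometry

print("norm of Euclidean mean:", np.linalg.norm(euclidean_mean))
print("norm of intrinsic mean:", np.linalg.norm(intrinsic_mean))
```

The naive average has norm below 1, so it is not a valid point of the data space, whereas the intrinsic mean has unit norm by construction.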

Our related publications

[NeurIPS '23] Advancing Bayesian Optimization via Learning Correlated Latent Space
[AAAI '22] Deformable Graph Convolutional Networks
[NN '22] Graph Transformer Networks: Learning Meta-path Graphs to Improve GNNs
[NeurIPS '21] Metropolis-Hastings Data Augmentation for Graph Neural Networks
[NeurIPS '21] Neighborhood Overlap-aware Graph Neural Networks for Link Prediction
[NeurIPS '20] Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs
[NeurIPS '19] Graph Transformer Networks
[ICML '15] Manifold-valued Dirichlet Processes
[CVPR '16] Latent Variable Graphical Model Selection using Harmonic Analysis: Applications to the HCP
[Quarterly of Applied Math] Localizing differentially evolving covariance structures via scan statistics
[ICCV '15] Interpolation on the manifold of k component Gaussian Mixture Models

Deep Understanding of Visual World


HOTR: End-to-End HOI Detection with Transformers (CVPR '21 Oral)

CPC: Consistency Learning via Decoding Path Augmentation (CVPR '22)

High-level computer vision enables a deeper understanding of the visual world. Object recognition systems detect objects in images and videos, reporting whether certain objects are present in a scene and how many instances appear. This information alone may not be sufficient for building personalized and automated systems for smart cities, such as smart homes, smart offices, and hospitals. Without a deep understanding of the interactions between humans and objects, it is hard to understand the context of a scene and what kinds of services are needed. "Scene Understanding" studies such interactions and generates metadata such as scene graphs, which in turn support "Visual Question Answering (VQA)". Security cameras are pervasive in modern cities, and computer vision helps detect anomalies such as floods, wildfires, and dangerous wild animals, as well as estimate traffic and even temperature. We study algorithms that offer a more accurate and deeper understanding of the visual world and help people live safer and smarter lives.

Our related publications

[NeurIPS '22] TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
[CVPR '22] Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
[CVPR '21] HOTR: End-to-End Human-Object Interaction Detection with Transformers
[ECCV '20] UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
[CVPR '18] Tensorize, Factorize and Regularize: Robust Visual Relationship Learning
[ECCV '16] Abundant Inverse Regression using Sufficient Reduction and its Applications
[ECCV '18] Efficient Relative Attribute Learning using Graph Neural Networks

Safe AI, Adversarial Examples, and Uncertainty

Machine learning models (or deep neural networks) have been used in a variety of applications including autonomous robots, vehicles, and drones. When deploying AI systems in the physical world, the reliability of algorithms is crucial for safety. Guaranteeing such safety involves specification, robustness, and assurance. Given a concrete purpose of the system (specification), the AI system should be robust to perturbations and attacks (adversarial examples). Further, the uncertainty of model predictions helps monitor and control the AI system's activity. In this line of thought, we study the uncertainty of models (e.g., Bayesian Neural Networks) and adversarial examples from both attacker and defender perspectives. This topic lies at the intersection of AI and security.
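To make the notion of an adversarial example concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression classifier. This is a generic textbook attack chosen for illustration, not the method of any publication listed here; the weights `w`, bias `b`, input `x`, and budget `eps` are made-up values, and the helper names `predict` and `fgsm` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Hard label (0 or 1) from a logistic-regression model."""
    return int(sigmoid(np.dot(w, x) + b) >= 0.5)

def fgsm(w, b, x, y, eps):
    """Move x by eps along the sign of the loss gradient with respect to x.

    For the binary cross-entropy loss of a logistic model, the gradient
    with respect to the input is (p - y) * w, where p is the predicted
    probability of class 1.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy model and input (illustrative values only).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])
y = 1

x_adv = fgsm(w, b, x, y, eps=0.5)

print("clean prediction:      ", predict(w, b, x))
print("adversarial prediction:", predict(w, b, x_adv))
print("max perturbation:      ", np.max(np.abs(x_adv - x)))
```

In this toy setting, a perturbation bounded by `eps` in each coordinate is enough to flip the predicted label, which is the kind of failure that robustness research aims to prevent.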

Our related publications

[IEEE Access '21] Search-and-Attack: Temporally Sparse Adversarial Perturbations on Videos
[ECCV '20] Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods
[UAI '19] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Applications to Normative Modeling in Neuroimaging
[arXiv '18] Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families

Medical Imaging

Riemannian MGLM (CVPR)

Medical imaging, and brain imaging in particular, inherently involves many structured measurements such as diffusion tensor images (DTI), high angular resolution diffusion images (HARDI), and ensemble average propagators (EAPs). Common goals in medical imaging are to identify important regions related to a certain disease, detect diseases at an early stage, and model disease progression. To provide predictions and findings that are rigorously tested with statistics, more powerful pipelines are needed. We study more powerful representations of medical images and more powerful models (mixed effects models for structured data, filtering, dimensionality reduction, etc.). We also research few-shot detection, domain adaptation, and contrastive learning to deal with limited samples and labels in the medical domain.

Our related publications

[CVPR '17] Riemannian Nonlinear Mixed Effects Models: Analyzing Longitudinal Deformations in Neuroimaging
[CVPRW '17] Riemannian Variance Filtering: An Independent Filtering Scheme for Statistical Tests on Manifold-valued Data
[ECCV '15] Canonical Correlation Analysis on Riemannian Manifolds and its Applications
[CVPR '14] MGLM on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images