Seongsu Ha
I am a deep learning researcher interested in computer vision and machine learning, with a current focus on agentic video understanding and video deep research.
My research aims to build multimodal systems that can search, retrieve, and reason over large-scale video corpora,
while drawing on broader interests in multimodal representation learning, video understanding, video corpus moment retrieval, video scene boundary segmentation, and visual grounding.
Currently, I am a PhD student at the University of North Carolina at Chapel Hill, advised by Prof. Gedas Bertasius.
Previously, I received a Master's degree in Data Science from Seoul National University, advised by Prof. Joonseok Lee.
I also received a Bachelor's degree in Computer Science from the University of Illinois at Urbana-Champaign.
seongsu0311@gmail.com / LinkedIn
|
|
|
News
- 06/2026: Paper on benchmarking strategic video intelligence released on arXiv, 2026
- 04/2026: Paper on video large language models accepted at CVPR Findings 2026
- 08/2025: Started PhD in Computer Science at the University of North Carolina at Chapel Hill
- 01/2025: Started working at EverEx as an AI research engineer
- 12/2024: Marengo-2.7, a new SOTA video foundation model with multi-vector representations, released! Tech Blog
- 08/2024: TWLV-I, analysis and insights from evaluation on video foundation models, released! Tech Report
- 07/2024: Paper on referring image segmentation accepted at ECCV 2024
- 07/2024: Paper on video frame sampling accepted at BMVC 2024
- 03/2024: Pegasus-1, a new SOTA video-to-text generative model, released! Tech Report
- 03/2024: Marengo-2.6, a new SOTA video foundation model for any-to-any search, released! Tech Blog
- 01/2024: Paper on video moment localization accepted at AISTATS 2024
- 09/2023: Started working at Twelve Labs as a research scientist
- 06/2023: Started working at Twelve Labs as a research intern
- 05/2023: Paper on talking head generation accepted at Sight and Sound, CVPR Workshop 2023
- 06/2022: Paper on scene boundary segmentation accepted at ACCV 2022
- 01/2022: Started working at KakaoBrain as a research intern
- 03/2021: Started MS in Data Science at Seoul National University Graduate School of Data Science
|
|
|
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
Yulu Pan*, Han Yi*, Seongsu Ha*, Md Mohaiminul Islam*, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius
arXiv, 2026
paper
|
|
|
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
Hyungjin Chung*, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byunghoon Kim
CVPR Findings, 2026
paper
|
|
|
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha*, Chaeyun Kim*, Donghwa Kim*, Junho Lee, Sangho Lee, Joonseok Lee
ECCV, 2024
paper
|
|
|
Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
Junho Lee*, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
BMVC, 2024
paper
|
|
|
Towards a Complete Benchmark on Video Moment Localization
Jinyeong Chae*, Donghwa Kim*, Kwanseok Kim, Doyeon Lee, Sangho Lee, Seongsu Ha, Jonghwan Mun, Woo-Young Kang, Byungseok Roh, Joonseok Lee
AISTATS, 2024
paper
|
|
|
Disentangled Audio-Driven NeRF: Talking Head Generation with Detailed Identity-Specific Micro expressions
Seoyoung Lee*, Seongsu Ha*, Joonseok Lee
CVPRW, 2023
paper
|
|
|
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun*, Minchul Shin*, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
ACCV, 2022
paper
|
|
|
Marengo 2.7: Pioneering Multi-Vector Embeddings for Advanced Video Understanding
Twelve Labs
Technical Blog, 24.12
blog
|
|
|
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
Twelve Labs
Technical Report arXiv, 24.08
paper
|
|
|
Pegasus-1: a new SOTA Video-to-Text Generative Model
Twelve Labs
Technical Report arXiv, 24.04
paper
|
|
|
Marengo-2.6: a new SOTA Video Foundation Model for Any-to-Any Search
Twelve Labs
Technical Blog, 24.03
blog
|
|