Seongsu Ha

I am a deep learning researcher interested in computer vision and machine learning, with a current focus on agentic video understanding and video deep research. My research aims to build multimodal systems that can search, retrieve, and reason over large-scale video corpora, drawing on broader interests in multimodal representation learning, video understanding, video corpus moment retrieval, video scene boundary segmentation, and visual grounding. I am currently a PhD student at the University of North Carolina at Chapel Hill, advised by Prof. Gedas Bertasius. Previously, I received a Master's degree in Data Science from Seoul National University, advised by Prof. Joonseok Lee. I also received a Bachelor's degree in Computer Science from the University of Illinois at Urbana-Champaign.

seongsu0311@gmail.com  /  LinkedIn

News

  • 06/2026: Paper on benchmarking strategic video intelligence released. arXiv 2026
  • 04/2026: Paper on video large language models accepted at CVPR Findings 2026
  • 08/2025: Started PhD in Computer Science at the University of North Carolina at Chapel Hill
  • 01/2025: Started working at EverEx as an AI research engineer
  • 12/2024: Marengo-2.7, a new SOTA video foundation model with multi-vector embeddings, released! Tech Blog
  • 08/2024: TWLV-I, analysis and insights from holistic evaluation of video foundation models, released! Tech Report
  • 07/2024: Paper on referring image segmentation accepted at ECCV 2024
  • 07/2024: Paper on video frame sampling accepted at BMVC 2024
  • 03/2024: Pegasus-1, a new SOTA video-to-text generative model, released! Tech Report
  • 03/2024: Marengo-2.6, a new SOTA video foundation model for any-to-any search, released! Tech Blog
  • 01/2024: Paper on video moment localization accepted at AISTATS 2024
  • 09/2023: Started working at Twelve Labs as a research scientist
  • 06/2023: Started working at Twelve Labs as a research intern
  • 05/2023: Paper on talking head generation accepted at Sight and Sound, CVPR Workshop 2023
  • 06/2022: Paper on scene boundary segmentation accepted at ACCV 2022
  • 01/2022: Started working at KakaoBrain as a research intern
  • 03/2021: Started MS in Data Science at Seoul National University Graduate School of Data Science

Research
SVI-Bench: A Dynamic Microworld for Strategic Video Intelligence
Yulu Pan*, Han Yi*, Seongsu Ha*, Md Mohaiminul Islam*, Benjamin Zhang, Lorenzo Torresani, Gedas Bertasius
arXiv, 2026
paper
Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
Hyungjin Chung*, Hyelin Nam, Jiyeon Kim, Hyojun Go, Byeongjun Park, Junho Kim, Joonseok Lee, Seongsu Ha, Byunghoon Kim
CVPR Findings, 2026
paper
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha*, Chaeyun Kim*, Donghwa Kim*, Junho Lee, Sangho Lee, Joonseok Lee
ECCV, 2024
paper
Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
Junho Lee*, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
BMVC, 2024
paper
Towards a Complete Benchmark on Video Moment Localization
Jinyeong Chae*, Donghwa Kim*, Kwanseok Kim, Doyeon Lee, Sangho Lee, Seongsu Ha, Jonghwan Mun, Woo-Young Kang, Byungseok Roh, Joonseok Lee
AISTATS, 2024
paper
Disentangled Audio-Driven NeRF: Talking Head Generation with Detailed Identity-Specific Micro expressions
Seoyoung Lee*, Seongsu Ha*, Joonseok Lee
CVPRW, 2023
paper
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun*, Minchul Shin*, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
ACCV, 2022
paper
Marengo 2.7: Pioneering Multi-Vector Embeddings for Advanced Video Understanding
Twelve Labs
Technical Blog, 24.12
blog
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
Twelve Labs
Technical Report arXiv, 24.08
paper
Pegasus-1: a new SOTA Video-to-Text Generative Model
Twelve Labs
Technical Report arXiv, 24.04
paper
Marengo-2.6: a new SOTA Video Foundation Model for Any-to-Any Search
Twelve Labs
Technical Blog, 24.03
blog
Experience
Graduate Researcher, Multimodal Video Perception Group

Aug. 2025 ~ Present

AI Research Engineer, EverEx

Jan. 2025 ~ May 2025

ML Research Scientist, Twelve Labs

Sep. 2023 ~ Jan. 2025

ML Research Intern, Twelve Labs

Jun. 2023 ~ Sep. 2023

Graduate Researcher, Visual Information Processing Lab.

Mar. 2021 ~ Jun. 2023

Research Intern, KakaoBrain

Jan. 2022 ~ Mar. 2022

Research Assistant, Perform Research Group

Jun. 2018 ~ Sep. 2018


Source code credit to Dr. Jon Barron