Seongsu Ha

I am a deep learning researcher interested in computer vision and machine learning. Specifically, my research focuses on improving the quality of multi-modal representations and their interactions for various downstream applications, such as video understanding, video corpus moment retrieval, video scene boundary segmentation, and visual grounding. Previously, I received my Master's degree in Data Science from the Visual Information Processing Lab at Seoul National University, advised by Prof. Joonseok Lee. I also received my Bachelor's degree in Computer Science and Engineering from the University of Illinois at Urbana-Champaign.

Email  /  LinkedIn

profile photo
News

  • 08/2024: TWLV-I, analysis and insights from a holistic evaluation of video foundation models, released! Tech Report
  • 07/2024: Paper on referring image segmentation accepted at ECCV 2024
  • 07/2024: Paper on video frame sampling accepted at BMVC 2024
  • 03/2024: Pegasus-1, a new SOTA video-to-text generative model, released! Tech Report
  • 03/2024: Marengo-2.6, a new SOTA video foundation model for any-to-any search, released! Tech Blog
  • 01/2024: Paper on video moment localization accepted at AISTATS 2024
  • 09/2023: Started working at Twelve Labs as a research scientist
  • 06/2023: Started working at Twelve Labs as a research intern
  • 05/2023: Paper on talking head generation accepted at the Sight and Sound Workshop, CVPR 2023
  • 06/2022: Paper on scene boundary segmentation accepted at ACCV 2022
  • 01/2022: Started working at KakaoBrain as a research intern
  • 03/2021: Started MS in Data Science at Seoul National University Graduate School of Data Science

Research
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
Twelve Labs
Technical Report arXiv, 24.08
paper
Pegasus-1: a new SOTA Video-to-Text Generative Model
Twelve Labs
Technical Report arXiv, 24.04
paper
Marengo-2.6: a new SOTA Video Foundation Model for Any-to-Any Search
Twelve Labs
Technical Blog, 24.03
blog
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha*, Chaeyun Kim*, Donghwa Kim*, Junho Lee, Sangho Lee, Joonseok Lee
ECCV, 2024
paper
Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
Junho Lee*, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
BMVC, 2024
paper
Towards a Complete Benchmark on Video Moment Localization
Jinyeong Chae*, Donghwa Kim*, Kwanseok Kim, Doyeon Lee, Sangho Lee, Seongsu Ha, Jonghwan Mun, Woo-Young Kang, Byungseok Roh, Joonseok Lee
AISTATS, 2024
paper
Disentangled Audio-Driven NeRF: Talking Head Generation with Detailed Identity-Specific Micro-expressions
Seoyoung Lee*, Seongsu Ha*, Joonseok Lee
CVPRW, 2023
paper
Boundary-aware Self-supervised Learning for Video Scene Segmentation
Jonghwan Mun*, Minchul Shin*, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
ACCV, 2022
paper
Experience
ML Research Scientist, Twelve Labs.

Sep. 2023 ~ Present

ML Research Intern, Twelve Labs.

Jun. 2023 ~ Sep. 2023

Graduate Researcher, Visual Information Processing Lab.

Mar. 2021 ~ Jun. 2023

Research Intern, KakaoBrain.

Jan. 2022 ~ Mar. 2022

Research Assistant, Perform Research Group.

Jun. 2018 ~ Sep. 2018


Source code credit to Dr. Jon Barron