Wonseok Oh

I am a second-year M.S. student in Electrical and Computer Engineering (computer vision track) at the University of Michigan.

Currently, I am a research assistant advised by Professor Andrew Owens, focusing on multimodal learning with sound. I am also collaborating with Professor Qing Qu on diffusion models.

Previously, I was a research assistant at ETRI, Daejeon, South Korea, working with Kimin Yun and Yongju Lee in the summers of 2021 and 2022.
I received a B.S. in Computer Science and Engineering at Korea University and a B.E. in the Software Technology and Enterprise Program and in Chemical and Biological Engineering in 2022 (triple major!).

My research interests are computer vision, machine learning, and robotics. I'm most interested in multimodal learning with sound and computational creation using generative models.

I am actively seeking PhD positions starting in Fall 2025 :)

Recent Projects

EECS556 Image Processing (Instructor: Liyue Shen)

  1. Enhancing Multi-View Optical Illusion Generation with Latent Diffusion Models and Traditional Image Processing.
    Improved the perception of multi-view optical illusions by combining latent diffusion models with traditional image processing techniques. Our approach generates high-quality images from limited data and supports diverse transformations, including rotations and flips, to bring out details usually obscured in 2D images.

EECS598 Biomedical Imaging (Instructor: Liyue Shen)

  1. Domain transfer of sketched facial images into realistic facial images to prevent crime (Vision Detectives).
    Improved forensic methods by generating detailed images of criminals from sketches using the pSp model, then refining these images by applying specific characteristics through InstructPix2Pix. Evaluated the approach with real sketches.

EECS598 Action and Perception (Instructor: Stella Yu)

  1. Robotic Adaptation Strategies: From Simulation to Real-World Execution with RMA.
    Enhanced and evaluated Rapid Motor Adaptation (RMA) for legged robots. Key components include fine-tuning RMA to better adapt to diverse terrains, comparing phase-1 and phase-2 adaptations, and methodically analyzing performance across a range of environmental challenges in the real world.

EECS542 Advanced Topics in Computer Vision (Instructor: Andrew Owens)

  1. Multimodal Latent Diffusion Models to Advance Multi-View Optical Illusion Generation.
    Proposed a method that enhances latent diffusion models with multimodal inputs, including sound and text, to generate dynamic multi-view optical illusions. Demonstrated the feasibility of multimodal approaches for optical illusion generation, as evidenced by superior CLIP scores.

EECS598 Biomedical Imaging (Instructor: Liyue Shen)

  1. Generate Domain Knowledge Diffusion Models employing the Schrödinger Bridge to enhance data representation for rare diseases.
    Generated high-fidelity images for imbalanced medical datasets by maintaining domain integrity with a domain phase loss. The approach avoids the loss of prior knowledge when transferring between domains and outperforms existing models in unpaired image-to-image translation.

EECS453 Principles of Machine Learning (Instructor: Qing Qu)

  1. Learning Accurate and Parsimonious Point Cloud Representations from Images. Making an efficient point-cloud NeRF.
    Combined the strengths of volumetric neural rendering and deep multi-view stereo, using neural 3D point clouds and features to model a radiance field, improving both efficiency and visual quality.