Visual media in the form of television programs, online video, and cinematic films have the power to engage people with dynamic presentations of ideas. Expert storytellers design how such media unfolds over time to help audiences make sense of complex concepts, appreciate cultural or societal differences and imagine living in entirely different worlds. Technological advances have made it cheaper and easier to capture audio-visual media using the video cameras built into our mobile and desktop devices. Yet the most-viewed videos are not simply raw recordings thrown onto the Web. The best material is carefully composed, filtered and edited to ensure that the resulting media is clear and engaging.
Nevertheless, today’s tools for authoring and viewing video treat the media as a “baked” stream of audio samples, pixels, and frames: the very lowest-level representation possible. They have no understanding of the higher-level semantic structure of the audio-visual content. Researchers have developed a variety of techniques for extracting such higher-level structure from video and have shown how this structure can significantly simplify the analysis, browsing, editing and manipulation of the raw material.
The goal of this graduate seminar (advanced undergraduates also welcome) is to survey recent work on computational video analysis and manipulation techniques. We will learn how to acquire, represent, edit and remix video. Several popular video manipulation algorithms will be presented, with an emphasis on using these techniques to build practical systems. Students will have the opportunity to acquire their own video and develop the processing tools needed to computationally analyze and manipulate it.
There are no official prerequisites for the course, but we expect familiarity with the basic concepts of Computer Graphics and/or Computer Vision at the level of CS 148/248 and/or CS 131. Contact me (Maneesh) via email if you are unsure whether you have the background for the course.
Chapter 4.1: Feature Detection and Matching: Points and Patches. Szeliski. 2010. (pdf)
Feature-Based Image Metamorphosis. Beier and Neely. SIGGRAPH 1992. (pdf)
Michael Jackson's "Black or White" music video, morphing sequence. (YouTube)
Chapter 2.1: Image Formation: Geometric primitives and transformations. Szeliski. 2010. (pdf)
Chapter 6.1: Feature-Based Alignment: 2D and 3D Feature-Based Alignment. Szeliski. 2010. (pdf)
Video Textures. Schödl et al. SIGGRAPH 2000. (pdf)
Panoramic Video Textures. Agarwala et al. SIGGRAPH 2005. (pdf)
Graphcut Textures. Kwatra et al. SIGGRAPH 2003. (pdf)
Automated Video Looping with Progressive Dynamism. Liao et al. SIGGRAPH 2013. (pdf)
Fast Computation of Seamless Video Loops. Liao et al. SIGGRAPH 2015. (pdf)
Gigapixel Panorama Video Loops. He et al. SIGGRAPH 2018. (pdf)
Bundled Camera Paths for Video Stabilization. Liu et al. SIGGRAPH 2013. (pdf)
Selectively De-Animating Video. Bai et al. SIGGRAPH 2012. (pdf)
Automatic Cinemagraph Portraits. Bai et al. EGSR 2013. (pdf)
Photo Tourism: Exploring Photo Collections in 3D. Snavely et al. SIGGRAPH 2006. (pdf)
Modeling the World from Internet Photo Collections. Snavely et al. IJCV 2007. (pdf)
Sampling Based Scene Space Video Processing. Klose et al. SIGGRAPH 2015. (pdf)
Visual Rhythm and Beat. Davis et al. SIGGRAPH 2018. (pdf)
VideoSnapping: Interactive Synchronization of Multiple Videos. Wang et al. SIGGRAPH 2014. (pdf)
AudeoSynth: Music-Driven Video Montage. Liao et al. SIGGRAPH 2015. (pdf)
Generating Emotionally Relevant Musical Scores for Audio Stories. Rubin et al. UIST 2014. (pdf)
Deformable Model Fitting by Regularized Landmark Mean-Shifts. Saragih et al. IJCV 2011. (pdf)
Face landmark detection using deep learning, blendshape models, and pose estimation.
Bringing Portraits to Life. Averbuch-Elor et al. SIGGRAPH Asia 2017. (pdf)
Tools for Placing Cuts and Transitions in Interview Video. Berthouzoz et al. SIGGRAPH 2012. (pdf)
Computational Video Editing for Dialogue-Driven Scenes. Leake et al. SIGGRAPH 2017. (pdf)
QuickCut: An Interactive Tool for Editing Narrated Video. Truong et al. UIST 2016. (pdf)
Image-to-Image Translation with Conditional Adversarial Nets. Isola et al. CVPR 2017. (web)
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu et al. ICCV 2017. (web)
Video-to-Video Synthesis. Wang et al. NeurIPS 2018. (web)
Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. Jiang et al. CVPR 2018. (web)
Video Frame Synthesis using Deep Voxel Flow. Liu et al. ICCV 2017. (arXiv)
Deep Video Portraits. Kim et al. SIGGRAPH 2018. (web)
Everybody Dance Now. Chan et al. 2018. (web)
Instructor: Maneesh Agrawala
Office Hours: 3-4p Mondays, Gates 364 and by appointment
Instructor: Ohad Fried
Office Hours: 3-4p Wednesdays, Gates 375 and by appointment
Instructor: Michael Zollhöfer
Office Hours: 3-4p Tuesdays, Gates 386 and by appointment
To contact us, please use Piazza; it is the fastest way to get a response.
Attendance Requirement: This course relies on you reading the assigned papers and participating in the discussions. Therefore, attendance is mandatory.
Plagiarism Policy: Assignments should consist primarily of your original work. Building on others’ work (including third-party libraries, public source code examples, and design ideas) is acceptable and in most cases encouraged. However, failing to cite such sources will result in score deductions proportional to the severity of the oversight.