Visual media in the form of television programs, online video, and cinematic films have the power to engage people with dynamic presentations of ideas. Expert storytellers design how such media unfolds over time to help audiences make sense of complex concepts, appreciate cultural or societal differences, and imagine living in entirely different worlds. Technological advances have made it cheaper and easier to capture audio-visual media using the video cameras readily available in our mobile and desktop devices. Yet the most-viewed videos are not simply raw recordings thrown onto the Web. The best material is carefully composed, filtered, and edited to ensure that the resulting media is clear and engaging.

Nevertheless, today’s tools for authoring and viewing video treat the media as a “baked” stream of audio samples, pixels, and frames – the lowest-level representation possible. They have no understanding of the higher-level semantic structure of the audio-visual content. Researchers have developed a variety of techniques for extracting such higher-level structure from video, and have shown how this structure can significantly facilitate analysis, browsing, editing, and manipulation of the raw material.
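For a concrete sense of the gap between the raw representation and higher-level structure, consider one of the simplest possible analyses: detecting a hard cut by thresholding per-frame pixel differences. The sketch below uses NumPy only; the threshold value and the synthetic video are purely illustrative, not part of any assigned reading.

```python
import numpy as np

def detect_cuts(frames, threshold=30.0):
    """Return indices of frames whose mean absolute pixel difference
    from the previous frame exceeds `threshold` (a likely hard cut)."""
    diffs = np.abs(frames[1:].astype(float) - frames[:-1].astype(float))
    scores = diffs.mean(axis=(1, 2, 3))  # one difference score per frame pair
    return [i + 1 for i, s in enumerate(scores) if s > threshold]

# Synthetic "video": 10 dark frames followed by 10 bright frames (one hard cut).
video = np.concatenate([
    np.full((10, 4, 4, 3), 20, dtype=np.uint8),
    np.full((10, 4, 4, 3), 200, dtype=np.uint8),
])
print(detect_cuts(video))  # [10]
```

Even this toy example turns a stream of pixels into a small piece of semantic structure (a shot boundary); the papers in the schedule below pursue far richer structure in the same spirit.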

The goal of this graduate seminar (advanced undergraduates also welcome) is to survey recent work on computational video analysis and manipulation techniques. We will learn how to acquire, represent, edit and remix video. Several popular video manipulation algorithms will be presented, with an emphasis on using these techniques to build practical systems. Students will have the opportunity to acquire their own video and develop the processing tools needed to computationally analyze and manipulate it.

There are no official prerequisites for the course, but we will expect familiarity with the basic concepts of Computer Graphics and/or Computer Vision at the level of CS 148/248 and/or CS 131. Contact me (Maneesh) via email if you are worried about whether you have the background for the course.

Schedule


Week 1
M Apr 1: Introduction/Feature Detection
    Slides
   Assigned: Assignment 1 (due Apr 10 by 1:30pm)
   Optional readings
        Chapter 4.1: Feature Detection and Matching: Points and Patches. Szeliski. 2010. (pdf)
 
W Apr 3: Warping/RANSAC/Morphing
    Reading Prompt | Slides
   Required readings
        Feature-Based Image Metamorphosis. Beier and Neely. SIGGRAPH 1992. (pdf)
        Michael Jackson's Black or White video, morphing sequence. (YouTube)
   Optional readings
        Chapter 2.1: Image Formation: Geometric primitives and transformations. Szeliski. 2010. (pdf)
        Chapter 6.1: Feature-Based Alignment: 2D and 3D Feature-Based Alignment. Szeliski. 2010. (pdf)
 
Week 2
M Apr 8: Feature Tracking and Video Texture
    Reading Prompt | Slides
   Required readings
        Video Textures. Schodl et al. SIGGRAPH 2000. (pdf)
   Optional readings
        Panoramic Video Textures. Agarwala et al. SIGGRAPH 2005. (pdf)
 
W Apr 10: Graph-Cut Texture
    Reading Prompt | Slides
   Due (by 1:30pm): Assignment 1
   Assigned: Assignment 2 (due Apr 24 by 1:30pm)
   Required readings
        Graphcut Textures. Kwatra et al. SIGGRAPH 2003. (pdf)
 
Week 3
M Apr 15: Looping in Space and Time
    Reading Prompt | Slides
   Required readings
        Automated Video Looping with Progressive Dynamism. Liao et al. SIGGRAPH 2013. (pdf)
   Optional readings
        Fast Computation of Seamless Video Loops. Liao et al. SIGGRAPH Asia 2015. (pdf)
        Gigapixel Panorama Video Loops. He et al. SIGGRAPH 2018. (pdf)
 
W Apr 17: Stabilization
    Reading Prompt | Slides
   Required readings
        Bundled Camera Paths for Video Stabilization. Liu et al. SIGGRAPH 2013. (pdf)
 
Week 4
M Apr 22: De-Animation and Cinemagraphs
    Reading Prompt | Slides
   Required readings
        Selectively De-Animating Video. Bai et al. SIGGRAPH 2012. (pdf)
   Optional readings
        Automatic Cinemagraph Portraits. Bai et al. EGSR 2013. (pdf)
 
W Apr 24: Structure from Motion
    Reading Prompt | Slides
   Due (by 1:30pm): Assignment 2
   Required readings
        Photo Tourism: Exploring Photo Collections in 3D. Snavely et al. SIGGRAPH 2006. (pdf)
   Optional readings
        Modeling the World from Internet Photo Collections. Snavely et al. IJCV 2007. (pdf)
 
Week 5
M Apr 29: Scene Building
    Reading Prompt | Slides
   Assigned: Final Project: Proposal (due May 6 by 1:30pm)
   Assigned: Final Project: Presentation (due Jun 5)
   Required readings
        Sampling Based Scene Space Video Processing. Klose et al. SIGGRAPH 2015. (pdf)
 
W May 1: Video Retiming
    Reading Prompt | Slides
   Required readings
        Visual Rhythm and Beat. Davis et al. SIGGRAPH 2018. (pdf)
   Optional readings
        VideoSnapping: Interactive Synchronization of Multiple Videos. Wang et al. SIGGRAPH 2014. (pdf)
 
Week 6
M May 6: Music and Video
    Reading Prompt | Slides
   Due (by 1:30pm): Final Project: Proposal
   Required readings
        AudeoSynth: Music-Driven Video Montage. Liao et al. SIGGRAPH 2015. (pdf)
   Optional readings
        Generating Emotionally Relevant Musical Scores for Audio Stories. Rubin et al. UIST 2014. (pdf)
 
W May 8: Processing Faces
    Reading Prompt | Slides
   Required readings
        Deformable Model Fitting by Regularized Landmark Mean-Shifts. Saragih et al. IJCV 2011. (pdf)
   Optional readings
        Detecting face landmarks using deep learning; blendshape models; pose estimation.
 
Week 7
M May 13: Bringing Portraits to Life
    Reading Prompt | Slides
   Required readings
        Bringing Portraits to Life. Averbuch-Elor et al. SIGGRAPH Asia 2017. (pdf)
 
W May 15: Transcript-Based Manipulation and Browsing
    Reading Prompt | Slides
   Required readings
        Tools for Placing Cuts and Transitions in Interview Video. Berthouzoz et al. SIGGRAPH 2012. (pdf)
 
Week 8
M May 20: Automated Film Editing
    Reading Prompt | Slides
   Required readings
        Computational Video Editing for Dialogue-Driven Scenes. Leake et al. SIGGRAPH 2017. (pdf)
   Optional readings
        QuickCut: An Interactive Tool for Editing Narrated Video. Truong et al. UIST 2016. (pdf)
 
W May 22: Generative Adversarial Networks
    Reading Prompt | Slides
   Required readings
        Image-to-Image Translation with Conditional Adversarial Nets. Isola et al. CVPR 2017. (web)
   Optional readings
        Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu et al. ICCV 2017. (web)
        Video-to-Video Synthesis. Wang et al. NeurIPS 2018. (web)
 
Week 9
M May 27: Memorial Day Holiday (No Class)
 
W May 29: Learning Slow Motion
    Reading Prompt | Slides
   Required readings
        Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation. Jiang et al. CVPR 2018. (web)
   Optional readings
        Video Frame Synthesis using Deep Voxel Flow. Liu et al. ICCV 2017. (arXiv)
 
Week 10
M Jun 3: GANs for Faces and Pose
    Reading Prompt | Slides
   Required readings
        Deep Video Portraits. Kim et al. SIGGRAPH 2018. (web)
   Optional readings
        Everybody Dance Now. Chan et al. 2018. (web)
 
W Jun 5: TBD
   Due: Final Project: Presentation
 


Teaching Staff


Instructor: Maneesh Agrawala
    Office Hours: 3-4p Mondays, Gates 364 and by appointment
Instructor: Ohad Fried
    Office Hours: 3-4p Wednesdays, Gates 375 and by appointment
Instructor: Michael Zollhöfer
    Office Hours: 3-4p Tuesdays, Gates 386 and by appointment

To contact us, please use Piazza. This is the fastest way to get a response.


Assignments and Requirements


Class Participation (15%)
Paper Presentation (15%)
Assignment 1: Manual Manipulation (5%)
Assignment 2: Video Morphing (15%)
Final Project (50%)

Attendance Requirement: This course relies on you reading the assigned papers and participating in the discussions. Therefore, attendance is mandatory.

Plagiarism Policy: Assignments should consist primarily of your original work; building off of others’ work (including third-party libraries, public source code examples, and design ideas) is acceptable and in most cases encouraged. However, failure to cite such sources will result in score deductions proportional to the severity of the oversight.