YouTube and similar web services have revolutionized content sharing and online social networking by providing an easy-to-use platform for users to post and share videos. At the same time, content owners have raised serious concerns about unauthorized uploads of copyrighted movies and TV shows to these websites, as witnessed by high-profile lawsuits filed against YouTube and Google. To deter copyright violation and, more importantly, to help keep online communities alive legally, "content fingerprinting" technologies are deployed. These compute a short string of bits that captures the unique characteristics of each video and use it to determine whether an uploaded video belongs to a set of copyrighted content. Content fingerprints are also used by applications such as Shazam on the iPhone, which identify a song from a short audio recording and provide information about the artist, the album, and where to buy it.
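The matching idea described above can be illustrated with a toy sketch. It is not the scheme used by any deployed system, nor the model analyzed in our research; the per-frame feature (for example, average brightness), the function names `fingerprint`, `hamming`, and `identify`, and the distance threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def fingerprint(features: Sequence[float]) -> List[int]:
    """Toy fingerprint: one bit per adjacent pair of feature values.

    A bit is 1 if the feature increases from one frame to the next, else 0.
    Real systems use far more robust features; this is only an assumption.
    """
    return [1 if b > a else 0 for a, b in zip(features, features[1:])]


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    return sum(p != q for p, q in zip(x, y))


def identify(
    query: Sequence[int],
    database: Dict[str, List[int]],
    max_distance: int,
) -> Optional[str]:
    """Return the closest reference title within max_distance, else None."""
    best_name: Optional[str] = None
    best_dist = max_distance + 1
    for name, reference in database.items():
        dist = hamming(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical reference content, described by made-up per-frame features.
database = {
    "movie": fingerprint([10, 12, 11, 15, 14, 18, 17, 20, 19]),
    "show": fingerprint([5, 4, 3, 2, 6, 7, 8, 9, 1]),
}

# A distorted copy of "movie" (e.g. after re-encoding) flips a few bits at most.
upload = fingerprint([10.5, 12.2, 11.4, 14.6, 14.9, 17.5, 17.1, 20.3, 18.8])
print(identify(upload, database, max_distance=2))
```

The threshold `max_distance` governs the trade-off between missing distorted copies and falsely flagging unrelated uploads, which is one of the identification-performance questions such systems raise.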
Our research focuses on developing a better understanding of content fingerprinting systems through theoretical modeling and analysis, and on answering questions about their identification performance, scalability, and security. Through our analysis, we have also derived guidelines for improving the performance of commonly used modules in fingerprinting systems.