Google Drive Flags Almost Empty Files for ‘Copyright Infringement’

Users were left shocked as Google Drive’s automated detection systems flagged a nearly empty file for copyright infringement. According to one Drive user, the file contained nothing but the digit “1.”

Is the Number “1” Copyrighted?

Recently, Dr. Emily Dolson, Ph.D., an assistant professor at Michigan State University, reported seeing some odd behavior when using Google Drive. One of the files in Dolson’s Google Drive, ‘output04.txt’, was nearly empty, containing nothing other than the digit ‘1’.

But according to Google, this file violated the company’s “Copyright Infringement policy” and was therefore flagged. Worse, the warning sent to the professor ended with “A review cannot be requested for this restriction.”

Dolson’s file ‘output04.txt’ was stored at the path ‘CSE 830 Spring 2022/Testcases/Homework3/Q3/output’ in Drive, which led the professor to wonder whether the file path had contributed to the false alarm. Stored on Dolson’s “non-educational Google account,” the file was one of a batch of TXT files containing output generated as part of a homework assignment.

One Too Many Digits?

A pseudonymous user also shared screenshots of their Google Drive account, in which files containing just the digit “1,” with or without newline characters, were flagged. “The 1 byte files contain just ‘1’, the 2-byte file is ‘1\n’, and the 3-byte (not flagged yet) file has ‘1\r\n’,” wrote the user.
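
The three variants the user describes differ only in their line ending, which is easy to confirm at the byte level. The following is a minimal Python sketch; the labels are illustrative and do not come from the user's post.

```python
# The three file contents described in the quote, as raw bytes.
variants = {
    "no newline": b"1",
    "LF newline": b"1\n",
    "CRLF newline": b"1\r\n",
}

for label, data in variants.items():
    # len() of a bytes object equals the size the file would have on disk.
    print(f"{label}: {len(data)} byte(s), hex {data.hex()}")

# Prints:
# no newline: 1 byte(s), hex 31
# LF newline: 2 byte(s), hex 310a
# CRLF newline: 3 byte(s), hex 310d0a
```

Any system that compares files byte for byte, or by a checksum of their bytes, would treat these as three different files, which would be consistent with only some of them having been flagged.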

It turns out the behavior isn’t limited to just files containing the digit “1.”

Dr. Chris Jefferson, Ph.D., an AI and mathematics researcher at the University of St Andrews, was also able to reproduce the issue when uploading multiple computer-generated files to Drive. Jefferson generated over 2,000 files, each containing just a number between -1000 and 1000.
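
A batch like this takes only a few lines to generate. The sketch below is an assumption about how such an experiment could be set up, not Jefferson's actual script: the function name `write_number_files`, the file naming scheme, and the absence of a trailing newline are all hypothetical choices.

```python
import tempfile
from pathlib import Path


def write_number_files(directory: Path, low: int = -1000, high: int = 1000) -> list[Path]:
    """Write one text file per integer in [low, high], each holding only that number."""
    paths = []
    for n in range(low, high + 1):
        path = directory / f"{n}.txt"  # hypothetical naming scheme
        path.write_text(str(n))        # assumption: no trailing newline
        paths.append(path)
    return paths


# Write into a temporary directory so nothing is left behind afterward.
with tempfile.TemporaryDirectory() as tmp:
    files = write_number_files(Path(tmp))
    file_count = len(files)
    sample = (Path(tmp) / "173.txt").read_text()

print(file_count)  # 2001
print(sample)      # 173
```

The inclusive range from -1000 to 1000 yields 2,001 files, which matches the "over 2,000" figure.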

The files containing the numbers 173, 174, 186, 266, 285, 302, 336, 451, 500, and 833 were flagged by Google Drive for copyright infringement shortly afterward. Some allege that if a file contains just the digit “0,” Google will permanently disable the account, although that outcome more likely applies to users whom Google deems repeat infringers.

“I deleted the experiment, just in case I got my account deleted for too many naughty numbers,” writes Jefferson. Mikko Ohtamaa, the founder of DeFi company Capitalgram, alleged that Google’s automated flagging of suspected copyright infringement could conflict with parts of the GDPR legislation.

Note, however, that GDPR Article 22, titled “Automated individual decision-making, including profiling,” refers more specifically to automated decisions made about individuals by profiling their online behavior, such as before granting a loan or when making hiring decisions, as explained by the UK’s ICO.

“I’d have more sympathy if it weren’t ‘A review cannot be requested for this restriction.’ It’s developed to be as difficult and draconian as possible. They chose this. It is guilty until proven innocent, with no alternative.”

It isn’t yet known what causes this behavior, and our experts have been unable to reproduce the issue at the time of writing.

Google published a detailed document in 2018 explaining how the company fights piracy. With regard to Google Drive specifically, however, the report states only that a “full-time abuse engineering team” was set up by Google to tackle illegal streams served on Google Drive. As such, not much information is available on how Google’s algorithms handle non-video content stored on Drive. Our experts reached out to Google well in advance of publishing with specific questions, such as whether Google relies on checksums to keep track of copyrighted content and whether this behavior arose from a possible hash collision between copyrighted files and benign ones sharing the same hash.
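
To illustrate why checksum matching could misfire on tiny files, here is a minimal Python sketch. It is an assumption for illustration only: nothing here reflects how Google Drive actually works, and the blocklist, its single entry, and the `is_flagged` helper are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical blocklist of checksums of files reported as infringing.
# Assumption: one reported file happened to contain only the digit "1".
reported_file = b"1"
blocklist = {sha256_hex(reported_file)}


def is_flagged(data: bytes) -> bool:
    """Flag a file if its checksum appears in the (hypothetical) blocklist."""
    return sha256_hex(data) in blocklist


print(is_flagged(b"1"))    # True: byte-identical content, identical checksum
print(is_flagged(b"1\n"))  # False: one extra byte changes the checksum
```

In this sketch the match is not a collision in the cryptographic sense, where different content produces the same hash. The files are simply byte-identical, which is far more likely when a file is only one or two bytes long.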