Social media data: The inevitable fuel for AI

How LinkedIn, Meta, X, and Google are using your data to train AI, and why it might be unavoidable

News today that LinkedIn is using user data to train AI models without explicit consent has reignited the debate about privacy in the digital age.

The business-focused social network, owned by Microsoft, has opted its users into this practice by default, allowing data from public posts to train both its own AI models and those of Microsoft.

While LinkedIn claims that privacy measures are in place and European users are exempt, the controversy surrounding its decision is emblematic of a broader trend across the social media landscape.

LinkedIn is far from alone in pursuing this path. The practice of leveraging vast amounts of user data to train AI models is becoming standard operating procedure for major social platforms.

From Meta’s use of Facebook and Instagram posts to Elon Musk’s ambitions with X (formerly Twitter), and even Google’s handling of YouTube data, it is clear that data from billions of users is fuelling the next generation of AI.

The question, then, is whether this shift was inevitable—and if so, whether it should be embraced, rather than resisted.

LinkedIn and the quiet data revolution

LinkedIn’s decision to use user data to train AI models caught many by surprise, though it closely mirrors moves made by its tech rivals. The platform recently clarified that public posts, comments, and other shared content are being used to enhance its generative AI features.

Crucially, users were not explicitly asked for consent; instead, they were automatically opted in. Privacy advocates argue that this opt-out model leaves users vulnerable to unanticipated data exploitation. While the platform offers an option to disable this feature, the outcry has highlighted how little control users have over how their data is handled by large corporations.

This is not just a LinkedIn issue. Across the social media spectrum, platforms are following a similar trajectory, feeding ever-expanding data sets into AI models without significant transparency or accountability. Yet, despite the backlash, this practice is arguably the natural evolution of the way social media companies have long operated.

Meta’s decade-long data harvesting

Meta has perhaps been the most aggressive in this regard. It has been collecting vast amounts of data from its users since 2007, and that archive is now being used to train its AI models.

Recent revelations indicate that all public posts, photos, and comments from adult users on Facebook and Instagram, stretching back more than a decade, have been fed into these systems. Unless users actively adjusted their privacy settings, their data has likely been used for purposes they never anticipated.

In regions with stringent privacy laws, such as the European Union, Meta has been forced to allow users to opt out of this process. However, in most parts of the world, including the United States, users have no such option if they wish to keep their posts public.

Privacy advocates have criticised Meta for its opaque policies, arguing that users should have been informed more clearly about how their data would be used. Yet Meta’s approach reflects the broader reality: user data is the lifeblood of social platforms, and AI is the next frontier in which it is being harnessed.

X and the Grok experiment

Elon Musk’s X has also embraced this trend, albeit with its characteristic controversy. In July this year, X quietly changed its user settings, allowing Musk’s AI venture, xAI, to use data from public posts to train its chatbot, Grok.

The move prompted swift legal action in Europe, where privacy complaints were filed against the platform in nine countries. In response, X suspended the use of European users’ data for AI training, though the practice continues unabated elsewhere.

This development is yet another example of the friction between innovation and regulation. Musk’s ambitions for Grok rely heavily on access to the vast trove of data housed on X.

Yet, as the platform’s legal troubles in Europe demonstrate, the question of user consent remains a thorny issue. Like Meta, X has placed the burden on users to opt out, rather than seek their active agreement. This, privacy experts argue, is a fundamental violation of trust in the digital space.

Google and the YouTube data goldmine

Google, too, has found itself under scrutiny for how it uses user data. Reports suggest that the company has been transcribing YouTube videos to train its own AI models, and that OpenAI did the same to train its GPT-4 system.

Although YouTube’s terms of service ostensibly allow Google to use content for such purposes, the company has faced criticism over its lack of transparency. Content creators, many of whom have large followings on the platform, are often unaware that their work is being used to train AI models.

YouTube’s CEO, Neal Mohan, has stated that AI training is conducted in accordance with the platform’s terms and individual contracts with creators. However, it remains unclear how widespread this practice is and whether it extends beyond the contracts Google has in place.

Like its peers, Google is walking a fine line between innovation and user trust, as it seeks to balance its ambitions in AI with the realities of data governance.

The inevitable shift: AI as the new advertising

The use of social media data to train AI models may feel like an encroachment on privacy, but it is the inevitable consequence of how these platforms have operated for years.

Just as social media companies have long used personal data to power their advertising engines, they are now using it to enhance their AI capabilities. In many ways, this shift is a natural progression.

AI models require vast amounts of data to function effectively, and social media platforms, which house the daily activities, thoughts, and interactions of billions of people, are an obvious source.

Personally, I am willing to accept this trade-off. The content I share on social media is not intellectual property I am precious about, and if it can be used to build better, more inclusive AI systems, I see that as a positive outcome.

Of course, this is not a sentiment shared by all. Privacy concerns are valid, and the current model of opting users into data collection without explicit consent is problematic. But, as with the use of data for targeted advertising, the use of data for AI training feels like an unavoidable reality of our digital age.

Transparency and consent in the AI era

As social platforms continue to evolve, their use of user data will only become more sophisticated. AI offers unprecedented opportunities, but it also raises new questions about privacy, consent, and the relationship between users and the platforms they rely on.

Social media companies must find a way to balance their AI ambitions with the need for greater transparency and user control. In the meantime, we, as users, must decide whether the benefits of AI outweigh the risks of having our data used in ways we may not fully understand.