Metaphor of the Month! Data Scraping

As I prepare for my Fall class, “Writing With and About AI,” as well as a book proposal on AI in the writing classroom, I keep encountering neologisms like this month’s metaphor.

What do “scrapers” do? According to a firm that employs them, they can “browse sites based on your keyword inputs or connections to your website or social media accounts. They can also skim through online reviews, product descriptions, and other categories.” That sounds benign enough, since sites like this one sit behind no paywalls (there’s a neologism to which I’ll return in some future post). The practice, according to the Wikipedia entry, appears to date to the 1980s, before we had The Web or household Internet access.
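For readers in a course like mine who wonder what a scraper actually does, here is a minimal sketch in Python. It is a hypothetical illustration, not the code any firm or AI company uses: the sample page, the keyword list, and the helper name `scrape_paragraphs` are my own assumptions, and the page arrives as a string so the example runs without fetching anything from the Web. It keeps only the paragraphs that mention a keyword, roughly what the firm means by browsing “based on your keyword inputs.”

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the plain text of every <p> element on a page."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.current = []      # text fragments of the paragraph being read
        self.paragraphs = []   # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.paragraphs.append("".join(self.current).strip())
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <em> still counts as paragraph text.
        if self.in_paragraph:
            self.current.append(data)


def scrape_paragraphs(html, keywords):
    """Hypothetical helper: return the paragraphs that mention any keyword."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [p for p in parser.paragraphs if any(k in p.lower() for k in wanted)]


# Hypothetical sample page standing in for a downloaded web page.
sample_page = """
<html><body>
  <h1>Metaphor of the Month</h1>
  <p>Scrapers can skim through <em>online reviews</em>.</p>
  <p>This paragraph is about travel agencies.</p>
  <p>Product descriptions are scraped too.</p>
</body></html>
"""

for paragraph in scrape_paragraphs(sample_page, ["reviews", "product"]):
    print(paragraph)
```

Run as written, it should print the first and third paragraphs. A real scraper would add a step to download pages, and a courteous one would first check a site’s robots.txt file (Python’s `urllib.robotparser` module handles that), though nothing forces a bot to obey it.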

Why scrape data to train AI? According to the firm quoted above, data scrapers assist in “automating outreach, [and] they can also help during the early company development and research phases. Even later on, you can use them to monitor online chatter and brand perception.” As I constantly tell my students, they need to learn how to use these AI-based tools, even if they dislike them. Getting a job will depend upon AI fluency.

And yet, as I write this, the BBC has threatened to take the AI firm Perplexity to court for scraping its data without authorization and “reproducing BBC content ‘verbatim’ without its permission.” This use of freely available BBC content poses a new problem for me, a self-professed “Copy Leftist” who has long opposed copyright save for creative work.

Open-access scholarship, my own syllabi, and more in the Creative Commons are there to be scraped. The problem, for me, is that my words and those of other creators get used without anyone asking permission or giving attribution; that use violates the ethos of the Creative Commons. Twenty years ago, I wrote to a Hong Kong firm that had used our online handbook pages verbatim, without acknowledgment. I told them I’d be contacting every e-list I knew to show that they had done this. They relented and gave our creators credit. I gave them my blessing to use our content under that one condition.

I’ve long advocated giving away everything for free, save classified government information and creative work, and I intensely dislike copyright for other materials. Free access was one promise of the original Internet. Just cite it if you scrape it.

Now I’m thinking that web crawlers and other bots that scrape data pose an even larger problem than copyright laws and paywalls. We may need to revise copyright laws to make scrapers honor the attribution that Creative-Commons licenses already require, or to watermark all AI-scraped content.

Scrape the barrel for new words and metaphors, then send them to me at jessid -at- richmond -dot- edu or leave a comment below.

See all of our Metaphors of the Month here and Words of the Week here.

Creative-Commons image courtesy of lab.howie.tw

Word of the Week! Agentic

This week we have a neologism that I suddenly encounter almost daily. The word is one you need to know, if enthusiasts for certain technologies are not stretching the truth. Our word proves too new even to appear in The OED.

Soon it will. But what does it mean? In current discussions of artificial intelligence, “agentic” means autonomous: able to make decisions on its own. Agentic AI does more than answer a query; it can be given parameters for a complex task and then go about solving it in whatever manner it sees fit. Such systems may not need human input.

I’ve been thinking about travel a lot lately, and about how, before I moved to Spain in 1985, I went to a travel agency with lots of general ideas. I then relied on the agents to provide me with several affordable options for touring France and Spain before I arrived in Madrid for a job interview that led to my first paying teaching gig.

Flash forward 40 years: tonight I dined at a really fine place in Richmond’s West End for sushi and sashimi. I’d been curious about it since spotting it, so for about 20 minutes I used my phone to read reviews, comparing the notes others left and looking at how it ranks against similar places.

Flash forward again to the year 2030: had I an agentic AI to help, I could have simply said, “Hal, could you brief me on the strengths and shortcomings of the food at XYZ? I’m thinking of going.” No huge prompt needed. Hal would perform a number of tasks on its own: summarizing happy and unhappy reviews, checking prices, making comparisons, even finding out where the place sources its seafood. It could discover that Kirin Beer, my dad’s favorite, was on tap. I quaffed one in his honor tonight.

Agency of this sort does not, luckily, imply sentience; I’ve covered the term sentient here before. Even so, these new AI systems have already reshaped industries, if New York Times reporter Kevin Roose’s work holds true. You will need to get past the paper’s paywall to read the entire story, but Roose’s latest column focuses on the downturn in employment for recent college grads. In a podcast episode built on this article, Roose and his co-host turn to young college-educated workers: “if you look at the unemployment rate for college graduates right now, it is unusually high. It’s about 5.8 percent in the US. That has risen significantly, about 30 percent since 2022.”

This comes while overall unemployment remains near historic lows. One theory? Industries are simply automating a record number of entry-level positions. The replacement has proven acute in fields such as finance and computer science.

As agentic AI expands its scope and abilities, how many more jobs will also vanish?

These are questions we humans need to ask, we who have been agentic for longer than our primate ancestors have had fire or walked on two legs. We need to have a conversation, too, about how agentic we want our cybernetic companions to become. That’s not a doomsday warning, but if one loses a career and means of support, the outcome is dire.

As I tell my students, “If you want a job soon, you must add value to AI output.” Too many of them see AI as a shortcut. That’s not wise. Yet even if students get wise about leveraging AI’s abilities, agentic AI may make statements like mine sound (and here comes a future metaphor of the month) like whistling in the dark.

AI did not write this piece, but if you or your AI have words or metaphors useful in academic writing, send them along by e-mailing me (jessid -at- richmond -dot- edu) or leaving a comment below.


Creative-Commons image from Flickr of a wonderful old-school travel agency on Great Western Road in Glasgow, Scotland. I’ve been there, but the agency is probably long gone.