Tuesday, 11 January 2022

Rytr Reviewed: How the GPT-3 ‘AI Writing Assistant’ Performs In Real Life

As consumer and corporate interest in artificial intelligence has grown, the number of companies purporting to offer AI-powered software has exploded. Microsoft has a web page devoted to the various ways it has baked AI into Office and Bing. Products like Topaz Video Enhance AI offer new tools for improving video content. Even NASA has recruited citizen scientists to help improve Perseverance’s image recognition algorithms.

One area where AI tools now claim to offer far better performance than in the past is automated text generation. The idea is that because these tools are powered by various neural nets, they can compose better and more humanlike prose than ever before — to the point of sometimes being indistinguishable from human writing. One of these relatively new tools is an app called Rytr. Rytr bills itself as “A better, 10x faster way to write,” and claims “2,500,000+ hours and $50 million+ saved in content writing so far.”

According to Rytr’s splash page, customers can use its software to automatically generate “catchy, original, and high-converting copies in popular tones & languages in just a few seconds. Just pick a use case, enter some context, and boom…your copy is ready!” It also says, further down, that a writer using Rytr can produce a thousand-word document in fifteen minutes. This is a testable claim, and worth investigating. So we shelled out for the app’s unlimited-use monthly plan, to see what we could make of all these promises.

Rytr’s pricing structure.

Once logged in, we were able to start a new document and put the AI to use within seconds. When creating a new document or section, Rytr offers a wide selection of use cases, including bullet-point outlines, blog sections, interview questions, product reviews and testimonials, marketing content, and many others. It even has a use case for generating song lyrics. From another drop-down menu, the writer can choose the tone in which Rytr will phrase the content it generates using descriptive terms like awestruck, joyful, informative, worried, and funny. It’s easy, but not necessary, to guide the AI a bit by writing a few words or sentences in a blank document, and then invoking Rytr’s assistance.

Rytr does not specifically claim that its service can generate factually accurate writing, but many of the categories it lists (excerpted above) require accurate text in order to be useful to potential customers. An AI that can’t write factually accurate content isn’t useful in many instances.

When choosing the use case and tone, the writer also provides some “seed text” that the AI will use to find and generate its own content. The seed text often includes a title, description, and a field for the user to enter as many relevant keywords as they can fit within the character limit. Once you tell the AI what you want it to do, you can even ask it to provide you a few variations on the same theme. And it can handle natural-language queries, to a point. Ask it to write you three paragraphs about the sequence of events on April 19, 1775, and it’ll do precisely that; ask it for three variations, and you wind up with a chunk of content you can either use as-is or mine for your own efforts.

The app is highly accessible, and it stayed responsive even when the computer was running Excel and a full-screen game in the background. There aren’t a lot of menus; much of what you can do is accessible via right-click. Rytr is full of tools to nudge what you’ve already written. The user can ask Rytr to finish a thought, expand or shorten a phrase, or figure out a different way to say something. This ability to scrape together copy from just a few keywords is key to the app’s advertised ability to defeat writer’s block. Better still, the nudge tools work on user-entered text, not just text Rytr generated. The app does its work within a text editor where the user can freely compose their own original ideas, and that’s an important part of what makes Rytr useful. You don’t have to start with what Rytr can come up with; in fact, results are much better if you start with your own ideas, and use Rytr to help elaborate on them. But if you have blank-page-itis, it really can help.

While the app is mostly advertised for the business and marketing world on its website, we found that Rytr has features useful to many students at the college level. It can work within the AIDA and PAS marketing content formats, which is a standout for marketing students and professionals. But Rytr’s ability to shorten or expand on an idea is generally useful during the drafting and workshopping phases of composition. Learners who are engaging in freewriting can use the app to trim, shape and polish a freewrite into a solid piece of content that still reflects their own original voice. It can even be used to nudge content toward a word count, although making a sentence more concise usually got better results than making it more verbose.

In language learning, reading comprehension usually comes before written or spoken fluency. Consequently, second-language learners can benefit greatly from using software like Rytr. If you aren’t confident in your written English, for example, but know how to read enough English to comfortably fact-check the content Rytr creates, it can be a powerful tool for second-language composition. In particular, Rytr is a useful aid in language learning through journal entries. If you don’t know how to say something, you can ask Rytr to work with your best attempt, and it can demonstrate correct spelling and solid grammar. The student can choose a tone for their entries that matches what they mean to convey and how they wish to say it. But it’s not magic, the company’s claims to the contrary notwithstanding. It’s up to the learner, ultimately, to use Rytr as a tool instead of a way to cut corners.

As a composition tool, Rytr isn’t quite fire-and-forget. The app bills itself as a time saver, especially for long-form content, but this is where it really falls on its face. It is absolutely true that Rytr can speed up the writing process. But the more technical the content, or the more heavily the user relies on the AI’s ability to compose from scratch, the greater the risk that the text the AI produces will contain statements that are flat-out wrong. While its grammar is solid, its facts aren’t as reliable, and that means you still have to spend a bunch of time fact-checking its work. With the GPT-3 model at its back, you’d think the app’s research track record would be better; GPT-3’s training data includes Wikipedia as a whole, along with many .edu, .gov and other generally trustworthy resources. But it requires constant shepherding in order to make sure everything checks out. With respect to the challenge of writing a thousand words in fifteen minutes, it’s plausible that a writer could get there writing about a topic they already knew inside and out, but technical writing remains a more ambitious goal.
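As a rough check on the fifteen-minute claim, the short Python sketch below computes the writing rate the claim implies and shows how that rate changes once fact-checking time is added. The function name and the 30-minute fact-checking figure are illustrative assumptions, not measurements from this review.

```python
def words_per_minute(words: int, minutes: float) -> float:
    """Average output rate for a document of `words` words."""
    return words / minutes

# Rytr's advertised figure: a thousand words in fifteen minutes.
claimed_rate = words_per_minute(1000, 15)

# Hypothetical: assume 30 further minutes spent fact-checking the draft.
# This number is an assumption for illustration only.
fact_check_minutes = 30
effective_rate = words_per_minute(1000, 15 + fact_check_minutes)

print(f"Claimed rate:   {claimed_rate:.1f} words per minute")
print(f"Effective rate: {effective_rate:.1f} words per minute")
```

Under that assumption, the effective rate is a third of the advertised one, which is why the amount of fact-checking a topic demands matters more than how fast the first draft appears.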

Even when you specifically tell it what to say, the app still has some problems with fact-checking. Take the example of April 19th, 1775: the date of the “shot heard ’round the world.” In some places Rytr stumbles when linking names, places, events and dates, even for undisputed parts of this exhaustively discussed period of American history.

Similarly, it doesn’t do well with technical reference. We tested a broad swath of topics that are more and less technically detailed, including the anatomy of the human spinal cord, stages of a dicot bean seed’s growth, the American Revolution, the key features of a PCI Express connection, steps to correctly perform CPR, and characteristics of the Intel 12900K and the Ryzen 5950X. In this example, where we explored the AI’s historical accuracy and readability, we’ve highlighted a sentence composed by Rytr that was indistinguishable from a sentence written by a human.

“One year later (1779), France sent over 6800 soldiers to America under the command of Jean-Baptiste de Rochambeau, intending to pressure Britain into accepting American independence.”

That benchmark is a kind of Turing test for neural nets. It shows just how aware an AI is of the subtle underlying grammatical dumpster fire we call written English. And this wasn’t a one-off success. Most of the prose generated was on the same reading level as a newspaper or magazine, with fluid grammar and even multiple clauses. Under the hood, Rytr uses the same GPT-3 neural net recently discussed here and elsewhere for its unique ability to interpret natural language queries and produce intelligible, functional snippets of computer code.

Let’s look at some specific examples of how Rytr performs in various scenarios.

Reviews

We asked Rytr to write some CPU reviews after giving it common terms like “AMD,” “Intel,” “5950X,” “12900K,” and “Ryzen.” How you configure the keyword settings will impact the final result. Some reviews cut off partway through because of length restrictions; Rytr will finish these hanging sentences if you highlight the text and ask it to do so.

These reviews were generated when we asked Rytr to review the Ryzen 9 5950X and the Core i9-12900K without giving the application any information on which CPU should be considered better. We provided categories for comparison but no information on which CPU should win any given category.

The reviews above were generated when we told Rytr to assume the AMD Ryzen 9 5950X won the competition against Intel.

This last set of reviews is what Rytr generated when we tweaked the application to be even more positive towards AMD.

Many of these reviews sound as if they could’ve been written by humans. In that regard, they succeed beautifully. Many of them also make factually incorrect statements about the CPUs they compare. A few echo out-of-date comparisons and stereotypes.

Non-Technical Text

The text samples below illustrate Rytr’s ability to compose factually accurate, human-sounding statements and paragraphs when asked to write a short article on the American Revolutionary War. When the service is good, it’s very good.

The worst that can be said for this snippet on the Revolutionary War is that it’s a bit simplistic. Factually, it holds up. Unfortunately, Rytr’s output isn’t always accurate:

This list of the 13 colonies omits New York, New Jersey, and Maryland. It also misidentifies Massachusetts as “Massachusetts Bay.” The text is taken almost verbatim from WeThePeople.scholastic.com; the only difference in the order of colonies is the omission of the colonies Rytr leaves out. The application has substituted the name of Massachusetts Bay Colony (1630-1691) when referring to Revolutionary War-era Massachusetts (1691-1780).
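A list like this is easy to check mechanically when a trusted reference exists. The Python sketch below compares a generated list against the thirteen colonies. The `generated` list is a reconstruction based on the errors described above, not Rytr’s verbatim output, and its order is arbitrary; the function name is hypothetical.

```python
# Reference list: the thirteen colonies under their Revolutionary War-era names.
THIRTEEN_COLONIES = {
    "New Hampshire", "Massachusetts", "Rhode Island", "Connecticut",
    "New York", "New Jersey", "Pennsylvania", "Delaware", "Maryland",
    "Virginia", "North Carolina", "South Carolina", "Georgia",
}

# Assumption: reconstructed from the errors described in the review,
# not copied from Rytr's actual output.
generated = [
    "New Hampshire", "Massachusetts Bay", "Rhode Island", "Connecticut",
    "Pennsylvania", "Delaware", "Virginia", "North Carolina",
    "South Carolina", "Georgia",
]

def check_against_reference(candidates, reference):
    """Return (missing, unrecognized) names, each sorted alphabetically."""
    candidate_set = set(candidates)
    missing = sorted(reference - candidate_set)
    unrecognized = sorted(candidate_set - reference)
    return missing, unrecognized

missing, unrecognized = check_against_reference(generated, THIRTEEN_COLONIES)
print("Missing:", missing)
print("Unrecognized:", unrecognized)
```

Because “Massachusetts Bay” does not match the reference name exactly, the check reports Massachusetts as missing and flags the substituted name separately, so both kinds of error surface.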

Sample analysis continues below:

The troops arrived in Boston in 1768, but the Boston Massacre did not occur until 1770. The phrase “that event” conflates the 1768 occupation with the Boston Massacre.

The first paragraph is accurate; the second is not.

George Washington was appointed commander-in-chief of the Continental Army on June 19, 1775, not 1776. Washington did not fortify New York City against a supposed invasion of Canada(!) by the British. The British troops under General John Burgoyne had decisively turned the tables against the US by the middle of 1776 and the Americans were retreating back towards Crown Point, New York by mid-June. The proposed British plan was for Burgoyne to invade from Canada while Howe pushed north from New York City. This plan, known as the Saratoga campaign, failed because Howe moved his forces south to capture Philadelphia rather than heading north to meet Burgoyne.

Howe’s appearance in New York City was not a surprise. Washington had anticipated that Howe might attack NYC after the British general left Boston in March 1776. Washington moved the Continental Army to defend New York City because he correctly anticipated Howe’s next target. The attack on Long Island in August 1776 was not a surprise attack. Howe did not launch an attack on New York City “instead” of launching the Saratoga campaign, much less “invading” Canada. Capturing New York was a necessary step in securing the Hudson River Valley and splitting the Northeast and Northeast supply lines.

The text generated by Rytr is a mixture of factually accurate and inaccurate sentences. At least some of these inaccuracies are presumably being generated by the AI’s attempt to parse its sources and construct coherent claims. There is no way to identify which claims are accurate and which are not without fact-checking every statement. Full text below:

Technical Text

When Rytr is good, it’s good. When it’s bad, though, it can be hilarious. Sometimes the GPT-3 AI tips its hand and comes up with gems like these, evaluated for accuracy by our managing editor and CPU reviewer, Joel Hruska:

AMD’s top CPUs in 2019 were based on 7nm. Cache structures do not have ALU logic units and we don’t typically discuss cache in terms of its pinout. Ryzen’s FPU is not based on ARM designs and AMD does not claim a technical advantage over Intel for this reason.

The claims in the paragraph above are mostly accurate, though the phrasing here is deeply awkward. This is an example of Rytr getting it Ryteish.

That’s going to be news to Intel.

Rytr was stuck on the idea that the Intel Pentium lacks a system bus, but helpfully clarifies that it has “several” bus interfaces. The Pentium, for the record, absolutely has a system bus. The comment on PCI bridge support would seem to relate more to the chipset than the CPU, but some Pentium motherboards did support two PCI bridges.

Intel’s latest products did not introduce these features and the definition of a “node” is flat wrong. The other terms are correctly defined but mistakenly attributed as new capabilities. The text is rhetorically sophisticated and factually empty.

Full text below. Rytr doesn’t get everything wrong, but everything it says must be fact-checked. There are enough incorrect assertions to render the body of text suspect.

Missteps like this show the difficulty of integrating context by way of weighted associations between words frequently occurring together. Because of the weight put on the factual correctness of content in any technical setting, this tendency to get things mixed up took some of the luster off Rytr’s promises of magic. It’s at its best when the writer already knows most of what they want to say. Put bluntly, unless the content you want to put on it is meandering and content-free, this app is not going to write your monetized blog for you all by itself while you idly sip an umbrella drink and spam left-click.
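To see how association by co-occurrence can produce fluent but false statements, consider the toy Python sketch below. It is a deliberately tiny word-pair model, far simpler than GPT-3 and not a description of how Rytr works internally. The two-sentence corpus is built from facts stated in this review, and the function names are hypothetical.

```python
from collections import defaultdict

# Two true statements, taken from the corrections earlier in this review.
corpus = [
    "washington was appointed commander in 1775",
    "howe left boston in 1776",
]

def build_bigrams(sentences):
    """Map each word to a count of the words seen directly after it."""
    follows = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def all_continuations(follows, start):
    """Enumerate every sentence the word-pair table permits from `start`."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = follows.get(path[-1])
        if not successors:  # no known successor: the sentence ends here
            results.append(" ".join(path))
            continue
        for nxt in successors:
            stack.append(path + [nxt])
    return sorted(results)

follows = build_bigrams(corpus)
generated = all_continuations(follows, "washington")

# Sentences the model can produce that never appeared in its sources.
invented = [s for s in generated if s not in corpus]
print(invented)
```

Both source sentences are true, yet the word “in” is followed by either year, so the model can splice them into “washington was appointed commander in 1776”: grammatical, plausible, and wrong in the same way as the date error noted above.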

Writing reviews by the dozen, however, is something it will do quite easily. Tell it what to say, and it’ll obediently spin you as many product reviews and testimonials as you like. We asked it to write us reviews purporting to be from customers who had purchased the latest CPUs from AMD or Intel and it happily gave us a dozen apiece. To this we have to ask: What is the legitimate use case for product reviews written en masse by a computer?

It’s already hard to find a source of trusted information. Whole fly-by-night websites arise to pirate the work of another website, run the content through a synonym substitution function, and slap the whole thing back online in hopes that someone, somewhere, will click on an ad. ExtremeTech has experienced this kind of AI-enabled piracy even though we’re a relatively small site if measured by volume of visitors. Even the online-retail titans struggle to contain the endless wave of fake reviews and testimonials. Some reports have shown nearly half of all internet traffic — half of everything done on the web — is conducted by botnets. There’s an old saying thought to have originated with the author Jonathan Swift: “Falsehood flies, and the Truth comes limping after it; so that when Men come to be undeceiv’d, it is too late; the Jest is over, and the Tale has had its Effect.”

There’s also a deep but subtle problem with plagiarism. Rytr includes a built-in plagiarism checker, powered by Copyscape. The tool is useful, but it’s something of a smokescreen. The Rytr app is capable of composing several paragraphs at a time, with sophisticated and flexible grammar. You can get a whole paper written that way, and when the app can get its facts straight, it’s high enough quality that you barely need to touch the content it generates. But the core definition of plagiarism is claiming credit for work that’s not your own. If Rytr wrote my term paper, how much of it is really my work?

Ultimately, our verdict is that Rytr is a powerful and versatile writing assistant, with a wide variety of tricks up its sleeve, but it is not ready for what we’ll call loadbearing composition. Like any limited AI, it can tell you facts and knit sentences into coherent paragraphs, but it struggles to understand. We found that the app is most useful when the writer already has a sense of narrative and all their facts straight, and needs help getting over writer’s block, finding a better way of phrasing something, or adjusting tone. If the user can very clearly and carefully specify what they want, the writing process can be fast and smooth. Rytr is a sophisticated writing assistant, best applied with a light touch.
