Brian Stone
Boise State University
Much has been written in the past couple of years about how large language models (LLMs) like ChatGPT, Gemini, and Claude threaten the traditional essay or paper as an assignment. Indeed, at this point LLMs can write in ways that most instructors find hard to tell apart from human writing (Scarfe et al., 2024). (I invite reflection on the toupee fallacy.) Instructors who played around with earlier LLMs like GPT-3.5 and haven't kept up with more recent developments and services like ChatGPT Deep Research (powered by the o3 model) or Gemini Deep Research may be surprised that the technology has largely moved beyond the point of hallucinated references and vague summarizing.
While frontier models may be financially out of reach for students right now (ChatGPT Pro is $200/month), other services like DeepSeek are stepping in with freely accessible models that bring near-frontier performance within reach for cash-strapped students. Meanwhile, countless paper-writing apps layered onto existing LLMs are promoted to students on TikTok and Instagram through targeted marketing and influencer partnerships. Students are definitely using generative AI: in my recent survey, around 60% of students self-reported having used AI to cheat, up from around 40% a year earlier in a nearly identical sample (Stone, 2025).
Instructors have adopted a variety of strategies in response to the easy availability of AI-generated papers, some of which have proven less robust than others to recent technical developments. After briefly surveying these strategies, I will share a specific AI-resistant paper alternative I've experimented with over the past two years.
Instructor Strategies Meet Recent AI Developments
AI detection faces classic signal detection problems (false negatives and false positives), and research on detectors is not promising at this point (Malik & Amjad, 2025; Tufts et al., 2025). Meanwhile, false accusations damage student-instructor relationships. In my recent survey, 11% of students reported being falsely accused, and first-generation students may be accused at a higher rate than their non-first-generation peers (Stone, 2025). Regardless, most detection methods fail when AI-generated writing is adjusted with further AI paraphrasing (Sadasivan et al., 2024), and "humanizing" apps that promise to make AI papers undetectable are advertised widely to students on social media.
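To make the signal detection problem concrete, here is a minimal back-of-the-envelope sketch in Python. All three parameters are hypothetical assumptions chosen purely for illustration, not measurements of any real detector; the point is the Bayesian structure, not the specific values.

```python
# Illustrative calculation only; the three numbers below are
# hypothetical assumptions, not measurements of any real detector.
sensitivity = 0.90          # P(flagged | AI-written): true positive rate
false_positive_rate = 0.05  # P(flagged | human-written)
base_rate = 0.20            # assumed share of submissions that are AI-written

# Bayes' rule: among flagged papers, what fraction are actually human-written?
p_flagged = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_human_given_flagged = false_positive_rate * (1 - base_rate) / p_flagged

print(f"Share of flags that are false accusations: {p_human_given_flagged:.0%}")
# With these assumed numbers, roughly 18% of flags land on honest students.
```

The exact numbers matter less than the structure: unless a detector's false positive rate is essentially zero, a meaningful fraction of flags will fall on honest students, and paraphrasing attacks like those documented by Sadasivan et al. (2024) push sensitivity down further.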
Some instructors argue we should concentrate on the students who genuinely try: put the responsibility on students to figure out when AI use helps or harms their learning, or simply talk to students and convince them of the value of exercising their own mental muscles. Others decry the costs of developing an adversarial relationship with students: creating anxiety through surveillance, inducing worries about false accusations, and over-emphasizing performance and product over process and learning.
Early attempts to make writing assignments AI-resistant are falling to recent technical advances. LLMs can generate outlines or draft papers in parts and stages (with deliberate errors and typos) and directly incorporate instructor comments into iterative drafts. Likewise, focusing assignments on events more recent than the model's training cutoff no longer works, given that current models can browse the internet, access daily news, view preprint articles, and analyze [the transcript of] a YouTube/TED talk. Referencing specific details and examples from course readings or slides is easy for LLMs that now allow attachments like slides, articles, or even entire books. Meeting-recording tools like Otter.ai may soon allow a student who wasn't in class to work specifics from that day's discussion into AI writing.
Many instructors are requiring students to compose in Google Docs with shared editing access, in Word with track changes, or with Grammarly's Authorship tool. This allows an instructor to see time-stamped writing and editing history, and browser plugins like Draftback visualize the editing process to spot odd behavior like pasting in large chunks of text. Nothing stops a student from typing in AI text word by word (though this behavior looks different from genuine writing). Yet recent developments in agentic AI like OpenAI's Operator already appear capable of creating and writing into a Google Doc through a process that looks superficially similar to real human writing (writing in bursts, removing and retyping, copying and pasting bits around), so we are back in an arms race.
Some (including myself) are moving the writing process and/or assessment in person for non-online courses (blue books are back!). However, this limits class time for other activities, limits the length and scope of writing, and can be complicated by slow or anxious writers, poor handwriting, and accommodations incompatible with time-limited writing. That said, written communication in general and lengthy structured writing in particular may be their own learning objectives, in which case we may just have to satisfice and accept some amount of cheating, as we always have.
Other instructors may directly integrate AI into their courses, ideally in a way that teaches critical AI literacy and ethical usage. AI may be integral to many future careers and become an important part of psychology education specifically (Koumpan & McOwen, 2024; Lim & Lee, 2024). The skillful psychology student should demonstrate adaptability to new technology, says the APA. Maybe we just need to teach students how to credit their AI usage (Albada & Woods, 2024); indeed, the APA style guide tells us how to cite AI. However, while using and integrating AI may become part of psychology student training (including to train critical thinking; Costello et al., 2024), this may not be a good fit for all courses or would require sacrificing other learning objectives. Furthermore, integrating AI in the wrong way may in some cases actively harm learning (Bastani et al., 2024; Gerlich, 2025; Lee et al., 2025; Spatola, 2024; Stadler et al., 2024).
Finally, many instructors are moving away from essays and papers altogether. For example, some have integrated social annotation tools like Perusall to encourage critical reading. Others have moved toward podcasts, videos, and infographics as paper alternatives, and while these can be great ways to serve and assess learning objectives, these mediums are still susceptible to AI usage. Not only can recent AI write an engaging and accurate script for a podcast, but free services like Google's NotebookLM can generate an entire podcast on any topic imaginable. Likewise, AI can generate the script for a video or presentation, and some apps are starting to generate decent slide decks and infographics. At this stage, the average AI-generated project may still be worse than a good student-generated project, but it won't be surprising if that gap shrinks in the near future.
Live Critical Discussion: My Book Club Case Study
Another alternative to essays and papers that could still serve many of the same learning objectives would be critical discussion by students in groups, and that's what I want to focus on for the rest of this piece. As you'll see, it may not completely solve the AI problem, but for now the in-the-moment nature of synchronous discussion seems harder to fake.
Specifically, I want to share the case study of how I moved students in my 100-person upper-division cognitive psychology course from writing a paper on Kahneman's book Thinking, Fast and Slow to recording book club-style critical group discussions. In both versions of the assignment, students had to reference specifics from the book to make their points, come up with connections to their own lives, and talk about their take-aways from the book. In previous years, they incorporated all of that into a paper about the book; in recent semesters, they have done so in their preparation notes for, and live contributions during, a group discussion.
Twice during the semester, I have groups of roughly seven students meet synchronously over Zoom for 75-90 minutes each time. In my case, they address half of the book in each meeting, but this could be split up differently or done all in one session. The key is requiring them to use Zoom's "record to cloud" feature (or the equivalent on other platforms) so that they can share the entire recording with me afterward.
Their preparation before a book club consists of reading and jotting down notes on each of the assigned chapters, but also doing a deep dive and extra preparation on a subset of chapters, on which they lead some of the critical discussion.
There's no doubt a student could feed the book into an LLM and ask for notes on each chapter (just as some students have historically turned to CliffsNotes rather than the original source), but such students tend to make much lower-quality contributions and reactions during the discussion. Indeed, the times when preparation notes have looked 'off' (likely auto-generated) have been the times when the same student seems to read from their notes verbatim while 'leading' discussion of their chapters and responds poorly and superficially to their peers in the moment. More importantly, a large portion of their grade comes from the quality of their emergent discussion, and it's pretty clear when someone has done the reading themselves and can pull up examples or make connections on the fly.
During their synchronous meeting, students are required to contribute actively throughout. They are also encouraged to invite quieter teammates to speak up. After the discussion, they submit the recording link and a post-meeting reflection in which they report on the discussion (e.g., specific examples that changed their mind) and can confidentially name any teammate who went above and beyond or who seemed entirely unprepared and unengaged. I also have them submit screenshots of their handwritten notes and prep as part of the submission (this can include pictures of annotations in the book).
I watch chunks of the discussions at high playback speed while looking for individual contribution quality, and in the end I have found the grading time to be far shorter than when assigning papers. In a class of 100, I have 14 videos to skim through, and each student's prep and post-meeting reflection can be graded at a fairly quick glance using a rubric. There is no need to watch an entire video: as I skip around through selections, I click through the rubric for each student once I've witnessed a sufficient number of high-quality, on-topic contributions and ensured they aren't reading from a script when leading discussion of their assigned chapters.
I have used this assignment in synchronous courses (in-person and online), in which case the groups meet during our assigned class time to keep scheduling simple, but I have also used it in asynchronous courses. The latter requires some work ahead of time to set up scheduling: I provide logistical support in the form of sign-up sheets for possible times and a discussion board for proposing alternative times. In other words, for asynchronous courses, students group themselves based on schedule availability rather than my assigning groups first.
In feedback and course evaluations for asynchronous courses, students have told me that this one synchronous assignment increased their sense of belonging and connection to their peers far more than alternatives like discussion boards.
This assignment is certainly not identical to writing a paper about a scholarly book, but for me it has retained many of the same learning objectives (critical reading, communication, making connections to course concepts). Crucially, it seems harder for students to do well if they haven't done the work (i.e., read the book and given it some thought) because live critical discussion requires students to respond in the moment to arguments and connections raised by their peers.
While this assignment idea is far from solving the AI issue, I have found it to be a fun, updated assignment option that, for now, seems better than traditional papers at guaranteeing active and critical engagement with material like a scholarly book.
References
Albada, N. A., & Woods, V. E. (2024). Giving credit where credit is due: An artificial intelligence contribution statement for research methods writing assignments. Teaching of Psychology. Advance online publication. https://doi.org/10.1177/00986283241259750
Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakci, O., & Mariman, R. (2024). Generative AI can harm learning. The Wharton School Research Paper, 1-59. https://doi.org/10.2139/ssrn.4895486
Costello, T. H., Pennycook, G., & Rand, D. G. (2024). Durably reducing conspiracy beliefs through dialogues with AI. Science, 385(6714). https://doi.org/10.1126/science.adq1814
Gerlich, M. (2025). AI tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies, 15(1), 1-28. https://doi.org/10.3390/soc15010006
Koumpan, E., & McOwen, L. (2024). Revolutionizing talent: The path in 21st century workforce transformation. In V. Salminen (Ed.), Human factors, business management and society (Vol. 135, pp. 74-83). AHFE International. https://doi.org/10.54941/ahfe1004932
Lee, H., Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025). The impact of generative AI on critical thinking: Self-reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3706598.3713778
Lim, S. C. J., & Lee, M. F. (2024). Rethinking education in the era of artificial intelligence (AI): Towards future workforce competitiveness and business success. In A. O. J. Kwok & P. Teh (Eds.), Emerging technologies in business (pp. 151-166). Springer. https://doi.org/10.1007/978-981-97-2211-2_7
Malik, M. A., & Amjad, A. I. (2025). AI vs AI: How effective are Turnitin, ZeroGPT, GPTZero, and Writer AI in detecting text generated by ChatGPT, Perplexity, and Gemini? Journal of Applied Learning & Teaching, 8(1), 1-11. https://doi.org/10.37074/jalt.2025.8.1.9
Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W., & Feizi, S. (2024, February 19). Can AI-generated text be reliably detected? arXiv preprint. https://doi.org/10.48550/arXiv.2303.11156
Scarfe, P., Watcham, K., Clarke, A., & Roesch, E. (2024). A real-world test of artificial intelligence infiltration of a university examinations system: A "Turing Test" case study. PLOS ONE, 19(6), Article e0305354. https://doi.org/10.1371/journal.pone.0305354
Spatola, N. (2024). The efficiency-accountability tradeoff in AI integration: Effects on human performance and over-reliance. Computers in Human Behavior: Artificial Humans, 2(2), 1-10. https://doi.org/10.1016/j.chbah.2024.100099
Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, 1-7. https://doi.org/10.1016/j.chb.2024.108386
Stone, B. W. (2025). Generative AI in higher education: Uncertain students, ambiguous use cases, and mercenary perspectives. Teaching of Psychology. Advance online publication. https://doi.org/10.1177/00986283241305398
Tufts, B., Zhao, X., & Li, L. (2025, February 9). A practical examination of AI-generated text detectors for large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2412.05139