Society for the Teaching of Psychology: Division 2 of the American Psychological Association

Adding advanced data science skills in psychology laboratory courses

17 Aug 2025 6:21 PM | Anonymous member (Administrator)

James Mantell & Aileen Bailey
St. Mary's College of Maryland

Acknowledgements: NSF IUSE: EDU Award 2235645 (Developing Modernized Data Science Instruction in Psychology Curricula).


Data Science in Psychology

Data science is a growing, interdisciplinary field that leverages computational methods to interpret large datasets. Students with data science skills are better prepared for research opportunities and they are more competitive across multiple career paths (Business-Higher Education Forum, 2017; National Association of Colleges and Employers, 2024). However, undergraduate data science training is limited (Marshall & Geier, 2020; Yavuz & Ward, 2020). Psychology is an excellent candidate for data science training because psychology and data science share core knowledge areas including research design, quantitative literacy (e.g., probabilistic knowledge, visualization literacy; Börner et al., 2019), and critical thinking (e.g., reasoning, problem solving; Halpern, 2013). Moreover, many psychologists practice data science techniques in their research that can be developed in the classroom.

Focus on Data Science Skills

Before adding data science course material, psychology instructors should consider their pedagogical situation to determine whether data science content is compatible with their existing program or course learning outcomes. Some psychology programs exert more control over curricular material; such restrictions may limit the instructor’s freedom to introduce new course experiences or skills. Alternatively, topics and lab courses may offer more flexibility for instructors to focus on data science skill development. In any case, instructors should know that there is no need to reinvent a course that works well. Data science can be introduced in modular, manageable lessons that complement existing psychology course goals.

After determining how much flexibility they have in developing their course or curriculum, instructors should identify the data science skills that they want to teach. For example, we designed our psychology data science course materials to help students develop four families of data science skills that are linked to successful professional outcomes: computer programming; data management; data visualization, analysis, and modeling; and data storytelling with code. Our psychology data science courses are presented in a 300-level laboratory course format, with 12 students per section. It is also possible to teach data science in larger sections, especially with TAs who can supplement instruction.

First, computer programming is an essential data science skill, but in many psychology programs, dedicated computer programming instruction may be limited to statistical coursework. Second, data management includes ethical data acquisition, transformation, and distribution. While psychology students increasingly understand the need to transparently communicate their research design via preregistration, it is less common for them to understand how to ethically share their data to enable other researchers to replicate their results. Third, data visualization, analysis, and modeling enable data scientists to extract meaning from data by combining analytical and visualization techniques. While these topics are introduced in psychological statistics courses, data science techniques present opportunities for deeper exploration, including highly customizable visualizations and advanced techniques such as machine learning. Fourth, data storytelling with code offers a unique approach to scientific communication that enables the audience to replicate and extend the analyses in ways that can enhance transparency and motivate new discoveries (Granger & Pérez, 2021). A successful data scientist must be able to describe their analytical workflow just as clearly as they can show their results.

Psychology Data Science Instruction Ideas

With respect to computer programming, we chose to teach Python in our courses because it is a readable, flexible, and popular programming language within data science. Thus, our course assignments feature Python-based platforms including Jupyter Notebook (an open-source, web-based, interactive coding platform; https://jupyter.org/) and PsychoPy (Peirce et al., 2019). However, some instructors may prefer to teach coding via R (https://www.r-project.org/), especially if they are already familiar with the language or their program includes coursework with R integration. The particular programming language matters less than the instructor’s commitment to integrating coding into the classroom, in any format, which will accelerate students’ data science skill development.

To ensure that all students begin with a foundational knowledge of computer systems, we developed lessons for our lab courses that enable students to practice their knowledge of file hierarchies (e.g., by comparing file visibility within Notebook and their OS file explorer). Text-based programming lessons begin with simple text formatting via Markdown, during which students learn to switch between code and text cells within their Notebook. After students have gained familiarity with the Notebook interface and practiced basic text formatting with Markdown, they are ready to learn how to use Python for data science skills including data management and visualization. We teach students how to use pandas (https://pandas.pydata.org/) to import, access, and transform data files. Along the way, students learn about the reproducibility advantage of using Python for data management. For example, by writing their data transformation code in a Jupyter Notebook, they permit others to replicate their work. We additionally teach students how to conduct basic data analyses with pandas, including descriptive statistics.
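As an illustration of the import, transform, and describe workflow outlined above, the following is a minimal sketch rather than a lesson from the course materials. The dataset, the column names (participant, condition, rt_ms, correct), and the helper functions are all hypothetical; only the pandas calls themselves (read_csv, boolean indexing, groupby, agg) are standard library features.

```python
import io

import pandas as pd

# Hypothetical reaction-time data, invented for this sketch.
# In a course, this would be a real CSV file on disk.
CSV_TEXT = """participant,condition,rt_ms,correct
1,congruent,512,1
1,incongruent,634,1
2,congruent,498,1
2,incongruent,701,0
3,congruent,530,1
3,incongruent,655,1
"""


def load_data(source):
    """Import a CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def clean_data(df):
    """Keep correct trials only and add reaction time in seconds."""
    cleaned = df[df["correct"] == 1].copy()
    cleaned["rt_s"] = cleaned["rt_ms"] / 1000
    return cleaned


def summarize(df):
    """Descriptive statistics for reaction time, grouped by condition."""
    return df.groupby("condition")["rt_ms"].agg(["count", "mean", "std"])


raw = load_data(io.StringIO(CSV_TEXT))  # import
cleaned = clean_data(raw)               # transform
summary = summarize(cleaned)            # describe
print(summary)
```

Because every transformation step is written as code, another reader can rerun the same cells on the same file and arrive at the same summary table, which is the reproducibility advantage described above.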

Instructors and students may find the most value in learning how to use data science visualization tools, which can offer superior graphical capabilities in comparison to popular statistical software. We prefer to teach data visualization with seaborn (Waskom, 2021) because it produces elegant, customizable graphs. Our visualization lessons vary depending on the level of knowledge that students bring to the session, which increases with practice across the semester. For beginning lessons, we present verbose instructional Notebooks including, for example, a codebook to describe a research design and cells containing pre-built code that students can execute to produce informative graphs. For intermediate lessons, we provide simple challenges that empower students to explore seaborn functionality (e.g., format graph options; compare plot styles; create combination graphs such as a bar plot with overlaid data points). For advanced lessons, we provide a dataset and a goal (e.g., create a figure that addresses the hypothesis), which often leads to critical thinking opportunities (e.g., what is gained or lost from choosing one kind of visualization over the other). An iterative teaching approach increases students’ understanding of data science workflows and enables them to see the commensurate benefits of transparent science communication. For more information about our ongoing project, please visit our OSF page (https://osf.io/xze7n/); we will freely share our course materials at the end of our project in 2026.
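The combination graph mentioned above (a bar plot with overlaid data points) can be sketched as follows. This is one possible approach under stated assumptions, not the code from our course Notebooks: the data values, column names, and the bar_with_points helper are hypothetical, while barplot and stripplot are existing seaborn functions.

```python
import io

import matplotlib

# Non-interactive backend so the script runs without a display.
# Remove this line when working inside a Jupyter Notebook.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical reaction times (ms), invented for this sketch.
data = pd.DataFrame(
    {
        "condition": ["congruent"] * 4 + ["incongruent"] * 4,
        "rt_ms": [512, 498, 530, 505, 634, 701, 655, 670],
    }
)


def bar_with_points(df, x, y):
    """Draw condition means as bars and overlay the individual observations."""
    fig, ax = plt.subplots(figsize=(4, 3))
    # Bars show the mean of y for each level of x.
    sns.barplot(data=df, x=x, y=y, color="lightgray", ax=ax)
    # Points show every raw observation on the same axes.
    sns.stripplot(data=df, x=x, y=y, color="black", ax=ax)
    ax.set_xlabel("Condition")
    ax.set_ylabel("Reaction time (ms)")
    return fig, ax


fig, ax = bar_with_points(data, "condition", "rt_ms")

# Save to an in-memory buffer; a file path such as "figure.png" works the same way.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
```

Overlaying the raw points on the bars invites the kind of discussion described above: the bars summarize the condition means, while the points reveal the spread that a bar alone would hide.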

Successes, Challenges, and Suggestions

Our students’ success in achieving our data science learning outcomes suggests that psychology students, even those without any previous computer programming experience, can successfully learn data science techniques within a single-semester course. We are convinced that psychology students have the background knowledge to learn data science skills, and we are confident that learning those skills will be beneficial to their long-term professional development. Moreover, data science instruction can enhance students’ research design, quantitative literacy, and critical thinking knowledge in ways that substantially bolster their psychology research capabilities. Depending on program goals, data science skills could be taught across the curriculum with a gradually increasing depth of coverage, or they could be presented within a small number of courses to complement student and faculty preferences for such material. No matter which approach instructors take, it is worth noting that data science skills, like many others, must be practiced to promote retention. Thus, we suggest that data science instruction should appear soon after statistics and methods coverage, and occur in more than one course experience. It is also important to remind students to note their growing data science skills on their resumes.

Successful psychology data science instruction requires planning, piloting, and refinement. If possible, instructors should introduce data science skills within courses that they will have the opportunity to teach again in the future so that they can improve their approach. For example, after teaching advanced data visualization in our courses, we realized that we had to remove some of our preexisting course content to make room for our new data science course experiences. Instructors will have to choose how to divide their limited class time between content knowledge (i.e., textbook material; lecture instruction) and skill development (i.e., practical applications; active learning), and their course revisions should be informed by program and course learning outcomes. Instructors should not expect to design the perfect course on their first try. Instead, by accepting that obstacles will arise, they can commit to iterative improvement within and between semesters. Finally, instructors should be prepared to address some students’ anxiety about computer programming. We suggest that they begin with introductory lessons and foster an encouraging learning atmosphere in the classroom. For example, by repeating mantras like “anyone can be a programmer,” instructors emphasize that data science skills, like all others, are achievable via practice. In conclusion, psychology students are prepared to learn data science skills. It is up to instructors to offer them the opportunity.


References

Börner, K., Bueckle, A., & Ginda, M. (2019). Data visualization literacy: Definitions, conceptual frameworks, exercises, and assessments. Proceedings of the National Academy of Sciences, 116(6), 1857–1864. https://doi.org/10.1073/pnas.1807180116

Business-Higher Education Forum. (2017). Investing in America’s data science talent: The case for action [White paper]. PwC. https://www.naceweb.org/uploadedfiles/files/2018/publication/free-report/bhef-investing-in-data-science.pdf

Granger, B. E., & Pérez, F. (2021). Jupyter: Thinking and storytelling with code and data. Computing in Science & Engineering, 23(2), 7–14. https://doi.org/10.1109/MCSE.2021.3059263

Halpern, D. F. (2013). Thought and knowledge. Psychology Press. https://doi.org/10.4324/9781315885278

Marshall, B., & Geier, S. (2020). Cross-disciplinary faculty development in data science principles for classroom integration. SIGCSE '20: Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 1207–1213. https://doi.org/10.1145/3328778.3366801

National Association of Colleges and Employers. (2024). Career readiness: Competencies for a career-ready workforce [Fact sheet]. NACE. https://www.naceweb.org/docs/default-source/default-document-library/2024/resources/nace-career-readiness-competencies-revised-apr-2024.pdf

Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y

Waskom, M. L. (2021). Seaborn: Statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021

Yavuz, F. G., & Ward, M. D. (2020). Fostering undergraduate data science. The American Statistician, 74(1), 8–16. https://doi.org/10.1080/00031305.2017.1407360

