An objective look at the Overhead Squat Assessment
Keith Krugh
PhD Student · Rehabilitation and Health Sciences · Idaho State University
The overhead squat assessment has been used for decades by physical therapists, athletic trainers, and corrective-exercise specialists to spot compensation patterns in the human movement system.
What the research actually says about it is more mixed than that ubiquity suggests.
Inter-rater reliability — how consistently different observers score the same movement the same way — is the recurring concern. One validity-and-reliability study of a protocol that included the overhead squat concluded the instrument was valid by expert opinion but unreliable in practice, recommending more observer training [1]. Multi-rater studies of related lower-extremity screens land in the moderate range, not high [2].
Supportive findings exist alongside the concerns. A study by Post and colleagues found substantial reliability for using the overhead squat to screen specifically for medial knee displacement, though the authors flagged that their subjects were presumed healthy and that the squats were rated from video rather than scored live [3]. Even the favorable studies name their limitations.
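For readers who want a feel for what labels like "moderate" and "substantial" usually mean, here is a minimal Python sketch of Cohen's kappa, a common chance-corrected agreement statistic for two raters. It is an illustration under stated assumptions, not the method of any study cited here: the ratings are made-up data for twelve hypothetical squats (1 = medial knee displacement seen, 0 = not seen), the function names are our own, and the verbal bands follow the widely used Landis and Koch convention, which the cited papers may or may not apply.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: what each rater's label frequencies alone would produce.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


def landis_koch_band(kappa):
    """Verbal label for a kappa value (Landis and Koch convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings, invented for illustration only.
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({landis_koch_band(kappa)})")
```

In this invented example the two raters agree on 10 of 12 squats, which looks like 83% agreement. Half of that agreement would be expected by chance given how often each rater used each label, so kappa comes out near 0.67. That gap is why reliability studies report chance-corrected statistics rather than raw agreement.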
So the picture isn't "good tool" or "bad tool." It's a tool that does some things reliably and other things unreliably, and the literature is honest about which is which. That nuance is the part that's interesting.
One distinction is worth holding onto: most of the sharper criticism in the literature targets using these assessments to predict future injury — a use the research has not strongly supported. Identifying compensation patterns to inform corrective programming is a narrower use, and the research treats it differently.
There's one more wrinkle worth noting. Most of the published reliability work measures something specific: clinicians watching a person move and rating what they see, with variability between human observers as the documented weakness. A setup where users place their own positional markers on themselves is a meaningfully different scenario from the one those studies evaluated. It is not necessarily better, but it is different in a way the existing literature doesn't directly speak to. In our opinion, that is the kind of methodological gap worth sitting with.
The reason any of this is more than an academic exercise is access. Some people can't reach care quickly; others wait long enough that small things become bigger ones. Self-directed approaches are imperfect, but the question of whether imperfect tools at home are meaningful compared to no tools at all is a real one.
This is the kind of question we expect to keep returning to and studying. We'll share what we find as we go.
References
[1] Aragón-Vargas, L. F., et al. (2020). Validity and Reliability of the New Basic Functional Assessment Protocol (BFA). International Journal of Environmental Research and Public Health.
[2] Kollöfrath, A. M., et al. (2021). Visual assessment of movement quality: intra- and interrater reliability of a multi-segmental single leg squat test. BMC Sports Science, Medicine and Rehabilitation, 13, 60.
[3] Post, E. G., et al. (2017). The Reliability and Discriminative Ability of the Overhead Squat Test for Observational Screening of Medial Knee Displacement. Journal of Sport Rehabilitation, 26(1).
Founder, 3D Medical
3D Medical was founded to improve patient understanding and outcomes through clinically accurate, multilingual 3D animations. Drawing on Keith’s background in exercise science and visual communication—along with firsthand insight into where patients often struggle to understand their care—his work bridges critical gaps in healthcare delivery, from hospital discharge to home recovery and lifelong wellness.
3D Medical’s mission is to deliver visual education tools designed to reach every patient—regardless of language, literacy level, or location. The content is scalable, accessible across devices and bandwidth conditions, and built to strengthen understanding, boost confidence, and measurably improve health outcomes—while also reducing strain and cost for healthcare facilities.