
Successful knowledge acquisition in the first semester of physics


The acquisition of content knowledge at a level that can be applied to the typically complex and abstract problems of university physics is a central challenge of the introductory phase of a physics degree. Many problems in this phase, such as the high dropout rates, can be traced back to subject-related difficulties. Since later parts of the degree build conceptually on the knowledge learned here, this phase can be regarded as critical.

The present study assesses longitudinally the physics content knowledge (mechanics) of 122 physics students at the beginning and end of the first semester. Based on the test results, the students can be assigned to a level model that describes the complexity of the tasks they can successfully handle.

On the one hand, these levels represent an evaluation criterion for the quality of the existing content knowledge and can thus provide a requirement-based target criterion for successful knowledge acquisition. On the other hand, the levels allow an initial assessment of which level of prior knowledge is predictive of successful knowledge acquisition.

The data suggest a significant influence of prior physics and mathematics knowledge on the average further development of knowledge; in particular, there are indications of an additive predictive contribution of prior school knowledge to the acquisition of university-level knowledge by the end of the first semester. The results suggest that, above all, the ability to deal with complex problems is predictive of university physics learning, even and especially if it was acquired in the context of school physics.


Acquiring usable content knowledge that can be applied to complex and abstract problems is one of the major challenges when studying physics at university. Many current issues in the introductory study phase (like high dropout rates) can be traced back to problems acquiring that knowledge. Due to the cumulative nature of physics knowledge, the knowledge acquired in this phase is also critical for success in later semesters.

The study presented here uses a longitudinal analysis of 122 students' test results in physics content knowledge (mechanics) at the beginning and end of the first semester at university. Based on the test results, participants are assigned competency levels according to the complexity of tasks they can handle.

On the one hand, these levels serve as a criterion for the quality of the acquired knowledge and thus give a task-characteristics-based target criterion for successful learning of content knowledge. On the other hand, the levels can give a first impression of what amount and quality of prior knowledge is predictive of successful acquisition of knowledge.

The data show a major impact of prior physics and mathematics knowledge. In particular, first indications of an additive impact of school knowledge can be observed. The major predictor of learning physics at university is the ability to solve complex problems, even and especially if it was acquired within school physics.


In physics at German universities, as in other MINT (STEM) subjects, there is a diverse range of problems, among which subject-related difficulties occupy a central position (Heublein et al. 2014). The first year of study is particularly relevant here: dropouts and changes of degree program are most frequent in this phase (Heublein et al. 2017), and at the same time, due to the highly cumulative nature of the physics degree, essential foundations for further study are laid (Schecker and Parchmann 2006). Empirically, it has been shown that about a third of the students who remain in their studies do not, even after several semesters, have content knowledge of an appropriate level in mechanics, the typical subject of the first semester (Woitkowski and Riese 2017). Besides student retention and satisfaction, successful acquisition of content knowledge (in some cases measured via exam grades) is often the central feature of academic success (Albrecht 2011; Blüthmann et al. 2008; Freyer 2013; Fries 2002; Rindermann and Oubaid 1999; Sorge et al. 2016).

Buschhüter et al. (2016) report a general problem of fit between (especially mathematical) prior knowledge and study requirements. For chemistry, this could be identified as a central factor in the excessive demands perceived by many students (Schwedler 2017).

Lecturers have, for example, expressed the view that the physics course draws on prior mathematical knowledge, but usually not on prior physics knowledge (see also Agarwala 2015; Shumba and Glass 1994). In contrast, Buschhüter et al. (2017) found a strong dependence of exam grades at the end of the first semester on prior knowledge.

For a criterion-based analysis of content knowledge acquisition, a measure of success that is objectively oriented toward requirement characteristics would be helpful (Kauertz 2008), instead of an orientation toward the sample or purely numerical cutoff scores (as in IQB 2013). Similarly, for prior knowledge the question arises whether "a lot helps a lot", or whether more specific statements can be made about the amount or quality of the necessary prior knowledge.

In the following, a level model is created on the basis of a complexity model and used as a criterion for describing the acquisition of content knowledge in the first semester. For this purpose, longitudinal data are analyzed with which the prior content knowledge of students with successful and less successful knowledge acquisition (in terms of the complexity criterion) can be characterized and contrasted. The same level model is used for a criterion-based analysis of the prediction of knowledge acquisition from prior knowledge. An analysis of the whole phenomenon of study success and dropout in its many facets, however, is not the aim here.

The contribution follows the logic of a larger research program in which a test instrument (Woitkowski 2015) was developed on the basis of a competence structure model (Woitkowski et al. 2011). With the data collected in this way, a competence level model could be created (Woitkowski and Riese 2017), which provides a tool for monitoring competence development processes. The results can then flow into the formulation of a competence development model.


Content knowledge as a facet of competence

In the following, competence is understood as the "cognitive abilities and skills available in or learnable by individuals for solving certain problems, as well as the associated motivational, volitional and social readiness and capacities to use these solutions successfully and responsibly in variable situations" (Weinert 2001, p. 27). This overarching concept of competence is given concrete form by specifying competence models. For example, the model of teacher competence by Riese (2009, p. 26) includes physics content knowledge alongside pedagogical content knowledge, pedagogical knowledge, motivational orientations and beliefs. The model of the competence of physicists by Woitkowski (2017) likewise contains the content knowledge to be learned in the physics degree, in addition to mathematical and other scientific abilities and skills as well as physics-related (motivational) attitudes and beliefs.

As in many studies in this context (Kirschner 2013; Krauss et al. 2008; Riese 2009; Vogelsang et al. 2016; Walzer et al. 2013; Woitkowski and Borowski 2017), a distinction is made between two facets of physics content knowledge:

  • School knowledge denotes, following Krauss et al. (2008), the knowledge that an average student should have acquired by the end of lower secondary level. For operationalization, items are used that, with regard to their conceptual horizon, could also be used in school.

  • In contrast, university knowledge in the sense of Riese (2009) is completely detached from school. The concepts used and/or the degree of mathematization (e.g. the use of differential and integral calculus) go beyond what is typically attainable for school students. Owing to the insufficient conceptual or mathematical-methodological horizon, corresponding test tasks usually cannot be solved even by very good school students, at least at the intermediate secondary level.

A deepened knowledge facet (cf. Woitkowski and Borowski 2017), or upper-secondary school knowledge, is not considered here, in favor of a clearer delimitation between school and university knowledge. To a certain extent, these two facets represent the knowledge that should be brought into the degree from school and the knowledge that has to be acquired during the degree itself.

Physics-related beliefs play an essential role in acquiring this knowledge, since knowledge construction always builds on what learners know, or believe they know, about the subject (Putnam and Borko 1997). Beliefs act as a kind of filter in this process, since only what does not conflict with the learner's beliefs is effectively learned (Blömeke 2004).

Motivational factors, on the other hand, play a somewhat different role in the learning process: they determine the extent to which learners actively use the learning opportunities offered (Eccles and Wigfield 2002). Motivation influences, for example, whether students attend courses at all, work on exercises and otherwise regularly engage in active learning activities (see also Schulmeister 2015).

Complexity as a requirement characteristic

In addition to the subdivision into knowledge facets, requirements in this context are typically categorized according to further characteristics. In the construction of content knowledge and competence tests in science education, a task characteristic referred to as complexity has become established. The central idea is that within each knowledge facet there are task types that require only simple knowledge elements for their solution (e.g. to be named or reproduced), while other tasks require these elements to be further linked in order to reach a solution in a reasonable time. More formally, learners make the step from a lower to a higher complexity when they succeed in combining and transforming elements of lower complexity such that they can meet a requirement that could not be met by merely stringing together elements of lower complexity (Commons et al. 1998).

This concept of complexity reflects the conception of knowledge as a propositional network (cf. e.g. Schnotz 1994), whose quality increases with its degree of interconnection (Peuckert and Fischler 2000). The step from one complexity to the next higher one corresponds to the process of chunking, in which existing entities of the network are combined into larger and more complex units of meaning (Laird et al. 1986). If a complexly linked knowledge network already exists in one knowledge area, the same degree of linkage can be established comparatively quickly in a closely adjacent area, provided the rules by which links can be meaningfully established transfer between the areas (Dawson-Tunik 2006). By comparison, establishing links without this transferability proceeds much more slowly (Armon and Dawson 1997).

When determining the complexity of requirements (i.e. test items), the usual procedure is to analyze the concepts that appear in the task and those required for its solution with respect to their degree of linkage (Bernholt 2010; Kauertz 2008). Complexity is therefore an "objective" task feature that does not depend on the person solving the task or on the specific solution approach, and it can be rated with low inference (Kauertz 2008, p. 22).

If items of different complexity are used in a test instrument, various studies show a strong effect on item difficulty; complexity can thus be classified as a difficulty-generating task characteristic (Bernholt 2010; Kauertz 2008; Ohle et al. 2011; Woitkowski 2015). This forms the basis for level models that can be used to analyze the acquisition of content knowledge (cf. Klieme et al. 2003, p. 85).

In the following, the complexity model of Bernholt (2010) is adapted, with the complexity levels facts, process descriptions, linear causality and multivariate interdependence. "Upper levels build on lower levels, with the lower levels being organized by the upper levels. Each element is created through a connection and coordination of elements of the level below." (Commons et al. 1998, translated by Bernholt 2010, p. 22) University and school knowledge as described above differ in their degree of mathematization and abstraction, but requirements of all the complexities mentioned can be described in both facets.

Level models

Klieme et al. (2003) recommend the criterion-oriented interpretation of test scores on the basis of competence levels. These are "sections on continuous competence scales that are formed with the aim of a criterion-oriented description of the competencies recorded" (Hartig 2007, p. 86). Several methods of level construction are discussed in the literature (Woitkowski and Riese 2017). In our case, the assignment is made criterially on the basis of the task characteristic complexity: test subjects at a given level can successfully cope with requirements of one complexity, but not with requirements of the next higher complexity. It makes sense to generate separate level models for school and university knowledge, so that the levels occupied in the two knowledge facets can be related to one another.
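As a minimal sketch of this criterial assignment, a person's ability score can be mapped to the highest complexity level whose cut score it reaches. The level names and cut scores below are invented for illustration, not the paper's values:

```python
# Hypothetical criterial level assignment: a subject is placed on the highest
# level whose cut score (on the logit scale) their WLE ability score reaches.
# Level labels and cut scores are illustrative assumptions.

def assign_level(wle_score: float, cut_scores: dict) -> str:
    """Return the highest level whose cut score the WLE score reaches."""
    level = "below level I"
    for name, cut in sorted(cut_scores.items(), key=lambda kv: kv[1]):
        if wle_score >= cut:
            level = name
    return level

# Illustrative cut scores in logits (not the study's values)
cuts = {"I (facts)": -1.0, "II (processes)": 0.0,
        "III (linear causality)": 1.0, "IV (multivariate)": 2.0}

print(assign_level(0.4, cuts))  # a score of 0.4 reaches levels I and II
```

A subject scoring 0.4 logits would thus be assigned level II: they are expected to master process-description tasks but not linear-causality tasks.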

The interpretation of the test data with the help of this complexity-based level model then enables two interpretative approaches to the subjects' knowledge that would not be possible on the basis of numerical test scores alone:

First, the literature suggests that learning to meet requirements of a higher complexity takes longer than catching up, in a neighboring knowledge facet, with a complexity already mastered (Armon and Dawson 1997; Dawson-Tunik 2006).

Second, the levels provide a criterion-based classification of the knowledge level. For example, it can be assumed that the typical introductory lecture at university places complex and highly mathematical demands on students. In the level model, this corresponds to a high level in university knowledge, i.e. dealing with complex problems in the knowledge facet defined by abstraction and mathematization (Woitkowski 2015, p. 262). This level can thus be adopted as a normative learning goal for the first semester.

Knowledge acquisition in physics studies

Previous findings on knowledge acquisition at university come primarily from cross-sectional surveys interpreted longitudinally; these have the interpretative problem that the same people are not tested at several points in time, so individual developments cannot be effectively distinguished from cohort or other group effects. These studies show a difference between more and less able students that increases with the duration of the degree (Riese 2009).

The construction of complexity-based competence levels has already been tested with students for the knowledge facets examined here (Woitkowski 2015, 2017). On this basis, a level could already be specified as a desirable goal of the physics degree. However, around a third of the test subjects do not reach this level even after several semesters of study (Woitkowski and Riese 2017). So far, the question of the determinants of this development has only been addressed in cross-sectional analyses. Likewise, the speed of development or level advancement can hardly be determined from the existing data.

The analysis of longitudinal knowledge acquisition by means of complexity levels has not yet been carried out in the university context. Owing to the level construction along item difficulties, an increase is to be expected here too, but its extent is unclear. On the basis of the transferability between neighboring knowledge facets found by Dawson-Tunik (2006), it can be assumed that subjects who reach a higher level in school knowledge also tend to move up to higher levels in university knowledge. This hypothesis could not yet be tested with the previous cross-sectional data.

In the past, level constructs were often used to set educational goals or success criteria (e.g. IQB 2013; Woitkowski and Riese 2017). Longitudinal data on the connection between goal achievement and learning prerequisites at the beginning of the degree are not available. Besides the prior knowledge discussed above, a number of other characteristics come into question as predictors. As a first heuristic, characteristics were examined here that are also recorded in the context of study dropout, since dropout is often associated with subject-related difficulties (e.g. Heublein et al. 2014): motivation, buoyancy, social and institutional prerequisites, and mathematical knowledge (Albrecht 2011; Bosse and Trautwein 2014; Burger and Groß 2016; Buschhüter et al. 2016; Neumann et al. 2016; Sorge et al. 2016).

Research questions

The literature consistently shows an increase in content knowledge over the duration of the degree (Riese 2009; Woitkowski 2015). Longitudinal data with two test times at the beginning (TZP 1) and end (TZP 2) of the first semester should be able to reproduce this finding. Against the theoretical background, the increase in university knowledge should turn out higher than in school knowledge, because the former is to a greater extent the subject of university teaching.


F1: Are there increases in school and university knowledge? In which facet is the average increase higher?

The level model further allows a comparison of the levels achieved at the beginning of the degree (TZP 1) in the two knowledge facets. For first-year students (TZP 1), high levels are expected frequently in school knowledge but not in university knowledge. However, since the two facets belong to a common domain, beginners with high school-knowledge levels can most likely also be expected to reach higher university-knowledge levels.


F2: Which levels are reached at TZP 1 in the two knowledge facets? To what extent does the achievement of high levels correspond between the facets?

The increase investigated in F1 can be analyzed criterially with the help of the level model: the main interest is from which school-knowledge level in the prior knowledge an effect becomes visible.


F3: Which levels in university knowledge are reached at TZP 2? To what extent do the levels reached at TZP 1 predict this?

Achievement of the highest university-knowledge level can be set as a criterial measure of successful acquisition of content knowledge in university education.


F4: How many students reach the top level of university knowledge, and how do they differ from the subjects who do not reach this level?


Sample and test times

The sample is recruited from the participants of 7 introductory experimental physics lectures at 6 German universities in the 2016/17 and 2017/18 winter semesters, which are attended by students in physics degree programs and Gymnasium teacher-training programs. Subjects who were no longer in their first semester were excluded.

The subject of all courses was classical Newtonian mechanics. The research project was briefly presented to the students in the first lecture of the first semester, before the first test took place in the first week of the semester. The second test also took place in the course, in the last week of the semester. A third test took place at the end of the second semester; its data, however, are not reported here for reasons of space. An allowance of €50 was paid for participation in all tests of the longitudinal study. The analyses shown here include only those subjects for whom data sets are available from the first two test times. Thus, in the following, N = 122 subjects are analyzed, of whom 99 are physics majors and 23 teacher-training students (Tab. 1). The proportion of women is 27.8% (slightly higher among teacher-training students); the mean Abitur grade is 1.90 (SD = 0.67), the mean final school grade in mathematics 1.64 (SD = 0.84) and in physics 1.49 (SD = 0.74).

If one compares the group analyzed in the following, for whom complete data sets are available, with the dropout group, i.e. the subjects for whom data are available only from the first test time, the overall dropout is 52.7%. The dropout group differs significantly in all three grades from the sample analyzed here (Abitur grade: W = 10,564, p < 0.001; mathematics grade: W = 9735.5, p = 0.003; physics grade: W = 9250.5, p = 0.003). The high dropout rate in the longitudinal study is in line with the known dropout and change rates in this area (Heublein et al. 2014) and represents one of the usual problems in acquiring longitudinal samples. However, it cannot be determined whether (and to what extent) subjects from TZP 1 were still studying at TZP 2 but did not take part in the test for other reasons; the students were not addressed personally for the second test, but as a group in the respective course. In any case, the sample analyzed here is a positive selection with regard to characteristics such as test motivation, regular course attendance and possibly also self-concept.

Test instrument

The study reported here uses the scales for school and university knowledge from Woitkowski (2015). The items operationalize the mechanics content that is usually the subject of the first semester (cf. KFP 2010). Example items are shown in Fig. 1. The items were assigned to knowledge facets and complexities in a survey of two experts, a physics education researcher and a physicist, with the aid of decision trees, in which the individual classification criteria were queried in a structured order. As a basis, this requires a solution prepared by an expert and reflected upon with regard to the necessary knowledge and structures. The knowledge-facet criteria relate to the concepts used in the item and their degree of mathematization and abstraction, whereas the complexity criteria relate to the structure of the procedure required for the solution (Woitkowski 2015, Chapter 12). A consensus classification was drawn up from the feedback of the two experts. The test contains items of all complexities in both knowledge facets; the criteria for complexity assignment are identical for both facets. The characteristics complexity and knowledge facet are therefore as orthogonal as possible and can be compared between the facets. Table 2 shows exemplary assignment criteria.

Mathematical knowledge is assessed with 15 items from the study-entry test by Krause and Reiners-Logothetidou (1981), covering vector calculation, equations of straight lines and ellipses, quadratic equations, function graphs and derivatives. This goes slightly beyond the mathematical knowledge required to solve the content-knowledge items.

Further potential predictors such as motivation, attitudes and beliefs are covered by scales taken from the literature: beliefs and self-concept scales from Riese (2009) and Lamprecht (2011); scales on study satisfaction, context conditions, learning difficulties and study climate from Albrecht (2011) and Burger and Groß (2016); a scale for subject-specific academic buoyancy (Neumann et al. 2016); and two scales for effort and importance, adapted from Sundre (2007), referring to the exercises and exams relevant to the physics degree.

The test instrument is designed for a test duration of 60 minutes. The knowledge scales follow (as in the original publication) a partially balanced incomplete block design (pBIBD; cf. Kubinger et al. 2011), with each subject being presented 3 of 10 item blocks. The 3 blocks at TZP 1 and TZP 2 are completely disjoint so that memory effects can be excluded. Each test booklet contains between 10 and 16 items on school knowledge (M = 12.6; SD = 2.0) and between 3 and 8 items on university knowledge (M = 5.1; SD = 1.5). Apart from the content-knowledge items, the test booklets are identical at each test time.
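The disjointness constraint between the two test times can be sketched as follows; the random block-drawing procedure is an illustrative assumption, not the study's actual booklet plan:

```python
# Illustrative sketch of the rotation constraint: each subject works on 3 of
# 10 item blocks per test time, and the two triples share no block, so no
# item is seen twice (excluding memory effects by design).
import random

BLOCKS = list(range(10))

def draw_disjoint_booklets(rng: random.Random):
    """Draw two disjoint triples of blocks for TZP 1 and TZP 2."""
    t1 = tuple(rng.sample(BLOCKS, 3))
    remaining = [b for b in BLOCKS if b not in t1]
    t2 = tuple(rng.sample(remaining, 3))
    return t1, t2

rng = random.Random(42)
t1, t2 = draw_disjoint_booklets(rng)
assert not set(t1) & set(t2)  # no overlap between test times
```

In an actual pBIBD, the block triples would additionally be balanced so that each pair of blocks co-occurs about equally often across booklets; this sketch only shows the disjointness condition.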

The content-knowledge items as well as the mathematics items are partly open, partly closed (single choice). The other items are (as in previous studies) formulated as 4-point Likert scales; the Academic Buoyancy scale, however, has 7 points.

The tests were coded by trained assistants using a detailed coding manual with expected solutions for all knowledge items. The quality of the coding was continuously checked by double-coding approx. 10% of the test booklets and corrected where necessary. Cohen's κ = 0.874 is in the very good range (Bortz and Döring 2006, p. 277).
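Cohen's κ for such double-coded booklets can be computed as in this minimal sketch for two raters' dichotomous codes; the example ratings are invented:

```python
# Minimal Cohen's kappa for two raters' dichotomous (0/1) codes, as used to
# check inter-rater agreement on double-coded booklets. Example data invented.
def cohens_kappa(a, b):
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p1a, p1b = sum(a) / n, sum(b) / n               # marginal rates of "1"
    p_exp = p1a * p1b + (1 - p1a) * (1 - p1b)       # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

rater1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(rater1, rater2), 3))  # → 0.783
```

Kappa corrects the raw agreement rate (here 9/10) for the agreement expected by chance, which is why it is preferred over simple percent agreement for coding checks.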


Only the N = 122 subjects for whom a complete data set was available at both test times were used. For scale formation and Rasch analysis, the technique of virtual test subjects was applied, i.e. each subject was represented as a separate case in the data set at each test time (Hartig and Kühnbach 2006; König et al. 2018; Plöger et al. 2016; Seifert and Schaper 2012). In total, the scales were thus formed from 244 data sets.

The content-knowledge scales are analyzed with the dichotomous Rasch model using the R package TAM (Robitzsch et al. 2017). Items with an infit of MNSQ > 1.25 or T > 1.96 were excluded from further use (see Adams and Wu 2007), as were items with an item DIF of more than 0.638 logits between the two test times, which would correspond to a huge DIF (see Wilson 2005, p. 167). The Rasch analysis was carried out, on the one hand, with the items separated by knowledge facet into two scales and, on the other hand, with a common scale of all items, in order to test the separability of the knowledge facets. The person parameters (WLE estimators) are used as test scores for the further analysis of subjects' abilities. Overall, this procedure has the advantage that the item parameters do not vary between the test times (which is necessary for the comparability of the level construction based on them) and that the use of WLEs as scores provides more reliable results (Hartig and Kühnbach 2006). Using plausible values (PV) instead of WLE estimators would also make statements about individual persons more difficult. The disadvantage that changes in scale composition or in the underlying competence structure cannot be mapped this way is accepted here in favor of easier interpretability with regard to the research questions.
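The dichotomous Rasch model underlying this analysis can be illustrated with a minimal sketch (not the TAM implementation): the solution probability depends only on the difference between person ability θ and item difficulty b, and a crude maximum-likelihood ability estimate can be obtained by Newton-Raphson. The response pattern and difficulties are invented:

```python
# Minimal dichotomous Rasch model sketch (illustrative only; the study uses
# TAM's marginal ML estimation and WLE person parameters, not this simple ML).
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch solution probability: logistic in (theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_ability(responses, difficulties):
    """Crude ML ability estimate via Newton-Raphson on the log-likelihood."""
    theta = 0.0
    for _ in range(50):
        ps = [p_correct(theta, bi) for bi in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps))
        hess = -sum(p * (1 - p) for p in ps)
        theta -= grad / hess
    return theta

b = [-1.0, 0.0, 1.0, 2.0]           # item difficulties in logits (invented)
theta_hat = ml_ability([1, 1, 1, 0], b)
print(round(theta_hat, 2))
```

At the ML estimate, the expected number of correct responses equals the observed raw score; simple ML estimates like this one are biased, which is one motivation for the weighted likelihood (WLE) estimators actually used.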

The level model is constructed from the item parameters according to the method tested by Woitkowski (2015). Beforehand, a linear regression is used to check whether the item difficulties can be predicted well by complexity (in other studies, other item characteristics may also have an influence; e.g. Kauertz and Fischer 2006). All other scales are evaluated with the means of classical test theory. Cronbach's α is determined as a measure of reliability, and scales with α < 0.6 are not used further. After the Rasch analysis and scale formation, the cases belonging together between TZP 1 and 2 are identified and merged in the data set.
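The preliminary regression check can be sketched as follows, with invented item data; in the real analysis the difficulties come from the Rasch calibration:

```python
# Illustrative check of whether complexity predicts item difficulty:
# ordinary least-squares fit of difficulty (logits) on complexity level (1-4).
# All values are invented for demonstration.
def ols(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

complexity = [1, 1, 2, 2, 3, 3, 4, 4]                    # complexity level
difficulty = [-1.2, -0.9, -0.3, 0.1, 0.6, 0.8, 1.5, 1.7] # Rasch difficulty
m, c = ols(complexity, difficulty)
print(round(m, 3), round(c, 3))  # → 0.875 -1.9
```

A clearly positive slope (here 0.875 logits per complexity level in the invented data) is what justifies cutting the difficulty scale into complexity-based levels.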

For research question F1, differences between the test times in the respective scores are calculated. For F2, level assignments are counted. For F3, a descriptive report first shows how many students move from which level to which between TZP 1 and 2. To check the predictive power of the levels attained at TZP 1, the university-knowledge scores at TZP 2 are compared between the groups, and the result is checked with an ANOVA. The interaction between school and university knowledge is clarified by comparing different ANOVAs and linear regression models. For question F4, group differences are reported between the subjects who achieve the target level at TZP 2 and those who do not.

The two-sided Wilcoxon-Mann-Whitney test is used for group differences. Compared to the common t-test, it is more robust with regard to sample size and normal distribution of the data; the significance level p can nevertheless be stated and interpreted analogously with * < 0.05, ** < 0.01, *** < 0.001 (Hollander and Wolfe 1973). In the case of multiple factor levels, an ANOVA is also calculated. Cohen's d is given as the measure of effect size; d > 0.2 marks small, d > 0.5 medium and d > 0.8 large effects (Tiemann and Körbs 2014, p. 291).
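The effect-size convention can be illustrated with a minimal Cohen's d implementation using the pooled standard deviation; the group data are invented:

```python
# Cohen's d with pooled standard deviation for two independent groups.
# The two score lists are invented example data.
import math

def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    v1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)
    sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sd

group_a = [1.2, 0.8, 1.5, 1.1, 0.9]  # e.g. scores of level reachers
group_b = [0.4, 0.6, 0.2, 0.7, 0.5]  # e.g. scores of non-reachers
d = cohens_d(group_a, group_b)
print(round(d, 2))  # well above 0.8, i.e. a large effect by the convention
```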

Scale parameters

The test items were combined into two scales according to knowledge facets (Tab. 3). In school knowledge, 2 items were excluded due to insufficient infit. The WLE reliability is acceptable for school knowledge but weak for university knowledge. This may be due in part to the short university scale, which is further shortened by the rotating booklet design (Adams 2005). In part, however, it also appears to be a pre-test effect, as the analysis carried out separately by test time shows, in which the reliability at TZP 2 is slightly higher. In the case of a pre-test, however, the WLE reliability usually cannot be interpreted (Rost 2004, p. 382). The EAP reliability, which is easier to interpret in this case, is acceptable in all cases.

Regardless of the specific causes, the low WLE reliability means a relatively high measurement uncertainty in the individual ability estimates. The level assignment based on them is therefore more uncertain, especially near the level boundaries, than it would be with adequate reliability. The measurement uncertainty also lowers the stated significance of correlations between subjects' abilities and other variables (Adams 2005), so the significance of effects may be underestimated in the following. The use of plausible values (PV) would help here but would make the level assignment more difficult (see above), so we continue to work with WLE estimators.

To check the fit of a model with two scales, an overall scale with all items was created for comparison. Fit indices and model comparisons are shown in Table 3. The AIC speaks slightly in favor of separate scales, the BIC rather in favor of a common scale. A χ²-test is just barely not significant. This inconsistent picture thus allows both models in principle. The analysis separated by test time speaks more strongly for a common scale. This can be interpreted as the two scales correlating highly with each other at each test time, while this relationship shifts between the TZPs, i.e. turns out lower in the joint analysis of both test times. This speaks for a differential development of the knowledge facets between the test times and thus for a separate analysis. Since the facets are also delimited theoretically, and since it is of interest to separate the knowledge that should be brought from school from that which is to be acquired at university, the following analyses are calculated separately by facet.

To check the homogeneity between the test times, Table 3 also shows the variance and reliability for each test time; these are of the same order of magnitude as for a common scaling of the test times. The item parameters correlate highly significantly between the test times (school knowledge: r(35) = 0.833, p < 0.001; university knowledge: r(15) = 0.820, p < 0.001). The common scaling with the method of virtual cases can thus be carried out (see Seifert and Schaper 2012).

The resulting scales show a moderately high latent correlation with one another (r_lat = 0.884; note that latent correlations are numerically considerably higher than manifest ones; Wu et al. 1998).
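Why latent correlations exceed manifest ones can be illustrated with the classical attenuation formula (Spearman), which divides the observed correlation by the square root of the product of the two reliabilities; the example values below are invented, not the study's:

```python
# Classical disattenuation (Spearman): measurement error attenuates observed
# correlations, so the latent correlation is recovered by dividing by the
# geometric mean of the reliabilities. Example values are invented.
import math

def disattenuate(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Estimate the latent correlation from a manifest one."""
    return r_obs / math.sqrt(rel_x * rel_y)

# e.g. an observed correlation of 0.60 with reliabilities 0.70 and 0.55
print(round(disattenuate(0.60, 0.70, 0.55), 3))  # → 0.967
```

With perfectly reliable measures (both reliabilities 1.0) the manifest and latent correlations coincide; the lower the reliabilities, the larger the gap, which is exactly the situation with the weak WLE reliability of the university-knowledge scale.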

The mathematics scale was evaluated classically. With Cronbach's α = 0.76, it can be regarded as sufficiently reliable; on average, 73% (SD = 19%) of the items were solved correctly. The scales for beliefs, attitudes, motivation and perception of one's own studies are shown in Table 4. The two scales for importance and effort with reference to the exam showed, even after adjustment, Cronbach's α < 0.6 for the whole sample and were excluded from the analysis. The importance and effort scales with reference to the exercise sheets can, however, be interpreted.