IB Math Studies Internal Assessment: What is the Relationship between SAT Scores and Family Income of the Test Takers around the World?
Exam Session: May 2011
School name: International School Bangkok
Teacher: Mr. Demille
Date: December 8th, 2010
Course: IB Math Studies
Word Count: 1,832
Name: Billy Egnehall
What is the Relationship between SAT Scores and Family Income of the Test Takers around the World?

Introduction

In today's academic world, the SAT examination is, for the most part, a requirement for getting accepted into college. It is not enough simply to take the examination: the student has to achieve an average score or above for his/her application even to be considered. Many students around the world recognize this and therefore enroll in prep schools for the SAT, or their parents send them to a higher educational institution for that purpose. Prep schools such as Princeton, which give advice on how best to tackle the SAT examination, are not cheap, and neither are higher educational institutions. Attending either one can be considered a luxury service by some middle-class and lower-class societies in the world. As a result, SAT prep courses and higher educational institutions are aimed at the upper-class societies of the world, or at those who can afford them. If this is true, it would put children from families with a higher income at an advantage in getting accepted into college, compared with children whose families cannot afford the course or school fees and who therefore never receive the advice on how to pass the SAT examination with a high score. Is admission to the colleges that students hope to attend for a better education really based on which families can afford an SAT prep course or a higher educational institution? The data collected from the College Board in 2007 was analyzed to determine whether there is a relationship between SAT scores and family income of the test takers around the world (Rampell).

Statement of Task

The main purpose of this investigation is to determine whether there is a relationship between SAT scores and family income of the test takers around the world.

The data collected is the SAT scores and family income of the two-thirds of test takers worldwide who voluntarily reported their income to the College Board when signing up for the SAT examination. The SAT scores show how high a score the test takers achieved, and family income indicates how likely a family is to be able to send its children to SAT prep schools or better educational institutions. The source breaks down the average score for ten different income groups, most of which span a $10,000 range (see Table 1).
Plan of Investigation

I am investigating the relationship between SAT scores and family income of the test takers around the world, and I have collected data on both. A number of mathematical processes are used to analyze the collected data: a scatter plot of the data, calculation of the least squares regression line, and calculation of the correlation coefficient. I am also going to carry out a χ² test on the data to determine whether SAT scores depend on family income of the test takers around the world.
Mathematical Investigation

Collected Data

Table 1: Mean SAT scores per section categorized in family income of test taker in 2007

Family income of      Percentage of test takers     Critical
test takers           within each income group      reading    Math    Writing    ∑
Less than $10,000     4%                            427        451     423        1301
$10,000–$20,000       8%                            453        472     446        1371
$20,000–$30,000       6%                            454        465     444        1363
$30,000–$40,000       9%                            476        485     466        1427
$40,000–$50,000       8%                            489        496     477        1462
$50,000–$60,000       8%                            497        504     486        1487
$60,000–$70,000       8%                            504        511     493        1508
$70,000–$80,000       9%                            508        516     498        1522
$80,000–$100,000      14%                           520        529     510        1559
More than $100,000    26%                           544        556     537        1637

I am going to treat the bottom row, "More than $100,000", as an outlier and exclude it from all calculations. It runs from $100,000 up to incomes of millions of dollars, which is too wide a range to include in the calculations of this assessment.
Graph 1: Average SAT Score Vs. Family Income
[Scatter plot of the series "Average SAT score". Horizontal axis: Family Income of SAT Takers ($ in Thousands). Vertical axis: Overall Averaged SAT Score (top score 2400). The nine plotted points are labeled with the ∑ values 1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522 and 1559.]

Graph 1 shows the average SAT score vs. the family income of the test taker. At this stage, there seems to be a very strong positive correlation: the SAT scores appear to improve as the family income increases. (The graph was generated through Microsoft Excel.)
Calculation of the Least Squares Regression

The least squares regression line identifies the relationship between the independent variable, x, and the dependent variable, y. It is given by the following formula:

$$y - \bar{y} = \frac{S_{xy}}{S_x^{2}}\,(x - \bar{x})$$

where

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} \qquad \text{and} \qquad S_x = \sqrt{\frac{\sum x^{2}}{n} - \bar{x}^{2}}$$

Table 2: Values of Least Squares Regression

x         y        xy           x²
15000     1301     19515000     225000000
25000     1371     34275000     625000000
35000     1363     47705000     1225000000
45000     1427     64215000     2025000000
55000     1462     80410000     3025000000
65000     1487     96655000     4225000000
75000     1508     113100000    5625000000
85000     1522     129370000    7225000000
95000     1559     148105000    9025000000
∑ 495000  ∑ 13000  ∑ 733350000  ∑ 33225000000

$\bar{x} = 55000$, $\bar{y} = 1444.\overline{44}$, $\bar{x}\,\bar{y} = 79444444.44$, $\overline{x^{2}} = 3691666667$

These are the calculated values used in finding the least squares regression line.

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{733350000}{9} - 79444444.44 = 2038888.893$$
$$S_x = \sqrt{\frac{\sum x^{2}}{n} - \bar{x}^{2}} = \sqrt{\frac{33225000000}{9} - 3025000000} = 25819.88897$$

$$y - \bar{y} = \frac{S_{xy}}{S_x^{2}}\,(x - \bar{x})$$

$$y - 1444.44444 = \frac{2038888.893}{(25819.88897)^{2}}\,(x - 55000)$$

$$y = 0.0030583333\,x + 1276.236111$$

Calculation of Pearson's Correlation Coefficient

Pearson's correlation coefficient indicates the strength of the relationship between the two variables (SAT scores and family income of test taker). It is given by the following formula:

$$r = \frac{S_{xy}}{S_x S_y}$$

where

$$S_x = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n}}, \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^{2}}{n}}$$

and $S_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$ is the covariance.

Table 3: Values of Pearson's Correlation Coefficient

x        y       (x − x̄)²      (y − ȳ)²
15000    1301    1600000000    20576.30864
25000    1371    900000000     5394.08642
35000    1363    400000000     6633.197531
45000    1427    100000000     304.308642
55000    1462    0             308.197531
65000    1487    100000000     1810.975309
75000    1508    400000000     4039.308642
85000    1522    900000000     6014.864198
95000    1559    1600000000    13122.97531

∑ (y − ȳ)² = 58204.22222

$\bar{x} = 55000$, $\bar{y} = 1444.\overline{44}$

These are the calculated values used in finding the correlation coefficient.

$$S_x = 25819.88897$$

$$S_y = \sqrt{\frac{58204.22222}{9}} = 80.4185041$$

$$r = \frac{2038888.893}{(25819.88897)(80.4185041)} = 0.9819360378$$

$$r^{2} = 0.9641983824$$

The value $r^{2} = 0.9641983824$ suggests that the association in the data is very strong, since $0.90 \le r^{2} < 1$. I compared this value of $r^{2}$ with the standard table of coefficients of determination, which places it in the "very strong" category (Whiffen).
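The regression and correlation calculations above can be cross-checked with a short script. This is only a minimal sketch in Python, not part of the original investigation: the function name `regression_and_correlation` is a hypothetical helper, and it assumes the same population (divide by n) formulas for $S_{xy}$, $S_x$ and $S_y$ and the same x values as Table 2.

```python
from math import sqrt

# Data from Table 2: income value x assigned to each bracket, mean total SAT score y
x = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
y = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]


def regression_and_correlation(xs, ys):
    """Return (slope, intercept, r) using the divide-by-n formulas from the text.

    Hypothetical helper written for illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: S_xy = (sum of xy)/n - mean_x * mean_y
    s_xy = sum(a * b for a, b in zip(xs, ys)) / n - mean_x * mean_y
    # Standard deviations: S_x from the shortcut form, S_y from squared deviations
    s_x = sqrt(sum(a * a for a in xs) / n - mean_x ** 2)
    s_y = sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    # Line through (mean_x, mean_y) with gradient S_xy / S_x^2
    slope = s_xy / s_x ** 2
    intercept = mean_y - slope * mean_x
    r = s_xy / (s_x * s_y)
    return slope, intercept, r


slope, intercept, r = regression_and_correlation(x, y)
print(f"y = {slope:.10f} x + {intercept:.6f}")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Run as written, the script should report a gradient close to 0.00306 and a correlation coefficient close to 0.98, in line with the hand calculation.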
Graph 2: Average SAT Score Vs. Family Income, Linear Fit Line
[Scatter plot of the series "Average SAT score" with its linear trend line, "Linear (Average SAT score)". Horizontal axis: Family Income of SAT Takers ($ in Thousands). Vertical axis: Overall Averaged SAT Score (top score 2400). Trend line: y = 0.0030583333 x + 1276.236111, r² = 0.9641983824.]
Graph 2 indicates that there is a strong positive linear correlation. This is also indicated by the value of the correlation coefficient, r ≈ 0.98 (r² ≈ 0.96). (The graph was generated through Microsoft Excel.)

Calculation of a χ² test

The χ² test is used to determine whether two classifications or factors from the same sample are independent of each other, that is, whether the occurrence of one of them does not affect the occurrence of the other.

$$\chi^{2} = \sum \frac{(f_o - f_e)^{2}}{f_e}$$

Here $f_o$ is an observed frequency and $f_e$ is the corresponding expected frequency.

Observed Values:

         B1     B2     Total
A1       A      B      A+B
A2       C      D      C+D
Total    A+C    B+D    N

Calculations of Expected Values:

         B1                B2                Total
A1       (A+B)(A+C)/N      (A+B)(B+D)/N      A+B
A2       (A+C)(C+D)/N      (B+D)(C+D)/N      C+D
Total    A+C               B+D               N

Degrees of freedom measure the number of values in the final calculation that are free to vary:

$$Df = (\text{rows} - 1)(\text{columns} - 1)$$

Null (H0) Hypothesis: SAT scores and family income are independent of each other.

Alternative (H1) Hypothesis: SAT scores and family income are dependent on each other.

Table 4: Observed Values

Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    4                  1                  5
56000 – 96000    0                  4                  4
Total            4                  5                  9

Table 4 shows the observed values for SAT score vs. family income. The data points have been grouped into ranges that represent the income of the families of the test takers.

Table 5: Calculations for the Expected Values

Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    (4+1)(4+0)/9       (4+1)(1+4)/9       4+1
56000 – 96000    (4+0)(0+4)/9       (1+4)(0+4)/9       0+4
Total            4+0                1+4                9

Table 5 shows the individual calculations for each of the expected values.

Table 6: Expected Values

Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    2.22222            2.77777            5
56000 – 96000    1.77777            2.22222            4
Total            4                  5                  9

Table 6 shows the expected values obtained from the calculations in Table 5.

$$\chi^{2} = \sum \frac{(f_o - f_e)^{2}}{f_e}$$

$$\chi^{2} = \frac{(4 - 2.22222)^{2}}{2.22222} + \frac{(1 - 2.77777)^{2}}{2.77777} + \frac{(0 - 1.77777)^{2}}{1.77777} + \frac{(4 - 2.22222)^{2}}{2.22222}$$

$$\chi^{2} = 5.759995408$$

$$Df = (\text{rows} - 1)(\text{columns} - 1) = (2 - 1)(2 - 1) = 1$$

The χ² critical value at 5% significance with 1 degree of freedom is 3.841. As the χ² value is greater than the critical value, 5.760 > 3.841, the null hypothesis is rejected and SAT score is taken to be dependent on family income.

Discussion/Validity

Limitations

Throughout the investigation of the correlation between SAT scores and family income, various limitations may have affected the results. One limitation of the data collected is that it only reflects the people who filled in the family income section when signing up for the SAT. There is no evidence that the data reflects everyone who has taken the SAT, as there may be people who did not fill in that section.
Another limitation is that not everyone in the world decides to take the SAT: people who cannot afford it or who take alternative tests are not represented. The data also does not state how many SAT takers are included. For those reasons the data may be insufficient and inaccurate. There is a further limitation in the income group "$100,000 and above". That group could extend to family incomes of millions, which is not proportionate to the other ranges of family income given. For this reason, that piece of data was left out of the calculations. There might also be a limitation in the recording of the data itself, as SAT takers report family income in a survey when signing up for the SAT. This might cause a problem because many SAT takers, mostly aged 15-17, do not know the actual income of their family, so wrong data may be entered. There could also be a limitation due to culture and race. The data does not mention culture and race, which might affect the results, as there might have been more American respondents who reported family income than Asian respondents. Another limitation is that the table of expected values in the χ² test has all values less than 5, which reduces its validity. In addition, the amount of data may be a limitation, as 9 data points may not be sufficient to reflect the correlation between SAT scores and family income from a world perspective. Lastly, many other factors may be at play when considering the correlation between SAT scores and family income, such as the reasons for having a high family income and the IQ of the SAT test takers.

Conclusion

Despite the previously mentioned limitations, the calculated χ² value, 5.760, leads to rejecting the null hypothesis that SAT scores are independent of family income and accepting the alternative hypothesis that SAT scores are dependent on family income.

Furthermore, the investigation shows a strong positive correlation between SAT score and family income, which is consistent with the two being dependent on each other.
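The χ² value quoted in the conclusion can be reproduced from the observed counts in Table 4. The following is a minimal sketch in Python rather than the method used in the investigation: `chi_squared` is a hypothetical helper, it applies no continuity correction, and the critical value 3.841 is simply copied from the text instead of being computed.

```python
# Observed counts from Table 4 (rows: income bands, columns: score bands)
observed = [[4, 1],
            [0, 4]]

# Critical value at 5% significance with 1 degree of freedom, as quoted in the text
CRITICAL_5_PERCENT_DF1 = 3.841


def chi_squared(table):
    """Return (chi2, expected, df) for a contingency table.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = (row total * column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, expected, df


chi2, expected, df = chi_squared(observed)
reject_null = chi2 > CRITICAL_5_PERCENT_DF1
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_null}")
```

Because the script keeps the expected values as unrounded fractions, its χ² may differ from the hand calculation in the later decimal places, but the comparison with the critical value is unaffected.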
Works Cited

Rampell, Catherine. "SAT Scores and Family Income - NYTimes.com." The Economy and the Economics of Everyday Life - Economix Blog - NYTimes.com. 28 Aug. 2009. Web. 01 Nov. 2010. <http://economix.blogs.nytimes.com/2009/08/27/sat-scores-and-family-income/>.

Downey, Joel. "SAT Scores Rise with Family Income." Cleveland OH Local News, Breaking News, Sports & Weather - Cleveland.com. 10 Apr. 2008. Web. 01 Nov. 2010. <http://www.cleveland.com/pdgraphics/index.ssf/2008/04/sat_scores_rise_with_family_in.html>.

Whiffen, Glen, John Owen, Robert Haese, Sandra Haese, and Mark Bruce. "Two Variable Statistics." Mathematics for the International Student: Mathematical Studies SL. By Mal Coad. [S.l.]: Haese And Harris Pub, 2010. 581-82. Print.