D'Ortenzio, A

__Data Management Culminating Task 2010/2011__
Andy D'Ortenzio

Idea Brainstorming - Stage ONE
 * Number of cigarettes smoked (per week) vs. Lung Cancer rates
 * Sexual encounters vs. STIs
 * McDonald's Kids Meals sold vs. Childhood obesity
 * Crime vs. Unemployment
 * Marijuana consumption vs. University acceptances
 * Alcohol consumption vs. Grade Point Average
 * NHL goals scored vs. Minutes played
 * Obesity rates vs. diabetes
 * Video game usage vs. Absenteeism in Canadian high schools
 * Substance use vs. Academic achievement
 * Age and gender vs. Vehicle insurance rates
 * Population vs. Fossil fuels burned
 * Population vs. Energy consumption
 * Unemployment rates vs. Divorce rates
 * Number of fast food joints (in a city) vs. Obesity rates (in that city)
 * NHL Offensive Hockey Line (1st, 2nd, 3rd or 4th) vs. Points scored per lineup
 * Age of Vehicle vs. Stopping distance at 40 km/h
 * Childcare vs. Social Skills
 * Teen Pregnancy rates vs. Abortion rate
 * NHL Draft standings vs. Points in Rookie year
 * Number of Sunglass removals from Horatio in CSI Miami vs. Number of episodes (in season 7)

----

Variables Being Tested:
 * Independent (x) --> Canada's population
 * Dependent (y) --> Energy consumed

Question: Does the rapidly growing population directly relate to the amount of energy consumed?

Background Info: The population of planet Earth, let alone Canada, is expanding at a dangerous, exponential rate, which directly affects many other factors. The growing population demands a higher number of motorized vehicles and vast amounts of processed foods (which require energy), which can only lead to one thing: destruction of Mother Nature to make room for 17-lane highways. Literally every single time you drive by the gas station you say to yourself, "Oh my goodness, gas is $7 per litre? Why is the price always skyrocketing? I have kids to feed!" There is a simple answer to this common predicament: supply and demand. Earth only has a limited amount of resources and will eventually be sucked completely dry by the exponentially growing population. Side effects of this growing problem have already been popping up around the globe:


 * 1970s energy crisis - caused by the peaking of oil production in major industrial nations (Germany, United States, Canada, etc.)
 * 1973 oil crisis - caused by an OPEC oil export embargo by many of the major Arab oil-producing states, in response to Western support of Israel during the Yom Kippur War
 * 1979 oil crisis - caused by the Iranian Revolution
 * 1990 oil price shock - caused by the Gulf War
 * The 2000–2001 California electricity crisis - caused by market manipulation by Enron and failed deregulation; resulted in multiple large-scale power outages
 * Fuel protests in the United Kingdom in 2000 were caused by a rise in the price of crude oil combined with already relatively high taxation on road fuel in the UK.
 * North American natural gas crisis
 * 2004 Argentine energy crisis
 * North Korea has had energy shortages for many years.
 * Zimbabwe has experienced a shortage of energy supplies for many years due to financial mismanagement.
 * Political riots occurring during the 2007 Burmese anti-government protests were sparked by rising energy prices



Hypothesis: I believe Canada's population will directly correlate with the total energy levels consumed and result in a positive moderate correlation as time progresses.

----
**//__Survey Questions:__//** (Please mark off the **most** correct answer.)

**These will be handed out to basically anybody who will take five seconds to fill one out, which is meant to keep me from hand-picking respondents. On the other hand, those around me will be the ones answering the questions, so it is a convenience sample.**

**1. How large is your family (at home)?** [ ] 2 [ ] 3  [ ] 4  [ ] 5  [ ] 6  [ ] 7 (haris) [ ] 8 or more

**2. Estimate your electrical bill per month at home.** [ ] <$100 [ ] $100 - $200  [ ] $200 - $300  [ ] $300 - $400  [ ] > $400

**3. Estimate your grocery bill per week at home.** [ ] <$100 [ ] $100 - $200  [ ] $200 - $300  [ ] $300 - $400  [ ] > $400

**4. List electrical devices at home (ex. computer, microwave)**

**5. How many hours a day do you spend in front of a screen?** [ ] 2 [ ] 3  [ ] 4  [ ] 5 or more

**6. Do you own a car?** [ ] Yes [ ] No

**7. Do you plan on buying a car in the next 5 years?** [ ] Yes [ ] No

**8. What is your most frequent mode of transportation?** [ ] Automobile [ ] Walking [ ] Biking [ ] Carpooling [ ] Cash cab

**9. How many times a week do you go out for dinner?** [ ] 2 [ ] 3  [ ] 4  [ ] 5  [ ] 6  [ ] 7

**10. Your car has what type of engine?** [ ] Gasoline [ ] Diesel [ ] Combustion [ ] Electrical / Hybrid [ ] Solar powered [ ] Supercharged Turbo NOS fuel injected superman car engine

Testing your knowledge:

**11. Which of the following groups spent the most on zero and low-carbon technologies since 2000?**

[ ] Federal Government [ ] Oil and natural gas industry [ ] Other private industries [ ] No clue...

----
Raw Data:

This graph illustrates the steady incline of Canada's population from 2000-2008. The data are discrete, because population is counted in whole persons.

**Geography** = Canada
**Sex** = Both sexes
**Age group** = All ages

||~ 2000 ||~ 2001 ||~ 2002 ||~ 2003 ||~ 2004 ||~ 2005 ||~ 2006 ||~ 2007 ||~ 2008 ||
|| 30,685,730 || 31,019,020 || 31,353,656 || 31,639,670 || 31,940,676 || 32,245,209 || 32,576,074 || 32,931,956 || 33,327,337 ||

This graph illustrates the overall rise in energy consumed by Canada's food manufacturing industry from 2000-2008, after a dip between 2001 and 2003. Energy is a continuous quantity, although the values here are reported rounded to whole gigajoules.

**Geography** = Canada
**Fuel type** = Total, energy consumed as fuel (higher heating value) (gigajoules)
**North American Industry Classification System (NAICS)** = Food manufacturing

**Raw Data:**

||~ 2000 ||~ 2001 ||~ 2002 ||~ 2003 ||~ 2004 ||~ 2005 ||~ 2006 ||~ 2007 ||~ 2008 ||
|| 94,606,870 || 89,116,064 || 88,765,413 || 89,041,122 || 90,928,089 || 95,773,902 || 96,136,500 || 99,535,969 || 98,607,593 ||

----
__//**Data analysis**//__

1. For this project to include one-variable analysis, you must analyze the dependent variable, which in this case is the energy consumed in Canada from 2000 - 2008.

2. Since the independent variable (population) is also increasing over the years, it makes sense to use two-variable analysis and correlate the two, to hopefully relate back to my hypothesis.

3. Both data sets cover the same years, 2000 to 2008, and the same region (Canada), so the independent and dependent values pair up year by year and the correlation coefficient can be calculated.

4. As expected, over the years the population in Canada is increasing at a steady pace, and the amount of energy consumed in the industry is rising overall. The only question that remains is will the correlation between these two be positive and moderate? FIND OUT NEXT TIME ON MYTH BUSTERS!!!

__**//Sampling techniques/bias//**__

//__Convenience sample__// – the survey was handed out to students at Ancaster High School, which was chosen because distributing the surveys there would make life easy on me.

//__Non Response bias__// – some people probably didn’t even look at the survey and just threw it in the garbage and continued living their lives. High school students are quite busy these days.

//__Response bias__// – some of the surveys were handed to friends and siblings, which increases the risk of skewed, unreliable data because they probably thought it would be funny to submit ridiculous, far-fetched answers just to make my project interesting.

**This data was taken from a sample, NOT a population, because not everybody at AHS completed the survey.**

----

Managing the Data

----
**Population Calculations**

Mean for a Sample = x̄ = ∑x / n

(Sum of x values divided by number of values)

= (30,685,730 + 31,019,020 + 31,353,656 + 31,639,670 + 31,940,676 + 32,245,209 + 32,576,074 + 32,931,956 + 33,327,337) / 9

= 287,719,328 / 9 = //31,968,814//

Median (middle number) = // 31,940,676 //

Mode = Since no values repeat, there is no mode for Population

Range (highest value – lowest value) = 33,327,337 – 30,685,730 = //2,641,607//
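The four measures above can be double-checked with a few lines of Python. This is only a sketch: the helper name `summarize` is made up for illustration, and the list simply repeats the population values from the raw data table.

```python
from collections import Counter
from statistics import mean, median

# Canada's population, 2000-2008 (copied from the raw data table)
population = [30_685_730, 31_019_020, 31_353_656, 31_639_670, 31_940_676,
              32_245_209, 32_576_074, 32_931_956, 33_327_337]

def summarize(values):
    """Return the mean, median, mode and range of a list of numbers.

    'summarize' is a hypothetical helper name, not part of any library.
    """
    counts = Counter(values)
    highest_count = max(counts.values())
    # If every value appears only once, there is no mode.
    mode = None if highest_count == 1 else [
        v for v, c in counts.items() if c == highest_count
    ]
    return {
        "mean": mean(values),
        "median": median(values),          # median() sorts the values itself
        "mode": mode,
        "range": max(values) - min(values),
    }

summary = summarize(population)
for name, value in summary.items():
    print(name, "=", value)
```

The same helper should work on any other list of values, for example the energy data.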


||= Year ||= X ||= Deviation ||= Deviation squared ||
|| 2000 || 30,685,730 || -1,283,084 || 1.646 x 10^12 ||
|| 2001 || 31,019,020 || -949,794 || 9.02 x 10^11 ||
|| 2002 || 31,353,656 || -615,158 || 3.784 x 10^11 ||
|| 2003 || 31,639,670 || -329,144 || 1.08 x 10^11 ||
|| 2004 || 31,940,676 || -28,138 || 791,747,044 ||
|| 2005 || 32,245,209 || 276,395 || 7.64 x 10^10 ||
|| 2006 || 32,576,074 || 607,260 || 3.688 x 10^11 ||
|| 2007 || 32,931,956 || 963,142 || 9.276 x 10^11 ||
|| 2008 || 33,327,337 || 1,358,523 || 1.846 x 10^12 ||
|| || || || **∑ 6.253 x 10^12** ||

Standard Deviation for Population Increase

s = √[ ∑(x - x̄)^2 / (n - 1) ]

= √[ (6.253 x 10^12) / (9 - 1) ] = √(7.82 x 10^11) = //884,095.6//
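The deviation table and the sample standard deviation can also be rebuilt in Python. This is a rough sketch that assumes the sample formula with n - 1 in the denominator, as used above. Because it keeps every digit instead of rounding each squared deviation, the last few digits of the result may differ slightly from the hand calculation.

```python
import math
from statistics import stdev

# Canada's population, 2000-2008 (copied from the raw data table)
population = [30_685_730, 31_019_020, 31_353_656, 31_639_670, 31_940_676,
              32_245_209, 32_576_074, 32_931_956, 33_327_337]

n = len(population)
x_bar = sum(population) / n                      # sample mean
deviations = [x - x_bar for x in population]     # x minus the mean
squared = [d ** 2 for d in deviations]           # deviation squared

# Print one row per year, like the table above
for year, x, d, d2 in zip(range(2000, 2009), population, deviations, squared):
    print(f"{year}  {x:>12,}  {d:>14,.0f}  {d2:.3e}")

# Sample standard deviation: divide by n - 1, then take the square root
s = math.sqrt(sum(squared) / (n - 1))
print(f"sum of squared deviations = {sum(squared):.4e}")
print(f"sample standard deviation = {s:,.1f}")
print(f"statistics.stdev check    = {stdev(population):,.1f}")
```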

----
**Energy Calculations**

Mean for a Sample = x̄ = ∑x / n (Sum of x values divided by number of values)

= (94,606,870 + 89,116,064 + 88,765,413 + 89,041,122 + 90,928,089 + 95,773,902 + 96,136,500 + 99,535,969 + 98,607,593) / 9

= 842,511,522 / 9 = //93,612,391//

Median (middle number once the values are sorted from lowest to highest) = //94,606,870//

Mode = Since no values repeat, there is no mode for Energy

Range (highest value – lowest value) = 99,535,969 – 88,765,413 = //10,770,556//
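The energy values are listed by year rather than by size, so the median and range only come out right after sorting. A small Python sketch of that step follows; the variable names are just for illustration.

```python
# Energy consumed (gigajoules), 2000-2008, in year order (from the raw data table)
energy = [94_606_870, 89_116_064, 88_765_413, 89_041_122, 90_928_089,
          95_773_902, 96_136_500, 99_535_969, 98_607_593]

ordered = sorted(energy)                  # lowest to highest
middle = ordered[len(ordered) // 2]       # 9 values, so index 4 is the middle one
value_range = ordered[-1] - ordered[0]    # highest minus lowest

print("sorted values:", ordered)
print("median:", middle)
print("range:", value_range)
```

With an even number of values, the median would instead be the average of the two middle values.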


||= Year ||= X ||= Deviation ||= Deviation squared ||
|| 2000 || 94,606,870 || 994,479 || 9.89 x 10^11 ||
|| 2001 || 89,116,064 || -4,496,327 || 2.02 x 10^13 ||
|| 2002 || 88,765,413 || -4,846,978 || 2.35 x 10^13 ||
|| 2003 || 89,041,122 || -4,571,269 || 2.09 x 10^13 ||
|| 2004 || 90,928,089 || -2,684,302 || 7.21 x 10^12 ||
|| 2005 || 95,773,902 || 2,161,511 || 4.67 x 10^12 ||
|| 2006 || 96,136,500 || 2,524,109 || 6.37 x 10^12 ||
|| 2007 || 99,535,969 || 5,923,578 || 3.51 x 10^13 ||
|| 2008 || 98,607,593 || 4,995,202 || 2.495 x 10^13 ||
|| || || || **∑ 1.44 x 10^14** ||

Standard Deviation for Energy Increase

s = √[ ∑(x - x̄)^2 / (n - 1) ]

= √[ (1.44 x 10^14) / (9 - 1) ] = √(1.799 x 10^13) = //4,241,005//

----
Calculating the Correlation Coefficient

r = [n∑xy - (∑x)(∑y)] / √{ [n∑x^2 - (∑x)^2] [n∑y^2 - (∑y)^2] }

r = (2.4255 x 10^17 - 2.4241 x 10^17) / √[ (5.3588 x 10^13)(1.17443 x 10^15) ]

|| X || Y || X^2 || Y^2 || XY ||
|| 30,685,730 || 94,606,870 || 9.42 x 10^14 || 8.950 x 10^15 || 2.903 x 10^15 ||
|| 31,019,020 || 89,116,064 || 9.62 x 10^14 || 7.942 x 10^15 || 2.764 x 10^15 ||
|| 31,353,656 || 88,765,413 || 9.83 x 10^14 || 7.879 x 10^15 || 2.783 x 10^15 ||
|| 31,639,670 || 89,041,122 || 1.00 x 10^15 || 7.928 x 10^15 || 2.817 x 10^15 ||
|| 31,940,676 || 90,928,089 || 1.02 x 10^15 || 8.268 x 10^15 || 2.904 x 10^15 ||
|| 32,245,209 || 95,773,902 || 1.04 x 10^15 || 9.173 x 10^15 || 3.088 x 10^15 ||
|| 32,576,074 || 96,136,500 || 1.06 x 10^15 || 9.242 x 10^15 || 3.132 x 10^15 ||
|| 32,931,956 || 99,535,969 || 1.08 x 10^15 || 9.907 x 10^15 || 3.278 x 10^15 ||
|| 33,327,337 || 98,607,593 || 1.11 x 10^15 || 9.723 x 10^15 || 3.286 x 10^15 ||
|| **∑ 287,719,328** || **∑ 842,511,522** || **∑ 9.204 x 10^15** || **∑ 7.9 x 10^16** || **∑ 2.695 x 10^16** ||

r = (1.4 x 10^14) / (2.5086 x 10^14)

r = 0.55808
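For comparison, here is a minimal Python sketch that computes r straight from the unrounded raw data. It uses the deviation form of the formula, r = ∑(x - x̄)(y - ȳ) / √[∑(x - x̄)^2 ∑(y - ȳ)^2], which is algebraically the same as the sum-based formula but avoids subtracting two nearly equal 18-digit numbers. The function name `pearson_r` is an illustrative choice. Since the hand calculation works from column totals rounded to three or four significant figures, and most of those digits cancel in the subtraction, the printed value should not be expected to match it digit for digit.

```python
import math

# Raw data, 2000-2008 (copied from the tables above)
population = [30_685_730, 31_019_020, 31_353_656, 31_639_670, 31_940_676,
              32_245_209, 32_576_074, 32_931_956, 33_327_337]
energy = [94_606_870, 89_116_064, 88_765_413, 89_041_122, 90_928_089,
          95_773_902, 96_136_500, 99_535_969, 98_607_593]

def pearson_r(xs, ys):
    """Correlation coefficient from paired lists (hypothetical helper name)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from each mean
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(population, energy)
print(f"r = {r:.4f}")
```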

----
Calculating the Line of Best Fit (the long annoying way instead of cheating using Excel lolsss)

Step 1: Find the slope (m value). There are n = 9 data points, and the sums are carried to more digits than the table shows because the two terms in each bracket nearly cancel.

m = [n∑xy - (∑x)(∑y)] / [n∑x^2 - (∑x)^2]

m = [(9)(2.695627 x 10^16) - (287,719,328)(842,511,522)] / [(9)(9.204300 x 10^15) - (287,719,328)^2]

m = (2.426064 x 10^17 - 2.424068 x 10^17) / (8.283870 x 10^16 - 8.278241 x 10^16)

m = (1.996 x 10^14) / (5.629 x 10^13)

m = 3.546

Mean of x = 287,719,328 / 9 = 31,968,814
Mean of y = 842,511,522 / 9 = 93,612,391

Step 2: Find the y-intercept (b value), using y = mx + b at the point (mean of x, mean of y).

b = y - mx
b = 93,612,391 - (3.546)(31,968,814)
b = -19,749,023

Therefore y = 3.546x - 19,749,023 is the equation of the line of best fit.

Don't believe me??? LET'S TEST IT FOR ACCURACY!!!

Let's take a random x value and sub it into the equation I found, and see whereabouts it lies on the graph. For example:

Let x = 33,000,000

y = (3.546)(33,000,000) - 19,749,023
y = 97,268,977

What this means is when x is at 33,000,000 people, according to our line of best fit, the energy (or y value) should be 97,268,977 gigajoules.
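The same least-squares line can be sketched in Python from the raw data, which is a handy cross-check on the long way. The function name `best_fit` is made up for illustration, and the slope is computed in deviation form (∑(x - x̄)(y - ȳ) / ∑(x - x̄)^2), which is equivalent to the sum-based formula but less sensitive to rounding. Results computed at full precision may differ from a hand calculation that rounds the totals.

```python
# Raw data, 2000-2008 (copied from the tables above)
population = [30_685_730, 31_019_020, 31_353_656, 31_639_670, 31_940_676,
              32_245_209, 32_576_074, 32_931_956, 33_327_337]
energy = [94_606_870, 89_116_064, 88_765_413, 89_041_122, 90_928_089,
          95_773_902, 96_136_500, 99_535_969, 98_607_593]

def best_fit(xs, ys):
    """Return slope m and intercept b of the least-squares line y = mx + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar      # the line passes through (x_bar, y_bar)
    return m, b

m, b = best_fit(population, energy)
predicted = m * 33_000_000 + b     # energy predicted for 33,000,000 people

print(f"m = {m:.3f}")
print(f"b = {b:,.0f}")
print(f"predicted energy at x = 33,000,000: {predicted:,.0f} gigajoules")
```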

*SCROLL UP AND SEE GRAPH TO VERIFY RESULTS*.



----
Therefore the correlation coefficient between the two variables, Population vs. Energy consumption, is 0.55808, which is a positive moderate correlation. If you scroll up for 97 minutes, you are able to see that my hypothesis predicted a positive moderate correlation. In reality, this means that as the population in Canada increases, the energy being consumed in Canada tends to increase along with it, resulting in the positive moderate correlation coefficient of 0.55808, and I'm extremely lucky they actually had any relation, and a boss, but mostly the 7th one.

- ---

//**__Works Cited:__**//

Roberts, P. (2008). Energy Crisis. Wikipedia. Retrieved December 26, 2010, from http://en.wikipedia.org/wiki/Energy_crisis

Statistics Canada. Table 128-0005 - Energy fuel consumption of manufacturing industries in natural units, by North American Industry Classification System (NAICS), annual (cubic metres unless otherwise noted) (graph), CANSIM (database), Using E-STAT (distributor). (accessed: November 14, 2010)

Statistics Canada. Table 051-0001 - Estimates of population, by age group and sex for July 1, Canada, provinces and territories, annual (persons unless otherwise noted) (graph), CANSIM (database), Using E-STAT (distributor). (accessed: November 14, 2010)

__//**Pictures:**//__

http://upload.wikimedia.org/wikipedia/commons/f/ff/Imported_Crude_Oil_as_a_Percent_of_US_Consumption_1950-2003.jpg

http://upload.wikimedia.org/wikipedia/commons/6/66/Gcprrets.gif
