For this session, use the presentation "Finding Data" in the Lesson Resources Folder.
Getting Started (5 min)
Given this data: [slide 1]
A blood drive at the local high school reveals that 20% of the students were HIV positive.
Journal on these questions:
- What is your immediate reaction?
- What questions do you have?
Activities (40 minutes)
Activity 1 (5 min) - Discuss the journal prompt
Lead the students in discussion using the bullets below and slide 2 of the PowerPoint as guidance. Students should talk about WHY they assumed the data was true, or were uncomfortable questioning the truth of the data.
Activity 2 (5 min) - Brainstorm: What kinds of data can be found online?
Part 1 - Discussion
Data comes from many places and takes many forms [slide 3]
- Have students discuss: How do businesses, individuals, governments, and devices create and use data?
Part 2 - Brainstorm
Brainstorm as a class: what kinds of data are generated? Possible answers:
- video: movies, webcam images, CCTV, YouTube, Netflix, Facebook, etc.
- pictures: maps, Instagram, photos, cartoons, drawings, …. everything!
- words: books, articles, news, stories, blogs, Facebook
- numbers: facts, financial transactions, scientific data
- sound: music, speech
- behavior tracking: GPS, click behavior, search history
Activity 3: How is meaning created from data? (10 minutes)
- Look at some data gathered about selfies from different cities around the world. [slide 4]
- Main ideas:
- You have to gather the data and analyze it to create meaning.
- Creating meaning from pictures still takes some human interpretation.
- Prompt students to come to a conclusion about the graphed data on the page.
- Question for discussion: How large a sample is needed to draw a conclusion?
- Quick review: Make the point that there is a LOT of data even in a single picture. [slide 5]
- Define these and put them in order. Use this webpage to review bytes: http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html
- MB, bit, TB, ZB, byte, GB, pixel (one dot of color on the screen), KB, PB
- Look at the photo on slide 5.
- 365 gigapixels is 365 billion pixels. If the picture is a square, it is about 604,152 pixels on each side (too big to fit on any HDTV screen).
- http://www.rtings.com/info/what-is-the-resolution A 4K ultra-high-resolution TV is only 3,840 x 2,160 pixels.
- https://www.amctheatres.com/sony4k Even a movie screen can’t show all of the detail; you can only look at the photo one part at a time.
- Preview Wolfram Alpha, an engine for providing knowledge from data.
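To check the slide 5 numbers, or to let students experiment with them, the unit ordering and pixel arithmetic from this activity can be sketched in Python. This is only a minimal sketch: the function names are made up for illustration, the units assume decimal (power-of-ten) prefixes, and the screen size assumes the standard 4K UHD resolution of 3,840 x 2,160 pixels.

```python
import math

# Sizes in bytes, assuming decimal prefixes (1 KB = 1,000 bytes).
# A bit is one eighth of a byte. A pixel is not a storage unit, so it is left out.
UNIT_SIZES = {
    "bit": 1 / 8, "byte": 1, "KB": 10**3, "MB": 10**6,
    "GB": 10**9, "TB": 10**12, "PB": 10**15, "ZB": 10**21,
}

def order_units(names):
    """Return the unit names sorted from smallest to largest."""
    return sorted(names, key=lambda name: UNIT_SIZES[name])

def square_side(total_pixels):
    """Side length, in whole pixels, of a square image with this many pixels."""
    return math.isqrt(total_pixels)

def screens_needed(total_pixels, width, height):
    """How many width x height screens it takes to show every pixel once."""
    return math.ceil(total_pixels / (width * height))

gigapixel_photo = 365 * 10**9  # 365 gigapixels
print(order_units(["MB", "bit", "TB", "ZB", "byte", "GB", "KB", "PB"]))
print(square_side(gigapixel_photo))
print(screens_needed(gigapixel_photo, 3840, 2160))
```

The last line gives a sense of scale: it would take tens of thousands of 4K screens side by side to show every pixel of the photo at once.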
Activity 4 (20 min) - Work with some data online
- Students should complete the Data Search and Analysis Handout. [slide 7]
- Depending on how much time you have, you can pair students and assign even/odd questions or chunks of questions to different groups, or have each student research on their own.
- If there’s time in class, try to go over results and compare (especially the first 5) to see if people got similar answers. Why or why not? [slide 8]
Assign Homework (5 minutes)
Give students the worksheet: Homework Unit 4 Lesson 1.
There are 10 videos to choose from, each 10-15 minutes long. Either allow students to self-select, or assign them a particular video. Students should watch the video and answer the questions on the worksheet. This is an opportunity to discuss plagiarism: students are expected to watch the video and write from their own experience.
For this session, use the presentation "Finding and Analyzing Data" in the Lesson Resources Folder.
Getting Started (5 min)
Students should journal on the following: Describe at least 2 ways that we create meaning out of data. [slide 1]
- Possible answers: graph it, total it, average it, find min and max, map it, compare it to other data, find trends, generate predictions (like weather), draw conclusions (facial recognition, emotions, voice inflection), diagnose diseases, discover new stars, etc.
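If students have some programming background, a few of these answers (total it, average it, find min and max) can be shown in a few lines of Python. This is a minimal sketch: the `summarize` function and the temperature values are invented for illustration.

```python
# Hypothetical data: daily high temperatures for one week (invented values).
temps = [68, 71, 75, 73, 70, 66, 74]

def summarize(values):
    """Create simple meaning from raw numbers: total, average, min, and max."""
    return {
        "total": sum(values),
        "average": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarize(temps)
for label, value in summary.items():
    print(label, value)
```

Each summary number answers a different question about the same data, which is one way to start a discussion about how analysis creates meaning.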
Activities (40 min)
Activity 1 (35 min): Analyzing Data
Part 1: Correlation vs. Causation
- Look at slide 2 from the PowerPoint. Creating meaning from data can be misleading.
- Point out that the graph shows a direct relationship between the number of divorces in Maine and the amount of margarine that is purchased. When one goes up, the other does too, and vice versa. Is this a causal relationship?
- Show some examples from the Tyler Vigen website: http://www.tylervigen.com/spurious-correlations It has many examples of data connections that may be statistically valid but don’t make sense. The site was created to point out that conclusions drawn from correlation alone are often not valid.
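To make the idea concrete, the strength of a correlation can be computed with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is an illustration only: the two series are invented numbers, not real statistics, and `pearson_r` is a hand-written helper rather than anything taken from the website.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented numbers for illustration only: two unrelated quantities
# that both happen to fall over the same five years.
series_a = [5.0, 4.8, 4.5, 4.3, 4.1]
series_b = [8.0, 7.1, 6.4, 5.6, 5.0]

r = pearson_r(series_a, series_b)
print(f"r = {r:.2f}")  # close to 1: strongly correlated
```

A value of r near 1 says only that the two series move together. It says nothing about whether one causes the other, which is the point of the discussion.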
Part 2: Data Science
- What does a data scientist do? [slide 3]
- Show the two videos and discuss.
- Tricks to analyzing big data:
- Knowing what data to use, and what to disregard.
- Knowing how to make up for missing data.
- Knowing how to discover and predict trends and correlations.
- There are many degrees offered in data science, and free online courses are available from Udacity and Coursera, among others.
- Look at 3 false assumptions about big data [slide 4]:
- It’s complete and accurate
- It tells the whole story
- Bigger is better
- What considerations and tradeoffs arise in the computational manipulation of data? [slide 4]
- How do you account for missing data?
- How do you certify your sources?
- How do you decide which data to include and which to exclude?
- How much data is enough? (time is money!)
- Are your processing algorithms accurate?
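The first question, how to account for missing data, can be illustrated with two common choices: drop the missing values, or fill them in with an estimate such as the average of the known values. This is a minimal sketch under assumptions: the function names and sensor readings are invented, and missing values are represented by `None`.

```python
def drop_missing(values):
    """Option 1: ignore missing readings (smaller sample, no guessing)."""
    return [v for v in values if v is not None]

def fill_missing_with_mean(values):
    """Option 2: replace each missing reading with the mean of the known ones."""
    known = drop_missing(values)
    if not known:
        raise ValueError("cannot estimate: every value is missing")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

# Hypothetical sensor readings; None marks a reading that was never recorded.
readings = [10, None, 14, 12, None]

print(drop_missing(readings))            # three readings remain
print(fill_missing_with_mean(readings))  # five readings, two of them estimated
```

Neither option is free: dropping values shrinks the sample, while filling them in keeps the sample size but can hide real variation. That is the kind of tradeoff the questions above are asking about.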
- What is some of the data needed to successfully fly a space mission? (Possible answer: Knowing all about the spacecraft: speed, direction, amount of fuel/oxygen left.) Many of the problems that applied to early space missions also arise in dealing with big data.
- You need to decide which factors to include in your calculations, and which to exclude.
- You need to decide when to make an assumption for missing data or when to estimate.
- A program written for an early space flight had to deal with many unknown factors, because the spacecraft had never flown before.
- It’s usually impossible to create a perfect algorithm that can take into account every possibility, so how do you allow for errors and changes?
- What are some of the calculations needed? (Possible answers: how much fuel to release and with which engines.)
- They had to run many simulations first to see what would happen under various circumstances.
- See if anybody knows how Netflix, movie makers, or Amazon use data about their customers to be more successful. [slide 5] http://www.smartdatacollective.com/bernardmarr/312146/big-data-how-netflix-uses-it-drive-business-success and http://www.fastcompany.com/3024655/pitch-perfect-and-how-analytics-are-transforming-movie-marketing
Businesses like Amazon and Netflix learn the habits of different customers and make recommendations based on their previous choices and on the choices of others who share similar characteristics (like Google ads).
See if anybody knows the story of Moneyball (based on a true story), in which a baseball team made decisions based on data analysis to become winners: https://en.wikipedia.org/wiki/Moneyball_(film)
Another example is Vivek Ranadivé, who knew little about basketball but owned a multi-million dollar computer processing company and knew how to choose and analyze data. He coached his then twelve-year-old daughter’s National Junior Championship basketball team to the national championship game. He relied on his knowledge of soccer and cricket, paired with his analytic mindset, to create a system of play that allowed his relatively un-athletic team to excel. After using his intellect and business experience to coach an inexperienced team to the championship game, the man who once thought basketball was “mindless” was hooked on the sport. http://www.newyorker.com/magazine/2009/05/11/how-david-beats-goliath
- How is data analyzed? Data analysis requires an algorithm, a plan to collect and process data. [slide 6]
- Generate discussion about what data is collected and how it is analyzed. What is a possible algorithm for deciding which movies Netflix might suggest to a customer?
Brainstorm: What other data might they collect? (Possible answers: what’s currently popular in that age group, demographic, etc.)
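One possible answer to the algorithm question can be sketched in Python: suggest movies that a customer has not seen but that were liked by other customers with overlapping tastes. This is a classroom illustration only, not Netflix's actual method. The `recommend` function, the customer names, and the movie titles are all invented.

```python
def recommend(target, likes, top_n=3):
    """Suggest unseen movies liked by customers who share likes with the target."""
    liked_by_target = likes[target]
    scores = {}
    for other, liked in likes.items():
        if other == target:
            continue
        overlap = len(liked_by_target & liked)  # shared likes measure similarity
        if overlap == 0:
            continue  # no tastes in common, so ignore this customer
        for movie in liked - liked_by_target:
            scores[movie] = scores.get(movie, 0) + overlap
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda movie: (-scores[movie], movie))
    return ranked[:top_n]

# Hypothetical customers and the (invented) movies each one liked.
likes = {
    "Ana": {"Space Race", "Robot Kitchen", "Deep Sea"},
    "Ben": {"Space Race", "Robot Kitchen", "Moon Base"},
    "Cal": {"Deep Sea", "Moon Base", "Quiet Town"},
    "Dee": {"Garden Show"},
}

print(recommend("Ana", likes))
```

Students can critique the sketch: it uses only one kind of data (likes) and ignores everything from the brainstorm, such as age group or what is currently popular.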
- Choose one of the options and write an outline of an algorithm: choosing a movie to produce or a sports player to hire. [slide 7]
Share and discuss.
- Describe at least two calculations needed
- Describe some of the data you’d need to collect.
Activity 2 (5 min): Present homework from previous day after watching TED talks on data. [slide 8]
If time is short, choose only 1 or 2 of the questions from the homework to be presented to the class and collect the rest to grade.
Journal (5 min)
In your writing journal, map out the steps to answer a specific question or solve a specific problem using data.
Data analysis activities from NOAA, NASA, and more! - http://climate-expeditions.org/educators/activities.html
What is data acquisition? - http://www.ni.com/data-acquisition/what-is/
Data analysis and graphs (with Excel sample) - http://www.sciencebuddies.org/science-fair-projects/project_data_analysis.shtml
Collecting and analyzing data - http://ctb.ku.edu/en/table-of-contents/evaluate/evaluate-community-interventions/collect-analyze-data/main
Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students - http://academic.pgcc.edu/psc/Excel_booklet.pdf