Data Workshop

Rasmus Erik

September 2018

Welcome and intro to Jupyter


English or Danish?

  • Purpose and background
  • Plan / topics
  • Structure
  • Introduction to Jupyter Notebooks


  • Purpose: your learning
    Applicable tools and useful knowledge in studies, research, and professionally
  • My background: own company, consulting, and software development. Computer scientist from KU, including library and information science and teaching. I am here because I want to pass on my experience
  • Your Background: Semester? IT-experience? Interests/passions? Projects?

Topics / plan

  • Welcome and introduction to Jupyter
  • Fetching and working with data
  • Wordcloud visualisation (and learning)
  • Computer language (and tactics vs strategy)
  • Structured data (JSON)
  • WWW / Internet - HTTP
  • Data Science - literature case study*
  • Digital images*
  • Your topic!*
  • Conclusion


Exploring a series of topics.

  • Theory / lecture / perspective
  • Examples - follow along
  • Exercises, free experimentation, including break
  • Follow-up

Do interrupt and ask questions!

What is Jupyter Notebook

  • Tool for working with data
  • Primary tool within science
  • Web application, cloud + local


  • Opening Jupyter Notebooks
  • Jupyter tour: modes, cells, text and code, builtin help, errors, kernel
  • Calculation with data


  • User interface tour
  • Exercises from Introduction to Jupyter (in Danish, from

Follow up on exercises

Fetching and calculating with data

Computer language

Analogy - a different country:
Pointing and gestures vs language


Examples of computer languages


Information overload...

  • calculating with data
  • debugging
  • importing and using functionality: request, frequency, random
  • random word
  • random jargon entry
  • word frequencies
  • naming
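
The items above (importing functionality, random word, word frequencies) could look roughly like the sketch below. It is only an illustration: the text is a made-up sentence rather than fetched data, and the variable names are assumptions, not the ones used in the workshop notebooks.

```python
import random
from collections import Counter

# A tiny made-up text; in the workshop the text would come from a data source.
text = "the cat sat on the mat and the dog sat on the cat"

# Split the text into words, then count how often each word occurs.
words = text.split()
frequencies = Counter(words)

# The three most frequent words together with their counts.
top_three = frequencies.most_common(3)
print(top_three)

# Pick one random word from the text.
random_word = random.choice(words)
print(random_word)
```

Giving each intermediate result a descriptive name (words, frequencies, top_three) is one way to practise the "naming" point.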


  • fetch urls, and print content
  • generate random sentences: like random words, but with subject-verb-object words
  • random fortune-quote
  • word frequencies from online book of choice

Follow up on exercises

Learning, and Visualisation: Wordclouds

Bloom's taxonomy



  • wordcloud
    • with own words
    • with popular words from book
    • with stop-words
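
A word cloud is drawn from word frequencies, and stop-words are simply filtered out before counting. The sketch below shows only that preparation step, assuming a made-up text and a deliberately tiny, hypothetical stop-word list. The drawing itself is left out so the example has no dependencies.

```python
from collections import Counter

# Made-up example text and a very short, hypothetical stop-word list.
text = "It was the best of times it was the worst of times"
stop_words = {"it", "was", "the", "of"}

# Lowercase the text, split it into words, and drop the stop-words.
words = [word for word in text.lower().split() if word not in stop_words]

# Count the remaining words; these counts decide the word sizes in a cloud.
frequencies = Counter(words)
print(frequencies.most_common())

# With the third-party "wordcloud" package installed, a mapping like this
# could be passed to WordCloud().generate_from_frequencies(frequencies).
```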


  • Wordcloud
    • with own words
    • with popular words from data source of choice

Follow up on exercises

Computer language

What is data

  • Computer = calculator
  • All is numbers
  • Recipes

Tactics and Strategy

  • Overview vs detail
  • Pair programming
  • Tip for working together


  • definitions
  • for loops
    • all word combinations
  • list comprehension
  • refactoring code
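
The items above can be sketched in a few lines: a function definition, nested for loops producing all word combinations, and the same result refactored into a list comprehension. The word lists and the function name sentence are invented for illustration.

```python
def sentence(subject, verb, obj):
    """Join three words into a simple subject-verb-object sentence."""
    return f"{subject} {verb} {obj}"

# Made-up word lists.
subjects = ["the cat", "a student"]
verbs = ["reads", "likes"]
objects = ["books", "data"]

# Nested for loops: every combination of subject, verb and object.
all_sentences = []
for s in subjects:
    for v in verbs:
        for o in objects:
            all_sentences.append(sentence(s, v, o))

# The same result, refactored into a list comprehension.
all_sentences_short = [
    sentence(s, v, o) for s in subjects for v in verbs for o in objects
]

print(len(all_sentences))
print(all_sentences[0])
```

Both versions produce the same list; the comprehension is shorter, while the loops are easier to step through when debugging.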


  • repeated random sentences / words
  • refactor code from previous exercises

Follow up on exercises

Types of data


  • strings
  • numbers
  • true / false (Boolean values)
  • lists
  • dictionaries
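
As a quick illustration of the five types listed above, here is one value of each kind in Python. The values are examples chosen for this sketch (the number of hours is invented).

```python
title = "Data Workshop"                      # string
year = 2018                                  # number (integer)
hours = 2.5                                  # number (floating point), invented value
is_fun = True                                # Boolean value: True or False
topics = ["Jupyter", "JSON", "HTTP"]         # list: ordered values
workshop = {"title": title, "year": year}    # dictionary: names mapped to values

# type() tells which kind of data a value is.
for value in [title, year, hours, is_fun, topics, workshop]:
    print(type(value).__name__, value)

# Lists are accessed by position, dictionaries by key.
print(topics[0])
print(workshop["year"])
```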

Data structures

Nested lists and dictionaries

  • bibliography example
  • person / social media example
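
A bibliography is a natural case of nesting: a list of dictionaries, where each dictionary again contains a list of authors. The entries below are placeholders invented for this sketch, and the field names are assumptions rather than a fixed format.

```python
import json

# A list of dictionaries; each entry holds a nested list of authors.
# All titles and names are made-up placeholders.
bibliography = [
    {"title": "Example Book One", "authors": ["A. Author", "B. Writer"], "year": 2001},
    {"title": "Example Book Two", "authors": ["C. Scribe"], "year": 2015},
]

# Step into the structure one level at a time: entry, then field, then item.
first_author = bibliography[0]["authors"][0]
titles = [entry["title"] for entry in bibliography]
print(first_author)
print(titles)

# JSON is a text representation of exactly this kind of nested data.
as_json = json.dumps(bibliography, indent=2)
round_trip = json.loads(as_json)
print(as_json)
```

Converting to JSON text and back gives the same structure, which is why JSON works well for exchanging structured data.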


  • Fetching and accessing JSON-data:
    • wikipedia
    • reddit
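
Fetching JSON usually comes down to three steps: build a URL, request it, and decode the response into lists and dictionaries. The sketch below uses the public Wikipedia API endpoint with its query and links parameters. The shape of the sample response is an assumption about what that API returns, so inspect a real response before relying on it. The helper names are invented, and nothing is fetched until fetch_json is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"

def build_links_url(title):
    """Build a URL asking for the links on one Wikipedia article."""
    params = {"action": "query", "prop": "links", "titles": title, "format": "json"}
    return API_URL + "?" + urlencode(params)

def fetch_json(url):
    """Fetch a URL and decode the JSON response (needs network access)."""
    request = Request(url, headers={"User-Agent": "data-workshop-example"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

def linked_titles(data):
    """Collect link titles, assuming the response shape shown in 'sample'."""
    titles = []
    for page in data["query"]["pages"].values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles

# Hand-written sample in the ASSUMED response shape, so the example runs offline.
sample = {"query": {"pages": {"1": {"title": "Example", "links": [
    {"ns": 0, "title": "First"}, {"ns": 0, "title": "Second"}]}}}}

print(build_links_url("Data science"))
print(linked_titles(sample))
```

In a notebook, linked_titles(fetch_json(build_links_url("Data science"))) would combine the three steps.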


  • Wordcloud of linked articles from wikipedia
  • Wordcloud from popular words on reddit

Follow up on exercises

About the Web / Internet

Web and HTTP

  • How the internet works (whiteboard)
  • Look at browser HTTP requests
  • Web data
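
To make the parts of a web request concrete, the sketch below takes a URL apart and shows the first line a browser would send for it. The URL is a made-up example on example.com, and the request line is simplified: a real request also carries headers.

```python
from urllib.parse import urlsplit, parse_qs

# A made-up URL for illustration.
url = "https://www.example.com/search?q=jupyter&page=2"

parts = urlsplit(url)
print(parts.scheme)   # protocol
print(parts.netloc)   # server to contact
print(parts.path)     # resource on that server
print(parts.query)    # parameters sent along

# The query string decoded into a dictionary of lists.
parameters = parse_qs(parts.query)
print(parameters)

# Simplified first line of the HTTP request for this URL.
request_line = f"GET {parts.path}?{parts.query} HTTP/1.1"
print(request_line)
```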


  • YouTube OGP
  • RSS-DR
  • Creative Commons photos
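
An RSS feed is XML: a channel containing items, each with a title and a link. The sketch below parses a small hand-written feed so it runs without network access. The feed content is invented, and a real feed would first be fetched from its URL.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS document; titles and links are invented.
sample_rss = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://www.example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://www.example.com/2</link>
    </item>
  </channel>
</rss>"""

# Parse the XML text into a tree of elements.
root = ET.fromstring(sample_rss)

# Every <item> is one news entry; collect its title and link.
headlines = [item.findtext("title") for item in root.iter("item")]
links = [item.findtext("link") for item in root.iter("item")]

for headline, link in zip(headlines, links):
    print(headline, link)
```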


  • List headlines
  • Show pictures from search

Follow up on exercises

Data Science

The Scientific method

  • Question
  • Hypothesis
  • Prediction
  • Testing
  • Analysis

Data Science

  • Whiteboard examples
    • Clustering
    • Models - linear regression
  • Supervised vs unsupervised
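
Linear regression is a small example of a supervised model: known (x, y) pairs are used to fit a line, which can then predict y for a new x. The sketch below uses the standard least-squares formulas in plain Python with invented data points. It is one simple way to do the fit, not necessarily the approach shown on the whiteboard.

```python
# Invented data points that lie exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least squares: slope = covariance(x, y) / variance(x).
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variance = sum((x - mean_x) ** 2 for x in xs)
slope = covariance / variance
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept

print(slope, intercept)
print(predict(10))
```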


  • Topic-space
  • Recommender
  • Clustering
  • Meta-data analysis


  • implement the examples yourself
  • explain / discuss the examples to your neighbour

Follow up on exercises


What are images

  • Images as numbers
  • What is color
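
An image can be seen as a grid of pixels, where each pixel is three numbers: red, green, and blue, each from 0 to 255. The sketch below builds a made-up 2x2 image as nested lists and converts it to grey by averaging the three channels, which is one simple choice among several.

```python
# A 2x2 image: each pixel is a (red, green, blue) tuple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

def to_grey(pixel):
    """Average the three colour channels into a single grey value."""
    red, green, blue = pixel
    return (red + green + blue) // 3

# Apply the conversion to every pixel, row by row.
grey_image = [[to_grey(pixel) for pixel in row] for row in image]
print(grey_image)
```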


  • Fetching image
  • Scale image
  • Composition
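
Scaling and composition can be tried on plain lists of numbers before moving to real images. The sketch below enlarges a made-up 2x2 grey image by repeating pixels and places two images side by side. The function names are invented, and an image library would normally do this work.

```python
def scale_up(image, factor):
    """Enlarge an image by repeating every pixel 'factor' times each way."""
    scaled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled

def side_by_side(left, right):
    """Compose two images of equal height by joining their rows."""
    return [left_row + right_row for left_row, right_row in zip(left, right)]

# A made-up 2x2 grey image: 0 is black, 255 is white.
small = [
    [0, 255],
    [255, 0],
]

big = scale_up(small, 2)
collage = side_by_side(small, small)

for row in big:
    print(row)
print(collage)
```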


  • Random collage
  • Find images via API and compose

Follow up on exercises

  • Image color analysis

Your topic

Live coding example and exercise



  • Neighbours: what is the primary thing you remember / have learned?
  • Brainstorm of cases for next time?

Further studies

NB: Python 3 vs 2.