Machine learning with Scikit-learn and Kaggle

Transcript

1 Machine Learning with Scikit-learn and Kaggle. Dr. B. Tödtli, Swiss Distance University of Applied Sciences (FFHS)

2 Machine learning with Scikit-learn and Kaggle: bots, agents, image understanding, recommendation systems

3 Program: machine learning crash course, Kaggle competitions, Kaggle kernels. Discussion: use in high-school lessons? Suitability? Necessary adjustments? Contact me!

4 Please register at kaggle.com (for later): practical machine learning, exchange of ideas and experiences, open datasets, tutorials

5 Machine learning crash course

6 What is machine learning? Machine learning is essentially curve fitting (F. Chollet). Learn to generalize from existing data to new situations. Training set: (X, y). Learn a model y = a*x + b. Model predictions on the test set: (X, ?)
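The "curve fitting" view from slide 6 can be sketched in a few lines. A minimal sketch, assuming scikit-learn and NumPy are installed; the training points (placed exactly on the line y = 2x + 1) and the test point are invented for illustration.

```python
# Minimal sketch of "machine learning as curve fitting":
# fit y = a*x + b on a training set (X, y), predict on a test set (X, ?).
import numpy as np
from sklearn.linear_model import LinearRegression

# Training set (X, y): points on the line y = 2x + 1 (invented data)
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Learn the model parameters a (slope) and b (intercept)
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on a test point whose label is unknown
X_test = np.array([[4.0]])
y_pred = model.predict(X_test)
print(model.coef_[0], model.intercept_, y_pred[0])  # close to a=2, b=1, y=9
```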

7 Important classes of ML applications. Supervised learning: classification (dog or cat?), regression. Unsupervised learning: understanding data, clustering, data compression, data generation (~ interpolation)

8 Unsupervised learning
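The unsupervised setting can be illustrated with a small clustering example. A minimal sketch, assuming scikit-learn is installed; the two blobs of 2-D points are invented and no labels are given to the algorithm.

```python
# Sketch of unsupervised learning: clustering points without any labels.
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (invented data, no labels)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# k-means discovers the two groups on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # points in the same blob get the same cluster label
```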

9 Scikit-Learn: Python package for machine learning methods. .fit(), .predict(): learning and prediction. Suitable for the intermediate level?

11 SciPy stack: Python, NumPy, Matplotlib, Jupyter Notebook, Pandas. See the workshop by U. Künzi and B. Keller: SciPy and JupyterLab for math lessons

12 An example of machine learning: the data-generating distribution

13 An example of machine learning: data-generating distribution, (training) data points in the scatter plot

14 An example of machine learning: data-generating distribution, (training) data points in the scatter plot. Add 10% noise: incorrectly labeled data points

15 An example of machine learning: data-generating distribution, (training) data points in the scatter plot. Add 10% noise: incorrectly labeled data points. The data-generating distribution is almost always unknown in practice
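The "add 10% noise" step from the example can be sketched as follows. A minimal sketch, assuming NumPy; the clean label vector (100 points, all class 0) and the 10% noise fraction are illustrative choices.

```python
# Sketch of simulating label noise: flip 10% of the labels at random.
import numpy as np

rng = np.random.default_rng(0)
y = np.zeros(100, dtype=int)  # 100 clean binary labels (invented data)

# Choose 10% of the indices without replacement and flip those labels
n_flip = int(0.1 * len(y))
flip_idx = rng.choice(len(y), size=n_flip, replace=False)
y_noisy = y.copy()
y_noisy[flip_idx] = 1 - y_noisy[flip_idx]

print(y_noisy.sum())  # exactly 10 incorrectly labeled data points
```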

16 Generalization: training points are given with a label. How should new test points be classified?

17 Supervised learning: the target function is completely unknown; only training data are given. Cost function: how well does a hypothesis fit the data (e.g. mean squared deviation)? The learning algorithm seeks the best possible hypothesis, where "best possible" is defined by the cost function. Ingredients: a hypothesis set (e.g. all kNN classifiers with k = 5), a learning algorithm with a cost function (e.g. gradient descent with a least-squares cost function), and the final hypothesis
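The cost-function idea from slide 17 can be made concrete with a tiny example. A minimal sketch, assuming NumPy; the data (generated by y = 2x + 1) and the two candidate hypotheses are invented for illustration.

```python
# Sketch of a least-squares cost function: it scores how well a
# hypothesis y = a*x + b fits the training data.
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated by y = 2x + 1 (invented)

def cost(a, b):
    """Mean squared deviation of the hypothesis y = a*x + b from the data."""
    return np.mean((a * X + b - y) ** 2)

# The learning algorithm's job is to find the hypothesis with minimal cost:
print(cost(2.0, 1.0))  # perfect hypothesis: cost 0.0
print(cost(1.0, 0.0))  # worse hypothesis: cost 7.5
```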

18 K-nearest neighbors: which class label does a new data point get? The same as its k nearest neighbors!

19 K-nearest neighbors: which class label does a new data point get? The same as its k nearest neighbors! How many neighbors should be considered?
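The k-nearest-neighbors rule takes only a few lines in scikit-learn. A minimal sketch, assuming scikit-learn is installed; the two training clusters and the test points are invented.

```python
# Sketch of k-nearest neighbors: a new point gets the majority label
# of its k nearest training points.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training set: class 0 near the origin, class 1 near (5, 5) (invented)
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Classify two new points by their k = 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # -> [0 1]
```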

20 The hyperparameter k: small k leads to overfitting, large k to underfitting!
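One standard way to choose the hyperparameter k is cross-validation. A minimal sketch, assuming scikit-learn; the `make_moons` toy dataset and the candidate values of k are illustrative choices, not from the slides.

```python
# Sketch of tuning the hyperparameter k with cross-validation:
# very small k tends to overfit, very large k to underfit.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class toy dataset (illustrative choice)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Estimate generalization accuracy for several candidate values of k
scores = {}
for k in [1, 5, 15, 51]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores, best_k)
```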

21 Overfitting in regression: fit a polynomial to data points. A perfect fit on the training points means poor generalization. Underfitting! Overfitting!
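The polynomial-regression example can be sketched as follows. A minimal sketch, assuming scikit-learn; the sine-shaped data and the degrees 1 and 9 are illustrative choices.

```python
# Sketch of under- and overfitting with polynomial regression:
# a low degree underfits, a very high degree fits the training
# points (almost) perfectly but generalizes poorly.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 10)  # noisy sine

errors = {}
for degree in (1, 9):  # degree 1 underfits, degree 9 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    errors[degree] = np.mean((model.predict(X) - y) ** 2)

print(errors)  # degree 9 has a near-zero error on the training points
```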

22 Kaggle Competitions

23 Kaggle.com: learn machine learning! Datasets, tutorials, discussion forums, Kaggle Notebooks (with computing capacity), Kaggle InClass competitions

24 Free datasets

25 Free course materials on Kaggle: approx. 4 hours of material, 19 lessons, Level 1 and Level 2

26 Discussion forums

27 Notebooks with computing capacity. Kaggle Kernels: a cloud computing environment. Objectives: exchange, learning, collaboration, reproducibility. Kernels for R and Python, Jupyter Notebooks

28 Machine learning: learning through competitions builds experience. Training dataset with correct labels; test dataset with unknown labels. Competitions: upload your predictions to Kaggle.com

29 Host your own Kaggle Competition!

30 Private Kaggle Competitions

31 Private Kaggle Competitions Contact me!

32 Higher-education example: used in 3 modules at the Swiss Distance University of Applied Sciences (FFHS). Context: continuing professional education in data science (MAS/CAS Web4B). Goal: gaining practical experience in machine learning with Python and Scikit-Learn. Student feedback: "The Kaggle competitions were great, definitely keep them."

33 University teaching example: image data. Solution: principal component analysis works well. (Sample submission table with columns id, target.)

34 Kaggle Kernels

35 New Kernel

36 InClass Kaggle competition: enter the following link, or click on Kernels. Click on the demo notebook OED2019, then click on Fork to get your own editable notebook

37 Demo kernel, or bit.ly/2tdz0gu

38 Alternative Kaggle competition: regression (supplementary material). Search for oed19 or enter the following link. Click on the Demo_Kaggle_Submission kernel, then click on Fork to get your own editable notebook

39 InClass Kaggle competition (without kernels). Alternatively: download the data and unzip it. The Jupyter notebook can also be downloaded

40 Submission! Write the submission file and upload it. Format: header line Id, y
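Writing the submission file takes two lines with pandas. A minimal sketch, assuming pandas is installed and using the column names Id and y from the slide; the ids, the predicted values, and the filename submission.csv are invented for illustration.

```python
# Sketch of writing a Kaggle submission file: a CSV with the
# header line "Id,y" followed by one prediction per row.
import pandas as pd

# Invented ids and predictions; in practice these come from your model
submission = pd.DataFrame({"Id": [0, 1, 2], "y": [0.7, 0.1, 0.9]})
submission.to_csv("submission.csv", index=False)  # index=False: no extra column

print(open("submission.csv").readline().strip())  # -> Id,y
```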

41 Take part in a competition with the kernel: edit your notebook, then commit it

42 Committing a kernel saves a version of the notebook: reproducibility of the results!

43 Participate in a competition with the kernel: commit the notebook, then submit its output as a prediction for the competition

44 Assessment in the competition. The competitive element: who is how good? Much more important, but often secret: which methods work well?

45 Leaderboard: Who will win?

46 Experiences from university teaching: Limited number of submissions

48 University teaching example: Kaggle search for male-daan-quick-mal-classify. Learning objective here: the relevant features are not x and y, but ...

49 Thank you for your attention! Contact: Questions?

50 Creating Kaggle Competitions