Data Science

Introduction

“Data Scientist” has been touted as the sexiest job of the 21st century. This course gives you a comprehensive introduction to the vast and exciting field of data science. You will learn to draw useful insights and make predictions from the data you collect. Through real-world examples and case studies, you will learn the following key facets on your journey to becoming a data scientist:

  • Data Wrangling/Munging/Scraping/Cleaning in order to transform your raw data into a manageable informative dataset.
  • Exploratory Data Analysis to create hypotheses and intuitions about the data.
  • Predictions based on machine learning algorithms like regression, classification and clustering.
  • Storytelling, or communicating your data story to the intended audience through effective visualizations.

Why Take This Course?

Almost all industries today are well aware of the importance of data-driven decision making. However, the biggest missing link in their pursuit of maximum gains is finding people with the right skill set. If you were to go by a dated McKinsey report, the United States alone was projected to face an acute shortage of nearly 190,000 such people by the year 2018. The shortfall in European and Asian markets is as yet unknown, but it suffices to say that the demand is real and astounding. Not surprisingly, Data Science courses on MOOC sites like coursera.org, edX, Udemy and Udacity are seeing unprecedented growth. However, while everyone seems to be clamouring for such courses, their completion rate is a dismal 7% or less. One of the biggest reasons for this is the lack of teacher-student interaction. This course will bridge that gap through a weekly in-person interaction with our experienced faculty, in addition to offline query-addressing over email or, if required, through video calls.

While theoretical rigour cannot be omitted, our approach is to teach data science through realistic case studies, which you will be able to showcase as “work experience” on your GitHub account. If data science were to be likened to a sport, say cricket, then a data scientist is an all-rounder. Rather than the Sachin Tendulkars or Brian Laras, the industry needs the Ravindra Jadejas and Andrew Flintoffs of the world. Through this course, we endeavour to put you on the path to success in this lucrative yet demanding field.

Knowing how to program in Python, R or Java does not make you a data scientist. Nor does a solid theoretical background in Statistics and Probability Theory alone give you the edge. This course assumes absolutely no background or prerequisites and will gently equip you with the tools and knowledge you need to swim in the deep end.

Expected Learning Outcomes

By the end of this course, you will be able to carry out the following:

  • Import data into R (a statistical programming language) from various sources, whether structured or unstructured.
  • Perform tidying, transformation and wrangling operations to make the data ready for analysis.
  • Visualize your data to reveal the patterns in it.
  • Apply machine learning algorithms like regression, classification and clustering to predict outcomes.
  • Present your findings through effective visualizations using packages like ggplot2 and base R’s plotting functions.

Course Curriculum

The course will be spread over eight weeks. Instructor-led classes in the form of a webinar will be held every week on Saturday/Sunday, with each session lasting 4 hours. Material used for teaching is the sole property of Pythian Technologies and may not be translated or copied in whole or in part without the written permission of Pythian Consultants. At the end of every week, students will be given assignments, which they will be required to turn in on their GitHub accounts no later than midnight on the following Saturday. Grades will be awarded based on performance in the assignments.

Having understood the pain areas in learning data science, Pythian Technologies has adopted an approach based on practicality. Theoretical knowledge for all the modules will be explained through relevant case studies, and all instructor-led classes will require students to write code in R. A brief outline of the course is as follows:

  1. Data Visualization using tidyverse and ggplot2.
  2. Aesthetic Mappings, Facets, Geometric Objects.
  3. Statistical Transformations.
  4. Data Transformation using dplyr and tidyr.
  5. Exploratory Data Analysis.
  6. Dealing with missing values.
  7. Patterns & Models.
  8. Working with tibbles and data frames.
  9. Parsing vectors/files.
  10. Reading/Writing to a file.
  11. Spreading & Gathering data using tidyr.
  12. Non-tidy data.
  13. Relational Data – Keys/Mutating Joins.
  14. Set Operations.
  15. Dealing with strings and factors.
  16. Handling Dates.
  17. Pipes, Functions, Vectors and Iterations in R.
  18. Model Building.
  19. Communicating Results.
  20. R Markdown Workflow.

Student Enrolment & Evaluation

Students wishing to enrol in the course will have to undergo a basic aptitude test designed to assess basic analytical skills, such as an understanding of mean, median and mode. On clearing the aptitude test, the student will be enrolled in the Foundations Of Data Science course.
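For readers who want a quick refresher on the three measures named above, here is a minimal sketch. The course itself is taught in R; Python's standard library is used here only for illustration, and the function name `summarize` and the sample values are hypothetical, not taken from the actual aptitude test.

```python
import statistics


def summarize(values):
    """Return the mean, median and mode of a non-empty list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value once sorted
        "mode": statistics.mode(values),      # most frequently occurring value
    }


if __name__ == "__main__":
    # Hypothetical sample data, chosen only to make the three measures differ.
    print(summarize([2, 3, 3, 5, 7]))
```

For the sample above, the values sum to 20 across five observations, so the mean is 4; the middle value is 3; and 3 is also the only value that appears twice, making it the mode.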

Assessment during the course will be based on the submission of weekly data projects/assignments, which will have to be uploaded to the students' respective GitHub accounts. Each assignment will carry a maximum of 10 marks, and students will be graded on the basis of a standard rubric applicable to all students. The rubric will take into account coding practices, accomplishment of the assigned task, use of functions, etc. At the end of the course, a student will be eligible for a completion certificate from Pythian Consultants if the total marks secured are greater than 75% of the maximum.
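The certificate rule can be worked through as a small sketch. It assumes one assignment per week, which would give eight assignments and a maximum of 80 marks, so a student would need more than 60; the actual number of assignments is not stated here. The function name `is_eligible_for_certificate` is hypothetical, and Python is used only for illustration (the course itself works in R).

```python
MAX_MARKS_PER_ASSIGNMENT = 10  # stated in the course description
PASS_PERCENT = 75              # total must be strictly greater than this


def is_eligible_for_certificate(marks):
    """Return True if the total marks exceed 75% of the maximum possible.

    `marks` holds one score per assignment, each between 0 and 10.
    """
    if not marks:
        return False
    for score in marks:
        if not 0 <= score <= MAX_MARKS_PER_ASSIGNMENT:
            raise ValueError(f"score out of range: {score}")
    total = sum(marks)
    maximum = MAX_MARKS_PER_ASSIGNMENT * len(marks)
    # Compare using integer arithmetic to avoid floating-point rounding.
    return total * 100 > PASS_PERCENT * maximum


if __name__ == "__main__":
    # Assumed: eight weekly assignments (one per week of the course).
    print(is_eligible_for_certificate([8] * 8))                   # 64 of 80
    print(is_eligible_for_certificate([7, 8, 7, 8, 7, 8, 7, 8]))  # 60 of 80
```

Under this assumption, a total of exactly 60 out of 80 is not enough, because the rule asks for marks greater than 75% rather than at least 75%.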

Course Fees

The course fee for the Foundations in Data Science course will be Rs 30,000/- + GST per student for the entire duration of 8 weeks, payable before the commencement of the course.

A minimum of 20 students is required to start a batch. Students who successfully complete the course are also eligible to take a specialization course, during which they will have to undertake a Capstone project.

Commander Dhiraj Khanna (retd)

Mentor


Cdr Dhiraj Khanna spent over 20 years in the Indian Navy. During this period, he was involved in the design and development of wireless military data links deployed on all platforms of the Navy. He has worked extensively on all 7 layers of the OSI model and was instrumental in the design of proprietary algorithms for the transport and network layers of the Indian Naval Data Link. For his last six years in service, Dhiraj also headed the Navy's Software Defined Radio program, a key enabler of fleet functionality of combat management systems through the use of high data rate waveforms. During his tenure in the Navy, Dhiraj also designed, developed and tested conformal log periodic dipole array antennas at the Navy’s prestigious Antenna Test Range at the Naval EMC unit in Jamnagar. With his strong background in statistical signal processing and machine learning, Dhiraj went on to design a maritime anomaly detection system for homeland security, capable of detecting anomalies in real time in the AIS data transmitted by merchant vessels.

A passionate teacher, Dhiraj believes in the dissemination of knowledge and hence mentors working professionals and students online in data science and analytics. He holds an MTech in lasers and electro-optics and has also completed a certificate course in predictive business analytics from Northwestern University, Chicago.

Commander Ajit Pal Singh (retd)

Mentor


Commander Ajit spent more than two decades of his career with the Indian Navy in the design, development and integration of the most contemporary Naval Combat Systems. His passion for the design and development of mission-critical real-time embedded systems resulted in the fielding of more than 300 such state-of-the-art systems onboard Indian Naval Ships, Submarines and Aircraft. He led a team of 50 engineers and DRDO scientists in delivering combat system integration solutions in prestigious projects such as the Indigenous Aircraft Carrier (P71), Shivalik and Kolkata Class Destroyers, the Indigenous Nuclear Submarine Program, life cycle support programs for conventional Russian (EKM) and German (SSK) submarines, and several aviation projects. He was also instrumental in the design, development and fielding of mission-critical shipboard ‘Tactical Databuses’, enabling the networking of various naval sensors and weapons on a common backbone. His postgraduate work at IIT Delhi on Sonar Target Tracking algorithms and wireless communications led him into the exciting world of ‘Prediction Filtering’ and ‘Statistical Signal Processing’. He has been instrumental in the design and development of various ‘Adaptive Filters’ using Machine Learning algorithms. Armed with a Degree in ‘Predictive Business Analytics’ from Northwestern University, Chicago, and various Business Analytics courses from Wharton, University of Pennsylvania, Commander Ajit currently applies his vast knowledge and military experience to solving business problems with Machine Learning algorithms. In addition, drawing on his extensive experience in sensor networks and systems integration, he currently heads the design and development of various ‘Internet of Things (IoT)’ solutions.

Data Science Registration