The Introduction to Data Science class will survey the foundational topics in data science, namely:

  • Data Manipulation
  • Data Analysis with Statistics and Machine Learning
  • Data Communication with Information Visualization
  • Data at Scale -- Working with Big Data
  • Introduction to R

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

  • Any graduate (B.A., B.Com., B.Sc.)
  • Engineering students (B.Tech, B.E., M.Tech)
  • BCA, MCA
  • Any diploma holder
  • Any working professional
  • Data Warehouse Administrators
  • Database Administrators
  • Software Testers
  • Project Managers
  • MIS Support

  • Fundamentals of Big Data

    • What is Big Data
    • Managing Big Data
    • Extracting insights from Big Data
    • Big Data for business intelligence

     

    Development Environment

    • Introduction to Databricks
    • Databricks account setup

     

    Spark - DataFrame API

    • Create DataFrame
    • Schema Inference
    • File formats - awareness
    • Define custom schema
    • Introduction to DBFS

     

    Spark - DataFrame - Functions

    • Functions, Filters & Aggregations
    • Windowing
    • Partitions & Bucketing
    • Joins

     

    Spark SQL

    • Functions, Filters & Aggregations
    • Windowing
    • Partitions & Bucketing
    • Joins

     

    Spark Architecture

    • Why Spark
    • Distributed computing
    • Cluster concepts
    • RDD concepts
    • Memory Management
    • Spark Optimization
    • Structured Streaming
    • Spark UI

     

    ELT with Spark SQL

    • Data Extraction techniques
    • Data load features
    • Transformation techniques
    • Delta Lake
    • Lakehouse architecture

     

    Big Data Ecosystem

    • Resource Manager (YARN)
    • Hive
    • Fundamentals of cloud computing
    • Introduction and cloud computing architecture
    • Delivery models, deployment models, and benefits of moving to the cloud

     

    Just Enough Scala/Python for Spark Programmers

    • Getting started with Python
    • Variables and data types
    • Loops and conditions
    • Methods
    • Functions and packages
    • Collections and classes

     

    Project: Real-Life Case Study

  • Understand business intelligence, business analytics, and data analytics.
  • Understand business data analysis using data application tools.
  • Learn how to apply Tableau and MapReduce, and get an introduction to R and R+.
  • Understand data mining methods and how to build decision trees.
  • Explore different aspects of Big Data Technologies.
  • Learn the concepts of loop functions and debugging tools.