Prerequisites:

  • Basic skills in at least one programming language are desirable.

 

Training Program Description:

  • The capability to collect and store huge amounts of varied data necessitates new techniques and methodologies for processing and analyzing big data. The program gives comprehensive coverage of a number of technologies at the foundation of the big data movement, including the Hadoop architecture and its ecosystem of tools.
  • The program provides an introduction to machine learning and statistical data analysis, starting from basic probability theory and statistics. Topics such as parameter estimation, hypothesis testing, and regression analysis are covered. Machine learning topics include Bayes classifiers, k-NN, decision trees, SVM, k-means, principal component analysis, independent component analysis, and neural networks.
  • Trainees build on their combined knowledge of Big Data technologies (e.g. Hadoop, Spark) and Data Science (e.g. statistics, machine learning) and learn how that combination is applied to real-world problems. The program also familiarizes trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. Use cases drawn from real problems such as network traffic, text analytics, and financial applications are addressed.
  • The program links machine learning theories and methods to practical, real-life use cases. The hands-on labs reinforce these methods, with a deeper focus on applying them to help customers realize business value.

Projects

    • This program is comprised of many career-oriented projects. Each project you build will be an opportunity to demonstrate what you’ve learned in the lessons. Your completed projects will become part of a career portfolio that will demonstrate to potential employers that you have skills in data analysis and feature engineering, machine learning algorithms, and training and evaluating models.
    • One of our main goals at ETI is to help you create a job-ready portfolio of completed projects. Building a project is one of the best ways both to test the skills you’ve acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you’ll have the opportunity to prove your skills by building the following projects:
        • Project 1: Exploring the Titanic Survival Data
        • Project 2: Predicting Housing Prices
        • Project 3: Finding Donors for Charity
        • Project 4: Creating Customer Segments
        • Project 5: Dog Breed Recognition (Deep Learning)
        • Project 6: Teach a Quadcopter to Fly
        • Project 7: Explore Weather Trends
        • Project 8: Investigate a Dataset
        • Project 9: Analyze Experiment Results
        • Project 10: Wrangle and Analyze Data
        • Project 11: Communicate Data Findings
        • Project 12: Crime Prediction
        • Project 13: Simulating and Predicting Traffic
        • Project 14: Fraud Detection
    • Capstone projects in many fields
      • Business
      • Trading

 

Program Duration: 150 hours

Program Language: English / Arabic

Location: EPSILON TRAINING INSTITUTE | Head Office

 

Participants will be granted a completion certificate from Epsilon Training Institute, USA if they attend at least 80 percent of the direct contact hours of the program and fulfill the program requirements (passing both the final exam and the project).

 

COURSE CONTENTS

 

I- Introduction to Big Data, Developing with Spark and Hadoop

  • Introduction to Hadoop and MapReduce SQL JOINS
    • Hadoop Ecosystems
    • Hadoop Clusters
    • MapReduce API Concepts
    • Writing and Testing Basic MapReduce Programs
  • Hadoop API
    • ToolRunner Class
    • Accessing HDFS Programmatically
    • Using the Hadoop API's Library of Mappers, Reducers, and Partitioners
  • Managing Data Input and Output
  • Common MapReduce Algorithms
    • Sorting and Searching Large Data Sets
    • Indexing Data
    • Computing Term Frequency-Inverse Document Frequency (TF-IDF)
    • Calculating Word Co-Occurrence
  • Joining Data Sets in MapReduce Jobs
  • Hadoop Tools for Data Acquisition
  • Practical Development Tips and Techniques
    • Strategies for Debugging and Testing MapReduce Code
    • Reusing Objects
    • Creating Map-only MapReduce Jobs
  • Pig
    • Complex Data Analysis with Pig
    • Multi Dataset Operations with Pig
    • Extending Pig
    • Pig Troubleshooting and Optimization
  • Hive
    • Relational Data Analysis with Hive
    • Hive Data Management
    • Text Processing with Hive
    • Hive Optimization
    • Extending Hive
  • Analyzing Data with Impala
  • Introduction to Spark
    • Spark Basics
    • Working with Resilient Distributed Datasets (RDDs)

 

II- Introduction to Machine Learning and Statistical Analysis

A- Introduction

  • Applications
  • Relation between Statistics and Learning
  • Supervised, Unsupervised and Reinforcement Learning
  • Linear Algebra Review
    • Vector and Matrix Operations
    • Matrix Inverse and Decomposition
    • The Eigenvalue Problem
  • Analysis Tools
    • Python Programming
    • Waikato Environment for Knowledge Analysis (WEKA)
    • Azure Platform

B- Statistical Analysis

  • Probability Theory Review
    • Marginal and Joint Probabilities
    • Conditional Probabilities
    • Bayes’ Rule
    • Prior and Posterior Probabilities
    • Probability Distributions
    • Expected Value, Variance and Covariance
  • Statistical Parameter Estimation:
    • Types of Estimators
    • Random Sampling of a Population
    • Estimation of the Mean and Variance
    • Detection of Outliers
    • Data Representation and Visualization
  • Hypothesis Testing
    • Confidence Interval and p-value
    • Alternative Hypotheses
    • Z-test and T-test
  • Regression Analysis
    • Assumptions of Linear Regression
    • Simple Linear Regression
    • Error Analysis

C- Machine Learning

  • Linear Classification:
    • Discriminant Functions
    • Discriminant Functions Properties
    • Least Squares Classifier
    • Fisher’s Linear Discriminant
    • Perceptron
  • Probabilistic Generative Models
    • Maximum Likelihood Estimation of Gaussian Generative Model
    • Naive Bayes Classifier
  • Probabilistic Discriminative Models
    • Logistic Regression
  • Non-linear Classification
    • Instance-based Learning:
    • K-nearest Neighbor Classifier
      • Cross-validation
      • Weighted K-nearest Neighbor Classifier
    • Support Vector Machines
    • Decision Tree Learning
    • Artificial Neural Networks:
      • Network Architecture
      • Back-propagation Learning
  • Introduction to Reinforcement Learning
    • Markov Decision Process
    • Q-learning
    • Non-deterministic Rewards and Actions

III- Advanced Big Data Analytics Technologies and Applications

  • Analyzing Data with Scala and Spark
  • Predicting Forest Cover with Decision Trees
  • Anomaly Detection in Network Traffic with K-means Clustering
  • Understanding Wikipedia with Latent Semantic Analysis
  • Analyzing Co-occurrence Networks with GraphX
  • Geospatial and Temporal Data Analysis on Taxi Trip Data
  • Estimating Financial Risk through Monte Carlo Simulation

 

IV- Practical Data Science Using Machine Learning Techniques

  • Big Data and Data Science: Use Cases
    • Walk through example use cases
    • The use case: solution overview and architecture
    • The practice of Data Science vs. traditional DW/BI
    • Data Sciences and Big Data Applications: Value to the Business
  • Data Preprocessing
    • Features Extraction and Transformations
    • Dimensionality Reduction
    • Visualization and Exploratory Data Analysis
    • Data Integration, Quality and Implications
    • Handling Big Datasets
  • Advanced Data Analysis Methods with Applications
    • Unstructured Data Methods
    • Association Rules: Understanding Customer Behavior
    • Clustering Techniques: Optimized Logistics
    • Classification Methods: Prediction of Traffic Status
    • Network Analysis Techniques: Discover Social Patterns
    • Big Graph: Analyzing Electric Power Grids
    • Ensemble Learning Techniques
  • Practicing Big Data Sciences in Real Life
    • The Data Sciences modeling lifecycle
    • Machine Learning modeling for Big Data applications
    • Data Sciences application implementation lifecycle
    • Deep Learning for Complex Data Science Models
    • Machine Learning Agile modeling approach
    • Data Driven Transformation for Organizations
    • Consulting Skills for the Data Sciences and Big Data Solutions
    • Deployment Considerations for the Big Data Platforms

V- Hands-on Group Project Based on Real-life Use Case

 


 



11877 STUDENTS ENROLLED

    Copyright © 2018 Epsilon Registered in Egypt with company no. 118268