
Prerequisites:

  • Data Science Specialist Certificate

 

Training Program Description:

  • The ability to collect and store huge amounts of varied data calls for new techniques and methodologies for processing and analyzing big data. This course provides comprehensive coverage of several technologies at the foundation of the big data movement, including the Hadoop architecture and its ecosystem of tools.
  • Provide an introduction to machine learning and statistical data analysis. The course introduces basic probability theory, statistics, and statistical data analysis, covering topics such as parameter estimation, hypothesis testing, and regression analysis. It also covers machine learning topics including Bayes classifiers, KNN, decision trees, SVM, K-means, principal component analysis, independent component analysis, and neural networks.
  • Build on trainees' combined knowledge of Big Data technologies (e.g. Hadoop, Spark) and Data Science (e.g. statistics, machine learning) and understand how that combination is used to solve real-world problems. The program also aims to familiarize trainees with the latest technological and scientific trends in the field and with how Big Data and data science are used in modern business enterprises. Use cases drawn from real problems such as network traffic, text analytics, and financial applications will be addressed.
  • Link machine learning theories and methods to practical, real-life use cases. The hands-on labs will reinforce these methods, with a deeper focus on applying them to help customers realize business value.

Projects

    • This program comprises many career-oriented projects. Each project you build is an opportunity to demonstrate what you've learned in the lessons. Your completed projects will become part of a career portfolio that shows potential employers your skills in data analysis and feature engineering, machine learning algorithms, and training and evaluating models.
    • One of our main goals at EAII is to help you create a job-ready portfolio of completed projects. Building a project is one of the best ways to test the skills you've acquired and to demonstrate your newfound abilities to future employers or colleagues. Throughout this program, you'll have the opportunity to prove your skills by building the following projects:

 

  • Project 1: Simulating and Predicting Traffic
  • Project 2: Crime Prediction
  • Project 3: Fraud Detection
  • Project 4: Explore Weather Trends
  • Project 5: Big Data with Spark
  • Capstone Project

 

Program Duration: 5 weeks

Program Language: English / Arabic

Location: EPSILON AI INSTITUTE | Head Office

 

Participants will be granted a completion certificate from Epsilon AI Institute, USA, if they attend at least 80 percent of the program's direct contact hours and fulfill the program requirements (passing both the final exam and the project).

 

COURSE CONTENTS

 

I-Introduction to Big Data, Developing with Spark and Hadoop

  • Introduction to Hadoop and MapReduce SQL JOINS
    • Hadoop Ecosystems
    • Hadoop Clusters
    • MapReduce API Concepts
    • Writing and Testing Basic MapReduce Programs
  • Hadoop API
    • ToolRunner Class
    • Accessing HDFS Programmatically
    • Using the Hadoop API's Library of Mappers, Reducers, and Partitioners
  • Managing Data Input and Output
  • Common MapReduce Algorithms
    • Sorting and Searching Large Data Sets
    • Indexing Data
    • Computing Term Frequency-Inverse Document Frequency (TF-IDF)
    • Calculating Word Co-Occurrence
  • Joining Data Sets in MapReduce Jobs
  • Hadoop Tools for Data Acquisition
  • Practical Development Tips and Techniques
    • Strategies for Debugging and Testing MapReduce Code
    • Reusing Objects
    • Creating Map-only MapReduce Jobs
  • Pig
    • Complex Data Analysis with Pig
    • Multi-Dataset Operations with Pig
    • Extending Pig
    • Pig Troubleshooting and Optimization
  • Hive
    • Relational Data Analysis with Hive
    • Hive Data Management
    • Text Processing with Hive
    • Hive Optimization
    • Extending Hive
  • Analyzing Data with Impala
  • Introduction to Spark
    • Spark Basics
    • Working with Resilient Distributed Datasets (RDDs)
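
To give a feel for the MapReduce algorithms listed in this module, here is a minimal sketch of the map, shuffle, and reduce steps behind a TF-IDF computation. It is written in plain Python rather than against the Hadoop or Spark APIs, and every function name in it (map_phase, shuffle, reduce_phase, tf_idf) is a hypothetical helper chosen for illustration, not part of the course material. It also assumes raw term counts as the term frequency and log(N / df) as the inverse document frequency; other weightings are common.

```python
import math
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit ((term, doc_id), 1) for every token in one document."""
    for token in text.lower().split():
        yield (token, doc_id), 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts collected for each (term, doc_id) key."""
    return {key: sum(values) for key, values in groups.items()}


def tf_idf(docs):
    """Return a TF-IDF score per (term, doc_id) for a dict of doc_id -> text."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    term_counts = reduce_phase(shuffle(pairs))

    # Document frequency: number of distinct documents that contain each term.
    doc_freq = defaultdict(int)
    for term, _doc_id in term_counts:
        doc_freq[term] += 1

    n_docs = len(docs)
    # Assumption: TF is the raw count, IDF is log(N / df).
    return {
        (term, doc_id): count * math.log(n_docs / doc_freq[term])
        for (term, doc_id), count in term_counts.items()
    }


# Toy corpus, invented for this example.
docs = {"d1": "big data with hadoop", "d2": "big data with spark"}
scores = tf_idf(docs)
```

Terms that appear in every document, such as "big" here, score zero, while a term unique to one document, such as "hadoop", scores highest. On a real cluster the shuffle step is handled by the framework and the mapper and reducer run in parallel across nodes.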

II-Advanced Big Data Analytics Technologies and Applications

  • Analyzing Data with Scala and Spark
  • Predicting Forest Cover with Decision Trees
  • Anomaly Detection in Network Traffic with K-means Clustering
  • Understanding Wikipedia with Latent Semantic Analysis
  • Analyzing Co-occurrence Networks with GraphX
  • Geospatial and Temporal Data Analysis on Taxi Trip Data
  • Estimating Financial Risk through Monte Carlo Simulation

 

III-Hands-on Group Project Based on Real-life Use Case

 

Download Big Data Specialist Certificate Brochure PDF

 



Copyright © 2018 Epsilon Registered in Egypt with company no. 118268