Karlebovej 91, 3400 Hillerød | Krajbjergvej 3, 8541 Skødstrup
70 22 29 29
08:30 - 17:00

Performing Data Engineering on Microsoft HDInsight 20775 (70-775)


Course info

  • This course lasts 4 days
  • Course materials are included with this course
  • This course costs 5 clips on your clip card
  • Full catering (breakfast, lunch, afternoon cake, and coffee and soft drinks ad libitum)
  • All exam attempts are included in the price
  • A Practice Test is included in the price
  • You have remote access to your course PC for a total of 3 weeks

Microsoft Vouchers

5 vouchers + DKK 4,950 for the certification package.

Reliable open-source analytics with an industry-leading SLA. HDInsight is designed for full redundancy and high availability, including head node replication, geo-replication of data, and a built-in standby NameNode, which makes HDInsight resilient to critical failures that standard Hadoop implementations do not account for.

Protect your data assets and extend on-premises security and governance to the cloud with HDInsight. Get single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities through Azure Active Directory. Authorize users and groups with fine-grained access control policies across all enterprise data with Apache Ranger.

HDInsight complies with the HIPAA (Health Insurance Portability and Accountability Act), PCI (Payment Card Industry), and SOC (Service Organization Controls) standards, which helps you ensure that your organization's data assets are always well protected.

To support the highest level of business continuity, HDInsight offers extended capabilities for critical alerts, monitoring, and defining proactive actions, and you also get extended workload protection through built-in integration with Microsoft Operations Management Suite.

Course Outline

This course is for you if you want to get started with Hadoop, HDInsight, big data, and MapReduce. The course covers the setup and operation of an HDInsight cluster, configuration of users, loading of data, setting up ETL jobs, and analyzing datasets with MapReduce or Spark SQL, Hive, and Phoenix. We also cover how to perform stream analytics on the platform with Kafka and HBase as well as Apache Storm, and how to present it all in a simple, clear Power BI dashboard.

Module 1: Getting Started with HDInsight
This module introduces Hadoop, the MapReduce paradigm, and HDInsight.

Lessons

  • Big Data
  • Hadoop
  • MapReduce
  • HDInsight

Lab : Querying Big Data

  • Query data with Hive
  • Visualize data with Excel

After completing this module, students will be able to:

  • Describe big data.
  • Describe Hadoop.
  • Describe MapReduce.
  • Describe HDInsight.
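
As a rough illustration of the MapReduce paradigm this module introduces, here is a minimal single-process sketch of a word count in plain Python. It is not taken from the course material and does not use Hadoop or HDInsight; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are hypothetical and chosen only for illustration. It shows the three conceptual steps: map emits key/value pairs, shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on many nodes and the framework handles the shuffle; the sketch only mirrors the data flow.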

Module 2: Deploying HDInsight Clusters
At the end of this module the student will be able to deploy HDInsight clusters.

Lessons

  • HDInsight cluster types
  • Managing HDInsight Clusters
  • Managing HDInsight Clusters with PowerShell

Lab : Managing HDInsight clusters with the Azure Portal

  • Create an HDInsight Hadoop Cluster
  • Customize HDInsight using a script action
  • Customize HDInsight using Bootstrap
  • Delete an HDInsight cluster

After completing this module, students will be able to:

  • Describe HDInsight cluster types.
  • Describe the creation, management, and deletion of HDInsight clusters with the Azure portal.
  • Describe the creation, management, and deletion of HDInsight clusters with PowerShell.

Module 3: Authorizing Users to Access Resources
This module covers permissions and the assignment of permissions.

Lessons

  • Non-domain-joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters

Lab : Authorizing Users to Access Resources

  • Configure a domain-joined HDInsight cluster
  • Configure Hive policies

After completing this module, students will be able to:

  • Describe how to authorize user access to objects.
  • Describe how to authorize users to execute code.
  • Describe how to manage domain-joined HDInsight clusters.

Module 4: Loading data into HDInsight
This module covers loading data into HDInsight.

Lessons

  • HDInsight Storage
  • Data loading tools
  • Performance and reliability

Lab : Loading Data into HDInsight

  • Loading data using Sqoop
  • Loading data using AzCopy
  • Loading data using AdlCopy
  • Use HDInsight to compress data

After completing this module, students will be able to:

  • Describe HDInsight storage configurations and architectures.
  • Describe options for loading data into HDInsight.
  • Describe benefits of compression and pre-processing in HDInsight.
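
To give a feel for why compression matters when loading data, the following sketch measures how much a repetitive text file shrinks under gzip. It is only an illustration under stated assumptions: it uses Python's standard gzip module rather than any HDInsight tool or codec setting, the helper names (make_sample_log, compression_ratio) are hypothetical, and the log lines are invented sample data.

```python
import gzip


def make_sample_log(rows: int) -> bytes:
    """Build a repetitive, made-up web log as UTF-8 bytes."""
    lines = (
        f"2017-01-01 00:00:{i % 60:02d} GET /index.html 200\n"
        for i in range(rows)
    )
    return "".join(lines).encode("utf-8")


def compression_ratio(raw: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not raw:
        raise ValueError("raw must not be empty")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    sample = make_sample_log(10_000)
    print(f"raw bytes: {len(sample)}")
    print(f"compressed/raw ratio: {compression_ratio(sample):.3f}")
```

Log-style data with many repeated fields tends to compress well, which reduces both storage and transfer time. The actual ratio depends on the data and the codec chosen.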

Module 5: Troubleshooting HDInsight
This module describes how to troubleshoot HDInsight.

Lessons

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite

Lab : Troubleshooting HDInsight

  • Analyze HDInsight logs
  • Analyze YARN logs
  • Monitor resources with Operations Management Suite

After completing this module, students will be able to:

  • Analyze HDInsight logs.
  • Analyze YARN logs.
  • Analyze Heap dumps.
  • Use the Operations Management Suite to monitor resources.

Module 6: Implementing Batch Solutions
This module describes how to implement batch solutions.

Lessons

  • Apache Hive storage
  • Querying with Hive and Pig
  • Operationalize HDInsight

Lab : Implementing Batch Solutions

  • Load data into a Hive table
  • Query data with Hive and Pig

After completing this module, students will be able to:

  • Describe Apache Hive storage.
  • Query data using Hive and Pig.
  • Operationalize HDInsight.

Module 7: Design Batch ETL solutions for big data with Spark
This module describes how to design batch ETL solutions for big data with Spark.

Lessons

  • What is Spark?
  • ETL with Spark
  • Spark performance

Lab : Design Batch ETL solutions for big data with Spark

  • Create an HDInsight Cluster with access to Data Lake Store
  • Use HDInsight Spark cluster to analyze data in Data Lake Store
  • Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
  • Managing resources for Apache Spark cluster on Azure HDInsight

After completing this module, students will be able to:

  • Describe Spark and when to use it.
  • Describe the use of ETL with Spark.
  • Analyze Spark performance.

Module 8: Analyze Data with Spark SQL
This module describes how to analyze data with Spark SQL.

Lessons

  • Implement interactive queries
  • Perform exploratory data analysis

Lab : Analyze data with Spark SQL

  • Implement interactive queries
  • Perform exploratory data analysis

After completing this module, students will be able to:

  • Implement interactive queries.
  • Perform exploratory data analysis.

Module 9: Analyze Data with Hive and Phoenix
This module describes how to analyze data with Hive and Phoenix.

Lessons

  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

Lab : Analyze data with Hive and Phoenix

  • Implement interactive queries for big data with interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

After completing this module, students will be able to:

  • Implement interactive queries with interactive Hive.
  • Perform exploratory data analysis using Hive.
  • Perform interactive processing by using Apache Phoenix.

Module 10: Stream Analytics
This module introduces Azure Stream Analytics.

Lessons

  • Stream analytics
  • Process streaming data from stream analytics
  • Managing stream analytics jobs

Lab : Implement Stream Analytics

  • Process streaming data with stream analytics
  • Managing stream analytics jobs

After completing this module, students will be able to:

  • Describe stream analytics and its capabilities.
  • Process streaming data with stream analytics.
  • Manage stream analytics jobs.

Module 11: Spark Streaming using the DStream API
This module introduces the DStream API and describes how to create Spark structured streaming applications.

Lessons

  • DStream
  • Create Spark structured streaming applications
  • Persistence and visualization

Lab : Spark streaming applications using DStream API

  • Creating Spark streaming applications using the DStream API
  • Creating Spark structured streaming applications

After completing this module, students will be able to:

  • Explain DStream.
  • Create Spark structured streaming applications.
  • Describe persistence and visualization.

Module 12: Develop big data real-time processing solutions with Apache Storm
This module explains how to develop big data real-time processing solutions with Apache Storm.

Lessons

  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Lab : Developing big data real-time processing solutions with Apache Storm

  • Stream data with Storm
  • Create Storm Topologies

After completing this module, students will be able to:

  • Persist long term data.
  • Stream data with Storm.
  • Create Storm topologies.
  • Configure Apache Storm.

Related courses

Querying Data with Transact-SQL MOC 20761 (70-761)
DEVELOPER TRAINING FOR APACHE HADOOP
Perform Big Data Engineering on Microsoft Cloud Services 20776 (70-776)
Analyzing Big Data with Microsoft R 20773 (70-773)
DEVELOPER TRAINING FOR APACHE SPARK