About this Event
Many companies are placing their corporate information into data lakes in the cloud. Because storage is cheap, the amount of data stored in the lake can easily exceed the amount of data seen in a typical relational database. Regardless of the types of files in the data lake, there is always a need to transform the raw data files into refined data files for analytics, machine learning, and/or AI.
The Delta Lakehouse design uses a medallion (bronze, silver, and gold) architecture for data quality. We can abstract the read and write actions in Spark to create dynamic notebooks that process data files. Data pipelines can be used to bring remote data into the lake as well as to orchestrate data processing. A metadata-driven design allows the inputs to the dynamic notebooks to be stored in a central place.
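As a rough illustration of the metadata-driven idea, the sketch below turns one metadata row into the parameters a generic notebook would need. It is not taken from the course material: the `DatasetConfig` fields, the `/lake` root, the zone folder layout, and the mapping of incremental loads to an append are all assumptions made for the example.

```python
from dataclasses import dataclass

# Medallion zones named in the course description.
ZONES = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical metadata row; field names are illustrative only."""
    name: str           # logical dataset name, e.g. "customers"
    source_format: str  # raw file format, e.g. "csv", "parquet", "json"
    load_type: str      # "full" or "incremental"


def zone_path(root: str, zone: str, dataset: str) -> str:
    """Build a folder path for a dataset in a zone (assumed layout: root/zone/dataset)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{root.rstrip('/')}/{zone}/{dataset}"


def build_plan(cfg: DatasetConfig, root: str) -> dict:
    """Translate one metadata row into read/write parameters for a generic notebook."""
    if cfg.load_type not in ("full", "incremental"):
        raise ValueError(f"unknown load type: {cfg.load_type}")
    return {
        "read_format": cfg.source_format,
        "read_path": zone_path(root, "bronze", cfg.name),
        "write_format": "delta",
        "write_path": zone_path(root, "silver", cfg.name),
        # Simplifying assumption: full loads overwrite, incremental loads append.
        "write_mode": "overwrite" if cfg.load_type == "full" else "append",
    }


if __name__ == "__main__":
    # Hypothetical rows that would normally live in a central metadata store.
    rows = [
        DatasetConfig("customers", "csv", "full"),
        DatasetConfig("orders", "parquet", "incremental"),
    ]
    for row in rows:
        print(build_plan(row, "/lake"))
```

In a Spark notebook, a plan like this could feed `spark.read.format(...).load(...)` and `df.write.format(...).mode(...).save(...)`, so that one notebook serves many datasets. Real incremental loads often need a merge rather than a plain append.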
The most important part of a modern data platform is security. Microsoft Entra, formerly known as Azure Active Directory, can be used to secure the files in storage. This security layer is used in both the Apache Spark and Serverless SQL pools.
Designers use a variety of tools for reporting. The Serverless SQL Pool turns data lake files into read-only database tables. While the demos in this course are Azure specific, the concepts can be used with any cloud service.
Lessons:
- Infrastructure deployment (storage, key vault, Databricks, Synapse)
- Create a service principal for services
- Create medallion zones + assign rights
- Introduction to Data Factory pipelines
- How to create a hybrid design
- Working with different sources (database, file shares, REST APIs)
- Hard coding vs metadata design
- Full vs incremental load patterns
- Configuring clusters + storage for security
- Writing data engineering notebooks
- Orchestrating pipelines with Data Factory
- Creating a presentation layer with Synapse Serverless Pools
- Connecting to Synapse with Power BI
About the Speaker
John Miner
John Miner is a Senior Data Architect at Insight Digital Innovation, helping corporations solve their business needs with data platform solutions.
He has over thirty years of data processing experience, and his architecture expertise encompasses all phases of the software project life cycle, including design, development, implementation, and maintenance of systems.
His credentials include undergraduate and graduate degrees in Computer Science from the University of Rhode Island. Also, he has earned certificates from Microsoft for Database Administration (MCDBA), System Administration (MCSA), Data Management & Analytics (MCSE) and Data Science (MPP).
John has been recognized with the Microsoft MVP award seven times for his contributions to the Data Platform community.
When he is not busy talking to local user groups or writing blog entries on new technology, he spends time with his wife and daughter enjoying outdoor activities. Some of John’s hobbies include woodworking projects, crafting a good beer, and playing a game of chess.
Event Venue & Nearby Stays
Massry Center for Business and School of Business Accolades, Albany, United States
USD 150.00