While the majority of NHS trusts make use of Microsoft's data products, be that Power BI or Excel for analysis, or SQL Server for data storage and manipulation, very few have so far adopted Azure Synapse Analytics, one of the best analytics solutions available to a forward-thinking healthcare organisation.
Here are some of the reasons why I, as a former NHS Business Intelligence Developer and team lead, feel Synapse Analytics is the perfect fit for healthcare organisations.
Before thinking about the benefits of Synapse Analytics it’s worth considering the benefits of Microsoft’s Azure Cloud services in general.
My Four Top Benefits of Azure
Primarily these derive from the shift in focus away from Microsoft selling software that you then have to install and maintain, on hardware that you also need to install and maintain, towards providing the software or platform as a service.
1. Microsoft looks after your software. No more downtime for patch days, no more negotiations to get on the update schedule to get an important bug fix applied.
2. With Azure you’re on the latest version with the latest features all of the time and it’s Microsoft’s job to ensure that happens without interruption.
3. Microsoft looks after the hardware for you too, making sure it's upgraded and functional and hosting it in their own secure environments. With that comes a level of robustness that few healthcare organisations could match themselves.
4. The London-based UK South region consists of three independent zones, each with separate power and data feeds, and redundant copies of your data can sit in each of them. If that's not enough, there's the option of adding redundancy with the UK West region in Cardiff or (GDPR permitting) other data centres located all around the globe.
What is Synapse Analytics?
Microsoft describes Synapse Analytics as ‘A unified analytics platform’ that allows you to perform data integration, data exploration, data warehousing, big data analytics, and machine learning tasks from a single, unified environment.
What this means is that it’s multiple analytical services living under a single umbrella covering virtually all your data handling needs in a single product. Let’s look at what these are.
Probably the first point of contact is storage: Synapse includes Azure Data Lake Storage Gen2. In other words, lots of disks joined up for you, with fault tolerance and redundant copies of your data handled automatically. Whoever first said "disk is cheap" had never seen a quote for adding solid state disks to a local SAN, but the cost of storage in Azure Data Lakes is genuinely low, to the point that it introduces a new way of thinking about your data. There's no longer a need to clear out older versions of files, and far more room to keep an archive of older data for audit purposes or to allow for a full rebuild of historical data. There's room to simply store data upfront in its original format and then figure out the best approach to making use of it, without playing catch-up on the source files.
And while the costs of storage are already low, as with many cloud-based services there's room to squeeze them further by choosing between Hot, Cool and Archive access tiers depending on usage patterns.
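As a rough sketch of how simple that tiering is in practice, a single Azure CLI call can move an older extract to the cheaper Cool tier (the account, container and file names below are placeholders, not real resources):

```shell
# Move a historical extract to the cheaper Cool access tier.
# Account, container and blob names are illustrative placeholders.
az storage blob set-tier \
  --account-name mytrustdatalake \
  --container-name raw \
  --name admissions/2019/extract.parquet \
  --tier Cool
```

The same command accepts `Hot` or `Archive`, so tiering decisions can be scripted as part of routine housekeeping.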
Storing your data is important, but there’s no benefit to it unless you can query and manipulate it and Synapse provides an option for every situation.
Firstly, there are Serverless SQL Pools which, as the name suggests, are essentially SQL Server without the server. There are no ongoing costs beyond the raw file storage, and you pay by the volume of data queried. At the time of writing, querying a gigabyte of data will cost less than a penny.
Serverless Pools allow you to turn files in the lake into external tables and views to query directly with T-SQL. No physical transformations or need to wait for ingestion and processing. Instant access for those that need it with the SQL skills they already have. It’s useful for large datasets that are rarely accessed or a first step in the ETL process. Simple fixes to missing values, deduplication and other tweaks can be performed simply on the way to their final destination.
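As an illustrative sketch of what that direct querying looks like (the storage path and column names below are placeholders, not a real dataset), a serverless pool can read parquet files straight from the lake with ordinary T-SQL:

```sql
-- Query a folder of parquet files in the lake directly, no ingestion step.
-- The storage URL and column names are illustrative placeholders.
SELECT TOP 100
    PatientId,
    AdmissionDate,
    Ward
FROM OPENROWSET(
    BULK 'https://mytrustdatalake.dfs.core.windows.net/raw/admissions/*.parquet',
    FORMAT = 'PARQUET'
) AS admissions;
```

Wrap the same `OPENROWSET` call in a view or external table and analysts can treat the lake like any other database.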
In some cases, with clean, well-modelled source data, Serverless Pools are all you need and can feed into your Power BI model directly. The time between sourcing your data and delivering reports in a robust, repeatable, and auditable way is shorter than it has ever been before.
Transforming Your Data With Pipelines
If Serverless Pools don’t meet your needs and there’s a requirement to shape your data further with an ETL or ELT process – there’s an option for every situation and skill set.
Synapse pipelines are a subset of Azure Data Factory. As a starting point, they will run your existing SSIS packages, so your existing skills are fully reusable. When you're ready to move up to Synapse's own pipelines, anyone who's used SSIS will find themselves at home: the code-free drag-and-drop pipeline building experience will feel familiar, and you'll appreciate the improvements made over its on-premises predecessor.
For the more technical types, there are Spark Pools. Data science teams will feel right at home with this implementation of Apache Spark which allows them to hit the ground running coding in familiar languages such as Python and R. Spark even has its own dialect of SQL for those who prefer to use it.
Spark is a viable option for ETL work but also opens up the world of data science, providing the tools for data scientists to develop AI and machine learning solutions with the computing power behind them to deliver results quickly.
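To make the ETL side of Spark concrete, here is a minimal PySpark sketch of the kind of clean-up step mentioned above, deduplicating a raw extract and filling missing values before writing it back to the lake. The paths and column names are illustrative placeholders, and this assumes it runs inside a Synapse Spark pool session:

```python
# Illustrative PySpark sketch: deduplicate a raw extract, fill missing
# values, and write the cleaned result back to the lake as parquet.
# Paths and column names are placeholders, not real resources.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = spark.read.parquet(
    "abfss://raw@mytrustdatalake.dfs.core.windows.net/admissions/"
)

clean = (
    raw.dropDuplicates(["PatientId", "AdmissionDate"])  # remove duplicate rows
       .na.fill({"Ward": "Unknown"})                    # fix missing values
)

clean.write.mode("overwrite").parquet(
    "abfss://curated@mytrustdatalake.dfs.core.windows.net/admissions/"
)
```

The same transformation could equally be written in Spark SQL or R, which is the point: teams use the language they already know.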
SQL Server Dedicated Pools
When data sizes get really big, SQL Server Dedicated Pools come into play. Previously known as Azure SQL Data Warehouse, this is a variant of SQL Server rebuilt for massively parallel processing. If you have terabytes of data, or tables with row counts in the billions, Dedicated Pools' ability to distribute your data over numerous nodes and process queries in parallel brings high performance to your queries even at massive scale.
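That distribution is declared when a table is created. A hedged sketch, with a hypothetical fact table and columns chosen for illustration, shows how a dedicated pool spreads rows across its distributions by hashing a key column:

```sql
-- Illustrative: hash-distribute a large fact table across the pool's
-- distributions so joins and aggregations run in parallel.
-- Table and column names are placeholders.
CREATE TABLE dbo.FactAttendance
(
    PatientKey      INT  NOT NULL,
    AttendanceDate  DATE NOT NULL,
    SiteKey         INT  NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(PatientKey),
    CLUSTERED COLUMNSTORE INDEX
);
```

Choosing a high-cardinality, evenly spread distribution key matters here: a skewed key leaves some nodes doing most of the work.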
The above delivers the core of any healthcare organisation's needs, but Synapse provides much more. There's support for the Internet of Things and real-time processing of streaming data, plus seamless integration with other services such as the Microsoft Dynamics family of products and other Dataverse data sources. Looking to the future, SQL Server 2022 is introducing Synapse Link, delivering near real-time replication of data between your on-premises source systems and your Synapse workload, allowing up-to-date reporting without impacting live record systems.
Finally, Synapse integrates seamlessly with Power BI and can even generate connections for you automatically. This gives you the same powerful reporting and analytical experience you're familiar with, and the ability to rapidly develop new solutions, while avoiding the risks and lack of governance inherent in developing reports locally, directly on top of source files.
Contact us to learn more about the implementation of Azure Synapse Analytics for your organisation.
Read about Azure Data Analytics Services to explore how it can further your organisation’s insights with flexibility, scalability and cost-effective data analytics, data science or data architecture services.
Barney Lawrence, Senior Consultant, Simpson Associates