‘The three musketeers’ in the BI world – second one ‘Scale’

In the second post of ‘The three musketeers’ series, we look at ‘Scale’. We covered ‘Agility’ in the previous blog.

Scalability is the ability to deliver consistent and predictable performance as the workload grows, be it the number of users on a reporting application or the data volumes on the database server.

We have faced several challenges in handling scalability over the years. We struggled with Symmetric Multiprocessing (SMP) servers, which could not fully leverage parallelism because their processors shared resources such as memory, disks and the system bus. Even though they were multi-processor machines, the common memory and I/O devices became bottlenecks. To work around this, clusters or grids of servers were built. However, their scalability was also constrained: they needed a big pipe to connect the servers, since they still had to maintain a shared memory and disk architecture across the cluster. One example is Oracle’s Real Application Clusters (RAC). Even though it provided good load balancing and fail-over across its servers, it was constrained by the bandwidth of the interconnects between them.

Then came the Massively Parallel Processing (MPP) bandwagon, which many database vendors jumped onto. With no sharing of CPUs, memory or disks, the nodes function like independent machines that each take a slice of the work, process it in parallel and return the results. The more machines added to the cluster, the better the performance – hence ‘high scalability’. Vendors such as Teradata and IBM Netezza have leveraged this to build scalable BI systems.
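The shared-nothing idea behind MPP – partition the data, let each node aggregate its own slice independently, then merge the partial results at a coordinator – can be sketched in a few lines of Python. This is only an illustration of the pattern (local worker processes stand in for MPP nodes), not any vendor's implementation:

```python
import multiprocessing as mp

def partial_sum(shard):
    # Each "node" aggregates its own slice of the data, with no shared state.
    return sum(shard)

def mpp_style_sum(rows, workers=4):
    # Partition the rows into independent shards (shared-nothing).
    shards = [rows[i::workers] for i in range(workers)]
    # Each worker processes its shard in parallel.
    with mp.Pool(workers) as pool:
        partials = pool.map(partial_sum, shards)
    # A coordinator merges the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(mpp_style_sum(data))
```

Adding workers shortens each shard, which is the essence of why MPP systems scale by adding machines rather than sharing a bigger one.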

With I/O identified as the major bottleneck, and disk speeds not keeping pace with Moore’s Law, architectures that avoid disk I/O were explored. As memory costs fell, in-memory computing picked up, giving us another option to scale a BI architecture. Platforms such as SAP HANA and in-memory data exploration/visualization tools like Tableau have leveraged this.
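To see why in-memory helps, consider loading a table into a column-oriented structure in RAM once, so that repeated aggregations never touch disk again. The sketch below uses a tiny, made-up sales table (the data and column names are invented for illustration); real in-memory engines exploit the same columnar layout, at vastly larger scale, for fast scans:

```python
import csv
import io

# Hypothetical sales data, standing in for a table read from disk.
raw = "region,amount\nEast,100\nWest,250\nEast,75\n"

# Load once into a column-oriented in-memory structure.
columns = {"region": [], "amount": []}
for row in csv.DictReader(io.StringIO(raw)):
    columns["region"].append(row["region"])
    columns["amount"].append(int(row["amount"]))

# Subsequent queries scan RAM, not disk: e.g. total sales per region.
totals = {}
for region, amount in zip(columns["region"], columns["amount"]):
    totals[region] = totals.get(region, 0) + amount

print(totals)  # {'East': 175, 'West': 250}
```

The one-time load cost is paid up front; every query after that runs at memory speed, which is the trade-off in-memory BI tools make.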

Finally, I wanted to touch upon one of the best and cheapest options to come along as a boon to the BI world – cloud computing. The cloud removes the worry of in-house infrastructure management: everything is managed by the cloud provider with best-in-class methods, at affordable pay-as-you-use costs. Internally, cloud providers use several of the methods above to scale out the infrastructure or the application, depending on the service. Cloud BI has picked up in a big way, and looks good for now. Given that we now have the ‘Big data’ beast to manage, we will have to wait and see whether newer methods are needed soon.

The next blog will be on ‘Self-Service’.

About the Author: Anand Govindarajan

Chief Data Architect
Email: anandg@lucidtechsol.com
LinkedIn: http://in.linkedin.com/in/anandgovindarajan/