Lazy loading for processing large data sets

Introduction

This is part of a series of articles in which we describe how the Meniscus Analytics Platform (MAP) works. These articles dive into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics.

This article investigates the benefits of lazy loading of data and why it is important in MAP.

What is lazy loading of data?

Quite simply, it means loading only the part of the data that is required to deliver the information requested. In terms of how MAP works, this principle is used to limit the data read from, and written back to, the underlying MongoDB database. Whilst this may sound like quite a simple and obvious principle to apply, it isn’t always used. Many developers know the principle from building dashboards and user interfaces, but it is even more important when considering back-end database operations.

Lazy loading is a design pattern commonly used in computer programming to defer initialization of an object until the point at which it is needed. It can contribute to efficiency in the program’s operation if properly and appropriately used. This makes it ideal in use cases where network content is accessed and initialization times are to be kept at a minimum, such as in the case of web pages. The opposite of lazy loading is eager loading.

Source
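
To make the pattern concrete, here is a minimal Python sketch of lazy loading under some assumptions: the ItemData class and the load_samples method on the database object are illustrative names, not MAP’s own API. The samples are read from the database only the first time they are accessed, and the loaded copy is reused afterwards.

# A minimal sketch of the lazy-loading pattern (illustrative names only).
class ItemData:
    def __init__(self, item_id, database):
        self.item_id = item_id
        self._database = database
        self._samples = None          # nothing loaded yet

    @property
    def samples(self):
        # Load on first access only; later accesses reuse the cached copy.
        if self._samples is None:
            self._samples = self._database.load_samples(self.item_id)
        return self._samples

The eager-loading alternative would read the samples inside __init__, paying the database cost even when the samples are never used.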

Why is lazy loading relevant in MAP?

MAP ingests and processes very large volumes of near real-time data, specifically data associated with weather. Just as importantly, MAP holds historic data so that we can deliver historic analytics, as used in our MAP Rain solution.

This means data IO is a key factor in delivering the lightning-fast calculation speeds that MAP offers, so anything that improves IO times is of huge importance to MAP. Lazy loading reduces the volume of data extracted from, and written back to, the database and so improves data IO times.
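
As an illustration of how partial loading reduces IO, the following sketch uses a MongoDB $slice projection so that only the trailing chunk of a large sample array is returned, rather than the whole document. The collection name, document layout and connection string are assumptions for the example, not MAP’s actual schema.

# Illustrative only: assumes an "items" collection whose documents hold a
# large "samples" array; the projection returns just the last 288 samples
# (e.g. one day of 5-minute data) instead of the full array.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
items = client["map_demo"]["items"]

doc = items.find_one(
    {"_id": "item-1"},
    {"samples": {"$slice": -288}},   # only the trailing slice crosses the network
)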

About MAP

MAP is an Integrated Analytics Stack providing a framework for users to create and deploy calculations at scale using any source of raw data. MAP is based on IoT principles and uses Items as the underlying building blocks to store either RAW or CALCulated data. Users create an Entity Template, or Thing, from these Items and then replicate this template hundreds of thousands of times using an ItemFactory.

For more information on MAP, click here.

Support for rich and extensible data types

Introduction

This is part of a series of articles in which we describe how the Meniscus Analytics Platform (MAP) works. These articles dive into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics.

This article discusses how and why having extensible data types is a real benefit when developing your analytics applications.

Why are extensible data types important?

Being able to use a wide variety of ‘standard’ data types, and also to create your own, delivers a number of benefits:

  • Provides flexibility. During the import stage you can re-process the initial raw data and store it in a ‘pre-processed’ data type. When you later use this data in a calculation, or for any other purpose, it is already configured and available in exactly the format you want (a simple sketch of this idea follows the list)
  • Greatly reduces data processing and calculation times
  • Extensible data types give you the ability to control how you store and process your raw data
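
To illustrate the first point, here is a minimal Python sketch of a ‘pre-processed’ data type, assuming the raw input arrives as simple comma-separated text rows. The TimeSeriesFloat name and layout are illustrative only, not MAP’s internal types; the point is that parsing happens once at import, so later calculations read ready-made values.

# Raw rows are parsed once at import time into a typed structure, so every
# later calculation reads ready-to-use values instead of re-parsing raw text.
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

@dataclass
class TimeSeriesFloat:
    samples: List[Tuple[datetime, float]]

    @classmethod
    def from_raw_rows(cls, rows: List[str]) -> "TimeSeriesFloat":
        # Each raw row is assumed to look like "2024-01-31T12:00:00,3.2".
        parsed = []
        for row in rows:
            timestamp, value = row.split(",")
            parsed.append((datetime.fromisoformat(timestamp), float(value)))
        return cls(parsed)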

Examples of data types supported by MAP

We have a number of ‘standard’ extensible data types already configured in MAP but there is no limit to the number or variety that you can create.

  • Data Grid. One of the most important for our MAP Rain solution. Processes data in any size of two-dimensional grid. Used for radar and forecast rainfall data, satellite imagery and the like
  • Block Grid. Used in conjunction with a Data Grid. Breaks a two-dimensional Data Grid into smaller three-dimensional Blocks. Used to speed up the processing of Data Grids by ensuring MAP only processes relevant data (a simple sketch of splitting a grid into blocks follows below). See the article on lazy loading of data sets
  • Vector Grid. Similar to a Data Grid, but the two-dimensional grid also includes vector and direction data. Used for processing grids of forecast wind speed and direction data
  • Rainfall Location. Holds the location of a point of interest (latitude and longitude) as well as the current and historic rainfall data for that Location. Used in MAP Rain
  • Float – standard time series. This is a standard data type for processing time series data. Contains a Date/Time and Value pair
  • Journey. Used to create and store a sequence of locations along the route of a journey. We use this data type to predict rainfall along the route using our Hyperlocal rainfall product

Examples of data types
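
A simple way to picture the Block Grid idea is to chunk a large two-dimensional grid into fixed-size blocks, so that a query covering a small area only touches the blocks it needs. The sketch below is illustrative only; the block size and layout are assumptions, not MAP’s internal format.

# Split a 2-D grid into fixed-size blocks keyed by (block_row, block_col).
import numpy as np

def split_into_blocks(grid, block_size):
    blocks = {}
    rows, cols = grid.shape
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            blocks[(r // block_size, c // block_size)] = grid[r:r + block_size, c:c + block_size]
    return blocks

# A 1000 x 1000 rainfall grid split into 100 x 100 blocks: a query for one
# small catchment typically needs only a handful of these blocks.
rain_grid = np.zeros((1000, 1000))
blocks = split_into_blocks(rain_grid, 100)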

About MAP

MAP is an Integrated Analytics Stack providing a framework for users to create and deploy calculations at scale using any source of raw data. MAP is based on IoT principles and uses Items as the underlying building blocks to store either RAW or CALCulated data. Users create an Entity Template, or Thing, from these Items and then replicate this template hundreds of thousands of times using an ItemFactory.

For more information on MAP, click here.

Benefits of a dynamically constructed dependency tree

Introduction

This is part of a series of articles in which we describe how the Meniscus Analytics Platform (MAP) works. These articles dive into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics. This article discusses the benefits of a dynamically constructed dependency tree.

What is a dynamic dependency tree?

A dependency tree is a list, or tree, of the way that any Item links to other Items. We use it to manage and understand which Items are required when calculating another Item. So, if Item 1 requires Item 3 and Item 2004 to calculate, then any change in Item 3 or Item 2004 will place Item 1 on the calculation queue to be recalculated. Managing the Items placed on this queue is critical to MAP, and we have a separate Invalidator module specifically for this purpose.

While our old MCE analytics platform held a dependency tree, it was not dynamic and so not really a scalable solution. MAP uses a dynamic dependency tree: as new Items are added, MAP automatically builds its own tree by learning from the calculations as they run. This in turn means that MAP is scalable and can run on any size of database.
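
Here is a minimal Python sketch of the idea, using a simple in-memory structure (the function names and data structures are illustrative, not MAP’s Invalidator itself): dependencies are recorded as calculations run, and a change to any input Item queues its dependants for recalculation.

# Dependencies are learned as calculations run; a change to an input item
# places every item calculated from it on the calculation queue.
from collections import defaultdict, deque

dependants = defaultdict(set)   # input item -> items calculated from it
calc_queue = deque()

def record_dependency(calculated_item, input_item):
    # Called while a calculation runs, so the tree builds itself over time.
    dependants[input_item].add(calculated_item)

def invalidate(changed_item):
    # Queue every item that was calculated from the changed item.
    for item in dependants[changed_item]:
        calc_queue.append(item)

# Item 1 is calculated from Items 3 and 2004 ...
record_dependency("Item 1", "Item 3")
record_dependency("Item 1", "Item 2004")

# ... so a new value for Item 3 places Item 1 on the calculation queue.
invalidate("Item 3")
print(list(calc_queue))   # ['Item 1']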

Benefits of using a dependency tree

  • Calculation speed. Knowing the relationship between each and every Item ensures MAP processes data in the most optimal way possible. This in turn helps MAP deliver lightning-fast calculation speeds
  • Automated. Being an automated process means that a developer can just leave MAP to get on and do its own ‘thing’ whilst they focus on the critical aspects of developing their application

About MAP

MAP is an Integrated Analytics Stack providing a framework for users to create and deploy calculations at scale using any source of raw data. MAP is based on IoT principles and uses Items as the underlying building blocks to store either RAW or CALCulated data. Users create an Entity Template, or Thing, from these Items and then replicate this template hundreds of thousands of times using an ItemFactory.

For more information on MAP, click here.

Using Data Blocks and Data Versioning to deliver real time analytics

Introduction

This is part of a series of articles in which we describe how the Meniscus Analytics Platform (MAP) works. These articles dive into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics. In this article we discuss Data Blocks and Data Versioning.

In delivering real-time analytics, disk IOPS (Input/Output Operations Per Second) is one of the main rate-limiting steps in achieving the calculation speeds required when processing high-volume, high-velocity raw data. An example of such data is radar rainfall data, where new values covering a large area arrive every 5 minutes.

To help reduce disk IOPS, we built the concepts of Data Blocks and Data Versioning into MAP to drastically speed up data access, increase calculation speed and reduce the volume of data written back to the database.

Data Blocks

Rather than loading and persisting all data for an Item, data can be broken up into chunks called Blocks. Only the chunks of data that are demanded by a query, or as an input to a calculation, are loaded from the database (i.e. delay loading), and only the chunks of data that actually change need to be persisted. Blocks are typically used with unbounded, time-related data such as sample arrays, where the size of a Block is limited and the maximum number of samples in a Block depends on the size of a sample. This provides efficiencies in real-time processing, where data changes are localised and typically at the end of the data.

Data Blocks are transparent to the user: they are purely an internal mechanism to reduce traffic to and from the database. When requested or persisted, Data Blocks are held in memory for a time, which makes subsequent retrievals faster while the data is expected to be in demand.
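
As an illustration of block-level loading and persistence, the sketch below keeps recently used blocks in a small in-memory cache, loads a block only when a sample inside it is requested, and writes back only the blocks that changed. The BlockedSeries class, the BLOCK_SIZE value and the store interface (load_block/save_block) are assumptions for the example, not MAP’s internals.

# Only the block covering a requested sample is loaded; only changed blocks
# are written back; loaded blocks stay in a simple in-memory cache.
BLOCK_SIZE = 1024

class BlockedSeries:
    def __init__(self, item_id, store):
        self.item_id = item_id
        self.store = store          # anything providing load_block / save_block
        self.cache = {}             # block index -> list of samples
        self.dirty = set()          # block indices changed since last persist

    def _block_for(self, sample_index):
        block_index = sample_index // BLOCK_SIZE
        if block_index not in self.cache:
            self.cache[block_index] = self.store.load_block(self.item_id, block_index)
        return block_index

    def read(self, sample_index):
        block_index = self._block_for(sample_index)
        return self.cache[block_index][sample_index % BLOCK_SIZE]

    def write(self, sample_index, value):
        block_index = self._block_for(sample_index)
        self.cache[block_index][sample_index % BLOCK_SIZE] = value
        self.dirty.add(block_index)

    def persist(self):
        # Only changed blocks go back to the database.
        for block_index in self.dirty:
            self.store.save_block(self.item_id, block_index, self.cache[block_index])
        self.dirty.clear()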

Data Versioning

Data Blocks are complemented by the MAP concept of Data Versioning. All Item data in MAP is versioned, including Blocks (referred to as child data). A version is simply a unique timestamp. It allows users to query for the relative age of data: specifically, when it last changed and, for calculated Items, when the last calculation started and completed. A client application can then tell whether data has changed without having to load the data itself. There are additional non-data versions on an Item, e.g. when its properties or list of child Items last changed.

It is this versioning technique that allows MAP to efficiently detect when calculated Items need recalculating (referred to as dirtying a calculation).
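
A minimal sketch of version-based change detection, assuming each Item carries a data-changed timestamp and calculated Items also carry a last-calculation-completed timestamp (the field names and values here are illustrative, not MAP’s own): a calculated Item is dirty when any of its inputs changed after it was last calculated.

# A calculated item needs recalculating when any input's data-changed
# version is newer than the item's last completed calculation.
versions = {
    "Item 3":    {"data_changed": 1700000300.0},
    "Item 2004": {"data_changed": 1700000250.0},
    "Item 1":    {"data_changed": 1700000100.0, "calc_completed": 1700000200.0},
}

def needs_recalculation(item_id, input_ids):
    last_calc = versions[item_id].get("calc_completed", 0.0)
    return any(versions[i]["data_changed"] > last_calc for i in input_ids)

# Item 3 changed after Item 1 was last calculated, so Item 1 is dirty.
print(needs_recalculation("Item 1", ["Item 3", "Item 2004"]))   # True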

About MAP

MAP is an Integrated Analytics Stack providing a framework for users to create and deploy calculations at scale using any source of raw data. MAP is based on IoT principles and uses Items as the underlying building blocks to store either RAW or CALCulated data. Users create an Entity Template, or Thing, from these Items and then replicate this template hundreds of thousands of times using an ItemFactory.

For more information on MAP, click here.