Lazy loading for processing large data sets

Introduction

This is part of a series of articles describing how the Meniscus Analytics Platform (MAP) works. These articles dig into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics.

This article investigates the benefits of lazy loading of data and why it is important in MAP.

What is lazy loading of data?

Quite simply, it means loading only the part of the data that is required to deliver the information requested. In MAP, this principle is used to limit the data read from, and written to, the underlying MongoDB database. Whilst this may sound like a simple and obvious principle to apply, it isn’t always used. Many developers will know the principle from developing dashboards and user interfaces, but it is even more important when considering back-end database operations.

Lazy loading is a design pattern commonly used in computer programming to defer initialization of an object until the point at which it is needed. It can contribute to efficiency in the program’s operation if properly and appropriately used. The opposite of lazy loading is eager loading. This makes it ideal in use cases where network content is accessed and initialization times are to be kept at a minimum, such as in the case of web pages.

Source
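
As a minimal sketch of the pattern in Python (the class, field and file names are illustrative assumptions of ours, not MAP internals), deferring the actual read until the data is first used looks like this:

    class RainfallSeries:
        """Holds a reference to a data file but defers reading it until needed."""

        def __init__(self, path):
            self.path = path
            self._values = None                    # nothing loaded yet

        @property
        def values(self):
            # Lazy: the file is read on first access only, then cached.
            if self._values is None:
                with open(self.path) as f:
                    self._values = [float(line) for line in f]
            return self._values

    series = RainfallSeries("rainfall.txt")        # cheap: no file IO happens here
    # total = sum(series.values)                   # IO happens only when the values are used

An eager-loading version would read the whole file in __init__, paying the IO cost whether or not the values are ever used.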

Why is lazy loading relevant in MAP?

MAP ingests and processes very large volumes of near real-time data, specifically data associated with weather. More importantly, MAP holds historic data so that we can deliver historic analytics, as used in our MAP Rain solution.

This means data IO is a key factor in achieving the lightning-fast calculation speeds that MAP delivers, so anything that can improve these IO times is of huge importance. Lazy loading reduces the volume of data extracted from, and then written back to, the database and so improves data IO times.
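
To make this concrete, here is a hedged sketch using pymongo; the database, collection and field names are assumptions for illustration rather than MAP’s actual schema. The filter and projection limit what MongoDB returns, and the cursor streams documents in batches rather than loading everything into memory:

    from datetime import datetime
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    readings = client["map_demo"]["rainfall_readings"]

    cursor = readings.find(
        {"timestamp": {"$gte": datetime(2023, 1, 1), "$lt": datetime(2023, 1, 2)}},  # only the day we need
        {"timestamp": 1, "value": 1, "_id": 0},                                      # only the fields we need
    ).batch_size(1000)                                                               # stream in batches

    daily_total = sum(doc["value"] for doc in cursor)   # documents are fetched lazily as we iterate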

About MAP

MAP is an Integrated Analytics Stack providing a framework for users to create and deploy calculations at scale using any source of raw data. MAP is based on IOT principles and uses Items as the underlying building blocks to store either RAW or CALCulated data. Users create an Entity Template, or Thing, from these Items and then replicate this template hundreds of thousands of times using an ItemFactory.
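
As a rough illustration of the Item / Entity Template / ItemFactory idea (the class names and methods below are invented for this sketch and are not MAP’s actual API):

    from dataclasses import dataclass, field

    @dataclass
    class EntityTemplate:
        """A 'Thing' defined once as a collection of Items (RAW inputs and CALC outputs)."""
        name: str
        items: list = field(default_factory=list)

        def add_item(self, item_name, kind):
            self.items.append({"name": item_name, "kind": kind})   # kind is "RAW" or "CALC"

    class ItemFactory:
        """Replicates a template many times, one copy per real-world entity."""
        @staticmethod
        def replicate(template, count):
            return [{"entity_id": i, "items": [dict(it) for it in template.items]}
                    for i in range(count)]

    rain_gauge = EntityTemplate("rain_gauge")
    rain_gauge.add_item("raw_rainfall", "RAW")
    rain_gauge.add_item("daily_total", "CALC")
    fleet = ItemFactory.replicate(rain_gauge, 100_000)   # replicated at scale in practice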

For more information on MAP, click here

Support for rich and extensible data types

Introduction

This is part of a series of articles describing how the Meniscus Analytics Platform (MAP) works. These articles dig into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics.

This article discusses how and why having extensible data types is a real benefit when developing your analytics applications.

Why are extensible data types important?

Being able not only to use a wide variety of ‘standard’ data types but also to create your own delivers a number of benefits.

  • Provides flexibility. During the import stage you can re-process the initial raw data and store it in a ‘pre-processed’ data type, so that when you come to use the data in a calculation it is already configured and available in exactly the format you want (see the sketch after this list).
  • Greatly reduces data processing and calculation times.
  • Gives you the ability to control exactly how you store and process your raw data.
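
As a sketch of the first point (PreProcessedSeries and import_readings are names made up for illustration), the raw data is re-shaped once at import time so that every later calculation receives it in the form it expects:

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class PreProcessedSeries:
        """Raw readings re-shaped once at import time into the form calculations need."""
        timestamps: list
        values_mm: list                         # already converted to the unit calculations expect

    def import_readings(raw_rows):
        # raw_rows: iterable of (iso_timestamp, value_in_metres) tuples from the raw feed
        timestamps, values_mm = [], []
        for ts, metres in raw_rows:
            timestamps.append(datetime.fromisoformat(ts))
            values_mm.append(metres * 1000.0)   # convert once here, not in every calculation
        return PreProcessedSeries(timestamps, values_mm)

    series = import_readings([("2023-06-01T00:00:00", 0.0021),
                              ("2023-06-01T00:15:00", 0.0004)])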

Examples of data types supported by MAP

We have a number of ‘standard’ extensible data types already configured in MAP but there is no limit to the number or variety that you can create.

  • Data Grid. One of the most important for our MAP Rain solution. Holds data in a two-dimensional grid of any size. Used for radar and forecast rainfall data, satellite imagery and the like.
  • Block Grid. Used in conjunction with a Data Grid. Breaks a two-dimensional Data Grid into smaller three-dimensional Blocks, speeding up processing by ensuring MAP only handles the relevant data (see the sketch below and the article on lazy loading of data sets).
  • Vector Grid. Similar to a Data Grid, but each cell of the two-dimensional grid also includes vector and direction data. Used for processing grids of forecast wind speed and direction data.
  • Rainfall Location. Holds the location of a point of interest (latitude and longitude) as well as the current and historic rainfall data for that location. Used in MAP Rain.
  • Float – standard time series. A standard data type for processing time series data, containing Date/Time–Value pairs.
  • Journey. Used to create and store a sequence of locations along the route of a journey. We use this data type to predict rainfall along the route with our Hyperlocal rainfall product.

Examples of data types
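
As a rough sketch of how a Data Grid and a Block Grid relate (simplified stand-ins using numpy, not the actual MAP implementations):

    import numpy as np

    class DataGrid:
        """A two-dimensional grid of values, e.g. radar rainfall over an area."""
        def __init__(self, values):
            self.values = np.asarray(values, dtype=float)

    class BlockGrid:
        """Splits a DataGrid into fixed-size blocks so only relevant blocks need processing."""
        def __init__(self, grid, block_size):
            self.grid = grid
            self.block_size = block_size

        def block(self, row, col):
            b = self.block_size
            return self.grid.values[row * b:(row + 1) * b, col * b:(col + 1) * b]

    grid = DataGrid(np.random.rand(1000, 1000))     # full radar grid
    blocks = BlockGrid(grid, block_size=100)
    area_of_interest = blocks.block(3, 7)           # process only the block covering the catchment
    print(area_of_interest.mean())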


Benefits of a dynamically constructed dependency tree

Introduction

This is part of a series of articles describing how the Meniscus Analytics Platform (MAP) works. These articles dig into the features that make MAP different from other analytics applications by providing an Integrated Analytics Stack that delivers real-time analytics. This article discusses the benefits of a dynamically constructed dependency tree.

What is a dynamic dependency tree?

A dependency tree is a list, or tree, describing how each Item links to other Items. We use it to manage and understand which Items are required to calculate another Item. So, if Item 1 requires Item 3 and Item 2004 in order to calculate, then any change to Item 3 or Item 2004 will place Item 1 on the calculation queue to be recalculated. Managing the Items placed on this queue is critical to MAP, and we have a separate Invalidator module specifically to do this.

While our old MCE analytics platform held a dependency tree, it was not dynamic and so was not really a scalable solution. MAP uses a dynamic dependency tree: as new Items are added, MAP automatically builds its own tree by learning from the calculations as they run. This in turn means that MAP is scalable and can run on any size of database.
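
As a hedged sketch of the idea (this is not the Invalidator module itself; the class and method names are ours), dependencies can be recorded as each calculation runs and then used to queue dependants for recalculation:

    from collections import defaultdict, deque

    class DependencyTree:
        def __init__(self):
            self.dependants = defaultdict(set)      # item -> items that use it as an input

        def record(self, output_item, input_items):
            """Called as a calculation runs, so the tree builds itself dynamically."""
            for item in input_items:
                self.dependants[item].add(output_item)

        def invalidate(self, changed_item):
            """Return every item that must be recalculated when changed_item changes."""
            queue, seen, to_recalculate = deque([changed_item]), {changed_item}, []
            while queue:
                current = queue.popleft()
                for dependant in self.dependants[current]:
                    if dependant not in seen:
                        seen.add(dependant)
                        to_recalculate.append(dependant)
                        queue.append(dependant)
            return to_recalculate

    tree = DependencyTree()
    tree.record("Item 1", ["Item 3", "Item 2004"])   # learned when Item 1 first calculates
    print(tree.invalidate("Item 3"))                 # -> ['Item 1'] goes on the calculation queue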

Benefits of using a dependency tree

  • Calculation speed. Knowing the relationship between each and every Item ensures MAP processes data in the most optimal way possible. This in turn helps MAP deliver lightning-fast calculation speeds.
  • Automated. Being an automated process means that a developer can just leave MAP to get on and do its own ‘thing’ whilst they focus on the critical aspects of developing their application.


Yorkshire Water leakage hackathon – use of MAP IOT

Overview of the Open Data leakage hackathon run by Yorkshire Water to identify new ways of applying Big Data and analytics solutions to the issue of water leakage.