Big data holds big promise, but taming it enough to make it useful will be a challenge.
We are standing at perhaps the most amazing crossroads of development in human history. In the next few years, faster, more accurate computing and the rise of artificial intelligence will converge to make achievable what was computationally impossible only a short time ago.
Supercomputing continues to flow into our daily lives, and sensors will be the gateway to the revolution now being called the Internet of Things (IoT).
We are benefiting from what futurist Ray Kurzweil calls human history’s Law of Accelerating Returns. This happens because more advanced societies have the ability to progress at a faster rate than less advanced societies—because they’re more advanced.
The question I have is whether this benefit will be used to effectively combat mankind’s most pressing problems, or simply sell more products to consumers.
For the first time in history, we can use digital sensors to record virtually everything, from the performance of machines and devices to energy consumption to health data.
In my world of Wearable Sensors and Point of Care Diagnostics, Big Data will play a pivotal role in the adoption of IoT because it’s the only way to move beyond simply recording data to making it useful to everyone involved in the management, response, and policymaking continuum.
The good news is that a wide array of data analytics tools is already available today, many with robust capabilities that enable researchers to discover patterns in complex sensor data, make predictions from it, and combine it with content previously hidden from view.
However, there is a significant barrier to effectively using these sensors and the deep, rich data they generate, as well as many potentially related data sources: the data is generally unstructured and constantly changing. The effective use of these analytics tools requires a clean, multidimensional data warehouse from which to operate.
Cleansing and dimensionally modeling data from disparate sources currently requires tremendous cost, skill and time. Data warehouses take time and money to build, and once one is constructed, its structure and processing need regular servicing to prevent the data model from decaying. As the IoT grows, that maintenance burden becomes overwhelming.
One possible solution is a Virtual Data Warehouse (VDW), which addresses the problem of cleansing and dimensionalizing heterogeneous source data through a fundamentally different approach specifically adapted to IoT sensor data.
The common method and tooling used in data warehouse construction today is known as ETL (Extract, Transform, Load). For over a decade, ETL has given businesses and researchers the ability to copy source data from various repositories and origins while cleansing, transforming and modeling it in multidimensional structures that support meaningful analysis.
Estimates indicate that ETL methods can account for over 75% of the total cost of building and maintaining data warehouses. The VDW approach could dramatically reduce the amount of time, expense and subject matter expertise required in the construction and maintenance of data warehouses through an evolutionary paradigm involving the following four concepts:
- Index: All original heterogeneous data sources are indexed (the current ETL method requires all modeled data to be copied, inherently establishing an N+1 data availability limitation).
- Discoverable Dimensions: Dimensions are automatically discovered through machine learning techniques and rendered visually, giving a business user the ability to visually refine and further define dimensions required for analysis (versus the current ETL method, in which all dimensions have to be specifically determined and declared by the ETL team).
- Intelligent Aggregations: Aggregations are identified and specified visually by a user through dimensional modeling and exploration, then processed and stored with the index (ensuring rapid and efficient data warehousing operations such as slicing, dicing, drilling, etc.).
- Emulation: A standard interface is provided to the index, which emulates a traditional data warehouse, allowing any analytic tool the ability to function fully as if the "virtual data warehouse" were physically stored and managed using current techniques.
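The four concepts above can be illustrated with a toy sketch. This is not an actual VDW implementation; it is a minimal illustration under stated assumptions. The class `VirtualWarehouse`, its method names, and the sample records are all hypothetical, and a simple "few distinct text values" heuristic stands in for the machine learning techniques mentioned under Discoverable Dimensions.

```python
from collections import defaultdict


class VirtualWarehouse:
    """Toy index over heterogeneous sources (hypothetical, for illustration)."""

    def __init__(self, sources):
        # Sources stay where they are; the index only stores references.
        self.sources = sources  # source name -> list of dict records
        # Index: field -> value -> list of (source name, row position)
        self.index = defaultdict(lambda: defaultdict(list))
        for name, rows in sources.items():
            for pos, row in enumerate(rows):
                for field, value in row.items():
                    self.index[field][value].append((name, pos))
        self.aggregates = {}  # stored alongside the index

    def discover_dimensions(self, max_distinct=3):
        # Heuristic stand-in for ML discovery (an assumption of this sketch):
        # a field is a candidate dimension if all its values are text and
        # it has only a few distinct values.
        dims = []
        for field, values in self.index.items():
            all_text = all(isinstance(v, str) for v in values)
            if all_text and len(values) <= max_distinct:
                dims.append(field)
        return sorted(dims)

    def aggregate(self, measure, by):
        # Precompute sum and count of `measure` for each value of `by`.
        totals = {}
        for value, refs in self.index[by].items():
            rows = [self.sources[s][p] for s, p in refs]
            nums = [r[measure] for r in rows if measure in r]
            if nums:
                totals[value] = {"sum": sum(nums), "count": len(nums)}
        self.aggregates[(measure, by)] = totals
        return totals

    def query(self, measure, by):
        # Emulation: answer "average of measure by dimension" the way a
        # physical warehouse would, using the stored aggregates.
        key = (measure, by)
        if key not in self.aggregates:
            self.aggregate(measure, by)
        return {v: t["sum"] / t["count"] for v, t in self.aggregates[key].items()}


# Hypothetical sample data from two unrelated sensor sources.
sources = {
    "wearable": [
        {"device": "w1", "site": "clinic_a", "heart_rate": 72},
        {"device": "w2", "site": "clinic_b", "heart_rate": 88},
        {"device": "w1", "site": "clinic_a", "heart_rate": 76},
    ],
    "point_of_care": [
        {"device": "p1", "site": "clinic_a", "glucose": 5.4},
        {"device": "p2", "site": "clinic_b", "glucose": 6.1},
    ],
}

vdw = VirtualWarehouse(sources)
print(vdw.discover_dimensions())          # candidate dimensions
print(vdw.query("heart_rate", "site"))    # average heart rate per site
```

In this sketch the two sources are never merged or copied into a common schema; the shared `site` field surfaces as a dimension only because both sources happen to carry it. A real system would need far more robust discovery, type handling, and storage than this illustration suggests.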
Use of the VDW approach instead of conventional warehousing would allow easier and more comprehensive integration of local and regional passive search queries and micro-blogging data, as well as actively collected crowd-sourced surveillance data, into mapping and data mining user interfaces. The success of these methods has been well documented and validated for a variety of devices, but they will never become mainstream unless we can solve the Big Problem in Big Data.
The sheer volume of sensor readings, product-related searches and personal accounts creates incredible new opportunities to monitor everything from population behavior and demographics to environmental conditions, and to correlate relevant information in real time, even allowing users to explore and model data in new ways.
By using this combined information to build definitive extents in virtual databases, we could allow users to more fully consider the interconnections among their sensor-equipped subjects (human or machine), socio-cultural factors, demographics, weather, and other factors, making it easier to achieve best practice and improve all kinds of performance and outcomes.
By integrating data from sensors of all types in real time and correlating it with relevant social, demographic, environmental and related data, we should be able to provide predictive measures. Such measures could help individuals, decision makers, policy makers and service providers identify and develop effective and timely actions that could save resources and lives, helping the world of IoT do more than simply provide a data record.
The nuviun industry network is intended to contribute to discussion and stimulate debate on important issues in global digital health. The views are solely those of the author.