This article originally appeared on Konsultbolag1
Alongside ‘the Cloud’, one of the most commonly heard buzzwords in the software testing industry in recent years is ‘Big Data’. Big Data is well suited to cloud-based management, which makes it easy to take advantage of scalable resources such as processing power and storage. A growing number of companies now specialize in this technology, meaning that a business no longer needs vast technical knowledge, or even much infrastructure of its own, to take advantage of Big Data. A large number of companies already offer cloud-based Big Data solutions to customers.
Some cloud service providers target the technical side by offering management and analysis tools, while others focus more on services such as data preparation, storage and aggregation of disparate data sources. The main advantage of this approach is that it frees up the customer's own resources.
Data storage is becoming increasingly difficult to manage and can be a huge challenge for a business; external cloud-based management offers one solution. In the future, many organizations will choose hybrid solutions that combine cloud-based storage with locally stored data. Above all, it is important to build Big Data solutions that are flexible enough to meet customer demands.
What you should know before choosing between an internal option and an external cloud-based solution is that Big Data requires a range of advanced technologies, knowledge and investment.
Requirements Management and the tests it brings
Requirements management is largely unchanged by Big Data, although as a practice it will need to put far more focus on scalability and other non-functional requirements.
When it comes to testing, however, the tester needs to be highly technical. In addition to knowledge of the underlying technology, it is a great advantage to be able to write your own MapReduce programs and to master scripting languages.
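To illustrate the MapReduce idea the article refers to, here is a minimal local sketch in Python. The function names and the in-memory shuffle step are illustrative assumptions, not a real Hadoop job; a production job would distribute the same mapper and reducer across many nodes.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Sum the counts emitted for one word."""
    return word, sum(counts)

def run_job(lines):
    """Simulate the shuffle phase locally: group mapper output
    by key, then apply the reducer to each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())

# Word count across a tiny "dataset"
result = run_job(["big data needs big tools", "data never sleeps"])
```

A tester comfortable with this pattern can reason about what intermediate keys a job should produce and write targeted checks against them.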
The most important requirement in Big Data is scalability, since it affects the entire solution architecture. As the volume of data is constantly growing, any solution must be scalable: whatever the amount of data, it should be possible to handle it by adding new nodes. Big Data solutions must also be resilient, in the sense that there can be no single point of failure. If a node in the cloud goes down, the other nodes must take over the work of the failed node.
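The "no single point of failure" requirement can be sketched with a toy partition-assignment function in Python (the round-robin scheme and node names are illustrative assumptions; real clusters use more sophisticated placement):

```python
def assign_partitions(partitions, nodes):
    """Spread data partitions across nodes round-robin. When a node
    fails, re-running the assignment with the surviving nodes moves
    its work onto the others, so no partition is lost."""
    assignment = {node: [] for node in nodes}
    for i, part in enumerate(partitions):
        assignment[nodes[i % len(nodes)]].append(part)
    return assignment

parts = list(range(6))
before = assign_partitions(parts, ["n1", "n2", "n3"])
after = assign_partitions(parts, ["n1", "n3"])  # n2 has failed
```

A scalability requirement phrased this way is testable: after removing any one node, every partition must still be assigned somewhere.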
Analysis and processing occur in parallel across all nodes, and you need to specify requirements for how the solution will handle streaming data, whether in real time or with a few minutes' delay. Requirements are also needed for how the analyses will be used: with predetermined questions, or by exploring the available data, in which case the solution must be easy to configure for new analyses.
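Streaming analysis with a short delay often boils down to windowed aggregation. Here is a minimal Python sketch; the `SlidingWindow` class and the window size are assumptions for illustration, standing in for the micro-batch windows a real streaming engine would manage:

```python
from collections import deque
from statistics import mean

class SlidingWindow:
    """Keep the last `size` readings of a stream and expose a
    rolling average, a stand-in for near-real-time analysis that
    runs with a small, bounded delay rather than instantly."""
    def __init__(self, size):
        self.buffer = deque(maxlen=size)

    def push(self, value):
        self.buffer.append(value)
        return mean(self.buffer)

window = SlidingWindow(size=3)
averages = [window.push(v) for v in [10, 20, 30, 40]]
# after the fourth value the window holds [20, 30, 40]
```

Writing the requirement in these terms (window size, allowed delay) gives the tester something concrete to verify.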
You also need the ability to merge data over time. Big Data is too large to be handled in a conventional backup; data should be kept in the same storage as much as possible, and it is better to keep separate versions than to back up the entire data volume at every change. As a tester of Big Data you will mostly be working with unstructured and semi-structured data. The tester needs to try different possible inputs and determine dynamically which data source a given search runs against.
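Working with semi-structured data means a search cannot assume a fixed schema. As a small illustration (the field names and records are hypothetical), a Python search over JSON lines has to tolerate missing and malformed fields:

```python
import json

records = [
    '{"user": "anna", "age": 34}',
    '{"user": "erik"}',                    # field missing entirely
    '{"user": "sara", "age": "unknown"}',  # field has wrong type
]

def find_by_age(raw_records, min_age):
    """Parse semi-structured JSON lines, skipping records whose
    fields are missing or malformed instead of failing on them."""
    matches = []
    for raw in raw_records:
        record = json.loads(raw)
        age = record.get("age")
        if isinstance(age, int) and age >= min_age:
            matches.append(record["user"])
    return matches
```

Test inputs like the second and third record above are exactly the "different possible inputs" a Big Data tester must probe.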
When it comes to validating the actual information, samples can be checked manually or automatically, but with large volumes of data even samples can be difficult to verify. Tool support for the tester is an area where the industry is still in its infancy, and there is still much to be done.
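Automated sample-based validation can be sketched in a few lines of Python. The validation rule (a positive order amount) and the dataset are made-up examples; the point is the pattern of drawing a reproducible sample and measuring its pass rate:

```python
import random

def sample_and_validate(dataset, validator, sample_size, seed=0):
    """Draw a reproducible random sample from a large dataset and
    report the share of sampled records that pass a validation rule."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, min(sample_size, len(dataset)))
    passed = sum(1 for record in sample if validator(record))
    return passed / len(sample)

# Hypothetical rule: an order amount must be positive
dataset = [{"amount": a} for a in [10, -5, 30, 42, 7, 0, 18, 25]]
rate = sample_and_validate(dataset, lambda r: r["amount"] > 0, sample_size=4)
```

Fixing the seed makes the sample repeatable between test runs, which matters when a failed sample has to be re-examined manually.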
Big Data is thus an enormous challenge for the tester, especially as the tools to manage and analyze Big Data are still in their infancy. Knowledge of data warehousing at a fundamental level will shorten the learning curve and help ensure that your path to Big Data is not strewn with big failures.