White paper

Big Data projects aren’t one-man shows

www.ipl.com

For companies of all sizes, there are advantages to be realised from harnessing the power of their Big Data. These advantages are borne out of having far better levels of intelligence available than ever before, which can enable more informed and quicker decision-making across the board.


Big Data projects aren’t one-man shows Version 2.0. © Copyright IPL 2013

Preface

Abstract Big Data is no longer just the domain of large organisations: datasets that cannot be handled satisfactorily by traditional systems are now found throughout the public and private sectors. And because the technology to harness the power of Big Data has come of age, it is within financial reach of many more organisations. The benefits of exploiting Big Data are many, and include the ability to take better and more timely decisions, make money from selling the data, and improve internal information management. With more and more organisations looking to exploit their Big Data, those that do not risk being left behind.

But harnessing the power of Big Data still presents significant challenges – though not all are new. Organisations must choose the right technology to enable the project, and employ the right individuals or partner company to ensure it is a success from start to finish. The skills required are diverse and not yet in abundance.

This white paper will first look at what Big Data is, and the challenges and benefits of exploiting it. It will then explain some specific examples of how government bodies and firms in the telecoms and finance sectors can get the most out of their Big Data. Following this, the paper will look at some of the technologies that make such exploitation possible, and highlight the staff skills that are required to make the initiative a success.

IPL, Eveleigh House, Grove Street, Bath BA1 5LR. Tel: +44 (0)1225 475000. www.ipl.com. [email protected]

Chris is an author, thought leader and speaker on Information Management. He advises major multinationals on how to govern and exploit their information.

Chris Bradley, Information Management Specialist


Who is this report for? This paper is aimed at CIOs, CTOs and other executives interested in the potential of Big Data, and who have responsibility for managing and exploiting corporate data. It will also be of interest to information professionals, business managers and information analysts.

Key messages

• Big Data is data that cannot be handled satisfactorily by traditional database systems, and it is now pervasive throughout the public and private sectors

• If an organisation can harness the power of its Big Data, it will gain access to intelligence that can drive better decision-making, provide commercial benefits and aid internal information management

• Good information management practice must underpin an entire Big Data initiative

• There are concrete examples of how government bodies, financial sector and telecoms firms can derive tangible benefit from their Big Data

• Because the enabling technology is now within financial reach of many organisations, any that do not choose to take advantage of Big Data risk being left behind by forward-thinking competitors

• But harnessing Big Data still poses significant technology and analysis challenges, though not all of these are new: some have been successfully overcome before

• There is a range of technology available to aid organisations’ Big Data projects, and it is important to pick the right one (or blend) when planning the initiative. Which to select depends on the nature of the input data and the intelligence required

• Derived intelligence needs to be presented to decision-makers in a timely and actionable way

• As well as the right blend of technology, Big Data initiatives require a mix of skills among staff. There needs to be an effective blend of programming, analytics and presentation skills, as well as deep knowledge of the organisation’s domain

• These skills are very unlikely to be found in a single person or even a single company. Therefore, it is a good idea for organisations to enlist the help of expert partners as they set out on their journey to unlock the power of their Big Data


Index

1. Introduction
2. What is Big Data & What Problems do Organisations Face in Harnessing it?
3. The Benefits of Harnessing Big Data – and the Dangers of not Doing so
   Financial Sector
   Telecoms Sector
   Public Sector
   Internal Data Systems
   New Opportunities – but Understand your Business Case
4. Technology
   Relational Databases with Massively Parallel Processing
   Hadoop & MapReduce
   Cassandra
   Graph-Based Systems
   Storm
   Complex Event Processing Software
   Making the Right Choice
5. Producing Actionable Intelligence
6. People
7. Conclusions
References


Introduction

Big Data is the hot topic of the moment. Large, complex sets of structured and unstructured data are now pervasive throughout the public and private sectors and the technology required to exploit them is coming of age. Big Data exploitation is no longer solely the domain of a handful of large firms with deep pockets: it is now well within the grasp of many more organisations. The potential intelligence that could be derived from these datasets represents an enormous opportunity: most notably that it can provide insights that can drive better decision-making, whether in real time or retrospectively. This intelligence represents a true corporate asset that could provide competitive advantage, help internally with legislative or regulatory compliance or be a saleable commodity.

Despite the potential opportunities, many organisations are failing to capitalise on their data assets. This may be because they do not know they can, but is more likely because they simply do not know where to begin. There are undoubtedly challenges to overcome if a Big Data initiative is to succeed, not least the fact that Big Data cannot be handled or analysed satisfactorily by many existing systems.

Amidst all the hype over Big Data, though, it is important not to lose sight of the fact that not all the challenges are new ones. While some do indeed require new and innovative solutions, others have been faced and overcome before. Organisations must not forget the good data practices they have learned over the past decades, because much of that practice is as valid in the world of Big Data as it has ever been.

This white paper will first explain what Big Data is and what organisations in the finance, telecoms and public sectors can gain from it. It will then look at the key elements that are required for Big Data initiatives to be successful from start to finish: the right technology, the right people and the right presentation of the derived intelligence. Underpinning all of this, there needs to be effective information management, data governance and analysis.

It is only by doing all of these things properly that businesses will gain optimum value from their Big Data, including access to more in-depth and timely intelligence, and benefits from being able to analyse datasets that were previously out of reach or too unwieldy to handle. Doing so can deliver competitive advantage.


What is Big Data & what problems do organisations face in harnessing it?

Defining Big Data

Big Data is a relatively new term, but many of the underlying challenges have been seen before. It is essentially data that cannot be handled or exploited satisfactorily by traditional systems and analytic techniques. It has come increasingly into focus over the past year because of the much-talked-about explosion of data. Some 2,200 petabytes of data are created every day, and it is estimated that 90% of all data in the world was created in the last two years alone[1]. These staggering figures highlight two of the three main characteristics of Big Data: its size and the speed at which it becomes available. Its third distinguishing feature is the range of structures it comes in. This variety stems from the multitude of Big Data sources. Some of the structures will be known, and therefore more easily stored and analysed: structured data includes mobile telecoms operators’ call detail records, or banks’ stores of customer data. Other data will be semi-structured, such as trades and swaps data, or completely unstructured, such as publicly available social network updates, email, instant messaging or photographs on image-sharing services.

The three characteristics are often referred to as the ‘Three Vs’: volume, velocity and variety, though not all Big Data will possess each to the same extent.

Big Data challenges

The volume, velocity and variety of the data raise questions of the technology: how does one store that data, and how can it be analysed in an effective and timely manner? With the data volume only rising, can the systems scale up almost infinitely, without degrading performance intolerably? And with massive data velocity, how does one even contemplate calculating mean values across this constantly changing landscape?

The variety of data poses further challenges too. Where the contents or structure of Big Data is unknown, it is not possible to impose a known fixed structure on it. This means that the normal methods of analytics and creating reports do not apply.

Indeed, when it comes to analysis and reporting, organisations must ask themselves if the intelligence is required in near-real-time (stream processing), if they are looking to drill down into Big Datasets retrospectively, or both. These two applications of Big Data analysis potentially require different technology, but both are incredibly resource-intensive and must, above all else, be reliable.

Successful analysis may require the combination of multiple datasets, where each on its own may be unfocused and of limited value, but when overlaid, provide useful insights. Knowing which to combine and how best to do it requires skilled analysts to put the right processes in place. Failure to do so could lead to incorrect or irrelevant intelligence – or no intelligence at all.
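The overlay idea can be sketched in miniature: two sources are joined on a shared key so that a record from one enriches a record from the other. All dataset and field names here are hypothetical, chosen purely to illustrate the principle.

```python
# Sketch: overlaying two datasets on a shared key to derive new insight.
# Each source alone is of limited value; combined, they suggest action.
# All names and data are illustrative.

bank_customers = {
    "c1001": {"name": "A. Smith", "products": ["current account"]},
    "c1002": {"name": "B. Jones", "products": ["mortgage", "ISA"]},
}

social_signals = [  # e.g. intelligence derived from public posts
    {"customer_id": "c1001", "topic": "expecting a baby"},
    {"customer_id": "c1002", "topic": "moving house"},
]

def overlay(customers, signals):
    """Join social signals onto customer profiles by customer ID."""
    combined = []
    for s in signals:
        profile = customers.get(s["customer_id"])
        if profile:
            combined.append({**profile, "signal": s["topic"]})
    return combined

for row in overlay(bank_customers, social_signals):
    print(row["name"], "->", row["signal"])
```

In practice, the skill lies in knowing which keys genuinely link the datasets and in handling the records that do not match cleanly.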

Once the data has been analysed, the final step in the exploitation of Big Data is to present it to decision-makers in ways that enable them to make the right choices in a timely manner. This is not exclusively a Big Data challenge, but is vital if the data is to be exploited effectively. It requires careful thought and creative lateral thinking: how should the intelligence be displayed? Is the most obvious way necessarily the best way, from a decision-maker’s point of view? Again, this requires the best brains in the business.

Underpinning the whole process from start to finish must be robust Information Management (IM) and Information Assurance (IA) practices. Data is an enterprise-wide asset, in the same way people, property and trademarks are, so handling it deserves the same degree of rigour, care and diligence. For example, while collecting data that is publicly available may not raise privacy concerns, as soon as organisations begin to draw intelligence from the aggregated data, they have something potentially sensitive (or of a higher classification) on their hands. If necessary, can it be satisfactorily anonymised? Are the systems secure enough to be trusted with such intelligence? Are the appropriate processes in place to safeguard data? What is the organisation allowed to do with the intelligence, legally and ethically? And what about the question of intellectual property and the conditions under which organisations can use publicly available data – or even their own proprietary data – for commercial purposes?


The benefits of harnessing Big Data – and the dangers of not doing so

These questions need to remain at the forefront of everyone’s minds when planning and implementing a Big Data initiative.

Harnessing Big Data falls into two general areas: analysis of real-time incoming data to provide near-instant intelligence for decision-makers to act upon, and retrospective analysis of large datasets, going into a level of detail that was previously impossible due to the size or complexity of the dataset. This latter process will enable businesses to spot trends that previously went unnoticed. Both of these areas can provide organisations with levels of intelligence that far surpass what they currently have access to.

Being able to analyse larger volumes of data reduces or removes the need for sampling, meaning conclusions drawn from the analysis will be more accurate. And this improved knowledge of their customers and market segments will enable companies to design better products, target marketing activities more precisely, make more informed decisions at every level of the business and gain competitive advantage by having access to intelligence that competitors do not. Furthermore, intelligence gleaned from proprietary datasets could be sold to other organisations.

On the flip side, if an organisation chooses not to start harnessing the Big Data available to it, it risks being overtaken by competitors who sniff an opportunity to gain an advantage. The data is there to be exploited and the technology has come of age, which puts Big Data projects within reach: it represents an incredible new opportunity for businesses across numerous sectors – provided it is approached in the right way.

Let us consider a few more specific examples of how organisations in the financial, telecoms and public sectors could benefit from harnessing the power of their Big Datasets.

Financial sector

By being able to combine multiple datasets that were previously out of reach (the unstructured data), and/or drill down in more depth than before to draw out intelligence through appropriate analysis, financial organisations can gain a deeper insight than ever before into their customers. They could, for example, drill down deeper into all of their own data on that customer, or even overlay all of this with their credit history to spot new characteristics. So when an individual asks their bank for a loan, the institution in question will be better able to assess the risk of lending the money. In this way, the bank could gain additional business by offering less risky customers more attractive loan rates.

This increased knowledge of the customer will benefit the financial institutions’ marketing departments, too, enabling them to target their campaigns far more effectively. Say Mr and Mrs Smith are expecting their first child and are posting regularly on Twitter about this. There is an opportunity here for the bank, which could offer the couple information about its range of children’s savings accounts, if only it knew they were interested. The problem is that the bank’s current datastores give little or no hint of this. What the bank needs is some way of harnessing the Big Data from Twitter and combining it with its own data to provide actionable intelligence: in this case, sending out information on children’s savings accounts.

Financial institutions are also required by regulators to keep money aside to mitigate risk. Some in the sector feel that the liquidity ratios banks are required to adhere to are too high, and that this is due to the lack, or poor quality, of the risk information that institutions are able to present to regulators. If the banks’ Big Data could be harnessed in the right way, it would enable them to provide better and more thorough information to the regulators, meaning they could push back and request more favourable liquidity ratio requirements. This could have a direct impact on the bottom line.

Fraud detection could also be enhanced by having more intelligence available on individuals – and by being able to query it quickly. In this way, when new financial transactions are executed, they can be compared to that customer’s previous behaviour to look for anomalies, which would help pinpoint fraudulent transactions. While banks already have sophisticated fraud-monitoring systems, being able to analyse much bigger sets of data would make these hugely more reliable and effective. Much of this fraudulent behaviour detection and identification of organised criminal activity is being pioneered in the areas of national security and policing.
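The core of comparing a transaction against a customer’s previous behaviour can be sketched very simply: measure how far a new amount sits from the customer’s historical pattern. Real fraud systems use far more sophisticated models; this toy z-score check, with illustrative data and an assumed threshold, just shows the principle.

```python
# Sketch: flag a new transaction as anomalous against a customer's
# history using a simple z-score. Data and threshold are illustrative;
# production fraud models are considerably more sophisticated.
from statistics import mean, stdev

history = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8]  # past transaction amounts

def is_anomalous(amount, past, threshold=3.0):
    """True if amount is more than `threshold` standard deviations
    from the customer's historical mean."""
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold

print(is_anomalous(50.0, history))    # typical spend -> False
print(is_anomalous(5000.0, history))  # far outside the pattern -> True
```

The Big Data challenge is not this arithmetic but doing it at scale: maintaining per-customer behavioural profiles over billions of transactions and querying them within the time it takes a payment to clear.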

Telecoms sector

Telecoms companies can also gain a deeper understanding of their customers by harnessing their Big Data. For example, by mapping network usage information, such as who is calling whom, telecoms operators can identify the key influencers in their networks. This information can additionally be overlaid on social networking data to help the marketing teams target campaigns at those who are not only likely to take up the offers, but who will also persuade their friends to do so.
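A minimal version of this influencer analysis is to count each subscriber’s distinct contacts from call records (degree centrality). The data below is invented for illustration; real analyses use richer measures and vastly larger graphs.

```python
# Sketch: identify well-connected subscribers from call detail records
# by counting distinct contacts (degree centrality). Data illustrative.
from collections import defaultdict

calls = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
         ("bob", "carol"), ("eve", "alice")]

def degree(call_records):
    """Map each subscriber to the number of distinct people they
    have called or been called by."""
    contacts = defaultdict(set)
    for caller, callee in call_records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {person: len(c) for person, c in contacts.items()}

top = sorted(degree(calls).items(), key=lambda kv: -kv[1])
print(top[0])  # the best-connected subscriber
```

Overlaying scores like these with social networking data is what turns a raw call graph into a marketing target list.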

Telecoms operators have another set of proprietary data that is of significant saleable value: real-time handset locations. Potential buyers include transport operators: a sudden build-up of people at a station ticket barrier could indicate a fault with the barriers, for example. Similarly, if there is any disruption to the transport network, such as a broken down train or blocked road, handset location data would help show where crowds are building up, and hence where relief efforts need to be targeted. Of course, some such analysis is already done from primitive road sensors or traffic cameras, but once again, overlaying mobile telecoms data augments and supplements what’s already available.

Handset movement patterns would also interest shopping centre planners and operators, who would want to see aggregate flows of people around a complex, helping them understand people’s movements better than they currently can. This would in turn help them optimise the layout of existing and new centres to increase customer spend. This kind of intelligence could be provided by a telecoms operator placing small cells around the complex, collecting and aggregating the data and selling it to the centre operator. This is something the telecoms operators are in a unique position to accomplish.

Public sector

Public sector organisations – such as local and national government bodies, the NHS, DVLA and so on – could all benefit from harnessing the power of the Big Data they have available to them. The challenge that any such body faces, however, is that much of the public is instinctively wary of the government knowing more about them, and suspicious of what it might do with this data. Any initiative that increases the government’s knowledge of citizens is going to meet significant opposition, as was seen with the now-shelved bid to introduce national ID cards[2]. For this reason, any public sector Big Data project needs to be one that has clear benefits to the general population. Furthermore, there would need to be clear and well-enforced laws on who is granted access to what data, and the purposes it is allowed to be used for.

Coupled with this is the problem of cost: Big Data projects can be expensive, especially when the datasets are so vast and spread out across different systems. Big Data initiatives are long-term investments, which are more difficult to justify to governments that are inevitably going to be interested in quick results, especially from projects that involve major spending. Here again, the benefits need to be made crystal-clear for those who hold the purse strings (and the general public) to see.

A good place to start may be HM Revenue and Customs’ (HMRC) tax collection processes. There is currently no central system with a full picture of an individual’s income, which may come from a combination of sources (employment, additional self-employment, interest on savings, sale of assets and so on). At the moment, if an individual’s earnings from employment mean they should pay higher-rate income tax on their savings, this has to be configured manually by the banks. Similarly, if someone is exempt from paying tax on their savings, they have to request this manually: HMRC has no central place it can check this and feed the information to banks. By overlaying data from its multiple systems that contain information on individuals’ earnings, HMRC could ensure people automatically pay the correct amount of tax on their savings interest, even if their circumstances change. Similarly, when it comes to filing a tax return, much of the pain could be taken out for the public if HMRC were able to pre-populate the majority of the form with relevant data pulled from multiple systems.

The benefits of exploiting all this data are clear to the public, in that it takes a lot of effort and hassle out of tax. It would also save HMRC money, in that it would not need to chase people for unpaid tax, for example. An ideal world would be one where there was no tax fraud – something few would argue with. By harnessing its available Big Data, HMRC could set the country on the path towards that optimal situation.

Internal data systems

As well as commercial exploitation of Big Data, organisations have an opportunity to harness Big Data technology to manage and exploit their internal bodies of data more effectively. Over time, huge amounts of data will have been built up, in all sorts of structures across numerous systems. This data has potential uses across the business – from regulatory compliance evidence and audit trails to reducing wastage caused by duplication. But much of it will currently not be exploited to the extent that it could be, because of the challenges posed by the variety of structures. Organisations therefore need a way to combine, manipulate and analyse their disparate sources of data effectively to produce useful intelligence, which can be shared across the business.

New opportunities – but understand your business case

All of the above are new opportunities to gain commercial value or tangible benefit from data: they were not possible before, because the key enablers were not available (or were prohibitively expensive). Now that they are within reach, businesses can start harnessing their Big Data in order to reap the demonstrable rewards. Before going any further, however, it is important that business leaders think about what value they want to get from analysing Big Data. Complex data analysis projects should only be undertaken if there is a valid business reason for doing so.

Once the business reasons have been established and an organisation has decided to go ahead, the various challenges highlighted above need to be addressed. Let us consider some of these next.


Technology

As discussed, Big Data is large, can be complex, comes in quickly, and is probably not in a structured format. This poses substantial technology challenges: for a start, any system that is going to handle Big Data needs to be able to cope with significant (and ever-increasing but fluctuating) quantities, both in terms of basic storage and also analysis of it. The ability to scale up (and down) rapidly is critical. Systems also need to be able to ingest and store complex and unstructured data satisfactorily.

The important thing to realise is that there is no single solution that will successfully tackle every Big Data problem. Each of the technologies has its respective strengths: some are good for retrospective analysis, while others are built to provide intelligence from real-time data. Outlined below are some of the options, but the most important thing is that organisations understand what it is they wish to achieve with their Big Data project, and pick the appropriate technology to enable it. Both traditional relational databases and newer NoSQL technologies may have a role to play, depending on the circumstances. Let us consider a few examples.

Relational databases with massively parallel processing

Not all Big Data is in unknown structures, and in cases where the structure is known, traditional relational databases using Massively Parallel Processing (MPP) may be the most suitable method for storage and analysis. This is because relational databases are built with analysis in mind and include such things as indexes to enable fast data retrieval. Where relational databases fall short, however, is when it comes to data where the structure is not known, especially where there is (or will be) an absolutely huge volume of it that requires querying in a tolerable timeframe. As you scale such databases up, performance is inevitably affected, to the point where the system can become unusable.
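The MPP principle – split a query across data partitions, process the pieces in parallel and merge the partial results – can be sketched with a thread pool. This is an illustrative toy with invented data, not a real MPP engine, but the shape of the computation is the same.

```python
# Sketch of the MPP idea: a GROUP BY/SUM query is run independently
# on each data partition in parallel, then partial results are merged.
# Data is illustrative.
from concurrent.futures import ThreadPoolExecutor

# Table rows sharded across partitions, as an MPP database would hold them.
partitions = [
    [("uk", 120), ("fr", 80)],
    [("uk", 60), ("de", 200)],
    [("fr", 40), ("de", 10)],
]

def partial_sum(rows):
    """Aggregate one partition locally."""
    out = {}
    for key, value in rows:
        out[key] = out.get(key, 0) + value
    return out

def mpp_group_by_sum(parts):
    """Run partition aggregations in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, parts))
    merged = {}
    for p in partials:
        for key, value in p.items():
            merged[key] = merged.get(key, 0) + value
    return merged

print(mpp_group_by_sum(partitions))  # {'uk': 180, 'fr': 120, 'de': 210}
```

A real MPP database does this across many machines, with the planner deciding how to shard the data and how much of each query can run partition-locally.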

Hadoop & MapReduce

Where vast volumes of data require retrospective querying, perhaps to draw out trends that would otherwise go unnoticed, NoSQL solutions come into their own. Apache’s Hadoop framework[3], which queries using the MapReduce process[4], is one of the best-known in the Big Data field for this purpose. Its distributed and relatively fail-resistant architecture, combined with the fact it is almost infinitely scalable and does not need to know the format of the input data, makes it a popular option for retrospective analysis of vast bodies of unstructured data. The concept behind Hadoop clusters is not new, but what has given impetus to Hadoop is that the open source software movement is fully behind it.

However, the technology is not the most straightforward to set up and maintain, and will pose a significant challenge to anyone with little experience of it. Furthermore, its batch-oriented design means Hadoop is not ideal for providing near-real-time analysis. This is because the lack of indexing as data is ingested means that queries are carried out by a slow brute force process.
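The MapReduce model at Hadoop’s heart can be illustrated in miniature with plain Python: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. This single-machine toy (the classic word count) only shows the programming model; Hadoop’s value is distributing exactly these phases across a cluster.

```python
# Minimal single-machine illustration of the MapReduce model.
from collections import defaultdict

documents = ["big data big insight", "big cluster"]

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate the values for one key."""
    return key, sum(values)

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 1, 'insight': 1, 'cluster': 1}
```

Because each map call and each reduce call is independent, the framework can run them on whichever node holds the data, which is what makes the approach scale to vast unstructured datasets.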

Cassandra

Cassandra[5], another NoSQL solution, is a database that is more appropriate for systems working with and reporting on high volumes of real-time (or near-real-time) data. This is due to a number of factors: its peer-to-peer architecture, which enables linear performance scalability as new nodes are added; the use of in-memory caching on each node; and the fact that the datastore is more structured and indexed – though this storage structure needs to be defined by the user before any data is ingested. Provided the structure is well designed, reading the data out again can be extremely quick. Cassandra can be built to work with Hadoop-based systems, but uses a different kind of architecture, and is arguably more fault-tolerant than Hadoop. This is because all nodes (by which we mean distinct processing units) in its distributed architecture are of the same type, whereas Hadoop clusters differentiate between the nodes that coordinate processing and track where data is stored (namenodes) and the nodes that actually store and process the data (datanodes).
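The peer-to-peer idea can be sketched as follows: because every node is equivalent, a hash of the row’s partition key is enough to decide which node owns it, with no master to consult. This toy mapping is only illustrative; real Cassandra uses a token ring with configurable replication, not a simple modulo.

```python
# Toy sketch of masterless data placement in the Cassandra style:
# all nodes are peers, and a hash of the partition key determines
# ownership. Real Cassandra uses a token ring with replication.
from hashlib import md5

nodes = ["node-a", "node-b", "node-c"]  # all peers, no coordinator role

def owner(partition_key, ring):
    """Deterministically map a partition key to the node that owns it."""
    token = int(md5(partition_key.encode()).hexdigest(), 16)
    return ring[token % len(ring)]

for key in ["customer:1001", "customer:1002", "customer:1003"]:
    print(key, "->", owner(key, nodes))
```

Because any peer can compute the same mapping, any node can accept a request and route it, which is what removes the single point of failure that a coordinating master represents.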


Graph-based systems

When analysing relationships between different entities – such as individuals on a social network, or what different users of an e-commerce site have viewed and bought – distributable graph-based databases offer an attractive solution. These systems, such as InfiniteGraph[6], do not require a pre-defined data schema and enable quick ingestion of data as well as rapid querying. This is made possible by selective indexing of data as it flows in, combined with the fact it is graph-based, which enables rapid traversal between nodes in the database.
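The traversal that graph databases make fast can be shown in miniature with a breadth-first walk over an in-memory adjacency map. The "also viewed" data here is invented; a graph database performs the same hop-by-hop exploration over billions of relationships without scanning whole tables.

```python
# Sketch: hop-by-hop graph traversal, the operation graph databases
# optimise. Here, "users who viewed this also viewed..." relationships.
# Data illustrative.
from collections import deque

viewed_with = {
    "camera": ["tripod", "memory card"],
    "tripod": ["camera", "carry case"],
    "memory card": ["camera"],
    "carry case": ["tripod"],
}

def related(start, graph, max_hops=2):
    """Breadth-first traversal: everything reachable within max_hops."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found

print(related("camera", viewed_with))  # items within two hops of "camera"
```

In a relational store, each hop would typically be a join; graph systems store the edges directly against each node, so the cost of a traversal grows with the neighbourhood explored rather than with the size of the whole dataset.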

Storm

Another option for the sort of near-real-time processing required for quick intelligence is Storm, which can be used for stream processing and near-real-time database updates. Much like a Hadoop cluster, a Storm cluster is distributed, extensible, reliable and fault-tolerant[7]: any given node in its architecture can fail without bringing the system down. Where it differs is that it is designed for real-time processing.
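The stream-processing style can be mimicked with generators: a source feeds events one at a time into a processing stage that emits updated intelligence per event, rather than waiting for a batch. This pure-Python sketch only imitates the flow; real Storm topologies are built from spouts and bolts distributed across a cluster.

```python
# Sketch of stream processing in the Storm style: events flow from a
# source ("spout") through a processing stage ("bolt") and running
# counts update continuously, not in a retrospective batch.
from collections import Counter

def spout():
    """Stands in for a live feed, e.g. transactions or status updates."""
    for event in ["login", "purchase", "login", "error", "login"]:
        yield event

def counting_bolt(stream):
    """Emit (event, running total) for every event as it arrives."""
    counts = Counter()
    for event in stream:
        counts[event] += 1
        yield event, counts[event]

for event, running_total in counting_bolt(spout()):
    print(event, running_total)
```

The key contrast with the Hadoop model is latency: intelligence is available after each event, at the cost of only ever seeing the data one element at a time.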

Complex event processing software

One more general technology area that is perhaps more established than the others here is Complex Event Processing (CEP) software, which is able to analyse incoming streams of data, monitoring for specified events or patterns. When the system detects a particular pattern of events, which could be those leading up to a fraudulent financial transaction, for example, it can alert decision-makers or carry out automated processes. This technology is already widely used in the financial sector, and could have an important part to play elsewhere when it comes to processing and analysing Big Data.
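At its simplest, CEP is pattern matching over an event stream. The sketch below watches for one illustrative two-event sequence (a password change immediately followed by a large transfer); commercial CEP engines express far richer patterns, with time windows and correlation across streams.

```python
# Sketch of complex event processing: monitor a stream for a specified
# pattern of events and raise an alert when it occurs. The pattern and
# event names are illustrative.
def detect(stream, pattern=("password_change", "large_transfer")):
    """Return an alert for each occurrence of the two-event pattern."""
    alerts = []
    previous = None
    for event in stream:
        if (previous, event) == pattern:
            alerts.append(f"ALERT: {previous} then {event}")
        previous = event
    return alerts

events = ["login", "password_change", "large_transfer", "logout"]
print(detect(events))  # one alert for the suspicious sequence
```

The same shape, scaled up, is what lets a fraud system act while the suspicious transaction is still in flight rather than discovering it in a nightly batch.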

Making the right choice

There is a wide variety of technologies in the marketplace – of which the above is a small selection – each offering sometimes only subtly different strengths. This makes it all the more important for organisations to make the right choices. While Big Data is about more than just the technology, getting this key element wrong could have serious implications for the success of the project.

When choosing the technology, it is important to realise that many of these systems can work together, meaning it is possible to combine the benefits of multiple systems. A quick web search will show up plenty of examples of Cassandra/Hadoop mixes, for example. Again, though, this kind of setup requires significant expertise to build and maintain.


Producing actionable intelligence

Gathering data is a waste of time and resources if that data is not then analysed and appropriately presented to create meaningful insight and yield business advantage. The type of analysis employed should depend on the nature of the data, but could include social network analysis or geographic analysis. And because these new systems can query just about any size of dataset reliably, there is significant scope for experimental queries that look for trends which would previously have gone undetected, because the technology was not capable of that depth of analysis.

Appropriate visualisation of the findings is essential: it needs to be presented to decision-makers in the best way possible to help them make the right choices. The best way may not be the easiest or the most obvious, so it requires analysts who can come up with innovative ways of doing so. For example, the best way to spot key influencers in a network (be that a social one such as Twitter or a mobile phone operator’s network), is to map links between individuals, and use smart maths to analyse the relationships. The resulting visualisation, as shown in Figure 1, can quickly show decision-makers who the key people are – far more rapidly than poring over call detail records for individual accounts. Furthermore, results should be interactive, enabling the decision-makers to drill deeper into the findings, should they need to.

Figure 1: By displaying the findings from Big Data analysis in innovative ways, decision-makers can make better, more timely choices. This social network visualisation, for example, shows a number of key influencers.
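The link-mapping idea behind Figure 1 can be illustrated with a minimal Python sketch. The call records here are invented, and simple degree centrality stands in for the more sophisticated mathematics a production system would use, but the principle – map the links, then rank the people – is the same.

```python
from collections import defaultdict

# Hypothetical call-record edge list: (caller, callee) pairs.
calls = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("eve", "alice"), ("frank", "alice"),
]

# Map the links: record each person's set of distinct contacts.
contacts = defaultdict(set)
for caller, callee in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# Rank people by how many others they are directly linked to.
ranking = sorted(contacts, key=lambda person: len(contacts[person]), reverse=True)
print(ranking[0])  # alice
```

Even this trivial measure surfaces the key influencer instantly; a real system would apply richer measures (such as eigenvector-based centrality) over millions of links, but the output feeding the visualisation is a ranking of exactly this kind.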


People

While technology is a key enabler when it comes to drawing business value from Big Data, it is of little use without the right people to build and run the systems. Businesses must understand that several different skillsets are required to ‘do Big Data’ successfully. They need coders with Unix and related skills to build and maintain the complex and rapidly developing database systems discussed above, people with the appropriate Business Intelligence (BI) expertise to analyse the data effectively, and Information Management professionals to ensure that data is handled correctly throughout the Big Data process.

The importance of good Information Management was discussed earlier, while the BI side is vital because the analysis possibilities are limited only by the imaginations of those involved. It is therefore essential for organisations to employ individuals with the right BI skills to conceive creative and innovative ways of analysing and visualising the data to draw out intelligence. This is particularly important because many of the underlying technology frameworks currently available remain in their infancy and are not especially easy to set up and use, so businesses must employ people with proven experience of building BI systems on these fledgling frameworks.

New ways of driving out correlations are also emerging, requiring relatively new mathematical skills that are still quite scarce. The expertise needed for the sort of deep cluster analysis that can reveal underlying, previously invisible trends is not yet in abundance either. It is also important to bring in the right data mining and predictive analytics skills.
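As a small taste of the mathematics involved, the sketch below computes a Pearson correlation coefficient by hand in Python. The session and sales figures are invented for illustration; real correlation mining would run measures like this across thousands of candidate variable pairs to surface unexpected relationships.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance of the two series (unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Standard deviations (unnormalised), used to scale into [-1, 1].
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented example: daily web sessions vs. daily sales.
sessions = [120, 150, 170, 210, 250]
sales = [10, 14, 15, 21, 24]
print(round(pearson(sessions, sales), 2))  # 0.99
```

A coefficient close to 1 signals a strong linear relationship worth investigating; the skill lies in knowing which of the many possible pairings are worth testing, and which apparent correlations are spurious.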

These are very different sets of skills, and organisations are extremely unlikely to find all of them in a single person. Few organisations, for instance, have Information Management professionals truly capable of managing data – the lifeblood of the business – as a genuine enterprise asset. Given the complexity of many of the technology solutions, a good option for organisations embarking on a Big Data project is to choose an appropriate partner company to help. However, finding one with the breadth of skills required among its staff, as well as a proven track record in the sector in question, can be a challenge, so choosing carefully is crucial. Make the wrong choice and the whole project could prove a costly mistake; pick the right partner and the initiative will deliver long-lasting benefits.


Conclusions

For companies of all sizes, there are advantages to be realised from harnessing the power of their Big Data. These advantages are borne out of having far better levels of intelligence available than ever before, which can enable more informed and quicker decision-making across the board.

To take advantage of Big Data, organisations need to choose technology systems that can handle it satisfactorily and deliver the kinds of benefits required – the main distinction being between near-real-time intelligence and retrospective analysis. Firms must also employ innovative analysis and visualisation techniques that can draw suitable intelligence out of the data in question.

Technology alone will be insufficient to bring the required benefits, however. It is the people that will determine whether the initiative delivers the promised benefits. Organisations need to place the right individuals in the relevant roles to ensure the data is transformed into intelligence that is of value to decision-makers. The skillsets involved differ considerably, from Information Management to mathematics knowledge, and there are likely to be very few individuals out there who possess all the skills required.

Big Data initiatives therefore require teams with the right blend of people, and a good option for organisations looking to take advantage of their Big Data is to work with the right partner company. This partner organisation needs to have knowledge of the technology and proven experience of Information Management and Business Intelligence, combined with a thorough understanding and track record of delivering solutions in the domain area. The lack of any one of these areas of expertise could seriously hamper the project.

It should be clear that Big Data initiatives are not the domain of one department in an organisation and cannot be addressed solely by technology. They are major projects that require buy-in from individuals at different levels of the organisation: from the directors through to the information workers who create and manipulate any data. With the right groundwork, the right technology, the right people and appropriate data analysis and visualisation, businesses will be able to turn their vast unwieldy datasets into intelligence that gives them a tangible competitive advantage.

