Google Launches Cloud Bigtable, A Highly Scalable And Performant NoSQL Database

With Cloud Bigtable, Google is launching a new NoSQL database offering today that, as the name implies, is powered by the company’s Bigtable data storage system, but with the added twist that it’s compatible with the Apache HBase API, which itself is based on Google’s Bigtable project. Bigtable powers the likes of Gmail, Google Search and Google Analytics, so this is definitely a battle-tested service.

Google promises that Cloud Bigtable will offer single-digit millisecond latency and 2x the performance per dollar when compared to the likes of HBase and Cassandra. Because it supports the HBase API, Cloud Bigtable can be integrated with all the existing applications in the Hadoop ecosystem, but it also supports Google’s Cloud Dataflow.
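
Compatibility with the HBase API means code written against the familiar HBase data model (tables, column families, row keys) should carry over with little change. Here is a minimal, hypothetical sketch in that style using the open-source happybase Python client against a generic HBase Thrift endpoint; the host, table and column-family names are invented, and Google’s own client libraries remain the documented path to Cloud Bigtable:

```python
import happybase

# Connect to an HBase Thrift endpoint (hypothetical host). The point of HBase API
# compatibility is that code in this style keeps working against Cloud Bigtable.
connection = happybase.Connection('hbase.example.internal')
events = connection.table('events')  # hypothetical table with a 'metrics' column family

# Write a cell: row key -> {b'family:qualifier': value}
events.put(b'user#1234#20150506', {b'metrics:clicks': b'17'})

# Read the row back
print(events.row(b'user#1234#20150506'))  # {b'metrics:clicks': b'17'}

# Scan a key range -- rows are stored in lexicographic order by key
for key, data in events.scan(row_prefix=b'user#1234#'):
    print(key, data)

connection.close()
```

The same put/get/scan surface is what existing Hadoop-ecosystem tools already speak, which is why Google is stressing API compatibility rather than introducing a new client interface.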

Setting up a Cloud Bigtable cluster should only take a few seconds, and the storage automatically scales according to the user’s needs.

It’s worth noting that this is not Google’s first cloud-based NoSQL database product. With Cloud Datastore, Google already offers a high-availability NoSQL datastore for developers on its App Engine platform. That service, too, is based on Bigtable. Cory O’Connor, a Google Cloud Platform product manager, tells me Cloud Datastore focuses on read-heavy workloads for web apps and mobile apps.

“Cloud Bigtable is much the opposite; it is designed for larger companies and enterprises where extensive data processing is required, and where workloads are more complex,” O’Connor tells me. “For example, if an organization needs to stream data into, run analytics on and serve data out of a single database at scale – Cloud Bigtable is the right system. Many of our customers will start out on Cloud Datastore to build prototypes and get moving quickly, and then evolve towards services like Cloud Bigtable as they grow and their data processing needs become more complex.”

The new service is now available in beta, which means it’s open to all developers but doesn’t offer an SLA or technical support.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

Artificial Intelligence Is Almost Ready for Business

Artificial Intelligence (AI) is an idea that has oscillated through many hype cycles over many years, as scientists and sci-fi visionaries have declared the imminent arrival of thinking machines. But it seems we’re now at an actual tipping point. AI, expert systems, and business intelligence have been with us for decades, but this time the reality almost matches the rhetoric, driven by the exponential growth in technology capabilities (e.g., Moore’s Law), smarter analytics engines, and the surge in data.

Most people know the Big Data story by now: the proliferation of sensors (the “Internet of Things”) is accelerating exponential growth in “structured” data. And now on top of that explosion, we can also analyze “unstructured” data, such as text and video, to pick up information on customer sentiment. Companies have been using analytics to mine insights within this newly available data to drive efficiency and effectiveness. For example, companies can now use analytics to decide which sales representatives should get which leads, what time of day to contact a customer, and whether they should e-mail them, text them, or call them.

Such mining of digitized information has become more effective and powerful as more info is “tagged” and as analytics engines have gotten smarter. As Dario Gil, Director of Symbiotic Cognitive Systems at IBM Research, told me:

“Data is increasingly tagged and categorized on the Web – as people upload and use data they are also contributing to annotation through their comments and digital footprints. This annotated data is greatly facilitating the training of machine learning algorithms without demanding that the machine-learning experts manually catalogue and index the world. Thanks to computers with massive parallelism, we can use the equivalent of crowdsourcing to learn which algorithms create better answers. For example, when IBM’s Watson computer played ‘Jeopardy!,’ the system used hundreds of scoring engines, and all the hypotheses were fed through the different engines and scored in parallel. It then weighted the algorithms that did a better job to provide a final answer with precision and confidence.”
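
To make the “many scoring engines, weighted by how well they do” idea concrete, here is a toy sketch. It is not IBM’s implementation; the two engines, the weights and the candidate answers are all invented for illustration:

```python
# Toy sketch of scoring candidate answers with several engines and weighting them.
# Engines, weights and candidates are invented; real systems learn the weights.

def keyword_overlap(question, candidate):
    q, c = set(question.lower().split()), set(candidate.lower().split())
    return len(q & c) / max(len(q), 1)

def brevity(question, candidate):
    # Prefer short, direct answers.
    return 1.0 / (1 + len(candidate.split()))

ENGINES = [keyword_overlap, brevity]
WEIGHTS = [0.8, 0.2]   # learned from labeled question/answer pairs in a real system

def rank(question, candidates):
    scored = [(sum(w * engine(question, c) for engine, w in zip(ENGINES, WEIGHTS)), c)
              for c in candidates]
    return sorted(scored, reverse=True)

print(rank("largest planet in the solar system",
           ["Jupiter", "Jupiter is the largest planet in the solar system", "Mars"]))
```

Weighting engines by how often they contributed to correct answers, as the quote describes, then amounts to fitting those weights against labeled questions rather than hard-coding them.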

Beyond the Quants

Interestingly, for a long time, doing detailed analytics has been quite labor- and people-intensive. You need “quants,” the statistically savvy mathematicians and engineers who build models that make sense of the data. As Babson professor and analytics expert Tom Davenport explained to me, humans are traditionally necessary to create a hypothesis, identify relevant variables, build and run a model, and then iterate it. Quants can typically create one or two good models per week.

However, machine learning tools for quantitative data – perhaps the first line of AI – can create thousands of models a week. For example, in programmatic ad buying on the Web, computers decide which ads should run in which publishers’ locations. Massive volumes of digital ads and a never-ending flow of clickstream data depend on machine learning, not people, to decide which Web ads to place where. Firms like DataXu use machine learning to generate up to 5,000 different models a week, making decisions in under 15 milliseconds, so that they can more accurately place ads that you are likely to click on.
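
As a rough sketch of the “thousands of models” pattern (not DataXu’s actual system), the snippet below fits one small logistic-regression click model per ad placement on synthetic data, then scores a single incoming impression, which is the operation that has to fit inside a 15-millisecond decision window:

```python
# Sketch: one lightweight click-prediction model per ad placement.
# All data is synthetic; a real pipeline would stream features from ad-server logs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
models = {}

for placement_id in range(1_000):      # thousands of placements -> thousands of models
    X = rng.normal(size=(200, 5))      # e.g. hour of day, device, recency, frequency, bid
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # synthetic click labels
    models[placement_id] = LogisticRegression().fit(X, y)

# Scoring one impression is a dot product and a sigmoid, i.e. microseconds of work.
impression = rng.normal(size=(1, 5))
print("P(click) for placement 42:", models[42].predict_proba(impression)[0, 1])
```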

Tom Davenport:

“I initially thought that AI and machine learning would be great for augmenting the productivity of human quants. One of the things human quants do, that machine learning doesn’t do, is to understand what goes into a model and to make sense of it. That’s important for convincing managers to act on analytical insights. For example, an early analytics insight at Osco Pharmacy uncovered that people who bought beer also bought diapers. But because this insight was counter-intuitive and discovered by a machine, they didn’t do anything with it. But now companies have needs for greater productivity than human quants can address or fathom. They have models with 50,000 variables. These systems are moving from augmenting humans to automating decisions.”
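
The beer-and-diapers story is a classic association-rule finding. Here is a minimal sketch of the arithmetic behind such a rule (support, confidence, lift) on a handful of made-up transactions:

```python
# Toy market-basket analysis: the arithmetic behind an "X is bought with Y" insight.
# Transactions are invented for illustration.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset):
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    # lift > 1: the items co-occur more often than they would if independent
    return confidence(antecedent, consequent) / support(consequent)

print("support(beer & diapers):    ", support({"beer", "diapers"}))       # 0.4
print("confidence(beer -> diapers):", confidence({"beer"}, {"diapers"}))  # ~0.67
print("lift(beer -> diapers):      ", lift({"beer"}, {"diapers"}))        # ~1.11
```

As the quote notes, surfacing a rule like this is the easy part; whether anyone acts on it is a separate, human question.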

In business, the explosive growth of complex and time-sensitive data enables decisions that can give you a competitive advantage, but these decisions depend on analyzing at a speed, volume, and complexity that is too great for humans. AI is filling this gap as it becomes ingrained in the analytics technology infrastructure in industries like health care, financial services, and travel.

The Growing Use of AI

IBM is leading the integration of AI in industry. It has made a $1 billion investment in AI through the launch of its IBM Watson Group and has made many advancements and published research touting the rise of “cognitive computing” – the ability of computers like Watson to understand words (“natural language”), not just numbers. Rather than take the cutting-edge capabilities developed in its research labs to market as a series of products, IBM has chosen to offer a platform of services under the Watson brand. It is working with an ecosystem of partners who are developing applications leveraging the dynamic learning and cloud computing capabilities of Watson.

The biggest application of Watson has been in health care. Watson excels in situations where you need to bridge between massive amounts of dynamic and complex text information (such as the constantly changing body of medical literature) and another mass of dynamic and complex text information (such as patient records or genomic data), to generate and evaluate hypotheses. With training, Watson can provide recommendations for treatments for specific patients. Many prestigious academic medical centers, such as The Cleveland Clinic, The Mayo Clinic, MD Anderson, and Memorial Sloan-Kettering, are working with IBM to develop systems that will help healthcare providers better understand patients’ diseases and recommend personalized courses of treatment. This has proven to be a challenging domain to automate, and most of the projects are behind schedule.

Another large application area for AI is in financial services. Mike Adler, Global Financial Services Leader at The Watson Group, told me they have 45 clients working mostly on three applications: (1) a “digital virtual agent” that enables banks and insurance companies to engage their customers in a new, personalized way, (2) a “wealth advisor” that enables financial planning and wealth management, either for self-service or in combination with a financial advisor, and (3) risk and compliance management.

For example, USAA, the $20 billion provider of financial services to people who serve, or have served, in the United States military, is using Watson to help their members transition from the military to civilian life. Neff Hudson, vice president of emerging channels at USAA, told me, “We’re always looking to help our members, and there’s nothing more critical than helping the 150,000+ people leaving the military every year. Their financial security goes down when they leave the military. We’re trying to use a virtual agent to intervene to be more productive for them.” USAA also uses AI to enhance navigation on their popular mobile app. The Enhanced Virtual Assistant, or Eva, enables members to do 200 transactions by just talking, including transferring money and paying bills. “It makes search better and answers in a Siri-like voice. But this is a 1.0 version. Our next step is to create a virtual agent that is capable of learning. Most of our value is in moving money day-to-day for our members, but there are a lot of unique things we can do that happen less frequently with our 140 products. Our goal is to be our members’ personal financial agent for our full range of services.”

In addition to working with large, established companies, IBM is also providing Watson’s capabilities to startups. IBM has set aside $100 million for investments in startups. One of the startups that is leveraging Watson is WayBlazer, a new venture in travel planning that is led by Terry Jones, a founder of Travelocity and Kayak. He told me:

“I’ve spent my whole career in travel and IT. I started as a travel agent, and people would come in, and I’d send them a letter in a couple weeks with a plan for their trip. The Sabre reservation system made the process better by automating the channel between travel agents and travel providers. Then with Travelocity we connected travelers directly with travel providers through the Internet. Then with Kayak we moved up the chain again, providing offers across travel systems. Now with WayBlazer we have a system that deals with words. Nobody has helped people with a tool for dreaming and planning their travel. Our mission is to make it easy and give people several personalized answers to a complicated trip, rather than the millions of clues that search provides today. This new technology can take data out of all the silos and dark wells that companies don’t even know they have and use it to provide personalized service.”

What’s Next

As Moore’s Law marches on, we have more power in our smartphones than the most powerful supercomputers did 30 or 40 years ago. Ray Kurzweil has predicted that the computing power of a $4,000 computer will surpass that of a human brain in 2019 (20 quadrillion calculations per second). What does it all mean for the future of AI?

To get a sense, I talked to some venture capitalists, whose profession it is to keep their eyes and minds trained on the future. Mark Gorenberg, Managing Director at Zetta Venture Partners, which is focused on investing in analytics and data startups, told me, “AI historically was not ingrained in the technology structure. Now we’re able to build on top of ideas and infrastructure that didn’t exist before. We’ve gone through the change of Big Data. Now we’re adding machine learning. AI is not the be-all and end-all; it’s an embedded technology. It’s like taking an application and putting a brain into it, using machine learning. It’s the use of cognitive computing as part of an application.” Another veteran venture capitalist, Promod Haque, senior managing partner at Norwest Venture Partners, explained to me, “if you can have machines automate the correlations and build the models, you save labor and increase speed. With tools like Watson, lots of companies can do different kinds of analytics automatically.”

Manoj Saxena, former head of IBM’s Watson efforts and now a venture capitalist, believes that analytics is moving to the “cognitive cloud” where massive amounts of first- and third-party data will be fused to deliver real-time analysis and learning. Companies often find AI and analytics technology difficult to integrate, especially with the technology moving so fast; thus, he sees collaborations forming where companies will bring their people with domain knowledge, and emerging service providers will bring system and analytics people and technology. Cognitive Scale (a startup that Saxena has invested in) is one of the new service providers adding more intelligence into business processes and applications through a model they are calling “Cognitive Garages.” Using their “10-10-10 method” they deploy a cognitive cloud in 10 seconds, build a live app in 10 hours, and customize it using their client’s data in 10 days. Saxena told me that the company is growing extremely rapidly.

I’ve been tracking AI and expert systems for years. What is most striking now is its genuine integration as an important strategic accelerator of Big Data and analytics. Applications such as USAA’s Eva, healthcare systems using IBM’s Watson, and WayBlazer, among others, are having a huge impact and are showing the way to the next generation of AI.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

Azure Media Services integration with Azure CDN (Content Delivery Network)

Azure Media Services is now fully integrated with Azure CDN. Azure CDN can be easily enabled for any Media Services streaming endpoint by using the Azure Management Portal, or via the Media Services REST API 2.9 or the .NET SDK.

This allows you to leverage the capabilities of our worldwide CDN POPs to easily scale from thousands to hundreds of thousands of viewers and to improve performance. The CDN is designed to cache and deliver content to end users from a CDN POP that is as close as possible to them. By moving content close to the user, delivery time is reduced and the overall user experience can be significantly improved.

Without CDN, users access media content directly from the data center associated with your Azure Media Services account. This is an ideal workflow to use when you have a limited number of users (e.g. a few hundred) and they are all located near the Azure data center you are using for Azure Media Services. When you have a larger number of local or globally distributed users (e.g. thousands) it is recommended that you enable Azure CDN to improve performance.

Pricing benefits:

  • Without CDN, you are charged for each streaming unit and outbound traffic from the Azure data center. With CDN, you are charged for each streaming unit and only the outbound traffic from the CDN POPs to end users. You are not charged for the outbound traffic from the Azure data center to the CDN. A rough cost comparison is sketched below.
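
To make the billing difference concrete, here is a small sketch with hypothetical prices and traffic volumes; the actual streaming-unit and per-GB egress rates are set by Azure and vary by region, so treat every number below as an assumption:

```python
# Hypothetical rates and volumes, purely to illustrate which traffic is billed.
STREAMING_UNIT_PER_MONTH = 200.00   # assumed streaming unit price
DC_EGRESS_PER_GB = 0.12             # assumed data-center egress rate (no CDN)
CDN_EGRESS_PER_GB = 0.10            # assumed CDN POP-to-viewer egress rate

viewer_traffic_gb = 50_000          # traffic delivered to end users
origin_to_cdn_gb = 5_000            # origin-to-CDN fill traffic

without_cdn = STREAMING_UNIT_PER_MONTH + viewer_traffic_gb * DC_EGRESS_PER_GB
# With CDN, only POP-to-viewer egress is billed; origin-to-CDN fill traffic is free.
with_cdn = (STREAMING_UNIT_PER_MONTH + viewer_traffic_gb * CDN_EGRESS_PER_GB
            + 0 * origin_to_cdn_gb)  # the fill-traffic term is zero by design

print(f"without CDN: ${without_cdn:,.2f}")
print(f"with CDN:    ${with_cdn:,.2f}")
```

Whether the total works out lower depends on the real rates; the point of the sketch is simply which flows are billed at all.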


Key characteristics:

  • Data center availability: CDN integration is enabled in all the Azure data centers: US West, US East, North Europe, West Europe, Japan West, Japan East, South East Asia and East Asia.
  • Streaming hostname: Your streaming endpoint hostname remains the same after enabling CDN, so you don’t need to make any changes to your Media Services workflow. Whatever hostname your streaming endpoint used before enabling CDN, the exact same hostname is used afterwards.
  • Streaming unit: You need to have at least one media services streaming unit to be able to enable CDN. By default when you create a new streaming endpoint a streaming unit is automatically enabled.
  • Activation time: After enabling CDN, it takes two hours for the changes to be active across all the CDN POPs.
  • Enabling CDN: For new streaming endpoints, you can enable CDN simply by creating a new endpoint; for existing streaming endpoints, you will need to first stop the endpoint and then enable the CDN.
  • Allowed requests: When CDN is enabled for a streaming endpoint, clients cannot request content directly from the origin. If you need the ability to test your content with or without CDN you can create another streaming endpoint that isn’t CDN enabled.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

Where Big Data Projects Fail

Over the past 6 months I have seen the number of big data projects go up significantly and most of the companies I work with are planning to increase their Big Data activities even further over the next 12 months. Many of these initiatives come with high expectations but big data projects are far from fool-proof. In fact, I predict that half of all big data projects will fail to deliver against their expectations.

Failure can happen for many reasons; however, there are a few glaring dangers that will cause any big data project to crash and burn. Based on my experience working with companies and organizations of all shapes and sizes, I know these errors are all too frequent. One thing they have in common is that they are all caused by a lack of adequate planning.

So, in no particular order, here are some of the most common causes of failure in business big data projects that I’ve come across.

One of the most common is simply collecting as much data as possible and hoping something useful will emerge. Sometimes you will get lucky and hit on an interesting insight taking this approach, but it’s highly inefficient. In fact it’s a bit like sitting an exam and not bothering to read the question: simply writing out everything you know on the subject and hoping it will include the information the examiner is looking for.

Not making a good business case

A lot of people go into Big Data with a “me too!” attitude. The barriers to entry are constantly dropping, which is a great thing – open source software is becoming more accessible all the time, and there are an ever-growing number of “software-as-a-service” companies which often hugely cut down the need for infrastructure investment. On top of that, people like me are always saying that if you aren’t in, you’re going to get left behind!

Well, that’s all true – but you still need to establish why there is a particular need for your business to allocate time and resources to it. However much data you collect (and particularly if it’s big), you will need to make sure it is kept clean and secure, and there will be ongoing costs associated with this that might not be anticipated. In short, you need to know why your business needs to use Big Data before you start doing it. If you don’t – wait until you do.

Management failure

I admit, this is something of a catch-all and can affect any kind of business initiative. But in a time and resource-intensive Big Data initiative (skilled data scientists generally expect to be paid at least $100,000 a year) management failure can have disastrous consequences.

Sometimes it’s because those holding the purse strings haven’t taken into account some long-term or ongoing cost associated with the project, or sometimes the senior project managers just can’t talk productively to the data scientist workers in the lab. Sometimes it will be because senior managers don’t trust the algorithms – many got where they are today on gut instinct – and they aren’t going to start letting a computer tell them what to do now.

Mismanagement can come from many angles and the UK National Health Service’s fatally botched National Programme for IT is a prime example. The plan to bring all patient medical records into a central database was described as the “biggest IT failure ever seen” and was scrapped after more than £10 billion ($14.9 billion) had been spent.

Not starting with clear business objectives

It’s easy to get caught up in hype – and Big Data has certainly been hyped. When so many people (including me) are shouting about how earth-shatteringly important it is, and anyone not on board is likely to sink, it isn’t surprising that a lot of people start with the “how” without first considering the “why”.

What people who fall into that trap often fail to appreciate is that analytics in business is about problem solving – and first you need to know what problem you are trying to solve.

I worked with an airline which had thrown itself into a range of Big Data projects with great enthusiasm – cataloguing and collecting information on everything from meal preferences to the impact delays would have on drinks orders. Another client – a retailer – had 258 separate data projects on the go when they called me in. Some were interesting – such as the discovery, by mining all of their stock and purchase data, that a particular bottle of wine sold exceptionally well on a Tuesday, and even more so if it was raining. But so what? The issue is that shelf-space is pre-assigned and can’t be increased for this brand for just this one day. The only option is to ensure the allocated shelf-space is regularly restocked on Tuesdays. In isolation that insight isn’t going to provide them with huge growth or positive change.
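
For context, the kind of analysis behind the wine-on-a-rainy-Tuesday finding is little more than a cross-tabulation of sales against calendar and weather attributes. A toy sketch on invented data (not the retailer’s), just to show the mechanics:

```python
# Toy sketch: spotting a day-of-week / weather effect in sales (all data invented).
import pandas as pd

sales = pd.DataFrame({
    "date":    pd.to_datetime(["2015-03-03", "2015-03-04", "2015-03-10",
                               "2015-03-11", "2015-03-17"]),
    "product": ["wine_x"] * 5,
    "units":   [42, 11, 55, 9, 48],
    "raining": [True, False, True, False, True],
})
sales["weekday"] = sales["date"].dt.day_name()

# Average units sold by weekday and weather; the "so what?" question starts here.
print(sales.groupby(["weekday", "raining"])["units"].mean())
```

Finding the pattern is the easy part; turning it into an action the business can actually take is where projects like these stall.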

Poor communication

To say the world of Big Data is made up of egghead scientists and corporate suits with dollar signs in their eyes is to use stereotypes which are offensive to both groups – but it works to illustrate the problem here. Big Data in business is about the interface between the analytical, experimental science that goes on in data labs, and the profit- and target-chasing sales force and boardroom. They are not always natural bedfellows and things can get lost in translation – with sometimes tragic consequences.

For me, the Space Shuttle Challenger disaster serves as an example – it may have been well before the term Big Data was coined, but the analysts at NASA were dealing with very large amounts of information for the time – monitoring data from sensors fitted throughout the shuttle. Their reports to the higher-ups at mission control went into a great amount of detail and included information that would have shown the significant risks of the shuttle breaking up – if the mission controllers had been able to spot it among the superfluous data. If they had been presented with a document titled “The shuttle is likely to crash because …” things would have ended far more happily.

I feel that lessons have not yet been learnt. Those with responsibility for reporting need to think “who is this data for, and how can I package it to make sure the message gets through?” Analysts with one of my clients – a healthcare company – recently created a report for senior management which was 217 pages long. By replacing much of the text with infographics, we cut it down to 15 pages which still contained all the essential information but presented it far more clearly.

Not having the right skills for the job

Or just as fatally, not having the right skills at the right time. As I’ve explained in the examples here, companies are often fond of starting up data projects “left, right and centre” without thinking enough about how this might impact resources in the future. And skilled data science staff are certainly a very valuable resource. Having started a project with no clearly defined aim, in my experience businesses will often come unstuck when they do come across a valid opportunity for meaningful analysis and find their skilled staff otherwise engaged.

Data science staff are expensive and in extremely limited supply. Companies can benefit here from some out-of-the-box thinking – for example, one of my banking clients told me that while they have a lot of business analysts, they aren’t trained in Big Data and aren’t really data scientists. As there is a huge crossover in skills between the disciplines, I suggested that offering their existing staff specific Big Data training would almost certainly be cheaper than hiring in a whole new team of specialists. We identified the key skills gaps and developed a customised course to move people from business analysts to big data scientists. We also complemented the course with the many online resources where anyone can learn the necessary fundamentals for free. In addition to the training, the bank looked to universities and colleges which often offer the service of students or academics to provide analytical support to businesses. Today, the bank also sponsors a number of PhD students who are using the bank’s own data for their study.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

Gartner Advanced Analytics Quadrant 2015: Gainers, Losers


Gartner published its latest Magic Quadrant for Advanced Analytics Platforms late last week, and once again it delivers a mix of good news and bad news for vendors hoping for recognition in the era of big data and data-driven decision-making.

Advanced analytics platforms have always been tools built for statisticians and data scientists. But Gartner noted in its latest report that these tools are “increasingly being directed at business analysts and ‘citizen data scientists.'” Thus, Gartner’s rankings on its well-known “completeness of vision” and “ability to execute” axes now ride as much on “ease of use” as they do on offering numerous and novel analytical algorithms.

It’s no surprise that this year’s Magic Quadrant for Advanced Analytics Platforms is topped by SAS, the perennial top dog in this category. But offsetting SAS strengths, including “the widest [analytics] stack in the industry,” high product scores, and “a high level of customer loyalty,” Gartner cautioned that “complexity of the learning curve for SAS products is a concern for nonexpert users.”

Gartner’s top-right “Leaders” quadrant includes SAS, IBM, Knime, and RapidMiner for the second year in a row. The biggest change is in the “Visionaries” quadrant, which has Microsoft, Alteryx, and Alpine Data Labs moving up and to the right.

Microsoft’s move out of last year’s “niche player” ranking is tied to the vendor’s 2014 launch of the Azure Machine Learning (Azure ML) service. Though the service was still in beta during Gartner’s review, Azure ML promises “deep integration with R,” the popular predictive analytics language, and “solutions from an entire ecosystem of third parties” supporting Azure, according to Gartner. The analyst firm was unimpressed with the 2014 update of Microsoft SQL Server Analysis Services, which, like its predecessors, lacks “breadth, depth, and usability in comparison with the Leaders’ offerings.”

Gartner noted Microsoft’s recently announced plan to acquire Revolution Analytics, but that vendor surprisingly drops from visionary to niche player in this year’s report. Gartner cautioned that Revolution Analytics’ platform is “demanding in terms of coding skills” and that “support for R is becoming less of a competitive differentiator.”

Another surprise in this year’s report is the complete absence of Oracle, which was in last year’s Niche Players quadrant. Gartner said Oracle was dropped from this year’s report because it “offers solutions as a tightly integrated component of its database, BI, and big data offerings, not as a stand-alone offering.” Much the same could be said of Microsoft — at least before it introduced the Azure ML service. We suspect Oracle didn’t want to be listed in any Quadrant as an also-ran, but don’t be surprised to see Oracle step up in this category in 2015 through an acquisition or new-product introduction.

Magic Quadrant for Advanced Analytics Platforms, 2015 & 2014

Alteryx, which is poised to enter the Leaders’ quadrant, and Alpine Data Labs have both put an emphasis on business-user accessibility. Gartner noted Alteryx’s “drag-and-drop functionality” and “ease of use,” as well as its depth in location intelligence. Alpine stands out, said Gartner, for its “single, collaborative environment for data analysts and business users.”

The only notable change in Gartner’s top-left “Challengers” quadrant is the first-time appearance of Dell, which shows up in pretty much the same spot as StatSoft in the 2014 report. Dell acquired StatSoft last year, and Gartner said Dell’s financial resources and sales presence should help it to “accelerate the development and market presence” of StatSoft’s Statistica software.

Poised on the cusp of the Leaders quadrant are SAP and FICO, both of which are large vendors that are investing aggressively in improving their advanced analytics capabilities. SAP acquired KXEN in late 2013, and Gartner said the vendor is “close to fully integrating the KXEN InfiniteInsight product into its SAP Predictive Analysis product.” FICO acquired cloud-based predictive analytics vendor InfoCentricity in 2014, and Gartner said this deal should help it extend its strong reputation for credit scoring and fraud detection into the cloud.

Making predictive analysis dead easy for business users remains an elusive, distant goal in this category. But this latest Magic Quadrant shows that the category is dynamic, with acquisitions and new players including Predixion, Prognoz, and Salford Systems joining the fray. Better user interfaces and ease of use are priorities for all these vendors, so explore vendor product roadmaps and ask your newbie users which tools they find to be most accessible.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

How big data tool experience tracks with tech salaries

Big data pays big money.

That’s the conclusion of a couple of recent surveys that found that data analysts and engineers with big data chops are earning more than $120,000, compared with the reported average IT salary of $89,450. And Storm and Spark users can pull down $150,000, compared to the median total salary of all data analysts and engineers of $98,000.

“Big data made a big showing last year and we’re seeing it this year too,” said Shravan Goli, president of a tech job consultancy, in a statement. “Tech professionals who analyze and mine information in a way that makes an impact on overall business goals have proven to be incredibly valuable to companies. The proof is in the pay.”

And while that is surely good news for data scientists in the financial and marketing sectors, government agencies are getting pinched. Like private-sector enterprises, they see there are insights and efficiencies to be had through analysis of big data, but agencies can’t compete on salary.

The median total salary of government data analysts and engineers was significantly lower – by approximately $17,000 – than the median salaries earned by data analysts and engineers across other industries, according to a recent salary survey by O’Reilly Media, which also analyzed the tools used by data professionals. Unsurprisingly, respondents who work for government vendors reported higher salaries.

Other contributing salary factors included age, gender, years in the field, employee level, degrees held and usage of cloud technology. Among O’Reilly’s findings:

  • Every year of age added $1,100, with an additional $1,400 for every year of experience working in data.
  • Women earned a median of $13,000 less than men, a number consistent with the general U.S. population.
  • Those with doctorates earned $11,000 more, and every position increase added an average $10,000 to salary.
  • Those using cloud technology earned $13,000 more than those who didn’t.

High-end, high pay

Data engineers who have experience with Storm and Spark earn the highest median salaries, according to O’Reilly.

Apache Storm is a distributed, fault tolerant, real-time computation system for processing large volumes of high-velocity data. Its speed makes it useful for real-time analytics, machine learning and continuous computation.

Apache Spark is a big data processing framework that improves traditional Hadoop-based analytics. It uses in-memory primitives and other enhanced technologies to outperform MapReduce and offers more computational options, with tool libraries for enhanced SQL querying, streaming data analytics, machine learning and more.
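
For a flavor of what that looks like in practice, here is a minimal PySpark sketch (using the current SparkSession API) that caches a dataset in memory and runs a SQL aggregation over it; the input path and column names are hypothetical:

```python
# Minimal PySpark sketch: cache a dataset in memory, then query it with SQL.
# The input path and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-demo").getOrCreate()

events = spark.read.json("hdfs:///data/clickstream/2015/03/*.json")
events.cache()                       # keep the working set in memory across queries

events.createOrReplaceTempView("events")
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM events
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
top_pages.show()

spark.stop()
```

Spark’s MLlib and streaming libraries plug into the same DataFrame abstraction, which is the “more computational options” point above.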

Other high-salary tools were IBM’s Netezza, Cassandra, Amazon Elastic MapReduce, Homegrown (avt), Pig, Hortonworks, Teradata and HBase (all with median salaries over $130,000).

The more tools a data professional used, the higher the salary: those using up to 10 tools earned a median salary of $82,000, rising to $110,000 for those using 11 to 20 tools and $143,000 for those using more than 20.

The tools most typically used by respondents were programming languages, databases, Hadoop distributions, visualization applications, business intelligence programs, operating systems or statistical packages.

Aside from operating systems, SQL was the most commonly used tool, with R and Python closely behind Excel. Over 50 percent of respondents used these four top data tools, followed by Java and JavaScript with 32 percent and 29 percent respectively. MySQL was the most popular database, closely followed by Microsoft SQL Server.

The study also looked at tools commonly used together and tried to determine the relationship between tool clusters and salaries.

These clusters were:

  • Cluster 1: Windows; C#; SPSS; Visual Basic, VBA; SQL; Business Objects; Oracle BI; PowerPivot; Excel; Oracle; SAS; Microstrategy; MS SQL Server.
  • Cluster 2: Linux; Java; Redis; Hive; Amazon Elastic MapReduce (EMR); MongoDB; Homegrown ML Tools; Storm; Cloudera; Apache Hadoop; Hortonworks; Spark; MapR; Cassandra; HBase; Pentaho; Mahout; Splunk; Scala; Pig.
  • Cluster 3: Python; R; Matlab; Natural Language/Text Processing; Continuum Analytics (NumPy + SciPy); Network/Social Graph; libsvm; Weka.
  • Cluster 4: Mac OS X; JavaScript; MySQL; PostgreSQL; D3; Ruby; Google Chart Tools/Image API; SQLite.
  • Cluster 5: Unix; C++; Perl; C.

After discarding clusters 4 and 5 because they were not significant indicators of salary, O’Reilly determined that users of Cluster 2 and 3 tools earn more, with each tool from Cluster 2 contributing $1,645 to the expected total salary and each tool from Cluster 3 contributing $1,900.
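
Pulling the reported coefficients together gives a rough additive picture of the salary model. The sketch below is a back-of-the-envelope reconstruction, not O’Reilly’s actual regression; the base figure in particular is an assumed placeholder:

```python
# Back-of-the-envelope reconstruction of the additive effects reported above.
# BASE is an assumed intercept; the survey does not state one here.
BASE = 60_000

def estimate_salary(age, years_in_data, has_phd, position_increases,
                    uses_cloud, n_cluster2_tools, n_cluster3_tools):
    salary = BASE
    salary += 1_100 * age                   # +$1,100 per year of age
    salary += 1_400 * years_in_data         # +$1,400 per year working in data
    salary += 11_000 if has_phd else 0      # doctorate premium
    salary += 10_000 * position_increases   # each position increase
    salary += 13_000 if uses_cloud else 0   # cloud technology premium
    salary += 1_645 * n_cluster2_tools      # Cluster 2 (Hadoop/NoSQL) tools
    salary += 1_900 * n_cluster3_tools      # Cluster 3 (Python/R/Matlab) tools
    return salary

print(estimate_salary(age=35, years_in_data=8, has_phd=False, position_increases=2,
                      uses_cloud=True, n_cluster2_tools=4, n_cluster3_tools=3))
```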

The report confirms trends that have been evolving for some time: Hadoop is on the rise, cloud-based data services are important and those who know how to use the advanced, recently developed tools of big data typically earn high salaries.

“For future research we would like to drill down into more detail about the actual roles, tasks, and goals of data scientists, data engineers, and other people operating in the data space. After all, an individual’s contribution – and thus his salary – is not just a function of demographics, level/position, and tool use, but also of what he actually does at his organization,” noted John King and Roger Magoulas, writers of the report.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.

Big Data company proposes servers in space, we have no idea why!!

GPS satellite, artist render

Space technology is cool. Whether you’re a fan of Elon Musk’s SpaceX, NASA’s Curiosity, or the ESA’s ongoing exploration of Comet 67P, space exploration is at the heart of some of humanity’s most impressive endeavors. This, however, doesn’t make every space-based idea a good idea. We’re honestly scratching our heads over how Big Data analytics company ConnectX intends to build a business around its “Space-based supercomputing platform to solve the problems of Big Data.”

ConnectX has proposed a network of server-satellites that would sit in orbit and transmit data to and from ground-based stations using a proprietary network of symbols that the company has developed and claims can represent shifting relationships between multiple, related variables. We can’t comment on the nature of the company’s supposed breakthroughs in either data transmission or information processing, but it wants to put these new server farms in orbit, with huge solar installations to power them. This is problematic for a number of reasons.

Forget Big Data: Let’s talk about space

Space is hard. Even in low Earth orbit (LEO), which provides substantial shielding from cosmic rays, probes and computer systems are exposed to significantly higher levels of cosmic radiation than they are on the ground. This doesn’t prevent the use of conventional hardware in all scenarios; many of the computers on the International Space Station are commercial off-the-shelf (COTS) systems. But if you want to build a satellite in LEO with significant processing capability, you have to account for very different cooling and operating requirements. Traditional forced-air cooling doesn’t work (for obvious reasons), and without gravity heat won’t rise: one problem on the ISS is that heat generated by COTS laptops will stagnate around the systems rather than drifting away.

SRAM failure rates

Similarly, using huge solar arrays to generate power on a per-satellite basis is an intriguing idea; solar radiation above the atmosphere is significantly stronger than it is below, which is one reason why space-based solar power has occasionally been floated as an option for modern equipment. But building a space-based solar installation at large scale as a means of powering significant server installations is completely unproven.

Just because you can put something in space doesn’t mean you should.

This is where ConnectX’s space-based proposal starts to look a bit odd. The company references ultra-cheap satellites like CubeSats or TubeSats, but the processing power of these enclosures is exceedingly modest — not the sort of equipment you’d use to build a high-speed data access network with unproven rich symbolic language processing and enormous bandwidth connections back to Earth. While Fortune correctly notes that the cost of satellite launches has gone down dramatically thanks to these initiatives, lifting significant processing power into orbit is still going to be a costly endeavor. No one has proposed building a comprehensive data analysis network for Fortune 500 customers out of CubeSats, after all.

The bottom line is this: ConnectX claims it has a new method of radio transmission that can dramatically boost total bandwidth, which is possible. It claims to have a condensed, non-binary symbol structure for data transmission and representation that will dramatically improve overall network efficiency and improve Big Data analysis by packing more information into each symbol, and that’s also possible. But ConnectX is claiming it can offer enormous reductions in storage capacity and cost thanks to a unique space-based platform, while handwaving away the fundamental difficulty and redundancy required to offer reliable enterprise-class hardware on Earth. Every redundant rack, hard drive, and radio is going to take up weight in a launch vehicle where weight and size are at a premium.

Tellingly, ConnectX’s staff includes an economist, a futurist, a biologist, a security consultant, managers, a professor of electrical engineering, a professor of mathematics, a research physicist and various other executives, but not a single engineer. That gap makes ConnectX’s blithe claims of offering vastly reduced costs thanks to space-based operation all the more dubious. Even if its radio and symbology claims are true, there’s nothing about these advances that requires space-based operation to be successful.

Source : Here

For any inquiries or to learn more about Business Intelligence, Big Data and System Optimizations, please visit our website. Collaborative VisionZ specializes in BI, Cloud, Big Data and Optimization.