BigDataLabs BigData Analytics, Hadoop, Hive, HBase, MapReduce
Monday, November 14, 2011
Columnar DBMS vendor customer metrics - from dbms2.com
Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive.
Sybase said that Sybase IQ had > 2000 direct customers and >500 indirect customers (i.e., end customers of OEMs). That’s counting by customers; I know from prior discussions that Sybase IQ is running at close to two installations per customer. I also believe that Sybase counts different divisions of the same large enterprise as separate customers.
Vertica cited a figure of 500 customers as of April (end Q1?), which is close to 600 now, about 40% or a little more direct. The difference between this and a 2010 year-end figure of 328 is not only new sales, but also slow reporting by OEMs. One cool figure — a single OEM reported 82 end sales in a single (quarterly?) report. And a number of those direct customers are substantial; Vertica’s customer logo page features lots of telcos, lots of internet companies, and the national operation of Blue Cross/Blue Shield.
Pay no attention to small inconsistencies in the number of Vertica direct customers (250 at year-end, no more than that now); Colin Mahony just estimates these numbers for me from memory, and minor inaccuracies are quite excusable.
Even cooler — Vertica reports 7 customers with a petabyte or more of user data each. About 5 of the 7 are obvious-suspect big-name firms; but unsurprisingly, those big names are NDA. I did secure permission to say that there are 2 telecom companies, a mobile gaming vendor, another internet company, and 3 financial services outfits of various kinds.
SAND Technology reported >600 total customers, including >100 direct. Since SAND has been around since the 1990s, those aren’t great average annual figures, but they’re probably more than many people (including me) thought.
Infobright reported around 200 total paying customers, 130 direct. There are surely a lot more users of open source Infobright, but precise numbers are of course hard to come by.
If I asked ParAccel in the April go-round, I’ve misplaced their answer, but back in October the figure was >30 customers, 2 of them over 100 terabytes. I’ve seen published figures of 40+ for ParAccel since.
Sunday, October 23, 2011
Economist - The data deluge
Businesses, governments and society are only starting to tap its vast potential
Feb 25th 2010 | from the print edition
Eighteen months ago, Li & Fung, a firm that manages supply chains for retailers, saw 100 gigabytes of information flow through its network each day. Now the amount has increased tenfold. During 2009, American drone aircraft flying over Iraq and Afghanistan sent back around 24 years’ worth of video footage. New models being deployed this year will produce ten times as many data streams as their predecessors, and those in 2011 will produce 30 times as many.
Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still. Even so, the data deluge is already starting to transform business, government, science and everyday life (see our special report in this issue). It has great potential for good—as long as consumers, companies and governments make the right choices about when to restrict the flow of data, and when to encourage it.
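To make those estimates concrete, here is a rough back-of-the-envelope sketch in Python; the 2005 and 2010 totals are the article’s figures, while the implied annual growth rate is derived here purely for illustration.

```python
# Back-of-the-envelope check of the growth figures quoted above.
# The 2005 and 2010 totals come from the article; the implied annual
# growth rate is computed here and is not stated in the source.

data_2005_eb = 150    # exabytes created in 2005 (article's estimate)
data_2010_eb = 1200   # exabytes expected this year (article's estimate)
years = 2010 - 2005

# Compound annual growth rate implied by the two estimates.
cagr = (data_2010_eb / data_2005_eb) ** (1 / years) - 1
print(f"Implied annual growth: {cagr:.0%}")   # roughly 50% per year

# Li & Fung: 100 GB/day eighteen months ago, tenfold more now.
daily_now_gb = 100 * 10
print(f"Li & Fung today: {daily_now_gb} GB/day (~{daily_now_gb / 1024:.1f} TB/day)")
```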
A few industries have led the way in their ability to gather and exploit data. Credit-card companies monitor every purchase and can identify fraudulent ones with a high degree of accuracy, using rules derived by crunching through billions of transactions. Stolen credit cards are more likely to be used to buy hard liquor than wine, for example, because it is easier to fence. Insurance firms are also good at combining clues to spot suspicious claims: fraudulent claims are more likely to be made on a Monday than a Tuesday, since policyholders who stage accidents tend to assemble friends as false witnesses over the weekend. By combining many such rules, it is possible to work out which cards are likeliest to have been stolen, and which claims are dodgy.
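To illustrate the kind of rule combination the article describes, here is a minimal, hypothetical scoring sketch in Python; the rules, weights and threshold are invented for the example and are not what any card issuer or insurer actually uses.

```python
# A minimal sketch of rule combination: each triggered rule contributes a
# weight, and transactions whose combined score crosses a threshold are
# flagged for review. All rules and weights below are illustrative.

SUSPICION_RULES = [
    # (name, predicate, weight)
    ("hard_liquor_purchase", lambda t: t["category"] == "hard_liquor", 2.0),
    ("monday_claim",         lambda t: t["weekday"] == "Monday",       1.0),
    ("foreign_country",      lambda t: t["country"] != t["home_country"], 1.5),
]

def suspicion_score(transaction):
    """Sum the weights of every rule the transaction triggers."""
    return sum(w for _, pred, w in SUSPICION_RULES if pred(transaction))

def flag(transaction, threshold=2.5):
    return suspicion_score(transaction) >= threshold

tx = {"category": "hard_liquor", "weekday": "Monday",
      "country": "US", "home_country": "US"}
print(suspicion_score(tx), flag(tx))   # 3.0 True -> queue for review
```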
Mobile-phone operators, meanwhile, analyse subscribers’ calling patterns to determine, for example, whether most of their frequent contacts are on a rival network. If that rival network is offering an attractive promotion that might cause the subscriber to defect, he or she can then be offered an incentive to stay. Older industries crunch data with just as much enthusiasm as new ones these days. Retailers, offline as well as online, are masters of data mining (or “business intelligence”, as it is now known). By analysing “basket data”, supermarkets can tailor promotions to particular customers’ preferences. The oil industry uses supercomputers to trawl seismic data before drilling wells. And astronomers are just as likely to point a software query-tool at a digital sky survey as to point a telescope at the stars.
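A rough sketch of the calling-pattern check described above might look like the following; the record format, the thresholds and the network names are assumptions made for the example, not any operator’s real schema.

```python
# If most of a subscriber's frequent contacts sit on a rival network,
# flag the subscriber as a churn risk for a retention offer.

from collections import Counter

def churn_risk(call_records, rival_networks, min_calls=5):
    """call_records: iterable of (contact_number, contact_network) tuples."""
    calls_per_contact = Counter(call_records)
    frequent = [(num, net) for (num, net), n in calls_per_contact.items()
                if n >= min_calls]
    if not frequent:
        return False
    on_rival = sum(1 for _, net in frequent if net in rival_networks)
    return on_rival / len(frequent) > 0.5   # most frequent contacts are elsewhere

records = [("555-0101", "RivalCo")] * 7 + [("555-0202", "HomeCo")] * 6 \
        + [("555-0303", "RivalCo")] * 9
print(churn_risk(records, rival_networks={"RivalCo"}))   # True -> offer incentive
```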
There’s much further to go. Despite years of effort, law-enforcement and intelligence agencies’ databases are not, by and large, linked. In health care, the digitisation of records would make it much easier to spot and monitor health trends and evaluate the effectiveness of different treatments. But large-scale efforts to computerise health records tend to run into bureaucratic, technical and ethical problems. Online advertising is already far more accurately targeted than the offline sort, but there is scope for even greater personalisation. Advertisers would then be willing to pay more, which would in turn mean that consumers prepared to opt into such things could be offered a richer and broader range of free online services. And governments are belatedly coming around to the idea of putting more information—such as crime figures, maps, details of government contracts or statistics about the performance of public services—into the public domain. People can then reuse this information in novel ways to build businesses and hold elected officials to account. Companies that grasp these new opportunities, or provide the tools for others to do so, will prosper. Business intelligence is one of the fastest-growing parts of the software industry.
But the data deluge also poses risks. Examples abound of databases being stolen: disks full of social-security data go missing, laptops loaded with tax records are left in taxis, credit-card numbers are stolen from online retailers. The result is privacy breaches, identity theft and fraud. Privacy infringements are also possible even without such foul play: witness the periodic fusses when Facebook or Google unexpectedly change the privacy settings on their online social networks, causing members to reveal personal information unwittingly. A more sinister threat comes from Big Brotherishness of various kinds, particularly when governments compel companies to hand over personal information about their customers. Rather than owning and controlling their own personal data, people very often find that they have lost control of it.
The best way to deal with these drawbacks of the data deluge is, paradoxically, to make more data available in the right way, by requiring greater transparency in several areas. First, users should be given greater access to and control over the information held about them, including whom it is shared with. Google allows users to see what information it holds about them, and lets them delete their search histories or modify the targeting of advertising, for example. Second, organisations should be required to disclose details of security breaches, as is already the case in some parts of the world, to encourage bosses to take information security more seriously. Third, organisations should be subject to an annual security audit, with the resulting grade made public (though details of any problems exposed would not be). This would encourage companies to keep their security measures up to date.
Market incentives will then come into play as organisations that manage data well are favoured over those that do not. Greater transparency in these three areas would improve security and give people more control over their data without the need for intricate regulation that could stifle innovation. After all, the process of learning to cope with the data deluge, and working out how best to tap it, has only just begun.
Economist on BigData - New rules for big data
Regulators are having to rethink their brief
Feb 25th 2010 | from the print edition
Two centuries after Gutenberg invented movable type in the mid-1400s there were plenty of books around, but they were expensive and poorly made. In Britain a cartel had a lock on classic works such as Shakespeare’s and Milton’s. The first copyright law, enacted in the early 1700s in the Bard’s home country, was designed to free knowledge by putting books in the public domain after a short period of exclusivity, around 14 years. Laws protecting free speech did not emerge until the late 18th century. Before print became widespread the need was limited.
Now the information flows in an era of abundant data are changing the relationship between technology and the role of the state once again. Many of today’s rules look increasingly archaic. Privacy laws were not designed for networks. Rules for document retention presume paper records. And since all the information is interconnected, it needs global rules.
New principles for an age of big data sets will need to cover six broad areas: privacy, security, retention, processing, ownership and the integrity of information.
Privacy is one of the biggest worries. People are disclosing more personal information than ever. Social-networking sites and others actually depend on it. But as databases grow, information that on its own cannot be traced to a particular individual can often be unlocked with just a bit of computer effort.
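A toy example makes the point: two datasets that each look harmless can be joined on shared quasi-identifiers (here a postcode and a birth year, both invented) to tie “anonymised” records back to names.

```python
# A minimal illustration of the re-identification risk described above.
# The data is made up for the example.

medical_records = [   # "anonymised": no names
    {"postcode": "SW1A", "birth_year": 1970, "diagnosis": "diabetes"},
    {"postcode": "EC2M", "birth_year": 1985, "diagnosis": "asthma"},
]
public_register = [   # e.g. an electoral roll, with names
    {"name": "A. Smith", "postcode": "SW1A", "birth_year": 1970},
    {"name": "B. Jones", "postcode": "EC2M", "birth_year": 1985},
]

def reidentify(anon, register):
    """Link records that share the same quasi-identifiers."""
    index = {(p["postcode"], p["birth_year"]): p["name"] for p in register}
    return [{**rec, "name": index.get((rec["postcode"], rec["birth_year"]))}
            for rec in anon]

for rec in reidentify(medical_records, public_register):
    print(rec["name"], "->", rec["diagnosis"])
```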
This tension between individuals’ interest in protecting their privacy and companies’ interest in exploiting personal information could be resolved by giving people more control. They could be given the right to see and correct the information about them that an organisation holds, and to be told how it was used and with whom it was shared.
Today’s privacy rules aspire to this, but fall short because of technical difficulties which the industry likes to exaggerate. Better technology should eliminate such problems. Besides, firms are already spending a great deal on collecting, sharing and processing the data; they could divert a sliver of that money to provide greater individual control.
The benefits of information security—protecting computer systems and networks—are inherently invisible: if threats have been averted, things work as normal. That means it often gets neglected. One way to deal with that is to disclose more information. A pioneering law in California in 2003 required companies to notify people if a security breach had compromised their personal information, which pushed companies to invest more in prevention. The model has been adopted in other states and could be used more widely.
In addition, regulators could require large companies to undergo an annual information-security audit by an accredited third party, similar to financial audits for listed companies. Information about vulnerabilities would be kept confidential, but it could be used by firms to improve their practices and handed to regulators if problems arose. It could even be a requirement for insurance coverage, allowing a market for information security to emerge.
Current rules on digital records state that data should never be stored for longer than necessary because they might be misused or inadvertently released. But Viktor Mayer-Schönberger of the National University of Singapore worries that the increasing power and decreasing price of computers will make it too easy to hold on to everything. In his recent book “Delete” he argues in favour of technical systems that “forget”: digital files that have expiry dates or slowly degrade over time.
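A minimal sketch of such a “forgetting” store, assuming nothing more than an expiry date per record; this is an illustration of the idea, not code from the book.

```python
# Every record carries an expiry time and is dropped once it lapses.

from datetime import datetime, timedelta

class ExpiringStore:
    def __init__(self):
        self._items = {}   # key -> (value, expires_at)

    def put(self, key, value, ttl_days):
        self._items[key] = (value, datetime.utcnow() + timedelta(days=ttl_days))

    def get(self, key):
        value, expires_at = self._items.get(key, (None, None))
        if expires_at is None or datetime.utcnow() >= expires_at:
            self._items.pop(key, None)   # forget expired data on access
            return None
        return value

store = ExpiringStore()
store.put("search_history:user42", ["flights to Oslo"], ttl_days=180)
print(store.get("search_history:user42"))   # present until the expiry date passes
```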
Yet regulation is pushing in the opposite direction. There is a social and political expectation that records will be kept, says Peter Allen of CSC, a technology provider: “The more we know, the more we are expected to know—for ever.” American security officials have pressed companies to keep records because they may hold clues after a terrorist incident. In future it is more likely that companies will be required to retain all digital files, and ensure their accuracy, than to delete them.
Processing data is another concern. Ian Ayres, an economist and lawyer at Yale University and the author of “Super-Crunchers”, a book about computer algorithms replacing human intuition, frets about the legal implications of using statistical correlations. Rebecca Goldin, a mathematician at George Mason University, goes further: she worries about the “ethics of super-crunching”. For example, racial discrimination against an applicant for a bank loan is illegal. But what if a computer model factors in the educational level of the applicant’s mother, which in America is strongly correlated with race? And what if computers, just as they can predict an individual’s susceptibility to a disease from other bits of information, can predict his predisposition to committing a crime?
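One way to see the problem is to measure how strongly a permitted proxy feature tracks the protected attribute; the sketch below does this on synthetic data, with every name and number invented for illustration.

```python
# If a feature a model is allowed to use is tightly correlated with a
# protected attribute, including it amounts to using the attribute itself.

import random
random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic applicants: the proxy tracks the protected group closely.
protected = [random.randint(0, 1) for _ in range(1000)]
proxy = [g * 3 + random.gauss(0, 1) for g in protected]

r = pearson(proxy, protected)
print(f"correlation(proxy, protected) = {r:.2f}")
if abs(r) > 0.7:
    print("the proxy feature effectively encodes the protected attribute")
```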
A new regulatory principle in the age of big data, then, might be that people’s data cannot be used to discriminate against them on the basis of something that might or might not happen. The individual must be regarded as a free agent. This idea is akin to the general rule of national statistical offices that data gathered for surveys cannot be used against a person for things like deporting illegal immigrants—which, alas, has not always been respected.
Privacy rules lean towards treating personal information as a property right. A reasonable presumption might be that the trail of data that an individual leaves behind and that can be traced to him, from clicks on search engines to book-buying preferences, belongs to that individual, not the entity that collected it. Google’s “data liberation” initiative mentioned earlier in this report points in that direction. That might create a market for information. Indeed, “data portability” stimulates competition, just as phone-number portability encourages competition among mobile operators. It might also reduce the need for antitrust enforcement by counteracting data aggregators’ desire to grow ever bigger in order to reap economies of scale.
Ensuring the integrity of the information is an important part of the big-data age. When America’s secretary of state, Hillary Clinton, lambasted the Chinese in January for allegedly hacking into Google’s computers, she used the term “the global networked commons”. The idea is that the internet is a shared environment, like the oceans or airspace, which requires international co-operation to make the best use of it. Censorship pollutes that environment. Disrupting information flows not only violates the integrity of the data but quashes free expression and denies the right of assembly. Likewise, if telecoms operators give preferential treatment to certain content providers, they undermine the idea of “network neutrality”.
Governments could define best practice on dealing with information flows and the processing of data, just as they require firms to label processed foods with the ingredients or impose public-health standards. The World Trade Organisation, which oversees the free flow of physical trade, might be a suitable body for keeping digital goods and services flowing too. But it will not be quick or easy.