Tuesday, April 28, 2015
To me, real Cloud and real Big Data start with using a scalable PaaS, like Azure or AWS. If you have no PaaS, don't talk cloud and big data.
My first Azure bill, despite my MSDN subscription, was above $150. I run a 24/7 application server on a slow VM there, which is where the majority of the cost comes from, but small experiments, like the one with a VPN, added $25 before I noticed how expensive it really is. That was unpleasant.
Now, I want to experiment with something other than what I can run on the local Azure emulator (Tables, Blobs and Queues). I would like to play with ESB, API Management, DocumentDB and just about everything else, but I am really concerned about how much cost I will incur by using each part of the stack, especially if I forget to power everything down on Azure for a night. The point is, my impression is that simply poking around and learning Azure is rife with unpleasant and costly surprises. All these risks and impediments combine into one big turn-off for a potential Azure developer.
With Azure lagging behind AWS in adoption, slapping down even dedicated fans like myself means that Microsoft is very unlikely to ever catch up with AWS. "Developers! Developers! Developers!" indeed. I know it is possible to plan the cost ahead using a complex Excel spreadsheet, and there is an almost-meaningless 30-day free trial (learn a new enterprise OS in 30 days, really??), but all these impediments add up to one huge fuggedaboutit!
Here's what Microsoft could do to make it easy for me to learn Azure: create a closed-off sandbox on Azure infrastructure, inaccessible from the outside, but make all Azure stack technologies and components available there for free to all MSDN subscribers. Let us play with and learn Azure for free, and charge us only when our wares are made publicly available. That would be a huge differentiator for Azure.
If you agree with the statement above, talk to Microsoft about this.
But while Microsoft is ignoring us, I decided to do my part: in the upcoming weeks and months I will do a public service, so to speak. I will pay out of my pocket for learning Azure and will blog about my experience, including costs. I will tag those posts with "azure" and "paas". Follow this blog to take pleasure in my future misfortunes on the path to conquering Azure.
Monday, April 27, 2015
On a PC, hit F11 to put your browser to full-screen and then click on the picture to see it in the album.
Rain in Wyoming
Paris, of course
Caminito, Buenos Aires
Key Largo, FL
Sunday, April 26, 2015
Most programmers, myself included, are notoriously bad at estimating how long it will take to finish a job. So after much research I am finally ready to make the estimates very precise and scientific: whatever number a programmer gives you, apply The Vlad Law and multiply the estimate by pi. Not by 3 - that would be just a worthless guess. No, multiply it by 3.1415 - that's the objective truth, I tell ya!
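The Vlad Law fits in a few lines of Python - a tongue-in-cheek sketch, with the function name being my own invention:

```python
import math

def vlad_law(programmer_estimate_days: float) -> float:
    """Apply The Vlad Law: multiply the programmer's estimate by pi."""
    return programmer_estimate_days * math.pi

# A "two week" task really takes about 44 days.
print(round(vlad_law(14), 1))
```

Note that using `math.pi` rather than a hand-typed 3.1415 keeps the law objectively true to even more decimal places.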
Saturday, April 25, 2015
That struck me as very odd and made me think: in my day-to-day life even mid-level developers usually have a decent grasp of what transactions are for, and they often don't need supervision in applying them, as long as we are talking about SQL programming or writing the data access layer of a business application. But as soon as people leave the database world, somehow even professional enterprise software integrators become completely unconcerned about transactions. That matched my experience: virtually every enterprise system I have ever encountered has lots of garbage data in it, requiring lots of effort and money to cleanse. And the conclusion was inescapable: by and large, as a practical matter, corporations are pretty comfortable with not having transactions guarding the integrity of their data. As a matter of fact, companies don't care about their data consistency.
Now, my thinking went like this: if people voluntarily give up transactions without getting anything in return, what can be gained if transactions are avoided not by neglect but by design? Well, if we believe the CAP theorem, letting go of data consistency should let us gain availability and partition tolerance, which translates into high scalability. And, ladies and gentlemen, that's what "big data" management systems offer: high scalability and high availability, if you can deal with eventual consistency of your data. And since lots of companies are not even trying to achieve data consistency, switching to NoSQL-based "big data" platforms becomes a no-brainer.
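To make "eventual consistency" concrete, here is a toy model of it - not any real product's replication protocol, just a minimal sketch: writes land on a primary replica immediately, followers catch up later, and a read routed to a lagging follower returns stale data until replication completes.

```python
import random
from collections import deque

class EventuallyConsistentStore:
    """Toy model: writes hit the primary at once and replicate to
    followers with a lag, so a read from a follower may be stale."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = deque()  # replication queue: (replica_index, key, value)

    def write(self, key, value):
        self.replicas[0][key] = value              # primary is current immediately
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))   # followers catch up later

    def read(self, key):
        # A load balancer might route the read to any replica.
        return random.choice(self.replicas).get(key)

    def replicate_one(self):
        # Apply one queued replication step ("time passes").
        if self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value

store = EventuallyConsistentStore()
store.write("balance", 100)
# Right after the write, read() may return None (stale follower) or 100.
while store.pending:
    store.replicate_one()
assert store.read("balance") == 100  # once replication drains, replicas agree
```

The point of the trade: because each replica can accept reads independently, you can add replicas and scale reads almost linearly - exactly the availability-for-consistency swap the CAP theorem describes.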
Now, NoSQL and "big data" have become such incredibly abused buzzwords that I need to stop for a moment and state that, for example, MongoDB, in my opinion, although a very fast NoSQL data management system, is not necessarily a *big data* system, because it was not designed to be one - it was built for speed and sacrificed parts of CAP to achieve high performance. Then let's look at Hadoop. No question, that's a big data management system. But the main problem is that it's not for online data processing - it's strictly batch processing using the map/reduce approach. And if you want to set up a Hadoop cluster, it's a pretty expensive proposition.
All that said, I argue that the first truly useful general-purpose big data online-processing data management system was Amazon's DynamoDB. It has eventual consistency, no transactions, a limited returned data set size, and other limitations, but it scales in a nearly linear manner. Then Microsoft came up with Azure Tables and now DocumentDB. Even though you may say that eventual consistency is not really online data processing, I say the latency of these systems is tolerable enough that they could be considered pseudo-online.
Now let's review the landscape again: transactions are abandoned, and non-transactional, highly scalable data management systems are available as part of the PaaS stacks from Amazon and Microsoft, so... there is pretty much no reason to have your data processing strategy depend completely on ACID databases. Moreover, if we, developers, train ourselves to deal with more complex DAL tiers underpinned by Azure and AWS eventually-consistent big data engines, there is no real reason to keep ACID databases as the default position - which amounts to "everyone, to the cloud!"
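What does a "more complex DAL tier" actually look like? One common pattern is a poll-until-fresh read: instead of assuming the ACID read-your-writes guarantee, the DAL retries a read until it reflects at least the version it just wrote. A minimal sketch, with the helper name and the simulated lagging backend being my own illustrative inventions:

```python
import time

def read_at_least(read_fn, min_version, attempts=10, delay=0.01):
    """Poll an eventually consistent read path until it returns a record
    at or past min_version, or give up after `attempts` tries."""
    record = None
    for _ in range(attempts):
        record = read_fn()
        if record is not None and record["version"] >= min_version:
            return record
        time.sleep(delay)
    return record  # caller must still handle a possibly stale result

# Simulated backend: the read replica lags behind for the first two reads.
history = [{"version": 1, "name": "old"}, {"version": 2, "name": "new"}]
state = {"reads": 0}

def lagging_read():
    state["reads"] += 1
    return history[0] if state["reads"] < 3 else history[1]

assert read_at_least(lagging_read, 2)["name"] == "new"
```

This is the kind of extra work the DAL takes on in exchange for scalability: the application code above it still sees fresh data most of the time, but the "possibly stale" case is now an explicit code path rather than an impossibility guaranteed by the database.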