The emergence of affordable, high-IOPS storage, such as Google Compute Engine local SSDs, enables a new generation of technologies to reinvent storage. Helium, an embedded key-value store from Levyx, is one such example -- designed to scale with multi-core CPUs and SSDs, using memory-efficient indexing.

At Levyx, we believe in a “scale-in before you scale-out” mantra. Technology vendors often advertise scale-out as the way to achieve high performance. It is a proven approach, but it is also often used to mask single-node inefficiencies. Without a system in which CPU, memory, network, and local storage are properly balanced, scaling out is simply what we call “throwing hardware at the problem” -- hardware that, virtual or not, customers pay for.

To demonstrate this, we decided to measure Helium’s performance on a single node on Google Cloud Platform, with a workload similar to the one previously used to showcase Aerospike and Cassandra (200-byte objects and 100 million operations). With Cassandra, the data store contained 3 billion indices; Helium starts with an empty data store. The setup consists of:

  1. Single n1-highcpu-32 instance -- 32 virtual CPUs and 28.8 GB memory.
  2. Four local SSDs (4 x 375 GB) for the Helium datastore. (Note: local SSDs are more limited than persistent disks in create-time flexibility and reliability, but the goal of this blog post is to test with the highest-performing Cloud Platform I/O devices.)
  3. OS: Debian 7.7 (kernel 3.16-0.bpo.4-amd64, NVMe drivers).
  4. The gists and tests are on GitHub.

Scaling and Performance with CPUs

The test first populates an empty datastore, then reads the entire datastore back sequentially and then randomly, and finally deletes all objects; a simplified sketch of the workload follows below. The 100 million objects are held in memory with persistence on SSD, which acts as the local storage every replicated system requires. The total datastore size is kept fixed.
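To make the workload concrete, here is a minimal, illustrative version of the four phases in Python. The store argument is any dict-like key-value interface; Helium's actual API is a C library (see the gists on GitHub), so this sketch only mirrors the shape of the test, not the real harness.

    import os
    import random
    import time

    def run_benchmark(store, n=1_000_000, value_size=200):
        """Populate, sequential gets, random gets, delete -- as in the test above."""
        value = os.urandom(value_size)   # 200-byte objects in the real benchmark
        keys = list(range(n))

        def timed(label, fn):
            start = time.time()
            fn()
            print(f"{label}: {n / (time.time() - start):,.0f} ops/sec")

        def populate():
            for k in keys:
                store[k] = value

        def read(order):
            for k in order:
                _ = store[k]

        def delete():
            for k in keys:
                del store[k]

        timed("populate", populate)
        timed("sequential gets", lambda: read(keys))
        shuffled = keys[:]
        random.shuffle(shuffled)
        timed("random gets", lambda: read(shuffled))
        timed("delete", delete)

    run_benchmark({})   # e.g., against a plain dict; scale n up for a real run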
Takeaways
  • Single-node performance of over 4 million inserts/sec (write path) and over 9 million gets/sec (read path), with persistence as durable as the local SSDs.
  • 99th-percentile (in-memory) latency under 15 usec for updates, and under 5 usec for gets.
  • Near-linear scaling with CPUs simplifies the math of provisioning instances.

Scaling with SSDs and Pure SSD Performance

Cloud Platform provides high-IOPS, low-latency local SSDs. To demonstrate a case where data is read purely from SSDs (rather than from memory), let’s run the same benchmark with 4 KB objects x 5 million objects, and reduce Helium’s cache to a minimal 2% (400 MB) of the total data size (20 GB). Only random-gets performance is shown below because it is a harder stress test than sequential gets.

Takeaways:
  • Single-node SSDs capable of updates at 1.6 GB/sec (400K IOPS) and random gets at 1.9 GB/sec (480K IOPS); a quick consistency check on these figures follows below.
  • IOPS scale with the number of SSDs.
  • Numbers comparable to fio, a pure I/O benchmark.
  • With four SSDs and 256 threads, median latency under 600 usec, and 95th-percentile latency under 2 msec.
  • Deterministic memory usage (< 1 GB) by not relying on OS page caches.
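These figures hang together arithmetically. A quick back-of-the-envelope check, assuming 4 KB objects of 4,096 bytes (the reported IOPS round these results):

    # Dataset and cache sizing from the setup above
    n_objects = 5_000_000
    obj_bytes = 4096
    print(f"dataset: {n_objects * obj_bytes / 1e9:.1f} GB")          # ~20 GB
    print(f"2% cache: {0.02 * n_objects * obj_bytes / 1e6:.0f} MB")  # ~400 MB

    # Throughput vs. IOPS consistency for the takeaways above
    print(f"update IOPS ~ {1.6e9 / obj_bytes:,.0f}")  # ~390K, reported as 400K
    print(f"get IOPS    ~ {1.9e9 / obj_bytes:,.0f}")  # ~464K, reported as 480K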

Cost Analysis

The cost of this Google Cloud Platform instance for one hour is $1.22 (n1-highcpu-32) + $0.452 (4 x local SSD) = $1.67. Based on 200-byte objects, this boils down to (see the back-of-the-envelope sketch after the list):

  • 2.5 million updates per second for each dollar of hourly cost
  • 4.6 million gets per second for each dollar of hourly cost
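A back-of-the-envelope check of those per-dollar figures. The unit is an inference on our part: the numbers work out as sustained operations per second for each dollar of hourly instance cost, and the read-path rate used below (~7.7M gets/sec, implied by the 4.6M figure) is an assumption, since the 9M gets/sec takeaway above was measured on the 200-byte in-memory workload:

    hourly = 1.22 + 0.452    # $1.67/hour for the instance plus 4 local SSDs

    updates_per_sec = 4.2e6  # ~4 million inserts/sec (write path)
    gets_per_sec = 7.7e6     # assumed rate implied by the 4.6M-per-dollar figure

    print(f"{updates_per_sec / hourly:,.0f} updates/sec per hourly dollar")  # ~2.5M
    print(f"{gets_per_sec / hourly:,.0f} gets/sec per hourly dollar")        # ~4.6M

    # New York City scan from the paragraph below: ~8.4M records at ~9M gets/sec
    print(f"scan time: {8.4e6 / 9e6:.2f} sec")                               # under 1 second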

To put this in perspective, New York City’s population is ~8.4 million; you could therefore scan through a Helium datastore containing a record for every resident (assuming each record is under 200 bytes, e.g., name, address, and phone) in about one second, on a single Google Cloud Platform instance costing under $2 per hour.

Summary

Helium running on Google Cloud Platform commodity VMs enables processing data at near-memory speeds using SSDs. The combination of Cloud Platform and Helium makes high-throughput, low-latency data processing affordable for everyone. Welcome to the era of dollar-store-priced datastores with enterprise-grade reliability!

For details about running Helium on Google Cloud Platform, contact info@levyx.com.

- Posted by Siddharth Choudhuri, Principal Engineer at Levyx

Today's guest post comes from our friends at Tableau: Jeff Feng, Product Manager, and Ellie Fields, Vice President of Product Marketing. Tableau, a Google Cloud Platform partner, is a leader in interactive data visualization software.

“It’s a beautiful thing when best-of-breed technologies Tableau, Google BigQuery and Twitter come together to operate seamlessly in concert with one another.” - Jeff Feng

Next, a Google Cloud Platform Series
Over the month of June, the Tableau team traveled around the world with the Google Cloud Platform team as a proud sponsor of Next, a Google Cloud Platform event series. The teams made stops in New York, San Francisco, Tokyo, London, and Amsterdam, where attendees learned about the latest services and features on the platform, and fellow developers and IT professionals shared how they are using Google Cloud Platform to move quickly from idea to application or decision.
Ellie presented a joint demo on Twitter data during the Data & Analytics talk at Next, New York City (left). Jeff discussed the activity of Tweets around #GCPNext in Amsterdam (right).

Visualizing Streamed Tweets with Tableau, Google BigQuery & Twitter
As part of our presence at the events, we wanted to develop a live demo that showcased our technologies working together. Google BigQuery can process petabytes of data within seconds and ingest data rapidly. Tableau’s live connectivity to BigQuery enables users to create stunning dashboards within minutes with our drag-and-drop interface, extending the usefulness of BigQuery to all users. For this demo, we decided to visualize real-time Tweets about the #GCPNext conference series.
Overall architecture for visualizing streamed Tweets in BigQuery using Tableau.

We worked together with our friends at Twitter (@TwitterDev), who developed an open-source connector called Twitter-for-BigQuery that streams Tweets directly into BigQuery. Additionally, the connector can retrieve the last 30 days of data for the defined Tweet stream. The APIs for the connector are provided by Gnip, which offers enterprise-grade access and filtering for the full Twitter stream. The connector enables users to define filters for certain hashtags and usernames, and then streams Tweets matching these filters in real time directly into BigQuery using the Tabledata.insertAll method; a minimal sketch of such an insert follows below. For the purposes of our demo, our Tweet stream included hashtags such as #bigdata, #IoT, and #GCPNext, as well as usernames such as @Google.
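For the curious, this is roughly what a streaming insert looks like with the 2015-era Python client (google-api-python-client). The project, dataset, table, and row fields here are hypothetical stand-ins; the real connector handles schema, batching, and the Gnip stream for you.

    from googleapiclient.discovery import build
    from oauth2client.client import GoogleCredentials

    credentials = GoogleCredentials.get_application_default()
    bigquery = build('bigquery', 'v2', credentials=credentials)

    row = {
        'insertId': '651234567890',      # deduplication ID, e.g. the Tweet ID
        'json': {                        # must match the (hypothetical) table schema
            'id': '651234567890',
            'created_at': '2015-07-28 12:34:56',   # TIMESTAMP column
            'text': 'Heading to #GCPNext!',
            'screen_name': 'someuser',
        },
    }

    bigquery.tabledata().insertAll(
        projectId='my-project',          # hypothetical names throughout
        datasetId='tweets',
        tableId='gcpnext_stream',
        body={'rows': [row]},
    ).execute()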

Once the data lands in BigQuery tables, it may be accessed with super-fast, SQL-like queries running on the processing power of Google’s infrastructure. Google provides a console and a command-line interface, which are great for analysts and developers who know how to write SQL; a sample query in that style follows below. Tableau enhances the joint solution by providing a drag-and-drop visual interface to the data so that anybody can use it. And because our live native connector to Google uses the BigQuery REST API, users work in our interface while their queries run on Google’s massive infrastructure. Additionally, Tableau and the Google BigQuery team have co-published a best-practices whitepaper to help you maximize the value of our joint solution.
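As an example of the kind of aggregation the dashboard issues through the live connector, here is a small query in BigQuery's 2015 (legacy SQL) dialect, run through the same Python client as in the previous sketch. The table name is the hypothetical one from that sketch:

    query = """
    SELECT DATE(created_at) AS day, COUNT(*) AS tweets
    FROM [my-project:tweets.gcpnext_stream]
    WHERE text CONTAINS '#GCPNext'
    GROUP BY day
    ORDER BY day
    """

    result = bigquery.jobs().query(
        projectId='my-project',
        body={'query': query},
    ).execute()

    for r in result.get('rows', []):
        print(r['f'][0]['v'], r['f'][1]['v'])   # day, tweet count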

Using Tableau Desktop, we connected to the data and built the dashboard below, enabling users to search for keywords within the filtered Tweet stream. Then we published the live data connection to BigQuery and the dashboard to Tableau Online, our hosted analytics platform. Tableau Online is the perfect complement to BigQuery because the solution is completely no-Ops and maintenance-free. It also supports a live connection to Google BigQuery.

Not only does the dashboard show the overall number of Tweets in the stream and the percentage occurrence of the keyword by date, but you can also visualize the actual Tweet itself by hovering over the marks in the scatter plot below.
Interactive Tableau Online dashboard visualizing live streamed Tweets in Google BigQuery.

In the video below, Ellie shares how you can interact with the Tableau Online visualization we created as well as build a new visualization using the live data connection to BigQuery directly from Tableau Online.

Demo video featuring Tableau Online visualizing live streamed Tweets in Google BigQuery.

What’s Up Next?
At Tableau, we believe that the future of data is in the cloud. We love how Google is innovating on cloud infrastructure and building the cloud services of tomorrow today. That’s why we recently announced a new named connector to Google Cloud SQL. The connector moves Google Cloud Platform and Tableau Online customers one step closer to being able to both host and analyze data completely in the cloud. It also complements our existing native connectors to Google BigQuery and Google Analytics. Going forward, we are committed to building broader and deeper integrations with Google to delight our users.

Try It For Yourself!
The beautiful thing about this demo is that the technologies in the solution are easy to use. To learn more and try it for yourself, see the links below:


- Posted by Jeff Feng, Product Manager, and Ellie Fields, VP of Product Marketing, both at Tableau.

Do you want the power and flexibility of public cloud, but are concerned about losing control over data security? We can help. Security is at the core of Google’s architecture - we’ve spent years developing one of the world’s most advanced and secure infrastructures. We’re committed to providing you with great security, and to giving you more control over how you manage security on Google Cloud Platform.

Today, we are adding Customer-Supplied Encryption Keys for Google Compute Engine in beta, which allow you to bring your own keys to encrypt compute resources. Google Compute Engine already protects all customer data with industry-standard AES-256 encryption. Customer-Supplied Encryption Keys marry the hardened encryption framework built into Google’s infrastructure with encryption keys that are owned and controlled exclusively by you. You create and hold the keys, you determine when data is active or at rest, and no one inside or outside Google can access your at-rest data without possession of your keys. Google does not retain your keys, and holds them only transiently in order to fulfill your request.

Customer-Supplied Encryption Keys are now available in beta in select countries. Starting today, you can access Customer-Supplied Encryption Keys through our API, our Developers Console, and our command-line interface, gcloud; a sketch of preparing a key appears below. This new functionality is currently rolling out to the Free Trial and will be available there soon.
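As an illustration, here is one way you might prepare a key. Compute Engine expects a 256-bit key, base64-encoded, referenced from a JSON key file; the resource URI and names below are hypothetical, and you should verify the exact file format and flags against the documentation before relying on this sketch.

    import base64
    import json
    import os

    # Generate a random 256-bit AES key and base64-encode it
    key = base64.b64encode(os.urandom(32)).decode('ascii')

    key_file = [{
        'uri': ('https://www.googleapis.com/compute/v1/projects/my-project'
                '/zones/us-central1-a/disks/example-disk'),   # hypothetical disk
        'key': key,
        'key-type': 'raw',
    }]

    with open('csek.json', 'w') as f:
        json.dump(key_file, f, indent=2)

    # Then, for example (beta command at the time of writing):
    #   gcloud beta compute disks create example-disk \
    #       --zone us-central1-a --csek-key-file csek.json

Keep the key somewhere safe: as noted below, Google cannot recover your data without it.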

Customer-Supplied Encryption Keys give you unprecedented control over encryption in the public cloud:

  • Secure: All of your compute assets are encrypted using the industry-leading AES-256 standard, and Google never retains your keys, meaning Google cannot decrypt your data at rest.
  • Comprehensive: Unlike many solutions, Customer-Supplied Encryption Keys cover all forms of data at rest for Compute Engine, including data volumes, boot disks, and SSDs.
  • Fast: Google Compute Engine is already encrypting all of your data at rest, and Customer-Supplied Encryption Keys gives you greater control, without additional overhead.
  • Included Free: We feel that encryption should be enabled by default for cloud services; we’re not going to charge you more for the option to bring your own keys.

"Google Compute Engine gives us the performance and scale to process high-volume transactions in the financial markets. With Customer-Supplied Encryption Keys, we can independently control data encryption for our clients without incurring additional expenses from integrating third-party encryption providers. This control is critical for us to realize the price/performance benefits of the cloud in a highly regulated industry."  
- Neil Palmer, CTO of Sungard Consulting Services

Security is as much about control as it is about data protection. With Customer-Supplied Encryption Keys, we are giving you control over how your data is encrypted on Google Compute Engine. Keep in mind, though: if you lose your encryption keys, we won’t be able to help you recover your keys or your data - with great power comes great responsibility!

Retain control while taking advantage of the cloud. Try Customer-Supplied Encryption Keys and let us know how it’s going on the Google Compute Engine forum. We love hearing from you.

- Posted by Leonard Law, Product Manager

At our Next event series this summer, I had an opportunity to talk with developers all over the world about what they’re struggling with, and one issue that’s always top of mind when evaluating cloud is cost. Not just list prices or discounts: what customers really want to understand is the long-term view of true cloud economics, and the fiscal ramifications of the myriad ways you can build systems today.
It turns out that even seasoned experts with years of infrastructure experience under their belts can find the flexibility and performance/price ratio of cloud services extraordinarily difficult to compare to what they’re familiar with on-premises. There are a lot of very standard behaviors (capacity planning, supply chain management, real estate/facilities operations, decommissioning, etc.) that are simply irrelevant in a cloud environment, and it can be difficult to recognize how much unlearning these behaviors changes the game.
My team has taken several approaches to helping customers understand their costs in the cloud, all focused on providing context and clarity. For example, Google Cloud Platform’s finely grained, on-demand, pay-as-you-go, zero-lock-in model is simply different from other clouds, and helping customers understand those differences helps them make better choices. Another example: Google Cloud Platform provides cutting-edge technology built on the incredible power of distributed software, machine learning, and fast, fast, fast gear, which delivers spectacular performance; helping customers harness that performance for their applications is what my team is all about. Combine the two, a great model and great tech, at an outrageously low total cost of ownership, and that’s a design for success.
We’ve built an easy-to-use Pricing Calculator to help you estimate your monthly Google Cloud Platform bill, as well as TCO tools for both compute and storage. We’ve also written several blog posts diving into specific examples of how nuanced changes to an implementation can make a big difference in TCO. And we’re not done: we’re continuing to update our pricing page and to build tools that help customers optimize their systems for cost. Keep the feedback coming!
But our approach to price/performance goes well beyond ensuring we have incredibly competitive list prices. We focus on delivering technical leadership in cloud economics, with innovations like automatic sustained use discounts, per-minute billing, and flat-price preemptible VMs; the sketch below shows how the sustained use discount math works out.
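For instance, here is the sustained use discount arithmetic as documented at the time: each successive quarter of the month a VM keeps running is billed at a steeper incremental discount, which nets out to about 30% off for a full month. The rates and brackets below reflect the 2015 documentation and may change, and the machine price is a hypothetical example; check the pricing page for current numbers.

    BRACKETS = [1.00, 0.80, 0.60, 0.40]   # incremental rate per quarter of the month

    def effective_cost(base_hourly, hours, hours_in_month=730):
        """Monthly cost for one VM with sustained use discounts applied."""
        usage = min(hours, hours_in_month)
        cost = 0.0
        for i, rate in enumerate(BRACKETS):
            lo = i * hours_in_month / 4
            hi = (i + 1) * hours_in_month / 4
            cost += max(0.0, min(usage, hi) - lo) * base_hourly * rate
        return cost

    base = 0.05                            # hypothetical $0.05/hour machine type
    full = effective_cost(base, 730)       # running the entire month
    print(f"list: ${base * 730:.2f}, billed: ${full:.2f}")   # ~30% discount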
I play the sousaphone, so I suppose it’s not too surprising that from time to time I’ll toot my own horn; but we know that third-party validation of these facts is crucial for any wise business or technical decision-maker. Do not take all of your car advice from the Tesla engineer!
Enterprise Strategy Group (ESG), an independent analyst firm with years of experience evaluating the price/performance of technology systems, took a close look at the tools we’ve built and the statements we’ve made about TCO on Google Cloud Platform. These aren’t easy things to evaluate, but ESG took a painstaking, deliberate, systematic approach to digesting our service and, following their own modeling and assumption-setting (TCO evaluations are as much an exercise in constructing a justified framework of assumptions as they are in doing the math), delivered a whitepaper analysis of TCO for Google Cloud Platform. For clarity: we commissioned and paid ESG to do this analysis, but the method of evaluation was decided by them.
In many of the ways that we originally asserted our TCO advantage, and in a few new ones that ESG Lab has highlighted, this analysis validates our earlier work, which frankly is a big relief! It’s always nice to have bright, inquisitive folks check your work and give you two thumbs up.
I encourage anyone wrestling with the complexities of cloud TCO to take a look at ESG Lab's work; I found it clear, easy to understand, and factually dense. Let us know what you think, and what other kinds of comparisons and evaluations would be useful for your decision-making process.

- Miles Ward, Global Head of Solutions, Google Cloud Platform