Fun fact: around 170 million taxi journeys take place across New York City every year, and each one generates data the moment someone steps in and out of one of those bright yellow cabs. How much information exactly? Being a not-so-secret maps enthusiast, I made it my challenge to visualize a NYC taxi dataset on Google Maps.

Anyone who’s tried to put a large number of data points on a map knows the difficulties of working with big geolocation data. That's why I want to share how I used Cloud Dataflow to spatially aggregate every single pick-up and drop-off location, with the goal of painting the whole picture on a map. For background, Google Cloud Dataflow is currently in alpha and can help you gain insight into large geolocation datasets. You can experiment with it by applying for the alpha program, or learn more in yesterday's update.

When I first sat down to think through this data visualization, I knew I needed to create a thematic map, so I built a simple pipeline to geofence all 340 million pick-up and drop-off locations against 342 polygons, the result of converting the NYC neighbourhood tabulation areas into single-part polygons. You can find the processed data in this public BigQuery table. (To access BigQuery you need at least one project listed in your Google Developers Console. After creating a project you can access the table by following this link.)
Thematic map showing the distribution of taxi pick-up locations in NYC in 2013. Midtown South is New Yorkers’ favourite area to get a cab, with almost 28 million trips starting there, roughly one trip per second. You can find an interactive map here.

This open data, released by the NYC Taxi & Limo Commission, has been the foundation for some beautiful visualizations. By utilizing the power of Google Cloud Platform's tools, I’ve been able to spatially aggregate the data using Cloud Dataflow, and then do ad hoc querying on the results using BigQuery, to gain fast and comprehensive insight into this immense dataset.

With the Google Cloud Dataflow SDK, which parallelizes the data transformations across multiple Cloud Platform instances, I was able to build, test and run the whole processing pipeline in a couple of days. The actual processing, distributed across five workers, took slightly less than two hours.

The pipeline’s architecture is extremely simple. Since Cloud Dataflow offers a BigQuery reader and writer, most of the heavy lifting is already taken care of. The only thing I had to provide was the geofencing function to be parallelised across multiple instances. For a detailed description of how to do complex geofencing using open source libraries, see this post on the Google Developers Blog.
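
To make that concrete, here is a minimal sketch of what such a pipeline can look like with the Dataflow Java SDK. It is illustrative rather than my exact code: the table and field names are placeholders, and NeighborhoodIndex is a hypothetical helper (sketched further below), not part of the SDK.

import java.util.Arrays;

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class GeofenceTaxiTrips {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Schema of the output table (placeholder field names).
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("pickup_neighborhood").setType("STRING"),
        new TableFieldSchema().setName("fare_amount").setType("FLOAT")));

    p.apply(BigQueryIO.Read.named("ReadTrips")
            .from("my-project:taxi.trips"))              // placeholder source table
     .apply(ParDo.named("Geofence").of(new DoFn<TableRow, TableRow>() {
       @Override
       public void processElement(ProcessContext c) {
         TableRow trip = c.element();
         double lat = Double.parseDouble(String.valueOf(trip.get("pickup_latitude")));
         double lng = Double.parseDouble(String.valueOf(trip.get("pickup_longitude")));
         // NeighborhoodIndex (hypothetical, sketched below) answers
         // "which of the 342 polygons contains this point?"
         c.output(new TableRow()
             .set("pickup_neighborhood", NeighborhoodIndex.lookup(lat, lng))
             .set("fare_amount", trip.get("fare_amount")));
       }
     }))
     .apply(BigQueryIO.Write.named("WriteTrips")
            .to("my-project:taxi.trips_geofenced")       // placeholder output table
            .withSchema(schema));

    p.run();
  }
}

Because the DoFn carries no per-pipeline state of its own, the service is free to run as many parallel copies of it as there are worker threads.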

When executing the pipeline, Cloud Dataflow automatically optimizes your data-centric pipeline code by collapsing multiple logical passes into a single execution pass, then deploys the result to multiple Google Compute Engine instances. When deploying the pipeline, you can read in files from Google Cloud Storage that contain data you need for your transformations, e.g., shapefiles or GeoJSON files. Alternatively, you can call an external API to load the geofences you want to test against.

I utilized an API I built on App Engine which exposes a list of geofences stored in Datastore. Using the Java Topology Suite I created a spatial index maintained in a class variable in the memory of each instance for fast querying access.
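
Concretely, that per-worker index can look roughly like the following with the Java Topology Suite. This is a sketch under the assumptions above: Fence is an illustrative holder class, and build() would be fed once per worker from the App Engine geofence API.

import java.util.List;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Point;
import com.vividsolutions.jts.geom.prep.PreparedGeometry;
import com.vividsolutions.jts.geom.prep.PreparedGeometryFactory;
import com.vividsolutions.jts.index.strtree.STRtree;

public class NeighborhoodIndex {

  /** One neighbourhood polygon, prepared for fast repeated contains() tests. */
  public static class Fence {
    final String name;
    final Geometry polygon;
    final PreparedGeometry prepared;

    public Fence(String name, Geometry polygon) {
      this.name = name;
      this.polygon = polygon;
      this.prepared = PreparedGeometryFactory.prepare(polygon);
    }
  }

  // Kept in a class variable so every bundle processed on this worker can
  // reuse the index without rebuilding it.
  private static volatile STRtree tree;
  private static final GeometryFactory GEOMETRY_FACTORY = new GeometryFactory();

  /** Builds the R-tree once per worker, e.g. from fences fetched over the API. */
  public static synchronized void build(List<Fence> fences) {
    if (tree != null) {
      return;  // another thread on this worker already built it
    }
    STRtree t = new STRtree();
    for (Fence fence : fences) {
      t.insert(fence.polygon.getEnvelopeInternal(), fence);
    }
    t.build();
    tree = t;
  }

  /** Returns the name of the polygon containing (lat, lng), or "unknown". */
  public static String lookup(double lat, double lng) {
    Point point = GEOMETRY_FACTORY.createPoint(new Coordinate(lng, lat));
    @SuppressWarnings("unchecked")
    List<Fence> candidates = tree.query(point.getEnvelopeInternal());
    for (Fence fence : candidates) {
      if (fence.prepared.contains(point)) {
        return fence.name;
      }
    }
    return "unknown";
  }
}

The R-tree narrows each lookup to the few polygons whose bounding boxes contain the point, and the prepared geometries make the exact point-in-polygon test cheap, which is what makes tens of thousands of lookups per second feasible.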

Distributed across five workers, Cloud Dataflow processed an average of 25,000 records per second, each record holding two locations, ploughing through more than 170 million table rows in just under two hours. The number of workers can be flexibly assigned at deployment time: the more workers you use, the more records can be processed in parallel and the faster your pipeline executes.
The interactive Cloud Dataflow execution graph helps you monitor and debug your pipeline right in the Google Developers Console.
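
Setting the worker count is just a pipeline option at deployment time. Assuming the option and runner names from the open-sourced Java SDK, pinning the pool at five workers looks something like this sketch:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class PipelineFactory {
  /** Creates a pipeline that deploys to the managed service with five workers. */
  public static Pipeline createWithFiveWorkers(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowPipelineRunner.class);
    options.setNumWorkers(5);  // more workers = more records processed in parallel
    return Pipeline.create(options);
  }
}
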
With the data preprocessed and written back into BigQuery, we were then able to run super-fast queries over the whole table, answering questions like, “Where do the best-paid trips start?”

Unsurprisingly, they start at JFK airport, with an average fare of $46 and an average tip of 20.7%*. Okay, this is probably not a secret, but did you know that, even though the average fare from LGA airport is $15 less, roughly 800,000 more trips start from LGA? And at 22.2%*, passengers from LGA airport actually tip best. *As cash tips aren’t reported, only 52% of trips have a tip recorded, so the tip figures could be inaccurate.

Most taxi trips start in Midtown South (28 million), with an average fare of $11. Carnegie Hill in the Upper East Side comes fourth with 12 million pick-ups, however these trips are fairly short. Journeys that start there mostly stay in the Upper East Side and therefore generate an average fare of only $9.80. Here's an interactive map visualizing where people went, what they paid on average and how they tipped, along with some other visualizations of how people tip by pick-up area:




The processed data is publicly available in this BigQuery table. You can find some interesting queries to run against this data in this gist.
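
To give a flavour of those queries, here's one in the spirit of the “best-paid trips” question, written in legacy BigQuery SQL and held in a Java constant. The table and column names are assumptions for illustration, not the public table's actual schema:

public class TaxiQueries {
  // Average fare and tip percentage per pick-up neighbourhood, best paid first.
  // Placeholder table/column names; cash tips go unreported, hence the filter.
  public static final String BEST_PAID_PICKUPS =
      "SELECT pickup_neighborhood, "
    + "       AVG(fare_amount) AS avg_fare, "
    + "       100 * AVG(tip_amount / fare_amount) AS avg_tip_pct "
    + "FROM [my-project:taxi.trips_geofenced] "
    + "WHERE fare_amount > 0 AND tip_amount > 0 "
    + "GROUP BY pickup_neighborhood "
    + "ORDER BY avg_fare DESC";
}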

Though NYC taxi cab journeys may not seem to amount to much, they actually conceal a ton of information, which Google Cloud Dataflow, as a powerful big data tool, helped reveal by making big data processing easy and affordable. Maybe I'll try London's black cabs next.

- Posted by Thorsten Schaeff, Sales Engineer Intern

The value of data lies in analysis and the intelligence one generates from it. Turning data into intelligence can be very challenging as data sets become large and distributed across disparate storage systems. Add to that the increasing demand for real-time analytics, and extracting value from data becomes a huge challenge for developers.

In June 2014, we announced a significant step toward a managed service model for data processing: Google Cloud Dataflow, aimed at relieving operational burden and letting developers focus on development. We created Cloud Dataflow, currently an alpha release, as a platform to democratize large-scale data processing by enabling easier and more scalable access to data for data scientists, data analysts and data-centric developers. Regardless of role or goal, users can discover meaningful results from their data via simple and intuitive programming concepts, without the extra noise of managing distributed systems.

Today, we are announcing the availability of the Cloud Dataflow SDK as open source. This will make it easier for developers to integrate with our managed service while also forming the basis for porting Cloud Dataflow to other languages and execution environments.

We’ve learned a lot about how to turn data into intelligence as the original FlumeJava programming model (the basis for Cloud Dataflow) has continued to evolve internally at Google. Why share this via open source? So that the developer community can:
  • Spur future innovation in combining stream- and batch-based processing models: Reusable programming patterns are a key enabler of developer efficiency. The Cloud Dataflow SDK introduces a unified model for batch and stream data processing. Our approach to temporal-based aggregations provides a rich set of windowing primitives, allowing the same computations to be used with batch or stream data sources (see the sketch after this list). We will continue to innovate on new programming primitives and welcome the community to participate in this process.
  • Adapt the Dataflow programming model to other languages: As the proliferation of data grows, so do programming languages and patterns. We are currently building a Python 3 version of the SDK, to give developers even more choice and to make Dataflow accessible to more applications.
  • Execute Dataflow on other service environments: Modern development, especially in the cloud, is about heterogeneous services and composition. Although we are building a massively scalable, highly reliable, strongly consistent managed service for Dataflow execution, we also embrace portability. As Storm, Spark, and the greater Hadoop family continue to mature, developers are challenged with bifurcated programming models. We hope to relieve developer fatigue and enable choice in deployment platforms by supporting execution and service portability.
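
As a taste of that unified model, here is a minimal sketch using the Dataflow Java SDK: the same windowed count applies to a PCollection whether it came from a bounded (batch) or unbounded (streaming) source.

import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.windowing.FixedWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;
import org.joda.time.Duration;

public class WindowedCounts {
  /**
   * Counts occurrences of each element within one-minute event-time windows.
   * The input may come from a batch source or a streaming source; the
   * transforms are identical either way.
   */
  public static PCollection<KV<String, Long>> countPerMinute(
      PCollection<String> events) {
    return events
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(Count.<String>perElement());
  }
}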

We look forward to collaboratively building a system that enables distributed data processing for users from all backgrounds. We encourage developers to check out the Dataflow SDK for Java on GitHub and contribute to the community.

- Posted by Sam McVeety, Software Engineer

What do you get when you combine a group of engineers obsessed with cutting-edge technology and add a ton of geek? A bunch of tech enthusiasts who make up the Developer Advocate team at Google. You may have already seen some of our work or seen us speak.

We love helping make all of you as successful as possible as you build apps that take full advantage of everything that Google Cloud Platform has to offer. We like talking to you, but even more than that, we like to listen to your feedback. We want to be your voice to the Google Cloud Platform product and engineering teams and use what we hear to help create the best possible developer experience.

You’ll often meet us at technology events (conferences, meetups, user groups, etc.), where we talk about the many products and technologies that get us excited about coming to work every day. If you do see us, don’t be shy; come say hi!

Ask us anything and everything regarding Google Cloud Platform on Twitter and learn more through our videos on the Google Developers and Google Cloud Platform channels.

Without further ado, please meet your friendly neighborhood Cloud Developer Advocates!


Aja Hammerly
@thagomizer_rb
Aja just joined Google as a Developer Advocate. Before Google she spent 10 years working as an engineer building websites at a variety of web companies. She came to Google in order to help people use Google's amazing cloud resources effectively on their own projects.

Fun Fact
Aja learned to solve a Rubik's Cube by racing the build at her first dev job.


Brian Dorsey
@briandorsey, +BrianDorsey
Brian Dorsey aims to help you build cool stuff with our APIs and focuses on Kubernetes and Containers. He loves Python and taught it at the University of Washington. He’s spoken at both PyCon & PyCon Japan. Brian is currently learning Go and enjoying it.

Fun Fact
Brian speaks Japanese.



David East
@_davideast
David is passionate about creating resources and speaking about them to help educate developers. A military brat, David has moved over a dozen times in his life.

Fun Fact
David once broke his leg in the middle of the wilderness and had to crawl back to civilization.




Felipe Hoffa
@felipehoffa, +FelipeHoffa
Felipe Hoffa is originally from Chile and joined Google as a Software Engineer. Since 2013 he's been a Developer Advocate for big data, inspiring developers around the world to leverage Google Cloud Platform tools to analyze and understand their data in ways they never could before. You can find him in several YouTube videos, blog posts, and conferences around the world.

Fun Fact
He once went to the New York Film Academy to produce his own 16mm short films.




Francesc Campoy Flores
@francesc, +FrancescCampoyFlores, Site
Francesc Campoy Flores focuses on Go for Google Cloud Platform. Since joining the Go team in 2014, he has written several didactic resources and traveled the world attending conferences, organizing live courses, and meeting fellow gophers. He joined Google in 2011 as a backend software engineer working mostly in C++ and Python, but it was with Go and Cloud Platform that he rediscovered how fun programming can be.

Fun Fact
Francesc celebrated his 30th birthday riding a bike wearing a red tutu from San Francisco to Los Angeles.


Greg Wilson
@gregsramblings, +GregWilsonDev, Site
Greg Wilson leads the Google Cloud Platform Developer Advocacy team and has over 25 years of software development experience spanning multiple platforms, including cloud, mobile, web, gaming, and various large-scale systems.

Fun Fact
Greg is a part-time pro-photographer and a struggling jazz piano player.


Jenny Tong
@baconatedgeek, +JennyMurphy, Site
Jenny comes from the Firebase family at Google and helps developers build realtime stuff on all sorts of platforms. If she's away from her laptop, she's probably skating around a roller derby track, or hanging from aerial silk.

Fun Fact
Jenny once ate discount fugu puffer fish from a supermarket. It was priced at less than $0.10 per piece. Somehow, she survived.


Julia Ferraioli
@juliaferraioli, +JuliaFerraioli, Site
Julia helps developers harness the power of Google’s infrastructure to tackle their computationally intensive processes and jobs. She comes from an industrial background in software engineering and an academic background in machine learning and assistive technology.

Fun Fact
Julia once deleted her entire thesis with a malformed regular expression, which she blames on lack of sleep and bad coffee. One good night's sleep outside the sysadmin's door restored it from the tape backup, and luckily only a couple of paragraphs were lost!


Kazunori Sato
@kazunori_279, +KazunoriSato
Kazunori Sato recently joined the team after working as a Cloud Platform Solutions Architect for 2.5 years. During that time, he produced over 10 solutions, has hosted the largest Google Cloud Platform community event in Japan for the past 5 years, and hosts Docker Meetup in Tokyo. He will be one of our resident experts in Japan on BigQuery, big data, Docker, Kubernetes, mBaaS and IoT.

Fun Fact
Kaz’s hobby is playing with littleBits, RasPi, Arduino and FPGA and having fun connecting them to BigQuery.


Mandy Waite
@tekgrrl, +MandyWaite, about.me
Mandy is working to make the world a better place for developers building applications for Cloud Platform. She came to Google from Sun Microsystems where she worked with partners on performance and optimisation of large scale applications and services before moving on to building an ecosystem of Open Source applications for OpenSolaris. In her spare time she is learning Japanese and plays the guitar.

Fun Fact
Mandy has been studying Japanese for some time now, in the hopes of one day working in Japan and travelling the country in search of cicadas.


Ossama Alami
@ossamaalami, +OssamaAlami
Ossama is focused on Firebase, making sure developers have a great experience building realtime apps on Google Cloud Platform. He has worked as a software engineer, consultant, developer advocate and engineering manager at a variety of small and big companies. Prior to Firebase he was Head of Developer Relations for Glass at Google[x]. In the winter he can be found snowboarding in the Sierras.

Fun Fact
Ossama has worked on 8 different Google developer products: Ads APIs, Geo APIs, Android, Commerce APIs, Google TV, Chromecast, Glass and now Firebase.


Paul Newson
@newsons_nybbles, +PaulNewson, Site
Paul currently focuses on helping developers harness the power of Google Cloud Platform to solve their big data problems. Previously, he was an engineer on Google Cloud Storage. Before joining Google, Paul founded a startup which was acquired by Microsoft, where he worked on DirectX, Xbox, Xbox Live, and Forza Motorsport, before spending time working on machine learning problems at Microsoft Research.

Fun Fact
Paul is a private pilot.

Ray Tsang
@saturnism, +RayTsang, about.me
Ray gained extensive hands-on experience in cross-industry enterprise systems integration delivery and management during his time at Accenture, where he managed full-stack application development, DevOps, and ITOps. Ray specialized in middleware, big data, and PaaS products during his time at Red Hat while contributing to open source projects such as Infinispan. Aside from technology, Ray enjoys traveling and adventures.

Fun Fact
Ray has been posting at least one picture a day on Flickr since 2010.


Sara Robinson
@srobtweets
Sara joins Google from the Firebase family. She previously worked as an analyst at Sandbox Industries, a venture firm and startup foundry. She's passionate about learning to code, running, and finding the best ice cream in town.

Fun Fact
Sara wrote her senior thesis on Harry Potter, and enjoys finding ways to relate Harry Potter to almost anything.

Terrence Ryan
@tpryan, +TerrenceRyan
Terrence (Terry) Ryan is a Developer Advocate for the Cloud Platform team. He has a passion for web standards and 15 years of experience working with front- and back-end applications in both industry and academia.

Fun Fact
Before doubling down on technology in the early aughts, Terry was a semi-professional improv comic. 

-Posted by Greg Wilson, Head of Developer Advocacy

If you’ve hunted for new office space for your company in recent years, you know what a nightmare it can be: dealing with quickly outdated spreadsheets and flyers, finding inaccurate data on listings, or even missing out on a great spot because it wasn’t listed properly. The commercial real estate industry today is technologically behind, and RealMassive aims to fix that.

RealMassive uses Google App Engine, Google Compute Engine, Google Cloud Storage, and Google Maps to bring transparency to the commercial real estate industry. The company gives its customers accurate, up-to-the-minute digital real estate listings and eliminates conventional operating models. With more than 1 billion square feet of properties in its database, it's well on its way to transforming an old industry.

Read our new case study on RealMassive here to learn more about how Cloud Platform helped the company achieve 1,360% growth in data in just three months.

-Posted by Chris Palmisano, Senior Key Account Manager

Can you change the world for the better in 24 hours? That was the challenge 39 teams tackled at the Bayes Hack data-science challenge in November.

Bayes Impact is a Y Combinator-backed nonprofit that runs programs to bring data-science solutions to high-impact social problems. In addition to a 12-month full-time fellowship that places leading data scientists with civic and nonprofit organizations such as the Gates Foundation, Johns Hopkins and the White House, the organization runs an annual 24-hour hackathon that brings together data scientists and engineers to tackle social problems.

Starting from a set of 20 challenge problems proposed by government and non-profit organizations, teams drawn from Silicon Valley’s top data-science talent applied their skills to finding impactful ways to use already-available data to solve pressing social problems.

Google Cloud Platform sponsored the event with a $500 Google Cloud Starter Pack credit for each team and a prize of $100,000 in Google Cloud Platform credits for the winning team.

With only 24 hours and large quantities of data to process, teams were able to leverage the power of tools such as Google Compute Engine and BigQuery to quickly chew through terabytes of information, looking for ways to make a meaningful impact on people’s lives.

The winning team, made up of five local Bay Area data scientists, used their data savvy and their Cloud Platform credits to identify prostitution rings by analyzing patterns in phone numbers and text in postings to adult escort websites. Using a cluster of Compute Engine nodes, the team processed a dataset provided by the non-profit group Thorn. They indexed 38,600 phone numbers and combined that with a heuristic phrase-matching strategy to detect 143 separate networks or cells operating in the US.

“Realizing that it was going to take 76 days to process the data on a local laptop, we saw this as a place to use our Cloud Platform credits,” notes Peter Reinhardt, the lead for the winning team. “We found it really straightforward to get SSH access to our first compute instance right from the console. Once that was running, we were able to use that image to quickly bring up 10 machines, and went from nothing to a high powered compute cluster in just over half an hour.”

Paul Duan, President of Bayes Impact, observed that Cloud Platform “enabled the participants to get going quickly and focus on their application without having to spend too much time setting up infrastructure.”

It is estimated that 100,000 to 300,000 children are at risk of commercial sexual exploitation in the United States and one million children are exploited by the global commercial sex trade each year.* As the winning entry, the team’s work will be adopted and expanded as a resident Bayes Impact project.

Companies use data science and Google’s big data tools to quickly answer tough data-intensive questions. Bayes Impact and Google worked together to show what is possible when human and technology resources are brought to bear against social problems.


- Posted by Preston Holmes, Google Cloud Platform Solutions Architect

*U.S. Department of State, The Facts About Child Sex Tourism: 2005.

Today’s post is about Cloud Platform customer Akselos, whose platform enables engineers to design and assess critical infrastructure, such as bridges, buildings and aircraft, via advanced simulation software.

When you enter a tall office building or drive over a giant bridge, it’s likely you don’t think twice about the work that went into ensuring these massive structures stay standing.

Lucky for us, engineers answer myriad design questions before the structures are ever built: How thick do the beams need to be? How will different materials weather over time? Lucky for these engineers, software like Akselos helps answer these questions. And now, students around the world can use this same software when they participate in MIT’s massive open online course, Elements of Structure.

Akselos, which is built on Google Compute Engine, enables large-scale software-based simulations, allowing engineers to virtually prototype complex infrastructure, keeping us all safe on those bridges.

Computational simulations are a key tool in all engineering disciplines today. The current industry-standard technology is called Finite Element Analysis (FEA). However, large-scale 3D FEA simulations are computationally intensive. It can be infeasible to use FEA for many applications of practical interest, such as modeling large infrastructure like bridges, buildings, port equipment, offshore structures or airframes in full 3D detail. These simulations require amounts of RAM that often exceed the capacity of a desktop workstation (sometimes over a terabyte). Even if a simulation does fit in RAM, it may require hours or even days of computation time. If time is at a premium, 3D FEA of large-scale systems is too slow.

Akselos aims to make high-end simulation technology faster and easier to access. Its software is based on new algorithms (developed at MIT and other universities in the US and Europe over the past decade) that are 1000x faster than FEA for large-scale, highly detailed simulations. Fast response times are crucial in practice because engineers typically need to do hundreds or even thousands of simulations to perform studies for a piece of critical infrastructure, such as analyzing the vibrational characteristics of an entire gas turbine under all operating frequencies. With Akselos, studies like this can be completed within one day.

With Akselos, each simulation model is composed of hundreds or even thousands of components. And each component contains various properties (e.g. density, stiffness) or geometry (length, curvature, crack depth) that can be changed with the click of a button. In order to handle this giant data footprint, Akselos’s software runs on Google Cloud Platform and utilizes Google’s storage solutions as well as Replica Pools to scale its computing resources.


Akselos’s initial deployment on Google Compute Engine occurred when Dr. Simona Socrate, a Senior Lecturer in the Mechanical Engineering Department at MIT, decided to leverage its fast simulation technology to help students in her structural analysis course, 2.01x, on edX. Dr. Socrate wished to integrate simulation apps that run in the web browser into her course so students could explore subtle effects in structural mechanics in an interactive and visual way. Previous attempts to integrate simulations within university courses had been unsuccessful because the tools are typically too complicated for students to master.

Following Dr. Socrate’s direction, Akselos developed a series of WebGL browser apps to support the course’s learning experience. To handle the scale required for the 7,500 students signed up for the course, Akselos deployed the simulation back end on Compute Engine. The apps were tested to sustain up to 15,000 simulation queries per hour at 99.9% uptime. The simulations ran on Google Compute Engine without a hitch during the four-month course, with a very positive response from the students.

In parallel with the edX deployment, Akselos has opened up its cloud-based simulation platform, which is now used by a growing community of engineers around the world. The company aims to put powerful simulation technology into the hands of as many people as possible to enhance design and analysis workflows across many engineering disciplines. With the software deployed on Compute Engine, Akselos is well on its way to providing faster, easier, more detailed simulations for every engineer.

Our customers, large and small, have put a number of things on their holiday wish lists, including better support for Windows-based workloads that leverages the performance and scale of Google datacenters. Today, we're releasing three additional enhancements to Google Compute Engine that make it a great place for customers to run highly performant Windows-based workloads at scale.

First, we’re happy to offer Microsoft License Mobility for Google Cloud Platform. This enables our customers to move their existing Microsoft server application software licenses, such as SQL Server, SharePoint and Exchange Server, from on-premises to Google Cloud Platform without any additional Microsoft software licensing fees. Not only does license mobility make the transition easier for existing customers, it provides customers who prefer to purchase perpetual licenses the ability to continue doing so while still taking advantage of the efficiencies of the cloud. You can learn more about Microsoft License Mobility for Google Cloud Platform here. Use of Microsoft products on Google Compute Engine is subject to additional terms and conditions (you can view the Google Cloud Platform service terms here).

Second, Windows Server 2008 R2 Datacenter Edition is now available in beta to all Google Cloud Platform customers on Google Compute Engine. We know our customers run some of their key workloads on Windows and want rapid deployment, high performance and the ability to stretch their datacenters to the cloud. With awesome features like Local SSD (which also supports live migration) and multiple ways to connect your datacenter to the cloud, Google Cloud Platform is the best place to run your Windows workloads. And just so you know, we are working on support for Windows Server 2012 and 2012 R2; we'll have more on this soon!

And lastly, a version of the popular Chrome RDP app from Fusion Labs, optimized for Google Cloud Platform, is now available free to our customers for use with Windows on Google Compute Engine. This enables customers using the Chrome browser to create remote desktop sessions to their Windows instances in Google Compute Engine, without additional software, simply by clicking the RDP button in the Google Developers Console. In addition, because the Google Developers Console stores and passes the Windows login credentials to the RDP app, customers can leave the complexity of managing unique user IDs and passwords for each Windows instance to Google.

We’re constantly amazed to see what our customers build and run on Google Cloud Platform, from high performance animated movie rendering to rapid scale distributed applications to near instant-on VMs to cloud bursting.

For example, IndependenceIT, a leading software provider of simplified IT management solutions for application and DaaS delivery, has been working to certify its Cloud Workspace Suite ("CWS") with Windows Server 2008 R2 Datacenter Edition running on Google Compute Engine. CWS is software that allows IT administrators to rapidly orchestrate and provision all elements necessary for automated, multi-platform, hypervisor/device agnostic workspaces for use with public, private or hybrid-cloud IT environments. The software offers a robust API set for ease of integration with existing customer business support systems, simplifying deployment while speeding time to market. IndependenceIT has been testing Windows on Google Compute Engine, and their customers will have the ability to use CWS to provision Windows Server based desktops and application deployments into Google Cloud Platform.

We’d love to hear feedback from our customers who use Windows, as well as how you’d like to see us expand support for the Windows ecosystem. What are you building next?

-Posted by Martin Buhr, Product Manager