Running out of RAM in Tensorflow?

Using Deephaven Tables to Back Large Tensorflow Datasets

By Matthew Runyon

Deephaven can be used to store and manipulate large amounts of data quickly and efficiently. Some data sets can exceed hundreds of gigabytes, more than the memory a server would have. This can cause issues when analyzing your data with an external library such as Tensorflow.

In this article, we will cover how to use generators to feed Tensorflow from a Deephaven table that is too big to fit in memory.

A generator is a function that yields a sequence of values and may be infinite. One benefit of generators is that they can be used when memory is a concern and the values follow some pattern. A generator can be thought of as a function that pauses every time it yields a value, resuming only when the caller asks for the next one.
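As a quick illustration (plain Python, independent of Deephaven or Tensorflow — the `counter` name is just for this example), here is an infinite generator that pauses at each `yield`:

```python
def counter():
    """Infinite generator: execution freezes at yield until the caller
    asks for the next value, so no list of values is ever materialized."""
    n = 0
    while True:
        yield n  # paused here between calls to next()
        n += 1

gen = counter()
print(next(gen))  # 0
print(next(gen))  # 1
```

Because values are produced one at a time on demand, the memory footprint stays constant no matter how many values are consumed.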

Conveniently, Tensorflow supports generators as a data source, in addition to loading data in memory or from a directory of files. The benefit of using a generator or a directory of files is that the size of the data can exceed the available memory without causing errors. Generators are a good choice here since we can use the Deephaven Query Language to quickly filter our data and then access it from the table. One important thing to note is that Tensorflow requires generators to return both the features and the labels for a set of data, so the syntax when training a network is slightly different.
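A sketch of the required shape (the generator name and the fake data are hypothetical; the `tf.data.Dataset.from_generator` wiring is shown in comments for context):

```python
import numpy as np

def labeled_batches(num_batches=3, batch_size=4):
    """Yield (features, labels) tuples -- the pairing Tensorflow expects
    from a generator used for training."""
    for _ in range(num_batches):
        features = np.random.rand(batch_size, 2)      # fake 2-column features
        labels = np.random.randint(0, 2, batch_size)  # fake binary labels
        yield features, labels

# With Tensorflow installed, the generator would be wrapped like:
# dataset = tf.data.Dataset.from_generator(
#     labeled_batches,
#     output_signature=(
#         tf.TensorSpec(shape=(None, 2), dtype=tf.float64),
#         tf.TensorSpec(shape=(None,), dtype=tf.int64),
#     ),
# )
# model.fit(dataset)   # instead of model.fit(x, y)

for features, labels in labeled_batches():
    print(features.shape, labels.shape)
```

Note that `model.fit` is called with the dataset alone rather than separate `x` and `y` arguments, which is the syntax difference mentioned above.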

In an earlier article, we covered how to utilize different libraries and data structures with Deephaven tables. From this, we can quickly convert a Deephaven table in Python to a Tensorflow tensor. The only problem is that the data we care about in this article is too big to fit in memory, so attempting to load the entire table would simply produce an out-of-memory error. We could load the table one row at a time, but that would be extremely slow and would not utilize the memory we do have.

We can solve this by loading the table in chunks. Since generators can keep some state, we could just set a chunk size which we know will fit in memory and then loop until we run out of rows. This seems like a good solution except for one part: Tensorflow treats each yielded generator value as a batch and large batch sizes are bad for generalization of a neural network.
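To make the chunking idea concrete, here is a minimal stand-in (hypothetical names, with a small numpy array playing the role of a large Deephaven table):

```python
import numpy as np

def chunked(data, chunk_size):
    """Naive approach: yield one whole chunk per call.
    Each yielded value becomes a single (very large) Tensorflow batch."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

table = np.arange(10)  # stand-in for a table too big for memory
chunks = list(chunked(table, 4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

With a realistic chunk size of millions of rows, each yielded chunk would be treated by Tensorflow as one enormous batch, which is exactly the generalization problem described above.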

Since we want a small batch size, we could simply set our chunk size to something like 32 rows. This would alleviate the problems caused by training with large batch sizes, but we would again run into the inefficiency of loading tiny amounts of data from our table at a time.

Instead, we can use our generator to cache a large section of our table in RAM while yielding small batches from the cached rows. With this we can get the speed benefit of loading millions of rows at a time without the drawback of training our neural network on giant batch sizes.

Combining these ideas and accounting for cases where our batch size and cache size do not align (we still want to return a full batch until the very last batch), then we end up with something along the lines of the following code:
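As a sketch of this combined logic (names are hypothetical; a `read_chunk` callable stands in for pulling rows out of a Deephaven table):

```python
import numpy as np

def batched_with_cache(read_chunk, total_rows, cache_size, batch_size):
    """Cache a large chunk of rows in RAM, then yield small batches from it.

    read_chunk(start, count) stands in for reading rows from a Deephaven
    table; here it is any callable returning (features, labels) arrays.
    """
    leftover_f, leftover_l = None, None  # partial batch carried between caches
    for start in range(0, total_rows, cache_size):
        count = min(cache_size, total_rows - start)
        features, labels = read_chunk(start, count)
        if leftover_f is not None:       # prepend the carried-over rows
            features = np.concatenate([leftover_f, features])
            labels = np.concatenate([leftover_l, labels])
            leftover_f = leftover_l = None
        full = (len(features) // batch_size) * batch_size
        for i in range(0, full, batch_size):
            yield features[i:i + batch_size], labels[i:i + batch_size]
        if full < len(features):         # stash the incomplete tail
            leftover_f, leftover_l = features[full:], labels[full:]
    if leftover_f is not None:           # the very last (possibly short) batch
        yield leftover_f, leftover_l

# Hypothetical stand-in for reading rows from a Deephaven table.
def read_chunk(start, count):
    idx = np.arange(start, start + count)
    return idx.reshape(-1, 1).astype(float), idx % 2

batches = list(batched_with_cache(read_chunk, total_rows=10,
                                  cache_size=6, batch_size=4))
print([len(f) for f, _ in batches])  # [4, 4, 2] -- only the last batch is short
```

In practice, `cache_size` would be millions of rows and `batch_size` something small like 32; the toy numbers here just make the carry-over behavior visible.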

This may look a bit daunting, but it follows the logic described so far:

- Load a large chunk of the table into an in-memory cache.
- Yield small batches from the cached rows.
- Carry any leftover rows over to the next cache, so every batch except possibly the last one is full.
- Load the next chunk and repeat until the table runs out of rows.

Deephaven is a great platform for storing and manipulating large amounts of data — so much data that sometimes you can’t fit it all in memory! Tensorflow benefits from large sets of data, so using Deephaven as the source of a big data set for Tensorflow is a natural connection. With the ideas and code shown in this article, you can easily use your giant Deephaven data sets to train a neural network in Tensorflow while keeping the benefits and speed of the Deephaven query language.
