215

I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18 while not facing the same issue on MacOS.

I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with

np.zeros((156816, 36, 53806), dtype='uint8')

On Ubuntu I'm getting an error:

>>> import numpy as np
>>> np.zeros((156816, 36, 53806), dtype='uint8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806) and data type uint8

whereas on MacOS I'm not:

>>> import numpy as np 
>>> np.zeros((156816, 36, 53806), dtype='uint8')
array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)

I've read somewhere that np.zeros shouldn't really allocate the whole memory needed for the array, but only for the non-zero elements. On top of that, the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.

versions:

Ubuntu
os -> ubuntu mate 18
python -> 3.6.8
numpy -> 1.17.0

mac
os -> 10.14.6
python -> 3.6.4
numpy -> 1.17.0

PS: also failed on Google Colab

12
  • 2
    Are there other processes running in memory?
    – WiseDev
    Aug 15, 2019 at 9:52
  • hmmm. weird. That shouldn't be taking that much memory. How much memory did it occupy on MacOS?
    – WiseDev
    Aug 15, 2019 at 9:54
  • 1
    + it should occupy 35GB in ram, theoretically
    – ivallesp
    Aug 15, 2019 at 9:56
  • 1
    Unlikely, but you don't happen to be running a 32 bit Python interpreter in Ubuntu right?
    – javidcf
    Aug 15, 2019 at 10:04
  • 1
    np.zeros does not create a sparse matrix. There may be a delay in filling in the zeros. But see stackoverflow.com/q/27464039
    – hpaulj
    Aug 15, 2019 at 11:17

8 Answers

197

This is likely due to your system's overcommit handling mode.

In the default mode, 0,

Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. The root is allowed to allocate slightly more memory in this mode. This is the default.

The exact heuristic used is not well explained here, but this is discussed more on Linux over commit heuristic and on this page.

You can check your current overcommit mode by running

$ cat /proc/sys/vm/overcommit_memory
0

In this case, you're allocating

>>> 156816 * 36 * 53806 / 1024.0**3
282.8939827680588

~282 GB, and the kernel is effectively saying "there's obviously no way I'm going to be able to commit that many physical pages to this", so it refuses the allocation.

If (as root) you run:

$ echo 1 > /proc/sys/vm/overcommit_memory

This will enable the "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing at least).

I tested this myself on a machine with 32 GB of RAM. With overcommit mode 0 I also got a MemoryError, but after changing it to 1 it works:

>>> import numpy as np
>>> a = np.zeros((156816, 36, 53806), dtype='uint8')
>>> a.nbytes
303755101056

You can then go ahead and write to any location within the array, and the system will only allocate physical pages when you explicitly write to that page. So you can use this, with care, for sparse arrays.
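
For illustration, here is a rough sketch of that usage pattern. It assumes overcommit mode 1 is already enabled and a 64-bit Python; the indices written to are arbitrary.

import numpy as np

# With vm.overcommit_memory=1 this reserves ~283 GiB of virtual address
# space, but no physical pages are committed yet.
a = np.zeros((156816, 36, 53806), dtype='uint8')
print(a.nbytes)  # 303755101056

# Writing to an element commits only the page (typically 4 KiB) that backs it,
# so touching a few scattered locations keeps resident memory tiny.
a[0, 0, 0] = 1
a[150000, 20, 50000] = 255
print(a[150000, 20, 50000])  # 255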

11
  • 4
    This is specifically a feature of the Linux kernel, so there isn't necessarily a direct equivalent on MacOS, though possibly something similar. I don't think it's as easy on Macs to twiddle kernel settings.
    – Iguananaut
    Aug 19, 2019 at 8:35
  • 3
    @Iguananaut what is the exact meaning of the "with care" warning? i.e. what is the worst case scenario of something going wrong with this on an Ubuntu 18 server with GTX 1080 GPU? Sep 11, 2019 at 13:30
  • 1
    @mLstudent33 For one, this has nothing to do with your GPU, which has its own memory. All I mean is you can still fill up your memory--every time you write to some page in memory that page (typically 4k bytes) must be committed to physical memory. So the worst case scenario is you run out of memory.
    – Iguananaut
    Sep 11, 2019 at 15:12
  • 2
    Does this change take effect immediately or do we need to restart our shell or the machine itself?
    – dumbledad
    Mar 12, 2020 at 9:41
  • 4
    It takes immediate effect, but it will not persist beyond reboot without additional measures. Search other questions on how best to persist /proc/sys settings on your distribution.
    – Iguananaut
    Mar 13, 2020 at 17:20
134

I had this same problem on Windows and came across this solution. So if someone runs into this problem on Windows, the solution for me was to increase the pagefile size, as it was a memory overcommitment problem for me too.

Windows 8

  1. On the keyboard, press Windows key + X, then click System in the popup menu
  2. Tap or click Advanced system settings. You might be asked for an admin password or to confirm your choice
  3. On the Advanced tab, under Performance, tap or click Settings.
  4. Tap or click the Advanced tab, and then, under Virtual memory, tap or click Change
  5. Clear the Automatically manage paging file size for all drives check box.
  6. Under Drive [Volume Label], tap or click the drive that contains the paging file you want to change
  7. Tap or click Custom size, enter a new size in megabytes in the initial size (MB) or Maximum size (MB) box, tap or click Set, and then tap or click OK
  8. Reboot your system

Windows 10

  1. Press the Windows key
  2. Type SystemPropertiesAdvanced
  3. Click Run as administrator
  4. Under Performance, click Settings
  5. Select the Advanced tab
  6. Select Change...
  7. Uncheck Automatically manage paging file size for all drives
  8. Then select Custom size and fill in the appropriate size
  9. Press Set then press OK then exit from the Virtual Memory, Performance Options, and System Properties Dialog
  10. Reboot your system

Note: I did not have enough memory on my system for the ~282 GB in this example, but for my particular case this worked.

EDIT

From here the suggested recommendations for page file size:

There is a formula for calculating the correct pagefile size. Initial size is one and a half (1.5) x the amount of total system memory. Maximum size is three (3) x the initial size. So let's say you have 4 GB (1 GB = 1,024 MB x 4 = 4,096 MB) of memory. The initial size would be 1.5 x 4,096 = 6,144 MB and the maximum size would be 3 x 6,144 = 18,432 MB.
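
In code, the rule of thumb from that quote works out like this (the 4 GB figure is just the example used in the quote):

# Rule-of-thumb pagefile sizing from the quote above, in MB.
total_ram_mb = 4 * 1024                 # example machine with 4 GB of RAM
initial_mb = int(1.5 * total_ram_mb)    # 6144 MB
maximum_mb = 3 * initial_mb             # 18432 MB
print(initial_mb, maximum_mb)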

Some things to keep in mind from here:

However, this does not take into consideration other important factors and system settings that may be unique to your computer. Again, let Windows choose what to use instead of relying on some arbitrary formula that worked on a different computer.

Also:

Increasing page file size may help prevent instabilities and crashing in Windows. However, hard drive read/write times are much slower than they would be if the data were in your computer's memory. Having a larger page file is going to add extra work for your hard drive, causing everything else to run slower. Page file size should only be increased when encountering out-of-memory errors, and only as a temporary fix. A better solution is to add more memory to the computer.

5
  • 2
    @Azizbro I've gone back to the default now but just adjusted the values until out-of-memory error disappeared. Nov 24, 2019 at 4:33
  • 2
    I've done this and I still get MemoryError: Unable to allocate 10.3 PiB for an array with shape (38137754, 38137754) and data type float64 Dec 23, 2020 at 15:53
  • 1
    This works for me. But we'd better not allocate the virtual memory on the system disk (C:). Otherwise, it is slow when you restart your computer. [Don't ask how I know this, I have done it T.T T.T]
    – Hong Cheng
    Mar 3, 2021 at 7:43
  • 1
    @GeorgeAdams it wasn't working on my machine either. I'm working with VS Code and noticed that I have to run it as an admin; this worked for me.
    – slb20
    Jun 6, 2021 at 1:14
  • My C: drive does not have enough capacity, and unfortunately allocating more on D: does not solve this issue.
    – Jérémy
    Nov 3, 2021 at 13:27
43

I came across this problem on Windows too. The solution for me was to switch from a 32-bit to a 64-bit version of Python. Indeed, 32-bit software, like a 32-bit CPU, can address a maximum of 4 GB of RAM (2^32 bytes). So if you have more than 4 GB of RAM, a 32-bit version cannot take advantage of it.

With a 64-bit version of Python (the one labeled x86-64 in the download page), the issue disappears.

You can check which version you have by entering the interpreter. I, with a 64-bit version, now have: Python 3.7.5rc1 (tags/v3.7.5rc1:4082f600a5, Oct 1 2019, 20:28:14) [MSC v.1916 64 bit (AMD64)], where [MSC v.1916 64 bit (AMD64)] means "64-bit Python".
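
If you are unsure which build you are running, here is a quick standard-library check from inside the interpreter:

import platform
import struct
import sys

print(platform.python_version(), platform.architecture()[0])  # e.g. 3.7.5 64bit
print(struct.calcsize("P") * 8)  # pointer size in bits: 64 on a 64-bit build
print(sys.maxsize > 2**32)       # True only on a 64-bit interpreter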


5
  • 3
    How do I enter interpreter?
    – Shayan
    Sep 27, 2020 at 16:24
  • 2
    Solved my problem too. Using Pycharm. Uninstalled 32-bit version, reinstalled 64-bit one, changed project interpreter to the new 64-bit python.
    – Jia Gao
    Oct 5, 2020 at 5:32
  • 2
    @Shayan : to enter the interpreter, open a terminal (Win + R cmd) and type python -i.
    – kotchwane
    Feb 22, 2021 at 13:59
  • 7
    I already have Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32. My computer's overall memory is 16 GB and available is 4.4 GB, but I still have the problem.
    – Sht
    Jul 5, 2021 at 14:39
  • @Sht I also still have the problem.
    – bardulia
    Jun 12, 2023 at 15:14
12

In my case, adding a dtype argument changed the dtype of the array to a smaller type (from float64 to uint8), decreasing the array size enough to not throw a MemoryError on Windows (64-bit).

from

mask = np.zeros(edges.shape)

to

mask = np.zeros(edges.shape,dtype='uint8')
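
The difference is easy to see with nbytes; the shape below is made up, standing in for edges.shape:

import numpy as np

shape = (10000, 10000)                    # hypothetical stand-in for edges.shape
mask_f64 = np.zeros(shape)                # default dtype is float64: 8 bytes per element
mask_u8 = np.zeros(shape, dtype='uint8')  # 1 byte per element

print(mask_f64.nbytes)  # 800000000 bytes (~763 MiB)
print(mask_u8.nbytes)   # 100000000 bytes (~95 MiB)
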
5

Changing the data type to another one which uses less memory works. For me, I changed the data type to numpy.uint8:

data['label'] = data['label'].astype(np.uint8)
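
For example, with a made-up DataFrame standing in for data (exact savings depend on the original dtype):

import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for `data`
data = pd.DataFrame({'label': np.random.randint(0, 10, size=1_000_000, dtype=np.int64)})
print(data['label'].memory_usage(deep=True))  # ~8 MB with int64 labels

data['label'] = data['label'].astype(np.uint8)
print(data['label'].memory_usage(deep=True))  # ~1 MB with uint8 labels
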
0

I faced the same issue running pandas in a Docker container on EC2. I tried the above solution of allowing overcommitted memory allocation via sysctl -w vm.overcommit_memory=1 (more info on this here); however, this still didn't solve the issue.

Rather than digging deeper into the memory allocation internals of Ubuntu/EC2, I started looking at options to parallelise the DataFrame, and discovered that using dask worked in my case:

import dask.dataframe as dd
df = dd.read_csv('path_to_large_file.csv')
...

Your mileage may vary, and note that the dask API is very similar but not a complete like-for-like replacement for pandas/numpy (e.g. you may need to make some code changes in places depending on what you're doing with the data).
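
A slightly fuller sketch of that pattern (the file path and column name are placeholders):

import dask.dataframe as dd

# dask reads the CSV lazily, in partitions, instead of loading it all at once.
df = dd.read_csv('path_to_large_file.csv')

# Operations build a task graph; .compute() materialises the (much smaller)
# aggregated result as an ordinary pandas object.
counts = df.groupby('some_column').size().compute()
print(counts)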

0

I was having this issue with numpy while trying to use images of size 600x600 (360K pixels). I decided to reduce them to 224x224 (~50K pixels), a reduction in memory usage by a factor of about 7.

X_set = np.array(X_set).reshape(-1 , 600 * 600 * 3)

is now

X_set = np.array(X_set).reshape(-1 , 224 * 224 * 3)
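
Roughly, the factor-of-7 saving follows directly from the pixel counts; the number of images below is arbitrary:

import numpy as np

n_images = 1000                                             # arbitrary
big = np.zeros((n_images, 600 * 600 * 3), dtype='uint8')    # ~1.08 GB
small = np.zeros((n_images, 224 * 224 * 3), dtype='uint8')  # ~0.15 GB

print(big.nbytes / small.nbytes)  # ~7.2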

hope this helps

-1
from pandas_profiling import ProfileReport

prof = ProfileReport(df, minimal=True)
prof.to_file(output_file='output.html')

worked for me

1
  • Thanks for sharing the suggestion. Is the input of the function a Pandas DataFrame or a matrix generated by Numpy? If a DataFrame, what were the maximum numbers of rows and columns?
    – Cloud Cho
    Jun 1, 2023 at 21:20
